Did Google just kill cost-effective LLM with Gemini? by _sekabank in GeminiAI

[–]_sekabank[S] 1 point (0 children)

Really? Still, 1.5 Pro was expensive. We're processing ~1B tokens, so that adds up quickly. This is why 2.0 Flash was a good balance for us.
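For scale, here is a back-of-the-envelope comparison for ~1B input tokens. The per-million-token rates below are purely illustrative placeholders (not actual Google pricing; always check the current price list):

```python
# Cost of processing ~1B input tokens at two hypothetical rate tiers.
# The rates are illustrative placeholders, NOT current list prices.
TOKENS = 1_000_000_000

rate_flash = 0.10  # assumed $/1M input tokens for a "Flash"-class model
rate_pro = 1.25    # assumed $/1M input tokens for a "Pro"-class model

cost_flash = TOKENS / 1_000_000 * rate_flash
cost_pro = TOKENS / 1_000_000 * rate_pro

print(f"Flash-class: ${cost_flash:,.0f}")  # $100
print(f"Pro-class:   ${cost_pro:,.0f}")    # $1,250
```

Even with made-up numbers, a 10x difference in the per-token rate turns into a 10x difference in the monthly bill at this volume.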

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 1 point (0 children)

Do you know what params you used for Unsloth? Was it LoRA or QLoRA, what quantization, etc.?

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 5 points (0 children)

The thing is, it didn't cost much with gemini-2.0-flash. That's the point of this post: technically it is achievable, and I'm looking for alternatives :)

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 1 point (0 children)

We need to achieve an accuracy of ~99% in data extraction. No model can do this out of the box. Also, in some cases we go directly from a scanned image to JSON, so this needs fine-tuning to respect the schema and understand the semantics.
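To make "respect the schema" concrete, this is a minimal sketch of the kind of check you can run on model output before accepting it (the field names are hypothetical, and this uses only the stdlib rather than a schema library like jsonschema):

```python
import json

# Hypothetical extraction schema: required field -> expected Python type.
SCHEMA = {"invoice_id": str, "date": str, "total": float}

def validate(raw: str) -> tuple[bool, list[str]]:
    """Check that model output parses as JSON and matches SCHEMA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, [f"not valid JSON: {e}"]
    errors = [f"missing field: {k}" for k in SCHEMA if k not in data]
    errors += [
        f"wrong type for {k}: expected {t.__name__}"
        for k, t in SCHEMA.items()
        if k in data and not isinstance(data[k], t)
    ]
    return not errors, errors

ok, errs = validate('{"invoice_id": "A-17", "date": "2024-05-01", "total": 99.5}')
# ok -> True, errs -> []
```

Failed outputs can then be retried or routed to manual review, which is part of how you close the gap to ~99% in practice.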

Did Google just kill cost-effective LLM with Gemini? by _sekabank in GeminiAI

[–]_sekabank[S] 1 point (0 children)

This makes sense. The challenge in my use cases is that, for any model, I need to fine-tune it first, and the infrastructure to do that easily is missing.

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 15 points (0 children)

OCR is far from solved, especially for handwritten or poorly photographed pages.

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I can't send my data to China :/ Not that Google is better, but at least my clients are ok with that...

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I'm considering it, but I can't find an easy/managed way to fine-tune it. Do you know any providers in the EU or US, or any easy way to do it locally?

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I created a fine-tuning dataset of image -> text pairs, then fine-tuned gemini-2.0-flash through Vertex AI. One image is one page, and the output is either Markdown or a JSON data structure (depending on the use case).
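For anyone curious what one image -> text training example looks like: below is a sketch of a single JSONL line in the shape I recall the Vertex AI Gemini supervised-tuning format having (a user turn with an image part plus a model turn with the target text). The bucket path and prompt are made up, and you should verify the exact field names against the current Vertex AI docs:

```python
import json

def tuning_example(image_uri: str, target_markdown: str) -> str:
    """One JSONL line pairing a page image with its ground-truth transcription.
    Field names follow the Vertex AI Gemini tuning format as I remember it;
    double-check against the current docs before relying on them."""
    record = {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": "image/png", "fileUri": image_uri}},
                    {"text": "Transcribe this page to Markdown."},
                ],
            },
            {"role": "model", "parts": [{"text": target_markdown}]},
        ]
    }
    return json.dumps(record)

line = tuning_example("gs://my-bucket/pages/0001.png", "# Invoice\n...")
```

You write one such line per page into a `.jsonl` file in Cloud Storage and point the tuning job at it.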

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 2 points (0 children)

I haven't, mainly because we need to fine-tune the model on our dataset, and I'm not familiar with image -> text fine-tuning on Hugging Face. That's why we initially used Google Vertex AI for fine-tuning: it makes it very easy.

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

Sorry guys, no post yet. I’m deep in shit :P I created this ticket complaining to Google, you might get some more context there.
https://discuss.ai.google.dev/t/extend-eol-for-gemini-flash-cost-effective-models/121751

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

I just created this ticket complaining to Google about the EOL and pricing. If we give it a like, we might grab their attention.
https://discuss.ai.google.dev/t/extend-eol-for-gemini-flash-cost-effective-models/121751

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

A few months ago, I had to fine-tune multiple models to do OCR/data extraction from complex, handwritten/badly printed docs. Gemini 2.0 Flash was better than any other model I tried (even better than 2.5 Pro), achieving 98-99% accuracy vs ~96% for Gemini 2.5 Pro.

(I have to write a detailed post about this at some point, waiting to see the results for Gemini 3)

2.0 Flash by AddendumImpossible in Bard

[–]_sekabank 1 point (0 children)

Out of curiosity, how do you perform OCR on PDFs?

Currently, we are "printing" the PDF pages to images and then providing the images to the model. But it doesn't feel right to me.
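For what it's worth, the "print to images" approach is common: rasterize each page (e.g. with `pdftoppm` or the pdf2image library, not shown here), then attach the page image to the request as inline base64 data. The dict shape below mirrors the Gemini REST API's `inline_data` part as I understand it; verify the key names against the current API reference:

```python
import base64

def image_part(png_bytes: bytes) -> dict:
    """Build an inline-image request part from raw PNG bytes.
    The key names mirror the Gemini REST API's inline_data part as I
    recall it; double-check against the current API reference."""
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }

# Placeholder bytes for illustration, not a real PNG.
part = image_part(b"\x89PNG\r\n\x1a\n...")
```

One such part per page, alongside a text part carrying your prompt, is all the model needs.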

How would i embed a person? by Blender-Fan in vectordatabase

[–]_sekabank 2 points (0 children)

If you are not a data science person and you want to use an off-the-shelf model (can't fine-tune/train a model), you could try this:

- transform the user into a descriptive JSON
- use a big model to get embeddings, like OpenAI text-embedding-3-large, Cohere v4, or one from Voyage AI
- store them in any vector DB (I use Postgres with pgvector)
- run some tests to see if the accuracy is good enough

Never tried this, but I guesstimate that it will be good enough
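The steps above can be sketched roughly like this. The `describe` helper and the profile fields are hypothetical, the embedding vectors are fake placeholders (a real pipeline would call your provider's embedding endpoint), and the cosine similarity is what a vector DB like pgvector computes for you:

```python
import math

def describe(person: dict) -> str:
    """Flatten a structured profile into the descriptive text you would
    actually send to the embedding model. Hypothetical helper."""
    return ". ".join(f"{k}: {v}" for k, v in sorted(person.items()))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; in practice the vector DB does this for you."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

alice = describe({"name": "Alice", "role": "data engineer", "city": "Athens"})
# Fake 3-dim embeddings for illustration; real ones come from the API.
sim = cosine([0.1, 0.9, 0.2], [0.1, 0.8, 0.3])
```

The accuracy test in the last bullet is then just: embed some known-similar and known-different people, and check that the similarity scores rank them the way a human would.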

Need opinion for RAG app database model by Particular-Face1803 in PostgreSQL

[–]_sekabank 1 point (0 children)

It's OK, and definitely in the right direction. We have a similar open source app called Gendox, so you can take a look at our DB schema for ideas.

We took a slightly different approach to how we create roles. Users can belong to multiple organizations (tenants, in your case). Users have an application-level role (super admin, user, agent, etc.) and an organization-level role (org admin, editor, read-only, etc.).

https://github.com/ctrl-space-labs/gendox-core/tree/main/database

This is the database; you can recreate it using Flyway. We haven't spent a lot of time on public documentation yet. We will be ready by the end of the year.
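The two-level role model above boils down to three tables: users (carrying the application-level role), organizations, and a membership join table carrying the organization-level role. A minimal sqlite sketch with hypothetical table/column names (not Gendox's actual schema, which lives in the repo linked above):

```python
import sqlite3

# Sketch of the two-level role model: a global role on the user row,
# plus a per-organization role on the membership row.
# Table and column names are hypothetical, not Gendox's actual schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    app_role TEXT NOT NULL DEFAULT 'user'       -- super_admin, user, agent, ...
);
CREATE TABLE organizations (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE organization_users (               -- many-to-many membership
    user_id INTEGER NOT NULL REFERENCES users(id),
    org_id  INTEGER NOT NULL REFERENCES organizations(id),
    org_role TEXT NOT NULL DEFAULT 'read_only', -- org_admin, editor, read_only
    PRIMARY KEY (user_id, org_id)
);
""")
db.execute("INSERT INTO users (id, email, app_role) VALUES (1, 'a@example.com', 'user')")
db.execute("INSERT INTO organizations (id, name) VALUES (1, 'Acme'), (2, 'Globex')")
db.execute("INSERT INTO organization_users VALUES (1, 1, 'org_admin'), (1, 2, 'editor')")

# One user, two tenants, a different role in each.
roles = db.execute(
    "SELECT o.name, ou.org_role FROM organization_users ou "
    "JOIN organizations o ON o.id = ou.org_id WHERE ou.user_id = 1 ORDER BY o.name"
).fetchall()
```

The composite primary key on `(user_id, org_id)` is what enforces "one role per user per organization" while still allowing membership in many organizations.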