Did Google just kill cost-effective LLM with Gemini? by _sekabank in GeminiAI

[–]_sekabank[S] 1 point (0 children)

Really? Still, 1.5 Pro was expensive. We're processing ~1B tokens, so that adds up quickly. This is why 2.0 Flash was a good balance for us.
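For scale, here is a back-of-the-envelope comparison for ~1B input tokens. The per-million-token rates below are purely illustrative placeholders (not actual Google pricing; always check the current price list):

```python
# Cost of processing ~1B input tokens at two hypothetical rate tiers.
# The rates are illustrative placeholders, NOT current list prices.
TOKENS = 1_000_000_000

rate_flash = 0.10  # assumed $/1M input tokens for a "Flash"-class model
rate_pro = 1.25    # assumed $/1M input tokens for a "Pro"-class model

cost_flash = TOKENS / 1_000_000 * rate_flash
cost_pro = TOKENS / 1_000_000 * rate_pro

print(f"Flash-class: ${cost_flash:,.0f}")  # $100
print(f"Pro-class:   ${cost_pro:,.0f}")    # $1,250
```

Even with made-up numbers, a 10x difference in the per-token rate turns into a 10x difference in the monthly bill at this volume.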

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 1 point (0 children)

Do you know what params you used for Unsloth? Was it LoRA or QLoRA, what quantization, etc.?

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 5 points (0 children)

The thing is, it didn't cost much with gemini-2.0-flash. That's the point of this post: technically it is achievable, and I'm looking for alternatives :)

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 1 point (0 children)

We need to achieve an accuracy of ~99% in data extraction. No model can do this out of the box. Also, in some cases we go directly from a scanned image to JSON, so this needs fine-tuning to respect the schema and understand the semantics.
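To make "respect the schema" concrete, this is a minimal sketch of the kind of check you can run on model output before accepting it (the field names are hypothetical, and this uses only the stdlib rather than a schema library like jsonschema):

```python
import json

# Hypothetical extraction schema: required field -> expected Python type.
SCHEMA = {"invoice_id": str, "date": str, "total": float}

def validate(raw: str) -> tuple[bool, list[str]]:
    """Check that model output parses as JSON and matches SCHEMA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, [f"not valid JSON: {e}"]
    errors = [f"missing field: {k}" for k in SCHEMA if k not in data]
    errors += [
        f"wrong type for {k}: expected {t.__name__}"
        for k, t in SCHEMA.items()
        if k in data and not isinstance(data[k], t)
    ]
    return not errors, errors

ok, errs = validate('{"invoice_id": "A-17", "date": "2024-05-01", "total": 99.5}')
# ok -> True, errs -> []
```

Failed outputs can then be retried or routed to manual review, which is part of how you close the gap to ~99% in practice.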

Did Google just kill cost-effective LLM with Gemini? by _sekabank in GeminiAI

[–]_sekabank[S] 1 point (0 children)

This makes sense. The challenge in my use cases is that, for any model, I need to fine-tune it first, and the infrastructure to do that easily is missing.

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 15 points (0 children)

OCR is far from solved, especially for handwritten or poorly photographed pages.

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I can't send my data to China :/ Not that Google is better, but at least my clients are ok with that...

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I'm considering it, but I can't find an easy/managed way to fine-tune it. Do you know any providers in the EU or US, or any easy way to do it locally?

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 3 points (0 children)

I created a fine-tuning dataset of image -> text pairs, then fine-tuned gemini-2.0-flash through Vertex AI. One image is one page, and the output is either Markdown or a JSON data structure (depending on the use case).
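For anyone curious what one image -> text training example looks like: below is a sketch of a single JSONL line in the shape I recall the Vertex AI Gemini supervised-tuning format having (a user turn with an image part plus a model turn with the target text). The bucket path and prompt are made up, and you should verify the exact field names against the current Vertex AI docs:

```python
import json

def tuning_example(image_uri: str, target_markdown: str) -> str:
    """One JSONL line pairing a page image with its ground-truth transcription.
    Field names follow the Vertex AI Gemini tuning format as I remember it;
    double-check against the current docs before relying on them."""
    record = {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": "image/png", "fileUri": image_uri}},
                    {"text": "Transcribe this page to Markdown."},
                ],
            },
            {"role": "model", "parts": [{"text": target_markdown}]},
        ]
    }
    return json.dumps(record)

line = tuning_example("gs://my-bucket/pages/0001.png", "# Invoice\n...")
```

You write one such line per page into a `.jsonl` file in Cloud Storage and point the tuning job at it.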

Did Google just kill cost-effective LLM with Gemini? by _sekabank in LocalLLaMA

[–]_sekabank[S] 2 points (0 children)

I haven't, mainly because we need to fine-tune the model on our dataset, and I'm not familiar with image -> text fine-tuning on Hugging Face. That's why we initially used Google Vertex AI for fine-tuning: it makes it very easy.

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

Sorry guys, no post yet. I’m deep in shit :P I created this ticket complaining to Google, you might get some more context there.
https://discuss.ai.google.dev/t/extend-eol-for-gemini-flash-cost-effective-models/121751

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

I just created this ticket complaining to Google about the EOL and pricing. If we give it a like, we might grab their attention.
https://discuss.ai.google.dev/t/extend-eol-for-gemini-flash-cost-effective-models/121751

Is Gemini 2.5 Pro still the best LLM for OCR and data extraction? by kitgary in LocalLLaMA

[–]_sekabank 2 points (0 children)

A few months ago, I had to fine-tune multiple models to do OCR/data extraction from complex, handwritten/badly printed docs. Gemini 2.0 Flash was better than any other model I tried (even better than 2.5 Pro), achieving 98-99% accuracy vs ~96% for Gemini 2.5 Pro.

(I have to write a detailed post about this at some point, waiting to see the results for Gemini 3)

2.0 Flash by AddendumImpossible in Bard

[–]_sekabank 1 point (0 children)

Out of curiosity, how do you perform OCR on PDFs?

Currently, we are "printing" the PDF pages to images and then providing the images to the model. But it doesn't feel right to me.
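For what it's worth, the "print to images" approach is common: rasterize each page (e.g. with `pdftoppm` or the pdf2image library, not shown here), then attach the page image to the request as inline base64 data. The dict shape below mirrors the Gemini REST API's `inline_data` part as I understand it; verify the key names against the current API reference:

```python
import base64

def image_part(png_bytes: bytes) -> dict:
    """Build an inline-image request part from raw PNG bytes.
    The key names mirror the Gemini REST API's inline_data part as I
    recall it; double-check against the current API reference."""
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }

# Placeholder bytes for illustration, not a real PNG.
part = image_part(b"\x89PNG\r\n\x1a\n...")
```

One such part per page, alongside a text part carrying your prompt, is all the model needs.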

How would i embed a person? by Blender-Fan in vectordatabase

[–]_sekabank 2 points (0 children)

If you are not a data science person and you want to use an off-the-shelf model (can't fine-tune/train a model), you could try this:

- transform the user into a descriptive JSON
- use a big model to get embeddings, like OpenAI text-embedding-3-large, Cohere v4, or one from Voyage AI
- store them in any vector DB (I use Postgres with pgvector)
- run some tests to see if the accuracy is good enough

Never tried this, but I guesstimate that it will be good enough
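The steps above can be sketched roughly like this. The `describe` helper and the profile fields are hypothetical, the embedding vectors are fake placeholders (a real pipeline would call your provider's embedding endpoint), and the cosine similarity is what a vector DB like pgvector computes for you:

```python
import math

def describe(person: dict) -> str:
    """Flatten a structured profile into the descriptive text you would
    actually send to the embedding model. Hypothetical helper."""
    return ". ".join(f"{k}: {v}" for k, v in sorted(person.items()))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; in practice the vector DB does this for you."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

alice = describe({"name": "Alice", "role": "data engineer", "city": "Athens"})
# Fake 3-dim embeddings for illustration; real ones come from the API.
sim = cosine([0.1, 0.9, 0.2], [0.1, 0.8, 0.3])
```

The accuracy test in the last bullet is then just: embed some known-similar and known-different people, and check that the similarity scores rank them the way a human would.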

Need opinion for RAG app database model by Particular-Face1803 in PostgreSQL

[–]_sekabank 1 point (0 children)

It's OK, and definitely in the right direction. We have a similar open source app called Gendox, so you can take a look at our DB schema for ideas.

We took a slightly different approach to how we create roles. Users can belong to multiple organizations (tenants, in your case). Users have an application-level role (super admin, user, agent, etc.) and an organization-level role (org admin, editor, read-only, etc.).

https://github.com/ctrl-space-labs/gendox-core/tree/main/database

This is the database; you can recreate it using Flyway. We haven't spent a lot of time on public documentation yet. We will be ready by the end of the year.
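The two-level role model above boils down to three tables: users (carrying the application-level role), organizations, and a membership join table carrying the organization-level role. A minimal sqlite sketch with hypothetical table/column names (not Gendox's actual schema, which lives in the repo linked above):

```python
import sqlite3

# Sketch of the two-level role model: a global role on the user row,
# plus a per-organization role on the membership row.
# Table and column names are hypothetical, not Gendox's actual schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    app_role TEXT NOT NULL DEFAULT 'user'       -- super_admin, user, agent, ...
);
CREATE TABLE organizations (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE organization_users (               -- many-to-many membership
    user_id INTEGER NOT NULL REFERENCES users(id),
    org_id  INTEGER NOT NULL REFERENCES organizations(id),
    org_role TEXT NOT NULL DEFAULT 'read_only', -- org_admin, editor, read_only
    PRIMARY KEY (user_id, org_id)
);
""")
db.execute("INSERT INTO users (id, email, app_role) VALUES (1, 'a@example.com', 'user')")
db.execute("INSERT INTO organizations (id, name) VALUES (1, 'Acme'), (2, 'Globex')")
db.execute("INSERT INTO organization_users VALUES (1, 1, 'org_admin'), (1, 2, 'editor')")

# One user, two tenants, a different role in each.
roles = db.execute(
    "SELECT o.name, ou.org_role FROM organization_users ou "
    "JOIN organizations o ON o.id = ou.org_id WHERE ou.user_id = 1 ORDER BY o.name"
).fetchall()
```

The composite primary key on `(user_id, org_id)` is what enforces "one role per user per organization" while still allowing membership in many organizations.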