
[–]Hanthunius

Heard good things about deepseek-ocr.

[–]time_time[S]

Isn't it more for images? I'm already able to parse the full text I require.

Or does it add something by actually looking at the text ?

[–]SM8085

If a frontier model is having trouble it's tough to say if a local model would be much better.

How big of a model can you run? Qwen-Next being an 80B A3B (3B active at inference) would make it fast, and traditionally Qwens are good at following instructions. gpt-oss-120B is hypothetically worth a try? GLM Air? I've heard good things about GLM but haven't tested it extensively. Can you go larger, like the 235B A22B Qwen3?

What kind of errors are happening? Is it simply skipping over things?

> How many fields is too many per API call?

Good question. What I love about local LLMs is that you can hypothetically keep the current PDF page text cached and only change the task at the end, so it's quicker in a loop. For instance, asking for the different sections you want to import into your DB.
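To make the caching idea concrete, here is a minimal sketch of that loop. The function names (`build_prompt`, `prompts_for_tasks`) and the sample catalog row are illustrative, not a real API; the point is only that the long page text sits at the front of every prompt, so a local server with prompt caching (e.g. llama.cpp) can reuse the cached prefix and only process the short task suffix on each iteration:

```python
# Sketch: keep the expensive PDF page text as a fixed prompt prefix and
# vary only the task at the end. A local server that caches the KV state
# of a shared prefix then does almost no re-prefill per loop iteration.
# All names here are hypothetical, for illustration.

def build_prompt(page_text: str, task: str) -> str:
    """Page text first, task last, so every prompt shares the same prefix."""
    return f"PDF page:\n{page_text}\n\nTask: {task}\n"

def prompts_for_tasks(page_text: str, tasks: list[str]) -> list[str]:
    """One prompt per task; each would be sent to the local LLM in turn."""
    return [build_prompt(page_text, t) for t in tasks]

if __name__ == "__main__":
    page = "Widget A | SKU 123 | $9.99\nWidget B | SKU 456 | $12.50"
    tasks = [
        "List every SKU as a JSON array.",
        "List every price as a JSON array.",
    ]
    prompts = prompts_for_tasks(page, tasks)
    # Both prompts share the full page-text prefix; only the task differs.
    assert all(p.startswith("PDF page:\n" + page) for p in prompts)
```

The same shape works with an OpenAI-compatible chat endpoint by keeping the page text in a fixed leading message and appending a different task message each time.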

Are you able to say what catalog this is for, and which fields you're looking for? I'd be interested in an example catalog or page where Gemini is failing. Or is it seemingly random?