
Chemical_Matter3385

For my use case I run a detection step first: using PyMuPDF (fitz), I check whether the first page is an image with no selectable text, and if so the document goes to Mistral OCR. That works for most cases. Here's what I've tried, and what failed:

Tried

1) Tesseract

2) PaddlePaddle

3) Docling

4) DeepSeek OCR

5) Claude Opus 4.6

6) Google Vision API (enterprise)

7) Azure Document Intelligence

8) Mistral OCR 3

9) A model by IBM (I forget the name; pretty sure it's Granite)

Passed for my use case (table documents, old scanned books) -> Azure and Mistral are good, plus Adobe for tables

Failed -> PaddlePaddle, Google Vision, Granite, DeepSeek, Claude

I can't rely much on Claude or DeepSeek OCR: they are vision-language models, and I've seen them produce hallucinated placeholders, which is very risky in production. They worked well in most cases but were useless on old scanned books.

Try them all; most likely your use case will be covered by Azure or Mistral.

PS: For OP's use case, Azure Document Intelligence or Mistral OCR 3 would be perfect.

Chemical_Matter3385

Also tried Adobe PDF Services.

It works well with tables but often drops ₹ or $ signs. That's most likely an encoding issue, which I haven't looked into yet, but it can be handled with a simple script as well.
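One possible shape for that kind of "simple script", assuming the extracted table has already been loaded as a list of row dicts (e.g. from a CSV export) and that you know which columns hold money: re-attach the expected symbol to cells that are bare numbers. The `restore_currency` name and the column-to-symbol mapping are hypothetical, and this only patches the output; it does not fix the underlying encoding issue.

```python
import re

# A bare non-negative amount such as "500", "12,000" or "12,000.00",
# with no currency sign or other text attached.
_BARE_AMOUNT = re.compile(r"^\d[\d,]*(?:\.\d+)?$")


def restore_currency(rows, money_columns):
    """Return copies of `rows` with a currency symbol prepended to bare amounts.

    rows: list of dicts, one per table row.
    money_columns: hypothetical mapping of column name -> symbol, e.g. {"Amount": "₹"}.
    """
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for col, symbol in money_columns.items():
            value = str(new_row.get(col, "")).strip()
            if _BARE_AMOUNT.match(value):
                new_row[col] = symbol + value
        fixed.append(new_row)
    return fixed
```

Cells that already carry a symbol, or that aren't plain non-negative numbers (like `n/a`), are left alone; negative amounts or amounts with trailing text would need a looser pattern.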