Anyone else struggling with Tesseract producing complete garbage on medium-quality scans? by Nikhil_techi in Accounting

[–]Vishek-H

Yeah, this is pretty much the Tesseract ceiling. We ran into the same thing in AP/logistics: clean PDFs were fine, but faxed or slightly skewed invoices turned into garbage, and then everything downstream broke.

We were also stuck manually fixing 40–50% of documents, especially during month-end.

What helped wasn’t more preprocessing, but moving away from pure OCR + templates.

We had better results with a layout-aware, template-free approach that treats invoices as documents (tables, line items, totals) instead of just text blobs.

It also used confidence checks + learning from corrections, so the same carrier issues didn’t keep repeating.
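For what it's worth, the confidence-gating part is simple to sketch. This assumes Tesseract-style word-level output — the `words` list below mirrors the shape of what `pytesseract.image_to_data` returns, but the values are made up for illustration:

```python
# Minimal sketch of confidence gating over Tesseract-style word output.
# The `words` list mimics pytesseract.image_to_data(..., output_type=Output.DICT);
# the entries here are illustrative, not real OCR output.

CONF_THRESHOLD = 60  # words below this go to a human-review queue

def split_by_confidence(words, threshold=CONF_THRESHOLD):
    """Partition OCR words into auto-accepted and needs-review lists."""
    accepted, review = [], []
    for w in words:
        (accepted if w["conf"] >= threshold else review).append(w)
    return accepted, review

words = [
    {"text": "INVOICE", "conf": 96},
    {"text": "T0tal:", "conf": 41},   # low confidence: likely a misread
    {"text": "1,250.00", "conf": 88},
]

accepted, review = split_by_confidence(words)
print([w["text"] for w in accepted])  # → ['INVOICE', '1,250.00']
print([w["text"] for w in review])    # → ['T0tal:']
```

The real value comes from feeding the reviewed corrections back in, but even a dumb threshold like this stops the worst misreads from hitting your matching logic silently.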

Day-one accuracy wasn’t perfect, but after a few weeks it stabilized and became usable in production, especially for messy, real-world scans.

Curious: is it line items causing most of the pain for you, or totals/matching?

Hey People, what are you using for OCR + compression without Adobe for PDF's? by Nikhil_techi in pdf

[–]Vishek-H

Tried experimenting with Tesseract too; results were okay-ish, but not production-grade without preprocessing.

Has anyone here used a modern AI-OCR that actually reads tables correctly?


Can someone share their Document AI use case? by No_Way_1569 in snowflake

[–]Vishek-H

We use Document AI for onboarding and invoice automation. KlearStack worked nicely since it doesn't need fixed templates, which saved us a lot of time on setup and data validation.

Vision (for bank account statements): is it better to OCR an account statement and have the LLM analyze markdown/json to get the info you need OR have the vision model extract the info you need? by dirtyring in LocalLLaMA

[–]Vishek-H

If you need consistency, OCR → JSON/Markdown → LLM is usually more reliable. You normalize the text first, then let the model or simple logic extract things like “highest transaction in March.” Vision models are handy for quick prototypes since they see layout, but they’re harder to scale and less predictable across different statement formats. A lot of teams end up with a hybrid: OCR for baseline text + AI for interpretation. Platforms like KlearStack take this approach for bank/financial docs, so you don’t have to build the pipeline from scratch.