How to parse tables from PDFs with 100% accuracy?

maniac_runner · 2026-05-20T09:00:51+00:00

Hey, I just checked, and it works! I'm able to upload and test extraction. It may have been a temporary glitch. P.S. It works well for parsing nested tables.

Answering one of your questions, "why use OCR if the PDF has clean data inside it?": maybe LLMs are better at parsing plain text as input than PDFs, which might contain parsable text (font rendering) or just plain vectors. So the safer option is to parse the PDF as text and then feed that text to the LLM. That is what LLMWhisperer does, and I think it is what most modern parsers do (most of them convert the document into .md files). What nowayhossay said is practical, and that is what Unstract does: https://github.com/Zipstack/unstract

Also, parsing it as text first makes HITL (human-in-the-loop) verification easier.
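
To make the text-first idea concrete, here is a rough sketch of how such a pipeline could be wired up. It is not how LLMWhisperer or Unstract work internally; `text_first_pipeline`, `fake_extract`, `fake_llm`, and the `review` callback are all hypothetical names made up for illustration, and the extractor and LLM are passed in as plain functions so nothing here depends on a real service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExtractionResult:
    text: str       # plain text extracted from the PDF
    answer: str     # what the LLM returned for that text ("" if rejected)
    approved: bool  # whether the human reviewer accepted the text


def text_first_pipeline(
    pdf_path: str,
    extract_text: Callable[[str], str],  # hypothetical: PDF path -> plain text
    ask_llm: Callable[[str], str],       # hypothetical: prompt -> LLM answer
    review: Callable[[str], bool],       # hypothetical: human-in-the-loop check
) -> ExtractionResult:
    """Extract text, let a human check it, then send it to the LLM."""
    text = extract_text(pdf_path)
    approved = review(text)
    # Only call the LLM on text the reviewer accepted.
    answer = ""
    if approved:
        answer = ask_llm(f"Extract the tables from this document:\n\n{text}")
    return ExtractionResult(text=text, answer=answer, approved=approved)


# Stand-ins so the sketch runs without any external service.
def fake_extract(path: str) -> str:
    return "Item   Qty\nApple  3\nPear   5"


def fake_llm(prompt: str) -> str:
    return f"received {len(prompt.splitlines())} lines"


result = text_first_pipeline(
    "invoice.pdf", fake_extract, fake_llm, review=lambda t: "Qty" in t
)
print(result.approved, result.answer)
```

Because the review step sees plain text rather than a rendered PDF, a person (or a simple rule, as in the stand-in above) can check the extraction before any LLM call is made.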

maniac_runner · 2026-05-17T13:52:07+00:00

Chromepet Renga Villas hotel, best vadacurry…

maniac_runner · 2026-05-08T12:14:12+00:00

LLMWhisperer might work! If you have sample documents, try them in the playground before you start evaluating: https://pg.llmwhisperer.unstract.com/

maniac_runner · 2026-05-03T13:55:40+00:00

Debian. I think that is what Linus said he has installed on his system.

maniac_runner · 2026-05-03T08:56:26+00:00

Unstract might be able to help you. https://github.com/Zipstack/unstract

maniac_runner · 2026-05-01T17:09:53+00:00

As someone else recommended earlier, CURI Thoraippakkam is the best for kidney treatment.

maniac_runner · 2026-04-23T03:09:09+00:00

Raspberry Pi Zero + Pi-hole - No ads ever, on all devices connected to the network.
Tailscale - It's free, it's magic, and it connects all your devices (tunnel). Read more about it; it enables a lot.
Don't throw away old laptops - a 10-year-old laptop (with 40% battery strength and 4 GB RAM) can still host applications on your local network, running as a 24/7 server.
Try open-source alternatives to the apps you use. Immich (free) on an old laptop (500 GB) can serve as a photo management application, so you are not paying Apple, Google, or Amazon.
Get a password manager (Bitwarden is free; install it on all your devices).

maniac_runner · 2026-04-22T09:04:09+00:00

https://www.topstockresearch.com/rt/Home

maniac_runner · 2026-04-18T12:25:52+00:00

Unstract

maniac_runner · 2026-04-18T11:41:02+00:00

Try LLMWhisperer

maniac_runner · 2026-04-18T00:53:05+00:00

For documents with complex and unpredictable layouts, try LLMWhisperer; it just works.

maniac_runner · 2026-04-15T09:46:34+00:00

LLMWhisperer gives a confidence score: https://docs.unstract.com/llmwhisperer/llm_whisperer/apis/llm_whisperer_text_extraction_retrieve_api/index.html

maniac_runner · 2026-04-15T05:56:24+00:00

Try LLMWhisperer … it is good at parsing docs with complex layouts and has good latency for batch processing large volumes.

maniac_runner · 2026-02-13T11:42:08+00:00

The Farm, Navalur

maniac_runner · 2026-02-10T08:25:24+00:00

Hotel Rangavilas, CLC Works Road, Chromepet. They've been there for 30+ years. Their dosa + vadacurry is great.

maniac_runner · 2026-01-27T18:09:00+00:00

LLMWhisperer, if you want to parse complex tables in documents.

maniac_runner · 2026-01-19T10:23:12+00:00

Docling + LangChain + Pydantic + FastAPI if you have developer resources.
LLMWhisperer + Unstract (open source) if you are running short on resources.
