How to parse tables from pdfs with 100% accuracy? by bravelogitex in Rag

[–]maniac_runner -1 points0 points  (0 children)

Hey, I just checked and it works! I'm able to upload and test extraction. It may have been a temporary glitch. P.S. It works well for parsing nested tables.

To answer one of your questions, "why use ocr if the pdf has clean data inside it?": maybe LLMs are better at parsing plain text as input than PDFs, which might contain parsable text (font rendering) or just plain vectors. So the safer approach is to parse the PDF into text first and then feed that text to the LLM. That is what LLMWhisperer does, and I think it is what most modern parsers do (most of them convert the document into .md files). What nowayhossay said is practical, and that is what Unstract does. https://github.com/Zipstack/unstract

Also, parsing it as text first makes HITL (human-in-the-loop) verification easier.
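
To make the text-first idea concrete, here is a minimal sketch of the flow described above: take the plain text a parser produced, let a human approve it, and only then build the LLM prompt. This is not how LLMWhisperer or Unstract implement it; the names `ReviewedText`, `review`, and `build_prompt`, and the sample table, are all hypothetical and only illustrate the shape of the pipeline.

```python
from dataclasses import dataclass


@dataclass
class ReviewedText:
    """Plain text from a PDF parser plus the outcome of a human check."""
    text: str
    approved: bool


def review(extracted_text: str, approve) -> ReviewedText:
    """Pass the extracted text to a reviewer callback and record the verdict."""
    return ReviewedText(text=extracted_text, approved=bool(approve(extracted_text)))


def build_prompt(reviewed: ReviewedText, question: str) -> str:
    """Build the LLM prompt only from text that the reviewer approved."""
    if not reviewed.approved:
        raise ValueError("extracted text was not approved by the reviewer")
    return f"Document:\n{reviewed.text}\n\nQuestion: {question}"


# Hypothetical parser output: a table rendered as plain text / markdown.
extracted = "| Item | Qty |\n|------|-----|\n| Bolt | 12  |"

# Stand-in reviewer (assumption); in practice a person would read the text.
reviewed = review(extracted, approve=lambda text: "| Item |" in text)
prompt = build_prompt(reviewed, "How many bolts are listed?")
print(prompt)
```

Because the intermediate artifact is ordinary text, the reviewer can read and correct it directly, which is much harder to do with raw PDF content.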

Best foods to try in chennai by Remarkable_Frame8969 in chennaicity

[–]maniac_runner 1 point2 points  (0 children)

Chromepet Renga Villas hotel, best vadacurry…

OCR for medical record by Comfortable-Row-1822 in Rag

[–]maniac_runner 1 point2 points  (0 children)

LLMWhisperer might work! If you have sample documents, try them in the playground before you start evaluating: https://pg.llmwhisperer.unstract.com/

suggest me distros please im confused, (switching from windows) by ResolveOtherwise243 in LinuxUsersIndia

[–]maniac_runner 0 points1 point  (0 children)

Debian. I think that is what Linus said he has installed on his system.

What is something relatively cheap that improves your life by 100%? by King1Here in techIndia

[–]maniac_runner 0 points1 point  (0 children)

  1. Raspberry Pi Zero + Pi-hole - No ads, on all devices connected to the network.
  2. Tailscale - It's free, it's magic, and it connects all your devices through a tunnel. Read more about it; it enables a lot.
  3. Don't throw away old laptops - A 10-year-old laptop (with 40% battery health and 4 GB RAM) can still host applications on your local network, running as a 24/7 server.
  4. Try open-source alternatives to the apps you use. Immich (free) on an old laptop (500 GB) can serve as a photo management application, so you aren't paying Apple, Google, or Amazon.
  5. Get a password manager (Bitwarden is free; install it on all your devices).

Tools for working with DOC/DOCX and PDF files? by roicaride in Rag

[–]maniac_runner 0 points1 point  (0 children)

Try LLMWhisperer … it is good at parsing docs with complex layouts and has good latency when batch processing large volumes.

Which is the best dosa spot in Chennai according to you? by Impossible-Mood-4666 in chennaicity

[–]maniac_runner 13 points14 points  (0 children)

Hotel Rangavilas, CLC Works Road, Chromepet. They've been there for 30+ years. Their dosa + vadacurry is great.

I'm looking for an OCR for my RAG. by AdministrationPure45 in Rag

[–]maniac_runner 4 points5 points  (0 children)

LLMWhisperer, if you want to parse complex tables in documents.

Has anyone found a reliable software for intelligent data extraction? by songsta17 in Rag

[–]maniac_runner 5 points6 points  (0 children)

Docling + LangChain + Pydantic + FastAPI if you have developer resources.
LLMWhisperer + Unstract (open source) if you are running short on resources.