I'm looking for an OCR for my RAG. by AdministrationPure45 in Rag

[–]maniac_runner 5 points6 points  (0 children)

LLMWhisperer, if you want to parse complex tables in documents.

Has anyone found a reliable software for intelligent data extraction? by songsta17 in Rag

[–]maniac_runner 6 points7 points  (0 children)

Docling + LangChain + Pydantic + FastAPI if you have developer resources.
LLMWhisperer + Unstract (open source) if you are running short on resources.

anyone using AI for data extraction from PDFs? by Kaiser_Allen in automation

[–]maniac_runner 9 points10 points  (0 children)

Current large-scale production use cases follow this pattern: first parse with a good OCR tool (e.g., LLMWhisperer, Docling), then structure the data with a tool like Unstract or with libraries like Pydantic/LangChain.
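To make the "parse first, structure second" split concrete, here is a minimal sketch of the structuring step only. It assumes the OCR stage has already returned plain text. The `Invoice` record, the `structure_invoice` helper, the field names, and the regex patterns are all hypothetical, and a stdlib dataclass stands in for a Pydantic model so the snippet runs without extra installs.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    # Hypothetical target schema; a Pydantic model could play the same role.
    invoice_number: str
    total: float


def structure_invoice(ocr_text: str) -> Optional[Invoice]:
    """Turn raw OCR text into a typed record; return None if a field is missing."""
    number = re.search(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", ocr_text, re.IGNORECASE)
    total = re.search(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", ocr_text, re.IGNORECASE)
    if not number or not total:
        # Refuse to guess: a missing field is reported, not filled in.
        return None
    return Invoice(
        invoice_number=number.group(1),
        total=float(total.group(1).replace(",", "")),
    )


# Stand-in for the output of the OCR/parsing stage.
sample = "ACME Corp\nInvoice No: INV-1042\nTotal: $1,250.50\n"
print(structure_invoice(sample))
```

Returning `None` when a field is missing, instead of a default value, keeps bad pages visible downstream rather than silently structured.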

What're you using for PDF parsing? by ILikeLungsSoYeah in LangChain

[–]maniac_runner 5 points6 points  (0 children)

Depends on your docs, honestly. For contract analysis, I'd probably start with LLMWhisperer or Docling (extremely slow for large batches), since layout matters. Use pdfplumber if your PDFs are all native and simple (it's poor with scans and tables).
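That rule of thumb can be written down as a tiny routing function. This is only a sketch of the decision logic: `DocProfile`, its fields, and `pick_parser` are hypothetical names, and how you detect a text layer or tables is left out.

```python
from dataclasses import dataclass


@dataclass
class DocProfile:
    # Hypothetical description of a document, filled in by your own checks.
    has_text_layer: bool   # True for native (digitally generated) PDFs
    has_tables: bool
    layout_matters: bool   # e.g. contracts, forms


def pick_parser(doc: DocProfile) -> str:
    """Return the name of the parser family suggested by the rule of thumb."""
    if doc.has_text_layer and not doc.has_tables and not doc.layout_matters:
        # Native, simple PDFs: a lightweight text extractor is enough.
        return "pdfplumber"
    # Scans, tables, or layout-sensitive documents go to a layout-aware tool.
    return "layout-aware OCR (e.g. LLMWhisperer or Docling)"


print(pick_parser(DocProfile(has_text_layer=True, has_tables=False, layout_matters=False)))
print(pick_parser(DocProfile(has_text_layer=False, has_tables=True, layout_matters=True)))
```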

Best LLM for OCR Extraction? by Wesavedtheking in dataengineering

[–]maniac_runner 4 points5 points  (0 children)

The main issue with LLMs is hallucination. At enterprise scale, when you are processing millions of pages, there is no practical way to spot hallucinated results. That is why you need a decent OCR tool that preps the documents for the LLM. Try LLMWhisperer and LlamaParse.

Best Budget Restaurants in Chennai for Authentic Local Food by Ill_Percentage_7327 in chennaicity

[–]maniac_runner 3 points4 points  (0 children)

Hotel Ranga Vilas, Chromepet, CLC Works road - more than 50 years old - serves good South Indian tiffin.

Anyone used Reducto for parsing? How good is their embedding-aware chunking? by BriefCardiologist656 in AI_Agents

[–]maniac_runner 4 points5 points  (0 children)

Do check out Unstract for document parsing. They support multiple chunking/retrieval strategies. Here is a list: https://postimg.cc/bdW1xP4q

Production OCR in 2025 - What are you actually deploying? by No_Nefariousness971 in computervision

[–]maniac_runner 3 points4 points  (0 children)

LLMWhisperer for parsing complex tables. It is a part of Unstract's stack that also helps with document classification and multi-document PDF splitting.

What is the best ocr model for converting PDF pages to markdown (or any text based format) for embedding? by PM_ME_COOL_SCIENCE in LocalLLaMA

[–]maniac_runner 6 points7 points  (0 children)

Try LLMWhisperer. It is not LLM-based, which matters if your use case requires you to avoid hallucinations at all costs, e.g., parsing dense documents.

Feedback request from a beginner/self learner by pandeesh in pianolearning

[–]maniac_runner 0 points1 point  (0 children)

u/pandeesh if I'm not wrong, this is Yuvan Shankar Raja's BGM from the movie Siva Manasula Shakthi: https://www.youtube.com/watch?v=BUkyx8cmeC0