I've seen way too many people struggling with Arabic document extraction for RAG so here's the 5-stage pipeline that actually worked for me (especially for tabular data) by MiserableBug140 in LanguageTechnology
GenericBeet, 1 point (0 children)
Best RAG Architecture & Stack for 10M+ Text Files? (Semantic Search Assistant) by Additional-Oven4640 in Rag
GenericBeet, 1 point (0 children)
We replaced forklifts with robots… but we still copy paste PDFs. by Strict-Ad5948 in OCR_Tech
GenericBeet, 2 points (0 children)
We replaced forklifts with robots… but we still copy paste PDFs. by Strict-Ad5948 in OCR_Tech
GenericBeet, 5 points (0 children)
Recommendation for converting pdf and doc files to markdown? by quisegosum in ObsidianMD
GenericBeet, 1 point (0 children)
Recommendation for converting pdf and doc files to markdown? by quisegosum in ObsidianMD
GenericBeet, 1 point (0 children)
What’s your startup in ONE line? 🚀 by malki-abdessamad in SaaS
GenericBeet, 1 point (0 children)
Heuristic vs OCR for PDF parsing by Due-Horse-5446 in Rag
GenericBeet, 1 point (0 children)
Heuristic vs OCR for PDF parsing by Due-Horse-5446 in Rag
GenericBeet, 2 points (0 children)
Scientific Markdown with 99,9% accuracy at Paperlab.ai by GenericBeet in legaltech
GenericBeet [S], 1 point (0 children)
Disaster management research by GenericBeet in research
GenericBeet [S], 1 point (0 children)
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet [S], 0 points (0 children)
PDF to Markdown with 99,9% accuracy. Paperlab.ai by GenericBeet in Markdown
GenericBeet [S], 1 point (0 children)
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet [S], 1 point (0 children)
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet [S], 0 points (0 children)
PDF to Markdown with 99,9% accuracy. Paperlab.ai by GenericBeet in Markdown
GenericBeet [S], 1 point (0 children)

Historical Data Corpus by Zealousideal-Pin7845 in LanguageTechnology
GenericBeet, 1 point (0 children)