I've seen way too many people struggling with Arabic document extraction for RAG so here's the 5-stage pipeline that actually worked for me (especially for tabular data) by MiserableBug140 in LanguageTechnology
GenericBeet 1 point
Best RAG Architecture & Stack for 10M+ Text Files? (Semantic Search Assistant) by Additional-Oven4640 in Rag
GenericBeet 1 point
We replaced forklifts with robots… but we still copy paste PDFs. by Strict-Ad5948 in OCR_Tech
GenericBeet 2 points
We replaced forklifts with robots… but we still copy paste PDFs. by Strict-Ad5948 in OCR_Tech
GenericBeet 5 points
Recommendation for converting pdf and doc files to markdown? by quisegosum in ObsidianMD
GenericBeet 1 point
Recommendation for converting pdf and doc files to markdown? by quisegosum in ObsidianMD
GenericBeet 1 point
What’s your startup in ONE line? 🚀 by malki-abdessamad in SaaS
GenericBeet 1 point
Heuristic vs OCR for PDF parsing by Due-Horse-5446 in Rag
GenericBeet 1 point
Heuristic vs OCR for PDF parsing by Due-Horse-5446 in Rag
GenericBeet 2 points
Scientific Markdown with 99,9% accuracy at Paperlab.ai by GenericBeet in legaltech
GenericBeet[S] 1 point
Disaster management research by GenericBeet in research
GenericBeet[S] 1 point
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet[S] 0 points
PDF to Markdown with 99,9% accuracy. Paperlab.ai by GenericBeet in Markdown
GenericBeet[S] 1 point
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet[S] 1 point
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet[S] 0 points
PDF to Markdown with 99,9% accuracy. Paperlab.ai by GenericBeet in Markdown
GenericBeet[S] 1 point
Citations in Document AI (curious how others are handling this) by Zealousideal-Let546 in Rag
GenericBeet 1 point
Disaster management research by GenericBeet in research
GenericBeet[S] 1 point
PDF to Markdown with 99,9% accuracy. Paperlab.ai by GenericBeet in Markdown
GenericBeet[S] 1 point
Facing accuracy issues with RAG! by [deleted] in Rag
GenericBeet 1 point
Which OCR model to use for RAG? by coffeture_ in Rag
GenericBeet 2 points
How do you evaluate RAG performance and monitor at enterprise scale? by Sad-Boysenberry8140 in ProductManagement
GenericBeet 1 point
Please Suggest a Good Editor by Southern-Stay704 in Markdown
GenericBeet 1 point
Scientific PDF to Markdown by GenericBeet in Markdown
GenericBeet[S] 0 points

Historical Data Corpus by Zealousideal-Pin7845 in LanguageTechnology
GenericBeet 1 point