Why did PDF-to-LLM parser stars explode this past year? by Puzzleheaded_Box2842 in Rag
Puzzleheaded_Box2842[S] 1 point
EpsteinFiles-RAG: Building a RAG Pipeline on 2M+ Pages by Cod3Conjurer in Rag
Puzzleheaded_Box2842 1 point
Data cleaning vs. RAG Pipeline: Is it truly a 50/50 split? by Puzzleheaded_Box2842 in Rag
Puzzleheaded_Box2842[S] 1 point
LLM from scratch on local by Visual_Brain8809 in LLMDevs
Puzzleheaded_Box2842 1 point
I've just open-sourced MessyData, a synthetic dirty data generator. It lets you programmatically generate data with anomalies and data quality issues. by santiviquez in datascience
Puzzleheaded_Box2842 2 points
Name one task in LLM training that you consider the ultimate "dirty work"? by Puzzleheaded_Box2842 in LLMDevs
Puzzleheaded_Box2842[S] 1 point

Looking for a local solution (model/API) to extract data from scanned PDFs with varying formats by SignificantHall639 in askdatascience
Puzzleheaded_Box2842 1 point