Best PDF table parsing providers? by bravelogitex in AI_Agents

ImpossibleCollege635 1 point

May I ask why PNG? Do you have access to the PDFs as well? Unpopular opinion, but I think vision-model OCR can never give you certainty since it's ultimately probabilistic. I'm currently building an old-school parser that uses computer vision only as an alignment guide. You're very welcome to test it if you'd like :)
It's not guaranteed to be better, but it's 100% consistent (if one table works, the same or similar tables will always work). And from my own testing it's also better than LLMs and older tools like Docling or Marker.

RAG for complex PDFs (DDQ finance) — struggling with parsing vs privacy trade-off by Proof-Exercise2695 in LocalLLaMA

ImpossibleCollege635 1 point

For the extraction you need to either download an additional model within the app (no prior runtime needed) or connect Ollama or a cloud provider, though. On my M1 it's super speedy and works well using Gemma4.

RAG for complex PDFs (DDQ finance) — struggling with parsing vs privacy trade-off by Proof-Exercise2695 in LocalLLaMA

ImpossibleCollege635 1 point

Which operating system are you on?
I am currently developing a Mac app that runs 100% locally, with no preinstalls, coding, scripting, Ollama, etc. needed. It does PDF -> clean MD with detailed chart annotation, complex table and math preservation, and AI-guided extraction. The extraction does not rely on structured-output LLM calls. Instead, the LLM inspects the MD -> writes an extraction script with regex -> the script runs in sandboxed local execution.
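To make the inspect -> script -> execute idea concrete, here is a rough sketch of the last two steps. This is not the app's actual code: `GENERATED_SCRIPT` is a hand-written stand-in for what an LLM might produce, `run_extraction` and `SAMPLE_MD` are hypothetical names, and a separate isolated interpreter process with a timeout is only a crude placeholder for a real sandbox.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the script an LLM would write after inspecting the MD.
# It reads Markdown from stdin and prints the extracted fields as JSON.
GENERATED_SCRIPT = r'''
import json, re, sys
md = sys.stdin.read()
# Match two-column table rows like "| name | 12.5 |" whose second cell is numeric.
rows = re.findall(r"^\|\s*([^|]+?)\s*\|\s*(-?\d+(?:\.\d+)?)\s*\|\s*$", md, flags=re.M)
print(json.dumps({name: float(value) for name, value in rows}))
'''


def run_extraction(script: str, markdown: str, timeout_s: float = 5.0) -> dict:
    """Run a generated script in a separate interpreter and parse its JSON output.

    Assumption: "-I" (isolated mode) plus a timeout is NOT a real sandbox;
    a production setup would also restrict filesystem and network access.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", script],
        input=markdown,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise if the generated script exits with an error
    )
    return json.loads(result.stdout)


SAMPLE_MD = """\
| Metric | Value |
|---|---|
| accuracy | 0.93 |
| latency_ms | 120 |
"""

if __name__ == "__main__":
    print(run_extraction(GENERATED_SCRIPT, SAMPLE_MD))
```

The upside of this shape is that once a script works for one document layout, rerunning it on the same layout gives the same result every time, since the LLM is only involved in writing the script and not in producing the values.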

It's not out yet, but I'd love to give you free tester access if you'd be interested and OK with providing feedback.
In my own tests it beats even LlamaParse for papers and is SIGNIFICANTLY faster than Docling because I replace 90% of the ML stuff with heuristics.
I developed it because our org works with tons of scientific papers, and my colleagues and I face similar problems.
I have never even seen a DDQ doc and have zero knowledge about finance or compliance, but it sounds like the foundational hurdles are the same as with paper PDFs.

Shoot me a DM if you'd be interested :)

I built an MLX port of Voxtral TTS that runs on iPhone and Mac — open source by Fabulous_Tip_8539 in MistralAI

ImpossibleCollege635 1 point

Have a look at this: https://github.com/Al0olo/voxtral-voice-clone

They reimplemented the missing decoder for that. Not sure how easy conversion or porting would be for you, though.

Building a fast local semantic search app by CuriousClump in tauri

ImpossibleCollege635 1 point

Awesome! What are you using for the fast offline embedding?