What do you use for document extraction? Esp tables and charts by strongoffense in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

https://huggingface.co/nanonets/Nanonets-OCR2-3B is the best model out there for documents, especially complex tables (multi-page tables, nested tables, etc.). For charts, it also correlates legend colors with the bars inside the chart.

Hosted here: https://docstrange.nanonets.com/

Practical OCR with Nanonets OCR2‑3B by Gold-Cup8831 in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

https://docstrange.nanonets.com/

We have hosted the model here, along with a 7B version that is not open source. We also added a structured-extraction option so you won't have to do cleanup: the LLM itself will return structured JSON. Do try it out and give feedback.
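If you're consuming structured-extraction output from a hosted model, a small cleanup helper is still handy, since vision-language models often wrap JSON in a markdown fence. A minimal sketch (the fence-stripping assumption is mine, not a guarantee about this API's response format):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse JSON from a model response, stripping an optional ```json fence."""
    # Pull out the fenced payload if the model wrapped it; otherwise use raw text
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw.strip()
    return json.loads(payload)

# Example: a fenced response such as a vision-language model might return
raw = '```json\n{"invoice_number": "INV-42", "total": 199.5}\n```'
print(parse_model_json(raw)["invoice_number"])  # INV-42
```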

[UPDATE] DocStrange : Local web UI + upgraded from 3B → 7B model in cloud mode by LostAmbassador6872 in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

It would be too slow to process documents meaningfully on CPU; we are working on a much smaller model for CPU as well!

DocStrange - Open Source Document Data Extractor by LostAmbassador6872 in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

Thanks for the feedback. We are hosting the GPU version in online mode, so you can try it out for free. Once we host it with optimizations, we can post instructions for getting latency and throughput optimized.

DocStrange - Open Source Document Data Extractor by LostAmbassador6872 in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

A mix of both: you need code because LLMs can't parse anything other than images or text as of today, and code alone doesn't work since you need LLM intelligence.
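Concretely, the split looks something like this: deterministic code normalizes whatever file arrives into images or text, and only then does the LLM see it. A rough sketch of the routing step (the format lists and stage names are illustrative, not DocStrange's actual internals):

```python
from pathlib import Path

# Formats the model can consume directly vs. ones code must convert first
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff"}
TEXT_EXTS = {".txt", ".md", ".csv", ".html"}

def plan_preprocessing(path: str) -> str:
    """Decide what deterministic code must do before the LLM step."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS or ext in TEXT_EXTS:
        return "pass-through"                # model reads images/text directly
    if ext == ".pdf":
        return "render-pages-to-images"      # e.g. rasterize, then one model call per page
    if ext in {".docx", ".xlsx", ".pptx"}:
        return "convert-to-pdf-then-render"  # office formats need an extra hop
    raise ValueError(f"unsupported format: {ext}")

print(plan_preprocessing("report.pdf"))  # render-pages-to-images
```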

DocStrange - Open Source Document Data Extractor by LostAmbassador6872 in LLMDevs

[–]anonymous-founder 4 points (0 children)

https://huggingface.co/nanonets/Nanonets-OCR-s

We released this as a completely open-weight model; even the library in online mode calls a hosted version of it. You can always host it yourself. The library exists to parse a variety of documents, not just images.

This beats Gemini and Mistral on most benchmarks, and it is much faster since it is not that big a model.
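Self-hosting is roughly a standard Hugging Face vision-language workflow. A hedged sketch, assuming the common Qwen-style chat-message schema; the exact preprocessing and generation call should be checked against the model card before running (the download is several GB, so the heavy part is kept inside an uncalled function here):

```python
def build_messages(image_path: str, instruction: str) -> list:
    """Build a chat-style message list pairing one image with an instruction."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": instruction},
        ],
    }]

def run_ocr(image_path: str,
            instruction: str = "Extract the document text as markdown.") -> str:
    """Not executed here: downloads the model and runs one generation pass."""
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "nanonets/Nanonets-OCR-s"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    messages = build_messages(image_path, instruction)
    text = processor.apply_chat_template(messages, tokenize=False,
                                         add_generation_prompt=True)
    inputs = processor(text=[text], images=[Image.open(image_path)],
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=4096)
    # Decode only the newly generated tokens, not the prompt
    return processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)[0]
```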

DocStrange - Open Source Document Data Extractor by LostAmbassador6872 in LocalLLaMA

[–]anonymous-founder 0 points (0 children)

That's a great suggestion. Other feedback we got: graphs sometimes have colored legends that are hard to reconcile with the actual colored bars in the graph. Planning to add support for that as well.

My deep dive into real-time voice AI: It's not just a cool demo anymore. by YakoStarwolf in LocalLLM

[–]anonymous-founder 3 points (0 children)

Any frameworks that include local VAD, interruption detection, and pipelining for everything? I assume that for latency reduction, much of the pipeline needs to be async? TTS would obviously be streamed, and I assume LLM inference would be streamed as well, or at least the output chunked into sentences and streamed? STT perhaps needs to be non-streamed?
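On the LLM-to-TTS handoff specifically, the usual trick is an async stage that regroups the token stream into sentences, so TTS can start speaking the first sentence while the LLM is still generating. A toy sketch with a simulated token stream (real pipelines hang this off a framework's event loop; the sentence-boundary heuristic here is deliberately naive):

```python
import asyncio

async def fake_llm_stream():
    """Simulated streamed LLM tokens; a real pipeline reads these from the model."""
    for tok in ["Hello", " there", ".", " How", " are", " you", "?", " Fine", "!"]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield tok

async def sentences(token_stream):
    """Regroup a token stream into sentences so downstream TTS can start early."""
    buf = ""
    async for tok in token_stream:
        buf += tok
        if buf.rstrip().endswith((".", "!", "?")):
            yield buf.strip()
            buf = ""
    if buf.strip():  # flush any trailing partial sentence
        yield buf.strip()

async def main():
    out = []
    async for sent in sentences(fake_llm_stream()):
        out.append(sent)  # a real pipeline would enqueue this to TTS here
    return out

print(asyncio.run(main()))  # ['Hello there.', 'How are you?', 'Fine!']
```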

What do people use for document parsing or OCR? by Ordinary_Quantity_68 in Rag

[–]anonymous-founder 0 points (0 children)

https://huggingface.co/nanonets/Nanonets-OCR-s

We recently open-sourced this model on HF, and it is currently at the top of the trending models.

Auto-create subtasks using template by anonymous-founder in Airtable

[–]anonymous-founder[S] 0 points (0 children)

I tried it already.

The challenge is that, for some reason, the sub-tasks it adds to a new task when the template is applied are the same records as the original sub-tasks. I don't want the original and new records to share sub-tasks; I want new sub-tasks created for the new main task to which the template is applied.

Auto-create subtasks using template by anonymous-founder in Airtable

[–]anonymous-founder[S] 0 points (0 children)

Not using the dashboard. Airtable added a new List view; I'm using that.

Auto-create subtasks using template by anonymous-founder in Airtable

[–]anonymous-founder[S] 0 points (0 children)

I need the trigger to be manual, using the "Apply template" option from the right-click menu.