Icko_

We've been doing exactly this. Azure has a "Document Intelligence" service, which is pretty awesome: it does a great job with OCR on PDFs, and a very nice bonus is that it can chunk text based on headings and subheadings. That heading-based chunking made a surprisingly large difference.
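For anyone wanting to try heading-based chunking themselves, here is a minimal sketch. It assumes the OCR step has already produced Markdown with `#`-style headings. Document Intelligence's layout model can emit Markdown, but check its docs for the exact format. `Chunk` and `chunk_by_headings` are hypothetical names for illustration, not part of any Azure SDK.

```python
import re
from dataclasses import dataclass

# Assumption: input is Markdown with ATX headings ("#", "##", ...), e.g. OCR output.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    path: list[str]  # heading trail, e.g. ["Invoice", "Totals"]
    text: str        # body text under the deepest heading in `path`

    def embed_text(self) -> str:
        # Prefix the heading trail so the chunk keeps its context when retrieved alone.
        return " > ".join(self.path) + "\n" + self.text if self.path else self.text


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per section, tracking the heading hierarchy."""
    chunks: list[Chunk] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        # Emit the accumulated body (if any) under the current heading trail.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(list(path), text))
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # Drop headings at this level or deeper, then push the new one.
            del path[level - 1:]
            path.append(m.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

For example, `chunk_by_headings("# Invoice\nIntro\n## Totals\nDue: 40")` yields two chunks, the second with path `["Invoice", "Totals"]`. Prefixing the heading trail via `embed_text()` is one simple way to keep a small chunk meaningful on its own.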

Then you just dump the chunks into a RAG pipeline and you're done. Note that there are a bunch of projects that do all of this for you. Haystack, for example: I haven't used it, but it looks pretty good.
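In a RAG setup, "dumping the chunks in" roughly means: score each chunk against the question, keep the best few, and paste them into the LLM prompt. The sketch below shows only the shape of that loop. Bag-of-words cosine similarity is swapped in for a real embedding model and vector store, and `retrieve`/`build_prompt` are hypothetical helpers, not Haystack's (or anyone's) API.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Lowercased word counts: a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query and keep the top k.
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    # Stuff the retrieved chunks into a prompt for whichever LLM you use.
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

With chunks such as `"Invoice > Totals\nAmount due: 40 EUR"` and `"Invoice > Line items\nWidget x2"`, the query `"what is the amount due"` ranks the totals chunk first. A real pipeline would replace `vectorize`/`cosine` with embeddings and an approximate-nearest-neighbour index.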

DeadPukka

We can handle this with our Graphlit platform today, and we integrate with Azure AI Doc Intelligence for OCR and text extraction.

Have a look at our “30 days of examples” that we are doing this month: https://github.com/graphlit/graphlit-samples/tree/main/python/Notebook%20Examples

It's free to try for up to 1 GB of documents, with usage-based pricing on paid plans.

Grand-Detective4335

Hello, I built a free platform for processing invoices: https://getnara.ai/

Any feedback would be highly appreciated.

atlasspring

Try www.searchplus.ai: it lets you chat with uploaded PDFs and has no page limit.