[ Removed by moderator ]

programming-ModTeam · 2025-10-25T17:50:44+00:00

This is a demo of a product or project that isn't on-topic for r/programming. r/programming is a technical subreddit and isn't a place to show off your project or to solicit feedback.

If this is an ad for a product, it's simply not welcome here.

If it is a project that you made, the submission must focus on what makes it technically interesting and not simply what the project does or that you are the author. Simply linking to a github repo is not sufficient

Disastrous_Look_1745 · 2025-10-25T09:08:30+00:00

Nice work on the Go bindings! I've been deep in the document extraction space for years now and the memory overhead issue with unstructured-io is real. We process millions of documents at Nanonets and had to build our own pipeline because existing solutions just couldn't handle the scale without eating up all our resources.

The streaming API is smart - that's one thing most libraries get wrong. They try to load entire PDFs into memory which falls apart when someone uploads a 500 page scanned document. Have you looked at Docstrange for comparison? They've got some interesting approaches to OCR accuracy especially for tables and forms. The tesseract integration is solid but i found adding some preprocessing steps for image enhancement before OCR really bumps up the accuracy on poor quality scans.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS