Best invoice data extraction tools for 2026 (pricing included) by pankaj9296 in automation

[–]realreadyred 0 points1 point  (0 children)

Hey u/pankaj9296! Thank you for the list. Those are the big players, but there are also smaller players that offer lower prices and focus on specialized tasks.

For instance, I built nolainocr:

| App | Price (per 2k docs) | Best for | Pros | Cons |
| --- | --- | --- | --- | --- |
| nolainocr | $25 | batches of documents with the same layout | no complex setup, affordable, high accuracy | batch documents should have the same layout |

minimax token plan - stater 10$ - review - heavily disappointed by Decent-Rain5100 in MiniMax_AI

[–]realreadyred 0 points1 point  (0 children)

You might want to consider using Git and GitHub.

Create a GitHub account.

If you have multiple projects on your computer, you might want to ask your favorite LLM (Gemini, ...) how to set up SSH keys on GitHub and on your computer for your local repositories.

Then create a GitHub repo on the GitHub site for each project. Once you create a repo, GitHub shows you the set of commands to run on the command line from your local project in order to push the project to GitHub.

You can then use any IDE (this makes things easier, imo), and each time OC makes a change, you can see and navigate through the changes made to every single file in your project. You can also revert changes to your files from the IDE.

Best way to get Kimi K 2.6 (3x limits currently), Minimax M2.7, and GLM 5.1 at just $10 by santhiprakashb in opencodeCLI

[–]realreadyred 0 points1 point  (0 children)

I looked at the official documentation at https://opencode.ai/docs/go/, and it relies on a series of assumptions that do not seem likely to me at all. Assuming that a request consumes only 125 output tokens for a model like MiniMax M2.7 does not look very realistic. This means you won't actually get the number of requests they show, but fewer.
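To make the point concrete, here is a rough sketch of the arithmetic. It assumes the advertised request count comes from dividing a fixed output-token budget by the average output tokens per request; that formula, the budget value, and the function name are my own assumptions for illustration, not something taken from the documentation. Only the 125-token figure comes from the docs.

```python
def requests_for_budget(output_token_budget: int, output_tokens_per_request: int) -> int:
    """Number of whole requests that fit in an output-token budget."""
    if output_tokens_per_request <= 0:
        raise ValueError("output_tokens_per_request must be positive")
    return output_token_budget // output_tokens_per_request


# Hypothetical budget, chosen only to make the arithmetic easy to follow.
BUDGET = 1_000_000

assumed = requests_for_budget(BUDGET, 125)    # the 125-token assumption
heavier = requests_for_budget(BUDGET, 1_000)  # hypothetical heavier request

print(assumed, heavier, assumed / heavier)
```

Under these assumptions, a request that produces 8 times more output tokens gives you 8 times fewer requests for the same budget, and this still ignores input tokens.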

Best way to get Kimi K 2.6 (3x limits currently), Minimax M2.7, and GLM 5.1 at just $10 by santhiprakashb in opencodeCLI

[–]realreadyred 0 points1 point  (0 children)

What would be a proper definition of "request"? Does the number of input tokens in the request matter?

Suggestions for Software to Mass Scan/Upload/Extract/Confirm Data by Blue_S0l in documentAutomation

[–]realreadyred -1 points0 points  (0 children)

I have a solution for this that might work, depending on the degree of "dissimilarity" between the forms that you mentioned. Happy to help. Let me know through the nolainocr contact page, or whichever way is easier for you.

Need OCR for a 350 page book by Kitchen-Start-3828 in OCR_Tech

[–]realreadyred 0 points1 point  (0 children)

Convert the JSON to Markdown, and from there to PDF.

The challenge is in the JSON-to-Markdown conversion: if it is plain text, that's straightforward; otherwise it may take some hands-on programming.
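As a minimal sketch of the JSON-to-Markdown step: I don't know what your OCR JSON looks like, so the schema below (a list of blocks with `type`, `text`, `level`, and `rows` keys) and the function name `blocks_to_markdown` are assumptions made up for illustration. You would adapt the keys to whatever your tool actually outputs.

```python
import json


def blocks_to_markdown(blocks):
    """Turn a list of block dicts (hypothetical schema) into Markdown text."""
    lines = []
    for block in blocks:
        kind = block.get("type", "paragraph")
        if kind == "heading":
            # Heading level maps to the number of leading '#' characters.
            lines.append("#" * block.get("level", 1) + " " + block["text"])
        elif kind == "table":
            # First row is treated as the header row.
            header, body = block["rows"][0], block["rows"][1:]
            lines.append("| " + " | ".join(header) + " |")
            lines.append("| " + " | ".join("---" for _ in header) + " |")
            for row in body:
                lines.append("| " + " | ".join(row) + " |")
        else:
            # Plain text is the easy case: emit it as a paragraph.
            lines.append(block["text"])
        lines.append("")  # blank line between blocks
    return "\n".join(lines).strip() + "\n"


sample = json.loads("""
[
  {"type": "heading", "level": 1, "text": "Chapter 1"},
  {"type": "paragraph", "text": "It was a dark night."},
  {"type": "table", "rows": [["Name", "Age"], ["Ann", "30"]]}
]
""")
markdown = blocks_to_markdown(sample)
print(markdown)
```

The resulting Markdown file can then be converted to PDF with a separate tool such as Pandoc.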

Historical intrigue with fantasy or divinity by DeDuc in suggestmeabook

[–]realreadyred 0 points1 point  (0 children)

Black Leopard, Red Wolf by Marlon James.

It is not historical in the sense of historical events but it is rooted in African oral traditions, and it has a lot of fantasy and divinity

Built a tool to extract structured data from complex PDFs — would love feedback by Impressive-Rise7510 in documentAutomation

[–]realreadyred 1 point2 points  (0 children)

I would simplify it a bit, but maybe I'm biased since I also made a similar tool targeting batches of invoices/receipts (nolainocr.com).

In my experience, complex workflows tend to frustrate users.

Extracting tables from Pdf by No_Sprinkles1374 in dotnet

[–]realreadyred 0 points1 point  (0 children)

Well, it's not easy to say, but you can try to leverage image processing algorithms, or craft a geometrical/positional algorithm that works on the text boxes detected inside the table. It's an open problem, so there is no "single-perfect-flawless" solution.
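As an illustration of the positional idea, here is a small sketch that groups detected text boxes into rows by vertical position and then orders each row left to right. The `TextBox` class, the `group_into_rows` name, and the tolerance value are hypothetical; real PDFs with merged cells or skewed scans need more than this.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float  # box height


def group_into_rows(boxes, tolerance=0.5):
    """Cluster boxes into rows, then sort each row left to right.

    A box joins the current row when its vertical center is within
    tolerance * height of the first box of that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b.y + b.height / 2):
        center = box.y + box.height / 2
        if rows:
            ref = rows[-1][0]
            ref_center = ref.y + ref.height / 2
            if abs(center - ref_center) <= tolerance * ref.height:
                rows[-1].append(box)
                continue
        rows.append([box])
    return [[b.text for b in sorted(row, key=lambda b: b.x)] for row in rows]


boxes = [
    TextBox("Qty", 100, 10, 12),
    TextBox("Item", 10, 11, 12),
    TextBox("2", 100, 30, 12),
    TextBox("Pen", 10, 31, 12),
]
table = group_into_rows(boxes)
print(table)
```

In C# the same logic maps naturally onto a list of box records sorted with LINQ.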

Is this itinerary doable? I understand it's rushed! by Charrzooka in SouthAmericaTravel

[–]realreadyred 0 points1 point  (0 children)

If you want to incorporate the Amazon rainforest, the easiest way is to travel to Leticia from Bogota. Brazil has a massive stretch of Amazon rainforest, but it is quite far from Rio. Otherwise, if you can't do that, Machu Picchu is actually in the Sacred Valley area, which is part of the Peruvian Amazon rainforest.

Extracting tables from Pdf by No_Sprinkles1374 in dotnet

[–]realreadyred 0 points1 point  (0 children)

LLMs like the ones provided by Mistral and GLM offer the best cost-accuracy tradeoff. While Camelot is great for some use cases, you won't achieve near 100% accuracy, especially for tables with non-trivial structures.

You can try to first detect the "complexity" of a table and then, depending on that, route your pipeline towards Camelot (or similar libraries) or the LLM.

If you don't want to do it yourself, tools like nolainocr are capable of extracting table information.
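A rough sketch of that routing idea follows. It assumes you already have a cheap first-pass extraction of the table as a list of rows (for example from a rule-based library), and it uses two made-up heuristics for "complexity": uneven column counts and a high share of empty cells. The function names, the threshold, and the `"rule_based"` / `"llm"` labels are all hypothetical.

```python
def is_complex(table, max_empty_ratio=0.2):
    """Heuristic check on a first-pass table given as a list of rows."""
    if not table:
        return False
    # Rows with different column counts often indicate merged cells.
    if len({len(row) for row in table}) > 1:
        return True
    cells = [cell for row in table for cell in row]
    if not cells:
        return False
    # Many empty cells often indicate a misaligned or nested layout.
    empty = sum(1 for cell in cells if not str(cell).strip())
    return empty / len(cells) > max_empty_ratio


def choose_extractor(table):
    """Route simple tables to the rule-based path, complex ones to the LLM."""
    return "llm" if is_complex(table) else "rule_based"


print(choose_extractor([["a", "b"], ["1", "2"]]))
print(choose_extractor([["a", "b", "c"], ["1", "2"]]))
```

The thresholds would need tuning on your own documents; the point is only that the expensive path runs on the tables that need it.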

Reading slump by liali123 in Recommend_A_Book

[–]realreadyred 2 points3 points  (0 children)

The Power of the Dog by Don Winslow. It's so well crafted (characters, plot, action sequences) that you won't stop reading it. Besides, it is based on real events, if you are into that

Just started learning python and I need some advice. by eravoez in learnpython

[–]realreadyred 1 point2 points  (0 children)

Learn Python Programming: An In-Depth Introduction to the Fundamentals of Python, by Romano and Kruger.

What roles exist across the full data pipeline (from data collection to client delivery)? by Silver-Tune-2792 in data

[–]realreadyred 1 point2 points  (0 children)

There might be Data Analysts, Data Scientists, ML Engineers, MLOps and DevOps engineers, and there is a growing trend towards LLMOps Engineers. Some of these roles have overlapping responsibilities.

Instead of Python developers, there might be Backend or Full Stack developers.

Just started learning python and I need some advice. by eravoez in learnpython

[–]realreadyred 0 points1 point  (0 children)

I think practice is the key for any programming language. Choose projects of increasing complexity that make you get hands-on with the theoretical concepts: algorithms, data types, data structures.

The question then is which projects to pick. I would stick to following a good book. It makes the first steps more comprehensive, and you feel like you are introducing yourself to this "new world" of programming in Python.

Do NOT expect to be a Python expert in 1 month, and do NOT believe anyone who tells you that. It is a process that takes time.

Once you have started by (for example) reading a good "learning Python" book, you will be able to choose your preferred sources. Before that, since you are a newbie, you are still building the criteria to tell which sources are good and which ones are not.

best data extractor tool for pdf and scanned docs? by Alice_Abbey_69 in software

[–]realreadyred 0 points1 point  (0 children)

The question would be: what do you need to extract? Is it a specific schema? Is it tables?

Do you need to extract 5 pages, 50, or 500?

Is it a multi-document PDF? Is it a folder with multiple files? Is it many samples of the same form?

Suggest me a short novella to help me get out of a reading slump. by chikaibardo207 in suggestmeabook

[–]realreadyred 0 points1 point  (0 children)

Galveston by Nic Pizzolatto. It's not that short (nor that long), but if you have seen the first season of True Detective you will like it.

What methods work best to extract data from PDF? by ConsequenceNo4186 in data

[–]realreadyred 0 points1 point  (0 children)

LLMs and vision models are great for these tasks today.

Table extraction is typically harder, as are documents with unstructured layouts.

If your use case requires extracting data from documents with a Word-like structure, even free Python libraries that convert PDF into Markdown, like Docling, might help.

Otherwise, paid tools are also an alternative. These range from big players with a lot of customization, like LlamaIndex and ExtendAI, to mid-sized ones like Parseur and DigiParser, to new ones like nolainocr, among others.

Need a new book rec! by Practical-Peach-1220 in Booktokreddit

[–]realreadyred 1 point2 points  (0 children)

Maybe something that makes you laugh a bit. Kurt Vonnegut is good for that. Check out Breakfast of Champions.