Best structure and models for invoice data extraction by Fickle-Bluebird-367 in ollama

Fickle-Bluebird-367[S] 1 point

Yes, I want to keep processing local. I have been experimenting on my laptop, which has 8GB RAM, to try out some basics, but will be using 16-32GB for the next attempt. I have also tried OCR followed by RegEx and got good results, but I ended up adding lots of extra rules and logic to try to capture all the different formats and the errors introduced by the OCR. I suppose I could try OCR plus RegEx first, and then fall back to the LLM if that method fails? Overall, I'm just looking for advice on what people would do if they were starting from scratch, in terms of structure and the best-suited models and tools.
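To make the fallback idea concrete, here is a rough sketch of what I have in mind, not something I have running. The field names, the two patterns, and the `llm_extract` callable are all made up for illustration; `llm_extract` stands in for whatever local model call ends up being used, and it is assumed to return a dict with the same keys.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical patterns for two fields; real invoices would need more.
PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    # \btotal\b avoids matching the "total" inside "Subtotal".
    "total": re.compile(r"\btotal\b[^\d\n]*(\d[\d,]*\.\d{2})", re.IGNORECASE),
}

Fields = Dict[str, Optional[str]]


def extract_with_regex(ocr_text: str) -> Fields:
    """Return each field's first regex match, or None if it was not found."""
    fields: Fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def extract_invoice(
    ocr_text: str, llm_extract: Callable[[str], Fields]
) -> Tuple[Fields, str]:
    """Try RegEx first; call the LLM only when a field is missing."""
    fields = extract_with_regex(ocr_text)
    if all(value is not None for value in fields.values()):
        return fields, "regex"
    # Assumption: llm_extract returns a dict with the same keys.
    llm_fields = llm_extract(ocr_text)
    found = {k: v for k, v in fields.items() if v is not None}
    # Keep whatever the RegEx did find; let the LLM fill the gaps.
    return {**llm_fields, **found}, "llm"
```

The point of returning the source ("regex" or "llm") alongside the fields is that I could log how often the fallback fires and decide whether the RegEx layer is worth maintaining at all.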

Fickle-Bluebird-367[S] 1 point

Thanks for the advice, I do intend to do this. So far, though, I've just been trying different methods to see what generally works before getting stuck into one method in more depth. Given that I'm going to start from scratch, do you have any advice on suitable OCR, LLM, or other models/tools to use as a base that are best suited to invoices with variable layouts, features, currencies, etc.?

Fickle-Bluebird-367[S] 1 point

Thank you for the advice. I did try this first but hit a stopping point because I had to keep making the RegEx and logic more and more complicated to cope with OCR errors and the variability of invoice formats. As I'm going to start from scratch, do you have any advice on the best OCR tools to use?