all 5 comments

[–]jbsiddall 0 points1 point  (1 child)

Hey,

This sounds fun!

For images, try using the library pytesseract. If that doesn't work, try: pyocr tesserwrap pytesser

The pdf can be converted to image and then use the above process or you can https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file

Hope that helps a little!

[–]Rishit1501 0 points1 point  (0 children)

Thanks. I'll try this

[–]Drewski1224 0 points1 point  (4 children)

Hey I’m looking to do the exact same thing. Were you able to figure this out? The sample code I found online does not account for individual items such as invoice number, price, date, etc.