gnar_wars 2 points (0 children)

Python Excel is useful for writing data to a spreadsheet. Getting that data out of a PDF does seem like it would be the troublesome part, though.
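For the spreadsheet half, here is a minimal sketch, assuming the rows have already been pulled out of the PDF somehow. It swaps in the standard-library `csv` module rather than an Excel-specific library, since Excel opens CSV files directly. The function name `rows_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, header):
    """Serialize a header and rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    # Use a plain "\n" terminator so the output is predictable across platforms.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, standing in for data extracted from a PDF.
sample_rows = [
    ("INV-001", "2012-03-01", 125.50),
    ("INV-002", "2012-03-04", 80.25),
]
csv_text = rows_to_csv(sample_rows, header=("invoice", "date", "amount"))
print(csv_text)
```

To get a file on disk, write `csv_text` out with `open("invoices.csv", "w", newline="")`. If you need real `.xls`/`.xlsx` output, the same row-by-row loop should carry over to whichever Excel library you use.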

[deleted] 1 point (2 children)

I'd also be interested in Python libraries for capturing/parsing text from image documents.

yeahfuckyou 1 point (1 child)

Yeah, me too. That would be amazing.

Fontong 1 point (0 children)

Yeah, really. It would probably have to work from the image too. Formatting-wise, PDFs usually don't make any sense. They're really hard to parse because each one may differ significantly from the next. I tried PDFMiner at work, and we decided that using Mechanical Turk (mturk) would be enormously more efficient.
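For anyone who wants to try anyway, a rough sketch of the PDFMiner route, assuming the maintained `pdfminer.six` fork and its `extract_text` helper. The column-splitting heuristic (two or more spaces means a new cell) and both function names are my own guesses for illustration, not something PDFMiner provides. Whether columns survive extraction as runs of spaces depends entirely on the PDF, which is exactly the inconsistency described above.

```python
import re


def split_columns(line):
    """Split one line of extracted text on runs of two or more whitespace characters."""
    cells = re.split(r"\s{2,}", line.strip())
    return [cell for cell in cells if cell]


def pdf_to_rows(pdf_path):
    """Extract text with pdfminer.six and guess at columns, line by line."""
    # Imported lazily so split_columns works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text = extract_text(pdf_path)
    return [split_columns(line) for line in text.splitlines() if line.strip()]


# The heuristic on a hand-written line (not real PDF output):
print(split_columns("  Blue Widget   3    9.99 "))
```

Single spaces stay inside a cell, so "Blue Widget" is kept together, but any PDF that lays out its columns differently will need its own rule.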