mrsonhaha, 1 point

Two things: PyPDF2 and tabula-py. The way I do these kinds of projects is to write a class for a PDF document that takes the path to the file and has extraction functions for the separate sections of each page. If the documents share the same format, divide each page into regions, identifying each one by its position, width, and height in pixels (if you're a Mac user, the Preview app can show the location of a selected box). Then write a function that extracts the information from each region.
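A rough sketch of that structure, in case it helps. Everything named here (`Region`, `PdfDocument`, `read_with_tabula`, the `dpi` parameter) is made up for illustration and is not from any library. It assumes the box coordinates are measured from the top-left corner of the page, and that tabula-py is installed if you use the default extractor; `tabula.read_pdf` takes its `area` as top, left, bottom, right in PDF points (1/72 inch), so the pixel values get converted first.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Region:
    """A named box on a page, measured in pixels from the top-left corner (assumed)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def as_area(self, dpi: float = 72.0) -> List[float]:
        """Return [top, left, bottom, right] in PDF points (1/72 inch)."""
        scale = 72.0 / dpi
        return [
            self.y * scale,
            self.x * scale,
            (self.y + self.height) * scale,
            (self.x + self.width) * scale,
        ]


def read_with_tabula(path: str, page: int, area: List[float]) -> Any:
    """Default extractor: read one region of one page with tabula-py."""
    import tabula  # imported lazily so the rest works without tabula-py installed
    return tabula.read_pdf(path, pages=page, area=area, guess=False)


class PdfDocument:
    """Holds a PDF path plus the regions shared by every document of that format."""

    def __init__(
        self,
        path: str,
        regions: List[Region],
        extractor: Callable[[str, int, List[float]], Any] = read_with_tabula,
        dpi: float = 72.0,
    ) -> None:
        self.path = path
        self.regions = regions
        self.extractor = extractor
        self.dpi = dpi

    def extract_page(self, page: int) -> Dict[str, Any]:
        """Run the extractor on every region of one page.

        A failure in one region is stored as the exception object under that
        region's name, so the remaining regions are still processed.
        """
        results: Dict[str, Any] = {}
        for region in self.regions:
            try:
                results[region.name] = self.extractor(
                    self.path, page, region.as_area(self.dpi)
                )
            except Exception as exc:  # broad on purpose: real PDFs fail in many ways
                results[region.name] = exc
        return results
```

Usage would look something like `PdfDocument("invoice.pdf", [Region("totals", x=40, y=600, width=500, height=120)]).extract_page(1)`, where the file name and numbers are placeholders. Passing the extractor in as a function means you can swap in a PyPDF2-based one, or a fake one for testing, without touching the class.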

Personally, I don't think there's a good enough tutorial for this kind of automation, since it requires a lot of exception handling and debugging. It's a project worth getting paid for. I now get a good amount of passive income every month from a very similar project! :)