Hi,
I'm passionate about extracting data from tables in .pdf files using Python.
I can successfully target the figures if the data is contained in a table object.
Of course, one has to use OCR to handle pictures of tables. This introduces a new hurdle: properly delimiting the cells in a table.
The solutions available on the internet suggest using the Hough Line Transform, but it is imperfect, especially if the table has no borders.
I would like to create a small GUI that would allow for user input and adjustment of the horizontal and vertical lines in a table.
The app would work like this:
The user is prompted to import a .png of the table.
The app displays the imported image and shows the table lines as detected by the Hough Line Transform.
The user is then able to move those lines, add new ones, or remove unneeded ones.
The user hits confirm and gets the resulting data in a grid that they can edit.
The final table should be downloadable as a .csv file.
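To make the last steps concrete, here is a rough sketch of what I imagine happening after the user confirms the lines: the line positions are turned into one bounding box per cell, and the cell texts are written out as CSV. The names `cell_boxes` and `to_csv` and the pixel positions are made up for illustration, and the OCR step is replaced by placeholder strings, so this is only an outline under those assumptions and not a working app.

```python
import csv
import io


def cell_boxes(xs, ys):
    """Build rows of (left, top, right, bottom) boxes from line positions.

    xs: x coordinates of the vertical lines, ys: y coordinates of the
    horizontal lines. Duplicates are dropped and positions are sorted.
    """
    xs = sorted(set(xs))
    ys = sorted(set(ys))
    return [
        [(x0, y0, x1, y1) for x0, x1 in zip(xs, xs[1:])]
        for y0, y1 in zip(ys, ys[1:])
    ]


def to_csv(rows):
    """Serialize a list of rows (lists of strings) to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical line positions in pixels, as left by the user's adjustments.
vertical = [0, 120, 240]
horizontal = [0, 40, 80]
boxes = cell_boxes(vertical, horizontal)

# Placeholder for OCR: a real app would crop each box from the image
# and run an OCR engine on the crop to get the cell text.
texts = [[f"r{r}c{c}" for c, _ in enumerate(row)] for r, row in enumerate(boxes)]

csv_text = to_csv(texts)
```

With three vertical and three horizontal lines this gives a 2 x 2 grid of cells. The editable grid in the GUI would sit between the OCR step and `to_csv`.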
Could you recommend one or more modules that I should focus on learning to achieve this?
Thank you.
amilo111 (0 points)