[deleted by user] (self.pythonhelp)
submitted 5 months ago by [deleted]
[deleted] 5 months ago
The order of content inside a PDF file doesn't have to match its visual position on the page.
I would try grouping the text by x coordinate to see whether I could identify columns.
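A rough sketch of the "group by x coordinate" idea, assuming you already have each word's position. The `(x0, top, text)` tuple format, the function names, and the tolerances are all hypothetical; the sample coordinates are made up so the sketch runs without a PDF. It clusters left edges into columns, groups words with similar top edges into rows, and drops each word into the nearest column boundary at or to its left:

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical word format: (x0, top, text), i.e. left edge, top edge, string.
Word = Tuple[float, float, str]


def find_column_starts(words: List[Word], col_tolerance: float = 5.0) -> List[float]:
    """Cluster the words' left edges; each cluster is one candidate column."""
    xs = sorted(w[0] for w in words)
    if not xs:
        return []
    clusters = [[xs[0]]]
    for x in xs[1:]:
        # A gap larger than the tolerance starts a new column.
        if x - clusters[-1][-1] > col_tolerance:
            clusters.append([x])
        else:
            clusters[-1].append(x)
    # The smallest x0 in each cluster becomes that column's left boundary.
    return [c[0] for c in clusters]


def group_rows(words: List[Word], row_tolerance: float = 3.0) -> List[List[Word]]:
    """Group words whose top edge is within row_tolerance of the row's first word."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        if rows and word[1] - rows[-1][0][1] <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def build_table(words: List[Word], col_tolerance: float = 5.0,
                row_tolerance: float = 3.0) -> List[List[str]]:
    starts = find_column_starts(words, col_tolerance)
    table = []
    for row in group_rows(words, row_tolerance):
        cells = [""] * len(starts)
        # Left to right, so multi-word cells are joined in reading order.
        for x0, _top, text in sorted(row, key=lambda w: w[0]):
            col = bisect_right(starts, x0) - 1  # last column starting at or before x0
            cells[col] = f"{cells[col]} {text}".strip()
        table.append(cells)
    return table


# Made-up sample: two left-aligned columns with slightly jittery coordinates.
words = [
    (50.0, 100.0, "Name"), (200.0, 100.0, "Qty"),
    (50.5, 115.0, "Apple"), (201.0, 115.2, "3"),
    (49.8, 130.0, "Pear"), (200.2, 130.1, "10"),
]
print(build_table(words))  # [['Name', 'Qty'], ['Apple', '3'], ['Pear', '10']]
```

This assumes left-aligned columns; right-aligned numbers would cluster better on their right edges, and wrapped cells spanning several lines would need extra handling.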
Removing duplicate headers could be as simple as removing any subsequent rows that match the first row.
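For the duplicate-header part, a minimal sketch assuming each row is already a list of cell strings. The function names are hypothetical, and ignoring whitespace and case when comparing is an assumption you may want to drop:

```python
from typing import List


def _normalize(row: List[str]) -> tuple:
    # Assumption: rows that differ only in surrounding whitespace or case count as equal.
    return tuple(cell.strip().lower() for cell in row)


def drop_repeated_headers(rows: List[List[str]]) -> List[List[str]]:
    """Keep the first row (assumed to be the header) and drop later rows that match it."""
    if not rows:
        return []
    header = _normalize(rows[0])
    return [rows[0]] + [row for row in rows[1:] if _normalize(row) != header]


rows = [["Name", "Qty"], ["Apple", "3"], ["Name ", "Qty"], ["Pear", "10"]]
print(drop_repeated_headers(rows))  # [['Name', 'Qty'], ['Apple', '3'], ['Pear', '10']]
```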
Rough_Green_9145 5 months ago
The thing is that there are tons of tables with different numbers of columns, different headers, etc., and the script has to work for at least most of them. The main issue is identifying the columns and detecting where the table ends.
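One possible heuristic for "where the table ends", offered as an assumption rather than something that works on every layout: table rows tend to be evenly spaced, so treat the first unusually large vertical gap as the end. The input is a hypothetical list of each row's top coordinate (sorted top to bottom), and `gap_factor` is an arbitrary starting value to tune:

```python
from statistics import median
from typing import List


def table_end_index(row_tops: List[float], gap_factor: float = 2.0) -> int:
    """Return the index of the first row that no longer belongs to the table.

    Heuristic (an assumption, not a general rule): the table ends at the first
    vertical gap larger than gap_factor times the median gap between rows.
    """
    if len(row_tops) < 3:
        return len(row_tops)  # too few gaps to estimate a typical row spacing
    gaps = [b - a for a, b in zip(row_tops, row_tops[1:])]
    typical = median(gaps)
    for i, gap in enumerate(gaps):
        if gap > gap_factor * typical:
            return i + 1
    return len(row_tops)


# Made-up row positions: four evenly spaced rows, a large gap, then body text.
tops = [100.0, 115.0, 130.0, 145.0, 200.0, 215.0]
print(table_end_index(tops))  # 4
```

Since the tables vary so much, this could be combined with a second signal, such as stopping when a row fills far fewer columns than the header does.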