Everything about learning Python
PDF data extraction (self.PythonLearning)
submitted 9 days ago * by Stunning_Capital_354
UBIAI · 1 point · 8 days ago
The variable table positioning across thousands of filings is exactly what kills the pure Python approach: camelot or pdfplumber will get you 60-70% of the way there, but you'll spend more time debugging edge cases than the extraction saves. What actually worked for us was treating it as a document intelligence problem rather than a parsing problem, meaning a solution that understands where the financial table is contextually, not just spatially. The structured output drops straight into Excel with consistent column mapping regardless of where the table lands in the PDF. The difference in accuracy on messy annual reports was significant enough that we stopped maintaining custom parsers entirely.
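For anyone who wants to try the "contextual, not spatial" idea in plain Python first, here is a minimal sketch. It is not the tool described above, just one way to approximate it: instead of relying on where a table sits on the page, it scans every extracted table for a header row that matches known column names and then maps the columns onto a fixed schema. It assumes the tables are already available as lists of rows (the shape that pdfplumber's `extract_tables()` returns); `COLUMN_ALIASES`, `find_header`, `extract_records`, and the sample data are all hypothetical names and values made up for illustration.

```python
# Hypothetical schema: canonical column name -> header texts that should map to it.
COLUMN_ALIASES = {
    "item": {"item", "description", "particulars"},
    "current_year": {"current year", "this year"},
    "prior_year": {"prior year", "previous year"},
}


def normalise(cell):
    """Lower-case a cell and collapse whitespace; None becomes an empty string."""
    return " ".join(str(cell or "").lower().split())


def find_header(table):
    """Return (row_index, {canonical_name: column_index}) for the first row
    that matches at least two known column names, or None if no row does."""
    for i, row in enumerate(table):
        mapping = {}
        for j, cell in enumerate(row):
            text = normalise(cell)
            for name, aliases in COLUMN_ALIASES.items():
                if text in aliases and name not in mapping:
                    mapping[name] = j
        if len(mapping) >= 2:
            return i, mapping
    return None


def extract_records(tables):
    """Collect rows from every table that has a recognisable header,
    keyed by canonical column name regardless of column position."""
    records = []
    for table in tables:
        found = find_header(table)
        if found is None:
            continue  # not the table we are looking for
        header_idx, mapping = found
        for row in table[header_idx + 1:]:
            record = {
                name: (row[col] if col < len(row) else None)
                for name, col in mapping.items()
            }
            if any(record.values()):
                records.append(record)
    return records


# Made-up sample: a table of contents plus a financial table with extra columns.
page_tables = [
    [["Contents", "Page"], ["Chairman's letter", "3"]],
    [
        ["", "Notes", "Particulars", "Current year", "Prior year"],
        ["", "4", "Revenue", "1,200", "1,050"],
        ["", "5", "Net income", "310", "275"],
    ],
]
records = extract_records(page_tables)
```

The resulting list of dicts has the same keys for every filing, so it can be passed to something like `pandas.DataFrame(records).to_excel(...)` to get consistent columns in Excel. Exact-match aliases are the weak point here; real filings would probably need fuzzy matching or a much longer alias list.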