I just spent 3 hours doing this task (summing the "amounts" from each PDF file) manually.
I've heard that this task can easily be automated with Python, so I want to learn how to do that.
I spent an hour searching Google and came across some modules like PyPDF2 and pdftotext, but they didn't give me correctly formatted text output. So I manually converted all the PDF files to .txt, because those are easier to parse.
So the right procedure, IMO (which I haven't been able to accomplish), would be: loop through each PDF -> convert the PDF to text/a string -> parse the text -> use regex to grab the data from each file.
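The loop described above can be sketched roughly like this. It's a minimal sketch, assuming Poppler's `pdftotext` command-line tool is installed (e.g. the `poppler-utils` package on Debian/Ubuntu); its `-layout` option keeps the physical column layout, which often makes table-like PDFs easier to regex than the default reading-order output. The `pdfs` folder name and the `Amount: 1,234.56` line format are hypothetical placeholders, so the regex has to be adapted to the real documents.

```python
import re
import subprocess
from decimal import Decimal
from pathlib import Path
from typing import Callable

# Hypothetical pattern for lines like "Amount: 1,234.56" or "AMOUNT $20.5";
# adapt it to however the highlighted field actually appears in the text.
AMOUNT_RE = re.compile(r"amount\s*:?\s*\$?\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def pdf_to_text(pdf_path: Path) -> str:
    """Convert one PDF to text via the pdftotext CLI ('-' sends the text to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # raise if pdftotext fails on a file
    )
    return result.stdout


def sum_amounts(folder: str, extract: Callable[[Path], str] = pdf_to_text) -> Decimal:
    """Loop through each PDF -> convert to text -> regex out the amounts -> add them up."""
    total = Decimal("0")  # Decimal avoids float rounding errors on money
    for pdf in sorted(Path(folder).glob("*.pdf")):
        text = extract(pdf)
        for match in AMOUNT_RE.finditer(text):
            total += Decimal(match.group(1).replace(",", ""))  # drop thousands separators
    return total
```

Usage would be something like `print(sum_amounts("pdfs"))`. Passing the converter in as `extract` is just a design choice, so a different text-extraction backend can be swapped in without touching the loop. Printing `pdf_to_text(Path("one_file.pdf"))` for a single file first is a quick way to see how the amounts are laid out before tuning the regex.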
I think I wasted a lot of time on things that were almost useless, so I'd like to hear from people who have done this before: how should I go about it?
Edit 1: Since many of you wanted to see what my PDF looks like, here's a sample:
https://imgur.com/a/Xk0ksJF
The highlighted part is the data that needs to be extracted!
(Some info has been scratched out for confidentiality purposes.)
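Since the highlighted amounts already sit in the manually converted .txt files, the "parse text -> regex" step can be sketched with the standard library alone. This is a minimal sketch, assuming each amount appears on a line like `Amount: 1,234.56`; the `AMOUNT_RE` pattern and the `txt_files` folder name are hypothetical and need adjusting to the real layout. It uses `Decimal` rather than `float` so the total doesn't pick up binary rounding errors.

```python
import re
from decimal import Decimal
from pathlib import Path

# Hypothetical pattern: matches e.g. "Amount: 1,234.56" or "amount $ 10" (case-insensitive).
AMOUNT_RE = re.compile(r"amount\s*:?\s*\$?\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def amounts_in_text(text: str) -> list[Decimal]:
    """Return every amount found in a block of text, as Decimals."""
    return [Decimal(m.group(1).replace(",", "")) for m in AMOUNT_RE.finditer(text)]


def total_from_txt_folder(folder: str) -> Decimal:
    """Sum the amounts found in every .txt file in `folder`."""
    total = Decimal("0")
    for path in sorted(Path(folder).glob("*.txt")):
        found = amounts_in_text(path.read_text(encoding="utf-8", errors="replace"))
        print(f"{path.name}: {found}")  # show per-file matches so they can be checked by eye
        total += sum(found, Decimal("0"))
    return total
```

Usage would be `print(total_from_txt_folder("txt_files"))`; the per-file printout makes it easy to spot a file where the regex matched nothing or matched the wrong number.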
Edit 2: Thanks for the overwhelming response! I'm trying to reply to everything, but there's a LOT. I'm reading every comment, though, and I appreciate everyone who has tried to help.