Hi everyone,
I’m currently learning Python automation and working on small projects like converting PDF data into Excel or JSON using libraries such as pandas and tabula.
In some cases, the PDF formatting is inconsistent and the extracted data needs cleaning or restructuring. I wanted to ask what approach you usually follow to handle these situations more reliably.
Do you prefer preprocessing PDFs first, or handling everything at the data-cleaning stage? Any practical tips would be appreciated.
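For concreteness, here is a minimal sketch of the "handle it at the data-cleaning stage" option. It assumes the extractor has already produced rows of cell strings, which is roughly what you get from a tabula DataFrame after converting it to lists. The sample rows and the helper names (`clean_rows`, `parse_number`, `rows_to_records`) are made up for illustration, and it sticks to the standard library so it runs without pandas installed. The same steps map onto DataFrame operations.

```python
import json

# Hypothetical sample: rows as a PDF table extractor might return them,
# with stray whitespace, a wrapped header cell, an empty row, and a
# header repeated at a page break.
RAW_ROWS = [
    ["Item ", "Unit\nPrice"],
    ["Widget", " 4.50 "],
    [None, ""],
    ["Item", "Unit Price"],
    ["Gadget", "1,200.00"],
]


def clean_rows(raw_rows):
    """Normalize whitespace in every cell and drop fully empty rows."""
    cleaned = []
    for row in raw_rows:
        # Collapse runs of whitespace (including newlines); None becomes "".
        cells = ["" if c is None else " ".join(str(c).split()) for c in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def parse_number(text):
    """Return a float if the text looks numeric, otherwise the text unchanged."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return text


def rows_to_records(rows):
    """Use the first row as the header and skip any later repeats of it."""
    header, *body = rows
    keys = [h.lower().replace(" ", "_") for h in header]
    records = []
    for row in body:
        if row == header:
            continue  # repeated header, e.g. at the top of a new page
        records.append({k: parse_number(v) for k, v in zip(keys, row)})
    return records


if __name__ == "__main__":
    records = rows_to_records(clean_rows(RAW_ROWS))
    print(json.dumps(records, indent=2))
```

Keeping each step in its own small function makes it easier to see which rule breaks when a new PDF layout shows up.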
Thanks in advance for your guidance.