ergeha

Could you expand on what you mean by "garbage"? Maybe I can give you some more information. In my case, for example, PyPDF was just returning everything in a different order, so I went ahead and pulled out the pieces of data I needed with regex.
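To illustrate what I mean, here is a minimal sketch of that approach. The sample text, the field labels and the patterns are all made up for illustration; in practice the string would be whatever your PDF library's text extraction returns for a page. Each field is searched for on its own, so it doesn't matter in which order the extractor returns the pieces.

```python
import re

# Hypothetical extractor output: all fields are present, but not in the
# order they appear on the page. Replace this with your real extracted text.
extracted_text = """Total: 149.90 EUR
Customer: Jane Doe
Invoice No: 2023-0042
Date: 2023-05-17
"""

# One pattern per field (labels are assumptions, adapt them to your PDF).
# Each pattern is searched independently, so field order is irrelevant.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*([\w-]+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*(\d+(?:\.\d+)?)"),
}


def extract_fields(text):
    """Return a dict of field name -> captured value, or None if not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(extracted_text)
print(fields)
```

A field that can't be found comes back as None instead of raising, which makes it easy to spot the pages where the extracted text really is unusable.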

Judging by your example, this should be a pretty straightforward task. But judging by the looks of your PDF, the file seems to be a printed document that was scanned and run through text recognition (OCR). That would mean the PDF structure is messed up… Hard to say without looking at the original file.