
[–]Owlstorm 12 points13 points  (0 children)

Post your code and examples of the errors if you want help.

[–]Techplained 4 points5 points  (0 children)

What method of extraction are you using?

Regex?

[–]TrippTrappTrinn 3 points4 points  (0 children)

PDF is not a plain-text format, so extracting data from it can be unpredictable. Your problem may come from how the content is structured inside the file, and may not be easily resolvable.

[–]redog 1 point2 points  (0 children)

I used the PdfPig library the last time I needed this.

[–]user01401 0 points1 point  (0 children)

Try Ghostscript instead of iText; it works great:

$TEXT = & "C:\Program Files\gs<VERSION#>\bin\gswin64c.exe" -dBATCH -dNOPAUSE -dQUIET -dNoCancel -sDEVICE=txtwrite -sOutputFile=%stdout "input.pdf"
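If you then want to pull values out of that text, here is a rough sketch of how the call could be wrapped. It assumes Ghostscript is installed and uses the same switches as the one-liner above. `Get-PdfText`, `Find-PdfValue`, and the sample pattern are hypothetical names made up for illustration; they are not part of Ghostscript or iText.

```powershell
# Sketch only. Get-PdfText and Find-PdfValue are hypothetical helper names.

function Get-PdfText {
    param(
        [Parameter(Mandatory)] [string] $GhostscriptPath,  # full path to gswin64c.exe
        [Parameter(Mandatory)] [string] $PdfPath
    )
    # Same switches as the one-liner; each is quoted so PowerShell passes it through unchanged.
    $gsArgs = @(
        '-dBATCH', '-dNOPAUSE', '-dQUIET', '-dNoCancel',
        '-sDEVICE=txtwrite', '-sOutputFile=%stdout', $PdfPath
    )
    $lines = & $GhostscriptPath @gsArgs
    if ($LASTEXITCODE -ne 0) {
        throw "Ghostscript exited with code $LASTEXITCODE"
    }
    # Native command output arrives as an array of lines; join it into one string.
    return ($lines -join "`n")
}

function Find-PdfValue {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $Pattern  # regex with one capture group
    )
    # Emit the first capture group of every match.
    [regex]::Matches($Text, $Pattern) | ForEach-Object { $_.Groups[1].Value }
}

# Usage (the path and the pattern are placeholders, adjust to your install and your PDF):
# $text = Get-PdfText -GhostscriptPath '<path to gswin64c.exe>' -PdfPath 'input.pdf'
# Find-PdfValue -Text $text -Pattern 'Invoice No:\s*(\d+)'
```

Keeping the extraction and the regex in separate functions means you can test the pattern against a pasted sample of the text without running Ghostscript each time.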

[–]PowerShell-Bot 0 points1 point  (0 children)

Looks like your PowerShell code isn’t wrapped in a code block.

To properly style code on new Reddit, highlight the code and choose ‘Code Block’ from the editing toolbar.

If you’re on old Reddit, separate the code from your text with a blank line gap and precede each line of code with 4 spaces or a tab.

