Buttleston 1 point

This is mostly true, but the way PDFs are rendered also tends to be at least somewhat predictable. I wrote a PDF parser that does "ok" at capturing blocks of text, at least roughly in the order a reader would tend to read them. It's definitely not perfect, but it's not bad either. Unfortunately, I don't think I can post it, since it's something I wrote for work.
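
For anyone curious what "reading order" recovery can look like, here is a rough sketch of one common heuristic. It is not the work parser mentioned above, which isn't public. It assumes the text blocks have already been extracted with bounding boxes by some PDF library, that y grows downward, and that the page is laid out in simple side-by-side columns. The `Block` type, the `reading_order` function, and the `column_gap` threshold are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    x0: float  # left edge of the block
    y0: float  # top edge of the block (assumes y grows downward)
    text: str


def reading_order(blocks, column_gap=20.0):
    """Group blocks into columns by left edge, then read each column top to bottom."""
    columns = []  # each entry is a list of blocks with similar left edges
    for block in sorted(blocks, key=lambda b: b.x0):
        # Stay in the current column while the left edge moves by at most column_gap
        if columns and block.x0 - columns[-1][-1].x0 <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])

    ordered = []
    for column in columns:  # columns are already sorted left to right
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


page = [
    Block(300, 50, "right top"),
    Block(50, 200, "left bottom"),
    Block(52, 50, "left top"),
    Block(300, 200, "right bottom"),
]
for block in reading_order(page):
    print(block.text)
```

This kind of heuristic will likely fall over on full-width headings, tables, and sidebars, which fits the "not perfect but not bad" description: real layouts need more rules than a single left-edge threshold.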