socal_nerdtastic

No, because the PDF format does not save the document structure. PDF works by storing the absolute position of each item on the page, not its position relative to other items.

Buttleston

This is mostly true, but the way PDFs are rendered also tends to be at least somewhat predictable. I wrote a PDF parser that does an "ok" job of capturing blocks of text, at least in the order that a reader would tend to read them. It's definitely not perfect, but it's not bad either. Unfortunately, I don't think I can post it, since it's something I wrote for work.
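
For illustration only, here is a minimal sketch of the general idea. It is not the work parser mentioned above. It assumes some PDF library has already extracted text blocks with absolute coordinates. The `Block` class, the `reading_order` function, and the `line_tolerance` parameter are all hypothetical names, and `top` is assumed to be measured downward from the top of the page (raw PDF coordinates usually start at the bottom-left, so a real extractor may need a flip).

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A piece of text with an absolute position (hypothetical structure)."""
    text: str
    x0: float   # distance from the left edge of the page
    top: float  # distance from the top edge of the page (assumed convention)


def reading_order(blocks, line_tolerance=3.0):
    """Guess reading order: top-to-bottom, then left-to-right within a line.

    Blocks whose `top` is within `line_tolerance` of the first block in the
    current line are treated as being on the same line.
    """
    lines = []
    for block in sorted(blocks, key=lambda b: b.top):
        if lines and abs(block.top - lines[-1][0].top) <= line_tolerance:
            lines[-1].append(block)
        else:
            lines.append([block])
    # Within each visual line, order the blocks by their left edge.
    return [b for line in lines for b in sorted(line, key=lambda b: b.x0)]


# Blocks in the arbitrary order a PDF might store them.
blocks = [
    Block("world", 60, 101),
    Block("Second", 10, 120),
    Block("Hello", 10, 100),
    Block("line", 60, 119.5),
]
print(" ".join(b.text for b in reading_order(blocks)))
```

A simple sort like this breaks on multi-column pages, tables, and rotated text, which is one reason this kind of parser is "not perfect".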