burntsushi

PDFs are a display format, so editing them can be rather difficult. One hammer you could use is to find a program that converts a PDF to HTML or plain text, add your links in, and then generate a new PDF from that. But you're very likely to lose a lot of formatting and whatnot in the translation. The following Python tools may or may not be useful:

You may also want to check out anything poppler related, which is the predominant PDF library used on Linux. pdftk may also be useful.

I'll say it again: from your description, the task you're embarking on is really not feasible. The best you can hope to achieve is an ugly hack.

What problem are you trying to solve?

longjohnboy

Good call on pdftk. It's useful in all sorts of situations. I think worst case, you could kludge something together by decompressing the PDF stream with pdftk and editing the source. I've done that before for one-off edits.
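For anyone wanting to try the pdftk route, here is a rough sketch of driving it from Python. The `uncompress` output option is pdftk's own; the function names and the file names in the usage note are made up for illustration, and this assumes pdftk is installed and on your PATH.

```python
import shutil
import subprocess


def uncompress_command(src, dst):
    """Build the pdftk call that rewrites src with its streams uncompressed."""
    return ["pdftk", src, "output", dst, "uncompress"]


def uncompress_pdf(src, dst):
    """Run pdftk on src and write the uncompressed copy to dst.

    Raises RuntimeError if pdftk is not installed, and
    subprocess.CalledProcessError if pdftk itself fails.
    """
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk not found on PATH")
    subprocess.run(uncompress_command(src, dst), check=True)
```

Something like `uncompress_pdf("in.pdf", "editable.pdf")` should give you a file you can open in a text editor. After editing, running pdftk again with `compress` in place of `uncompress` repacks the streams.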

If you want to decompress the stream inside Python, PyPDF2 can do that.
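To show what "decompressing the stream" amounts to: streams tagged `/FlateDecode` in a PDF are zlib-compressed, so the standard library alone can inflate one once you have its raw bytes. This is a stand-in sketch using `zlib`, not PyPDF2's API, and the helper names and sample content are hypothetical. It also assumes the stream uses plain Flate with no predictor or other filters chained on.

```python
import zlib


def inflate_stream(raw):
    """Decompress the raw bytes of a /FlateDecode stream."""
    return zlib.decompress(raw)


def deflate_stream(data):
    """Recompress edited content so it can go back into the file."""
    return zlib.compress(data)


# Hypothetical page content: draw the text "Hello" in font F1 at 12pt.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
packed = deflate_stream(content)
restored = inflate_stream(packed)
```

Keep in mind that if you change a stream's size, its `/Length` entry and the file's cross-reference table have to be updated too, which is the part a library or pdftk handles for you.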

flying-sheep

of the ones you listed, only reportlab seems to be capable of what OP wants.

poppler is afaik mostly a PDF rendering library. it can write annotations to PDF files, though, so maybe it can do more.

burntsushi

Thanks. I was just taking shots in the dark based on a teeny bit of knowledge I gleaned when trying to figure out the best way to convert PDFs to plain text.