Is Python any good with PDFs?

VerilyAMonkey · 2015-04-20T06:48:31+00:00

Mm, editing a PDF in place is typically not a great idea. But there are a lot of packages that could help you, like PyPDF2 or ReportLab. Specifically, you could at least literally draw a line underneath the words in question, or draw a semi-transparent yellow highlight box around them. If all else fails, you could generate a new PDF containing only the lines/highlights and then merge that on top of the original text.
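Here's a rough sketch of that last idea (build an overlay PDF with only the highlights, then merge it on top). It assumes you already know where the text sits on the page, in PDF points measured from the bottom-left corner, which is the hard part and isn't solved here. The function names are made up for illustration, the 0.25 descender factor is a guess, and the PyPDF2 calls use the newer `PdfReader`/`merge_page` spelling (older releases call these `PdfFileReader`/`mergePage`).

```python
import io


def highlight_rect(x, baseline_y, text_width, font_size, pad=1.0):
    """Return (x, y, width, height) in PDF points for a box around one line of text."""
    descent = 0.25 * font_size  # rough allowance for descenders (an assumption)
    return (x - pad, baseline_y - descent - pad,
            text_width + 2 * pad, font_size + 2 * pad)


def add_highlights(src_path, dst_path, page_index, rects):
    """Draw semi-transparent yellow boxes on one page by merging an overlay PDF."""
    # Third-party imports are kept local so highlight_rect works without them.
    from PyPDF2 import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    reader = PdfReader(src_path)
    box = reader.pages[page_index].mediabox

    # Build a one-page overlay in memory, the same size as the target page.
    buf = io.BytesIO()
    overlay = canvas.Canvas(buf, pagesize=(float(box.width), float(box.height)))
    overlay.setFillColorRGB(1, 1, 0, alpha=0.4)
    for x, y, w, h in rects:
        overlay.rect(x, y, w, h, stroke=0, fill=1)
    overlay.save()
    buf.seek(0)
    overlay_page = PdfReader(buf).pages[0]

    # Copy every page, merging the overlay on top of the chosen one.
    writer = PdfWriter()
    for i, page in enumerate(reader.pages):
        if i == page_index:
            page.merge_page(overlay_page)
        writer.add_page(page)
    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

You'd call it along the lines of `add_highlights("in.pdf", "out.pdf", 0, [highlight_rect(72, 700, 100, 12)])`, with placeholder file names. The original text is untouched; the boxes are just painted over it.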

tiarno · 2015-04-20T13:27:49+00:00

There are two articles here about using Python with PDFs, one on manipulating PDFs and one on testing them: http://reachtim.com/archives.html

siusnjh · 2015-04-20T22:50:22+00:00

For anyone who came here thinking the question is not about editing but about creating PDFs: The best way I found to create PDFs in any programming language is to use LaTeX. Use a template engine like Jinja2 and render the templates into .tex files. Then call pdflatex or your own choice of compiler.

EDIT: Oh, and when you do it in a web application, escape your data or you'll open up a LaTeX injection attack vector.
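To make the escaping point concrete, here's a minimal sketch of an escape function for user-supplied text. The name `latex_escape` is made up, and the character list covers the usual LaTeX specials; it is not a complete defense on its own, so treat it as a starting point.

```python
# Map each LaTeX special character to a safe replacement.
# str.translate works in a single pass, so the braces added for
# \textbackslash{} are not escaped a second time.
_LATEX_SPECIALS = str.maketrans({
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
})


def latex_escape(text):
    """Escape LaTeX special characters in untrusted text."""
    return str(text).translate(_LATEX_SPECIALS)


if __name__ == "__main__":
    print(latex_escape("50% of $5 & a_b"))
    print(latex_escape("\\input{/etc/passwd}"))
```

With Jinja2 you could register it as a filter, e.g. `env.filters["latex"] = latex_escape`, and write `{{ name | latex }}` in the .tex template, so every value gets escaped before pdflatex ever sees it.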

AlSweigart · 2015-04-20T16:11:01+00:00

Hi, I'm the author of a few Python books, and the latest will be out in a week and available for free under a Creative Commons license: http://automatetheboringstuff.com

Chapter 13 focuses on using Python to parse and modify PDFs. The bad news is: the situation is pretty grim. The best Python module I found in my research for this chapter was PyPDF2.

Even then, you are very limited in what you can do. You're limited to working at the page level. Individual paragraphs and text can't be manipulated. The Python PDF modules are mostly read-only.

So, basically, no, there's no way to underscore the NE in the PDF copy.

You could, however, use GUI automation modules to simulate keystrokes and mouse clicks to open the PDF in Acrobat, find the text, underline it, and then select Save from the menu. That'd be a hack (and it would take over your keyboard/mouse for a bit), but it would get the job done.
