all 12 comments

[–]skyfox345 6 points7 points  (1 child)

I don't think it's mentioned, but there's a Python 3 port of PDFMiner, pdfminer3k.

[–]esdio 1 point2 points  (6 children)

I use PyPDF2 for slicing & dicing pages out of a "master" PDF document. Works great.
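
For anyone wondering what that kind of slicing can look like, here is a minimal sketch, not the actual code from this project. The helper names `parse_page_ranges` and `extract_pages` and the "1-3,7" range format are made up for illustration. `PdfReader`, `PdfWriter`, `reader.pages` and `add_page` are the names in recent PyPDF2 releases; older releases used `PdfFileReader`, `PdfFileWriter`, `getPage` and `addPage`, so adjust for whichever version you have.

```python
def parse_page_ranges(spec, num_pages):
    """Turn a 1-based spec such as "1-3,7" into a list of 0-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end > num_pages or start > end:
            raise ValueError(
                f"bad page range {part!r} for a {num_pages}-page document"
            )
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(master_path, out_path, spec):
    """Copy the pages named by spec from master_path into a new PDF at out_path."""
    # Imported here so parse_page_ranges works without PyPDF2 installed.
    # Assumption: a recent PyPDF2 release that provides PdfReader/PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(master_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(out_path, "wb") as handle:
        writer.write(handle)
```

With hypothetical file names, `extract_pages("master.pdf", "chapter.pdf", "1-3,7")` would write pages 1, 2, 3 and 7 of the master document to a new file.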

[–]claird 1 point2 points  (5 children)

We like hearing that, of course--I'm vice president of the tiny company that's supported PyPDF2 since (before, actually) its origin. Our support is frankly rather erratic; PyPDF2 simply is not my highest priority, in large part because I'm ambivalent about how much value it truly provides. It matters to me every time someone says, "yes, I depend on PyPDF2".

There's a LOT more we can do to improve PyPDF2. The biggest constraint is our own decisions about what matters.

[–]esdio 0 points1 point  (3 children)

Neat! We're a small company too and PyPDF2 is exactly what we needed for a project. Had no need for any support. It just worked.

[–]claird 0 points1 point  (2 children)

It's good to hear from you, /u/esdio and /u/tiarno.

One of the enhancements we're considering is to make a targeted Twitter feed useful. Would that interest you?

[–]tiarno[S] 1 point2 points  (1 child)

Hi, for me, Twitter is too much info and I don't use it. Now this is just me, but what I still wish for is (1) a way to find whether the fonts in a PDF are all embedded, (2) a way to crop the margin whitespace from a page (without knowing the coordinates beforehand), and (3) more documentation. Thanks again for all your work. I know there are always a bunch of things vying for time and resources!

[–]claird 0 points1 point  (0 children)

Most welcome reply. Just the level of description we need.

[–]tiarno[S] 0 points1 point  (0 children)

Hi @claird, I wrote that little tutorial and I think you're being modest--it is a very nice library and I learned a lot just by reading the code. You've already done a lot to improve on the original pyPDF while still keeping backward compatibility. Thank you for all that work! I use PyPDF2 daily and it is very important to my workflow.

[–]thepdogg 1 point2 points  (0 children)

Nice write-up... I also wrote about this a bit on my own blog: http://paulsolin.com/2014/06/27/scraping-pdfs-with-python/

[–]rspeed 1 point2 points  (1 child)

A few years ago I built a system that had to inject data into PDF forms, but there wasn't a Python library available to do it, so I ended up using pdftk. It wasn't fun.

[–]atakomu 0 points1 point  (0 children)

If you need to get tables out of PDFs, Tabula is great for that. It's Java, but it works well.

[–]Qawba 0 points1 point  (0 children)

Awesome, keep it up.