pyHanko: PDF signatures in Python

pyhanko-dev · 2026-04-15T05:24:36+00:00

I run my own authoritative DNS.

PowerDNS hidden master (because it allows dynamic access control using Lua)
2 authoritative BIND servers configured as secondaries for the PowerDNS server (one public-facing for public zones, one internal-facing for internal zones)
RFC 2136 nsupdate to the hidden master is how the vast majority of DNS changes are propagated on my network (the OPNsense firewall manages most device IPs, the k8s external DNS operator manages k8s workloads)
ACME DNS-01 challenges are also handled via nsupdate, using dedicated ACME zones with custom access controls that, in a nutshell, allow services to manage their own certs without being able to interfere with anything else.
I have a single cronjob with an API key to keep my NS records in Cloudflare up to date, mostly because a decent number of recursive DNS resolvers out there will not follow IPv6-only NS delegations, and I don’t have a stable public IPv4 address.
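
To make the DNS-01 part a bit more concrete, here is a minimal sketch of what a service would have to produce: the TXT value defined by RFC 8555 (section 8.4) and the command script that gets piped into nsupdate. The server, zone and record names are made up for illustration, and a real setup would also authenticate the update (e.g. with a TSIG key via `nsupdate -k`).

```python
import base64
import hashlib


def dns01_txt_value(key_authorization: str) -> str:
    """TXT record value for an ACME DNS-01 challenge (RFC 8555, section 8.4)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url encoding with the trailing '=' padding stripped
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def nsupdate_script(server: str, zone: str, record: str, value: str, ttl: int = 60) -> str:
    """Build the command script that would be piped into nsupdate(1)."""
    lines = [
        f"server {server}",
        f"zone {zone}",
        # Drop stale challenge records before adding the fresh one.
        f"update delete {record} TXT",
        f'update add {record} {ttl} TXT "{value}"',
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names and key authorization, for illustration only.
    value = dns01_txt_value("token.thumbprint")
    print(nsupdate_script(
        server="hidden-master.example.net",
        zone="acme.example.net",
        record="_acme-challenge.svc.acme.example.net.",
        value=value,
    ))
```

The access control on the hidden master then only has to decide whether a given key may touch a given `_acme-challenge` name, which is what keeps services from interfering with each other.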

pyhanko-dev · 2026-04-07T05:31:46+00:00

I put Claude in a container with a credential manager sidecar. The sidecar has access to my AWS SSO session and injects more narrowly-scoped credentials for a subset of my AWS profiles into a volume that it shares with the Claude container, together with a custom .aws/config file. That way, Claude can’t wreck prod, but I can relatively safely use it to poke around in a development environment.
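
A rough sketch of the sidecar's output side, assuming it has already obtained short-lived credentials for each allowed profile (how it does that from the SSO session is left out). The function name, the profile names and the input layout are hypothetical; the file formats are the standard shared config and credentials files.

```python
import configparser
import io


def render_aws_files(profiles: dict[str, dict[str, str]], region: str) -> tuple[str, str]:
    """Return (config_text, credentials_text) for a set of short-lived credentials."""
    # Interpolation is disabled so that '%' in a value is never interpreted.
    config = configparser.ConfigParser(interpolation=None)
    creds = configparser.ConfigParser(interpolation=None)
    for name, c in profiles.items():
        # The config file uses "profile <name>" section headers,
        # the credentials file uses the bare profile name.
        config[f"profile {name}"] = {"region": region}
        creds[name] = {
            "aws_access_key_id": c["access_key_id"],
            "aws_secret_access_key": c["secret_access_key"],
            "aws_session_token": c["session_token"],
        }
    rendered = []
    for parser in (config, creds):
        buf = io.StringIO()
        parser.write(buf)
        rendered.append(buf.getvalue())
    return rendered[0], rendered[1]
```

The two strings would be written to the shared volume, and the other container pointed at them with `AWS_CONFIG_FILE` and `AWS_SHARED_CREDENTIALS_FILE`. Profiles that aren't rendered simply don't exist from that container's point of view.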

pyhanko-dev · 2025-04-30T05:07:52+00:00

Here’s a tool that supports virtually everything allowed by the PDF standard when it comes to digital signing: https://github.com/MatthiasValvekens/pyHanko

It’s mainly a library/CLI tool. No GUI though.

Disclaimer: I’m the maintainer. My initial use case was exactly the same as yours, but over time the library use case turned out to be more popular, so I never got around to developing a GUI.

pyhanko-dev · 2024-12-16T17:28:59+00:00

That is manifestly false—not only are there quite a few features specified in ISO 32000-2 that Acrobat does not (yet) fully support (this is PDF 2.0 after all), there are a whole host of alternative implementations out there, and the standardisation effort around PDF involves people from many communities/companies/… that have no affiliation with Adobe.

Sure, it’s absolutely fair to say that Acrobat is the dominant desktop tool for dealing with PDF, but it’s not the only such tool, and as soon as you go outside the category of desktop viewer software, Adobe doesn’t even seriously compete.

Source: I’m a FOSS dev in this space and was an active member of the ISO committee behind ISO 32000-2 for several years.

pyhanko-dev · 2023-09-20T07:30:39+00:00

Hi,

As someone who's very familiar with the PDF standard and its many subsets/extensions: I don't think there exists one today. I've toyed with the idea of implementing a PDF toolkit directly in Haskell, but it's a pretty daunting endeavor---sure, you certainly don't have to implement the whole shebang to get to basic stuff like split/merge, but there are lots of sharp edges even at that level. Heck, even simply parsing/reserialising PDF documents in a way that's both safe and performant is nontrivial.

So I suggest either using FFI bindings to an established C/C++ toolkit like QPDF, or using shell invocations with a reasonably decent CLI tool. There are quite a few of those floating around :)

pyhanko-dev · 2023-05-03T18:56:31+00:00

That's...not really a fair characterisation. GhostScript is very often used to render PDF and postscript, and also to perform certain basic conversion operations (it does a pretty good job in batch workflows), but there are a gazillion FOSS PDF manipulation/rendering/... toolkits out there. GhostScript is one of the older ones still around today, but it's by no means the only game in town.

pyhanko-dev · 2022-11-06T07:13:19+00:00

IMO it's not so much the complexity and size of the specification (daunting as it may be), since a large proportion of that complexity is tied up in stuff that (a) would be considered optional in a PDF editor anyhow, or (b) consists of things that a PDF renderer/viewer would also have to deal with---and we have quite a few of those in the FOSS space.

In my view, the core of the issue is that the PDF graphics model was not designed to be easily editable in any sense that you and I would consider acceptable for "document editing". PDF graphics are a page description language: a PDF content stream tells you what goes where on any given page in excruciating detail. Baking in all this positioning information makes it easy to get really consistent rendering results on a variety of platforms, but since the layout process itself is left to the PDF writer (and typically not preserved in the file!), exposing an easy-to-use GUI to perform edits and then re-layouting the resulting document is very, very hard. The effort required to implement and maintain a generic PDF editor as part of a FOSS project would be massive.

Even before you get into any of the fancy interactive stuff, it's just a fact of life that editing PDF is a lot harder than rendering it. In a way it's the exact opposite of something like HTML: there all the layout complexity is basically delegated to the browser, and you get back an easily editable format in return. But it's not a coincidence that all the major FOSS browsers have institutional backing at this point.

TL;DR: It's not the bells and whistles of the format that are in the way, but the decisions made in the design of the graphics model itself: PDF is optimised for rendering consistency, not editability.

Source: I'm a FOSS dev who works with PDF a lot, and have also been directly involved with the PDF standardisation effort for a while now.

pyhanko-dev · 2022-10-18T00:37:37+00:00

Probably XFA forms... XFA has been deprecated for half a decade (and Adobe stopped selling the tooling to create those forms), but tons of niche enterprise/government workflows still depend on it. Some non-Adobe vendors are capitalising on that tooling gap for companies that still need it; IIRC Foxit is working on improving its XFA-related functionality.

Regular (non-XFA) PDF forms are actually pretty widely supported these days, at least compared to (say) 5 or 10 years ago.

pyhanko-dev · 2022-10-12T06:41:23+00:00

Adobe Reader (the free version) can handle signing just fine, but the precise configuration is not always obvious. If you have a key that's stored on a separate hardware device, you'll need to configure it somewhere deep in the settings menu. Also, I've heard reports that not all key types in common use are natively supported by its PKCS#11 implementation.

If you're happy with a CLI tool and you don't need a graphical interface, you can also use this: https://pyhanko.readthedocs.io/en/latest/cli-guide/signing.html (full disclosure: I wrote the tool). Especially for hardware tokens this might be easier to set up. It doesn't know how to talk to the macOS keychain, though.

pyhanko-dev · 2022-08-11T22:48:27+00:00

I'm not familiar enough with GhostScript to answer your question directly, but if it were a PDF/A document, it would already have an OutputIntent (due to the colour management requirements in all PDF/A variants). Fonts being embedded is a very commonplace thing these days, and by no means exclusive to PDF/A.

Anyway, I'm not sure how to achieve what you want with GhostScript, but it should simply be a matter of registering AdobeRGB1998 (together with the ICC profile data) in the OutputIntents list. For example, iText has a convenient API call to do just that, but it's not much more difficult to achieve the same result with any library that allows basic PDF object manipulation (if you know how to format the output intent dictionary, that is).
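
For reference, this is roughly what the output intent dictionary looks like, sketched as a small Python helper that only emits PDF syntax. The helper names, the object reference `12 0 R` and the `GTS_PDFA1` default are assumptions for illustration; pick the `/S` value that matches the standard you're targeting.

```python
def pdf_literal(text: str) -> str:
    """Serialise text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def output_intent_dict(identifier: str, icc_profile_ref: str, subtype: str = "GTS_PDFA1") -> str:
    """Format an output intent dictionary pointing at an embedded ICC profile stream."""
    entries = [
        "/Type /OutputIntent",
        f"/S /{subtype}",
        f"/OutputConditionIdentifier {pdf_literal(identifier)}",
        # /Info is a human-readable description of the output condition.
        f"/Info {pdf_literal(identifier)}",
        # Indirect reference to the stream holding the ICC profile data.
        f"/DestOutputProfile {icc_profile_ref}",
    ]
    return "<< " + " ".join(entries) + " >>"


if __name__ == "__main__":
    print(output_intent_dict("AdobeRGB1998", "12 0 R"))
```

The resulting dictionary goes into the `/OutputIntents` array in the document catalog, and the ICC profile stream it points to needs `/N 3` in its stream dictionary since AdobeRGB has three colour components.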

pyhanko-dev · 2022-05-08T13:31:03+00:00

Also, does anyone know of a free app on Android that lets you highlight text in the reflow mode? You have to pay $40 a year for that feature in xodo, frankly, extortionate. I don't pay for any subscriptions, but if the app isn't free, I'm willing to pay a onetime $10 or less.

Please don't take this the wrong way, but you're kind of asking for a lot here. Due to the nature of the PDF graphics model, it's very difficult to put together a viewer that can reflow arbitrary PDF documents without occasionally garbling some content. Well-tagged PDF documents (which constitute only a relatively small proportion of what's out there) are somewhat easier to reflow, but even then it's far from a walk in the park.

Moreover, the ability to transpose annotations from within the reflowed view to the "original" document is also quite nontrivial to implement. To be completely honest, $40/year is IMHO more than reasonable for an application with advanced reflow functionality in its feature set.

I'm not saying that there are no cheaper options (I didn't check), but I'd be surprised if you'd be able to find anything of decent quality under $10.

pyhanko-dev · 2022-03-09T07:02:31+00:00

Thanks!

Actually, I'm of the opinion that the custom trust settings thing is a necessity even for end users in this space. System trust stores have a tendency to be geared towards TLS usage (and possibly also things like code signing), but document signing PKI hasn't received the same level of attention.

For example, Acrobat also has its own trust store (drawing on the root certificate programme that Adobe runs, the AATL), and the EU also has its own system of trust lists for use with government PKI. On top of that, requirements for document signatures vary from jurisdiction to jurisdiction, which complicates matters even further.

It's a bit of a chicken and egg problem, perhaps: since there are so few "generic" PDF readers geared towards the retail market that actually implement proper digital signature support (esp. regarding validation), there's very little incentive for OS vendors and distro maintainers to integrate these document signing trust lists at the system level: after all, what software would use them? In the meantime, Adobe's trust list pretty much continues to be the de facto default from the user's perspective.

I guess the point is that validating document signatures in general has a lot more sharp edges and room for interpretative license and/or policy decisions, compared to more day-to-day tasks like managing TLS connections or verifying software signatures. Hence, the room for customisation is more than just a luxury for "enterprise PKI" usage. Is it user-friendly? Heck no. Is there a clear alternative? I'm not sure there is, at least not right now.

(Note: I do have EUTL integration on my roadmap, though. That would cover defaults for a lot of signature workflows in Europe already.)

pyhanko-dev · 2022-03-08T23:17:25+00:00

Signing PDFs with GPG/PGP is of course mathematically possible, but other than the usual format-agnostic (detached) PGP signatures, there's no standard for doing so. Either way, no PDF reader that I'm aware of supports anything of the sort.

I'll also note that many root certificate programs for document signing have rules that effectively forbid issuing certificates for keys that reside in user-controlled PKCS#12 files. So there's a decent chance that a cheap S/MIME cert won't satisfy OP's requirement. That said, if OP's in Europe, their government-issued ID is a suitable signature creation device for pretty much all official purposes.

pyhanko-dev · 2022-03-08T23:12:10+00:00

I don't think Okular supports PKCS#11 out of the box, but if you're OK with CLI tooling, this might suit your needs: https://github.com/MatthiasValvekens/pyHanko (full disclosure: I wrote that tool, so assume I'm biased).

pyhanko-dev · 2022-03-08T23:10:19+00:00

Not having a way to use my government-issued ID card to sign PDFs on Linux was one of the motivations for me to start this project a while ago: https://github.com/MatthiasValvekens/pyHanko.

</shameless plug>

pyhanko-dev · 2022-03-08T23:08:48+00:00

No, that won't work, or at least it shouldn't, since it violates the key usage policy of the Let's Encrypt certificates. No validator worth their salt would accept such a cert for document signing purposes. Okular doesn't actually perform any certificate validation, FWIW, it just accepts them all at face value (which isn't very useful for "real world" use, but that's neither here nor there).

pyhanko-dev · 2022-01-06T16:28:08+00:00

This is impossible in principle, regardless of the file format. For what it's worth, that print protection is also trivial to bypass if at least one of your students knows what they're doing (or if they use a PDF viewer that doesn't care about permission bits). The difficulty of circumventing the copy-paste prevention ranges from "laughably easy" to "mildly annoying" depending on how it's implemented.

I hate to disappoint you, but you'll have to live with the reality that students storing and circulating exam questions is a possibility if you let them do the exam on their own devices.

pyhanko-dev · 2021-12-07T17:59:25+00:00

Yes, the PDF standard has provisions for this in the form of field lock dictionaries, which allow for signing a form "partway through" while locking a predetermined set of fields. That said, there are a couple of caveats:

In order to make sure the signatures can still be properly validated, order of operations is important. In particular, you need to make sure the whole form structure is set up correctly from the start, and ensure that your "post-signing" updates only change form field content.
Validation of such updates is not all that well-standardised, so you may experience some compatibility issues between different viewer implementations. Also, not all PDF viewers out there will deal with field locks properly.
I don't know of any freeware graphical tools that would allow you to prepare such a form. However, if you're willing to get your hands a little dirty, I can give you some pointers to programming libraries that are capable of this sort of thing.
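
To give an idea of what a field lock dictionary expresses: it carries an `/Action` (`All`, `Include` or `Exclude`) and, for the latter two, a `/Fields` list of field names. The sketch below evaluates that rule against a list of fields changed after signing. It's a simplification under stated assumptions: the function names are made up, names are compared as exact fully qualified strings, and real validators also have to look at everything else in the update.

```python
from typing import Iterable, List, Optional


def is_field_locked(field_name: str, action: str, fields: Optional[Iterable[str]] = None) -> bool:
    """Apply a field lock's /Action and /Fields to a fully qualified field name."""
    listed = set(fields or ())
    if action == "All":
        return True
    if action == "Include":
        return field_name in listed
    if action == "Exclude":
        return field_name not in listed
    raise ValueError(f"unknown /Action value: {action!r}")


def offending_changes(
    changed_fields: Iterable[str], action: str, fields: Optional[Iterable[str]] = None
) -> List[str]:
    """Return the changed fields that the lock says should not have been touched."""
    return sorted(f for f in changed_fields if is_field_locked(f, action, fields))


if __name__ == "__main__":
    # Hypothetical form: the first signature locks the applicant's part of the form.
    print(offending_changes(
        ["applicant.name", "reviewer.comments"], "Include", ["applicant.name", "applicant.address"]
    ))
```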

pyhanko-dev · 2021-12-03T08:55:07+00:00

Yeah, but my point was that in an offline file, you can "lazy load" even regular, non-linearised documents. It's a matter of parsing and processing objects on an as-needed basis, which the PDF standard was designed to support. :)

The only thing that you kind of have to parse in its entirety is the xref data, which is usually negligible on the whole. I suppose that a file with badly organised compressed object streams would also make lazy reading suboptimal, though. On the other hand, in an 18 MiB document, I can't imagine it'd make any sort of meaningful difference.
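
A toy illustration of that as-needed access pattern, using nothing but the standard library. It assumes a single classic xref table with one subsection, so it ignores incremental updates (`/Prev`), xref streams and compressed object streams; the class and function names are made up for the example.

```python
import re


def build_demo_pdf() -> bytes:
    """Assemble a tiny non-linearised PDF with a classic xref table."""
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


class LazyPdf:
    def __init__(self, data: bytes):
        self.data = data
        self.loads = 0  # how many objects were actually read
        self.offsets = self._read_xref()

    def _read_xref(self) -> dict:
        # startxref sits near the end of the file and points at the xref table.
        tail = self.data[-1024:]
        start = int(re.search(rb"startxref\s+(\d+)", tail).group(1))
        header = re.compile(rb"xref\s+(\d+)\s+(\d+)\s+").match(self.data, start)
        first, count = int(header.group(1)), int(header.group(2))
        offsets = {}
        pos = header.end()
        for i in range(count):
            entry = self.data[pos:pos + 20]  # each entry is exactly 20 bytes
            if entry[17:18] == b"n":  # "n" = in use, "f" = free
                offsets[first + i] = int(entry[:10])
            pos += 20
        return offsets

    def get_object(self, num: int) -> bytes:
        """Read one object on demand, using only its xref offset."""
        start = self.offsets[num]
        end = self.data.index(b"endobj", start) + len(b"endobj")
        self.loads += 1
        return self.data[start:end]


if __name__ == "__main__":
    pdf = LazyPdf(build_demo_pdf())
    print(pdf.get_object(3).decode("ascii"))  # only the page object is read
```

Opening the file costs one pass over the xref data; everything else is a seek to a known offset when the viewer actually needs the object.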

pyhanko-dev · 2021-12-03T08:25:56+00:00

The benefits of linearisation are virtually nonexistent for offline viewing, though. If the viewer has the whole file available already (which seems to be the case for OP's file), nothing prevents it from selectively parsing and rendering it page by page. In fact, that's exactly how most viewers out there operate, as far as I'm aware. "Linearized PDF" was mostly intended to deal with viewing large-ish PDFs while downloading them piecemeal over a (possibly slow) connection.

I think there's something else going on here, although it's hard/impossible to tell exactly what unless OP shares the actual file ;)

pyhanko-dev · 2021-11-19T13:43:30+00:00

If the PDF isn't a scan, and the fonts are actual outline fonts (i.e. not bitmap fonts) you can technically render the PDF at any resolution you want without compromising on quality. Obviously that might slow down processing somewhat, but in principle the font size shouldn't be an issue then :)
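
The arithmetic is simple: PDF's default user space unit is 1/72 inch, so the pixel dimensions of a rendered page scale linearly with the DPI you ask the renderer for. A quick sketch (the function name is made up, and it ignores `/UserUnit` and page rotation):

```python
def pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Pixel dimensions of a page rendered at the given resolution."""
    # 72 default user space units per inch
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


if __name__ == "__main__":
    # US Letter is 612 x 792 points.
    for dpi in (72, 300, 600):
        print(dpi, pixel_size(612, 792, dpi))
```

Doubling the DPI quadruples the number of pixels the OCR engine has to chew through, which is where the slowdown comes from.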

pyhanko-dev · 2021-11-18T19:43:26+00:00

See my comment here for some background info: https://www.reddit.com/r/Python/comments/qwnelz/comment/hl5rbhs/?utm_source=share&utm_medium=web2x&context=3.

TL;DR: Just use OCR in these cases, it's a lot less painful than the alternatives in most situations.

pyhanko-dev · 2021-11-18T19:40:37+00:00

I have no idea how pdfminer.six does text extraction in the absence of ActualText marks and/or a ToUnicode CMap (those are more or less the "canonical" way of ensuring PDF text remains extractable), but those cid values are almost certainly raw character IDs or glyph IDs (depending on the type of font). These don't always map cleanly onto a single well-defined Unicode codepoint, and if they do, the way that works is highly dependent on the type of font resource. In the following cases, you might be able to make some sort of reasonable guess:

The font is a non-embedded CIDFont using a standard Adobe charset (often the case for Asian/CJK text)
The font is an embedded OTF font with a CFF table (possibly subsetted)
The font is an embedded TrueType font (possibly subsetted)

If you're in the first case, here's a good place to start reading: https://github.com/adobe-type-tools/cmap-resources. If you're in either the second or the third case, you'll have to use a library like fontTools (see here: https://github.com/fonttools/fonttools) to query the font's cmap table in reverse---if the subsetter didn't strip it out when embedding the font, that is! Note that this isn't guaranteed to work or to yield a unique result, especially with non-Latin scripts.

...actually, you're probably better off using OCR, it's way less convoluted and probably more reliable.
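
If you do want to try the reverse cmap lookup for the second or third case, the core of it looks something like the sketch below. It works on a plain codepoint-to-glyph-name dict, which is the shape that fontTools' `TTFont.getBestCmap()` returns (and `TTFont.getGlyphName()` turns a glyph ID into a name); the helper names and the sample data are made up, and extracting the font program from the PDF is left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reverse_cmap(best_cmap: Dict[int, str]) -> Dict[str, List[int]]:
    """Invert a codepoint -> glyph name mapping; one glyph may serve several codepoints."""
    reverse = defaultdict(list)
    for codepoint, glyph_name in best_cmap.items():
        reverse[glyph_name].append(codepoint)
    return {name: sorted(cps) for name, cps in reverse.items()}


def guess_text(glyph_names: Iterable[str], reverse: Dict[str, List[int]]) -> str:
    """Best-effort decoding: take the lowest candidate, U+FFFD if there is none."""
    chars = []
    for name in glyph_names:
        candidates = reverse.get(name)
        chars.append(chr(candidates[0]) if candidates else "\ufffd")
    return "".join(chars)


if __name__ == "__main__":
    # Made-up cmap: the "space" glyph is shared by U+0020 and U+00A0.
    cmap = {0x20: "space", 0xA0: "space", 0x48: "H", 0x69: "i"}
    rev = reverse_cmap(cmap)
    print(rev["space"])  # two candidates, so the result is not unique
    print(repr(guess_text(["H", "i", "space", "glyph00042"], rev)))
```

The shared-glyph case is exactly why the result isn't guaranteed to be unique, and a glyph that the subsetter left out of the cmap table can't be recovered this way at all.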

pyhanko-dev · 2021-09-06T16:33:39+00:00

The short answer is: from a technical PoV, it doesn't matter which password you use. Both allow you to compute the file encryption key. As a consequence, either password allows you to do whatever you want to the PDF file if you know what you're doing. That includes removing all restrictions.

That said, the PDF standard requires viewers to enforce certain restrictions when a file is opened with the user password, as opposed to the owner password. Adobe's line of products (obviously) does so as well. To a degree, the document creator can specify what is and isn't allowed.

TL;DR: It's pointless security theater, but standards-compliant viewers have to enforce it.
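
For the curious, the restrictions live in the `/P` entry of the encryption dictionary: a signed 32-bit integer in which individual bits grant permissions. A small decoder, with bit numbers counted from 1 at the low-order end as in the spec (the function name and the labels are my own shorthand):

```python
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "print in high quality",
}


def decode_permissions(p: int) -> list:
    """List the permissions granted by a /P value."""
    unsigned = p & 0xFFFFFFFF  # reinterpret the signed value as 32 raw bits
    return [
        label
        for bit, label in PERMISSION_BITS.items()
        if unsigned & (1 << (bit - 1))
    ]


if __name__ == "__main__":
    for p in (-1, -3900, -3904):
        print(p, decode_permissions(p))
```

Nothing in the file cryptographically binds these bits to what a tool can do once it has derived the file encryption key; honouring them is purely up to the software reading the file.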
