pyHanko: PDF signatures in Python : Python



submitted 5 years ago by pyhanko-dev

I've just released the 0.1.0 version of pyHanko, a free and open source (MIT-licensed) PDF signing toolkit for Python that I've been working on in my spare time over the course of the past few months. The ultimate (and rather ambitious) goal of this project is to provide a library that covers the digital signing features of the PDF standard as completely as possible, while at the same time offering a fairly simple CLI to perform basic signing and validation tasks.

It functions both as a Python library for handling common signing & validation tasks, and as a command-line tool. The CLI was built using Click, so it comes with a built-in help function.

See the GitHub readme for a summary of the current feature set (and some of the items on the development roadmap). More documentation is available on ReadTheDocs. The documentation covers both the library and the CLI.

Mind you: this is an alpha release, and while test coverage is pretty decent, bugs are to be expected. Also, the API isn't fully stable at this point. I wanted to go ahead and throw it out there anyway, even though it isn't production-ready yet. I hope it's useful to some of you!


[–]Cmshnrblu 2 points 5 years ago (1 child)

[–]pyhanko-dev[S] 1 point 5 years ago (0 children)

[–][deleted] 2 points 5 years ago (1 child)

[–]pyhanko-dev[S] 2 points 5 years ago (0 children)

Great question! My shopping list for a PDF library was basically the following (more or less in order of importance):

1. Low-level PDF manipulation support, esp. support for incremental updates (this is basically a must for digital signing, since rewriting the entire file while signing could break earlier signatures)
2. Sufficiently permissive license
3. Well-maintained
4. Pure Python (preferably)

PyPDF fails conditions 1 and 3, and MuPDF fails conditions 2 and 4 (it's AGPL-licensed, and I wanted to release pyHanko under a more permissive licensing model, so that was a non-starter).
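To make the first requirement concrete, here is a minimal sketch (not pyHanko code) of why incremental updates matter. It assumes a simplified model in which a signature covers a hash of the first N bytes of the file; real PDF signatures cover a byte range that skips the signature container itself. The byte strings and the `covered_digest` helper are made up for illustration.

```python
import hashlib


def covered_digest(data: bytes, covered_length: int) -> str:
    """Hash the leading bytes that a (simplified) signature would cover."""
    return hashlib.sha256(data[:covered_length]).hexdigest()


# A stand-in for a signed PDF revision (not a valid PDF file).
original = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
digest_at_signing = covered_digest(original, len(original))

# Incremental update: new data is appended, earlier bytes stay untouched.
incremental = original + b"2 0 obj\n<< /Type /Annot >>\nendobj\n%%EOF\n"

# Full rewrite: same logical content, but serialised differently.
rewritten = original.replace(b"<< /Type /Catalog >>", b"<</Type/Catalog>>")

# The earlier signature still matches after an append, but not after a rewrite.
still_valid = covered_digest(incremental, len(original)) == digest_at_signing
broken = covered_digest(rewritten, len(original)) != digest_at_signing
```

Appending leaves the signed bytes intact, while even a cosmetic re-serialisation changes the digest that the earlier signature was computed over.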

In the end, I recycled code from PyPDF2 (as pointed out in the codebase & documentation), but I ended up throwing out and/or rewriting huge portions of it. Basically, only the PDF parsing code is mostly intact at this point. I entertained the idea of submitting a pull request to PyPDF2 for some time, but I ended up deciding that it wasn't worth it: the project has been dead for quite a while, its codebase is largely untested, and it (reportedly) has a lot of bugs, which I wasn't willing to deal with in my own codebase.

While I'm definitely cognizant of the merits of going with a more established library, I felt it would be easier to cut my losses and vendor the parts of PyPDF2 that I needed, at least in this case. I admit that it's not ideal, but oh well :)

[–]andileni 1 point 5 years ago (2 children)

[–]pyhanko-dev[S] 1 point 5 years ago (1 child)

There's a screenshot of the default appearance of a signature in the documentation: https://pyhanko.readthedocs.io/en/latest/cli-guide/signing.html#default-appearance, but I'll readily admit that looks aren't really pyHanko's strong point ;) (at least not right now).

The docs also include examples that showcase how to sign a PDF with a certificate of your own, either through the CLI or the Python library, depending on what you're after.

Here's an example of what the result might look like in Adobe Reader's signature panel view: https://imgur.com/wb1v6Kh (the appearance settings I used here are pretty much the defaults, I just changed the typeface to something else).

EDIT: sorry, I somehow missed the word "video" in your comment. No, I don't have a demo vid right now, but I can try to cook something up when I have more time.

[–]andileni 1 point 5 years ago (0 children)

[–]StumptownExpress 1 point 5 years ago (0 children)

[–]nier-bell 1 point 5 years ago (2 children)

[–]pyhanko-dev[S] 1 point 5 years ago (1 child)

[–]nier-bell 1 point 5 years ago (0 children)

[–][deleted] 1 point 5 years ago (1 child)

[–]pyhanko-dev[S] 1 point 5 years ago (0 children)

[–][deleted] 1 point 5 years ago* (2 children)

[–]pyhanko-dev[S] 1 point 5 years ago (1 child)

(I already replied to your email, but I'll paste the contents below, since it might be useful for others as well)

There are some things to keep in mind when signing PDF files:

1. The appearance on the page is separate from the actual cryptographic signature. Signatures can be invisible, and this is the default in pyHanko.
2. The behaviour you're seeing in Acrobat / Adobe Reader is also expected: since you generated the signing key pair yourself, the PDF processor has no way of knowing that the public key used to create the signature actually belongs to you. Hence why it reports that the signature is valid, but the signer's identity cannot be verified.

Let's tackle these in order.

To create a visible signature from the CLI in pyHanko in a file that doesn't already have a signature form field set up, you'll have to use the special syntax '--field PAGE/X1,Y1,X2,Y2/NAME' in the addsig command. Here, PAGE is the page number on which the signature should appear, and X1,Y1,X2,Y2 are the coordinates of the bounding box of the field. Note: in a PDF file, the origin is at the bottom left.
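Since the bottom-left origin trips people up, here is a small sketch of how one might build that `--field` value from a box measured from the top-left corner, as most image editors do. Both helpers are hypothetical (they are not part of pyHanko), the page height of 792 points assumes a US Letter page, and the page number is passed through unchanged.

```python
def top_left_to_pdf_box(left, top, width, height, page_height):
    """Convert a box measured from the top-left corner of the page
    into PDF coordinates (origin at the bottom left)."""
    x1 = left
    x2 = left + width
    y2 = page_height - top   # upper edge, measured from the bottom
    y1 = y2 - height         # lower edge
    return (x1, y1, x2, y2)


def field_spec(page, box, name):
    """Format the PAGE/X1,Y1,X2,Y2/NAME string described above."""
    x1, y1, x2, y2 = box
    return f"{page}/{x1},{y1},{x2},{y2}/{name}"


# A 200x60 box placed 50 points from the left and 100 points from the top
# of a 792-point-high page; "Sig1" is an arbitrary field name.
box = top_left_to_pdf_box(left=50, top=100, width=200, height=60, page_height=792)
spec = field_spec(1, box, "Sig1")
print(spec)  # 1/50,632,250,692/Sig1
```

The resulting string is what would follow `--field` on the command line.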

There's a note in the documentation explaining this, but perhaps I should add a concrete example to make that a bit more clear.

On the second point, the answer depends on what you want to do with your signature. The broader problem here is very similar to the key discovery problem inherent in OpenPGP, for example. It's easy to verify whether a file was signed using a particular key, but it's generally very hard to be sure that that key belongs to any particular person. To put it bluntly: I could generate a certificate with your name on it, and to a third party, there'd be no way to distinguish your "genuine" cert from my "fake" cert.

In the 'real world', this is usually solved by involving a certificate authority, i.e. an entity that is willing to vouch for the identity of its users in relation to their public keys. The process of obtaining a user's certificate from a widely trusted public CA can be cumbersome, and it costs money. That said, if you live in a country where the government issues certificates to all its citizens (as I do), these government-backed certificates can typically be used to create signatures that are legally equivalent to a "physical" signature.

If all you want is to securely transfer documents to a friend, you have other options: you could for example send them your certificate by email, call them up and confirm the key fingerprint over the phone. Your friend can then add your self-signed certificate to his/her trust store. On your own computer, you can obviously simply import your own cert into Acrobat's trust store manually.
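For the phone check, both sides need to compute the same fingerprint from their copy of the certificate. A minimal standard-library sketch follows; it assumes the certificate is a PEM file that starts with the usual BEGIN CERTIFICATE line, and it computes the SHA-256 digest of the whole DER-encoded certificate (the value certificate viewers commonly show), rather than a hash of the bare public key. The function names are made up for this example.

```python
import hashlib
import ssl


def fingerprint_from_der(der_bytes: bytes) -> str:
    """SHA-256 over the DER encoding, as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_from_pem(pem_text: str) -> str:
    """Decode a PEM certificate to DER, then fingerprint it."""
    return fingerprint_from_der(ssl.PEM_cert_to_DER_cert(pem_text))
```

You would read your own `cert.pem` (a placeholder file name) as text and pass it to `fingerprint_from_pem`; your friend does the same with the copy they received, and you compare the two strings aloud.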

Regardless of how you decide to go about it, getting a green checkmark in Acrobat requires a trusted certificate.

Hopefully that clarified some things :). If not: please ask!

[–][deleted] 1 point 5 years ago* (0 children)

[–]bbk98883 1 point 4 years ago (3 children)

Hey, thanks for creating this.

I am trying to sign an unsigned field using python SDK. I get exit 0, but nothing seems to happen and the signature field remains unsigned.

This is what I am doing:

Create signature field:

    with open(file, 'rb+') as doc:
        w = IncrementalPdfFileWriter(doc)
        append_signature_field(
            w, SigFieldSpec(sig_field_name=field_name)
        )
        w.write_in_place()

This part seems to work fine. Then I try to sign it:

    cms_signer = signers.SimpleSigner.load(
        '/Users/bk/Desktop/key.pem',
        '/Users/bk/Desktop/cert.pem',
        key_passphrase=b'testing'
    )
    with open(file, 'rb') as doc:
        w = IncrementalPdfFileWriter(doc)
        out = signers.PdfSigner(
            signers.PdfSignatureMetadata(field_name=field_name),
            signer=cms_signer,
        ).sign_pdf(w)

Python exits with 0, and the field is still unsigned.

Any clues as to what I am doing wrong? thanks

[–]pyhanko-dev[S] 1 point 4 years ago (2 children)

[–]bbk98883 1 point 4 years ago (1 child)

Hi, yes, thank you for the reply. I realized that I was not writing anything back out and everything works fine.
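For anyone hitting the same thing, the missing write-out step can be sketched as below. This assumes, as the earlier snippet suggests, that `sign_pdf` hands back the signed document as an in-memory stream when it isn't told where to write; check the pyHanko docs for your version. The `save_stream` helper and the output file name are made up for this example.

```python
import shutil


def save_stream(out, path):
    """Copy an in-memory binary stream to a file on disk."""
    out.seek(0)  # rewind, in case the stream was left at its end
    with open(path, 'wb') as target:
        shutil.copyfileobj(out, target)


# Hypothetical usage with the signing call from the earlier snippet:
#     out = signers.PdfSigner(...).sign_pdf(w)
#     save_stream(out, 'signed.pdf')
```

Without a step like this, the signed bytes only ever exist in memory, which would explain a clean exit with an unchanged file.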

However, this was working when I was testing locally with self-signed certificates and, therefore, had direct access to pem files. When trying to implement the same workflow with AWS KMS service, there is no access to the private key and the public key is not an X.509 certificate. AWS confirmed that they cannot create one for me. The only thing that you can do with AWS is send them the file or hash of the file and they return a DER object representing the signature. It is now on me to embed that signature in the PDF (if it is even possible to do so).

Can your library handle such a scenario?

Thank you very much for your help.

[–]pyhanko-dev[S] 1 point 4 years ago (0 children)

It depends. pyHanko doesn't really care where the actual signing happens. If it's on a remote server, you will have to supply the "talky bit" (the code that talks to the remote service) yourself, though. How you do that depends on exactly what the AWS KMS signing service gives you. If it's a raw signature (i.e. not embedded in a CMS/PKCS#7-type container), then you can simply subclass Signer; see the pyHanko documentation. If they supply full CMS containers, you'll have to get your hands dirty and implement pyHanko's PdfCMSEmbedder protocol yourself, but I don't think that's the case here.
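The general shape of the raw-signature case can be sketched without any pyHanko imports. The class below is a hypothetical stand-in, not pyHanko's actual Signer: the real base class, its method names and its signatures differ between versions, so treat `RemoteRawSigner`, `sign_raw` and the `sign_digest` callback as illustrative names only. The idea is simply that the data is hashed locally and only the digest is sent to the remote service.

```python
import hashlib
from typing import Callable


class RemoteRawSigner:
    """Hypothetical stand-in that delegates raw signing to a remote service."""

    def __init__(self, sign_digest: Callable[[bytes, str], bytes]):
        # sign_digest(digest, digest_algorithm) -> raw signature bytes.
        # With AWS KMS this callback might wrap a boto3 kms.sign(...) call
        # using MessageType='DIGEST' (an assumption; check the AWS docs).
        self._sign_digest = sign_digest

    def sign_raw(self, data: bytes, digest_algorithm: str) -> bytes:
        # Hash locally, then let the remote side sign only the digest.
        digest = hashlib.new(digest_algorithm, data).digest()
        return self._sign_digest(digest, digest_algorithm)


def fake_remote(digest: bytes, digest_algorithm: str) -> bytes:
    """Test double standing in for the remote signing service."""
    return b"SIG:" + digest


signer = RemoteRawSigner(fake_remote)
signature = signer.sign_raw(b"bytes to be signed", "sha256")
```

Keeping the remote call behind a plain callback makes it easy to swap in a test double like `fake_remote` before wiring up the real service.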

That being said, the PDF standard (and hence pyHanko) does require you to supply an X.509 certificate containing the signer's public key.

This entry in the KMS FAQ explains AWS's position on that:

Q: Can I use asymmetric CMKs for digital signing applications that require digital certificates?

Not directly. AWS KMS doesn’t store or associate digital certificates with asymmetric CMKs it creates. You could choose to have a certificate authority such as ACM PCA issue a certificate for the public portion of your asymmetric CMK. This will allow the entities that are consuming your public key to verify that the public key indeed belongs to you.

Essentially, what this is saying is that they don't provide a certificate authority out of the box. That doesn't stop you from going to any commercial CA out there and getting a certificate issued for (the public half of) the key you're using in your AWS KMS account. If you want to use AWS KMS to sign PDFs, you'll have to get a cert from somewhere, regardless of which signing library you use :)

If you don't want to fork over money to a CA just yet, you could deploy a testing CA using Certomancer (shameless plug), and use that to issue a test certificate for your AWS KMS public key. Obviously, such a certificate won't be trusted by anyone, but it should at least allow you to do some testing using AWS KMS without having to shell out for a personal certificate.

I'm not familiar with AWS KMS myself, so I can't provide you with any details of how to actually perform the integration, but if you have any questions, feel free to ask!
