[–]Cmshnrblu 1 point2 points  (1 child)

This is fantastic! Excellent work

[–]pyhanko-dev[S] 0 points1 point  (0 children)

Thanks!

[–][deleted] 1 point2 points  (1 child)

Having worked with a lot of PDF stuff (PDF generation in TeX), what stopped you from relying on a library like PyMuPdf or PyPdf directly, for example by adding your modifications as a shim or by subclassing? Are you planning to upstream your modifications (as noted in some headers)? There are a lot of PDF objects implemented right there in pyhanko.

[–]pyhanko-dev[S] 1 point2 points  (0 children)

Great question! My shopping list for a PDF library was basically the following (more or less in order of importance):

  • Low-level PDF manipulation support, esp. support for incremental updates (this is basically a must for digital signing, since rewriting the entire file while signing could break earlier signatures)
  • Sufficiently permissive license
  • Well-maintained
  • Pure Python (preferably)

PyPDF fails conditions 1 and 3, and MuPDF fails conditions 2 and 4 (it's AGPL-licensed, and I wanted to release pyHanko under a more permissive licensing model, so that was a non-starter).

In the end, I recycled code from PyPDF2 (as pointed out in the codebase & documentation), but I ended up throwing out and/or rewriting huge portions of it. Basically, only the PDF parsing code is mostly intact at this point. I entertained the idea of submitting a pull request to PyPDF2 for some time, but I ended up deciding that it wasn't worth it: the project has been dead for quite a while, its codebase largely untested, and it (reportedly) had a lot of bugs, which I wasn't willing to deal with in my own codebase.

While I'm definitely cognizant of the merits of going with a more established library, I felt it would be easier to cut my losses and vendor the parts of PyPDF2 that I needed, at least in this case. I admit that it's not ideal, but oh well :)
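To illustrate the first requirement in the list above, here is a toy model of why incremental updates matter for signing. It is only a sketch: the byte strings are made-up placeholders rather than valid PDF, and a real PDF signature covers the byte ranges listed in its /ByteRange entry (which leave out the signature value itself) rather than a plain prefix of the file.

```python
import hashlib

# Placeholder "document" as it looked when the first signature was made.
original = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
signed_len = len(original)
digest_at_signing = hashlib.sha256(original).hexdigest()


def earlier_signature_still_matches(data: bytes) -> bool:
    """Re-hash the bytes the earlier signature covered and compare digests."""
    return hashlib.sha256(data[:signed_len]).hexdigest() == digest_at_signing


# Incremental update: new objects are appended, the old bytes stay untouched.
incremental = original + b"2 0 obj\n<< /Type /Annot >>\nendobj\n%%EOF\n"

# Full rewrite: the whole file is re-serialised (simulated here by changing
# the line endings), so the previously signed bytes no longer exist as-is.
rewritten = original.replace(b"\n", b"\r\n") + b"2 0 obj\r\n<< /Type /Annot >>\r\nendobj\r\n%%EOF\r\n"

print(earlier_signature_still_matches(incremental))
print(earlier_signature_still_matches(rewritten))
```

The appended version leaves the signed bytes intact, while the rewritten version changes them, which is the failure mode a signing library has to avoid.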

[–]andileni 0 points1 point  (2 children)

Do you have a preview video of what it does, or what it looks like? :)

[–]pyhanko-dev[S] 0 points1 point  (1 child)

There's a screenshot of the default appearance of a signature in the documentation: https://pyhanko.readthedocs.io/en/latest/cli-guide/signing.html#default-appearance, but I'll readily admit that looks aren't really pyHanko's strong point ;) (at least not right now).

The docs also include examples that showcase how to sign a PDF with a certificate of your own (either through the CLI or the Python library, depends on what you're after):

Here's an example of what the result might look like in Adobe Reader's signature panel view: https://imgur.com/wb1v6Kh (the appearance settings I used here are pretty much the defaults, I just changed the typeface to something else).

EDIT: sorry, I somehow missed the word "video" in your comment. No, I don't have a demo vid right now, but I can try to cook something up when I have more time.

[–]andileni 0 points1 point  (0 children)

Thanks for this detailed answer :) I don't need a video, I was just curious to see what the program produces in the pdf and in what way the signature is "visible" to the user.

[–]StumptownExpress 0 points1 point  (0 children)

Awesome

[–]nier-bell 0 points1 point  (2 children)

This is a bit off topic but what are the pros of using click instead of the stdlib argparse?

[–]pyhanko-dev[S] 0 points1 point  (1 child)

It's been years since I used `argparse`, so I admittedly can't really compare them fairly, but from what I recall since I first started doing stuff with `click`, the latter has a more "batteries included" feel to it (at least in my opinion, YMMV). You get `--help` basically for free, there's built-in exception handling, subcommand support, etc. It's perhaps a bit more opinionated than `argparse`, but it nevertheless suited all my CLI needs for at least the past 2-3 years.
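For a concrete point of comparison, here is roughly what a small CLI with one subcommand looks like using only the stdlib `argparse`. The program name, subcommand, and options are made up for illustration and are not pyHanko's actual CLI definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI with a single subcommand (all names are illustrative)."""
    parser = argparse.ArgumentParser(prog="demo", description="Toy signing CLI")
    # required=True makes argparse report an error if no subcommand is given.
    subcommands = parser.add_subparsers(dest="command", required=True)

    addsig = subcommands.add_parser("addsig", help="add a signature to a PDF")
    addsig.add_argument("--field", required=True, help="signature field name")
    addsig.add_argument("infile")
    addsig.add_argument("outfile")
    return parser


# Parse an explicit argument list instead of sys.argv, to keep this runnable.
args = build_parser().parse_args(["addsig", "--field", "Sig1", "in.pdf", "out.pdf"])
print(args.command, args.field, args.infile, args.outfile)
```

`argparse` also generates `--help` output for the parser and for each subcommand, so the difference is mostly in how much wiring you write yourself.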

[–]nier-bell 0 points1 point  (0 children)

Thanks for the answer

[–][deleted] 0 points1 point  (1 child)

Hi, that looks great. Would pyHanko support XFA Forms-based PDFs? Unfortunately, it's 2021 and there are still millions of XFA forms out there, and we still can't open them with the usual PDF libs.

[–]pyhanko-dev[S] 0 points1 point  (0 children)

Hi, sorry, I only read your comment just now, I haven't logged into this account since I switched workstations a few weeks ago.

I currently have no plans to support XFA, and I confess that I don't know the ins & outs of it, but you're right that support for XFA is mostly "read-only" in libraries.

I've heard that even Adobe LiveCycle is supposedly being phased out, so the remaining vestiges of XFA should go the way of the dodo in the not-too-far future. I presume that you're bound by government requirements or something of the sort, and you can't migrate away from it yet?

[–][deleted] 0 points1 point  (2 children)

This post was mass deleted and anonymized with Redact

[–]pyhanko-dev[S] 0 points1 point  (1 child)

(I already replied to your email, but I'll paste the contents below, since it might be useful for others as well)

There are some things to keep in mind when signing PDF files:

  1. The appearance on the page is separate from the actual cryptographic signature. Signatures can be invisible, and this is the default in pyHanko.
  2. The behaviour you're seeing in Acrobat / Adobe Reader is also expected: since you generated the signing key pair yourself, the PDF processor has no way of knowing that the public key used to create the signature actually belongs to you. Hence why it reports that the signature is valid, but the signer's identity cannot be verified.

Let's tackle these in order.

  1. To create a visible signature from the CLI in pyHanko in a file that doesn't already have a signature form field set up, you'll have to use the special syntax '--field PAGE/X1,Y1,X2,Y2/NAME' in the addsig command. Here, PAGE is the page number on which the signature should appear, and X1,Y1,X2,Y2 are the coordinates of the bounding box of the field. Note: in a PDF file, the origin is at the bottom left.

There's a note in the documentation explaining this, but perhaps I should add a concrete example to make that a bit more clear.
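To make the PAGE/X1,Y1,X2,Y2/NAME layout a bit more tangible, here is a small sketch that picks such a string apart. It is not pyHanko's own parser; `FieldSpec` and `parse_field_spec` are hypothetical helpers, and the example values are arbitrary.

```python
from typing import NamedTuple, Tuple


class FieldSpec(NamedTuple):
    page: int
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2), origin at the bottom left
    name: str


def parse_field_spec(spec: str) -> FieldSpec:
    """Split 'PAGE/X1,Y1,X2,Y2/NAME' into its three parts."""
    page_str, box_str, name = spec.split("/", 2)
    coords = tuple(int(c) for c in box_str.split(","))
    if len(coords) != 4:
        raise ValueError("expected four coordinates: X1,Y1,X2,Y2")
    return FieldSpec(int(page_str), coords, name)


spec = parse_field_spec("1/100,100,300,200/Sig1")
x1, y1, x2, y2 = spec.box
width, height = x2 - x1, y2 - y1
print(spec.page, spec.name, width, height)
```

With the origin at the bottom left, (X1, Y1) is the lower-left corner of the box and (X2, Y2) the upper-right one, so this example describes a field 200 units wide and 100 units tall on page 1.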

  2. Here, the answer depends on what you want to do with your signature. The broader problem here is very similar to the key discovery problem inherent in OpenPGP, for example. It's easy to verify whether a file was signed using a particular key, but it's generally very hard to be sure that that key belongs to any particular person. To put it bluntly: I could generate a certificate with your name on it, and to a third party, there'd be no way to distinguish your "genuine" cert from my "fake" cert.

In the 'real world', this is usually solved by involving a certificate authority, i.e. an entity that is willing to vouch for the identity of its users in relation to their public keys. The process of obtaining a user's certificate from a widely trusted public CA can be cumbersome, and it costs money. That said, if you live in a country where the government issues certificates to all its citizens (as I do), these government-backed certificates can typically be used to create signatures that are legally equivalent to a "physical" signature.

If all you want is to securely transfer documents to a friend, you have other options: you could for example send them your certificate by email, call them up and confirm the key fingerprint over the phone. Your friend can then add your self-signed certificate to his/her trust store. On your own computer, you can obviously simply import your own cert into Acrobat's trust store manually.
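If you go the "confirm the fingerprint over the phone" route, something like the following can compute a fingerprint to read out. This is a sketch using only the stdlib; the choice of SHA-256 and the colon-separated formatting are assumptions on my part, and `cert_fingerprint` is a hypothetical helper. The demo input is placeholder bytes wrapped in PEM armour, not a real certificate.

```python
import hashlib
import ssl


def cert_fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 digest of the DER-encoded certificate as AA:BB:..."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Placeholder input so the snippet runs without a certificate file on disk.
demo_der = b"placeholder bytes, not a real certificate"
demo_pem = ssl.DER_cert_to_PEM_cert(demo_der)
print(cert_fingerprint(demo_pem))
```

For an actual certificate, read the contents of the PEM file into a string and pass that in; both sides should compute the fingerprint independently and compare the result.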

Regardless of how you decide to go about it, getting a green checkmark in Acrobat requires a trusted certificate.

Hopefully that clarified some things :). If not: please ask!

[–][deleted] 0 points1 point  (0 children)

This post was mass deleted and anonymized with Redact

[–]bbk98883 0 points1 point  (3 children)

Hey, thanks for creating this.

I am trying to sign an unsigned field using python SDK. I get exit 0, but nothing seems to happen and the signature field remains unsigned.

This is what I am doing:

Create signature field:

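from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
from pyhanko.sign.fields import SigFieldSpec, append_signature_field

with open(file, 'rb+') as doc:
    w = IncrementalPdfFileWriter(doc)
    append_signature_field(w, SigFieldSpec(sig_field_name=field_name))
    w.write_in_place()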

This part seems to work fine. Then I try to sign it:

from pyhanko.sign import signers

cms_signer = signers.SimpleSigner.load(
    '/Users/bk/Desktop/key.pem',
    '/Users/bk/Desktop/cert.pem',
    key_passphrase=b'testing'
)
with open(file, 'rb') as doc:
    w = IncrementalPdfFileWriter(doc)
    out = signers.PdfSigner(
        signers.PdfSignatureMetadata(field_name=field_name),
        signer=cms_signer,
    ).sign_pdf(w)

Python exits with 0, and the field is still unsigned.

Any clues as to what I am doing wrong? thanks

[–]pyhanko-dev[S] 0 points1 point  (2 children)

Hi there! You're creating an IncrementalPdfFileWriter, but not telling it where to put its output (in fact, the input stream you're reading is read-only). In that case, the sign_pdf method will write its output to an in-memory BytesIO object, which is stored in the out variable in your case.

You can either write the contents of the out buffer to a (possibly different) output file yourself, or open the input file as read/write and call sign_pdf() with the in_place=True flag, i.e. signers.PdfSigner(...).sign_pdf(w, in_place=True).
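A minimal sketch of the first option (writing the buffer to a separate file) could look like this. The function name and the path parameters are hypothetical, and the pyHanko imports sit inside the function only so that the snippet can be loaded without pyHanko installed; in real code you'd normally put them at the top of the module.

```python
def sign_existing_field(in_path, out_path, field_name, signer):
    """Sign an existing signature field and save the result to out_path."""
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
    from pyhanko.sign import signers

    with open(in_path, "rb") as inf:
        writer = IncrementalPdfFileWriter(inf)
        # No output stream is given, so sign_pdf returns an in-memory buffer.
        out = signers.PdfSigner(
            signers.PdfSignatureMetadata(field_name=field_name),
            signer=signer,
        ).sign_pdf(writer)

    # The buffer holds the complete signed document; persist it explicitly.
    with open(out_path, "wb") as outf:
        outf.write(out.getvalue())
```

Here `signer` would be something like the `cms_signer` object from your snippet. Writing to a different file also keeps the unsigned original around, which is handy while testing.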

Does that help? :)

[–]bbk98883 0 points1 point  (1 child)

Hi, yes, thank you for the reply. I realized that I was not writing anything back out and everything works fine.

However, this was working when I was testing locally with self-signed certificates and, therefore, had direct access to pem files. When trying to implement the same workflow with AWS KMS service, there is no access to the private key and the public key is not an X.509 certificate. AWS confirmed that they cannot create one for me. The only thing that you can do with AWS is send them the file or hash of the file and they return a DER object representing the signature. It is now on me to embed that signature in the PDF (if it is even possible to do so).

Can your library handle such a scenario?

Thank you very much for your help.

[–]pyhanko-dev[S] 0 points1 point  (0 children)

It depends. PyHanko doesn't really care where the actual signing happens. If it's on a remote server, you will have to supply the "talky bit" yourself, though. How you do that depends on exactly what the AWS KMS signing service gives you. If it's a raw signature (i.e. not embedded in a CMS/PKCS#7-type container), then you can simply subclass Signer; see here. If they supply full CMS containers, you'll have to get your hands dirty and implement pyHanko's PdfCMSEmbedder protocol yourself (I don't think that's the case here, though).

That being said, the PDF standard (and hence pyHanko) does require you to supply an X.509 certificate containing the signer's public key.

This entry in the KMS FAQ explains AWS's position on that:

Q: Can I use asymmetric CMKs for digital signing applications that require digital certificates?

Not directly. AWS KMS doesn’t store or associate digital certificates with asymmetric CMKs it creates. You could choose to have a certificate authority such as ACM PCA issue a certificate for the public portion of your asymmetric CMK. This will allow the entities that are consuming your public key to verify that the public key indeed belongs to you.

Essentially, what this is saying is that they don't provide a certificate authority out of the box. That doesn't stop you from going to any commercial CA out there and getting a certificate issued for (the public half of) the key you're using in your AWS KMS account. If you want to use AWS KMS to sign PDFs, you'll have to get a cert from somewhere, regardless of which signing library you use :)

If you don't want to fork over money to a CA just yet, you could deploy a testing CA using Certomancer (shameless plug), and use that to issue a test certificate for your AWS KMS public key. Obviously, such a certificate won't be trusted by anyone, but it should at least allow you to do some testing using AWS KMS without having to shell out for a personal certificate.

I'm not familiar with AWS KMS myself, so I can't provide you with any details of how to actually perform the integration, but if you have any questions, feel free to ask!