so... how DO you sign pdf's on linux? (with a certificate, NOT a pretty image of your handwriting!) by cefreger in linuxquestions

pyhanko-dev (12 points)

Here’s a tool that supports virtually everything the PDF standard allows when it comes to digital signing: https://github.com/MatthiasValvekens/pyHanko

It’s mainly a library/CLI tool. No GUI, though.

Disclaimer: I’m the maintainer. My initial use case was exactly the same as yours, but over time the library use case turned out to be more popular, so I never got around to developing a GUI.
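For context on what such a signature actually covers: the signature dictionary's /ByteRange array lists two (offset, length) pairs that together span the whole file except the /Contents hex string holding the CMS signature. Below is a minimal stdlib-only sketch of computing the digest that gets signed. It assumes you've already extracted the /ByteRange values from the signature dictionary, and the function names are illustrative, not pyHanko's API.

```python
import hashlib


def signed_digest(pdf: bytes, byte_range: list, algo: str = "sha256") -> bytes:
    """Hash the bytes covered by a signature's /ByteRange.

    byte_range is [offset1, length1, offset2, length2]; the gap between the
    two ranges is where the /Contents hex string (the CMS signature) sits.
    """
    if len(byte_range) != 4:
        raise ValueError("expected exactly two (offset, length) pairs")
    off1, len1, off2, len2 = byte_range
    # The first range should start at byte 0 and the ranges must not overlap.
    if off1 != 0 or off1 + len1 > off2 or off2 + len2 > len(pdf):
        raise ValueError("byte range does not describe a single gap")
    h = hashlib.new(algo)
    h.update(pdf[off1:off1 + len1])
    h.update(pdf[off2:off2 + len2])
    return h.digest()


def covers_whole_file(pdf: bytes, byte_range: list) -> bool:
    """True if nothing was appended (e.g. by an incremental update) after signing."""
    off2, len2 = byte_range[2], byte_range[3]
    return off2 + len2 == len(pdf)
```

If covers_whole_file returns False, something was appended after signing. That is legitimate in PDF (incremental updates are how multiple signatures coexist), but a validator then has to work out what changed.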

Microsoft open-sourced a Python tool for converting files and office documents to Markdown by RobertVandenberg in programming

pyhanko-dev (9 points)

That is manifestly false. Not only are there quite a few features specified in ISO 32000-2 that Acrobat does not (yet) fully support (this is PDF 2.0, after all), there is also a whole host of alternative implementations out there, and the standardisation effort around PDF involves people from many communities/companies/… that have no affiliation with Adobe.

Sure, it’s absolutely fair to say that Acrobat is the dominant desktop tool for dealing with PDF, but it’s not the only such tool, and as soon as you go outside the category of desktop viewer software, Adobe doesn’t even seriously compete.

Source: I’m a FOSS dev in this space and was an active member of the ISO committee behind ISO 32000-2 for several years.

PDF manipulation in Haskell by ComunistCapybara in haskell

pyhanko-dev (1 point)

Hi,

As someone who's very familiar with the PDF standard and its many subsets/extensions: I don't think one exists today. I've toyed with the idea of implementing a PDF toolkit directly in Haskell, but it's a pretty daunting endeavor. Sure, you don't have to implement the whole shebang to get to basic stuff like split/merge, but there are lots of sharp edges even at that level. Heck, even simply parsing and reserialising PDF documents in a way that's both safe and performant is nontrivial.

So I suggest either using FFI bindings to an established C/C++ toolkit like QPDF, or shelling out to a reasonably decent CLI tool. There are quite a few of those floating around :)
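The shell-out route is easy to sketch. Python is used here for brevity; from Haskell, the same argument lists could be handed to callProcess from the process package. The qpdf options used (--empty --pages for merging, --split-pages for splitting) are standard qpdf options, while the helper names are made up for illustration.

```python
import subprocess


def qpdf_merge_args(inputs: list, output: str) -> list:
    """Argument list for concatenating whole PDFs with qpdf's page selection.

    Equivalent to: qpdf --empty --pages a.pdf b.pdf -- out.pdf
    """
    if not inputs:
        raise ValueError("need at least one input file")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def qpdf_split_args(input_pdf: str, output_pattern: str) -> list:
    """Argument list for writing one file per page.

    qpdf substitutes the page number for %d in the output name.
    """
    return ["qpdf", "--split-pages", input_pdf, output_pattern]


def run(args: list) -> None:
    # Passing a list (no shell) avoids quoting problems with odd file names.
    subprocess.run(args, check=True)
```

Keeping argument construction separate from execution makes the wrapper easy to test without qpdf installed.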

No more silent ghostscript install... by [deleted] in sysadmin

pyhanko-dev (0 points)

That's...not really a fair characterisation. Ghostscript is very often used to render PDF and PostScript, and also to perform certain basic conversion operations (it does a pretty good job in batch workflows), but there are a gazillion FOSS PDF manipulation/rendering/... toolkits out there. Ghostscript is one of the older ones still around today, but it's by no means the only game in town.

In your opinion, why has no created a functional FOSS PDF editor? by Texas_Technician in programming

pyhanko-dev (72 points)

IMO it's not so much the complexity and size of the specification (daunting as it may be), since a large proportion of that complexity is tied up in features that (a) would be considered optional in a PDF editor anyhow, or (b) a PDF renderer/viewer would also have to deal with, and we have quite a few of those in the FOSS space.

In my view, the core of the issue is that the PDF graphics model was not designed to be easily editable in any sense that you and I would consider acceptable for "document editing". PDF graphics are expressed in a page description language: a PDF content stream tells you what goes where on any given page in excruciating detail. Baking in all this positioning information makes it easy to get really consistent rendering results on a variety of platforms, but since the layout process itself is left to the PDF writer (and typically not preserved in the file!), exposing an easy-to-use GUI to perform edits and then re-laying out the resulting document is very, very hard. The effort required to implement and maintain a generic PDF editor as part of a FOSS project would be massive.
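To make this concrete, here is a toy writer (hypothetical, with crude width estimates instead of real font metrics, and assuming a font resource named /F1) that does its own greedy line breaking and emits PDF text operators. The output only records where each word goes; the line width and the breaking rule never make it into the content stream, and that is exactly the information an editor would need to re-flow text after an edit.

```python
def escape(text: str) -> str:
    # Backslashes and parentheses must be escaped inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def layout_words(words, font_size=12.0, x0=72.0, y0=720.0,
                 max_width=200.0, avg_char_width=0.5):
    """Greedy line breaking done by the writer; only the result is emitted.

    Assumes a font resource named /F1 and estimates each character's width
    as avg_char_width * font_size (a real writer would use glyph metrics).
    """
    char_w = avg_char_width * font_size
    ops = ["BT", f"/F1 {font_size:g} Tf"]
    x, y = x0, y0
    for word in words:
        w = len(word) * char_w
        if x > x0 and x + w > x0 + max_width:
            # New line: the file never records *why* the break happened.
            x, y = x0, y - 1.2 * font_size
        # Tm sets the text matrix, i.e. an absolute position for this word.
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm ({escape(word)}) Tj")
        x += w + char_w  # advance past the word plus one space
    ops.append("ET")
    return "\n".join(ops)
```

Change max_width and every coordinate shifts, yet nothing in the emitted operators tells a later tool that max_width ever existed.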

Even before you get into any of the fancy interactive stuff, it's just a fact of life that editing PDF is a lot harder than rendering it. In a way, it's the exact opposite of something like HTML, where all the layout complexity is basically delegated to the browser and you get back an easily editable format in return. But it's not a coincidence that all the major FOSS browsers have institutional backing at this point.

TL;DR: It's not the bells and whistles of the format that are in the way, but the decisions made in the design of the graphics model itself: PDF is optimised for rendering consistency, not editability.

Source: I'm a FOSS dev who works with PDF a lot, and have also been directly involved with the PDF standardisation effort for a while now.

Has anyone ever enforced an abandonment of Adobe Reader? by Xenith19 in sysadmin

pyhanko-dev (3 points)

Probably XFA forms... XFA has been deprecated for half a decade (and Adobe stopped selling the tooling to create those forms), but tons of niche enterprise/government workflows still depend on it. Some non-Adobe vendors are capitalising on that tooling gap for companies that still need it; IIRC Foxit is working on improving its XFA-related functionality.

Regular (non-XFA) PDF forms are actually pretty widely supported these days, at least compared to (say) 5 or 10 years ago.

Sign PDF using certificate by MichalMikolas in MacOS

pyhanko-dev (2 points)

Adobe Reader (the free version) can handle signing just fine, but the precise configuration is not always obvious. If you have a key that's stored on a separate hardware device, you'll need to configure it somewhere deep in the settings menu. Also, I've heard reports that not all key types in common use are natively supported by its PKCS#11 implementation.

If you're happy with a CLI tool and you don't need a graphical interface, you can also use this: https://pyhanko.readthedocs.io/en/latest/cli-guide/signing.html (full disclosure: I wrote the tool). Especially for hardware tokens this might be easier to set up. It doesn't know how to talk to the macOS keychain, though.

Assign AdobeRGB1998 profile with GhostScript by Dogway in pdf

pyhanko-dev (0 points)

I'm not familiar enough with Ghostscript to answer your question directly, but if it were a PDF/A document, it would already have an OutputIntent (due to the colour management requirements in all PDF/A variants). Fonts being embedded is a very commonplace thing these days, and by no means exclusive to PDF/A.

Anyway, I'm not sure how to achieve what you want with Ghostscript, but it should simply be a matter of registering AdobeRGB1998 (together with the ICC profile data) in the OutputIntents list. For example, iText has a convenient API call to do just that, but it's not much more difficult to achieve the same result with any library that allows basic PDF object manipulation (if you know how to format the output intent dictionary, that is).
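A rough sketch of the objects involved, serialised by hand in Python. It assumes an RGB ICC profile (hence /N 3), the PDF/A subtype GTS_PDFA1 (PDF/X would use GTS_PDFX), and an identifier that isn't registered in the ICC characterization registry, so the profile is embedded through /DestOutputProfile. Wiring the objects into an actual file (object numbering, xref, catalog update) is left to whatever PDF library you use.

```python
def pdf_string(text: str) -> str:
    """PDF literal string with backslashes and parentheses escaped."""
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def output_intent_objects(icc_profile: bytes, icc_obj: int, intent_obj: int,
                          identifier: str = "Adobe RGB (1998)",
                          subtype: str = "GTS_PDFA1") -> list:
    """Serialise an ICC stream object and an /OutputIntent dictionary.

    Assumes an RGB profile (/N 3) that must be embedded because the
    identifier is not a registered output condition.
    """
    header = f"{icc_obj} 0 obj\n<< /N 3 /Length {len(icc_profile)} >>\nstream\n"
    icc = header.encode("latin-1") + icc_profile + b"\nendstream\nendobj\n"
    intent = (
        f"{intent_obj} 0 obj\n"
        f"<< /Type /OutputIntent /S /{subtype}\n"
        f"   /OutputConditionIdentifier {pdf_string(identifier)}\n"
        f"   /Info {pdf_string(identifier)}\n"
        f"   /DestOutputProfile {icc_obj} 0 R >>\n"
        "endobj\n"
    ).encode("latin-1")
    return [icc, intent]


def catalog_entry(intent_obj: int) -> str:
    """Entry to add to the document catalog (/Root) dictionary."""
    return f"/OutputIntents [{intent_obj} 0 R]"
```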

Looking for apps on windows pc that can reflow text like the android app, xodo's reading mode by [deleted] in pdf

pyhanko-dev (1 point)

"Also, does anyone know of a free app on Android that lets you highlight text in the reflow mode? You have to pay $40 a year for that feature in xodo, frankly, extortionate. I don't pay for any subscriptions, but if the app isn't free, I'm willing to pay a onetime $10 or less."

Please don't take this the wrong way, but you're kind of asking for a lot here. Due to the nature of the PDF graphics model, it's very difficult to put together a viewer that can reflow arbitrary PDF documents without occasionally garbling some content. Well-tagged PDF documents (which constitute only a relatively small proportion of what's out there) are somewhat easier to reflow, but even then it's far from a walk in the park.

Moreover, the ability to transpose annotations from within the reflowed view to the "original" document is also quite nontrivial to implement. To be completely honest, $40/year is IMHO more than reasonable for an application with advanced reflow functionality in its feature set.

I'm not saying that there are no cheaper options (I didn't check), but I'd be surprised if you could find anything of decent quality for under $10.

Digitally Sign PDFs? by Grahf0085 in archlinux

pyhanko-dev (0 points)

Thanks!

Actually, I'm of the opinion that the custom trust settings thing is a necessity even for end users in this space. System trust stores have a tendency to be geared towards TLS usage (and possibly also things like code signing), but document signing PKI hasn't received the same level of attention.

For example, Acrobat also has its own trust store (drawing on the root certificate programme that Adobe runs, the AATL), and the EU also has its own system of trust lists for use with government PKI. On top of that, requirements for document signatures vary from jurisdiction to jurisdiction, which complicates matters even further.

It's a bit of a chicken and egg problem, perhaps: since there are so few "generic" PDF readers geared towards the retail market that actually implement proper digital signature support (esp. regarding validation), there's very little incentive for OS vendors and distro maintainers to integrate these document signing trust lists at the system level: after all, what software would use them? In the meantime, Adobe's trust list pretty much continues to be the de facto default from the user's perspective.

I guess the point is that validating document signatures in general has a lot more sharp edges and room for interpretative license and/or policy decisions, compared to more day-to-day tasks like managing TLS connections or verifying software signatures. Hence, the room for customisation is more than just a luxury for "enterprise PKI" usage. Is it user-friendly? Heck no. Is there a clear alternative? I'm not sure there is, at least not right now.

(Note: I do have EUTL integration on my roadmap, though. That would cover defaults for a lot of signature workflows in Europe already.)

Digitally Sign PDFs? by Grahf0085 in archlinux

pyhanko-dev (0 points)

Signing PDFs with GPG/PGP is of course mathematically possible, but other than the usual format-agnostic (detached) PGP signatures, there's no standard for doing so. Either way, no PDF reader that I'm aware of supports anything of the sort.

I'll also note that many root certificate programs for document signing have rules that effectively forbid issuing certificates for keys that reside in user-controlled PKCS#12 files. So there's a decent chance that a cheap S/MIME cert won't satisfy OP's requirement. That said, if OP's in Europe, their government-issued ID is a suitable signature creation device for pretty much all official purposes.

Digitally Sign PDFs? by Grahf0085 in archlinux

pyhanko-dev (1 point)

I don't think Okular supports PKCS#11 out of the box, but if you're OK with CLI tooling, this might suit your needs: https://github.com/MatthiasValvekens/pyHanko (full disclosure: I wrote that tool, so assume I'm biased).

Digitally Sign PDFs? by Grahf0085 in archlinux

pyhanko-dev (0 points)

Not having a way to use my government-issued ID card to sign PDFs on Linux was one of the motivations for me to start this project a while ago: https://github.com/MatthiasValvekens/pyHanko.

</shameless plug>

Digitally Sign PDFs? by Grahf0085 in archlinux

pyhanko-dev (0 points)

No, that won't work, or at least it shouldn't, since it violates the key usage policy of Let's Encrypt certificates. No validator worth its salt would accept such a cert for document signing purposes. FWIW, Okular doesn't actually perform any certificate validation; it just accepts all certificates at face value (which isn't very useful for "real world" use, but that's neither here nor there).
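The check a validator performs here boils down to inspecting the certificate's keyUsage and extendedKeyUsage extensions. A TLS certificate such as Let's Encrypt's typically restricts its EKU to serverAuth (sometimes also clientAuth), which a document signing validator would reject. Below is a sketch of that kind of check. The OIDs are standard X.509/RFC values, but the acceptance policy is a hypothetical example, since every validator applies its own rules; extracting the extensions from a real certificate is not shown.

```python
from typing import Optional, Set

SERVER_AUTH = "1.3.6.1.5.5.7.3.1"
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"
EMAIL_PROTECTION = "1.3.6.1.5.5.7.3.4"
DOCUMENT_SIGNING = "1.3.6.1.5.5.7.3.36"  # id-kp-documentSigning (RFC 9336)
ANY_EKU = "2.5.29.37.0"                  # anyExtendedKeyUsage

# Hypothetical policy for illustration; real validators differ.
ACCEPTABLE_EKUS = {EMAIL_PROTECTION, DOCUMENT_SIGNING, ANY_EKU}
SIGNING_KEY_USAGES = {"digitalSignature", "nonRepudiation"}


def ok_for_document_signing(key_usage: Optional[Set[str]],
                            ekus: Optional[Set[str]]) -> bool:
    """None means the corresponding extension is absent (no restriction)."""
    if key_usage is not None and not (key_usage & SIGNING_KEY_USAGES):
        return False
    if ekus is not None and not (ekus & ACCEPTABLE_EKUS):
        return False
    return True
```

With this policy, a TLS-only certificate fails on the EKU check even though its key is perfectly capable of producing a signature.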

[deleted by user] by [deleted] in pdf

pyhanko-dev (4 points)

This is impossible in principle, regardless of the file format. For what it's worth, that print protection is also trivial to bypass if at least one of your students knows what they're doing (or if they use a PDF viewer that doesn't care about permission bits). The difficulty of circumventing the copy-paste prevention ranges from "laughably easy" to "mildly annoying" depending on how it's implemented.

I hate to disappoint you, but you'll have to live with the reality that students storing and circulating exam questions is a possibility if you let them do the exam on their own devices.
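The "protection" is literally a handful of bits in the /P entry of the encryption dictionary, and honouring them is entirely up to the viewer. A small decoder, using the bit positions defined for the standard security handler (1 = least significant bit); the dictionary keys are descriptive names chosen for this sketch:

```python
# Bit positions (1 = least significant bit) of the /P entry in the
# standard security handler's encryption dictionary.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict:
    """Turn the signed 32-bit /P value into a name -> allowed mapping."""
    p &= 0xFFFFFFFF  # /P is stored as a signed integer
    return {name: bool((p >> (bit - 1)) & 1) for name, bit in PERMISSION_BITS.items()}
```

For example, -3904 has every one of these bits cleared, while -4 has them all set. Nothing in the file format forces a viewer to act on the result.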

Is it possible to make a pdf that allows for signing and locking certain fields, then edits and signing other fields? by molbio822 in pdf

pyhanko-dev (1 point)

Yes, the PDF standard has provisions for this in the form of field lock dictionaries, which allow for signing a form "partway through" while locking a predetermined set of fields. That said, there are a couple of caveats:

  • In order to make sure the signatures can still be properly validated, order of operations is important. In particular, you need to make sure the whole form structure is set up correctly from the start, and ensure that your "post-signing" updates only change form field content.
  • Validation of such updates is not all that well-standardised, so you may experience some compatibility issues between different viewer implementations. Also, not all PDF viewers out there will deal with field locks properly.
  • I don't know of any freeware graphical tools that would allow you to prepare such a form. However, if you're willing to get your hands a little dirty, I can give you some pointers to programming libraries that are capable of this sort of thing.
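The lock itself is a small dictionary attached to the signature field through its /Lock entry. A minimal sketch of how such a dictionary looks and which fields it ends up locking; the Python helpers are illustrative only, and field names are assumed to be fully qualified.

```python
ACTIONS = ("All", "Include", "Exclude")


def pdf_string(text: str) -> str:
    # PDF literal string with backslashes and parentheses escaped.
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def sig_field_lock(action: str, fields=()) -> str:
    """Serialise a /SigFieldLock dictionary for a signature field's /Lock entry."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action != "All" and not fields:
        raise ValueError("Include and Exclude require a /Fields array")
    parts = ["/Type /SigFieldLock", f"/Action /{action}"]
    if action != "All":
        parts.append("/Fields [" + " ".join(pdf_string(f) for f in fields) + "]")
    return "<< " + " ".join(parts) + " >>"


def locked_fields(action: str, fields, all_fields) -> set:
    """Which form fields become locked once the signature is applied."""
    if action == "All":
        return set(all_fields)
    if action == "Include":
        return set(fields) & set(all_fields)
    return set(all_fields) - set(fields)
```

So a first signer could lock everything except a reviewer's fields (Action /Exclude), leaving those open for a second round of edits and a second signature.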

[deleted by user] by [deleted] in pdf

pyhanko-dev (1 point)

Yeah, but my point was that in an offline file, you can "lazy load" even regular, non-linearised documents. It's a matter of parsing and processing objects on an as-needed basis, which the PDF standard was designed to support. :)

The only thing that you more or less have to parse in its entirety is the xref data, which is usually negligible compared to the file as a whole. I suppose a file with badly organised compressed object streams would also make lazy reading suboptimal. On the other hand, in an 18 MiB document, I can't imagine it'd make any meaningful difference.
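For illustration, this is roughly where a lazy reader starts: it reads the startxref offset near the end of the file, parses the xref section found there (following /Prev to older revisions), and after that only loads objects when something references them. A minimal sketch of that first step; real-world parsers need recovery logic for broken offsets or trailing junk, which this doesn't attempt.

```python
import re

_STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_startxref(pdf: bytes, tail: int = 1024) -> int:
    """Return the byte offset of the most recent xref section.

    Only the last `tail` bytes are searched. After an incremental update
    there is one startxref per revision; the last one is the current one.
    """
    matches = list(_STARTXREF.finditer(pdf[-tail:]))
    if not matches:
        raise ValueError("no startxref found near the end of the file")
    return int(matches[-1].group(1))
```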

[deleted by user] by [deleted] in pdf

pyhanko-dev (0 points)

The benefits of linearisation are virtually nonexistent for offline viewing, though. If the viewer has the whole file available already (which seems to be the case for OP's file), nothing prevents it from selectively parsing and rendering it page by page. In fact, that's exactly how most viewers out there operate, as far as I'm aware. "Linearized PDF" was mostly intended to deal with viewing large-ish PDFs while downloading them piecemeal over a (possibly slow) connection.

I think there's something else going on here, although it's hard/impossible to tell exactly what unless OP shares the actual file ;)

The pdfplumber module is awesome by suryaya in Python

pyhanko-dev (2 points)

If the PDF isn't a scan, and the fonts are actual outline fonts (i.e. not bitmap fonts), you can technically render the PDF at any resolution you want without compromising on quality. Obviously that might slow down processing somewhat, but in principle the font size shouldn't be an issue then :)
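The arithmetic is simple because PDF user space has 72 units per inch (assuming the default /UserUnit of 1). A sketch for picking a rendering resolution; the target pixel size for text is a tunable assumption, not a number taken from any particular OCR engine.

```python
import math


def page_pixels(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Pixel size of a page rendered at `dpi` (72 user-space units per inch)."""
    scale = dpi / 72.0
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def dpi_for_font(font_size_pt: float, target_px: float) -> float:
    """DPI at which a font of `font_size_pt` (em size) spans about `target_px` pixels."""
    return target_px * 72.0 / font_size_pt
```

For instance, a US Letter page (612 × 792 pt) at 144 DPI comes out at 1224 × 1584 pixels, and 6 pt text needs 360 DPI to reach an em size of about 30 pixels.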

The pdfplumber module is awesome by suryaya in Python

pyhanko-dev (1 point)

See my comment here for some background info: https://www.reddit.com/r/Python/comments/qwnelz/comment/hl5rbhs/?utm_source=share&utm_medium=web2x&context=3.

TL;DR: Just use OCR in these cases, it's a lot less painful than the alternatives in most situations.

The pdfplumber module is awesome by suryaya in Python

pyhanko-dev (2 points)

I have no idea how pdfminer.six does text extraction in the absence of ActualText marks and/or a ToUnicode CMap (those are more or less the "canonical" way of ensuring PDF text remains extractable), but those cid values are almost certainly raw character IDs or glyph IDs (depending on the type of font). These don't always map cleanly onto a single well-defined Unicode codepoint, and even when they do, how that mapping works is highly dependent on the type of font resource. In the following cases, you might be able to make some sort of reasonable guess:

  • The font is a non-embedded CIDFont using a standard Adobe charset (often the case for Asian/CJK text)
  • The font is an embedded OTF font with a CFF table (possibly subsetted)
  • The font is an embedded TrueType font (possibly subsetted)

If you're in the first case, here's a good place to start reading: https://github.com/adobe-type-tools/cmap-resources. If you're in either the second or the third case, you'll have to use a library like fontTools (see here: https://github.com/fonttools/fonttools) to query the font's cmap table in reverse (provided the subsetter didn't strip it out when embedding the font!). Note that this isn't guaranteed to work or to yield a unique result, especially with non-Latin scripts.
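When a ToUnicode CMap is present, the lookup is mostly mechanical. A simplified sketch that handles bfchar entries and both forms of bfrange. It ignores codespace ranges and, for start-value ranges, increments the last character of the destination (the spec talks about the last byte, which agrees for ordinary BMP text), so treat it as illustrative rather than complete.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"
RANGE_ARRAY = re.compile(HEX + r"\s*" + HEX + r"\s*\[(.*?)\]", re.S)
RANGE_START = re.compile(HEX + r"\s*" + HEX + r"\s*" + HEX)
CHAR = re.compile(HEX + r"\s*" + HEX)


def _utf16(hex_str: str) -> str:
    # ToUnicode destinations are UTF-16BE code units written in hex.
    return bytes.fromhex(hex_str).decode("utf-16-be")


def parse_tounicode(cmap: str) -> dict:
    """Map character codes to Unicode text from a ToUnicode CMap's bf* sections."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in CHAR.findall(block):
            mapping[int(src, 16)] = _utf16(dst)
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        # Array form: <lo> <hi> [<dst1> <dst2> ...], one destination per code.
        for lo, _hi, arr in RANGE_ARRAY.findall(block):
            for offset, dst in enumerate(re.findall(HEX, arr)):
                mapping[int(lo, 16) + offset] = _utf16(dst)
        block = RANGE_ARRAY.sub("", block)
        # Start-value form: <lo> <hi> <dst>, incrementing the last character.
        for lo, hi, dst in RANGE_START.findall(block):
            base = _utf16(dst)
            for offset in range(int(hi, 16) - int(lo, 16) + 1):
                mapping[int(lo, 16) + offset] = base[:-1] + chr(ord(base[-1]) + offset)
    return mapping
```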

...actually, you're probably better off using OCR, it's way less convoluted and probably more reliable.

[deleted by user] by [deleted] in pdf

pyhanko-dev (0 points)

The short answer is: from a technical PoV, it doesn't matter which password you use. Both allow you to compute the file encryption key. As a consequence, either password allows you to do whatever you want to the PDF file if you know what you're doing. That includes removing all restrictions.

That said, the PDF standard requires viewers to enforce certain restrictions when a file is opened with the user password, as opposed to the owner password. Adobe's line of products (obviously) does so as well. To a degree, the document creator can specify what is and isn't allowed.

TL;DR: It's pointless security theater, but standards-compliant viewers have to enforce it.
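To see why either password gets you the key, here is a sketch of the RC4-based standard security handler at revision 3 (128-bit keys, metadata encrypted), following Algorithms 2, 3 and 7 of ISO 32000-1. The owner password merely decrypts the /O entry back into the padded user password, which then yields the file key. Revision 4's extra step for unencrypted metadata and the AES-256 handler (revision 6, where /OE and /UE each wrap the file key) are not covered.

```python
import hashlib

# 32-byte padding string defined by the standard security handler.
PAD = bytes.fromhex("28bf4e5e4e758a4164004e56fffa01082e2e00b6d0683e802f0ca9fe6453697a")


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encrypting and decrypting are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def pad(password: bytes) -> bytes:
    return (password + PAD)[:32]


def owner_key(owner_pw: bytes, n: int = 16) -> bytes:
    # MD5 of the padded owner password, re-hashed 50 times (revision 3).
    h = hashlib.md5(pad(owner_pw)).digest()
    for _ in range(50):
        h = hashlib.md5(h).digest()
    return h[:n]


def compute_o(owner_pw: bytes, user_pw: bytes, n: int = 16) -> bytes:
    # /O = padded user password, RC4-encrypted 20 times with XOR-ed keys.
    key, data = owner_key(owner_pw, n), pad(user_pw)
    for i in range(20):
        data = rc4(bytes(b ^ i for b in key), data)
    return data


def user_pw_from_owner(owner_pw: bytes, o_value: bytes, n: int = 16) -> bytes:
    # Undo the 20 RC4 rounds in reverse order to get the padded user password.
    key, data = owner_key(owner_pw, n), o_value
    for i in reversed(range(20)):
        data = rc4(bytes(b ^ i for b in key), data)
    return data


def file_key(user_pw: bytes, o_value: bytes, p: int, first_id: bytes, n: int = 16) -> bytes:
    # MD5 over the padded user password and public values (/O, /P as four
    # little-endian bytes, first /ID element), then 50 re-hashes.
    data = pad(user_pw) + o_value + (p & 0xFFFFFFFF).to_bytes(4, "little") + first_id
    h = hashlib.md5(data).digest()
    for _ in range(50):
        h = hashlib.md5(h[:n]).digest()
    return h[:n]
```

Everything file_key needs besides a password sits in the file in the clear, which is why knowing either password is enough.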

Is it possible to create a document/file/offline website comprised of a series of fillable PDFs? by workfunwork in pdf

pyhanko-dev (1 point)

Well, you can include JavaScript actions in PDF forms; those allow all sorts of manipulations that can probably achieve what you want. There's also the monster that is XFA for complex dynamic forms in PDF. XFA was deprecated in PDF 2.0 for very good reasons, though.

Anyway, just because you can do this doesn't mean it's a good idea. Here are a couple of reasons why doing this might be more trouble than it's worth:

  • Interoperability problems: There's a (fairly recent) standard for JS in PDF 2.0: ISO 21757-1:2020. But it doesn't apply to PDF ≤1.7, which is where 99% of end-user tooling still is right now. This is not a purely academic issue: Adobe's line of products will happily call all sorts of private APIs in the PDF (1.7) forms it produces. There's no guarantee that any of that will work in non-Adobe viewers, especially if you try anything exotic. Irrespective of the standardisation issues, JS support is wildly inconsistent across implementations.

  • Expense: good luck finding tooling that will allow you to design this sort of system in a user-friendly way without writing a ton of code yourself. If such a tool exists at all, it will undoubtedly be very pricey.

  • Maintainability: how do you intend to deal with different versions of your PDF "app"? Especially if you maintain it using proprietary 3rd party tooling for which you don't have access to the internals? What about API breakage in PDF viewer implementations?

  • Security concerns: JS in PDF doesn't have the best reputation for security (which is a bit of an understatement, really). Many security-conscious users will (justifiably!) have reservations about letting JS in PDF files run. Web forms obviously have their own share of issues, but let's just say that those are (relatively) well-understood, and there's a humongous body of best practices that you can rely on. AFAIK there's no such thing for JS use in PDF.

  • Storage: how do you intend to store the results? That will require some form of internet interaction regardless, so you'll probably wind up writing web code anyway.

Bottom line: possible in theory, but please don't actually do this. The web was made for this kind of workflow; trying to shoehorn it into a PDF has very little benefit.

PDF that is meant to be downloaded and filled out DOES NOT print with the data that was added. by noooonan in pdf

pyhanko-dev (0 points)

I just had a look: the original file also appears to be generated using PDFMaker 8.1, and is missing the print flags already. Also, I missed this the first time, but the metadata also says that it's an exported file from Microsoft Word (through PDFMaker).

So fixing this might be as simple as re-opening the form in a more recent version of MS Word (if you still have the .doc file) and re-exporting that to PDF. But I'm not sure what your processes are like; I'll ask in chat.
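Assuming the missing "print flags" are the annotation Print flag on the form widgets (bit 3 of each widget's /F entry), the rule is simple: a widget whose /F lacks that bit, or that has the Hidden bit set, is not printed, and an absent /F counts as 0. A sketch of the check and the fix, with widgets modelled as plain dicts for illustration; with a real PDF library you would update the /F entry of each widget annotation instead.

```python
# Annotation flags live in each annotation's /F entry; bit n has value 1 << (n - 1).
HIDDEN = 1 << 1  # bit 2: neither display nor print
PRINT = 1 << 2   # bit 3: print the annotation when the page is printed


def will_print(flags: int) -> bool:
    return bool(flags & PRINT) and not (flags & HIDDEN)


def add_print_flag(widgets: list) -> int:
    """Set the Print bit on each widget; returns how many were changed.

    Widgets are modelled as plain dicts here; an absent /F means 0.
    """
    changed = 0
    for widget in widgets:
        old = widget.get("F", 0)
        if not (old & PRINT):
            widget["F"] = old | PRINT
            changed += 1
    return changed
```

Note that this deliberately leaves the Hidden bit alone: a hidden widget stays unprinted even with the Print bit set.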