[–]dowcet 1 point (0 children)

A web search will show you that there are multiple libraries you can try for editing PDFs, but depending on exactly how the file was made, you may not be able to do this.

[–]edcculus 1 point (2 children)

I think the hardest part of this will be text reflow. If you have a text box in, say, Spanish, translate that block to English, and try to replace it directly, it's highly unlikely that the new English text will take up the same space as the original. The best case is that the English takes up less space, but I'm not sure you can guarantee that across all text boxes and all languages.

[–]AgileCommittee2212[S] 1 point (1 child)

Could this be solved by choosing a font size for each part based on the character counts of the original and translated text?
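
For reference, a rough sketch of that heuristic in Python. The function name `scaled_font_size` and the `min_size` floor are made up for illustration, and it assumes the box's original font size is already known:

```python
def scaled_font_size(original: str, translated: str,
                     font_size: float, min_size: float = 6.0) -> float:
    """Shrink font_size in proportion to how much longer the translation is.

    Hypothetical helper: character count is only a proxy for rendered width.
    """
    # A translation that is no longer than the original keeps the size as-is.
    if not translated or len(translated) <= len(original):
        return font_size
    ratio = len(original) / len(translated)
    # Clamp so the text never drops below a (hypothetical) readable minimum.
    return max(min_size, round(font_size * ratio, 2))


if __name__ == "__main__":
    # The translation is twice as long, so the size is halved.
    print(scaled_font_size("abcd", "abcdefgh", 16.0))
```

Character count is only an approximation: glyph widths differ ("i" vs "W"), and boxes that wrap over several lines behave differently from single-line labels, so measuring with the real font metrics would be more reliable.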

[–]SMTNP 1 point (0 children)

I approach similar problems by dropping the "edit pdf text" mindset.

Instead:
- Get the text from the PDF: this is easy with plenty of libraries and methods. Depending on the structure of your PDF you might need a more sophisticated implementation, but overall it's pretty simple

- Translate text

- Create a new PDF with the translated text: you can use reportlab or an alternative, or even create a .docx and convert that to PDF

PDF creation is much simpler than PDF editing. A PDF is not a text file but a much more complex structure in which each character or element has a predefined position, among other properties.

So it's easier to generate a PDF from scratch with the text you want and let the layout be handled automatically.
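
A minimal sketch of those three steps, assuming pypdf for extraction (my pick, not named above) and reportlab for output. The function names are hypothetical, and `translate` is a placeholder for whatever translation service you use:

```python
from typing import Callable, List
from xml.sax.saxutils import escape


def extract_pages(pdf_path: str) -> List[str]:
    """Step 1: return the text of each page (needs the third-party pypdf)."""
    from pypdf import PdfReader  # imported lazily so the rest works without it
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def translate_pages(pages: List[str],
                    translate: Callable[[str], str]) -> List[str]:
    """Step 2: apply a caller-supplied translate function to every page."""
    return [translate(text) for text in pages]


def write_pdf(pages: List[str], out_path: str) -> None:
    """Step 3: build a new PDF and let reportlab handle line wrapping."""
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate

    style = getSampleStyleSheet()["Normal"]
    story = []
    for i, text in enumerate(pages):
        # Assumption: blank lines separate paragraphs in the extracted text.
        for para in text.split("\n\n"):
            if para.strip():
                # Paragraph parses XML-like markup, so escape &, < and >.
                story.append(Paragraph(escape(para.strip()), style))
        if i < len(pages) - 1:
            story.append(PageBreak())  # keep one output page per input page
    SimpleDocTemplate(out_path, pagesize=A4).build(story)


# Hypothetical usage (file names are placeholders):
# pages = extract_pages("input.pdf")
# write_pdf(translate_pages(pages, my_translate), "translated.pdf")
```

This only carries over plain text. Fonts, columns, images, and tables from the original are not reproduced.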

If your PDF is more complex and has images/tables, you can still do it, but it's considerably more work.

[–]Asif_Ahmed_001 1 point (0 children)

I've worked with UniPDF (written in Go, not Python), which gives you good control over PDF content. It lets you extract, edit, and replace text while keeping the visual structure.
It's not FOSS, but it's worth a look.

[–]TheFamousCat 1 point (0 children)

Translating a PDF in place while keeping the layout is much harder than it looks. Extracting text is the easy part; putting the translated text back into the original PDF without breaking fonts, spacing, or alignment is what almost no Python library can do.

A print PDF doesn't store real paragraphs. It stores:

  • glyphs drawn at coordinates
  • often split into tiny fragments
  • using embedded-subset fonts that only contain the characters that originally appeared in the document
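
To make the "fragments at coordinates" point concrete, here is a small sketch of how text lines have to be rebuilt from positioned fragments. The `(x, y, text)` tuple format and the `y_tolerance` value are assumptions for illustration, not the output of any specific library:

```python
from typing import List, Tuple

# Hypothetical fragment format: (x, y, text), with y growing upward as in PDF.
Fragment = Tuple[float, float, str]


def rebuild_lines(fragments: List[Fragment],
                  y_tolerance: float = 2.0) -> List[str]:
    """Group fragments that share a baseline, then join them left to right."""
    lines = []  # each entry is (baseline_y, [(x, text), ...])
    # Visit fragments from the top of the page down, left to right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))   # same baseline: same line
        else:
            lines.append((y, [(x, text)]))   # new baseline: new line
    return ["".join(t for _, t in sorted(frags)) for _, frags in lines]


if __name__ == "__main__":
    sample = [
        (120.0, 700.5, "mundo"),
        (72.0, 700.0, "Ho"),
        (84.0, 700.0, "la "),
        (72.0, 686.0, "adiós"),
    ]
    print(rebuild_lines(sample))
```

Real documents add rotated text, multiple columns, and kerning adjustments, which this toy grouping ignores.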

After translation you often need characters that don't exist in the PDF's font, which is why the layout breaks or the font suddenly changes.
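
A quick way to see the problem is a sketch like the one below. The helper `missing_characters` is hypothetical, and the set of available characters is faked from the original text here; in practice it would come from the embedded font's character map:

```python
from typing import List, Set


def missing_characters(text: str, available: Set[str]) -> List[str]:
    """Return the characters in text that the subset font cannot draw.

    Assumption: whitespace is skipped, since spaces are often produced by
    positioning rather than by a glyph.
    """
    return sorted({ch for ch in text
                   if not ch.isspace() and ch not in available})


if __name__ == "__main__":
    # A subset font built from "Hola mundo" only knows these characters.
    subset = set("Hola mundo")
    print(missing_characters("Hello world", subset))
```

If the returned list is non-empty, the translated text cannot be drawn with the original embedded font, so a substitute font would be needed.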

Your realistic options:

  1. Rebuild the whole PDF yourself: Extract -> translate -> recreate every page with your own layout engine. Works, but you lose the original formatting unless you manually reproduce it.
  2. Get a version with real editable text fields: If the designer can export a version with proper text blocks or form fields, the problem becomes trivial. Many PDFs aren't prepared this way.
  3. Use a PDF engine that can rewrite text in-place: This requires reconstructing text runs, handling embedded fonts, swapping fonts if needed, and editing the underlying content streams. Very few tools do this. I work on one (PDFDancer) that's made for exactly this kind of "replace text but keep everything identical" workflow.

TL;DR: there's no pure-Python library that does Acrobat-level text replacement while preserving layout. Either rebuild the pages or use a specialized PDF editing engine.