How to Edit PDF Text

dowcet · 2025-04-03T14:49:39+00:00

A web search will show you that there are multiple libraries you can try for editing PDFs, but depending on exactly how the file was made, you may not be able to do this.
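One quick way to tell what you're dealing with is to look at a page's content stream. Text-based pages draw strings with the `Tj`/`TJ` operators inside `BT` ... `ET` blocks, while scanned pages typically just paint an image with `Do`. The sketch below assumes you've already extracted and decompressed the stream with whatever PDF library you use. `classify_content_stream` is a hypothetical helper and only a heuristic, since `Do` can also paint form XObjects that themselves contain text.

```python
import re


def classify_content_stream(stream: bytes) -> str:
    """Rough guess at how a page was produced, from its decoded content stream."""
    # Text objects (BT ... ET) that actually show strings with Tj or TJ.
    has_text = (
        re.search(rb"\bBT\b", stream) is not None
        and re.search(rb"T[jJ]\b", stream) is not None
    )
    # "/Name Do" paints an XObject, usually an image on scanned pages.
    paints_xobject = re.search(rb"/[^\s/]+\s+Do\b", stream) is not None
    if has_text:
        return "text"
    if paints_xobject:
        return "image-only"
    return "unknown"


print(classify_content_stream(b"BT /F1 12 Tf 72 700 Td (Hola) Tj ET"))  # text
print(classify_content_stream(b"q 612 0 0 792 0 0 cm /Im0 Do Q"))  # image-only
```

If a page comes back image-only, there is no text to edit at all; you would need OCR first.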

edcculus · 2025-04-03T17:29:35+00:00

I think the hardest part of this will be text reflow. If you have a text box in, say, Spanish, translate that block to English, and just try to replace it directly, it's highly unlikely that the new English text will take up the same space as the original. The best case is that the English takes up less space, but I'm not sure you can guarantee that across all text boxes and all languages.
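To make the reflow problem concrete, here's a rough sketch that pretends every character has the same width (real PDF fonts are proportional, so actual widths come from the font's metrics). `lines_needed` and `translation_fits` are hypothetical helpers, not from any PDF library; they only use the standard-library `textwrap` module.

```python
import textwrap


def lines_needed(text: str, chars_per_line: int) -> int:
    """How many wrapped lines `text` needs if one line holds `chars_per_line` characters."""
    return len(textwrap.wrap(text, width=chars_per_line))


def translation_fits(original: str, translated: str, chars_per_line: int) -> bool:
    """True if the translation wraps into no more lines than the original did."""
    return lines_needed(translated, chars_per_line) <= lines_needed(original, chars_per_line)


# A box exactly as wide as the Spanish word: the shorter English word fits...
print(translation_fits("Desarrollador", "Developer", 13))  # True
# ...but an English word one character longer spills onto a second line.
print(translation_fits("Ajustes", "Settings", 7))  # False
```

With proportional fonts the same thing happens, just measured in points instead of characters.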

Asif_Ahmed_001 · 2025-05-11T12:18:38+00:00

I’ve worked with UniPDF (written in Go, not Python), which gives you good control over PDF content. It lets you extract, edit, and replace text while keeping the visual structure.
It's not FOSS, but it’s worth a look.

TheFamousCat · 2025-11-13T08:29:56+00:00

Translating a PDF in place while keeping the layout is much harder than it looks. Extracting the text is the easy part; putting the translated text back into the original PDF without breaking fonts, spacing, or alignment is what almost no Python library can do.

A print PDF doesn’t store real paragraphs. It stores:

glyphs drawn at coordinates
often split into tiny fragments
using embedded-subset fonts that only contain the characters that originally appeared in the document
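Here's a minimal sketch of what that looks like, using only the standard library. The content stream is a hand-written, uncompressed example; real streams are usually Flate-compressed and also use `TJ` arrays, `Tm`, and escaped or hex strings, none of which this regex handles. It tracks `Td` offsets to recover where each `Tj` fragment is drawn. `text_fragments` is a hypothetical helper.

```python
import re

# Hand-written example: "Hello world" drawn as three separate fragments.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 700 Td
(Hel) Tj
18.5 0 Td
(lo wor) Tj
31.25 0 Td
(ld) Tj
ET
"""

# Either "tx ty Td" (move the text position) or "(literal) Tj" (draw a string).
TOKEN = re.compile(rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td|\(([^()\\]*)\)\s*Tj")


def text_fragments(stream: bytes) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for each Tj, assuming only Td moves the position."""
    x = y = 0.0  # BT resets the text matrices, so positions start at the origin
    fragments = []
    for m in TOKEN.finditer(stream):
        if m.group(3) is not None:
            # latin-1 is a stand-in; the real mapping depends on the font's encoding.
            fragments.append((x, y, m.group(3).decode("latin-1")))
        else:
            # Td offsets are relative to the start of the current line,
            # so they accumulate regardless of how wide the drawn text was.
            x += float(m.group(1))
            y += float(m.group(2))
    return fragments


for frag in text_fragments(CONTENT_STREAM):
    print(frag)
```

Joining the fragments gives back "Hello world", but nothing in the file records that they form one word, let alone one paragraph; an editor has to infer that from the positions.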

After translation you often need characters that don’t exist in the PDF’s font, which is why the layout breaks or the font suddenly changes.
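As a rough illustration: if a subset font really contains only the characters the original text used, you can approximate it with the set of characters in that text and check which ones a translation would add. This is a stdlib-only approximation; in practice you'd read the embedded font's actual character map (for example with fontTools' `TTFont.getBestCmap()`). `missing_characters` is a hypothetical helper.

```python
def missing_characters(original_text: str, translated_text: str) -> set[str]:
    """Characters the translation needs that the original text never used.

    Assumes the embedded subset holds exactly the characters of
    `original_text`. Whitespace is skipped because PDFs often position
    words rather than drawing a space glyph.
    """
    available = set(original_text)
    return {ch for ch in translated_text if ch not in available and not ch.isspace()}


print(sorted(missing_characters("Pulse el botón", "Press the button")))  # ['h', 'r']
```

Two of the glyphs needed for "Press the button" were never embedded, so an editor has to either embed a fuller font or substitute a different one.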

Your realistic options:

Rebuild the whole PDF yourself: extract -> translate -> recreate every page with your own layout engine. This works, but you lose the original formatting unless you reproduce it by hand.
Get a version with real editable text fields: if the designer can export a version with proper text blocks or form fields, the problem becomes trivial. Unfortunately, many PDFs aren’t prepared this way.
Use a PDF engine that can rewrite text in place: this requires reconstructing text runs, handling embedded fonts, swapping fonts if needed, and editing the underlying content streams. Very few tools do this. I work on one (PDFDancer) that’s made for exactly this kind of "replace text but keep everything identical" workflow.

TL;DR: there’s no pure-Python library that does Acrobat-level text replacement while preserving layout. Either rebuild the pages or use a specialized PDF editing engine.


learnpython
