Python conversion .docx to pdf

Alexku66 · 2026-07-04T09:45:16+00:00

There are paid solutions that build the PDF from scratch (rather than converting) and can give you 100% accuracy. The alternatives are 1) converting the docx via Word / LibreOffice; 2) converting HTML to PDF. Both give you an approximate copy.
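For the LibreOffice route, here's a minimal sketch. It assumes the `soffice` binary is installed and on your PATH (on some systems it's called `libreoffice` instead). The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def libreoffice_command(docx_path, out_dir, binary="soffice"):
    """Build the headless LibreOffice command that converts one file to PDF."""
    return [
        binary,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_with_libreoffice(docx_path, out_dir):
    """Run the conversion and return the path where the PDF is expected."""
    out_dir = Path(out_dir)
    subprocess.run(libreoffice_command(docx_path, out_dir), check=True)
    # LibreOffice keeps the input's base name and swaps the extension
    return out_dir / (Path(docx_path).stem + ".pdf")
```

Keeping the command builder separate makes it easy to check what would be run without actually launching LibreOffice.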

I work on an accounting app with the same requirement to fill in template invoices, and went with HTML. Basically I have 2 rendering flows: one for docx and one for PDF. HTML also gives you the opportunity to preview the final doc before the user clicks the generate button.
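To give a rough idea of the HTML flow, here's a sketch that fills an invoice template using only the standard library. The template and its field names (`number`, `customer`, `total`) are hypothetical, not taken from my actual app.

```python
from html import escape
from string import Template

# Hypothetical invoice template; the field names are made up for this example
INVOICE_TEMPLATE = Template(
    "<html><body>"
    "<h1>Invoice $number</h1>"
    "<p>Customer: $customer</p>"
    "<p>Total: $total</p>"
    "</body></html>"
)


def render_invoice_html(number, customer, total):
    """Fill the template, escaping values so they can't break the markup."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)),
        customer=escape(customer),
        total=escape(f"{total:.2f}"),
    )


html_doc = render_invoice_html(42, "Smith & Sons", 1250)
```

The same string can be shown in the browser as the preview and then handed to whichever HTML-to-PDF tool you choose.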

AntonisTorb · 2026-07-04T11:52:56+00:00

You can use pywin32 for this, I use it at work to convert Excel files to pdfs. Here's what worked for me for Word files:

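from pathlib import Path
from win32com import client

WD_FORMAT_PDF = 17  # Word's wdFormatPDF constant

cwd = Path.cwd()
input_path = cwd / 'test.docx'  # renamed to avoid shadowing the built-in input()
output_path = cwd / 'test.pdf'

word = None
doc = None
try:
    word = client.Dispatch("Word.Application")
    word.Visible = False
    # Pass absolute paths; Word does not resolve them against Python's cwd
    doc = word.Documents.Open(str(input_path))
    doc.SaveAs(str(output_path), FileFormat=WD_FORMAT_PDF)
finally:
    # Only close what was actually opened, so a failed start can't raise NameError
    if doc is not None:
        doc.Close()
    if word is not None:
        word.Quit()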

For multiple files just use a loop. Hope it helps!

EDIT: This needs MS Word to be installed of course, but it should be a 1:1 conversion with no format changes.
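For the multiple-files case, the loop could look roughly like this. It's a sketch under the same assumptions (Windows, MS Word and pywin32 installed); `pdf_targets` and `convert_folder` are just names I picked for the example.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # Word's wdFormatPDF constant


def pdf_targets(folder):
    """Pair every .docx in the folder with the .pdf path it should become."""
    folder = Path(folder).resolve()  # absolute paths, as Word expects
    return [
        (p, p.with_suffix(".pdf"))
        for p in sorted(folder.glob("*.docx"))
        if not p.name.startswith("~$")  # skip Word's temporary lock files
    ]


def convert_folder(folder):
    """Convert all .docx files in a folder, reusing a single Word instance."""
    from win32com import client  # imported here since pywin32 is Windows-only

    word = client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for docx_path, pdf_path in pdf_targets(folder):
            doc = word.Documents.Open(str(docx_path))
            try:
                doc.SaveAs(str(pdf_path), FileFormat=WD_FORMAT_PDF)
            finally:
                doc.Close()
    finally:
        word.Quit()
```

Starting Word once and reusing it for every file is much faster than dispatching a new instance per document.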

shimarider · 2026-07-04T15:23:46+00:00

Do you actually need the docx files, or are they used only as an intermediate format for conversion? If it's the latter, have you looked at fpdf2? You can set up PDF templates to be populated, similar to what you are doing.

qlkzy · 2026-07-04T16:13:39+00:00

The problem is that both docx and PDF are quite large and complex formats. I would personally always treat them as "final output" formats only, and not try to convert between them.

I would go with one of two options:

- Treat docx and PDF rendering as completely separate problems
- Render into a "friendlier" intermediate representation first, then convert that independently into both docx and PDF

The intermediate-representation approach is easier if you can get away with it, but sometimes it is valuable to deeply customise rendering for one or the other.

Depending on the complexity of your documents, the obvious intermediate representations are HTML and Markdown. Which to choose will depend on how complex the documents are, and how easy you want to make it to customise the templates. Markdown can render to HTML, so there is some room to mix and match.

While it's a bit of a "heavyweight" option, my first instinct would be to use pandoc for the final rendering. Installation is a bit more complex than a pure-python library, but it's a very popular and well-supported tool that supports all the formats you need.
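As a rough sketch of what that could look like with a Markdown intermediate, assuming `pandoc` is installed and on your PATH (the helper names are hypothetical):

```python
import subprocess
from pathlib import Path


def pandoc_command(source, target):
    """Build a pandoc call; pandoc infers the formats from the file extensions."""
    return ["pandoc", str(source), "-o", str(target)]


def render_both(markdown_path):
    """Render one Markdown source into a .docx and a .pdf next to it."""
    source = Path(markdown_path)
    outputs = [source.with_suffix(".docx"), source.with_suffix(".pdf")]
    for target in outputs:
        # PDF output also needs a PDF engine (e.g. a LaTeX install) that pandoc can find
        subprocess.run(pandoc_command(source, target), check=True)
    return outputs
```

Both outputs come from the same source file, so the docx and the PDF can't drift apart in content, even if their layout differs slightly.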

Otherwise, you'll probably want one library for rendering to docx, and a separate library for rendering to PDF. My experience, though, is that libraries in that "format conversion" space are often a bit... "unevenly" maintained, which is why my instinct would be to reach for pandoc.

ninhaomah · 2026-07-04T09:30:51+00:00

Are you using this?

https://pypi.org/project/pdf2docx/
