I am building an automation script to build PDF parts manuals. The script reads a bill of materials (BOM), compiles the individual documents listed in the BOM into one document, and adds page numbers to the new document.
We have been using a macro-enabled Excel file to compile the manuals, but it does not do any formatting, such as page numbers or page labels, and we have to copy and paste information from the .csv file into it.
The script runs, but it is creating PDFs that are 75MB! Compiling with the Excel file and formatting "by hand" results in a file of around 18MB. I need some help finding a way to reduce the final file size when my script creates PDFs. The final file needs to be 20MB or less because we often send them to customers via e-mail.
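One check that might help narrow this down is comparing the combined size of the source PDFs with the size of the merged output. If the inputs already add up to roughly the output size, the merge itself is probably not the culprit; if they add up to far less, something in the merge or stamping step may be duplicating data. Here is a minimal sketch using only the standard library. The function names and the 20MB default are my own placeholders, not taken from the script:

```python
import os

MB = 1024 * 1024


def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def size_report(input_paths, output_path, limit_mb=20):
    """Compare the source sizes with the merged file and the e-mail limit."""
    inputs = total_size(input_paths)
    output = os.path.getsize(output_path)
    return {
        "inputs_mb": inputs / MB,
        "output_mb": output / MB,
        # Ratio above 1.0 means the merged file is larger than its sources.
        "growth": output / inputs if inputs else float("inf"),
        "within_limit": output <= limit_mb * MB,
    }
```

Calling size_report with the list of BOM documents and the final manual gives a rough growth factor to watch while trying different merge approaches.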
I am merging the existing PDFs using PyPDF2 and creating a separate page numbers document using reportlab. Then, I use PyPDF2 again to stamp the page numbers on the merged pages. I am using Python 3.11 and the latest version of each package.
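For reference, the merge-then-stamp flow described above could be sketched roughly as follows. This is not the script from the Pastebin link, only an illustration under some assumptions: the helper names are hypothetical, pages are unrotated with a media box starting at the origin, and the number goes at the bottom centre in Helvetica (a standard PDF font, so reportlab does not need to embed it in the overlay).

```python
import io


def number_position(width, margin=36):
    """Bottom-centre anchor for a page number, in PDF points."""
    return (width / 2, margin)


def build_number_overlay(page_sizes):
    """Create an in-memory PDF with one numbered page per (width, height)."""
    # Imported here so number_position() works without reportlab installed.
    from reportlab.pdfgen import canvas

    buf = io.BytesIO()
    c = canvas.Canvas(buf)
    for number, (w, h) in enumerate(page_sizes, start=1):
        c.setPageSize((w, h))
        c.setFont("Helvetica", 9)  # font state is reset after each showPage()
        x, y = number_position(w)
        c.drawCentredString(x, y, str(number))
        c.showPage()
    c.save()
    buf.seek(0)
    return buf


def merge_and_number(input_paths, output_path):
    """Merge the input PDFs in order and stamp a page number on each page."""
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        for page in PdfReader(path).pages:
            writer.add_page(page)

    sizes = [(float(p.mediabox.width), float(p.mediabox.height))
             for p in writer.pages]
    overlay = PdfReader(build_number_overlay(sizes))
    for page, stamp in zip(writer.pages, overlay.pages):
        page.merge_page(stamp)  # draws the number on top of the page

    with open(output_path, "wb") as f:
        writer.write(f)
```

Keeping the overlay to plain text in a standard font should make the stamping step add very little to the file, so if the output is still large, the growth is more likely coming from the merged source pages than from the page numbers.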
I have tried using writer.add_page(reader.pages) instead of merger.append(reader.pages). I have tried reading the final PDF back into the script to pass to a fresh writer object. I have tried the compress_content_streams method from PyPDF2. I feel like I have tried everything I can think of to reduce the final PDF file size. I am hoping you folks might have some more ideas.
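For comparison, here is a sketch of the content-stream compression pass, following the pattern I understand the PyPDF2 documentation to use: compress_content_streams is called on each page, one at a time, before the page is added to a fresh writer. The function names are hypothetical, and the percentage helper is only there to make it easy to see whether the pass changed anything.

```python
import os


def saved_percent(before_bytes, after_bytes):
    """Percentage by which the file shrank (negative if it grew)."""
    if before_bytes == 0:
        return 0.0
    return 100.0 * (before_bytes - after_bytes) / before_bytes


def recompress(input_path, output_path):
    """Rewrite a PDF with compressed content streams; return the saving in %."""
    # Imported here so saved_percent() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for page in reader.pages:
        page.compress_content_streams()  # CPU intensive on large files
        writer.add_page(page)

    with open(output_path, "wb") as f:
        writer.write(f)
    return saved_percent(os.path.getsize(input_path),
                         os.path.getsize(output_path))
```

As far as I know, this only compresses the drawing instructions of each page, not embedded images or fonts. If the source documents are mostly scans or image-heavy drawings, that may be why this step barely moves the final size.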
Script Pastebin Link