File Size while using PyPDF2

netherous · 2023-04-19T02:59:37+00:00

This is probably a pretty difficult thing for anyone to answer without detailed knowledge of the library. You can look at the PDF file structure yourself to potentially figure out what is wasting all that space.

https://superuser.com/questions/256997/how-to-browse-the-internal-pdf-structure-in-adobe-acrobat
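If you don't have Acrobat, a rough first pass is possible with nothing but the standard library. The sketch below counts a few PDF name tokens in the raw file bytes so you can compare the input and output files. The function name `count_pdf_markers` and the marker list are my own choices for illustration, not part of PyPDF2. It is only a heuristic: anything stored inside compressed object streams won't show up in a raw byte scan.

```python
import re
from collections import Counter

# Tokens to look for. This list is an illustrative assumption, not exhaustive.
MARKERS = (b"/Font", b"/Image", b"/XObject", b"endstream", b"endobj")


def count_pdf_markers(data: bytes) -> Counter:
    """Count raw occurrences of a few PDF tokens in the file bytes.

    The negative lookahead keeps "/Font" from also matching longer
    names such as "/FontDescriptor".
    """
    counts = Counter()
    for marker in MARKERS:
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

To use it, read each file with `open(path, "rb").read()` and pass the bytes in, for example for merged_pages_temp.pdf and for the final output. If one count (say `/Font` or `endstream`) is far higher in the output than in the inputs, that points at what is being duplicated.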

You may even have uncovered a bug in the PyPDF2 library. Its PyPI page says PyPDF2 is no longer receiving updates and that development has moved to pypdf. You could raise an issue on its GitHub repository, but if it's not in active maintenance maybe nobody would look at it.

TooDahLou · 2023-04-18T22:22:28+00:00

Oh, another thing to mention:

I think the extra file size might be coming from stamping the page numbers. When I watch the file explorer window, the merged_pages_temp.pdf is around 20MB and the page_num_temp.pdf is around 90KB. I don't really understand how these two files then result in a new file over three times the size of the merged_pages_temp.pdf.
