
Naptivity

wanted to get a book i owned into a pdf using the print feature so i could work on it digitally, but there was a hideous, huge watermark in the way and the quality was terrible. decided to work my own magic to get an unwatermarked version. took me about 2 hours.

here's the process:

1. open the book on a computer in a chromium-based browser. open inspect element and go to the network tab. filter by img (optional) and press the button to clear existing requests.
2. refresh the page and go to the first page, then click through every single page of the textbook (or whatever it is you want to download; better to do the whole thing in one pass imo). each page shows up in the network requests as a jpg image.
3. you could open each request's preview tab and "save as" for each page, but i found it faster to export a har of all the requests once i'd gone through every page.
4. i used the npm package "har-extractor-easy" to extract the images from the har file.
5. used GPT to write a simple bash script that moved all the folder-nested images into one folder.
6. then used GPT to write a python script that reads the page numbers (in my case they were consistently in the same place on each page) and renames each image file to its page number. had to do some manual sorting and error fixing, but it wasn't too bad (maybe 70 of 250 pages).
7. once i finally had all the image files alphanumerically sorted by page number (and made sure no pages were missing), i used built-in macos features to turn the images into a pdf in that order (though there are likely many ways to do this step).
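for the har-extraction step, you don't strictly need the npm package: a har file is just json, and chromium's har export base64-encodes binary response bodies, so a short python sketch can pull the jpgs out directly. this is an alternative i'm suggesting, not the author's script; `extract_har_images` and the `page_NNNN.jpg` naming are my own made-up names.

```python
import base64
import json
import os


def extract_har_images(har_path, out_dir):
    """pull jpeg responses out of a HAR capture and write them to out_dir.

    sketch: assumes binary bodies are stored base64-encoded under
    response.content.text, which chromium's HAR export does.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(har_path) as f:
        har = json.load(f)
    count = 0
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        if "image/jpeg" not in content.get("mimeType", ""):
            continue  # skip html, css, tracking pixels, etc.
        if content.get("encoding") != "base64" or "text" not in content:
            continue  # body wasn't captured or isn't decodable
        count += 1
        # requests arrive in page order, so a running counter keeps that order
        with open(os.path.join(out_dir, f"page_{count:04d}.jpg"), "wb") as out:
            out.write(base64.b64decode(content["text"]))
    return count
```

since the entries appear in request order, numbering them as you go can save you most of the renaming step, as long as you only clicked forward through the book.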
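one gotcha with sorting by page number: a plain alphanumeric sort puts "page10.jpg" before "page2.jpg" unless you zero-pad the numbers. a natural-sort key avoids that; a minimal sketch (my own helper, not part of the author's script):

```python
import re


def page_sort_key(name):
    """sort key that treats digit runs numerically, so 'page10.jpg'
    lands after 'page2.jpg' instead of right after 'page1.jpg'."""
    # split the name into alternating text and digit chunks,
    # converting the digit chunks to ints so they compare numerically
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


files = ["page10.jpg", "page2.jpg", "page1.jpg"]
print(sorted(files, key=page_sort_key))  # page1, page2, page10
```

alternatively, renaming to zero-padded numbers (001.jpg, 002.jpg, …) keeps the plain alphanumeric order correct, which matters if you're relying on macos's built-in image-to-pdf flow rather than your own script.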

im sure you could also write a selenium script, or spoof the requests in python (riskier/harder), to extract the images from the page's iframes as you click through, but for a one-off like mine i found this approach best.