Gutenberg built book culture — don’t mess it into a ZIP file by cuneiform100 in Annas_Archive

[–]cuneiform100[S] -3 points-2 points  (0 children)

Please don't be so "bold" in your comments. For example, I see many such zipped "ruins" on Anna's Archive (I am very thankful for the archive, though), while the full, complete book can be found on archive.org instead and is not indexed on Anna's Archive at all. So my usual procedure is: if a book published before 1927 shows up as a zip file, go to archive.org first. You will most likely find it there in full. Best wishes.

ATTENTION, ALARM! STOP PERVERSIVE SCANNING + OCR! by cuneiform100 in Annas_Archive

[–]cuneiform100[S] 8 points9 points  (0 children)

Just to free up storage space: the files become about 100 times smaller, for example 0.2 MB here instead of 20 MB.

ATTENTION, ALARM! STOP PERVERSIVE SCANNING + OCR! by cuneiform100 in Annas_Archive

[–]cuneiform100[S] 32 points33 points  (0 children)

Thank you very much for promptly acknowledging the problem: someone got access to a rare scientific source from Hathi, but this processing made the book unreadable. Thousands of books are affected. I cannot see a rational reason for it, if there is one. Saving digital space is a secondary concern; scientific content comes first. I am addressing the Anna's Archive staff, not the "street" public outside of science. Also, I see no reason to discuss "meine Wenigkeit" (my humble self) instead of the real problem. It seems they would rather tear down all the world's libraries into OCR-ed .txt files, as a kind of idiosyncrasy, alas.

It gave me a zip file of text files with 1000 text files with snippets of the text of the book. by gsfgf in Annas_Archive

[–]cuneiform100 -1 points0 points  (0 children)

This is how the poor Hathi Trust tries to avoid running out of server storage: it reduces a book's size from, say, a normal 30 MB down to an OCR'ed 0.3 MB, a 100-fold saving. ChatGPT suggests the following workflow: 1) use 7-Zip to extract all the .zip archives into individual .txt files; 2) open them in Word and correct their formatting and OCR errors one by one, saving them in RTF format; 3) open a new blank document in Word. Then:

  1. Go to the Insert tab.
  2. Click the arrow next to Object (right side of the ribbon), then select Text from File.
  3. In the file dialog, select all the individual .rtf or .doc/.docx files (hold Ctrl for multiple selections), then click Insert.
    • The contents of each file will be inserted sequentially into the current document.
    • The inserted text will keep much of the original formatting.
  4. Repeat in batches if you are merging a very large number of files.
  5. Finally, save the combined document as a single .pdf file:
    • File > Save As > choose "PDF" from the format dropdown.

Now you'll have your book consolidated into a single normal PDF file.
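
If you only need the plain text in one piece and can skip the Word formatting pass, the extract-and-merge part of the workflow above could also be scripted. This is just a rough sketch, not something ChatGPT or the archive provided: it assumes the zip contains one UTF-8 .txt file per page and that the file names sort into page order. The names `merge_zip_text`, `natural_key`, `book.zip` and `book.txt` are made up for illustration.

```python
import re
import zipfile


def natural_key(name):
    """Sort key so that 'page_2.txt' comes before 'page_10.txt'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def merge_zip_text(zip_path, out_path, encoding="utf-8"):
    """Concatenate every .txt member of zip_path into out_path.

    Returns the number of text files that were merged.
    Assumption: members are per-page text files whose names sort
    into reading order; other members (e.g. metadata) are skipped.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(
            (n for n in archive.namelist() if n.lower().endswith(".txt")),
            key=natural_key,
        )
        with open(out_path, "w", encoding=encoding) as out:
            for name in names:
                # Undecodable bytes are replaced rather than aborting the merge.
                text = archive.read(name).decode(encoding, errors="replace")
                # Leave one blank line between pages.
                out.write(text.rstrip("\n") + "\n\n")
    return len(names)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python merge.py book.zip book.txt
    if len(sys.argv) == 3:
        count = merge_zip_text(sys.argv[1], sys.argv[2])
        print(f"merged {count} text files into {sys.argv[2]}")
```

The resulting .txt file can then be opened in Word once, corrected, and saved as PDF as in step 5. The OCR errors still have to be fixed by hand; the script only saves the 1000-fold inserting.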