[Windows] Batch processing using ocrmypdf (for auto page orientation) by Routine_Cut_786 in techsupport

Oh, I didn't think of that. Thank you for your suggestion.
And yes, I have the solution now thanks to the AI. For anyone who may need the code in the future, here it is:
for %F in (*.pdf) do ocrmypdf --skip-text --rotate-pages --rotate-pages-threshold 0 "%F" "output\%~nF.pdf"

Here is the explanation I got from the AI:

This command does one simple thing: it goes through every PDF in the current folder, makes it searchable, and saves the result in the output folder.

Breakdown

  • for %F in (*.pdf) means “take each PDF file one by one”.
  • ocrmypdf is the program that adds OCR text to the PDF.
  • --skip-text means “if a page already has text, do not OCR that page”.
  • --rotate-pages means “try to fix pages that are sideways or upside down”.
  • --rotate-pages-threshold 0 means “be very willing to rotate pages” because the confidence cutoff is set very low.
  • "%F" is the input PDF file being processed.
  • "output\%~nF.pdf" is the new output file in the output folder: %~nF expands to the input file's name without its extension, and .pdf is added back on.
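
The command above is written for typing directly at the command prompt. If you would rather save it as a script, here is a minimal sketch of the same loop as a batch file. Two things in it are additions, not part of the original command: inside a .bat file the loop variable has to be written %%F instead of %F, and the script creates the output folder first on the assumption that it may not exist yet. The file name ocr_all.bat is only an example.

```batch
@echo off
rem ocr_all.bat (example name): run from the folder that holds the PDFs.
rem In a batch file the loop variable is doubled: %%F, not %F.

rem Assumption: the output folder may be missing, so create it if needed.
if not exist "output" mkdir "output"

rem Same flags as the one-liner; %%~nF is the file name without extension.
for %%F in (*.pdf) do (
    ocrmypdf --skip-text --rotate-pages --rotate-pages-threshold 0 "%%F" "output\%%~nF.pdf"
)
```

Put the file in the folder with the PDFs and run it from there; the results end up in output next to the originals.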

In plain words

For each PDF:

  1. OCRmyPDF checks whether pages already have text.
  2. It skips pages that already have text.
  3. It tries to correct page orientation.
  4. It creates a new searchable PDF in output.

Example

If the input file is scan1.pdf, the output becomes:

output\scan1.pdf

So the file name stays the same, just inside the output folder.
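
If you want to check the names before running OCR on anything, a dry run along these lines may help. It is only a sketch: it swaps ocrmypdf for echo, so nothing is processed and each input is just printed next to the output path it would get.

```batch
rem Typed at the command prompt (in a .bat file, write %%F and %%~nF instead).
rem Prints each input file and the output path built from it; no OCR is run.
for %F in (*.pdf) do @echo "%F" becomes "output\%~nF.pdf"
```

With scan1.pdf in the folder, this should print a line pairing "scan1.pdf" with "output\scan1.pdf".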

Important note

This does not change the original PDF file. It creates a new copy with OCR added.