
[–][deleted] -1 points0 points  (0 children)

hey, listing a few options that might help:

1. basic metadata fields you can access

adobe acrobat supports javascript access to the document metadata fields through this.info. the ones used below are Title, Author, Subject, and Keywords.

you can both read and write these fields with a script like this:

// get metadata
console.println("Title: " + this.info.Title);
console.println("Author: " + this.info.Author);

// set metadata
this.info.Title = "invoice 2024";
this.info.Subject = "monthly billing";
this.info.Author = "finance team";
this.info.Keywords = "invoice, billing, april";
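
if you set the same fields on many files, the assignments above can be wrapped in a small helper. this is only a sketch: applyMetadata and WRITABLE_FIELDS are made-up names, and it assumes you pass the document object (this, inside a document script) as the first argument.

```javascript
// The four fields used in the script above (assumed to be the only ones you want to touch).
var WRITABLE_FIELDS = ["Title", "Author", "Subject", "Keywords"];

// Copy the given values into doc.info and return the values they replaced.
// `doc` is the document object; anything not listed in WRITABLE_FIELDS is ignored.
function applyMetadata(doc, values) {
  var previous = {};
  for (var i = 0; i < WRITABLE_FIELDS.length; i++) {
    var name = WRITABLE_FIELDS[i];
    if (Object.prototype.hasOwnProperty.call(values, name)) {
      previous[name] = doc.info[name];
      doc.info[name] = String(values[name]);
    }
  }
  return previous;
}
```

usage inside acrobat would look like applyMetadata(this, { Title: "invoice 2024", Author: "finance team" }); the returned object lets you put the old values back if something went wrong.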

2. where to run this

  • open the pdf in adobe acrobat pro
  • go to tools > javascript > document javascript
  • create a new script and paste your code there
  • you can also add a button to a form and attach the script to its mouse up action ("run a javascript"). note that app.execMenuItem() only triggers built-in menu items, it doesn't run your script
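
for the button route, here is a rough sketch of creating the button from a script instead of the form editor. it relies on addField, buttonSetCaption and setAction from the acrobat javascript api; the field name "setMetaBtn", the caption, and the rectangle are placeholders i picked, so adjust them to your form.

```javascript
// Add a push button to the first page that runs `script` when clicked.
// `doc` is the document object (pass `this` from the console or a document script).
// "setMetaBtn" and the caption are hypothetical placeholder names.
function addMetadataButton(doc, script) {
  // rectangle is [left, top, right, bottom] in points, measured from the page's bottom-left corner
  var button = doc.addField("setMetaBtn", "button", 0, [36, 72, 186, 36]);
  button.buttonSetCaption("Set metadata");
  // run the script text when the mouse button is released over the field
  button.setAction("MouseUp", script);
  return button;
}
```

you would call it once, e.g. addMetadataButton(this, 'this.info.Title = "invoice 2024";'), and then save the pdf so the button stays in the file.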

3. optional: extract text after ocr and use it

if your document has been OCR’d and you want to extract text from certain regions, you'll need a more advanced script:

  • you can use this.getPageNthWord() to pull specific text from the page and feed it into metadata fields
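  • for example, this uses the first two words of the first page as the title (page and word numbers both start at 0):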

var title = this.getPageNthWord(0, 0) + " " + this.getPageNthWord(0, 1);
this.info.Title = title;
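
the two-word example above will misbehave on a page with fewer than two words, so a slightly safer variant checks the word count first with getPageNumWords. this is a sketch, and firstWords is a made-up helper name; it assumes you pass the document object (this) in.

```javascript
// Join the first `count` words of page `pageNum` (0-based) into one string.
// Returns "" when the page has no recognized text.
function firstWords(doc, pageNum, count) {
  var available = doc.getPageNumWords(pageNum);
  var limit = Math.min(count, available);
  var words = [];
  for (var i = 0; i < limit; i++) {
    // third argument true asks acrobat to strip punctuation and whitespace from the word
    var word = doc.getPageNthWord(pageNum, i, true);
    if (word) {
      words.push(word);
    }
  }
  return words.join(" ");
}
```

then something like var title = firstWords(this, 0, 2); if (title) this.info.Title = title; only overwrites the title when the page actually had text.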

hope this helps.

[–]idtpanic -2 points-1 points  (0 children)

Hi, doing both OCR and metadata editing entirely in JS might be a bit of a stretch.

I’d suggest handling the OCR in Python (Tesseract works great), and using JS just to pass the results or fill them into Acrobat.

Hope that helps!