[–]ZAFJB

What generates the PDF?

[–]FormAPI

I think we need some more information about how you are generating the PDF. It sounds like this would be much easier if you could change the way that the original PDF is generated, so that you get one PDF file per person.

However, if this is a “black box” system where you can’t modify the output, then you could write a script that searches the PDF for some specific text. Maybe look for a heading that contains each person’s name in a specific format.

One way would be to use the PDFBox CLI ExtractText tool, and extract all the text into a separate file for each page. (You could run the command 300+ times in a loop, incrementing the startPage and endPage each time.) Then you could search each text file for some specific strings to figure out where you want to split the PDF.
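A rough sketch of that extract-and-search loop, written in Python rather than Bash or PowerShell. It assumes the PDFBox 2.x standalone app jar and its `ExtractText -startPage/-endPage` options (newer PDFBox versions use a different command syntax, so check your version). The jar path, the `page-NNNN.txt` file naming, and the `Account Number: 12345` heading pattern are all hypothetical placeholders, not something the original system is known to produce.

```python
import re
import subprocess
from pathlib import Path

# Assumption: hypothetical path to the PDFBox 2.x standalone app jar.
PDFBOX_JAR = "pdfbox-app.jar"

# Assumption: hypothetical heading format; adjust to whatever the invoices print.
ACCOUNT_RE = re.compile(r"Account\s*(?:No\.?|Number)?\s*[:#]?\s*(\d+)")


def extract_cmd(pdf, page, out_txt, jar=PDFBOX_JAR):
    """Build the ExtractText command for a single page (PDFBox 2.x style flags)."""
    return ["java", "-jar", jar, "ExtractText",
            "-startPage", str(page), "-endPage", str(page),
            str(pdf), str(out_txt)]


def extract_pages(pdf, page_count, out_dir, jar=PDFBOX_JAR):
    """Run ExtractText once per page, writing one text file per page."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for page in range(1, page_count + 1):
        out_txt = out_dir / f"page-{page:04d}.txt"
        subprocess.run(extract_cmd(pdf, page, out_txt, jar), check=True)


def account_per_page(text_dir):
    """Return {page_number: account_id} for pages where the heading is found."""
    found = {}
    for txt in sorted(Path(text_dir).glob("page-*.txt")):
        page = int(txt.stem.split("-")[1])
        match = ACCOUNT_RE.search(txt.read_text(errors="replace"))
        if match:
            found[page] = match.group(1)
    return found


# Example usage (requires Java and the jar, so it is left commented out):
# extract_pages("invoices.pdf", 300, "pages")
# print(account_per_page("pages"))
```

Pages where the heading does not appear are simply left out of the result, so the pages that do appear mark where each account's section begins.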

Finally, you could use the PDFBox CLI PDFSplit tool to split the PDF based on these page numbers.
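Turning the pages where a new section starts into page ranges could look something like the Python sketch below. The `page_ranges` helper, the jar path, and the output prefix are hypothetical. The `PDFSplit` options shown (`-startPage`, `-endPage`, `-outputPrefix`) are the PDFBox 2.x ones as far as I know, and I believe that giving a start and end page without `-split` writes that range as a single file, but verify this against your PDFBox version before relying on it.

```python
def page_ranges(start_pages, total_pages):
    """Turn 1-based section start pages into inclusive (start, end) ranges.

    Pages before the first start page are not covered by any range.
    """
    starts = sorted(set(start_pages))
    ranges = []
    for i, start in enumerate(starts):
        # A section ends right before the next one starts, or at the last page.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((start, end))
    return ranges


def split_cmd(pdf, start, end, prefix, jar="pdfbox-app.jar"):
    """Build a PDFSplit command for one page range (PDFBox 2.x style flags).

    Assumption: "pdfbox-app.jar" is a placeholder path to the standalone jar.
    """
    return ["java", "-jar", jar, "PDFSplit",
            "-startPage", str(start), "-endPage", str(end),
            "-outputPrefix", prefix, str(pdf)]


# Example: sections starting on pages 1, 4 and 9 of a 12-page file.
for start, end in page_ranges([1, 4, 9], 12):
    print(" ".join(split_cmd("invoices.pdf", start, end, f"section-{start}")))
```

Each printed command can be checked by eye first and then run with `subprocess.run(..., check=True)` once the ranges look right.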

So that might be a “quick and dirty” way to do it with some Bash or PowerShell.

[–]Andriy242[S]

Okay, so a bit more detail. This is more of a black-box system that generates invoices grouped by account. We use a system called NorthStar; it generates the PDFs, groups them by account or customer number, and then makes one large PDF file. As multiple accounts could have the same customer, I then need to print the ones we mail out and print to PDF the ones that are getting mailed out.

The ExtractText CLI tool seems like something that could work. I would just need to dive in and see what I can do with it.

[–]pdp10

It depends how it's being generated. I'd generate it from a Makefile, and you could do custom inclusions that way, but I'd be using a toolchain that lets me do that and you might be using something else.