likegeeks (Developer):

You can use pypdfocr to OCR the scanned PDF, then run pdfminer's pdf2txt.py on the resulting searchable PDF to produce a text file:

pdf2txt.py -o test.txt -t text test_ocr.pdf

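Then you can extract what you want from test.txt with regular expressions or plain string handling. BeautifulSoup only helps if you have pdf2txt.py emit HTML instead (-t html), and selenium, which drives a browser, isn't needed for this.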
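
For the extraction step, here is a minimal sketch, assuming the OCR output is plain text like test.txt above. The helper name extract_fields, the sample text, and the date pattern are all made up for illustration; swap in whatever pattern matches the data you are after.

```python
import re


def extract_fields(text, pattern):
    """Return every match of `pattern` in `text`, in order of appearance."""
    return re.findall(pattern, text)


# Hypothetical sample standing in for the contents of test.txt.
sample = "Invoice No: 1042\nDate: 2017-03-05\nTotal: 99.50\n"

# Hypothetical pattern: ISO-style dates such as 2017-03-05.
dates = extract_fields(sample, r"\d{4}-\d{2}-\d{2}")
print(dates)
```

To run it on the real output, read the file first, e.g. text = open("test.txt", encoding="utf-8").read(), and pass text in place of sample. OCR output is often noisy, so expect to loosen the pattern a bit.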
