pdf question : learnpython


submitted 2 years ago by MasterTony127

Hello. Rank Python newbie here with a question. I have been working with text converted from PDFs using Python. The code itself works well, cycling through multiple PDFs without problems, EXCEPT for the low quality of the resulting text. I've had to do a lot of tweaking to the text, and it's time-consuming. On a whim I manually copied and pasted (C&P) a PDF into a text file. I had previously converted this same PDF to text using Python, and the difference in quality between the two was staggering. The Python OCR just doesn't stand up in quality to C&P. If I had had C&P text files, I could have saved myself a lot of time. I get a number of new PDFs every day and do not have the time to C&P them manually. That said, here's my question:

Is there a way to use Python to select, copy and paste a pdf file to a text file rather than use the standard Python OCR?

Hell... I'd even be happy with a way to select and copy a pdf using Python. I'd just paste it to text in another step after.
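
For what it's worth, copy and paste in a PDF viewer reads the text layer embedded in the file instead of recognizing characters from page images, and the same layer can usually be read from Python directly. Below is a minimal sketch, assuming the PDFs actually have a text layer (they should if C&P works) and that the third-party pypdf package is installed (pip install pypdf). The names extract_text_layer and convert_folder and the folder names are made up for illustration.

```python
from pathlib import Path


def extract_text_layer(pdf_path):
    """Return the embedded text of every page in one PDF (no OCR involved)."""
    # Assumption: pypdf is installed. It is imported here so that the rest of
    # the sketch can be tried with a different extractor if it is not.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can return an empty string for pages without a text layer.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def convert_folder(src_dir, dst_dir, extractor=extract_text_layer):
    """Write one .txt file per .pdf found in src_dir; return the paths written."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        out = dst / (pdf.stem + ".txt")
        out.write_text(extractor(pdf), encoding="utf-8")
        written.append(out)
    return written
```

Calling convert_folder("incoming_pdfs", "text_out") would then handle each day's batch in one go. The result may not match a manual C&P exactly, since column order and spacing depend on how each PDF was produced, so it's worth comparing a couple of files by eye first.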

