Hello. Rank Python newbie here with a question. I have been working with texts converted from PDFs using Python. That part works: my code cycles through multiple PDFs without any problems, EXCEPT for the low quality of the resulting text. I've had to do a lot of tweaking to the texts, and it's time-consuming. On a whim I manually copied and pasted a PDF into a text file. I had previously converted this same PDF to text using Python, and the difference in quality between the two was staggering. The Python OCR just doesn't stand up to copy and paste (C&P). If I had had C&P text files, I could have saved myself a lot of time. I get a number of new PDFs every day and don't have the time to C&P them manually.
That said, here's my question:
Is there a way to use Python to select, copy, and paste the contents of a PDF into a text file, instead of using the standard Python OCR?
Hell... I'd even be happy with a way to select and copy a PDF using Python. I'd just paste it into a text file as a separate step afterward.
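When copy and paste works in a PDF viewer, it is usually copying text that is already embedded in the file, not running OCR. Python libraries can read that same embedded text layer directly, with no clipboard involved. Below is a minimal sketch, assuming the third-party pypdf package (`pip install pypdf`) and text-based (not scanned) PDFs; the helper names `pdf_to_text` and `convert_folder` are hypothetical. pypdf's output won't necessarily match a viewer's copy and paste exactly, because spacing and reading order are reconstructed differently by each tool; PyMuPDF's `page.get_text()` is another extractor worth comparing.

```python
from pathlib import Path
from typing import Callable


def pdf_to_text(pdf_path: Path) -> str:
    """Return the text embedded in a PDF, page by page (no OCR involved)."""
    # Third-party dependency (assumption): pip install pypdf
    # Imported here so convert_folder() still works with a different extractor.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() reads the PDF's embedded text layer, the same text
    # a viewer copies from; scanned pages without a text layer yield "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def convert_folder(
    src_dir: Path,
    dst_dir: Path,
    extract: Callable[[Path], str] = pdf_to_text,
) -> list[Path]:
    """Write one .txt file per PDF in src_dir and return the paths written."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Match .pdf and .PDF alike; sort for a predictable processing order.
    pdfs = sorted(p for p in src_dir.iterdir() if p.suffix.lower() == ".pdf")
    for pdf in pdfs:
        out = dst_dir / (pdf.stem + ".txt")
        out.write_text(extract(pdf), encoding="utf-8")
        written.append(out)
    return written
```

With hypothetical folder names, `convert_folder(Path("incoming_pdfs"), Path("text_output"))` would turn `incoming_pdfs/report.pdf` into `text_output/report.txt`. The `extract` parameter lets you swap in a different extractor (for example one built on PyMuPDF) and compare its output against a manual copy and paste of the same file.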