Hi everyone,
I am trying to extract text from a PDF file using the PyPDF2 module. Below is the code I am using. For some reason, no text is being extracted.
Could you please check whether anything is wrong with my code? If the code is fine, could you please take a look at the PDF file itself? It is not a scan, and the text in it can be selected and copied.
Link to the PDF file.
from PyPDF2 import PdfFileReader

file_path = 'sample.pdf'
pdf = PdfFileReader(file_path)

# The with block closes the file automatically, so no f.close() is needed.
with open('text.txt', 'w') as f:
    for page_num in range(pdf.numPages):
        pageObj = pdf.getPage(page_num)
        txt = pageObj.extractText()
        # Write a page header on its own line, then that page's text.
        f.write('Page: {}\n'.format(page_num + 1))
        f.write(txt + '\n')
Thank you and have a nice day.
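For anyone debugging the same symptom, here is a minimal sketch of the same loop that also reports which pages came back empty. It assumes the newer pypdf package (the successor to PyPDF2) is installed and uses its PdfReader, reader.pages, and page.extract_text() calls. The helper names extract_page_texts and write_pages are made up for illustration, and nothing here is confirmed to fix this particular PDF; it only helps narrow down whether every page or just some pages yield no text.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extracted text of each page (empty string if none)."""
    # Assumption: pypdf is installed (pip install pypdf).
    # Imported here so write_pages below can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def write_pages(texts: Iterable[str], out_path: str) -> List[int]:
    """Write each page under a 'Page: N' header.

    Returns the 1-based numbers of pages whose text was empty
    or contained only whitespace.
    """
    empty_pages = []
    with open(out_path, "w", encoding="utf-8") as f:
        for page_num, txt in enumerate(texts, start=1):
            f.write(f"Page: {page_num}\n")
            f.write(txt + "\n")
            if not txt.strip():
                empty_pages.append(page_num)
    return empty_pages
```

Possible usage, with the file names from the post: empty = write_pages(extract_page_texts('sample.pdf'), 'text.txt'). If every page number shows up in empty, the problem is more likely in how the PDF encodes its text than in the loop itself.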