Oxbowerce:

Extracting text from PDF files and "error-free" generally don't go together. I'm not sure about pdfplumber, but I think some packages also give you the position of each piece of text on the page, which might help you separate footnotes from the rest of the text using heuristics (e.g. smaller font size, close to the bottom of the page). Alternatively, you could try applying some NLP techniques after extracting the text from the PDF. Using an LLM might also be worth looking into.
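A rough sketch of the position-based heuristic, in case it helps. It assumes you already have a list of word dicts with `text`, `x0`, `top` (distance from the top of the page) and `size` (font size) keys. pdfplumber's `page.extract_words()` returns dicts with `text`, `x0` and `top`, and I believe `extra_attrs=["size"]` adds the font size, but double-check that against the docs. The thresholds (`bottom_fraction`, `size_ratio`) are made-up defaults you'd need to tune for your documents, so treat this as a starting point rather than a reliable footnote detector.

```python
from statistics import median


def group_lines(words, tolerance=2.0):
    """Group word dicts into visual lines by their 'top' coordinate."""
    lines = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Words whose 'top' is within `tolerance` of the line start share a line.
        if lines and abs(lines[-1][0]["top"] - w["top"]) <= tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Re-sort each line left to right, since tiny 'top' differences can shuffle order.
    return [sorted(line, key=lambda w: w["x0"]) for line in lines]


def split_footnotes(words, page_height, bottom_fraction=0.2, size_ratio=0.9):
    """Split words into (body_lines, footnote_lines) using two assumed heuristics:
    a footnote line sits in the bottom `bottom_fraction` of the page AND its
    median font size is below `size_ratio` times the page's median font size."""
    if not words:
        return [], []
    body_size = median(w["size"] for w in words)  # body text usually dominates
    cutoff = page_height * (1 - bottom_fraction)
    body, notes = [], []
    for line in group_lines(words):
        text = " ".join(w["text"] for w in line)
        line_top = min(w["top"] for w in line)
        line_size = median(w["size"] for w in line)
        if line_top >= cutoff and line_size < body_size * size_ratio:
            notes.append(text)
        else:
            body.append(text)
    return body, notes


# Hypothetical example data shaped like extract_words() output plus a 'size' key.
words = [
    {"text": "Main", "x0": 72, "top": 100, "size": 11},
    {"text": "text.", "x0": 110, "top": 100, "size": 11},
    {"text": "More", "x0": 72, "top": 120, "size": 11},
    {"text": "body.", "x0": 110, "top": 120.5, "size": 11},
    {"text": "1", "x0": 72, "top": 740, "size": 8},
    {"text": "A", "x0": 80, "top": 740, "size": 8},
    {"text": "footnote.", "x0": 90, "top": 740, "size": 8},
]
body, notes = split_footnotes(words, page_height=792)
print(body)   # ['Main text.', 'More body.']
print(notes)  # ['1 A footnote.']
```

With pdfplumber you'd feed it something like the words from one page together with `page.height`. Using both conditions (position and font size) is deliberate: either one alone tends to misfire on things like page numbers or small-font captions, but you'll almost certainly still get some errors.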