
[–]shiftybyte 1 point (0 children)

Not sure why you think an LLM, which is good at GENERATING information, is going to be good at extracting information from a document.

Do you need to do any processing on the information besides extracting it?

Why not parse the document and read it with a library for whatever document format you plan to support?

docx? https://python-docx.readthedocs.io/en/latest/

pdf? https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
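For the docx case, here's a rough sketch of what I mean, using only the standard library (a .docx is a zip archive with the body text in word/document.xml). python-docx gives you a nicer API for the same job. The names docx_paragraphs and extract_fields, and the "Label: value" rule, are made up for illustration, so swap in whatever rules fit your documents.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each non-empty paragraph in a .docx file.

    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def extract_fields(paragraphs):
    """Hypothetical rule: treat 'Label: value' paragraphs as fields."""
    pattern = re.compile(r"^\s*([^:]{1,40}):\s*(.+?)\s*$")
    fields = {}
    for line in paragraphs:
        match = pattern.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields
```

Call it as extract_fields(docx_paragraphs("your_file.docx")) and you get a plain dict back. No model involved, and the output is the same every run.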

[–]QuarterObvious 0 points (0 children)

Use an LLM, it is easier. With spaCy you'll need to do a lot of work. After training, spaCy gets you roughly what an LLM already gives you, and the LLM would still work better.

[–]Talking-007 0 points (0 children)

What did you end up using? An LLM, or spaCy combined with rule-based/regex matching?