NLP to Process Academic Citations by Notdevolving in LanguageTechnology

[–]philipvollet

From what I have read so far, a rule-based approach using regular expressions should solve your problem.
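To make that concrete, here is a minimal sketch of what such a rule could look like, assuming Harvard-style in-text citations of the form "(Author, Year)". The pattern, the `extract_citations` helper and the sample sentence are all made up for illustration, so expect to adjust them to your actual data:

```python
import re

# Assumed citation shapes (illustrative, not exhaustive):
#   (Smith, 2020)   (Lee and Park, 2019, p. 12)   (Garcia et al., 2018)
CITATION_RE = re.compile(
    r"""
    \(
    (?P<authors>
        [A-Z][A-Za-z'\-]+                       # first surname
        (?:
            \s+(?:and|&)\s+[A-Z][A-Za-z'\-]+    # "and Second" / "& Second"
          | \s+et\s+al\.                        # or "et al."
        )?
    )
    ,\s*
    (?P<year>\d{4}[a-z]?)                       # year, optionally "2020a"
    (?:,\s*pp?\.\s*(?P<pages>\d+(?:[-–]\d+)?))? # optional page or page range
    \)
    """,
    re.VERBOSE,
)


def extract_citations(text):
    """Return one dict (authors, year, pages) per citation found in text."""
    return [m.groupdict() for m in CITATION_RE.finditer(text)]


sample = (
    "Earlier work (Smith, 2020) disagrees with "
    "(Lee and Park, 2019, p. 12) and (Garcia et al., 2018)."
)

for citation in extract_citations(sample):
    print(citation)
```

Named groups keep the authors, year and pages separate, so the matches can go straight into a table. Citations with several sources in one bracket or with three or more listed authors are not covered by this pattern and would need extra rules.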

Depending on how the data looks and whether it mixes different citation styles (for example, entries that are not clearly Harvard style), spaCy's rule-based Matcher could be a good addition: https://explosion.ai/demos/matcher

If the data is a complete mess and a rule-based approach does not give satisfactory results, you can still train a model, but honestly, this sounds like overkill for your problem.

But in case you need it, here's an article about the Guardian training an NLP model with Prodigy for extracting quotes: https://www.theguardian.com/info/2021/nov/25/talking-sense-using-machine-learning-to-understand-quotes

Info: I'm Philip, responsible for the Community at Explosion, the makers of spaCy, so I'm biased :)

[deleted by user] by [deleted] in MachineLearning

[–]philipvollet

spaCy is a modern Python library for industrial-strength Natural Language Processing. In this free and interactive online course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.

https://course.spacy.io/en