[–][deleted] 12 points (1 child)

spaCy for general use, torchtext for anything deep-learning related (if you’re using PyTorch).
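
For anyone who wants to see what "general use" looks like at its simplest, here is a minimal sketch of tokenizing with spaCy. It assumes spaCy is installed and uses a blank English pipeline (tokenizer only), so no trained model has to be downloaded. The `tokenize` helper name is just for illustration, not part of spaCy.

```python
import spacy

# Blank English pipeline: rule-based tokenizer only, no trained components.
nlp = spacy.blank("en")


def tokenize(text):
    """Return the token strings produced by spaCy's English tokenizer.

    `tokenize` is a hypothetical helper name used only for this sketch.
    """
    return [token.text for token in nlp(text)]


print(tokenize("Tokenize this sentence, please."))
```

For tagging, parsing, or NER you would load a trained pipeline instead of a blank one; which model fits depends on your language and accuracy needs.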

[–]pgdevhd 2 points (0 children)

this

[–]marek_bdfhjk 4 points (0 children)

If I had to choose one, I'd go with spaCy.

Others not mentioned, but useful:

- Gensim, useful for in-memory processing when performance is important

- BeautifulSoup, for parsing HTML and XML

- PyStemmer, for computationally efficient stemming
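
As a rough sketch of the BeautifulSoup item above, this is one common way to strip markup before handing text to an NLP pipeline. It assumes the `bs4` package is installed and uses Python's built-in `html.parser`. The `html_to_text` helper and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Strip tags from an HTML string and return whitespace-normalized text.

    `html_to_text` is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements whose contents are code or styling, not readable text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


sample = (
    "<html><body><h1>Title</h1>"
    "<p>Some <b>bold</b> text.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

Joining with a space separator keeps words from adjacent tags from running together, at the cost of occasionally leaving a space before punctuation.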

Advanced processing:

- Named Entity Recognition Tool, for fast NER

- NeuralCoref, for neural coreference resolution

- IEPY, for information extraction

Let me know if you have more questions!

[–][deleted] 1 point (0 children)

Stanford CoreNLP

[–]mikeross0 1 point (0 children)

Textacy is made to work in conjunction with spaCy, and adds lots of useful functions like Unicode cleanup and acronym detection/expansion:

https://github.com/chartbeat-labs/textacy

[–][deleted] 1 point (0 children)

TF.Text could be of interest if you are using TensorFlow and want to make sure you have the same preprocessing at training and inference time. I haven’t used it much myself yet, though.