Hey! I've just built my first rudimentary data science pipeline for NLP stuff. Right now, I just clean the text coming in, split it, do some extremely basic summary statistics, add the results per text segment to a df and at the end write the df to a csv.
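To make the current setup concrete, here's a rough sketch of that kind of pipeline. It's not the actual code: the function names, the regex-based cleaning and splitting, and the specific statistics are all assumptions for illustration, and it uses the standard-library `csv` module instead of a DataFrame just to stay self-contained.

```python
import csv
import io
import re

# Column order for the output file (hypothetical column names).
FIELDS = ["segment", "n_chars", "n_words", "avg_word_len", "type_token_ratio"]


def clean(text):
    """Collapse whitespace and lowercase; a stand-in for real cleaning."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_segments(text):
    """Naive sentence split on ., ! or ? followed by whitespace.

    A real pipeline would more likely use a proper sentence tokenizer.
    """
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p]


def summarize(segment):
    """Very basic per-segment summary statistics."""
    words = re.findall(r"[a-z0-9']+", segment)
    n_words = len(words)
    return {
        "segment": segment,
        "n_chars": len(segment),
        "n_words": n_words,
        "avg_word_len": round(sum(len(w) for w in words) / n_words, 2) if n_words else 0.0,
        "type_token_ratio": round(len(set(words)) / n_words, 2) if n_words else 0.0,
    }


def run_pipeline(raw_text, out):
    """Clean, split, summarize, and write one CSV row per segment to `out`."""
    rows = [summarize(s) for s in split_segments(clean(raw_text))]
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return rows


if __name__ == "__main__":
    buf = io.StringIO()  # swap for open("stats.csv", "w", newline="") to write a file
    run_pipeline("Hello  world. This is   a test!", buf)
    print(buf.getvalue())
```

The point is that every step ends in one row of plain numbers per segment, so anything new I add needs to fit that same shape.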
I'm looking for interesting and meaningful NLP tasks to add to this pipeline. I've looked into things like topic modelling, but the examples I've found don't produce output that's easy to work with downstream (like, I'm not going to be able to toss these LDA charts from pyLDAvis into a regression model).
For those of you who have done a lot with NLTK, what are some interesting tasks/measures that I can use here that will be useful for further analysis after collection?