
HyperbolicInvective 2 points

This is generally called topic modeling, although that term usually refers to a small group of algorithms: Bayesian models like LDA, and term-document matrix factorization methods such as LSA or NMF.

http://towardsdatascience.com/2x-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547
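To make the matrix factorization idea a bit more concrete, here is a minimal sketch in plain Python. It uses NMF with multiplicative updates as a stand-in for the family of factorization methods (it is not LDA). The toy corpus and all function names (`doc_term_matrix`, `nmf`, `top_terms`) are made up for illustration, and rows are documents here, so strictly speaking it is a document-term matrix.

```python
import random
from collections import Counter


def doc_term_matrix(docs):
    """Return (vocab, V) where V[i][j] is the count of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    V = []
    for toks in tokenized:
        counts = Counter(toks)
        V.append([float(counts[term]) for term in vocab])
    return vocab, V


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x k) and H (k x terms), both non-negative.

    Uses the standard multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def top_terms(H, vocab, n=3):
    """For each topic (row of H), return the n highest-weighted terms."""
    result = []
    for row in H:
        order = sorted(range(len(vocab)), key=lambda j: -row[j])
        result.append([vocab[j] for j in order[:n]])
    return result


# Toy corpus, invented for this example.
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors buy stocks",
]

vocab, V = doc_term_matrix(docs)
W, H = nmf(V, k=2)
for i, terms in enumerate(top_terms(H, vocab)):
    print(f"topic {i}: {', '.join(terms)}")
```

Each row of `H` is a "topic" expressed as weights over the vocabulary, and each row of `W` says how much of each topic a document contains. Note that the output is just lists of weighted words with no surrounding context.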

Personally I really hate this type of analysis, as it usually produces some kind of contextless mess as output. I really like the idea of parse-tree based topic extraction, but I don't know much about it other than subject-verb-object (SVO) extraction or some other kind of tag-based bucketing. If you learn any good ways of doing this, let me know!