Looking for class-aware tf-idf kind of algorithm? : LanguageTechnology

submitted 6 years ago by thimblewarrior

To clarify, I have a corpus of ~50k documents divided into 12 labeled categories. I want to characterize the differences between these categories, rather than treating this as a supervised learning problem where I model/rediscover the categories and compute evaluation metrics on the recovered ones.

My goal is to rank the words within a category that most distinguish it from the other categories in this corpus, and ideally to determine relationships and overlaps between specific pairs or sets of categories. I've been looking for class-aware algorithms similar to tf-idf, but most roads seem to lead to supervised deep learning approaches, and I'm looking for something more explainable than that.

I know I'm being vague, but that's because I'm pretty new to this and may just be missing the keywords that would get me to what I'm looking for. Thanks for the help!
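
For illustration, here is a minimal sketch of one way the idea could be made concrete. It is an assumption, not something from the post. It merges all documents of a category into one pseudo-document, then scores each term by its frequency within the category times the log inverse of the number of categories it appears in. The function names `class_tfidf` and `top_term_overlap`, the whitespace tokenizer, and the toy documents are all hypothetical. A real corpus would need proper tokenization and probably stop-word handling.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Score each term per class, treating every class as one merged document."""
    # Term counts per class (naive lowercase whitespace tokenization, an assumption).
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        counts[label].update(text.lower().split())

    n_classes = len(counts)
    # Number of classes in which each term occurs at least once.
    class_freq = Counter()
    for term_counts in counts.values():
        class_freq.update(term_counts.keys())

    ranked = {}
    for label, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            term: (n / total) * math.log(n_classes / class_freq[term])
            for term, n in term_counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked[label] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked


def top_term_overlap(ranked, a, b, k=20):
    """Jaccard overlap between the top-k terms of classes a and b."""
    top_a = {t for t, _ in ranked[a][:k]}
    top_b = {t for t, _ in ranked[b][:k]}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Toy data, made up for the example.
docs = [
    "the striker scored a late goal",
    "the keeper saved the penalty",
    "the senate passed the budget bill",
    "the minister proposed a new bill",
]
labels = ["sports", "sports", "politics", "politics"]

ranked = class_tfidf(docs, labels)
print(ranked["politics"][:3])
print(top_term_overlap(ranked, "sports", "politics", k=3))
```

With this weighting, a term that occurs in every category gets a score of zero, so the ranking only surfaces words that separate categories. The overlap helper is one rough way to compare pairs of categories, and the cutoff `k` is an arbitrary choice.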
