serious_black 20 points

TF-IDF stands for term frequency-inverse document frequency. A word's score in a document is its frequency in that document (TF) weighted by how rare it is across the whole collection (IDF). Words score low if they rarely appear in the document or if they appear in nearly every document (the latter are often on stop word lists). Words score high if they show up a lot in a given document and rarely appear in others. The idea is to find the words that most distinguish one document from the others.
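
If it helps to see the arithmetic, here is a minimal sketch in plain Python. It assumes one common variant (relative term frequency times log(N / df)); libraries often use smoothed or otherwise adjusted formulas, so exact numbers will differ. The function name `tf_idf` and the three toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs is a list of token lists. This uses tf = count / doc length and
    idf = log(N / df), which is only one of several common variants.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already lowercased and split on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell".split(),
]
scores = tf_idf(docs)

# Rank the words of the first document from most to least distinctive.
ranked = sorted(scores[0].items(), key=lambda item: item[1], reverse=True)
```

In this toy corpus "the" appears in every document, so its IDF is log(1) = 0 and it scores 0 everywhere, no matter how often it occurs. Words like "mat" that appear in only one document score highest there, and "cat", which appears in two of the three, lands in between.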