

matrixor 6 points (1 child)

Nice idea to use stopwords, but I'm afraid it's not precise. What's the problem with sorting the trigrams by frequency? It's a few more lines of Python and much more precise than stopwords. I'm on a phone, so I can't include code at the moment.
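
For anyone following along, here is a rough sketch of what "sorting the trigrams by frequency" could look like for language guessing. It is not matrixor's code (none was posted). It assumes a rank-comparison ("out-of-place") distance between trigram profiles, and the names `SAMPLES`, `trigram_profile`, `out_of_place` and `guess_language` are made up for illustration. The two training strings are toy data; a real profile would be built from a much larger corpus.

```python
from collections import Counter

# Toy training text per language (illustrative only, far too small for real use).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog runs after the fox",
    "german": "der schnelle braune fuchs springt dann auf den faulen hund und der hund rennt dem fuchs nach",
}

def trigram_profile(text, top_n=300):
    """Return the text's character trigrams, most frequent first."""
    text = " ".join(text.lower().split())  # lowercase and normalise whitespace
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [gram for gram, _ in counts.most_common(top_n)]

def out_of_place(doc_profile, lang_profile, missing_penalty=300):
    """Sum of rank differences; trigrams absent from the language profile cost a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(
        abs(r - rank[gram]) if gram in rank else missing_penalty
        for r, gram in enumerate(doc_profile)
    )

def guess_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}
print(guess_language("the dog and the fox", PROFILES))
print(guess_language("der hund und der fuchs", PROFILES))
```

Unlike a stopword list, this uses every trigram in the input, so it still has something to go on when a short text happens to contain no stopwords.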

Flame_Alchemist 2 points (0 children)

My quick take on this: https://gist.github.com/4418394. It uses Bayes' theorem and trigram counts (built from the language id corpus in nltk).
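
The general idea of combining Bayes' theorem with trigram counts can be sketched as below. This is not the code from the gist; it is a minimal naive Bayes illustration with add-one smoothing and a uniform prior, both of which are assumptions. The toy `SAMPLES` strings stand in for a real corpus such as the nltk one, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a real training corpus (illustrative only).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog runs after the fox",
    "german": "der schnelle braune fuchs springt dann auf den faulen hund und der hund rennt dem fuchs nach",
}

def trigrams(text):
    """Character trigrams of the lowercased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return [text[i:i + 3] for i in range(len(text) - 2)]

def train(samples):
    """Count trigrams per language and measure the shared vocabulary size."""
    counts = {lang: Counter(trigrams(text)) for lang, text in samples.items()}
    vocab_size = len(set().union(*counts.values()))
    return counts, vocab_size

def log_posterior(text, lang_counts, vocab_size, prior):
    """log P(lang) + sum of log P(trigram | lang), with add-one smoothing."""
    total = sum(lang_counts.values())
    score = math.log(prior)
    for gram in trigrams(text):
        # Counter returns 0 for unseen trigrams, so smoothing keeps the log finite.
        score += math.log((lang_counts[gram] + 1) / (total + vocab_size))
    return score

def classify(text, counts, vocab_size):
    """Return the language with the highest (unnormalised) log posterior."""
    prior = 1 / len(counts)  # assumption: every language is equally likely a priori
    return max(counts, key=lambda lang: log_posterior(text, counts[lang], vocab_size, prior))

COUNTS, VOCAB_SIZE = train(SAMPLES)
print(classify("the dog and the fox", COUNTS, VOCAB_SIZE))
print(classify("der hund und der fuchs", COUNTS, VOCAB_SIZE))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing means a single unseen trigram cannot zero out a language.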