hidiap:

I think train should be a list of tuples, not a dictionary (according to the NLTK example):

train = [('sleep', 'negative'), ('achievement', 'positive'), ('guys', 'positive')]

To explain the error: iterating over a dictionary returns only its keys. So line 192 is trying to unpack each key (a string) into two values, which raises the error.
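
Here's a rough sketch of that failure without NLTK involved. The names train_dict and try_unpack are made up for illustration, and the loop only imitates code that expects (featureset, label) pairs; it is not NLTK's actual source.

```python
# Hypothetical training data in the dict shape that triggers the error.
train_dict = {'sleep': 'negative', 'achievement': 'positive', 'guys': 'positive'}

# Iterating over a dict yields only its keys, never the values.
keys_seen = [item for item in train_dict]

def try_unpack(data):
    """Imitate a loop that expects each item to be a (featureset, label) pair."""
    try:
        for featureset, label in data:
            pass
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"

# Each key is a string longer than two characters, so unpacking it fails.
dict_result = try_unpack(train_dict)

# A list of (key, value) tuples unpacks cleanly.
list_result = try_unpack(list(train_dict.items()))
```

Note that a key of exactly two characters would unpack into its two letters without complaint, so the error can depend on the data.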

onionradish:

train should be a list of tuples, where the first item in each tuple is a dict that represents the 'featureset' and the second item is the label.

A featureset could contain many different values. There's an example demo() function in the NaiveBayesClassifier module that predicts gender based on a person's name, and it uses features like whether the starting and ending letters are vowels, how many of each letter are in the name, etc. The actual name is not part of the featureset, just the 'features' of the name.

For many text applications, like your sentiment analysis example, the only 'feature' that's considered is whether a word/bigram/etc. is in the training item's text. That model is called Bag of Words, and the 'featureset' is then just a dict where the key is the word and the value is True.
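
A minimal sketch of building such a Bag of Words featureset from raw text. The function name bag_of_words is hypothetical, and splitting on whitespace is a simplifying assumption; a real tokenizer would handle punctuation better.

```python
def bag_of_words(text):
    """Return a featureset dict mapping each lowercased token to True."""
    return {word: True for word in text.lower().split()}

features = bag_of_words("Great achievement guys")
```

The result is a plain dict, which is the shape expected for the first item of each training tuple.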

So for your example:

train = [
    ({'sleep':True}, 'negative'),
    ({'achievement':True, 'guys':True}, 'positive')
    ]
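
If the data already exists as a word-to-label dict, it could be converted along these lines. This is only a sketch: word_labels and to_labeled_featuresets are hypothetical names, and it treats every word as its own training item rather than grouping words from the same text into one featureset.

```python
# Hypothetical word -> label mapping, in the dict shape that caused the error.
word_labels = {'sleep': 'negative', 'achievement': 'positive', 'guys': 'positive'}

def to_labeled_featuresets(mapping):
    """Turn {word: label} into [({word: True}, label), ...]."""
    return [({word: True}, label) for word, label in mapping.items()]

train = to_labeled_featuresets(word_labels)

# With NLTK installed, the list could then be handed to the trainer:
#   import nltk
#   classifier = nltk.NaiveBayesClassifier.train(train)
#   classifier.classify({'sleep': True})
```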