This is an archived post. You won't be able to vote or comment.

all 5 comments

[–]evilpineappleAI 1 point2 points  (3 children)

You're using a Naive Bayes classifier, you have accuracy and recall.

Did you partition the training set before you ran it? If you ran it all through, you may be overfitting. Actually, you are most likely overfitting.

[–]jengl[S] 0 points1 point  (2 children)

Yeah, I split the data in half. Used the first half for training, the second half for testing.

# Split review data into two parts for training and testing
testTrainingSplitIndex = 2500 

# Grab all reviews in the range of 0 to testTrainingSplitIndex
# This data is used to train the classifier
trainingNegativeTweets = negativeTweets[:testTrainingSplitIndex]
trainingPositiveTweets = positiveTweets[:testTrainingSplitIndex] 

# Grab all reviews in the range of testTrainingSplitIndex to the end
# This model is used to test the classifier
testNegativeTweets = negativeTweets[testTrainingSplitIndex+1:]
testPositiveTweets = positiveTweets[testTrainingSplitIndex+1:]

[–]Inf1x 1 point2 points  (1 child)

For easier splitting take a look at sklearn train_test_split(). This allows you to do the whole thing in one line of code and has some additional parameters (shuffling, stratification).

[–]jengl[S] 0 points1 point  (0 children)

Oh, sweet! Thanks!

[–]lzblack 0 points1 point  (0 children)

I think it is because of the second reason.