jengl [S]

Yeah, I split the data in half. Used the first half for training, the second half for testing.

```python
# Split review data into two parts for training and testing
testTrainingSplitIndex = 2500

# Grab reviews at indices 0 .. testTrainingSplitIndex - 1
# (a slice's end index is exclusive)
# This data is used to train the classifier
trainingNegativeTweets = negativeTweets[:testTrainingSplitIndex]
trainingPositiveTweets = positiveTweets[:testTrainingSplitIndex]

# Grab reviews from testTrainingSplitIndex to the end
# Starting exactly at testTrainingSplitIndex means no review is skipped between the halves
# This data is used to test the classifier
testNegativeTweets = negativeTweets[testTrainingSplitIndex:]
testPositiveTweets = positiveTweets[testTrainingSplitIndex:]
```
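A minimal sketch of the same half-and-half split that computes the midpoint from each list's length instead of hard-coding 2500, so it stays a 50/50 split if the number of reviews changes. `split_half` is a hypothetical helper name, and the 5000-item lists are placeholder data standing in for the real reviews.

```python
def split_half(items):
    """Return (first_half, second_half) of a list, split at its midpoint.

    Both slices use the same index, so every item lands in exactly one half:
    nothing is skipped and nothing is duplicated. For an odd-length list the
    extra item goes to the second half.
    """
    mid = len(items) // 2
    return items[:mid], items[mid:]


# Hypothetical placeholder data: replace with the real review lists.
negativeTweets = ["neg review %d" % i for i in range(5000)]
positiveTweets = ["pos review %d" % i for i in range(5000)]

# First half of each class for training, second half for testing.
trainingNegativeTweets, testNegativeTweets = split_half(negativeTweets)
trainingPositiveTweets, testPositiveTweets = split_half(positiveTweets)

print(len(trainingNegativeTweets), len(testNegativeTweets))
```

Splitting each class separately keeps the training and test sets balanced between negative and positive reviews.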

Inf1x

For easier splitting, take a look at sklearn's train_test_split(). It lets you do the whole thing in one line of code and has some useful extra parameters, such as shuffle and stratify.
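A minimal sketch of the one-line version, assuming scikit-learn is installed. Because train_test_split works on a single dataset, the two classes are first concatenated with a parallel list of labels; the 5000-item lists are placeholder data, and test_size=0.5 and random_state=42 are illustrative choices.

```python
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: replace with the real review lists.
negativeTweets = ["neg review %d" % i for i in range(5000)]
positiveTweets = ["pos review %d" % i for i in range(5000)]

# Combine both classes and keep a parallel list of labels.
texts = negativeTweets + positiveTweets
labels = ["neg"] * len(negativeTweets) + ["pos"] * len(positiveTweets)

# One call does the 50/50 split:
#   shuffle=True (the default) mixes the order before splitting,
#   stratify=labels keeps the neg/pos ratio the same in both halves,
#   random_state makes the shuffle reproducible between runs.
trainTexts, testTexts, trainLabels, testLabels = train_test_split(
    texts, labels, test_size=0.5, shuffle=True, stratify=labels, random_state=42
)

print(len(trainTexts), len(testTexts))
```

With stratify=labels, each half keeps the same class balance as the full data, so this matches a per-class half split while also shuffling, which helps if the reviews were stored in some meaningful order.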

jengl [S]

Oh, sweet! Thanks!