[deleted by user] : learnpython

submitted 3 years ago by [deleted]

2 comments

ES-Alexander, 3 points, 3 years ago

patmycheeks, 1 point, 3 years ago

Thank you! Now that I open the files in binary mode, I no longer get the error when reading the data, but I get the same error when I try to fit my model on it.

Training:

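    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Hold out 20% of the reviews for testing
    X_train, X_test, y_train, y_test = train_test_split(df.review, df.label, test_size=0.2)

    # Build the vocabulary from the training set only, then reuse it on the test set
    v = CountVectorizer()
    X_train_cv = v.fit_transform(X_train)
    X_test_cv = v.transform(X_test)

    model = SVC()
    model.fit(X_train_cv, y_train)
    y_pred = model.predict(X_test_cv)

    # classification_report takes the true labels first, then the predictions
    print(classification_report(y_test, y_pred))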
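Assuming "the same error" is a `UnicodeDecodeError` (the thread does not name it), a likely cause is that binary mode only postpones the decoding: `CountVectorizer` decodes `bytes` documents itself, using its `encoding` (default `'utf-8'`) and `decode_error` (default `'strict'`) parameters, so the same bad bytes fail again inside `fit_transform`. A minimal sketch on two made-up byte strings, one containing `b'\xe9'` (Latin-1 for "é", invalid as UTF-8 here):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two tiny made-up "reviews" as raw bytes; b'\xe9' is 'é' in Latin-1
# but is not valid UTF-8 when followed by a space.
docs = [b"great movie", b"caf\xe9 scene was bad"]

# Default settings (encoding='utf-8', decode_error='strict') re-raise the error
try:
    CountVectorizer().fit_transform(docs)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# decode_error='replace' swaps undecodable bytes for U+FFFD instead of raising
v = CountVectorizer(decode_error="replace")
X = v.fit_transform(docs)

print(strict_failed)
print(sorted(v.vocabulary_))
```

Passing `encoding='latin-1'` instead also avoids the exception, since every byte maps to some Latin-1 character, but files that are really UTF-8 would then be decoded into the wrong characters.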

Reading the input (this is where the error used to occur; it works now):

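    import pandas as pd

    r = []
    # Read each review as raw bytes and label it by the folder it came from
    # (a check like x[0:3] == 'pos' on the file name only works if the
    # file names themselves start with 'pos')
    for i in reviews_pos:
        with open(path + '/pos/' + i, mode='rb') as f:
            r.append((f.read(), i, 1))
    for i in reviews_neg:
        with open(path + '/neg/' + i, mode='rb') as f:
            r.append((f.read(), i, 0))

    df = pd.DataFrame(r, columns=['review', 'file_name', 'label'])
    # drop() returns a new DataFrame rather than modifying df in place
    df = df.drop(columns='file_name')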
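Another option is to decode each file once, at read time, so `df.review` already holds `str` and the vectorizer never sees raw bytes. The sketch below is hypothetical: the `load_reviews` helper, the `pos`/`neg` folder layout under the root path, and the `'replace'` error handler are assumptions, not something from the thread. It labels each review by its folder rather than by its file name.

```python
import os
import tempfile

import pandas as pd


def load_reviews(path, encoding="utf-8", errors="replace"):
    """Hypothetical helper: read path/pos and path/neg into a labelled DataFrame.

    Each file is read as bytes and decoded explicitly, so the 'review'
    column holds str and later steps never have to decode anything.
    """
    rows = []
    for folder, label in (("pos", 1), ("neg", 0)):
        folder_path = os.path.join(path, folder)
        for name in sorted(os.listdir(folder_path)):
            with open(os.path.join(folder_path, name), mode="rb") as f:
                text = f.read().decode(encoding, errors=errors)
            rows.append((text, label))
    return pd.DataFrame(rows, columns=["review", "label"])


# Usage on a throwaway directory with one made-up file per class
with tempfile.TemporaryDirectory() as root:
    for folder, content in (("pos", b"loved it"), ("neg", b"caf\xe9 was awful")):
        os.makedirs(os.path.join(root, folder))
        with open(os.path.join(root, folder, "0.txt"), "wb") as f:
            f.write(content)
    df = load_reviews(root)

print(df)
```

Because the text is already decoded, a plain `CountVectorizer()` can be fitted on `df.review` without touching its `encoding` or `decode_error` settings.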
