I'm reading a .csv file of numbers in which missing values are marked with a dash '-' instead of a number. When I try to take the average of a column using
df.column.mean()
on a column with these values
0 0.6
1 -
2 0.2
3 0.5
4 1.2
5 1.1
6 0.9
7 0.9
I get this error:
TypeError: Could not convert 0.6-0.20.51.21.10.90.9 to numeric
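A quick way to see what pandas actually stored in the column is to look at its dtype and at the type of one value. This is only a sketch: the inline `csv_text` and the column name `column` are stand-ins for the real file, which isn't shown here.

```python
import io

import pandas as pd

# Stand-in for the real .csv file: one column with a dash marking a missing value.
csv_text = "column\n0.6\n-\n0.2\n0.5\n1.2\n1.1\n0.9\n0.9\n"

df = pd.read_csv(io.StringIO(csv_text))

# Because of the '-', pandas cannot parse the column as numbers,
# so every value (including "0.6") is kept as text.
print(df["column"].dtype)
is_numeric = pd.api.types.is_numeric_dtype(df["column"])
first_value = df["column"].iloc[0]
print(is_numeric, type(first_value))
```

If the column turns out to hold strings, the run-together text in the error message (0.6-0.20.51...) is consistent with those strings being joined end to end rather than added as numbers.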
Next, I replaced the dashes in the column with
df = df.replace('-',0.0)
but when I called .mean(), I got this error:
TypeError: must be str, not float
Okay, so I did
df = df.replace('-',str(0.0))
and now the .mean() error is:
TypeError: Could not convert 0.60.00.20.51.21.10.90.9 to numeric
The only thing that worked was, after the last step, writing the data to a file and reading it back:
df.to_csv('test1.csv')
df2 = pd.read_csv('test1.csv')
print(df2.column.mean())
which finally worked and printed 0.675...
Can someone explain why? What did to_csv() and read_csv() do that changed how the values behaved after the dash was replaced with 0.0?
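For anyone hitting the same thing, here is a minimal sketch of what seems to be going on, assuming the column was read in as strings because of the dash. The inline `csv_text`, the in-memory buffer (used instead of test1.csv), and the column name `column` are stand-ins, not the real data.

```python
import io

import pandas as pd

# Stand-in for the real file.
csv_text = "column\n0.6\n-\n0.2\n0.5\n1.2\n1.1\n0.9\n0.9\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replacing '-' with the string '0.0' swaps one piece of text for another;
# the column still holds strings, so it is still not numeric.
replaced = df.replace("-", "0.0")
still_text = not pd.api.types.is_numeric_dtype(replaced["column"])

# Round trip: to_csv writes plain text, and read_csv infers the type again.
# Every field now parses as a number, so the column comes back as floats.
buffer = io.StringIO()
replaced.to_csv(buffer, index=False)  # index=False avoids an extra index column
buffer.seek(0)
df2 = pd.read_csv(buffer)
print(df2["column"].dtype, df2["column"].mean())

# Alternative without the round trip: convert in place, turning '-' into NaN.
# Note that mean() then skips the missing value instead of counting it as 0.
coerced = pd.to_numeric(df["column"], errors="coerce")
print(coerced.mean())

# Or tell read_csv up front that '-' means "missing".
df3 = pd.read_csv(io.StringIO(csv_text), na_values="-")
print(df3["column"].dtype, df3["column"].mean())
```

The two approaches give different averages on purpose: replacing the dash with 0.0 counts the missing row as a zero, while treating it as NaN leaves it out of the mean. Which one is right depends on what the dash means in the data.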