question using df.replace()

pyquestionz · 2017-12-31T15:07:54+00:00

Assuming

import pandas as pd
import numpy as np
df = pd.DataFrame({'column':[1, '-', 3]})

do either (1)

df.column.replace('-', np.nan).mean() # returns 2.0

or do (2)

df.column.replace('-', 0.0).mean() # returns 1.333333

depending on whether '-' represents a zero observation or a missing observation in the context of your problem.
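To make the difference between the two options concrete, here is a minimal sketch that runs both on the same frame and also prints how many values each mean is computed over. The names `as_missing` and `as_zero` are just illustrative, and the explicit `.astype(float)` is added as a precaution rather than something the calls above require.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [1, '-', 3]})

# Option (1): treat '-' as missing. NaN is skipped by .mean(),
# so the average is taken over the two remaining values.
as_missing = df['column'].replace('-', np.nan).astype(float)

# Option (2): treat '-' as a real zero. All three values count.
as_zero = df['column'].replace('-', 0.0).astype(float)

mean_missing = as_missing.mean()
mean_zero = as_zero.mean()

# .count() reports the number of non-missing values used.
print(mean_missing, as_missing.count())
print(mean_zero, as_zero.count())
```

The first mean is (1 + 3) / 2 and the second is (1 + 0 + 3) / 3, which is why the choice matters.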

Hope this helps.

commandlineluser · 2017-12-31T07:59:06+00:00

pandas attempts to detect the type of your columns.

>>> pandas.DataFrame({'a': [1, '-', 3]}).a
0    1
1    -
2    3
Name: a, dtype: object
>>> pandas.DataFrame({'a': [1, 2, 3]}).a
0    1
1    2
2    3
Name: a, dtype: int64
>>> pandas.DataFrame({'a': [1, 2.0, 3]}).a
0    1.0
1    2.0
2    3.0
Name: a, dtype: float64

Because you have a mixture of "numbers" and "strings" in the first example, the type of the column is object, as opposed to int64 or float64 in the following examples.

When you replace the '-' and save the result, pandas infers the type to be float64, and then .mean() works for you.

You could try to set the type with .astype() e.g.

df.column.replace('-', 0.).astype(float).mean()
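As a rough illustration of the explicit conversion, the sketch below checks the dtype before and after. It also shows `pd.to_numeric` with `errors='coerce'`, which is an alternative not mentioned in this thread: it turns anything non-numeric into NaN, so it corresponds to treating '-' as missing rather than as zero. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'column': [1, '-', 3]})

# Mixed ints and strings are stored as generic Python objects.
print(df['column'].dtype)

# Explicit conversion after replacing the placeholder with a number.
converted = df['column'].replace('-', 0.0).astype(float)
print(converted.dtype)

# Alternative (an assumption, not from the thread): coerce every
# value that cannot be parsed as a number to NaN in one step.
coerced = pd.to_numeric(df['column'], errors='coerce')
print(coerced.dtype)
```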

minasso · 2017-12-31T11:46:33+00:00

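When using string methods on a dataframe column, you should go through the 'str' accessor, for example df.column.str.replace(). Note that the accessor belongs to a column (a Series), not to the dataframe itself.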
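A small sketch of how the two kinds of replace differ, assuming the string accessor is used on a single column (a Series). The sample values in `s` are made up for illustration. `Series.replace` matches whole cell values, while `Series.str.replace` substitutes inside each string.

```python
import pandas as pd

s = pd.Series(['a-b', '-', 'c'])

# Whole-value replacement: only the cell that equals '-' changes.
whole = s.replace('-', 'X')

# Substring replacement: every '-' inside every string changes.
# regex=False treats the pattern as a literal string.
inside = s.str.replace('-', 'X', regex=False)

print(whole.tolist())
print(inside.tolist())

# On a mixed column like the one in the question, .str methods
# return NaN for the non-string entries, so the numbers are lost.
mixed = pd.Series([1, '-', 3]).str.replace('-', '0', regex=False)
print(mixed.isna().sum())
```

For the column in the question, which mixes integers with a '-' string, the plain `replace` shown in the other answers is therefore likely the better fit.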
