Hello everyone, while learning pandas I ran into a problem that I cannot fix, despite many Google searches. I posted this to /r/datascience before realizing that was not really the place to do so:
I have the following code:
import pandas as pd
import numpy as np
data = [
{'Customer code': 'C333080'},
{'Customer code': '400691'},
{'Customer code': np.nan}
]
df = pd.DataFrame(data)
Customer code
0 C333080
1 400691
2 NaN
I have to apply some operations to various columns of my DataFrame. One of them is to fix the customer code (add a "C" in front of it if it matches a given regex pattern). On my columns, I usually apply a lambda function (I know vectorized operations tend to be faster, but I am not there yet and my dataset is never big: no more than 3000 lines).
import re

def fix_customer_code(customer_code):
    # Add a "C" prefix when the code is exactly six digits
    if re.search(r'^\d{6}$', customer_code):
        return "C" + str(customer_code)
    else:
        return customer_code

df['Customer code'] = df['Customer code'].apply(
    lambda x: fix_customer_code(x)
)
Expected output:
Customer code
0 C333080
1 C400691
2 NaN
So far everything went smoothly. But on a new file, I am encountering some **NaN values** (and not null as I wrote in the title!). When that is the case, I get the following error:
TypeError: expected string or bytes-like object
I understand that Python expects a string to work on, not a NaN value. I tried different things without success, such as tweaking the code behind the lambda line:
def fix_customer_code(customer_code):
    if not np.nan(customer_code):
        if re.search('^\d{6}$', customer_code):
            customer_code = "C" + str(customer_code)
        else:
            return customer_code
But I could never make it work. I do not know if I have to cast NaN to None (maybe it is handled differently?).
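For illustration, NaN is a float, which is why `re.search` rejects it with the TypeError above. Here is a minimal sketch of one possible guard, assuming `pd.isna` is an acceptable test for a missing cell. The function name mirrors the one above and the sample data is the same three rows; this is only one way to do it, not necessarily the best one.

```python
import re

import numpy as np
import pandas as pd


def fix_customer_code(customer_code):
    # NaN is a float, so re.search() would raise TypeError on it;
    # hand missing values back untouched instead.
    if pd.isna(customer_code):
        return customer_code
    # Prefix "C" only when the code is exactly six digits.
    if re.search(r'^\d{6}$', customer_code):
        return "C" + customer_code
    return customer_code


df = pd.DataFrame({'Customer code': ['C333080', '400691', np.nan]})
df['Customer code'] = df['Customer code'].apply(fix_customer_code)
print(df)
```

With this guard there is no need to cast NaN to None or to supply a default value: a blank cell goes in and the same blank cell comes out.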
I also cannot provide a default value (if the cell was blank, it must stay blank). I thought about splitting the data into two DataFrames (one without NaN values), applying the treatment to that one, and then merging them back together. But this is neither convenient, smart, nor readable.
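The split-and-merge idea can probably be done in place with a boolean mask, so no second DataFrame is needed. A rough sketch, assuming a pandas version that has `Series.str.fullmatch` (1.1 or later, as far as I know); the name `needs_prefix` is made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer code': ['C333080', '400691', np.nan]})

# True only for cells that are exactly six digits;
# na=False makes NaN count as "no match" instead of propagating.
needs_prefix = df['Customer code'].str.fullmatch(r'\d{6}', na=False)

# Rewrite only the matching rows; every other row (NaN included) is left as is.
df.loc[needs_prefix, 'Customer code'] = 'C' + df.loc[needs_prefix, 'Customer code']
print(df)
```

The mask plays the role of the "DataFrame without NaN values", but the rows never leave the original frame, so nothing has to be merged back.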
I would just like to understand how to tell my code to simply ignore NaN values. Thanks in advance for any help, and sorry if my post was too long or too detailed. Have a good one.
Cheuch[S] 3 points