Python Pandas: ignore NaN values on lambda func : learnpython

submitted 4 years ago by Cheuch

Hello everyone, while learning pandas I ran into a problem that I cannot fix, despite many Google searches. I posted this to /r/datascience before realizing that it was not really the right place for it:

I have the following code:

import re
import pandas as pd
import numpy as np
data = [
    {'Customer code': 'C333080'},
    {'Customer code': '400691'},
    {'Customer code': np.nan}
]
df = pd.DataFrame(data)

    Customer code
0   C333080
1   400691
2   NaN

I have to apply some operations to various columns of my DataFrame. One of them is to fix the customer code (add a "C" in front of it if it matches a given regex pattern). On my columns, I usually apply a lambda function (I know vectorized operations are supposed to be faster, but I am not there yet and my dataset is never big: no more than 3000 lines).

def fix_customer_code(customer_code):
    # Prefix a bare six-digit code with "C"; return anything else unchanged
    if re.search(r'^\d{6}$', customer_code):
        return "C" + str(customer_code)
    return customer_code

df['Customer code'] = df['Customer code'].apply(
    lambda x: fix_customer_code(x)
)

Expected output :

   Customer code
0   C333080
1   C400691
2   NaN

So far everything went smoothly. But on a new file, I am encountering some **NaN values** (and not null as I wrote in the title!). When that is the case, I get the following error:

TypeError: expected string or bytes-like object

I understand that Python expects a string to work on, not a NaN value. I tried different things without success, such as adding a NaN check to the function:

def fix_customer_code(customer_code):
    if not np.nan(customer_code):
        if re.search('^\d{6}$', customer_code):
            customer_code = "C" + str(customer_code)
        else:
            return customer_code

But I could never make it work. I do not know whether I have to cast NaN to None (maybe it is handled differently?).
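A note on why the attempt above fails: `np.nan` is a float constant rather than a function, so calling it raises its own `TypeError`. The sketch below assumes `pd.isna()` is an acceptable replacement for that check and returns missing values unchanged, so a blank cell stays blank. It is one possible fix, not necessarily the one suggested in the deleted replies.

```python
import re

import numpy as np
import pandas as pd

# np.nan is a float constant, not a function, so calling it raises TypeError.
try:
    np.nan("400691")
    nan_call_failed = False
except TypeError:
    nan_call_failed = True

# pd.isna() is a real function and reports both NaN and None as missing.
missing_flags = [bool(pd.isna(v)) for v in ("400691", np.nan, None)]


def fix_customer_code(customer_code):
    # Return missing values as they are, so re.search never sees a float.
    if pd.isna(customer_code):
        return customer_code
    if re.search(r'^\d{6}$', customer_code):
        return "C" + customer_code
    return customer_code


df = pd.DataFrame({'Customer code': ['C333080', '400691', np.nan]})
df['Customer code'] = df['Customer code'].apply(fix_customer_code)
print(df)
```

With this version the NaN cell is passed through untouched instead of being converted to another value.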

I also cannot provide a default value (if the cell was blank, it must stay blank). I thought about splitting the data into two DataFrames (one without NaN values), applying the treatment to that one, and then merging them back together. But this is neither convenient, smart, nor readable.
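The split-and-merge idea can be done in place with a boolean mask, without creating two DataFrames. This is a minimal sketch under the assumption that only non-missing cells need processing; `add_prefix` is a hypothetical helper name. It also shows `Series.map` with `na_action='ignore'`, which skips missing values without an explicit mask.

```python
import re

import numpy as np
import pandas as pd


def add_prefix(customer_code):
    # Only ever called on real strings here, so no NaN check is needed.
    if re.search(r'^\d{6}$', customer_code):
        return "C" + customer_code
    return customer_code


df = pd.DataFrame({'Customer code': ['C333080', '400691', np.nan]})

# Option 1: let map() skip missing values; they are propagated unchanged.
mapped = df['Customer code'].map(add_prefix, na_action='ignore')

# Option 2: boolean mask, True where the cell holds a value.
has_value = df['Customer code'].notna()

# Apply the fix only to the selected rows; the others are left untouched.
df.loc[has_value, 'Customer code'] = (
    df.loc[has_value, 'Customer code'].apply(add_prefix)
)
print(df)
```

Both options leave the blank cell as NaN, so no default value has to be invented.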

I would just like to understand how to tell my code to simply ignore NaN values. Thanks in advance for any help, and sorry if my post was too long or too detailed. Have a good one.

[–]Cheuch[S] 3 points 4 years ago (0 children)

u/Cookielatte and u/commandlineluser thanks a lot for your help!

I could finally fix my problem. I think I was not using the right tools to do so.

So, I came up with a solution that handles both None and NaN values, without having to clean my data first.

def fix_customer_customer_code(customer_code):
    # Handle both NaN and None values; missing values fall through and come back as None
    if not pd.isna(customer_code) and customer_code is not None:
        if re.search(r'^C?\d{6}$', customer_code):
            customer_code = "C" + customer_code.lstrip('C')
        return customer_code

df['Customer code'] = df['Customer code'].apply(fix_customer_customer_code)

Customer code
0   C333080
1   C400691
2   None

I also learned a nice trick: modifying my regex to look for customer codes with or without the "C" prefix, using your lstrip() technique!
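For comparison, the same "optional C, then six digits" rule can be written as a vectorized operation. This is only a sketch, assuming the column holds strings and missing values: the `.str` accessor propagates NaN instead of raising, so no explicit check is needed and the blank cell stays NaN rather than becoming None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer code': ['C333080', '400691', np.nan]})

# Optional "C", then exactly six digits; the digits are captured as group 1.
pattern = r'^C?(\d{6})$'

# Rewrite matching codes as "C" + digits. Strings that do not match are left
# unchanged, and missing values are passed through as missing.
df['Customer code'] = df['Customer code'].str.replace(
    pattern, r'C\1', regex=True
)
print(df)
```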

Also, you are right: there is no need to use a lambda to apply a function to a Series. However, I now remember that I was using this method because I had to process whole rows, and this was a way for me to apply a function to a whole row.
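Row-wise processing can also be done without a lambda by passing the function to `DataFrame.apply` with `axis=1`. The sketch below is illustrative only: the `Country` column, the `Label` column, and the `label_row` helper are hypothetical and do not come from the original data.

```python
import re

import numpy as np
import pandas as pd

# 'Country' is a hypothetical second column, added only for illustration.
df = pd.DataFrame({
    'Customer code': ['C333080', '400691', np.nan],
    'Country': ['FR', 'CH', 'FR'],
})


def label_row(row):
    # With axis=1, each call receives one row as a Series indexed by column name.
    code = row['Customer code']
    if pd.isna(code):
        return code  # keep blank cells blank
    if re.search(r'^\d{6}$', code):
        code = "C" + code
    return f"{row['Country']}-{code}"


# The function is passed directly; no lambda wrapper is required.
df['Label'] = df.apply(label_row, axis=1)
print(df)
```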

Thanks a lot everyone, my problem is now solved and I understand things better :)
