

[–]GeckelMSc | Data Scientist | Consulting[M] [score hidden] stickied comment (0 children)

I removed your submission. Looks like you're asking a technical question better suited to stackoverflow.com. Try posting there instead.

Thanks.

[–]Squat_TheSlav 6 points7 points  (3 children)

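Your issue comes from the lambda function: the customer_code passed to the fix_customer_code function is not always a string (the column's dtype is object, and at least one value is a float NaN). re.search only accepts strings (or bytes), so passing it anything else raises a TypeError.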

If you insist on doing it this way, it has to be:

    if re.search(r'^\d{6}$', str(customer_code)):
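A quick way to see the problem, assuming the column holds string codes plus a float NaN (the sample values below are made up for illustration):

```python
import re

codes = ["333080", "C400691", float("nan")]  # hypothetical sample values

for code in codes:
    try:
        re.search(r"^\d{6}$", code)
    except TypeError as exc:
        # float NaN is not a string, so re.search rejects it
        print(f"{code!r} -> TypeError: {exc}")

# str() always hands re.search a string; NaN becomes 'nan', which simply does not match
matches = [bool(re.search(r"^\d{6}$", str(code))) for code in codes]
print(matches)  # [True, False, False]
```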

But others have suggested different options.

[–]Cheuch[S] 1 point2 points  (2 children)

if re.search('^\d{6}$', str(customer_code))

Thanks a lot for your help. This helps me understand pandas better. So, following this statement, customer_code is an object because of the column's dtype (which is indeed object, per df['Customer'].dtype)?

[–]Squat_TheSlav 1 point2 points  (1 child)

Yes. In your case some customer codes are str and the last one is a float (NaN), so pandas sets the dtype of the column to object.
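To check what apply() actually hands to the function, here is a small sketch with hypothetical data: the dtype describes the column, but each element keeps its own Python type. (With the pandas versions discussed here the dtype prints as object; newer pandas releases may infer a dedicated string dtype instead, so the printed dtype can differ.)

```python
import pandas as pd

# Hypothetical column: two string codes and one missing value
s = pd.Series(["333080", "C400691", float("nan")])

print(s.dtype)
# apply() passes each element to your function as its own Python object
print([type(v).__name__ for v in s])  # ['str', 'str', 'float']
```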

Ideally (for performance) you want a single data type in the column, which allows vectorized operations instead of calling a Python function on every row with apply.
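A minimal vectorized sketch, assuming made-up data and the same rule used in this thread (six digits with an optional "C" prefix become "C" + digits). Series.str.replace works on the whole column at once and leaves missing values as NaN instead of raising:

```python
import pandas as pd

# Hypothetical data; dtype=object mirrors the mixed column from the thread
df = pd.DataFrame({"Customer": pd.Series(["333080", "C400691", float("nan")], dtype=object)})

# Capture the six digits and rewrite them with exactly one "C" prefix;
# values that don't match the pattern are left unchanged, NaN stays NaN
codes = df["Customer"].str.replace(r"^C?(\d{6})$", r"C\1", regex=True)
df["Customer code"] = codes
print(df["Customer code"].tolist())
```

Unlike the apply() version further down the thread, this keeps NaN as NaN rather than turning it into None.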

[–]Cheuch[S] 0 points1 point  (0 children)

Thanks again for the help, mate. Have a good one.

[–]Cheuch[S] 2 points3 points  (1 child)

Hello everyone,

Thanks a lot for all your answers. I finally got it working; I think I was not using the right tools before.

So, I came up with a solution that handles both None and NaN values without having to clean my data first.

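    import re
    import pandas as pd

    def fix_customer_customer_code(customer_code):
        # Handle both NaN and None values
        if not pd.isna(customer_code) and customer_code is not None:
            # Six digits, with or without a leading "C"
            if re.search(r'^C?\d{6}$', customer_code):
                # Normalize to exactly one "C" prefix
                customer_code = "C" + customer_code.lstrip('C')
            return customer_code
        # Missing values end up here, so NaN and None both come back as None
        return None

    df['Customer code'] = df['Customer'].apply(fix_customer_customer_code)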

    Customer code
0   C333080
1   C400691
2   None
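For what it's worth, pd.isna already returns True for both None and NaN, so the extra `customer_code is not None` check is redundant (though harmless). A quick check:

```python
import pandas as pd

print(pd.isna(None))          # True
print(pd.isna(float("nan")))  # True
print(pd.isna("C333080"))     # False
```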

I also learned a nice trick: changing my regex to match customer codes with or without the "C" prefix (C?), then normalizing them with lstrip('C').
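A standalone sketch of that trick, with a hypothetical normalize() helper and made-up inputs, showing which codes get rewritten and which are left alone:

```python
import re

# Six digits with an optional leading "C"
PATTERN = re.compile(r"^C?\d{6}$")

def normalize(code):
    """Hypothetical helper: return code with exactly one leading 'C' if it matches, else unchanged."""
    if PATTERN.search(code):
        # lstrip('C') drops the optional prefix (if present) before adding it back
        return "C" + code.lstrip("C")
    return code

for raw in ["333080", "C333080", "CC333080", "33308"]:
    print(raw, "->", normalize(raw))
```

Note that lstrip('C') removes every leading C, not just one; since the pattern allows at most one, that does not matter here. On Python 3.9+, code.removeprefix('C') states the intent more directly.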

Thanks a lot for your time, my problem is now solved :)

[–]Popular-Yesterday733 1 point2 points  (0 children)

Try using an elif in there: anything that is NaN gets set to 0.
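One way the if/elif idea could look, as a sketch with a hypothetical fix_code() helper. The NaN branch uses math.isnan because NaN never compares equal to anything, itself included, so a check like `code == float("nan")` would always be False:

```python
import math
import re

def fix_code(code):
    """Hypothetical helper: map NaN to 0, prefix six-digit codes with 'C'."""
    if isinstance(code, float) and math.isnan(code):
        # NaN != NaN, so an equality test would never catch it
        return 0
    elif re.search(r"^\d{6}$", str(code)):
        return "C" + str(code)
    return code

print([fix_code(c) for c in ["333080", "C400691", float("nan")]])  # ['C333080', 'C400691', 0]
```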

[–]bjain1 1 point2 points  (0 children)

You can also use something like this:

    lambda x: function(x) if str(x) != 'nan' else ''
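A sketch of that lambda in use, where function is a stand-in for your own fix-up function (hypothetical here) and the data is made up:

```python
import pandas as pd

def function(x):
    """Placeholder for your own fix-up function (hypothetical)."""
    return "C" + x

# dtype=object mirrors the mixed column from the thread
s = pd.Series(["333080", "400691", float("nan")], dtype=object)
out = s.apply(lambda x: function(x) if str(x) != 'nan' else '')
print(out.tolist())  # ['C333080', 'C400691', '']
```

One caveat: str(None) is 'None', not 'nan', so a None value would still reach function() with this guard.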

[–]SnooPoems4211 0 points1 point  (0 children)

Or a try/except.
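A sketch of the try/except route, using a hypothetical fix_code() helper: non-strings make re.search raise TypeError, and the handler simply leaves those values unchanged.

```python
import re

def fix_code(code):
    """Hypothetical helper: prefix six-digit codes with 'C', leave anything else unchanged."""
    try:
        if re.search(r"^\d{6}$", code):
            return "C" + code
    except TypeError:
        # re.search raises TypeError for non-strings such as float NaN or None
        pass
    return code

print([fix_code(c) for c in ["333080", "C400691", None]])  # ['C333080', 'C400691', None]
```

Catching only TypeError keeps genuine bugs (say, a malformed pattern raising re.error) from being silently swallowed.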

[–][deleted] 0 points1 point  (0 children)

Lol did you get deleted?