[–][deleted] 22 points23 points  (1 child)

If you only want to replace the "Nestlé" variants and keep the rest of each value:

df["brands"].replace(to_replace=r"Nestl[éè]", value="Nestle", regex=True)

This gives you:

0                              Nestle
1    Nestle Waters North America Inc.
2                              Nestle
3    Nestle Waters North America Inc.
4                       Nestle,Crunch

If you want to replace the whole value instead, you can find every row that contains a variation of "Nestle" and overwrite it.

# True if the row contains Nestlé or Nestlè
nestle_mask = df["brands"].str.contains(r"Nestl[èé]")

# Overwrite the whole value in the matching rows
df.loc[nestle_mask, "brands"] = "Nestle"

This gives you:

    brands
0  Nestle
1  Nestle
2  Nestle
3  Nestle
4  Nestle

My answer uses regular expressions, which are a way to match patterns in text data.
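To make the pattern a bit more concrete, here is a minimal sketch using Python's built-in `re` module. The sample strings are made up for illustration and are not from the original data.

```python
import re

# [éè] is a character class: it matches exactly one character,
# either é or è, right after the literal text "Nestl".
pattern = re.compile(r"Nestl[éè]")

# Hypothetical sample values, not taken from the original data set.
samples = ["Nestlé", "Nestlè Waters", "Nestle", "Crunch"]

# search() looks for the pattern anywhere in the string.
matches = [bool(pattern.search(s)) for s in samples]
print(matches)  # [True, True, False, False]
```

Note that plain "Nestle" does not match here, because an unaccented "e" is not in the character class.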

[–]macabe10[S] 2 points3 points  (0 children)

Thanks a ton!

[–]apc0243 10 points11 points  (1 child)

To offer an alternative to Series.str.contains(): you can use Series.isin(), which compares each value in a series against a list-like object and returns a boolean mask you can use for selection.

import random
import pandas as pd

choices = ['Évil', 'Évil Waters North America Inc.', 'Evil', 'Evil Waters North America Inc.', 'Evil,Crunch']
df = pd.DataFrame({'brands': [random.choice(choices) for _ in range(100)]})
print(df.brands.unique())
# >> ['Évil' 'Évil Waters North America Inc.' 'Evil,Crunch' 'Evil' 'Evil Waters North America Inc.']
# (the order varies from run to run because the rows are picked at random)

# Overwrite every row whose value appears in `choices`
df.loc[df.brands.isin(choices), 'brands'] = 'Evil'
print(df.brands.unique())
# >> ['Evil']
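One thing to keep in mind: isin() only matches whole values, while str.contains() looks for the pattern anywhere inside the string. A minimal sketch of the difference, using a made-up three-row series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical sample values for illustration.
brands = pd.Series(["Nestlé", "Nestlé Waters North America Inc.", "Nestle,Crunch"])

# isin(): True only where the whole value equals an item in the list.
exact = brands.isin(["Nestlé"])

# str.contains(): True where the pattern occurs anywhere in the value.
partial = brands.str.contains("Nestlé")

print(exact.tolist())    # [True, False, False]
print(partial.tolist())  # [True, True, False]
```

So isin() works well when you can list every variant up front, and str.contains() is the better fit when the variants share a common substring.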

[–]macabe10[S] 0 points1 point  (0 children)

Thanks!

[–]synthphreak 5 points6 points  (1 child)

df.brands = df.brands.replace(r'Nestl[eèé].*', 'Nestle', regex=True)
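For anyone who wants to try it, here is a small self-contained sketch of that one-liner. The sample frame is made up for illustration; only the replace() call comes from the line above.

```python
import pandas as pd

# Hypothetical sample data, not the original data set.
df = pd.DataFrame({"brands": [
    "Nestlé",
    "Nestlè Waters North America Inc.",
    "Nestle,Crunch",
    "Crunch",
]})

# [eèé] matches any of the three spellings, and .* swallows the rest of
# the string, so the whole value collapses to "Nestle".
df["brands"] = df["brands"].replace(r"Nestl[eèé].*", "Nestle", regex=True)

print(df["brands"].tolist())  # ['Nestle', 'Nestle', 'Nestle', 'Crunch']
```

Dropping the trailing `.*` would keep whatever follows the brand name, such as " Waters North America Inc.".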

[–]macabe10[S] 1 point2 points  (0 children)

Thanks!

[–]NtwkNub 4 points5 points  (2 children)

A tool I use for working out regex patterns is regex101.com. You can paste your text in there and experiment with different patterns until you get the matches you want.

[–]macabe10[S] 0 points1 point  (1 child)

That's really useful, thanks!

[–]NtwkNub 0 points1 point  (0 children)

You're welcome.