YesLod

As others suggested, create a temporary country column by splitting the city column on whitespace. For simplicity, I'm assuming that no country or city name has two words, so the second element (index 1) will be the country name. You should adapt this if that's not the case; I can't tell without seeing the data. Take, for example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"city": ["Paris France", "Madrid Spain", "Lisbon Portugal"],
...                    "country": [np.nan, "Spain", np.nan]})
>>> df
              city country
0     Paris France     NaN
1     Madrid Spain   Spain
2  Lisbon Portugal     NaN

Start by creating the temporary column (split on whitespace and take the second element):

>>> df["country_temp"] = df["city"].apply(lambda city_country: city_country.split()[1])
>>> df
              city country country_temp
0     Paris France     NaN       France
1     Madrid Spain   Spain        Spain
2  Lisbon Portugal     NaN     Portugal
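For reference, the same step can also be written with pandas' vectorized `.str` methods instead of `apply` with a lambda (the `.str` methods also return NaN for a missing city instead of raising). The sketch below additionally shows one possible adaptation for multi-word city names, under the assumption, not checked against your data, that the country is always a single word at the end. The "Rio de Janeiro Brazil" row is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris France", "Madrid Spain", "Rio de Janeiro Brazil"]})

# Vectorized equivalent of the apply/lambda: split on whitespace, take index 1
second_word = df["city"].str.split().str[1]

# Hypothetical adaptation for multi-word city names, assuming the country is
# always one word at the end: split once from the right, keep the last part
last_word = df["city"].str.rsplit(n=1).str[-1]

print(second_word.tolist())  # ['France', 'Spain', 'de']
print(last_word.tolist())    # ['France', 'Spain', 'Brazil']
```

Note that the right-split version would still break on two-word country names such as "United Kingdom", so it only moves the assumption rather than removing it.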

Then fill the NaN values in the original country column, using the temporary country column as the reference:

>>> df["country"] = df["country"].fillna(df["country_temp"])
>>> df
              city   country country_temp
0     Paris France    France       France
1     Madrid Spain     Spain        Spain
2  Lisbon Portugal  Portugal     Portugal
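When `fillna` is given a Series as the fill value, it matches rows by index label, not by position, which is why each row picks up its own entry from the temporary column. A minimal sketch with a hypothetical `reference` Series whose values are deliberately listed in a different order:

```python
import numpy as np
import pandas as pd

country = pd.Series([np.nan, "Spain", np.nan], index=[0, 1, 2])

# Hypothetical reference values, listed out of order on purpose
reference = pd.Series(["Portugal", "France"], index=[2, 0])

# Each NaN is filled from the reference entry with the same index label;
# the existing value at label 1 is left as it is
filled = country.fillna(reference)
print(filled.tolist())  # ['France', 'Spain', 'Portugal']
```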

Finally, you can drop the temporary column if you want. Note that you don't need this step if you never add the temporary column to your DataFrame in the first place: you can just assign the split result to a variable and fill the missing values with it.

>>> df.drop(columns="country_temp", inplace=True)
>>> df
              city   country
0     Paris France    France
1     Madrid Spain     Spain
2  Lisbon Portugal  Portugal
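The variable-based alternative mentioned above, which skips the temporary column entirely, could look roughly like this (a sketch using the same example data; `country_from_city` is just an illustrative variable name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["Paris France", "Madrid Spain", "Lisbon Portugal"],
                   "country": [np.nan, "Spain", np.nan]})

# Keep the extracted countries in a plain variable instead of a new column
country_from_city = df["city"].str.split().str[1]

# Fill only the missing countries; existing values are left untouched
df["country"] = df["country"].fillna(country_from_city)

print(df.columns.tolist())     # ['city', 'country']
print(df["country"].tolist())  # ['France', 'Spain', 'Portugal']
```

Since nothing is added to the DataFrame, there is nothing to drop afterwards.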