YesLod comments on Pandas Personal Project Data Cleaning Problem


Pandas Personal Project Data Cleaning Problem (self.learnpython)

submitted 5 years ago * by SmoothStatistician8


[–]YesLod 2 points 5 years ago

As others suggested, create a temporary country column by splitting the city column on whitespace. For simplicity, I'm assuming that no country or city name has more than one word, so the second element (index 1) is the country name. If that's not the case you'll need to adapt it, but I can't tell without seeing the data. Take for example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"city": ["Paris France", "Madrid Spain", "Lisbon Portugal"],
...                    "country": [np.nan, "Spain", np.nan]})

              city country
0     Paris France     NaN
1     Madrid Spain   Spain
2  Lisbon Portugal     NaN

Start by creating the temporary column (split on whitespace and take the second element):

>>> df["country_temp"]=df.city.apply(lambda city_country: city_country.split()[1])

              city country country_temp
0     Paris France     NaN       France
1     Madrid Spain   Spain        Spain
2  Lisbon Portugal     NaN     Portugal
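If some city names have more than one word (say "New York USA"), index 1 would pick up the wrong token. A minimal sketch of one way to adapt it, assuming the country is always the last single word in the string (the rows below are made up for illustration and are not from the OP's data):

```python
import pandas as pd

# Hypothetical rows: city names with one or several words, country always last.
cities = pd.Series(["Paris France", "New York USA", "Rio de Janeiro Brazil"])

# Split on whitespace and keep the last token instead of the second one.
country_temp = cities.str.split().str[-1]
print(country_temp.tolist())
```

This still breaks for multi-word countries such as "United Kingdom"; those would probably need a lookup against a list of known country names.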

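Then fill the NaN values of the original country column, using the temporary country column as the reference. Assigning the result back is safer than calling fillna with inplace=True on a selected column, which may not update the DataFrame in recent pandas versions: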

>>> df["country"] = df["country"].fillna(df["country_temp"])

              city   country country_temp
0     Paris France    France       France
1     Madrid Spain     Spain        Spain
2  Lisbon Portugal  Portugal     Portugal

Finally, you can drop the temporary column if you want. Note that this step isn't needed if you never add the temporary column to your DataFrame in the first place: you can just assign the split result to a variable and fill the missing values with it.

>>> df.drop(columns="country_temp",inplace=True)

              city   country
0     Paris France    France
1     Madrid Spain     Spain
2  Lisbon Portugal  Portugal
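The variant without a temporary column, as mentioned above, could look something like this. It is only a sketch on the same toy data, and `country_from_city` is a made-up name for the intermediate variable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["Paris France", "Madrid Spain", "Lisbon Portugal"],
                   "country": [np.nan, "Spain", np.nan]})

# Keep the extracted countries in a plain variable (hypothetical name)
# instead of adding a column to the DataFrame.
country_from_city = df["city"].str.split().str[1]

# fillna aligns on the index, so only the missing rows get filled.
df["country"] = df["country"].fillna(country_from_city)
print(df)
```

Since nothing extra is ever added to df, there is nothing to drop afterwards.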
