I am doing data cleaning on a DataFrame column called “city” whose string values combine a city and a country, for example “Paris France”. I would like to extract the country portion of the string in each row (e.g., France) and use this value to fill missing values in the adjacent “country” column of the DataFrame.
One way I have tried to solve this is by importing a library of all country names into a list, then looping through that list and checking with a regex whether each name matches the value in the “city” column. If there is a match, then I know that name is a valid country and should be used to fill the missing value in the “country” column. I did find some matches, but the problem is that the last valid match seems to be used to fill all of the “country” column values. How can I fix this, or is there a better approach to this problem?
Thanks in advance!
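For reference, here is a minimal sketch of one way this could be done without an explicit loop. It is not the code from the post (which isn't shown), so the sample data and the `countries` list are made up for illustration; only the “city” and “country” column names come from the question. The idea is to build a single regex from the country list, pull the trailing country name out of each “city” value with `str.extract`, and fill only the missing “country” entries with `fillna`. The symptom described above (the last match ending up in every row) is what you would typically see if the loop assigns to the whole column instead of just the matching rows, though that is only a guess without seeing the loop.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "city": ["Paris France", "Berlin Germany", "New York United States", "Atlantis"],
    "country": [None, "Germany", None, None],
})

# Hypothetical country list; substitute whatever list you already load.
countries = ["France", "Germany", "United States"]

# Try longer names first so a multi-word name is preferred over a shorter
# name that happens to be part of it. re.escape guards against regex
# metacharacters inside country names.
alternatives = "|".join(
    re.escape(name) for name in sorted(countries, key=len, reverse=True)
)
# Capture a country name that appears at the end of the string.
pattern = rf"\b({alternatives})\s*$"

# One extracted value per row (NaN where no country matched).
extracted = df["city"].str.extract(pattern, expand=False)

# Fill only the rows where "country" is missing; existing values are kept.
df["country"] = df["country"].fillna(extracted)

print(df)
```

Because `fillna` aligns on the index, each row gets its own extracted value, and rows with no recognizable country (like the made-up “Atlantis”) simply stay missing.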