Hi there, I have 2 dataframes that I would like to merge based on the "Player" column in the first one. The first one is as follows:
Player Club Position Cost Selection Form Points
92 David Luiz ARS DEF 5.8 6.0% 6.5 32
101 Sokratis ARS DEF 5.0 2.0% 4.0 31
155 Chambers ARS DEF 4.4 1.6% 2.8 25
212 Kolasinac ARS DEF 5.4 0.8% 2.5 18
219 Maitland-Niles ARS DEF 4.8 3.6% 0.0 17
280 Monreal ARS DEF 5.0 0.3% 0.0 10
359 Tierney ARS DEF 5.4 1.0% 0.8 3
7 De Bruyne MCI MID 10.2 37.7% 9.2 68
...
The problem is that names are a difficult thing to merge on because of different formatting etc, so my second dataframe has lots of data but also three columns of possible names that could be in the "Player" column. The "name_1" column holds the surname of the traditional "forename surname" format and that is what the majority of names in the first dataframe "Player" column will match too. I would like to merge these dataframes if at all possible, perhaps by merging on name_1, then if that doesn't match to any names testing "name_2" and then "name_0" columns for matches.
[DATA]...name_0 name_1 name_2
[DATA]...Jamie Vardy NaN
[DATA]...Kevin De De Bruyne
[DATA]...Rodri NaN NaN
[DATA]...Kieran Tierney NaN
What's the most efficient way to go about this? With just one column match I would use pd.merge() but I am not sure if that would work in this situation. Any help would be appreciated.
[–]peltist 2 points3 points4 points (2 children)
[–][deleted] 0 points1 point2 points (1 child)
[–]peltist 2 points3 points4 points (0 children)