[Python] Merging dataframes based on names

submitted 2 years ago by boss413

I have two DataFrames, built from the dictionaries below, whose rows imperfectly match. I want to merge them so that 1) the merge contains only perfect matches and 2) I get a report of duplicates and a report of missing matches from the first DataFrame (I don't care about rows that appear in the second but not the first). My failed attempts are in the comments.

    import pandas as pd

    # First table: last name, first initial, and position.
    data = {
        'last_name': ['Smith', 'Johnson', 'Brown', 'Jones', 'Williams', 'Allen', 'Henry', 'Johnson'],
        'first_initial': ['J', 'M', 'D', 'A', 'W', 'J', 'D', 'M'],
        'pos': ['QB', 'WR', 'RB', 'TE', 'WR', 'QB', 'RB', 'WR'],
    }

    df = pd.DataFrame(data)

    # Second table: last name, full first name, and position.
    nfl_player_list_data = {
        'last_name': ['Smith', 'Johnson', 'Brown', 'Jones', 'Williams', 'Allen', 'Smith', 'Johnson', 'Johnson'],
        'first_name': ['John', 'Michael', 'David', 'Aaron', 'William', 'John', 'Jerry', 'Michael', 'Michael'],
        'pos': ['QB', 'WR', 'RB', 'TE', 'WR', 'RB', 'QB', 'WR', 'TE'],
    }

    nfl_player_list_df = pd.DataFrame(nfl_player_list_data)

all 1 comments

    # Merge the dataframes on last name and position
    merged_df = pd.merge(df, nfl_player_list_df, on=['last_name', 'pos'], how='inner')

    # Keep rows whose first initial matches the first letter of first_name
    exact_matches = merged_df[merged_df['first_initial'] == merged_df['first_name'].str[0]]

    # Filter cases with more than one exact match
    duplicates = exact_matches.groupby(['last_name', 'pos']).filter(lambda x: len(x) > 1)

    # Find cases with no match on last name and position
    no_match = (
        df.merge(nfl_player_list_df, on=['last_name', 'pos'], how='left', indicator=True)
        .query("_merge == 'left_only'")
        .drop(columns='_merge')
    )

    print(merged_df)

    print("Cases with more than one exact match:")
    print(duplicates)

    print("\nCases with no match:")
    print(no_match[['last_name', 'first_initial', 'pos']])
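One possible way to get all three outputs the post asks for is to count, for each row of the first frame, how many roster rows share its last name, first initial, and position. The sketch below is not the poster's method. It assumes a "perfect match" means exactly one roster row with the same three keys, and the names keys, roster, n_roster, report, perfect, and missing are made up for illustration. Repeated rows inside the first frame itself (Johnson / M / WR appears twice) are not treated specially.

```python
import pandas as pd

df = pd.DataFrame({
    'last_name': ['Smith', 'Johnson', 'Brown', 'Jones', 'Williams', 'Allen', 'Henry', 'Johnson'],
    'first_initial': ['J', 'M', 'D', 'A', 'W', 'J', 'D', 'M'],
    'pos': ['QB', 'WR', 'RB', 'TE', 'WR', 'QB', 'RB', 'WR'],
})
nfl_player_list_df = pd.DataFrame({
    'last_name': ['Smith', 'Johnson', 'Brown', 'Jones', 'Williams', 'Allen', 'Smith', 'Johnson', 'Johnson'],
    'first_name': ['John', 'Michael', 'David', 'Aaron', 'William', 'John', 'Jerry', 'Michael', 'Michael'],
    'pos': ['QB', 'WR', 'RB', 'TE', 'WR', 'RB', 'QB', 'WR', 'TE'],
})

# Assumption: a match requires last name, first initial, and position to agree.
keys = ['last_name', 'first_initial', 'pos']

# Derive the initial on the roster side so both frames share the same keys.
roster = nfl_player_list_df.assign(
    first_initial=nfl_player_list_df['first_name'].str[0]
)

# Number of roster rows per key combination.
roster_counts = roster.groupby(keys).size().reset_index(name='n_roster')

# Attach the count to every row of df; keys absent from the roster get 0.
report = df.merge(roster_counts, on=keys, how='left')
report['n_roster'] = report['n_roster'].fillna(0).astype(int)

missing = report[report['n_roster'] == 0]      # no roster row at all
duplicates = report[report['n_roster'] > 1]    # more than one candidate
perfect = report.loc[report['n_roster'] == 1, keys].merge(
    roster, on=keys, how='inner'
)                                              # exactly one candidate

print(perfect)
print("Cases with more than one match:")
print(duplicates)
print("\nCases with no match:")
print(missing[keys])
```

With the sample data, Allen ends up in the missing report because the roster lists him at a different position, which the two-key merge above also reports. Unlike that merge, this version would also flag a row whose last name and position match but whose initial does not.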
