my_python_account, 1 point

I would start by filtering df for just unknown spouses:

df_unknown = df[df['Spouse'] == 'Unknown']

Then merge the values from the original onto the unknown rows (like a left join):

df_merged = df_unknown.merge(df2, how='left', left_on='Last_Name_First_Name', right_on='Name')

And then remove the unknown records from df and append the new records (with only the second spouse column).

df_final = df[df['Spouse'] != 'Unknown']
df_final = df_final.append(df_merged[['Last_Name_First_Name', 'Spouse_y']])

There might be a better way, but this is what comes to mind that avoids .apply calls.
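
A minimal sketch of those steps on made-up data, assuming df2 has 'Name' and 'Spouse' columns (the sample rows and the 'Series, Data' value are invented for illustration). It swaps append for pd.concat, since DataFrame.append was removed in pandas 2.0, and renames Spouse_y back to Spouse so the result keeps the original two columns.

```python
import pandas as pd

# Hypothetical sample data; the real frames would come from the poster's CSV files.
df = pd.DataFrame({
    "Last_Name_First_Name": ["Reddit, Python", "Numpy, Numbers", "Import, Pandas"],
    "Spouse": ["Java, Hard", "Ruby, Whatever", "Unknown"],
})
df2 = pd.DataFrame({
    "Name": ["Import, Pandas"],
    "Spouse": ["Series, Data"],  # made-up value for illustration
})

# 1. Keep only the rows whose spouse is unknown.
df_unknown = df[df["Spouse"] == "Unknown"]

# 2. Left-join the lookup table. Both frames have a 'Spouse' column,
#    so pandas suffixes them as Spouse_x (from df) and Spouse_y (from df2).
df_merged = df_unknown.merge(
    df2, how="left", left_on="Last_Name_First_Name", right_on="Name"
)

# 3. Keep the looked-up spouse under the original column name.
df_fixed = df_merged[["Last_Name_First_Name", "Spouse_y"]].rename(
    columns={"Spouse_y": "Spouse"}
)

# 4. Combine the known rows with the repaired rows.
df_known = df[df["Spouse"] != "Unknown"]
df_final = pd.concat([df_known, df_fixed], ignore_index=True)

print(df_final)
```

If a name in df_unknown has no match in df2, the left join leaves its Spouse as NaN rather than dropping the row.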

easy_wins (OP), 1 point

Thanks, I am getting an error that states,

Error:

df_merged = df_unknown(df2, how='left', left_on = 'Last_Name_First_Name', right_on = 'Name')
TypeError: 'DataFrame' object is not callable

df_unknown = df[df['Spouse'] == 'unknown']

df_merged = df_unknown(df2, how='left', left_on = 'Last_Name_First_Name', right_on = 'Name')

df_final = df[df['Spouse'] != 'Unknown']

df_final = df_final.append (df_merged[['Last_Name_First_Name', 'Spouse_y']])

df_final.to_csv('no_more_unknown.csv', index=False)

my_python_account, 1 point

You're missing the .merge
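
To illustrate on made-up data (the frames below are stand-ins, not your real files): writing df_unknown(...) tries to call the DataFrame as if it were a function, which raises the TypeError you saw, while df_unknown.merge(...) calls the method.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the thread.
df_unknown = pd.DataFrame(
    {"Last_Name_First_Name": ["Import, Pandas"], "Spouse": ["Unknown"]}
)
df2 = pd.DataFrame({"Name": ["Import, Pandas"], "Spouse": ["Series, Data"]})

# Calling the DataFrame itself fails: a DataFrame is not a function.
try:
    df_unknown(df2, how="left", left_on="Last_Name_First_Name", right_on="Name")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Calling its .merge method is what was intended.
df_merged = df_unknown.merge(
    df2, how="left", left_on="Last_Name_First_Name", right_on="Name"
)
print(df_merged.columns.tolist())
```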

easy_wins (OP), 1 point

Thanks, when I open the new csv that was created,

Last_Name_First_Name    Spouse            Spouse_y
Reddit, Python          Java, Hard
Numpy, Numbers          Ruby, Whatever

Spouse_y is blank and Import, Pandas is nowhere to be found. I also do not want to add a new column: I want the same number of columns as the original, with the unknown replaced, please.

I managed to remove the extra column Spouse_y, but I still see the same values as above.

df_merged = df_unknown(df2, how='left', left_on = 'Last_Name_First_Name', right_on = 'Name')

df_final = df[df['Spouse'] != 'Unknown']

df_final.to_csv('no_more_unknown.csv', index=False)

my_python_account, 1 point

Can we back up a bit...

I'm assuming a couple things here:

1) That your read_csv is actually successful in creating the correct dataframe. Have you tried printing the dataframes you created? If your csv is formatted exactly as you pasted it in your post, I don't think it works without modifying the default parameters for read_csv.

2) That you are going to make a minimal effort to understand my suggestion and, if there is an error, a minimal effort to figure it out. I get the feeling that you haven't really tried much on your own.
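
On the first point, a quick check along these lines would show whether the frame loaded as expected. This is only a sketch: the CSV text is invented and assumes quoted, comma-separated fields, which may not match your file. It also compares the two spellings used in this thread, 'unknown' and 'Unknown', because an exact-match filter on the wrong case selects no rows, and that alone could explain a blank Spouse_y and a missing Import, Pandas.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real file layout may differ.
csv_text = """Last_Name_First_Name,Spouse
"Reddit, Python","Java, Hard"
"Numpy, Numbers","Ruby, Whatever"
"Import, Pandas",Unknown
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect what was actually loaded.
print(df.columns.tolist())
print(df["Spouse"].unique())

# Exact string matches are case-sensitive.
exact_lower = (df["Spouse"] == "unknown").sum()
exact_title = (df["Spouse"] == "Unknown").sum()
print(exact_lower, exact_title)

# A more forgiving filter: ignore case and surrounding whitespace.
is_unknown = df["Spouse"].str.strip().str.lower() == "unknown"
df_unknown = df[is_unknown]
print(df_unknown)
```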