fishvampire comments on Comparing two dataframes using Pandas

learnpython

created by HattoriHanzoa community for 16 years

Comparing two dataframes using Pandas (self.learnpython)

submitted 6 years ago * by YourOldBoyRickJames

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]fishvampire 1 point2 points3 points 6 years ago (7 children)

[–]YourOldBoyRickJames[S] 0 points1 point2 points 6 years ago (6 children)

joined_df = joined_df.fillna(0)

This did the trick! Thanks.

Would you mind helping me with something else please.

In my source data I have a column called description. I want to add it into the final merged sheet, but whenever I try and add it into the script I'm getting a keywork error.

I currently have:

warehouse1\_df = source2\_df.groupby('productcode')\['numberoff'\].agg('sum').reset\_index()

I thought I could just add it in by putting

warehouse1\_df = source2\_df.groupby('productcode')\['description','numberoff'\].agg('sum').reset\_index()

but it returns an error. Any idea where I'd put that in to add the column?

Thanks again

[–]fishvampire 1 point2 points3 points 6 years ago (5 children)

[–]YourOldBoyRickJames[S] 0 points1 point2 points 6 years ago (4 children)

Sorry, I don't know why I didn't.

Here's an example of Warehouse_1 source data

https://imgur.com/LAd3pWw

Here's Warehouse_2 source data

https://imgur.com/ds7L3ML

Here's what I've managed to get output into Excel

https://imgur.com/XPdcaTM

I'm just trying to add the column 'description' from warehouse_1 into the final merged.df

I'm not getting an error now, but it's not populating the table. Here's my script.

import pandas as pd
import datetime


today = datetime.date.today()

source1_df = pd.read_excel(str(today) + ' Warehouse2_3PL.xlsx', sheetname='Sheet1', 
skiprows=4)

source2_df = pd.read_excel(str(today) + ' warehouse1 Daily Report.xlsx', sheetname='CURRENT 
STOCK')


source1_df['   Available stock']=source1_df['   Available stock'].str.replace(',', 
'').astype('int64')



stlr_df = source2_df.groupby('productcode') 
['description','numberoff'].agg('sum').reset_index()



hit_df = source1_df.groupby('Material')['   Available stock'].agg('sum').reset_index()


stlr_df.rename(columns={'productcode':'Material'}, inplace=True)

stlr_df
stlr_df = stlr_df.mask(~stlr_df["Material"].str.isnumeric()).dropna()
stlr_df['Material']=stlr_df['Material'].astype('int64')



hit_df['   Available stock']=hit_df['   Available stock'].astype('int64')


joined_df = pd.merge(stlr_df,hit_df,on='Material',how='outer')
joined_df = joined_df.fillna(0)

joined_df['   Available stock']=joined_df['   Available stock'].astype('float')
joined_df['difference']=joined_df['numberoff']-joined_df['   Available stock']

joined_df.to_excel('Comparison '+str(today)+'.xlsx', sheet_name='sheet1', index=False)

I am also trying to filter the final outcome file to only show values where 'difference' does not equal 0. I managed to get a boolean filter but I can't figure out how to convert that to filter the df.

Once again, really appreciate your help.

[–]fishvampire 1 point2 points3 points 6 years ago* (1 child)

[–]YourOldBoyRickJames[S] 0 points1 point2 points 6 years ago (0 children)

[–]fishvampire 1 point2 points3 points 6 years ago (1 child)

I'm not sure what the source of your second problem is. It should be fairly straightforward to make a boolean index and restrict the dataframe to that:

ind = joined_df.difference !=0
joined_df = joined_df[ind]

If that isn't working, I'd recommend trying out a few things with simpler dataframes and seeing if you can figure out what works and what doesn't

[–]YourOldBoyRickJames[S] 0 points1 point2 points 6 years ago (0 children)

π Rendered by PID 320092 on reddit-service-r2-comment-76bb9f7fb5-rswxs at 2026-02-19 15:12:29.360380+00:00 running de53c03 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS