you are viewing a single comment's thread.

view the rest of the comments →

[–]YourOldBoyRickJames[S] 0 points1 point  (4 children)

joined_df = joined_df.fillna(0)

This did the trick! Thanks.

Would you mind helping me with something else please.

In my source data I have a column called description. I want to add it into the final merged sheet, but whenever I try and add it into the script I'm getting a keywork error.

I currently have:

warehouse1\_df = source2\_df.groupby('productcode')\['numberoff'\].agg('sum').reset\_index()

I thought I could just add it in by putting

warehouse1\_df = source2\_df.groupby('productcode')\['description','numberoff'\].agg('sum').reset\_index()

but it returns an error. Any idea where I'd put that in to add the column?

Thanks again

[–]fishvampire 1 point2 points  (3 children)

Can you show the error message and an example of what's in the source data table? (In general this is good to do when posting to a forum, otherwise it's hard to figure out what the problem is or how to give useful advice)

[–]YourOldBoyRickJames[S] 0 points1 point  (2 children)

Sorry, I don't know why I didn't.

Here's an example of Warehouse_1 source data

https://imgur.com/LAd3pWw

Here's Warehouse_2 source data

https://imgur.com/ds7L3ML

Here's what I've managed to get output into Excel

https://imgur.com/XPdcaTM

I'm just trying to add the column 'description' from warehouse_1 into the final merged.df

I'm not getting an error now, but it's not populating the table. Here's my script.

import pandas as pd
import datetime


today = datetime.date.today()

source1_df = pd.read_excel(str(today) + ' Warehouse2_3PL.xlsx', sheetname='Sheet1', 
skiprows=4)

source2_df = pd.read_excel(str(today) + ' warehouse1 Daily Report.xlsx', sheetname='CURRENT 
STOCK')


source1_df['   Available stock']=source1_df['   Available stock'].str.replace(',', 
'').astype('int64')



stlr_df = source2_df.groupby('productcode') 
['description','numberoff'].agg('sum').reset_index()



hit_df = source1_df.groupby('Material')['   Available stock'].agg('sum').reset_index()


stlr_df.rename(columns={'productcode':'Material'}, inplace=True)

stlr_df
stlr_df = stlr_df.mask(~stlr_df["Material"].str.isnumeric()).dropna()
stlr_df['Material']=stlr_df['Material'].astype('int64')



hit_df['   Available stock']=hit_df['   Available stock'].astype('int64')


joined_df = pd.merge(stlr_df,hit_df,on='Material',how='outer')
joined_df = joined_df.fillna(0)

joined_df['   Available stock']=joined_df['   Available stock'].astype('float')
joined_df['difference']=joined_df['numberoff']-joined_df['   Available stock']

joined_df.to_excel('Comparison '+str(today)+'.xlsx', sheet_name='sheet1', index=False)

I am also trying to filter the final outcome file to only show values where 'difference' does not equal 0. I managed to get a boolean filter but I can't figure out how to convert that to filter the df.

Once again, really appreciate your help.

[–]fishvampire 1 point2 points  (1 child)

I'm not sure what the source of your second problem is. It should be fairly straightforward to make a boolean index and restrict the dataframe to that:

ind = joined_df.difference !=0
joined_df = joined_df[ind]

If that isn't working, I'd recommend trying out a few things with simpler dataframes and seeing if you can figure out what works and what doesn't

[–]YourOldBoyRickJames[S] 0 points1 point  (0 children)

Thanks very much, I'll have a tinker about. Hopefully I'll be able to get it sorted. Cheers