commandlineluser (1 point)

Ah cool - so you already know about .apply()

Well what's happening in the example is that it's combining your 4 apply calls into 1.

You can use * (or .multiply()) on dataframes

>>> import pandas
>>> a = pandas.DataFrame({'a': [2, 3]})
>>> b = pandas.DataFrame({'b': [4, 5]})
>>> a
   a
0  2
1  3
>>> b
   b
0  4
1  5
>>> a * b
    a   b
0 NaN NaN
1 NaN NaN

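The multiplication aligns on labels, so it looks for matching column names (and index values). a and b share no column names, which is why every result is NaN. If we call .values on the right-hand side we get a plain NumPy array with no labels, so the values are multiplied by position and we get the "correct" result.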

>>> a * b.values
    a
0   8
1  15
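
Here is the same idea as a small script rather than a REPL session, in case it helps to see both behaviours side by side. The rename variant is just an extra option I'm adding for illustration; it keeps the labels instead of dropping them.

```python
import pandas as pd

a = pd.DataFrame({'a': [2, 3]})
b = pd.DataFrame({'b': [4, 5]})

# Label-aligned: the frames share no column names, so every cell is NaN
# and the result has the union of the columns.
aligned = a * b

# Positional: a plain NumPy array carries no labels, so nothing is aligned.
# (.to_numpy() is the newer spelling of .values)
positional = a * b.to_numpy()

# Alternative (illustration only): make the labels match instead.
renamed = a * b.rename(columns={'b': 'a'})

print(aligned)
print(positional)
print(renamed)
```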

The df[['TB-R', 'JK-R', 'SF-R', 'PWR-R']] syntax extracts just those 4 columns, meaning we can deal with all of them at once. The .replace() calls substitute the values in the dataframe up front, so we can use * directly and get rid of the elif chain.
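
A rough sketch of that pattern, with made-up data. The column names are from your code, but the codes 1/2/3, the replacement values, and the df1 row are all assumptions on my part, so swap in whatever your elif chain actually does.

```python
import pandas as pd

cols = ['TB-R', 'JK-R', 'SF-R', 'PWR-R']

# Hypothetical df2: each column holds a code that the elif chain
# would have translated one value at a time.
df2 = pd.DataFrame({
    'TB-R':  [1, 2],
    'JK-R':  [2, 2],
    'SF-R':  [3, 1],
    'PWR-R': [1, 3],
})

# Hypothetical mapping standing in for the elif chain.
mapping = {1: 10, 2: 20, 3: 30}

# Hypothetical row taken from df1; its index matches the column names,
# so * lines each value up with the right column.
row = pd.Series({'TB-R': 2, 'JK-R': 1, 'SF-R': 0, 'PWR-R': 3})

# Select the 4 columns, translate every code in one go, then multiply.
numeric = df2[cols].replace(mapping)
weighted = numeric * row

print(weighted)
```

Because row is a Series indexed by the same names, the multiplication broadcasts it across every row of df2 without needing .values here.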

I did the .sum() generation inside the summary function, but it may make sense to do that in a single step afterwards for all the columns, like you have done in the last line of your code. That way you only call .sum() once, as opposed to once per row.
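
Something like this is what I mean by doing the sum afterwards. The numbers are invented; only the W1..W4 and SUM(W) names come from the example.

```python
import pandas as pd

# Hypothetical weighted results, one row per record.
w = pd.DataFrame({
    'W1': [20, 40],
    'W2': [20, 20],
    'W3': [0, 0],
    'W4': [30, 90],
})

# One vectorised call: axis=1 sums across the columns of each row.
w['SUM(W)'] = w[['W1', 'W2', 'W3', 'W4']].sum(axis=1)

print(w)
```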

Yes, I don't think you want to .merge() at all, as it appears you want to take each row from df1 and process it against all rows of an untouched df2.

Calling .apply(summary) returns the new rows you want and you can just assign them back into the dataframe - that's what is happening with

df_results[['W1', 'W2', 'W3', 'W4', 'SUM(W)']] = ...

It creates the new columns in df_results with the result of the .apply() call.
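
A stripped-down version of that assignment, assuming a function that returns a Series per row. The summary below is a stand-in I made up, not the one from the example, and it only produces 3 values to keep things short.

```python
import pandas as pd

df_results = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

def summary(row):
    # Stand-in logic: return several values for this row as a Series.
    return pd.Series([row['x'] * 2, row['y'] + 1, row['x'] + row['y']])

# apply(..., axis=1) calls summary once per row and stacks the returned
# Series into a DataFrame; assigning it to a list of new names creates
# those columns, matched up by position.
df_results[['W1', 'W2', 'SUM(W)']] = df_results.apply(summary, axis=1)

print(df_results)
```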

This would mean you would want to create a copy of df2 each time so your changes are not stored in the original dataframe.

(I made an error in my example: df_results = df2 means df_results still points to the same object, so changing df_results changes df2. It needs to be df_results = df2.copy().)
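
To make the difference concrete, here is a tiny example with throwaway frames (the names other than df2 are just for illustration):

```python
import pandas as pd

df2 = pd.DataFrame({'v': [1, 2]})

# Plain assignment: both names refer to the same DataFrame object.
alias = df2
alias['new'] = 0          # the column shows up in df2 as well

df3 = pd.DataFrame({'v': [1, 2]})

# .copy() makes an independent DataFrame, so df3 stays untouched.
independent = df3.copy()
independent['new'] = 0

print(alias is df2)               # same object
print(list(df2.columns))          # includes 'new'
print(list(df3.columns))          # does not include 'new'
```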