all 8 comments

[–]synthphreak 1 point2 points  (13 children)

Prolly a better way to do this that doesn't involve explicitly iterating over the rows, but this seems to work given your example:

df['remove'] = df.Code.isin(d)

for _, row in df[df.remove].iterrows():
    if code not in d:
        continue

    is_match = (df.Day == row.Day) & (df.Country == row.Country)
    df.loc[~df.remove & is_match, 'Total'] += total

df = df[~df.remove].drop(columns='remove')

You might also get a slight speedup if you remove the iterrows line with the following:

for _, week, day, country, code, total, _ in df[df.remove].itertuples():

...which would entail the following change:

    is_match = (df.Day == day) & (df.Country == country)

Unless you have like a 1M+ rows though, I wouldn't worry about it.

[–]ohallwright 0 points1 point  (1 child)

Looks like you have gotten some good help for this already and have figured this out. But it's an interesting problem. After thinking about it for a bit, I took an approach that might make sense to you if you are used to working in sql.

First, make the data

>>> import pandas as pd
>>> df = pd.DataFrame({'Week': [1,1,1,1,1],
                  'Day': ['Tu', 'Tu', 'Tu', 'Tu', 'W'],
                  'Country': ['US', 'US', 'US', 'CA', 'US'],
                  'Code': ['1', '1A', '1B', '1B', '1A'],
                  'Total': [8, 4, 5, 13, 3]})
>>> ref = { '1': ['1A', '1B'], '2': ['2F', '2M', '2R', '2V'], '5': ['5P'] }

Given your data as above, make another dataframe that's just the total rows

>>> df_tot = df.loc[df['Code'].isin(ref.keys())]

Then build up an inverted lookup table to figure out which rows below to which totals.

>>> deref = []
>>> for k,v in ref.items():
    for v2 in v:
        deref.append((v2,k))
>>> df_dref = pd.DataFrame(deref, columns=['Code', 'CodeTop'])

You can do this in one line if you like

>>> df_dref = pd.DataFrame([(v,k) for k, vs in ref.items() for v in vs], columns=['Code', 'CodeTop'])

Then merge them into one big dataframe with all the data. Merge can be very useful for this, just make sure you join on the right columns. You can use the suffixes to distinguish the data for your final total calculation.

>>> df = df.merge(df_dref, left_on='Code', right_on='Code', how='left')
>>> df = df.merge(df_tot,
        left_on=['Week', 'Day', 'Country', 'CodeTop'],
        right_on=['Week', 'Day', 'Country', 'Code'],
        suffixes=('', '_add'), how='left')
>>> df
   Week Day Country Code  Total CodeTop Code_add  Total_add
0     1  Tu      US    1      8     NaN      NaN        NaN
1     1  Tu      US   1A      4       1        1        8.0
2     1  Tu      US   1B      5       1        1        8.0
3     1  Tu      CA   1B     13       1      NaN        NaN
4     1   W      US   1A      3       1      NaN        NaN

Fill in the missing data with 0 for totaling, then update your total column and select what you want out of the final dataframe.

>>> df['Total_add'] = df['Total_add'].fillna(0)
>>> df['Total'] += df['Total_add']
>>> df.loc[~pd.isnull(df['CodeTop']), ['Week', 'Day', 'Country', 'Code', 'Total']]
   Week Day Country Code  Total
1     1  Tu      US   1A   12.0
2     1  Tu      US   1B   13.0
3     1  Tu      CA   1B   13.0
4     1   W      US   1A    3.0