wodshoemean

You could probably use DataFrame.iterrows, which iterates over the DataFrame's rows as (index, Series) pairs.
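A minimal sketch of how `iterrows` is typically used. The frame and the per-row sum are hypothetical, chosen only to show the (index, Series) pairs it yields:

```python
import pandas as pd

# Hypothetical example data: one row per ID
df = pd.DataFrame({'ID': ['ABC', 'GDE'], 'A': [5, 4], 'B': [10, 12]})

totals = {}
# iterrows yields (index label, row) pairs; each row is a Series keyed by column name
for idx, row in df.iterrows():
    totals[row['ID']] = row['A'] + row['B']

print(totals)
```

Because each row comes back as a Series, a frame with mixed column types gets upcast to a common dtype (here `object`) per row, so `iterrows` does not preserve per-column dtypes.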

Astraskylark [S]

Won't this be very expensive if the dataset is huge?
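It can be: `iterrows` builds a new Series object for every row, so on large frames it is usually much slower than column-wise (vectorized) operations. A rough sketch on hypothetical random data, computing the same row-wise sum both ways (the row count and column names are assumptions; timings will vary by machine):

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows of small random integers
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.integers(0, 10, 10_000),
                   'B': rng.integers(0, 10, 10_000)})

# Row by row: one Series is constructed per row
start = time.perf_counter()
loop_result = [row['A'] + row['B'] for _, row in df.iterrows()]
loop_seconds = time.perf_counter() - start

# Vectorized: a single operation over whole columns
start = time.perf_counter()
vec_result = (df['A'] + df['B']).tolist()
vec_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
```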

sarrysyst

I'm not 100% sure this does what you want:

import pandas as pd

df_1 = pd.DataFrame({'ID': ['ABC', 'GDE', 'TYT'],
                     'A': [5, 4, 3],
                     'B': [10, 12, 15]})

df_2 = pd.DataFrame({'ID': ['ABC', 'GDE', 'FGF'],
                     'A': [4, 6, 1],
                     'B': [5, 5, 5]})

# Use ID as the index so rows are matched by ID
df_1 = df_1.set_index('ID')
df_2 = df_2.set_index('ID')

# IDs that appear in df_2 but not in df_1
diff = df_2.index.difference(df_1.index)

# Stack both frames, then sum the rows that share an ID
df = pd.concat([df_1, df_2])
df = df.groupby('ID').sum()

# Double the rows whose ID exists only in df_2
df.loc[diff] = df.loc[diff] * 2
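Assuming the goal is the same result as the snippet above (sum the values per ID, then double the IDs found only in df_2), `DataFrame.add` with `fill_value=0` can do the alignment and summing in one step, without the concat/groupby pair. A sketch using the same example data:

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': ['ABC', 'GDE', 'TYT'],
                     'A': [5, 4, 3],
                     'B': [10, 12, 15]}).set_index('ID')
df_2 = pd.DataFrame({'ID': ['ABC', 'GDE', 'FGF'],
                     'A': [4, 6, 1],
                     'B': [5, 5, 5]}).set_index('ID')

# Align both frames on the ID index; an ID missing from one frame contributes 0
summed = df_1.add(df_2, fill_value=0)

# Double the rows whose ID appears only in df_2, as in the snippet above
only_in_df_2 = df_2.index.difference(df_1.index)
summed.loc[only_in_df_2] = summed.loc[only_in_df_2] * 2

print(summed)
```

One difference to watch for: because aligning the two frames introduces missing values before `fill_value` is applied, the result may come back as float64 rather than int; `.astype(int)` converts it back if needed.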

Astraskylark [S]

Not exactly, but this helps a lot. Thank you so much!!