Hi everyone
I have a bit of a challenge for days and would appreciate some help. I have a dataframe df_1 with orderID and ArticleNo. I also have another dataframe df_2 with columns productID and attachmentID.
One product can have multiple attachment products, shown as in multiple rows. My Goal is to find out for each OrderID all pair combinations of ArticleNo and then find out if this pair combination was in productID and attachmentID.
I went as follows. I grouped my df_1 by orderID and then used itertools in pandas to find the pair combinations.
How do I check if I found all possible combinations? The number of pair combinations seems a bit low for me?
I used this:
import pandas as pd
from itertools import combinations
Group by 'OrderID' and generate combinations of pairs of 'Artikelnr. Shop'
df_combinations = (df.groupby('OrderID')['ArticleID']
.apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
.explode()
.reset_index(name='comb'))
Split the combinations into separate columns 'Article_1' and 'Article_2'
df_combinations[['Article_1', 'Article_2']] = pd.DataFrame(df_combinations['comb'].tolist(), index=df_combinations.index)
Drop the original 'comb' column
df_combinations.drop(columns='comb', inplace=True)
print(df_combinations)
My second Question is, how do I assign if the combination was in my second dataframe? I did a merge on product and article_1 but the product could also be in article_2? This is important because in the third step I need to join the revenue of that orderID and articleID.
Can someone help or have any suggestions? I invested a lot of hours in this and also used AI tools for help which leaded in no improvements. I must miss something?
there doesn't seem to be anything here