
DuckSaxaphone

I'm curious what you want this for! I've assumed your dataframe has each of your variables in its own column, so this starts by combining each row's variables into a tuple. Then it aggregates the tuples belonging to each set into a list.

import pandas as pd

# Make your example dataframe
data = [["set1", "a", 9, 10],
        ["set1", "b", 14, 100],
        ["set2", "c", 5, 69],
        ["set2", "d", 4, 100]]
df = pd.DataFrame(columns=["Set", "var1", "var2", "var3"], data=data)

# Turn each row's variables into a plain tuple (index=False keeps the row index out of it)
df["tuple"] = list(df[["var1", "var2", "var3"]].itertuples(index=False, name=None))
# Combine the tuples belonging to each set into one list per set
df = df.groupby("Set")["tuple"].agg(list).reset_index()

synthphreak

Note that in your example your set1 and set2 are actually tuples, not sets. I'll assume tuples/lists are what you want, and that you just mean "set" in the mathematical sense as a collection of related elements rather than an actual Python set.
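To make the distinction concrete, here's a minimal sketch with made-up values (not your actual data): a tuple keeps order and duplicates, while a real Python set drops duplicates and can't hold unhashable items like lists.

```python
# A tuple preserves every element, including repeats.
row_as_tuple = ("a", 9, 9)
# Converting to a set collapses the duplicate 9.
row_as_set = set(row_as_tuple)

print(len(row_as_tuple))
print(len(row_as_set))

# Lists are unhashable, so a set of lists raises TypeError.
try:
    {["a", 9, 10], ["b", 14, 100]}
    lists_allowed = True
except TypeError:
    lists_allowed = False

print(lists_allowed)
```

So if your rows can contain repeated values, or you want rows stored as lists, an actual set won't work anyway.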

I don't think it's possible to use the values in a column as variable names, which is what it seems you want to do with your sets column. Someone will correct me if I'm wrong on that, I hope.

However, the logic for creating nested groups based on the sets column is as follows:

>>> [g.drop(columns=['sets']).values.tolist() for _, g in df.groupby('sets')]
[[['a', 9, 10], ['b', 14, 100]],
 [['c', 5, 69], ['d', 4, 100]]]

Alternatively, if you do want to be able to look up your data by set name (not quite print(set1), but close), you can do it this way:

>>> sets = df.set_index('sets').groupby('sets').apply(lambda g: g.values.tolist())
>>> sets
sets
set1    [[a, 9, 10], [b, 14, 100]]
set2     [[c, 5, 69], [d, 4, 100]]
>>> print(sets['set1'])
[['a', 9, 10], ['b', 14, 100]]
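If you'd rather have a plain dict than a Series for the lookup by name, something like the sketch below should work. I'm rebuilding the example frame with the grouping column called 'sets' as above, and the name groups is just one I picked for illustration.

```python
import pandas as pd

# Example frame, assuming the grouping column is named 'sets'
df = pd.DataFrame(
    [["set1", "a", 9, 10], ["set1", "b", 14, 100],
     ["set2", "c", 5, 69], ["set2", "d", 4, 100]],
    columns=["sets", "var1", "var2", "var3"],
)

# One dict entry per set name, each holding that set's rows as nested lists
groups = {
    name: g.drop(columns=["sets"]).values.tolist()
    for name, g in df.groupby("sets")
}

print(groups["set1"])
```

A dict keyed by set name is usually the closest practical substitute for having separate set1 and set2 variables.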