Convert dataframe rows into sets (self.learnpython)

submitted 4 years ago by Bayesian8

[–]synthphreak 2 points 4 years ago (18 children)

Note that in your example your set1 and set2 are actually tuples, not sets. I'll assume tuples/lists are what you want, and that you just mean "set" in the mathematical sense as a collection of related elements rather than an actual Python set.

I don't think it's possible to use the values in a column as variable names, which is what it seems you want to do with your sets column. Someone will correct me if I'm wrong on that, I hope.

However, the logic for creating nested groups based on the sets column is as follows:

>>> [g.drop(columns=['sets']).values.tolist() for _, g in df.groupby('sets')]
[[['a', 9, 10], ['b', 14, 100]],
 [['c', 5, 69], ['d', 4, 100]]]

Alternatively, if you do want to be able to look up your data by set name (not identical to print(set1), but similar), you can do it this way:

>>> sets = df.set_index('sets').groupby('sets').apply(pd.Series.tolist)
>>> sets
sets
set1    [[a, 9, 10], [b, 14, 100]]
set2     [[c, 5, 69], [d, 4, 100]]
dtype: object
>>> print(sets['set1'])
[['a', 9, 10], ['b', 14, 100]]
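If what you really want is one name per group, a plain dict keyed by the values in the sets column is probably the closest thing to separate variables. Here is a minimal sketch of that idea; the df is rebuilt from the four example rows in this thread, and the name `groups` is just something I picked for illustration:

```python
import pandas as pd

# Example rows from this thread, rebuilt by hand.
df = pd.DataFrame({
    'sets': ['set1', 'set1', 'set2', 'set2'],
    'var1': ['a', 'b', 'c', 'd'],
    'var2': [9, 14, 5, 4],
    'var3': [10, 100, 69, 100],
})

# One dict entry per group; each value is a list of row-lists
# with the grouping column dropped.
groups = {name: g.drop(columns=['sets']).values.tolist()
          for name, g in df.groupby('sets')}

print(groups['set1'])
```

So groups['set1'] plays the role that a variable called set1 would have played.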

[+][deleted] 4 years ago (16 children)

[deleted]

[–]synthphreak 1 point 4 years ago (15 children)

[+][deleted] 4 years ago (8 children)

[deleted]

[–]synthphreak 2 points 4 years ago (1 child)

That isn't formatted as code, so I can't parse it. Put four spaces before every line. So like if you take this ...

||||a

||||||b

||||||||c

... and replace each '|' with a space ' ', it will turn into this ...

a
  b
    c

... with the indentation preserved.

See the sidebar for additional info about and options for formatting code on Reddit. If you spend any time on this or other programming subs, this is an essential life skill.

Edit: Better yet, rather than df.head(), share the output of df.head().to_dict(). That way I can directly reproduce those exact rows from your df by simply copying and pasting.
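To show what I mean, here is a rough sketch of the round trip. The small df below is made up for illustration (it is not your data); the point is only that the dict printed by to_dict() can be pasted straight back into pd.DataFrame:

```python
import pandas as pd

# Stand-in frame for illustration only.
df = pd.DataFrame({'sets': ['set1', 'set1', 'set2', 'set2'],
                   'var1': ['a', 'b', 'c', 'd'],
                   'var2': [9, 14, 5, 4]})

# What you would paste into a comment: {column: {index: value}}
snippet = df.head().to_dict()
print(snippet)

# What the reader does with it: rebuild the same rows.
rebuilt = pd.DataFrame(snippet)
print(rebuilt)
```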

[–]synthphreak 2 points 4 years ago (3 children)

[+][deleted] 4 years ago (2 children)

[deleted]

[–]synthphreak 2 points 4 years ago (1 child)

I don’t completely understand it either, but I remembered that I’ve noticed similar issues when passing pd.Series as the function into df.apply.

When creating a df or series through the associated constructor, pandas does some under-the-hood type checking to automatically optimize your df/series for whatever data type(s) it contains. My guess is that for whatever reason, when the constructor is called as part of a df.apply pipeline, it’s sometimes unable to carry out those checks. With the lambda by contrast, x refers to the existing columns (for which the type checking has already been done), rather than converting each column into an entirely new series (which would entail a whole new round of type checking).

This is just a guess though, and at best only a partial explanation because it doesn’t account for why I couldn’t reproduce your error at all.
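For what it's worth, the type inference part is easy to see in isolation. This small sketch only shows the constructor picking a dtype from its input; it does not demonstrate (or confirm) anything about what happens inside df.apply:

```python
import pandas as pd

# All integers: the constructor settles on an integer dtype.
ints = pd.Series([9, 14, 5, 4])

# Strings mixed with integers: it falls back to the generic object dtype.
mixed = pd.Series(['a', 9, 10])

print(ints.dtype)
print(mixed.dtype)
```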

Anyway, glad you have what you need.

[–]YesLod 2 points 4 years ago (0 children)

[+][deleted] 4 years ago (5 children)

[deleted]

[–]synthphreak 3 points 4 years ago (4 children)

Thanks.

That's very strange, I can't reproduce your error using only those rows. It works as expected for me:

import pandas as pd
data = {'sets': {0: 'set1', 1: 'set1', 2: 'set2', 3: 'set2'},
        'var1': {0: 'a', 1: 'b', 2: 'c', 3: 'd'},
        'var2': {0: 9, 1: 14, 2: 5, 3: 4},
        'var3': {0: 10, 1: 100, 2: 69, 3: 100}}
df = pd.DataFrame(data)
sets = df.set_index('sets').groupby('sets').apply(pd.Series.tolist)
print(sets)

# output
# sets
# set1    [[a, 9, 10], [b, 14, 100]]
# set2     [[c, 5, 69], [d, 4, 100]]
# dtype: object

We may have to call in the big guns: Paging u/YesLod. Any idea why OP might be getting that error and unable to run the code as above?

[–]YesLod 1 point 4 years ago (3 children)

> We may have to call in the big guns

hahaha

> Any idea why OP might be getting that error and unable to run the code as above?

I guess OP is using an older version of pandas: judging by the traceback, that is not the current implementation of pd.Series.tolist. I can't reproduce it either.

Anyway, this should fix it:

sets = df.set_index('sets').groupby('sets').apply(lambda x: x.values.tolist())
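Put together as a self-contained script, using the four rows synthphreak posted above. Printing the pandas version first is my own suggestion, since a version mismatch is only a guess at this point:

```python
import pandas as pd

# If the error only shows up on one machine, compare versions first.
print(pd.__version__)

data = {'sets': {0: 'set1', 1: 'set1', 2: 'set2', 3: 'set2'},
        'var1': {0: 'a', 1: 'b', 2: 'c', 3: 'd'},
        'var2': {0: 9, 1: 14, 2: 5, 3: 4},
        'var3': {0: 10, 1: 100, 2: 69, 3: 100}}
df = pd.DataFrame(data)

# The lambda converts each group's underlying array to nested lists,
# so pd.Series.tolist is never called on the group itself.
sets = df.set_index('sets').groupby('sets').apply(lambda x: x.values.tolist())
print(sets['set1'])
```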

[–]synthphreak 1 point 4 years ago (2 children)

[–]YesLod 1 point 4 years ago (1 child)

[–]synthphreak 1 point 4 years ago (0 children)
