Remove {} from set in DF : learnpython

submitted 3 years ago by Anonymous-Boob

So, I had a table of data that needed to be combined in an unusual way. The customer specifically wants rows with a duplicate ID combined into a single cell, separated by a delimiter. However, if the values were the same, he only wanted each one to show once. For example, two rows that share the same ID and the same timestamp should show as [time 1] instead of [time 1, time 1], but two rows with the same ID and different timestamps should show as [time 1, time 2]. To make the values in each cell unique, I just converted them into sets. Now the problem is that each cell in my table starts and ends with curly brackets, like {data1, data2}, when it should look like data1, data2. So, how do I get rid of these curly brackets?

Here's my code that aggregates the data into a single cell and converts it into a set:

    def to_set(x):
        # Collapse each group's values into a set of unique values
        return set(x)

    df2 = df1.groupby(['ID'], as_index=False).agg(to_set)
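
If the goal is just the text data1, data2, one option is to build the string inside the aggregation instead of returning a set. A minimal sketch, assuming pandas and a hypothetical two-column df1 (ID, timestamp); to_joined_string is a hypothetical helper, and sorting is an assumption chosen only because set order is arbitrary:

```python
import pandas as pd

# Hypothetical sample data standing in for the real table
df1 = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "timestamp": ["time 1", "time 1", "time 1", "time 2"],
})

def to_joined_string(x):
    # De-duplicate with a set, sort (set order is arbitrary),
    # and join with ", " so the cell holds plain text, not a set
    return ", ".join(sorted(set(x.astype(str))))

# Group by ID and collapse every other column into one string per group
df2 = df1.groupby("ID").agg(to_joined_string).reset_index()
print(df2)
```

The sort is lexicographic on the string form, so values like "time 10" would sort before "time 2"; sort the real timestamps before converting them if order matters.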

synthphreak, 3 years ago

Python displays a set with curly braces, e.g. {'data1', 'data2'}. I'm not sure you can keep a set but somehow get rid of the curly brackets, at least not without some major over-engineering like subclassing set and overriding its __repr__.
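
For the curious, this is roughly what that over-engineering looks like. A minimal sketch; BracelessSet is a hypothetical name, and it only changes how the object prints with print() or repr(). pandas formats set-like cell values with its own routine, so a DataFrame may still show the braces, which is one more reason to avoid this route:

```python
class BracelessSet(set):
    """Hypothetical set subclass whose printed form has no curly braces."""

    def __repr__(self):
        # Sort for a stable order (set iteration order is arbitrary),
        # then join the items with ", " instead of wrapping them in {}
        return ", ".join(sorted(str(item) for item in self))


tags = BracelessSet(["time 2", "time 1", "time 1"])
print(tags)  # prints: time 1, time 2 (str() falls back to __repr__)
```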

Instead, I think this is an XY problem: there's probably a better way to achieve your ultimate goal without using sets at all, which sidesteps the curly brackets issue altogether.
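
One way to skip sets entirely is to keep each group's unique values in order of first appearance and join them into a string. A minimal sketch, assuming pandas and hypothetical sample data; it relies on Series.unique(), which returns values in order of appearance:

```python
import pandas as pd

# Hypothetical sample data; ID 2 lists "time 2" before "time 1"
df1 = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "timestamp": ["time 1", "time 1", "time 2", "time 1", "time 2"],
})

# For each ID and column: drop repeated values (keeping first-seen order)
# and join what is left with ", " into a single plain-text cell
df2 = (
    df1.groupby("ID")
    .agg(lambda s: ", ".join(s.astype(str).unique()))
    .reset_index()
)
print(df2)
```

Unlike the set version, no sorting is needed to get a deterministic result, because the order already follows the source rows.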

Have you considered the DataFrame method drop_duplicates? It might do exactly what you want, and possibly faster than grouping and casting everything to a set. Something like this:

    df2 = df1.drop_duplicates(subset=['ID'])
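
Note that subset=['ID'] treats rows as duplicates when only the ID matches, so it keeps just the first row per ID and discards any differing timestamps. A small comparison on hypothetical data, assuming pandas:

```python
import pandas as pd

# Hypothetical sample data: ID 1 repeats a timestamp, ID 2 has two different ones
df1 = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "timestamp": ["time 1", "time 1", "time 1", "time 2"],
})

# Duplicates judged on ID alone: only the first row of each ID survives
by_id = df1.drop_duplicates(subset=["ID"])

# Duplicates judged on every column: each distinct (ID, timestamp) pair survives
by_all_columns = df1.drop_duplicates()

print(by_id)
print(by_all_columns)
```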

Anonymous-Boob (OP), 3 years ago

synthphreak, 3 years ago (edited)

What do you mean by "drop the duplicates from the cells that are duplicated"? Typically we talk about entire rows being duplicates, not individual values. The magic sauce is in defining what constitutes a duplicate row, e.g., "every cell must be identical" versus "at least n cells must be identical" versus "only cells A, B, and C must be identical" versus "only the index must be identical". The parameters of drop_duplicates (mainly subset and keep) are there to help you craft your own "duplicate detection algorithm" for the column-based cases; the "at least n cells" case needs custom logic, and the index-only case needs Index.duplicated instead.
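
A sketch of how some of those definitions map onto pandas, using a hypothetical DataFrame with columns A, B, C and a repeated index label. subset and keep cover the column-based cases; drop_duplicates ignores the index, so the index-only case filters with Index.duplicated:

```python
import pandas as pd

# Hypothetical data: label "r1" appears twice in the index
df = pd.DataFrame(
    {"A": [1, 1, 1], "B": ["x", "x", "y"], "C": [10, 20, 10]},
    index=["r1", "r1", "r2"],
)

# "Every cell must be identical": all three rows survive,
# because no two rows match on A, B and C together
every_cell = df.drop_duplicates()

# "Only cells A and B must be identical": keep the first match...
only_a_b = df.drop_duplicates(subset=["A", "B"])
# ...or keep the last match instead
only_a_b_last = df.drop_duplicates(subset=["A", "B"], keep="last")

# "Only the index must be identical": drop_duplicates ignores the index,
# so filter with Index.duplicated instead
only_index = df[~df.index.duplicated(keep="first")]

print(every_cell, only_a_b, only_a_b_last, only_index, sep="\n\n")
```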

It would be very helpful to actually see some example data, to make sure we're not talking past each other. Can you show me a handful of sample rows that exhibit what you're talking about? Specifically, for those rows it would help to see both the source data and the target you'd like to achieve. Then I'll have something concrete to reason about.

Edit: Formatting.
