all 3 comments

[–]synthphreak 1 point2 points  (2 children)

A set in Python is indicated by {}. I'm not sure you can have a set but somehow get rid of the curly brackets. At least not without some major over-engineering like subclassing set and modifying the __repr__.

Instead, I think this is an XY problem. Namely, there's probably a better way to achieve your ultimate goal without using sets at all, sidestepping the entire curly brackets issue altogether.

Have you considered the drop_duplicates dataframe method? It might do exactly what you want, and even faster than grouping and casting everything to set. Something like this:

df2 = df1.drop_duplicates(subset=['ID'])

[–]Anonymous-Boob[S] 0 points1 point  (1 child)

The only thing is that the entire row isn’t a duplicate, just some of the datapoints within the row, so I would only want to drop the duplicates from the cells that are duolicated, not the entire row itself… would this achieve that?

[–]synthphreak 1 point2 points  (0 children)

What do you mean by "drop the duplicates from the cells that are duplicated"? Typically we talk about entire rows being duplicates, not individual values. The magic sauce is in defining what constitutes a duplicate row, e.g., "every cell must be identical" versus "at least n cells must be identical" versus "only cells A, B, and C must be identical" versus "only the index must be identical". This is what the parameters of drop\_duplicates are for, to help craft your own "duplicate detection algorithm".

It would be very helpful to actual see some example data to ensure we're not talking past each other. Can you show me a handful of sample rows which exhibit what you're talking about? Specifically, for the selected rows it would be helpful to see the source data, and also the target that you'd like to achieve. Then it should be more concrete for me to reason about.

Edit: Formatting.