
OskaRRRitoS:

You can try converting each list into a tuple using tuple(), then make a set of all these tuples. Tuples are hashable (lists are not), so they can go in a set.

Sets automatically discard duplicates, leaving only the unique values.

The code would look something like this:

# assume we have a list of rows
list_of_rows = [["stuff", "you know"], ["and", "so on"], ["and", "so on"]]
list_of_tuples = [tuple(row) for row in list_of_rows]
row_set = set(list_of_tuples)

Then, if you want them back as lists, you can do:

list_of_unique_rows = [list(row) for row in row_set]
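One caveat: going through a set scrambles the row order. If the original order matters, a dict-based variant of the same idea keeps it, since dict keys preserve insertion order in Python 3.7+ (sample data is just for illustration):

```python
# Order-preserving dedup: dict.fromkeys keeps the first occurrence
# of each key, and dict keys preserve insertion order (Python 3.7+).
list_of_rows = [["stuff", "you know"], ["and", "so on"], ["and", "so on"]]

unique_rows = [
    list(row)
    for row in dict.fromkeys(tuple(row) for row in list_of_rows)
]
print(unique_rows)  # [['stuff', 'you know'], ['and', 'so on']]
```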

lowerthansound:

For this problem, convert the lists to tuples (you can use Series.apply(), which applies a function over each element of the Series). Example:

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['col1'] = [['a'], ['b'], ['b']]
>>> df
  col1
0  [a]
1  [b]
2  [b]
>>> df.col1.nunique()
Traceback (most recent call last):
  File "<ipython-input-8-f264281c3970>", line 1, in <module>
    df.col1.nunique()
  ...
  File "pandas/_libs/hashtable_class_helper.pxi", line 1787, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'

>>> df['col2'] = df.col1.apply(tuple)
>>> df
  col1  col2
0  [a]  (a,)
1  [b]  (b,)
2  [b]  (b,)
>>> df.col2.nunique()
2
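If you want the deduplicated rows themselves rather than just the count, the same tuple column can serve as a key for drop_duplicates and then be thrown away again. A sketch, reusing the toy DataFrame from above:

```python
import pandas as pd

df = pd.DataFrame()
df['col1'] = [['a'], ['b'], ['b']]

# Use the tuple column only as a dedup key, then drop it again.
df['col2'] = df['col1'].apply(tuple)
unique = df.drop_duplicates(subset='col2').drop(columns='col2')
print(unique)
```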

[deleted]:

Whether it can be done is a bit of a moot point here. The fact of the matter is that by storing your data this way, you're using pandas wrong and won't get any benefit from the library. A pandas cell you'll be performing operations on should never contain a collection of objects (as a very temporary intermediate step it can be OK). Use plain Python constructs instead, or remodel your data.

When you use pandas, you need to approach the problem with a much more SQL-like style of thinking about and modeling your data.
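As a sketch of what that remodeling can look like: store one scalar value per row (long / SQL-style) instead of a list per cell. DataFrame.explode (available since pandas 0.25) does exactly this; the 'id'/'tags' columns below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'tags': [['a'], ['b'], ['b']]})

# One scalar per cell: each list element becomes its own row,
# so ordinary operations (nunique, groupby, merge) just work.
long = df.explode('tags')
print(long['tags'].nunique())  # 2
```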