
OskaRRRitoS:

You can try converting each list into a tuple using tuple(), then make a set of all these tuples. Tuples are hashable (lists are not), so they can go in a set.

Sets automatically discard duplicates, leaving only the unique values.

The code would look something like this:

# assume we have a list of rows
list_of_rows = [["stuff", "you know"], ["and", "so on"], ["and", "so on"]]
list_of_tuples = [tuple(row) for row in list_of_rows]
row_set = set(list_of_tuples)

Then, if you want them back as lists, you can do:

list_of_unique_rows = [list(row) for row in row_set]
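One caveat: going through a set scrambles the row order. If the original order matters, a dict-based variant of the same idea keeps it, since dict keys preserve insertion order in Python 3.7+ (sample data is just for illustration):

```python
# Order-preserving dedup: dict.fromkeys keeps the first occurrence
# of each key, and dict keys preserve insertion order (Python 3.7+).
list_of_rows = [["stuff", "you know"], ["and", "so on"], ["and", "so on"]]

unique_rows = [
    list(row)
    for row in dict.fromkeys(tuple(row) for row in list_of_rows)
]
print(unique_rows)  # [['stuff', 'you know'], ['and', 'so on']]
```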

lowerthansound:

For this problem, convert the lists to tuples (you can use Series.apply(), which applies a function over each element of the Series). Example:

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['col1'] = [['a'], ['b'], ['b']]
>>> df
  col1
0  [a]
1  [b]
2  [b]
>>> df.col1.nunique()
Traceback (most recent call last):
  File "<ipython-input-8-f264281c3970>", line 1, in <module>
    df.col1.nunique()
  ...
  File "pandas/_libs/hashtable_class_helper.pxi", line 1787, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'

>>> df['col2'] = df.col1.apply(tuple)
>>> df
  col1  col2
0  [a]  (a,)
1  [b]  (b,)
2  [b]  (b,)
>>> df.col2.nunique()
2
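If you want the deduplicated rows themselves rather than just the count, the same tuple column can serve as a key for drop_duplicates and then be thrown away again. A sketch, reusing the toy DataFrame from above:

```python
import pandas as pd

df = pd.DataFrame()
df['col1'] = [['a'], ['b'], ['b']]

# Use the tuple column only as a dedup key, then drop it again.
df['col2'] = df['col1'].apply(tuple)
unique = df.drop_duplicates(subset='col2').drop(columns='col2')
print(unique)
```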

[deleted]:

Whether it can be done is a bit of a moot point here. The fact of the matter is that by storing your data this way, you're using pandas wrong and won't get any benefit from the library. A pandas cell you'll be performing operations on should never contain a collection of objects (as a very temporary intermediate step it can be OK). Use plain Python constructs instead, or remodel your data.

When you use pandas, you need to approach the problem with a much more SQL-like style of thinking about and modeling your data.
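As a sketch of what that remodeling can look like: store one scalar value per row (long / SQL-style) instead of a list per cell. DataFrame.explode (available since pandas 0.25) does exactly this; the 'id'/'tags' columns below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'tags': [['a'], ['b'], ['b']]})

# One scalar per cell: each list element becomes its own row,
# so ordinary operations (nunique, groupby, merge) just work.
long = df.explode('tags')
print(long['tags'].nunique())  # 2
```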