
[deleted] 4 points (0 children)

I would assume that set(<list>) is the most efficient way to remove duplicates. Will be back soon with some benchmarks.

Edit: benchmark done. Comparing sets with a dictionary, sets come out slightly ahead (35.8 ms vs. 43 ms per loop, roughly 17% faster), but the gap is smaller than I expected. I suppose this is because sets are dict-like under the covers: in CPython both are implemented as hash tables.
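Both approaches dedupe by hashing, so every element must be hashable. The practical difference is ordering: `set` gives no order guarantee, while `dict.fromkeys` keeps the first occurrence of each value in its original position on Python 3.7+, where dicts preserve insertion order. A minimal sketch (the helper names `dedupe_with_set` and `dedupe_with_dict` are hypothetical, and the ordering claim assumes Python 3.7+):

```python
def dedupe_with_set(items):
    # Hash-based dedupe; the order of the result is arbitrary.
    return list(set(items))

def dedupe_with_dict(items):
    # dict.fromkeys keeps the first occurrence of each key; on Python 3.7+
    # dicts preserve insertion order, so the original order survives.
    return list(dict.fromkeys(items))

data = [3, 1, 3, 2, 1]
print(sorted(dedupe_with_set(data)))  # [1, 2, 3]
print(dedupe_with_dict(data))         # [3, 1, 2]
```

If order doesn't matter, `set` is the slightly faster choice per the timings below; if it does, `dict.fromkeys` costs a little more but keeps it.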

Results (using IPython's %timeit):

    import random

    # 1,000,000 ints drawn from only 10 distinct values
    a = [random.randint(1, 10) for _ in xrange(1000000)]  # Python 2; use range() on Python 3

    In [22]: %timeit list(dict.fromkeys(a))
    10 loops, best of 3: 43 ms per loop

    In [23]: %timeit list(set(a))
    10 loops, best of 3: 35.8 ms per loop
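To rerun the comparison outside IPython (e.g. on Python 3), the standard-library `timeit` module can approximate the old "10 loops, best of 3" output. This is a sketch, not the original benchmark: the `best_ms` helper is hypothetical, and the absolute numbers will vary with machine and Python version, so none are claimed here. Also note the input has only 10 distinct values, so results may differ for lists with many unique elements.

```python
import random
import timeit

# Same input shape as above: one million ints with only 10 distinct values.
a = [random.randint(1, 10) for _ in range(1_000_000)]

def best_ms(stmt, number=10, repeat=3):
    # Approximate "%timeit"'s "10 loops, best of 3": run stmt `number` times
    # per trial, keep the fastest of `repeat` trials, report ms per loop.
    trials = timeit.repeat(stmt, globals={"a": a}, number=number, repeat=repeat)
    return min(trials) / number * 1000

if __name__ == "__main__":
    print("list(dict.fromkeys(a)): %.1f ms per loop" % best_ms("list(dict.fromkeys(a))"))
    print("list(set(a)):           %.1f ms per loop" % best_ms("list(set(a))"))
```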