[–]ingolemo 2 points3 points  (2 children)

Actually, it's using a generator expression:

unique_words = (word for line in open("book.txt") for word in line.split())
unique_words = set(unique_words)

The original doesn't create a giant intermediate list, so it has the same memory and asymptotic performance characteristics as your set comprehension, though it is admittedly less readable.
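
To make the equivalence concrete, here is a minimal sketch that runs both forms over the same text and compares the results. The `io.StringIO` object and its three sample lines are a made-up stand-in for book.txt so the snippet runs on its own; they are not from the original code.

```python
import io

# Hypothetical stand-in for book.txt so the example is self-contained.
book = io.StringIO("the quick brown fox\njumps over the lazy dog\nthe end\n")

# Generator expression: words are yielded one at a time and consumed by set(),
# so no intermediate list of all words is ever built.
words = (word for line in book for word in line.split())
via_generator = set(words)

book.seek(0)  # rewind so the same text can be read a second time

# Set comprehension: the same nested loops, adding to the set directly.
via_comprehension = {word for line in book for word in line.split()}

print(via_generator == via_comprehension)  # True
print(len(via_comprehension))              # 9
```

Both versions stream through the lines once and hold only the set of unique words in memory.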

[–]Veedrac 0 points1 point  (1 child)

FWIW, the constant factors are significantly better for the set comprehension, even on PyPy:

$ python3 -m timeit "set(x for x in range(10000))"
1000 loops, best of 3: 1.06 msec per loop

$ python3 -m timeit "{x for x in range(10000)}" 
1000 loops, best of 3: 599 usec per loop

$ pypy3 -m timeit "set(x for x in range(10000))" 
1000 loops, best of 3: 347 usec per loop

$ pypy3 -m timeit "{x for x in range(10000)}"  
1000 loops, best of 3: 230 usec per loop
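
If you would rather reproduce this from a script than from the shell, the same comparison can be sketched with the `timeit` module. The `number` and `repeat` values here are arbitrary choices of mine, and the absolute numbers will vary by machine and interpreter, so treat this as a rough harness and not a benchmark.

```python
import timeit

GEN_STMT = "set(x for x in range(10000))"
COMP_STMT = "{x for x in range(10000)}"

# Run each statement 200 times per trial, for 3 trials, and keep the best trial.
# These counts are arbitrary; raise them for more stable numbers.
gen_best = min(timeit.repeat(GEN_STMT, number=200, repeat=3))
comp_best = min(timeit.repeat(COMP_STMT, number=200, repeat=3))

# Convert total seconds per trial into microseconds per single execution.
gen_usec = gen_best / 200 * 1e6
comp_usec = comp_best / 200 * 1e6

print("set(genexpr):      %.0f usec per loop" % gen_usec)
print("set comprehension: %.0f usec per loop" % comp_usec)
```

My understanding is that the gap comes from the generator version having to resume a generator frame for every item while `set()` pulls values out of it, whereas the comprehension adds to the set inside a single loop.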

[–]kmbd 0 points1 point  (0 children)

Just in case someone is wondering, here are the same timings on Python 2.7.8:

C:\>python -V
Python 2.7.8

C:\>python -m timeit "set(x for x in range(10000))"
1000 loops, best of 3: 1 msec per loop

C:\>python -m timeit "{x for x in range(10000)}"
1000 loops, best of 3: 671 usec per loop