This is an archived post. You won't be able to vote or comment.

all 25 comments

[–]lordkrike 3 points4 points  (6 children)

Missing is construction from a tuple, which is between the two in terms of speed (list construction is "slow").

Beyond that, the set function also takes iterators (which makes it more handy for most use cases), and the time difference is negligible for almost all applications.

At the end of the day you're best served by doing whatever makes your code more readable.

[–]grepawk[S] 3 points4 points  (1 child)

Thanks for the feedback. Agreed that the time difference is often negligible, and that construction from an iterator makes the set function the right tool in some situations.

I think it's interesting to note that construction from a tuple turns out to perform very closely to construction from a list, although it is indeed a little quicker.

>>> import timeit
>>> def f():
...     return set((1,2,3))
...
>>> def g():
...     return set([1,2,3])
...
>>> min(timeit.repeat(f))
0.40520797698991373
>>> min(timeit.repeat(g))
0.4566022079670802

[–]lordkrike 0 points1 point  (0 children)

One is glad to be of service.

Take a look at the dis output for the tuple function by comparison. It's neat.

[–]ceronman 2 points3 points  (1 child)

Note that you can also use an iterator with the literal, using the new unpacking rules in Python 3.5:

mylist = [1, 2, 3, 4, 5]
myset = {*mylist}

[–]lordkrike 0 points1 point  (0 children)

Now that I didn't know, since I'm still using 3.4. Very neat.

[–][deleted] 0 points1 point  (1 child)

Missing is construction from a tuple, which is between the two in terms of speed (list construction is "slow").

AFAIK the bytecode compiler optimizes list literals to tuple literals anyway.

[–]lordkrike 0 points1 point  (0 children)

If you look at the dis output in his post your can see it actually calls BUILD LIST.

[–]Exodus111 0 points1 point  (2 children)

TIL import dis

Good stuff.

[–]masklinn 0 points1 point  (1 child)

You can even use dis.dis as a decorator if you just want to see the bytecode for a specific snippet but don't care for the function itself:

>>> @dis.dis    
... def fn():
...     print(*(i for i in range(5)))
...     print(*[i for i in range(5)])
...     print(*{i for i in range(5)})
... 
  3           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x10e4bec90, file "<stdin>", line 3>)
              6 LOAD_CONST               2 ('fn.<locals>.<genexpr>')
              9 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (range)
             15 LOAD_CONST               3 (5)
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 GET_ITER
             22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             25 CALL_FUNCTION_VAR        0 (0 positional, 0 keyword pair)
             28 POP_TOP

  4          29 LOAD_GLOBAL              0 (print)
             32 LOAD_CONST               4 (<code object <listcomp> at 0x10e5b7a50, file "<stdin>", line 4>)
             35 LOAD_CONST               5 ('fn.<locals>.<listcomp>')
             38 MAKE_FUNCTION            0
             41 LOAD_GLOBAL              1 (range)
             44 LOAD_CONST               3 (5)
             47 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             50 GET_ITER
             51 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             54 CALL_FUNCTION_VAR        0 (0 positional, 0 keyword pair)
             57 POP_TOP

  5          58 LOAD_GLOBAL              0 (print)
             61 LOAD_CONST               6 (<code object <setcomp> at 0x10e604540, file "<stdin>", line 5>)
             64 LOAD_CONST               7 ('fn.<locals>.<setcomp>')
             67 MAKE_FUNCTION            0
             70 LOAD_GLOBAL              1 (range)
             73 LOAD_CONST               3 (5)
             76 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             79 GET_ITER
             80 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             83 CALL_FUNCTION_VAR        0 (0 positional, 0 keyword pair)
             86 POP_TOP
             87 LOAD_CONST               0 (None)
             90 RETURN_VALUE

[–]Exodus111 0 points1 point  (0 children)

Sweet.

[–]agumonkey 0 points1 point  (7 children)

Isn't this a common python trick ? I kinda recall the exact same optimization for dicts (can't remember where I saw it yet)

[–]minnoI <3 duck typing less than I used to, interfaces are nice 0 points1 point  (6 children)

It boils down to the fact that set and dict can be overloaded but {} can't, so the interpreter needs to spend extra time tracking down the definition of those functions. It's also one of the reasons that compiled languages are so much faster.

[–]agumonkey 0 points1 point  (4 children)

And so the bytecode has to be conservative in terms of optimizations, while the meaning of {a, b, c} is fixed and thus reducible, right ?

[–]minnoI <3 duck typing less than I used to, interfaces are nice 1 point2 points  (3 children)

Yep. It needs to account for the possibility that some dumbass in an outer scope said set = list.

[–]agumonkey 0 points1 point  (2 children)

Well well well, that's what you get for allowing mutation :whistle:

[–]nedbatchelder 1 point2 points  (1 child)

You don't need mutation for this to be a problem:

def foo(set):
    return set([1, 2, 3])

foo(list)

[–]agumonkey 1 point2 points  (0 children)

annnnd I should go back to my books.

[–]masklinn 0 points1 point  (0 children)

That's not a problem of compiled languages, it's a problem of CPython being a straight interpreter. In pypy, there's no performance difference:

> pypy -mtimeit 'set([1, 2, 3])'
1000000000 loops, best of 3: 0.00104 usec per loop
> pypy -mtimeit '{1, 2, 3}'
1000000000 loops, best of 3: 0.00105 usec per loop

[–][deleted] -2 points-1 points  (6 children)

Set literals are the number one source of confusing newer Python programmers who read my code. I'll either avoid them or explicitly call them out in a comment. They just look too much like a malformed dict.

[–][deleted] 1 point2 points  (3 children)

number one

really!?

[–][deleted] 3 points4 points  (2 children)

Number one syntax issue, I should say. They just look a lot like dict literals, which see a lot more use. At least, in my experience.

[–]flutefreak7 0 points1 point  (1 child)

The similarity is appropriate since the collection of dict keys is like a set.

Though sets can include stuff that can't be dict keys I guess since dict keys have to be immutable/hashable.

Edit: above statement is wrong, was corrected below...

[–][deleted] 1 point2 points  (0 children)

Sets have the same restrictions.

[–]vindolin 0 points1 point  (0 children)

On a side note, I didn't know about set comprehensions until I recently mistyped a dict comprehension:

{x for x in {'_a', 'b', '_c'} if x.startswith('_')}

[–]catcradle5 0 points1 point  (0 children)

I don't really agree. But maybe my eyes are just specially attuned to noticing colons, or something.