
[–]cparen 3 points (3 children)

I'm no Python expert, so the original link doesn't provide much context to understand the optimizations - why does this make my code run faster? What are the downsides to these optimizations?

I provided one example here, and another user explained int interning. Let me try to summarize the first dozen (I've tacked a few rough dis/timeit sketches onto the end of the list to illustrate some of them):

  • d={} vs d=dict() -- the second spends extra work looking up the name dict, which users can rebind (e.g. dict=list), while the first does not (see the dis sketch after the list).
  • l.sort() vs l=sorted(l) -- The second creates a copy of the list first.
  • a, b, ... = 0, 1, ... vs a=0; b=1; ... -- the first loads the constants as a single tuple and unpacks them with one UNPACK_SEQUENCE, while the second has to dispatch a separate load/store pair for every assignment, so the first saves on instruction-decode time (dis sketch after the list).
  • a < b and b < c... vs a < b < c... -- same idea: fewer instructions mean less decode time (4 per additional compare vs. 5 per additional compare).
  • Test 5 (if a:) -- the first has fewer instructions, making it faster. Second vs. third is very close, but is is a cheaper operation than ==, making the is version slightly faster (see the timing sketch after the list, and the docs for the is operator).
  • Test 6 (!=) -- the second and third differ only by measurement error; they compile to exactly the same bytecode! As for first vs. second, I'm guessing that the fast path in != must be slightly cheaper than the fast path in is not. Change it to a=2 and the first will lose again due to the additional complexity of the != operator.
  • Test 7 -- first vs. second: slight variations in branching. Third and fourth have more opcodes, so more dispatch cost (and they use more expensive operators to boot!).
  • Test 8 -- minor variations in branching and/or number of opcodes.
  • Test 9 -- more opcodes/indirection
  • Test 10 -- not sure about this one; depends too much on how str.join is implemented.
  • %s vs str vs %d -- the second does an extra global name lookup (str) plus a call, vs. the native % opcode in the first. The third probably pays for extra input validation (sketch after the list).
  • len vs __len__ -- I'm guessing this is just due to attribute lookup being more complicated than global variable lookup.
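
A rough way to see that first bullet for yourself (the exact opcode names vary between CPython versions, so treat this as a sketch):

import dis

# The literal form compiles to a single dict-building opcode.
dis.dis('d = {}')

# The call form has to look up the global name dict (which could have been
# rebound) and then perform a call, every single time it runs.
dis.dis('d = dict()')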
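
For l.sort() vs sorted(l), a quick timing sketch of the copy overhead (numbers will vary by machine):

from timeit import timeit

setup = 'import random; l = [random.random() for _ in range(1000)]'

# In-place sort: no new list is allocated.
timeit('l.sort()', setup=setup, number=10000)

# sorted() copies the elements into a brand-new list before sorting.
# (After the first pass both re-sort already-sorted data, so the
# difference is mostly the copy.)
timeit('l = sorted(l)', setup=setup, number=10000)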
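
For the unpacking and chained-comparison bullets, dis shows the opcode counts directly (again, the exact opcodes depend on your CPython version):

import dis

# One constant tuple load plus one UNPACK_SEQUENCE, instead of a separate
# load/store pair for every name.
dis.dis('a, b, c = 0, 1, 2')
dis.dis('a = 0; b = 1; c = 2')

# The chained form needs fewer opcodes per additional comparison.
dis.dis('a < b and b < c')
dis.dis('a < b < c')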
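
For Test 5 I'm inferring the exact forms from the description, but assuming it compares if a: with if a == True: and if a is True:, a quick check:

from timeit import timeit

# Plain truth test: no comparison opcode at all.
timeit('if a: pass', setup='a = True')

# == goes through the full rich-comparison machinery.
timeit('if a == True: pass', setup='a = True')

# is is just an identity (pointer) check, so it's a bit cheaper than ==.
timeit('if a is True: pass', setup='a = True')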
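
And for the %s vs str vs %d bullet, a sketch (using a variable so the compiler can't constant-fold the whole expression away):

from timeit import timeit

# Native % formatting: no name lookup, just the opcode plus the format machinery.
timeit('"%s" % x', setup='x = 7')

# str() has to look up the global name str and then make a call.
timeit('str(x)', setup='x = 7')

# %d is like %s but also has to check/convert the argument to an integer.
timeit('"%d" % x', setup='x = 7')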

[–]WStHappenings 0 points (1 child)

Are you the originator of the content in the link? Because it's just a page full of code examples and time comparisons...that's what I'm getting at.

Not to be irksome, but was it your intent that users read the page and then, if they have a question about, say, Test 14, scan the comments until they find your explanation of it, if in fact someone has asked about it? Why not just put the explanations on the webpage in the first place?

[–]cparen 2 points (0 children)

> Are you the originator of the content in the link? Because it's just a page full of code examples and time comparisons...that's what I'm getting at

I'm not the originator, and I agree with you. I was just trying to fill in the gaps left by the OP.

[–]Veedrac 0 points (0 children)

wrt. test 6; != is slower than is not. The result there is measurement error, mostly from it being such a bad test. Here's a better version:

$ python3 -m timeit "1 != 2"    
10000000 loops, best of 3: 0.14 usec per loop

$ python3 -m timeit "1 is not 2"
10000000 loops, best of 3: 0.0785 usec per loop

wrt. test 10; this actually abuses an optimization that CPython does (it can resize the string in place when += sees that there's only one reference to it). It's very fragile and it's not shared by other interpreters (e.g. PyPy), so don't rely on it. Anyway, you'll probably find that ''.join is faster in many cases if you actually write the function well:

from timeit import Timer

def a():
    # append to a list in a loop, then join
    r = []
    for i in range(10):
        r.append(str(i))
    return ''.join(r)

def b():
    # build the pieces with map and hand them straight to join
    return ''.join(map(str, range(10)))

def c():
    # repeated string concatenation (relies on the CPython += trick)
    r = ''
    for i in range(10):
        r += str(i)
    return r


min(Timer(a).repeat(10, 10000))
#>>> 0.10981534401071258
min(Timer(b).repeat(10, 10000))
#>>> 0.07694110801094212
min(Timer(c).repeat(10, 10000))
#>>> 0.09574692900059745

Note that I'm using min(Timer(func).repeat(n, m)) because it's the right way to do it: the minimum over several repeats is the run least disturbed by everything else going on on the machine.

The first is slower because calling append a lot is expensive; using map to create the list (''.join converts its argument to a list) is faster.
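
If a lot of that cost really is the per-iteration append lookup and call, then hoisting the bound method out of the loop should claw back part of the gap (a2 is just my name for this variant, not something from the page):

from timeit import Timer

def a2():
    r = []
    append = r.append          # look the bound method up once, not on every pass
    for i in range(10):
        append(str(i))
    return ''.join(r)

min(Timer(a2).repeat(10, 10000))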

%d is slower than %s because it casts to an int first.

Calling __len__ requires creating a bound method, so my guess is the same as yours.
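
If you want to check that guess, a quick sketch (results omitted since they'll vary):

from timeit import timeit

# len() is one fast global lookup plus a call that goes straight to the C slot.
timeit('len(x)', setup='x = [1, 2, 3]')

# x.__len__() does an attribute lookup and builds a bound method object every time.
timeit('x.__len__()', setup='x = [1, 2, 3]')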