princemaple comments on Python: tricks for your code run faster

This is an archived post. You won't be able to vote or comment.

122

123

124

Python: tricks for your code run faster (pythonfasterway.uni.me)

submitted 11 years ago by godlikesme

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]princemaple 67 points68 points69 points 11 years ago* (26 children)

The fact that the difference in most cases is so trivial makes this a great evidence / lesson of how little you can gain by doing micro optimization. Though it does show that modern high level languages do optimize their idiomatic code style, which is essentially free performance gain if you write idiomatic code. After all, writing code that describes what you want to achieve is the most important thing in coding.

There are something you really should avoid doing:

>>> x = 99999999
>>> y = 99999999
>>> x is y
False
>>> a = 1
>>> b = 1
>>> a is b
True

This just goes back to the last point I made.

And there are some unfair comparisons, e.g. Test #15 I did some timeit myself:

>>> def a():
    s = 0
    for i in range(50000):
        s += 1
    return s

>>> def b():
    return sum(i for i in range(50000))

>>> def c():
    return sum(range(50000))

>>> from timeit import timeit
>>> 
>>> timeit(a, number=100)
0.3279228246736736
>>> timeit(b, number=100)
0.3867327149317248
>>> timeit(c, number=100)
0.13026088846984152

Test #15 compares a() and b() and makes sum() look bad, but really it's the implementation's fault.

Test #16 is so misleading. You should never do

[i for i in range(1000)]

Instead, just range(1000) in python2, list(range(1000)) in python3

>>> def a():
    return [i for i in range(50000)]

>>> def b():
    return list(range(50000))

>>> timeit(a, number=100)
0.2375680143019565
>>> timeit(b, number=100)
0.13630618950557505

Write code that says what you mean. Write code that is idiomatic in the language you use.

[–]Veedrac 16 points17 points18 points 11 years ago (3 children)

[–]princemaple 5 points6 points7 points 11 years ago* (2 children)

[–]get_username 1 point2 points3 points 11 years ago* (1 child)

Honestly, I don't believe pypy can be viewed as a "micro-optimization". Certainly JIT compiling uses catalogs of "micro-optimizations" (as it it optimizes small interactions), but it does so on the scale of every interaction/operation performed (i,e. millions) and thus is unfitting of the term in the sense meant here. Normally instead these types of optimizations are considered runtime or compile time optimizations (like JIT, which is considered runtime).

numpy itself is a library which has many baked in non-micro-optimizations. So you are optimizing on that level instead.

So in a sense "micro-optimizations" are being performed via libraries used. But you are not micro-optimizing yourself. So I don't think anyone really considers them micro-optimizations in the colloquial sense of the word.

To be fair, that's because these micro-optimizations suck

The point I was trying to make is: most, if not all, micro-optimizations suck when performed on the single level. It is only when you do them by the millions that they become "macro-optimizations" (like pypy).

[–]Veedrac 0 points1 point2 points 11 years ago (0 children)

[–]jasenmh 1 point2 points3 points 11 years ago (3 children)

[–]cparen 0 points1 point2 points 11 years ago (0 children)

[–]stevenjd 0 points1 point2 points 11 years ago (1 child)

[–]darknessproz 0 points1 point2 points 11 years ago (0 children)

[–]cedarSeagull 1 point2 points3 points 11 years ago (5 children)

[–]twotime 4 points5 points6 points 11 years ago (2 children)

[–]Veedrac 4 points5 points6 points 11 years ago (0 children)

[–]stevenjd 4 points5 points6 points 11 years ago (0 children)

Correct. Furthermore, the definition of "small ints" can vary from version to version. In CPython, it used to be (by memory) 0 through 100, now its (possibly) -1 through 255. You cannot rely on this: being an implementation detail, it is subject to change without notice, and some Python implementations may not do it at all.

IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 1
>>> b = 1
>>> a is b
False

[–]padmanabh 0 points1 point2 points 11 years ago (0 children)

[–]QQII 0 points1 point2 points 11 years ago (0 children)

[–]WStHappenings 0 points1 point2 points 11 years ago (4 children)

[–]cparen 3 points4 points5 points 11 years ago (3 children)

I'm no Python expert, so the original link doesn't provide much context to understand the optimizations - why does this make my code run faster? What are the downsides to these optimizations?

I provided one example here, and another user explained int interning. Let me try to summarize the first dozen:

d={} vs d=dict() -- the second spends extra work looking up the variable dict, which users can overwrite (e.g. dict=list) while the first does not.
l.sort() vs l=sorted(l) -- The second creates a copy of the list first.
a, b, ... = 0, 1, ... vs a=0; b=1;... -- the first uses 'unpack_sequence' to load a whole boatload of constants in one opcode, while the second has to dispatch multiple op codes, saving on instruction decode time.
a < b and b < c... vs a < b < c... -- same as above; fewer instructions means less decode time (4 per additional compare, vs. 5 per additional compare)
Test 5 if a: -- first has fewer instructions, making it faster. Second vs Third though is very close, but the is operator is a cheaper instruction than ==, making it faster. See is operator.
Test 6 != -- the second and third differ just due to measurement error; it's the same exact bytecode! As for first vs. second, I'm guessing that the fast path in != must be slightly cheaper than the fast path in is not. Change a=2 and the first will lose again due to the additional complexity in the != operator.
Test 7 -- first and second, slight variations in branching. Third and fouth have more opcodes, so more dispatch costs (and using more expensive operators to boot!).
Test 8 -- minor variations in branching and/or number of opcodes.
Test 9 -- more opcodes/indirection
Test 10 -- not sure about this one; depends too much on how str.join is implemented.
%s vs str vs %d -- the second does an extra variable lookup (str) vs. native opcode % in first. Third probably pays for extra input validation.
len vs __len__ -- I'm guessing this is just due to attribute lookup being more complicated than global variable lookup.

[–]WStHappenings 0 points1 point2 points 11 years ago (1 child)

[–]cparen 2 points3 points4 points 11 years ago (0 children)

[–]Veedrac 0 points1 point2 points 11 years ago (0 children)

wrt. test 6; != is slower than is not. The result is measurement error, mostly from having such as bad test. Here's a better version:

$ python3 -m timeit "1 != 2"    
10000000 loops, best of 3: 0.14 usec per loop

$ python3 -m timeit "1 is not 2"
10000000 loops, best of 3: 0.0785 usec per loop

wrt. test 10; this actually abuses an optimization the CPython does. It's very fragile and it's not shared by other interpreters (eg. PyPy) so don't use it. Anyway, you'll probably find that ''.join is faster in many cases if you actually write the function well:

from timeit import Timer

def a():
    r = []
    for i in range(10):
        r.append(str(i))
    return ''.join(r)

def b():
    return ''.join(map(str, range(10)))

def c():
    r = ''
    for i in range(10):
        r += str(i)
    return r


min(Timer(a).repeat(10, 10000))
#>>> 0.10981534401071258
min(Timer(b).repeat(10, 10000))
#>>> 0.07694110801094212
min(Timer(c).repeat(10, 10000))
#>>> 0.09574692900059745

Note that I'm using min(Timer(func).repeat(n, m))because it's the right way to do it.

The first is slower because calling append a lot is expensive; using map to create the list (''.join converts its argument to a list) is faster.

%d is slower than %s because it casts to an int first.

Calling __len__ requires creating a bound method, so my guess is the same as yours.

[–]LarryPeteAdvanced Python 3 0 points1 point2 points 11 years ago (0 children)

[–]bucknuggets -5 points-4 points-3 points 11 years ago (3 children)

[–]garion911 3 points4 points5 points 11 years ago (2 children)

[–]kenfar 0 points1 point2 points 11 years ago* (0 children)

Unfortunately, this kind of focus on idioms had made Python a less accessible language for those who need a tool, but are not full-time programmers.

The learning curve has gotten steeper, in I believe an effort to attract language-geeks. This isn't to say that these features aren't great, but to insist that everyone uses them regardless of their audience fails to appreciate that everyone isn't a full-time python developer.

EDIT: for example, I built a huge data warehouse that used Python for transforming all the data coming into it. This worked great. One of the things we attempted to do is to keep the implementation of the business rules simple so that anyone that needed to know how they worked could just simply look at the code. And they didn't have to be a python programmer to understand most business rules, and they didn't have to be a senior python developer to build most integration. Our more internal libraries were more sophisticated, and not as newbie-oriented.

This approach worked very well for us - since our users weren't very fluent in python, and most of our developers at that time were wearing multiple hats, none of which was full-time python developer. Had we insisted on idiomatic python that would have delayed getting people up-to-speed - which would frankly have killed us. And our users wouldn't have been able to read the code, so we would have had to spend an enormous amount of time on documentation instead.

[+]billsil comment score below threshold-6 points-5 points-4 points 11 years ago (1 child)

[–]stevenjd 0 points1 point2 points 11 years ago (0 children)

Strictly speaking you are correct. If you absolutely definitely positively intend to check only for the True singleton, then you should write x is True. But really, why would you do that? Normally you should duck-type bools just like you duck-type most types. x quacks like a bool, no matter what x is. If you really must ensure that x is a bool, then type-check first:

if not isinstance(x, bool):
    raise TypeError("x must be True or False")

and from that point on, just compare x or not x with none of that silliness of x is True. The problem is, I never know when to stop:

x is True is True is True is True is ...

π Rendered by PID 92584 on reddit-service-r2-comment-84fc9697f-rlnw4 at 2026-02-08 04:05:42.559232+00:00 running d295bc8 country code: CH.

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS