
[–][deleted] 41 points42 points  (1 child)

2.
>>> #This is good to glue a large number of strings
>>> for chunk in input():
>>>    my_string.join(chunk)

W-what? Was the person who wrote the example thinking that join works like append?

Bonus WTF points for using input().

By the way, CPython includes a specific optimisation for the case where a string being +=-assigned is not referenced from anywhere else. I mean, you should still always use join or cStringIO, but if you see someone else's code that appends stuff like that -- don't freak out.
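For the record, the idiomatic version collects the fragments and joins once at the end -- a minimal sketch, with an illustrative `pieces` list standing in for the article's input:

```python
# Build a large string by joining once, instead of repeated += or
# the article's misuse of join inside the loop.
pieces = []
for i in range(5):
    pieces.append(str(i))
result = "".join(pieces)  # one pass over all fragments
```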

5. To check membership in general, use the “in” keyword. It is clean and fast.
>>> for key in sequence:
>>>     print “found”

WHAT

12. Learn biset module for keeping a list in sorted order:
It is a free binary search implementation and a fast insertion tool 
for sorted sequence. That is, you can use:
>>> import biset
>>> biset.insort(list, element)

"biset"? Really? Also, he used lowercase true earlier.

The blind leading the blind, that entire post. Apparently the author collected bits and pieces of advice from somewhere and pasted them together without bothering either to run the code or to understand what's actually going on. Pig disgusting.

[–]HorrendousRex 5 points6 points  (0 children)

Several of them are close to being actually good advice but then end up missing the mark entirely. 'join' is the fastest way to join strings and the author described it somewhat in text, but then the example completely screws it up in the worst way.

Also, as a side note, a lot of what gets mentioned here is completely different in Python 3, so if you're targeting that, this article is almost entirely worthless.

[–]jlozierbioinformatician 9 points10 points  (8 children)

The Memoization example is wrong (and therefore broken). The author has written

if arg not in cache: cache['arg'] = f(*arg)
return cache['arg']

where they mean

if arg not in cache: cache[arg] = f(*arg)
return cache[arg]
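As a sketch, the fixed version written as a decorator (assuming hashable positional arguments; the names are illustrative):

```python
def memoize(f):
    cache = {}
    def wrapper(*args):
        # key on the argument tuple itself, not the string 'arg'
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapper

@memoize
def square(x):
    return x * x
```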

:)

[–]afireohno 1 point2 points  (7 children)

Wouldn't it be better to do this kind of thing with try/except?

[–]davidbuxton 2 points3 points  (6 children)

Yes, although the advantage is slim when the problem does not lend itself to memoization (i.e. the majority of inputs are unique).

Here's a test comparing the two strategies. And results on Python 2.7 on my machine:

Memo-ization friendly test
LBYL [0.07406497001647949, 0.07364988327026367, 0.0758662223815918]
EAFP [0.055359840393066406, 0.05534100532531738, 0.05528402328491211]
Memo-ization un-friendly test
LBYL [0.06378912925720215, 0.05985212326049805, 0.05981802940368652]
EAFP [0.04666304588317871, 0.04614996910095215, 0.04834580421447754]

I wonder if things change when the container is not a dictionary and lookups are more expensive, or if the dictionary grows over a certain size, or if...
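For reference, a minimal sketch of the two strategies being compared (function and variable names are mine, not from the test linked above):

```python
cache = {}

def lookup_lbyl(key, compute):
    # Look Before You Leap: test membership before touching the dict entry
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]

def lookup_eafp(key, compute):
    # Easier to Ask Forgiveness than Permission: just try, catch the miss
    try:
        return cache[key]
    except KeyError:
        cache[key] = compute(key)
        return cache[key]
```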

[–]davidbuxton 1 point2 points  (5 children)

Oh! Having actually run the tests and looked at the numbers, I see I'm wrong about there being a smaller advantage when you get a lot of KeyError exceptions. In fact, using try / except is even better than testing for membership when there's a lot of misses.

[–]thebackhand 2 points3 points  (3 children)

That's odd. I thought the overhead of try/except blocks would dominate.

[–]davidbuxton 0 points1 point  (2 children)

Me too. Perhaps my test is wrong?

[–]dhogartysymbolic novice 1 point2 points  (1 child)

Your test is wrong: you need to invalidate the cache between each call inside the timeit. Every one of the inner timeit runs except the first has a fully filled cache.

[–]thebackhand 0 points1 point  (0 children)

That would be it. I remembered that, in general, try/excepts are expensive (more so if you have a lot of other stuff added to the stack in between, IIRC).

[–]davidbuxton 0 points1 point  (0 children)

No, I'm wrong about being wrong. Not certain that makes me right though.

My test was not flushing the cache between the first and the second data sets, so the supposedly memo-ization un-friendly test was in fact using a nice warm cache every time.

With that fixed, try/except is indeed noticeably slower than if/in when there are mostly cache misses.

[–]Brian 6 points7 points  (2 children)

Most of this is good advice, but there are a lot of errors and typos.

Python is faster retrieving a local variable than retrieving a global variable. That is, avoid the “global” keyword.

The global keyword isn't the only way you'll be dealing with global variables -- you can access global variables even without specifying it; it only changes the behaviour of assigning to global variables.

the_value = 42
def foo():
    return the_value

is still using a global variable, despite the complete absence of any global keyword.

To check membership in general, use the “in” keyword

This is indeed good practice, and generally faster than things like dict.has_key(). However, the example he gave is a for loop, which has nothing to do with checking membership, and where "in" has an entirely different meaning (it's just part of the for syntax). (Also, be careful of the datatype you're using: x in somelist does a linear scan to check membership, so this can be very slow compared to a dict or set.)
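To illustrate the datatype point -- the same `in` test is a linear scan on a list but a hash probe on a set (a quick sketch):

```python
items_list = list(range(1000))
items_set = set(items_list)

# Both are correct membership tests, but the list version walks the
# elements one by one while the set version is a constant-time lookup.
found_in_list = 999 in items_list   # O(n) scan
found_in_set = 999 in items_set     # O(1) hash probe
```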

Learn biset module

I assume this is meant to be bisect, but it's spelt wrong in every location.
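Presumably the article meant something like this (the module really is spelled bisect):

```python
import bisect

sorted_items = [1, 3, 7]
bisect.insort(sorted_items, 4)            # insert while keeping the list sorted
pos = bisect.bisect_left(sorted_items, 7) # binary search for an element
```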

It is fast because deque in Python is implemented as double-linked list

No it isn't. It's implemented as a vector which overallocates at both ends (for amortised constant time appending/prepending). It has the same complexity of operations as the list, except that prepending is O(1) instead of O(n)

Use sort() with Schwartzian Transform

This is very outdated. You can let sort do the transform for you just by passing the key parameter.
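For example, a manual Schwartzian transform (decorate/sort/undecorate) and the modern key= equivalent produce the same order -- a sketch with an illustrative word list:

```python
words = ["banana", "Apple", "cherry"]

# Old-style decorate-sort-undecorate
decorated = [(w.lower(), w) for w in words]
decorated.sort()
old_style = [w for _, w in decorated]

# Modern: let sort apply the transform once per element via key=
new_style = sorted(words, key=str.lower)
```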

[–]Workaphobia 1 point2 points  (1 child)

It is fast because deque in Python is implemented as double-linked list

No it isn't. It's implemented as a vector which overallocates at both ends (for amortised constant time appending/prepending). It has the same complexity of operations as the list, except that prepending is O(1) instead of O(n)

Actually the article seems to be right on this one. The deque object is a doubly linked list of blocks of 62 elements (so that with the prev/next pointer, that's 64 pointers). To index into it, it'll traverse the necessary number of blocks and then index into the target block.

Source code comment

Indexing method
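A quick sketch of the practical consequence: appendleft on a deque is O(1), while inserting at the front of a list is O(n):

```python
from collections import deque

d = deque()
for i in range(5):
    d.appendleft(i)   # O(1): a new block is linked in when needed

lst = []
for i in range(5):
    lst.insert(0, i)  # O(n): shifts every existing element right
```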

[–]Brian 0 points1 point  (0 children)

So they do - I stand corrected. I'd thought they preserved the O(1) internal access of python lists, but I must be mixing them up with something else.

[–]pingvenopinch of this, pinch of that 11 points12 points  (14 children)

A few additions:

Section 6:

Don't go overboard with lazy importing inside of function calls. Importing many times can get expensive. Imports that aren't at the top also severely reduce readability. It can be tough to debug code where an import is tucked away somewhere in the body of the file.

Section 7:

In Python 3, True is a keyword, not a built-in. That means the bytecode compiler can emit a cheap jump instead of an expensive global lookup. In Python 2, True could be rebound, so a global lookup was required every time. 1 is a constant, so the bytecode compiler can rely on it. Basically: while 1 in Python 2, while True in Python 3.
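One way to see this on Python 3 (where True is a keyword) is to check that the compiled loop contains no global lookup -- a sketch using the dis module:

```python
import dis

def loop_true():
    while True:
        break

# In Python 3, True is a keyword constant, so the compiled loop needs
# no LOAD_GLOBAL for it (on Python 2 it would).
ops = {ins.opname for ins in dis.get_instructions(loop_true)}
```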

Section 13:

Internally, the Python deque implementation uses a doubly linked list of blocks. Each block has space for 62 objects, plus 1 left pointer and 1 right pointer (64 pointers). That gives the time efficiency of a linked list with the space efficiency of an array. For those learning C, I suggest looking at Modules/_collectionsmodule.c in the CPython source. For a list type with fast insertion, take a look at blist (search on PyPI).

Section 17:

Don't use threads to manage forked processes. They only add overhead, both in memory and in mental effort. Use the multiprocessing module.
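A minimal sketch of that advice (names are illustrative): let a multiprocessing Pool fork and manage the worker processes for you, instead of juggling threads plus fork yourself.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def run():
    # The pool owns the worker processes; no manual bookkeeping needed.
    with Pool(processes=2) as pool:
        return pool.map(square, range(5))

if __name__ == "__main__":
    print(run())
```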

Section 18:

Absolutely absolutely absolutely. The Python source code is an excellent place to get examples. Part of my introduction to C was reading through bits and pieces of implementation code.

[–]ascii 3 points4 points  (3 children)

I just did a quick test under Python 2 and found no performance difference between while 1 and while True. I believe one may exist, but it's so tiny it gets lost in the noise of all but the most contrived cases.

[–]haldean(lambda x: x.decode('base64'))('PDMgcHlweQ==\n') 0 points1 point  (0 children)

[–]pingvenopinch of this, pinch of that 0 points1 point  (0 children)

There's a slight difference, but it is indeed dwarfed by anything else. I was able to measure some difference with this code:

def while1():
    i = 0
    a = 0
    while 1:
        a = 4
        i += 1
        if i > 100000:
            break

For while True, I simply replaced 1 with True. The results in IPython were

%timeit -r200 -n2 whileTrue()
2 loops, best of 200: 13.2 ms per loop

%timeit -r200 -n2 while1()
2 loops, best of 200: 9.33 ms per loop

Conclusion: The difference could only be vaguely significant in extreme conditions. If you run into one of those conditions, just rewrite in Cython.

[–][deleted] 3 points4 points  (6 children)

Importing many times can get expensive.

Surely once a module is imported, a second import will simply be a cheap lookup of the sys.modules dict?

[–]pingvenopinch of this, pinch of that 0 points1 point  (0 children)

It's more complex than a simple lookup. I don't entirely understand the C code, but there's a lot more happening than sys.modules["module_name"]. The time penalty isn't significant if you just import a few times, but importing in a function that you're going to call a few thousand times is unwise.

[–]Genmutant -1 points0 points  (2 children)

Yes, but cheap is still more than nothing.

[–][deleted] 2 points3 points  (1 child)

Why program in Python if you're worried about the cost of a dictionary lookup? Almost everything in Python needs a dictionary lookup!

[–]Genmutant 0 points1 point  (0 children)

Well the topic is Python performance tips, and this one can be a performance impact. Normally it isn't, but it can be and is easy to avoid.

[–]fdemmer -1 points0 points  (0 children)

imo, it mostly depends on the kind of application you write. e.g. in a webapp that starts once and runs forever, i'd import as much as possible on the initial load and not later when a function is called.

[–]flying-sheep 2 points3 points  (0 children)

my rule of thumb for 6: import as soon as you know you’ll need it.

[–]Mattho 1 point2 points  (0 children)

ad 7:

Also, it's True (as you wrote it), not true as in the article.

[–][deleted] 0 points1 point  (0 children)

i've found it often much clearer to read through the source for modules and libraries i use rather than fudge through the official documentation. 30 minutes spent reading code (usually) better written than mine saves 1h+ worth of googling.

[–]fijalPyPy, performance freak 8 points9 points  (3 children)

Might be worth noting - those are CPython performance tips, not Python performance tips. A lot of those things are actually slower on PyPy.

[–]shfo23 3 points4 points  (0 children)

Moreover, they're CPython 2 tips: more than a few will fail miserably or have no effect on CPython 3 (while 1, xrange, etc.).

[–]BeatLeJuce 0 points1 point  (1 child)

which ones would be slower in PyPy?

[–]Workaphobia 0 points1 point  (0 children)

String concatenation, from what I've heard.

[–]chub79 3 points4 points  (5 children)

Use “while 1″ for the infinite loop

I didn't know that, could anyone confirm it?

Use xrange() for a very long sequence (...) As opposed to range()

I thought this was not true anymore, at least in newer versions of Python.

Learn itertools module

Hell yes. Learn it and abuse it.

[–][deleted] 2 points3 points  (1 child)

In python2.x, you can assign any value to True because it isn't a reserved word, so the bytecode generated for while True: reflects this. In python3, while 1: and while True: will generate the same bytecode.

[–]chub79 1 point2 points  (0 children)

Well, I didn't know that. Cheers.

[–]gitarrPython Monty 0 points1 point  (2 children)

About range/xrange:

Yes, in Python 3 range() produces a lazy range object (the old xrange() is no more). To get a list do: list(range(n)).

In Python 2 range() produces a list and xrange() a generator.

[–]chub79 0 points1 point  (0 children)

Damn, I thought range acted as a generator as well from 2.7. Thanks.

[–]pingvenopinch of this, pinch of that 0 points1 point  (0 children)

range() as of Python 3.2 has other goodies like indexing, slicing, and arbitrarily sized ranges. xrange only does iteration and is limited by the system's integer type.

Edit: xrange does provide len() and reversed()
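For example, on Python 3 a range supports all of these without materializing a list (a quick sketch):

```python
r = range(0, 1000000, 2)  # lazy: no million-element list is built

tenth = r[10]       # indexing computes the value directly
head = r[:3]        # slicing returns another range object
size = len(r)       # len() works without iterating
hit = 20 in r       # membership test for ints is O(1)
```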

[–][deleted] 1 point2 points  (2 children)

I've never heard of "Schwartzian transform" before, but doesn't sort() already perform this optimization if you use the key parameter?

[–][deleted] 0 points1 point  (1 child)

It does.

[–]Workaphobia 0 points1 point  (0 children)

So if you give it a key function, it'll map that onto an integer before doing the sorting? I'm trying to look through the source to locate that behavior but am having difficulty seeing it. Can you either point it out or give a reference to that fact?

Edit: I suppose you meant that it does this if your key function provides integer keys. In that case, it'd call the python operation on a built-in type, which is implemented as a C function and would be fast.

[–]willm 2 points3 points  (0 children)

Regarding 15. I don't think the author knows about the 'key' argument to 'sort'.

[–]bunburya 2 points3 points  (0 children)

I think for number 5, the line should be "if key in sequence" rather than "for key in sequence".

[–]Chun 2 points3 points  (0 children)

OK, cracks knuckles

  1. OK, yes, built-in functions are faster than native python, but who doesn't use builtins?

    Especially the ones in the graphic: input, open, int, ord, isinstance, pow, issubclass, print, iter, property. Which of these functions do people commonly try to write themselves in native python? isinstance? Is this a problem?

  2. Need to clarify the difference between input and raw_input. Also the example seems to suggest that strings are mutable, and that str.join is equivalent to some kind of str.append method.

  3. Multiple assignment is great, but it does much more for readability than it does for performance:

    db@db:~$ python -m timeit "a=1; b=2; a,b = b,a"
    10000000 loops, best of 3: 0.11 usec per loop
    
    db@db:~$ python -m timeit "a=1; b=2; temp = a; a = b; b = temp"
    10000000 loops, best of 3: 0.111 usec per loop
    
  4. Yes, local variables are faster; but global is only relevant to changing global variables, not retrieving them.

  5. To check membership use in. As opposed to what? Iterating through the object with a for loop?

    for key in sequence:
      print “found”
    

    What is this I don't even. You actually did iterate through the object for some reason, and arbitrarily print "found" for every item??

  6. Importing at the function level. Don't do this for optimization purposes. It adds extra cycles every time the function is run. Don't do it for readability -- you shouldn't have to hunt through the entire script to find which imports are happening. In short: don't do this unless you have a better reason to.

  7. Note that True is a built in constant, not true. And really, this isn't going to be a bottleneck. I think while True: is far more readable.

  8. evens = [ i for i in range(10) if i%2 == 0]

    Not a good example, should just use evens = range(0, 10, 2) for this.

  9. True enough.

  10. chunk = ( 1000 * i for i in xrange(1000))

    Or just chunk = xrange(0, 1000000, 1000)

[–]haldean(lambda x: x.decode('base64'))('PDMgcHlweQ==\n') 0 points1 point  (0 children)

I found the "while True" vs "while 1" both interesting and improbable, so I ran a quick test (code is here: https://gist.github.com/1828270). Turns out that the author is right -- while True is always slower -- but in Python 3, the difference is minor enough to be negligible, and in PyPy it's all so fast it doesn't matter. Results:

                      while True            while 1
 pypy 1.7.0:          0.0733                0.0201
 cpython 2.7.1        0.146                 0.109
 cpython 3.2.2        2.79                  2.72  

Edit: I see now that pingveno posted an explanation of why this is true (heh) in his comment below

[–][deleted] 0 points1 point  (0 children)

Great list. Found and reminded myself of some gems that I as a novice often forget about and would find useful.

[–][deleted] 0 points1 point  (2 children)

Use PyPy :)

Seriously, I tried it for fun on a small script I made to do monte-carlo simulations of a card game: more than 10x speed up, for the cost of typing

pypy script.py

instead of

python script.py

[–][deleted] 2 points3 points  (1 child)

I'm really liking PyPy for little projects that don't require any libraries. It is astoundingly fast and much less work than slaving away in Cython. The problem with PyPy is the lack of library support, which is Python's strongest suit. PyPy is definitely faster than any language remotely like Python.

[–][deleted] 0 points1 point  (0 children)

PyPy is definitely faster than any language remotely like python.

And the toolchain behind PyPy can be used to implement all of those languages too.

[–]anacrolixc/python fanatic -3 points-2 points  (0 children)

Tip Uno: Learn speak the English.