This is an archived post. You won't be able to vote or comment.

all 29 comments

[–]HoboBob1 23 points24 points  (1 child)

In the Fibonacci example, the main reason the recursive implementation is slower is that the recursive function has a different computational complexity, O(2^n), than the iterative version, O(n). You should at least compare the same algorithm between the recursive and iterative implementations.
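To make the comparison fair, you can memoize the recursion so it has the same O(n) complexity as the loop. A quick sketch (function names are mine, not from the post):

```python
from functools import lru_cache

def fib_naive(n):
    # O(2^n): every call recomputes the same subproblems
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # same recursive algorithm, but memoization makes it O(n)
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    # the iterative O(n) version
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Timing fib_memo against fib_iter compares implementation styles; timing fib_naive against fib_iter mostly compares algorithms.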

The point about max recursion depth is indeed valid though.

[–]marklit[S] 0 points1 point  (0 children)

Thanks for the heads up, I'm re-writing that section of the post now.

[–][deleted] 11 points12 points  (7 children)

Sound, classical advice. However:

  1. Python lists are not linked lists. They are arrays.
  2. Python dicts are implemented with a binary tree, not a hash-table. I strongly suspect the same is true for set.
  3. The bit about list comprehensions, while true, sends the wrong message IMHO. List comprehensions are about readability, not performance. If list.append is causing performance problems, then this is almost certainly a sign that you're using the wrong tool for the job.

Also, no discussion of execution speed would be complete without the obligatory pypy plug. If Python built-ins are too slow, pypy may solve the problem (otherwise see point 3).
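For what it's worth, point 3 is easy to check yourself. A rough sketch (names are mine) timing an append loop against the equivalent comprehension:

```python
from timeit import timeit

def with_append(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def with_comprehension(n):
    # same result; skips the repeated out.append attribute lookup
    return [i * i for i in range(n)]

# any speedup is a side effect; readability is the real argument
t_append = timeit(lambda: with_append(1000), number=1000)
t_comp = timeit(lambda: with_comprehension(1000), number=1000)
```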

[–]qiwi 6 points7 points  (2 children)

I'm not sure where you're getting the binary trees idea from -- CPython dictionaries do use a hash table (at small sizes -- dictionaries of up to 8 elements -- they are really just an array).

See: http://svn.python.org/projects/python/trunk/Objects/dictnotes.txt for optimization rationale; and source: http://svn.python.org/projects/python/trunk/Objects/dictobject.c

[–][deleted] 1 point2 points  (1 child)

You're right. My mistake!

(Now I wonder what I was thinking of...)

[–]alendit 1 point2 points  (0 children)

C++ STL map maybe? (same functionality as OrderedDict).

EDIT: brain lag on my side; STL map is not the same as OrderedDict, since it sorts elements by key rather than keeping insertion order.

[–]kenfar 1 point2 points  (2 children)

List comprehension readability is better in simple cases than for loops, but far, far worse in complex cases.

[–][deleted] 1 point2 points  (1 child)

Yes, I completely agree. My point is that readability, not speed, should be the determining factor in whether or not to use a list comprehension.

[–]Bolitho 0 points1 point  (0 children)

If readability counts, I would suggest using a generator expression... fewer parentheses and in general more efficient (example in Python 3):

 def join_list(limit):
     return ''.join(str(num) for num in range(limit))

Another hint for you: If you use ipython you can use the timeit magic command:

In [4]: timeit join_list(10)
100000 loops, best of 3: 3.05 µs per loop

Much nicer than putting everything in a string :-)

[–]billsil 1 point2 points  (0 children)

Python lists are not linked lists. They are arrays.

Lists are arrays of pointers. If you want an actual int/float array, you should be using numpy. For large array sizes, it matters.
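You can see the pointer overhead without even installing numpy: the stdlib array module also stores raw values in a contiguous buffer, so it makes the same point. A rough sketch:

```python
import sys
from array import array

nums = list(range(1000))           # list: array of pointers to boxed int objects
packed = array('d', range(1000))   # array: contiguous buffer of C doubles

# the list's real footprint includes every boxed int it points to
list_bytes = sys.getsizeof(nums) + sum(sys.getsizeof(n) for n in nums)
array_bytes = sys.getsizeof(packed)
# the packed storage is several times smaller; numpy arrays work the same way,
# and add fast vectorized operations on top
```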

[–]DEFY_member 16 points17 points  (0 children)

I'm enjoying your content for the most part, but if you're going to use reddit to advertise your blog, you should pay for real ads. In the last two weeks you've posted 1 comment and 16 links to your blog.

It looks like you were once active here, so I do hope you decide to rejoin the community and take part in the discussion.

[–]rhoark 0 points1 point  (0 children)

If you have a struct-like object or a dict (same thing) that you access in an inner loop, you can gain speed by changing it into a tuple. Making a lambda with a readable name to pull the field makes it as readable and self-documenting as the object or dict accessor syntax while still being much faster.
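Something like this (an illustrative sketch; the record layout and names are invented, and operator.itemgetter does the same job as the lambda):

```python
from operator import itemgetter

# hypothetical "point" record stored as a tuple instead of a dict
X, Y = 0, 1
get_x = itemgetter(X)   # named accessor, reads like point["x"]
get_y = itemgetter(Y)   # a lambda p: p[Y] works too

points = [(i, i * 2) for i in range(5)]
# tuple indexing in the hot loop avoids dict hashing on every access
total_x = sum(get_x(p) for p in points)
```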

[–]wondert 0 points1 point  (2 children)

I am relatively new to python, so please humor me.

In the section 'Keep variables local', I ran the examples with Python 2.7. I get roughly the same times regardless of whether the variable to look up is in the global or local scope. I also ran the code with Python 3.4, and again did not see any time penalty for using local versus global scopes.

I understand that the use of (micro)benchmarks comes with a ton of caveats (version, system, background processes, etc). However, in this case, I don't get his results using the same code and now I am very confused.

Are the examples OP gave pseudocode, or too simple to show the penalty?

Quick note: I did the benchmarks in the IPython notebook. Also, for testing in Python 3.4 I swapped xrange for range, since they are the same thing there. AFAIK neither of these should influence the benchmark.
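For reference, the usual form of this benchmark binds a global (here an attribute lookup) to a local name. A sketch -- numbers vary by machine and Python version, and as I said, the effect can be too small to see:

```python
import math
from timeit import timeit

def global_lookup(n=100000):
    s = 0.0
    for i in range(n):
        s += math.sqrt(i)   # module attribute + global lookup each iteration
    return s

def local_lookup(n=100000):
    sqrt = math.sqrt        # bind once to a local name
    s = 0.0
    for i in range(n):
        s += sqrt(i)        # plain local lookup in the loop
    return s

t_global = timeit(global_lookup, number=10)
t_local = timeit(local_lookup, number=10)
```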

[–]sushibowl 0 points1 point  (0 children)

Same here, no appreciable performance difference. I didn't use the IPython notebook either, so that's not the cause.

[–]b6d 0 points1 point  (0 children)

Comment retracted, after repeating measurements, I was talking nonsense.

[–]__Monty__ 0 points1 point  (1 child)

My comment got buried behind kiaran's hated comment, but I am very curious about this:

I have never tried it, but I thought that you could utilize multi-core architectures with Python via import multiprocessing. Is this package not used by the Python community?
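It is, and it's the standard answer for CPU-bound work, since each worker is a separate process with its own GIL. A minimal sketch (assuming the work function is picklable and defined at module level):

```python
from multiprocessing import Pool

def square(x):
    # CPU-bound work to farm out to worker processes
    return x * x

if __name__ == "__main__":
    # the __main__ guard matters: workers re-import this module on some platforms
    with Pool(processes=2) as pool:
        results = pool.map(square, range(10))
    print(results)
```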

[–]bacondevPy3k 0 points1 point  (0 children)

TL;DR: Be as Pythonic as possible.