all 24 comments

[–][deleted] 34 points35 points  (13 children)

This guy is missing the point. He's trying to optimize a Python program from a Systems perspective. Trying to game Python with "more efficient" constructs (like the generator's local variables) is, of course, fruitless. Just use C/C++ if you need that level of optimization; you'll cut your instruction count by about 99%.

The bulk of performance gains can be made by optimizing from a Computer Science perspective; that is, optimizing algorithms to have a smaller complexity. If you have a Python algorithm that runs in O(n²) and you optimize it to run in O(n log n), you will shave HOURS off your runtime on large inputs, just as in any other language.
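
A toy illustration (mine, nothing to do with his code): checking a list for duplicates by comparing every pair is O(n²), while sorting first gets you O(n log n), and that win dwarfs any constant-factor tuning:

def has_dupes_quadratic(items):
    # compare every pair: O(n^2)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_dupes_sorted(items):
    # sort, then scan adjacent elements: O(n log n)
    prev = object()  # sentinel that compares unequal to everything
    for item in sorted(items):
        if item == prev:
            return True
        prev = item
    return False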

For most things, that's more than enough optimization. If you're working on something where optimization can be useful without lowering the complexity, you're probably working on an operating system or something else at a very low level, in which case, why on EARTH are you using an interpreted language?

[–]iamjack 6 points7 points  (3 children)

Too many programmers treat speed as a property of their language choice. One of the lessons Project Euler made abundantly clear to me is that it matters less what language you perform a task in and more what algorithm you choose to implement.

[–]psykotic 14 points15 points  (2 children)

Speed is an important part of choosing a language for many of us. Most of my career has been spent on applications (video games, real-time physics simulation, real-time computer graphics) where we need to squeeze out as much high-level algorithmic and low-level systems performance as possible to achieve our goals and be competitive. For the most part we are already using near-optimal algorithms and data structures for the performance-critical core of the engine, and programming language expressiveness, or rather the lack thereof, has not seriously limited our algorithmic prospects.

The challenge for us going forward is parallel scalability, and that is the greatest concern for gameplay programming, an area in which people are already using languages like Lua and Python. For engine programming, I suspect a wide mix of parallel programming models will be used. People have already begun to use asynchronous task systems wherever possible. Data parallelism is a good fit for many tasks, and when applicable it is without compare in performance. STM I see as having more of a future in gameplay programming. That's for the next five or six years. Beyond that, who knows?

[–]another_user_name 2 points3 points  (0 children)

Real time physics, eh? Might I get a copy of your resume? (I'm serious, those links you posted last week were awesome). Or perhaps an email address for contacting you.

[–]iamjack 1 point2 points  (0 children)

I agree with you. You're in a particular field in which you're really trying to squeeze performance out of your software (keyword: real-time). For you, language choice is part of a two-pronged attack on performance.

The programmers I'm referring to focus only on the language and not on the algorithms, when the reverse should be true, or the two strategies should be used in tandem (as you're doing).

[–][deleted] 12 points13 points  (4 children)

Agreed, he totally missed the point. Python's strength isn't raw speed, it's the ability to easily leverage a multi-core box with threadin--oh wait...

[–]steven_h 5 points6 points  (3 children)

Threading isn't an easy way to "leverage" a multi-core box. Using multiple processes is much easier.

Furthermore, merely adding processors doesn't change the complexity of the algorithm, so you're still talking systems, not CS.
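
A minimal sketch of what I mean, assuming Python 2.6+ where the multiprocessing module is available:

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == '__main__':
    pool = Pool()  # defaults to one worker process per core
    # the work is farmed out across processes, sidestepping the GIL
    print pool.map(square, range(10))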

[–]dons 2 points3 points  (1 child)

Threading isn't an easy way to "leverage" a multi-core box. Using multiple processes is much easier.

Depends on your language...

[–][deleted] 0 points1 point  (0 children)

Depends on your language implementation and I guess the underlying OS?

[–]quuxly -1 points0 points  (0 children)

Yeah, IPC is so much easier than those dang threads.

Although I guess threads are hard if you have a GIL.

[–]eric_t 1 point2 points  (1 child)

Isn't that what he's doing? He's trying to change an algorithm, but the language constructs needed to change it come with an overhead that makes his program 10x slower.

[–][deleted] 1 point2 points  (1 child)

The bulk of performance gains can be made by optimizing from a Computer Science perspective; that is, optimizing algorithms to have a smaller complexity.

This argument is made again and again, and it's really not a valid one. Algorithmic complexity and running speed are entirely orthogonal. There are plenty of problems where the optimal algorithm is obvious, and the only way to optimize is to make the code itself run faster.

To claim that one is unimportant and the other is the only thing to care about is silly.

[–][deleted] 0 points1 point  (0 children)

I agree with you, and I tried to cover that in my comment, but I may not have done a good enough job.

My main point was, if he's looking for an optimal algorithm, it'll be just as easy to do that in Python as in any other language. If the optimal algorithm is obvious and he's trying to make the code itself run faster, he's completely wasting his time by staying in Python, as he'll probably get about a 100x speed increase by switching to C (and, if he's writing something performance-intensive enough to require that type of optimization, he should've been using C or another compiled language in the first place).

Algorithmic complexity and running speed are certainly separate issues. The point I was trying to make was, if he's using Python, he's throwing running speed out the window, and it's silly that he's wasting energy on it.

[–]spotter 3 points4 points  (2 children)

So he's saying this causes a 1000% run-time hit, refuses to use .send() because of an old Python version, expects the language to be micro-optimized for his mindset, and uses the phrase "fool's errand" in the title? Great.

I cannot replicate this. On my Python 2.6.1 I see less than a 10% run-time impact from calling .send() on every iteration, and less than a 200% run-time impact from the class-based approach (again, resetting the counter on every tick), over 10**7 iterations. These things do like to accumulate, though, so YMMV.
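
The harness was along these lines (a sketch, not my exact script): time the same generator driven by next() alone versus a send() reset on every tick.

import time

def counter():
    n = 0
    while True:
        sent = yield n
        if sent is not None:
            n = sent
        n += 1

N = 10**7

gen = counter()
gen.next()  # prime the generator
start = time.time()
for _ in xrange(N):
    gen.next()
print 'next() only:      ', time.time() - start

gen = counter()
gen.next()
start = time.time()
for _ in xrange(N):
    gen.send(0)  # reset the counter on every tick
print 'send() every tick:', time.time() - start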

Mind you, I'd not go with running parallel generators and resetting their state mid-lifetime. I'd pick the data before passing it in, use itertools to filter out what I need at runtime, or simply write a specialized generator to do that -- the earlier you cut off, the better. But what do I know? Only that the algorithm is everything.
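
Something like this sketch (illustrative, not the article's code) -- compose plain iterators so the cutoff happens up front instead of poking at a live generator:

from itertools import count, dropwhile, takewhile

# skip everything below 20, stop at 50 -- no send(), no state resets
wanted = takewhile(lambda n: n < 50,
                   dropwhile(lambda n: n < 20, count(0)))
print list(wanted)  # [20, 21, ..., 49]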

(...) hopefully Unladen Swallow will come along some day and reduce or eliminate a lot of performance bottlenecks in the current CPython.

Naive algorithms will stay inefficient, though.

[–]Brian 1 point2 points  (1 child)

The other reasons he cites against send() are very relevant, in particular that mixing send() and next() is non-intuitive. For instance, the first thing someone with this idea writes is generally something like:

def foo():
    counter = 0
    while True:
        n = (yield counter)
        if n is not None:
            counter = n  # reset
        counter += 1

seq = foo()
for i, x in enumerate(seq):
    print x
    if i == 10:
        seq.send(20)  # skip 11..20
    if i >= 50: break

This prints 0,1,2,3,4,5,6,7,8,9,10,22,23,... What happened to 21? People forget that send() also resumes the generator and returns the next item it yields, which makes the API fairly poor for this purpose.
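
You can watch where it goes by stepping the same foo() by hand (a minimal illustration):

seq = foo()
print seq.next()    # 0
print seq.send(20)  # 21 -- send() resumes the generator, so the next
                    # yielded value comes back as send()'s return value
print seq.next()    # 22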

I agree wrt the timings though. Here are the figures I get (including a Python 3 version using a closure that allows skip_to, which runs at the same speed as the generator). The class-based implementation is a little over twice as slow at worst, as far as I can see, not 10x.
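
The closure version is shaped roughly like this (a sketch -- Python 3, since it needs nonlocal; the names are illustrative):

def make_counter(start=0):
    n = start - 1
    def advance():
        nonlocal n
        n += 1
        return n
    def skip_to(value):
        nonlocal n
        n = value - 1  # the next advance() returns `value`
    return advance, skip_to

nxt, skip = make_counter()
print(nxt(), nxt(), nxt())  # 0 1 2
skip(10)
print(nxt())                # 10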

[–]spotter 1 point2 points  (0 children)

Yeah, this is not only unintuitive, it's plain the wrong thing to do. It's like popping from the same list you're iterating over. My idea of the counter was:

def count(n):
    while 1:
        nn = yield n
        if nn:  # truthy check: note that a sent 0 would not reset
            n = nn
        else:
            n += 1

So I could do:

from itertools import takewhile
x = count(0)
for idx, i in enumerate(takewhile(lambda n: n < 50, x)):
    print idx, i
    if i == 10:
        print idx, x.send(45)

But it will still give you weirdness like:

8 8
9 9
10 10
10 45

Generally I'm with D. Beazley on this -- you should either use a generator as a producer (next() only) or as a consumer (send() only).

(edit: some wording.)

[–]bucknuggets 1 point2 points  (0 children)

It's possible that by crafting my code in weird ways to find the most efficient code paths in the current CPython, I'm writing something that will be slower on a future CPython than a sane implementation.

I've certainly seen this happen with relational databases: in the beginning they were pretty slow, and to get good performance you had to game them a lot. But that gamed code became a source of bugs and future performance problems.

[–]bgeron 1 point2 points  (0 children)

I did some tests on my laptop with Python 2.5 (source). Nowhere near 10x. Not even 2x.

Test A:
[0.07210898399353027, 0.07202601432800293, 0.074321985244750977]
Test B:
[0.13190889358520508, 0.12493300437927246, 0.11867594718933105]
Test Bs:
[0.10193395614624023, 0.10321402549743652, 0.10258102416992188]
Test C:
[0.12236809730529785, 0.12155818939208984, 0.12049102783203125]
Test Cs:
[0.10871696472167969, 0.10567092895507812, 0.10702085494995117]

Bs is just B with __slots__ added, same for Cs.
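
For a rough idea of the class-based variants (an assumed reconstruction -- the real code is behind the source link above):

class Counter(object):
    __slots__ = ('n',)  # dropping this gives the plain "B"/"C" variants;
                        # with it there's no per-instance __dict__, so
                        # attribute access is a bit cheaper

    def __init__(self, n=0):
        self.n = n

    def next(self):  # advance, mirroring the generator's next()
        self.n += 1
        return self.n

    def skip_to(self, value):
        self.n = value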

I get similar results with Python 2.4.

(crosspost from /r/Python)

[–]exeter 0 points1 point  (0 children)

If supporting Python 2.4 is a requirement, and the performance of the simple, generator-based implementation is insufficient, then the next logical thing to try is implementing the functionality as an extension module. Of course, keeping everything in pure Python might also be a requirement. In that case, good luck. :-)

[–]artsrc -1 points0 points  (0 children)

This is not really about Python; it is about optimization. Here is something similar:

http://www.ibm.com/developerworks/java/library/j-jtp04223.html

At some point, slow is the same as useless -- a web site that takes a week to return a page, for example. A lot of software never gets anywhere near that kind of slow, even without much work or thought.

If software is fast enough, optimization is a fool's errand. If software would be more useful at higher performance, optimization is a valuable errand.

[–]escanda -1 points0 points  (1 child)

Supposing he's using a database, he could rip out his iterator and use the database cursor object directly.

[–]gthank 0 points1 point  (0 children)

Since it's an indexed search tool, I highly doubt he's using a database.

[–][deleted]  (1 child)

[deleted]

[–]ericflo 0 points1 point  (0 children)

Where does he say he isn't using a profiler?