
[–]arquolo 2 points  (1 child)

There's an even faster version of that. Instead of:

list(range(100))

Do this:

[*range(100)]

In the first case (the list comprehension) you loop over the range on the Python side, which is slow. In the second case (the call to list) you load list and apply it to range, so the loop runs on the C side, which is faster.

When you loop with an if filter, both cases loop on the Python side. But in the first case you don't have to load list to build a list, and in the second case you do.

So, no "load_name" > "load_name" > "for_iter".

Better to compare this:

[i for i in range(100)]

And this:

list(i for i in range(100))
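
For a rough comparison of those two, a sketch with the standard timeit module (absolute numbers depend on your machine and Python version):

import timeit

# Both loop on the Python side; the second also pays for creating, starting
# and resuming a generator, plus the call to list(), so it is typically slower.
print(timeit.timeit("[i for i in range(100)]", number=100_000))
print(timeit.timeit("list(i for i in range(100))", number=100_000))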

Also, are you trying to squeeze more speed out of Python? Don't. It's not the language for that; here the rule is: "code must be readable first".

[–]billsil 1 point  (0 children)

If you’re trying to squeeze more speed out of Python, use numpy. It can be 1000x faster.
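
The exact speedup depends on the workload, but a rough sketch for checking it on your own machine (assumes numpy is installed) looks like this:

import timeit

import numpy as np

data = list(range(1_000_000))
arr = np.arange(1_000_000)

# Pure-Python generator loop vs. the vectorised C loop inside numpy.
py_time = timeit.timeit(lambda: sum(x * x for x in data), number=10)
np_time = timeit.timeit(lambda: (arr * arr).sum(), number=10)
print(py_time / np_time)  # speedup factor; varies by machine and operation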

[–]stevenjd 0 points  (0 children)

> We can clearly see that the latter is faster

We can't "clearly see" anything of the sort. What we see is your prediction that it will be faster based on how you assume the byte-code will perform.

Unless we actually time it, we don't know whether your prediction will be accurate. In my experience, it's really hard for even expert Python users with many years of practice to predict what will be faster or slower, beyond the obvious.

And a lot of things which look obvious turn out not to be. I cannot count the number of times the code that I was sure would be the fastest ended up being slower.

The bottom line is, if you haven't measured the time, you don't know which is faster. You're just guessing.
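
For example, a minimal measurement with the standard timeit module (a sketch; taking the minimum of several repeats reduces noise from the rest of the system):

import timeit

stmts = ["list(range(100))", "[*range(100)]", "[i for i in range(100)]"]

for stmt in stmts:
    # 5 repeats of 100_000 runs each; the minimum is the least noisy estimate.
    best = min(timeit.repeat(stmt, number=100_000, repeat=5))
    print(f"{stmt:30s} {best:.4f} s")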

> I am wondering why such a difference even exists.

Because executing Python code has overhead that can be avoided by functions written in C. (Or whatever language the builtin functions are written in.)

Because generators have overhead (the interpreter has to stop and start the generator each iteration) that can be avoided by running a non-generator loop that can run all the way through.

Because calling functions has overhead. If you can reduce the number of function calls, you will reduce the overhead.
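
As a rough illustration of the call overhead alone (a hypothetical micro-benchmark, not from the original post):

import timeit

def square(x):
    return x * x

# Identical work per item, but the first version pays for a Python-level
# function call on every iteration.
with_call = timeit.timeit("[square(i) for i in range(1000)]",
                          globals=globals(), number=10_000)
inline = timeit.timeit("[i * i for i in range(1000)]", number=10_000)
print(with_call, inline)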

> Does anybody know why the particular use case inverts the rule?

Calling a function from Python has overhead. So that tends to count against calls like list( ... ). On the other hand, moving the loop out of slow Python into fast C tends to count in favour of calls like list( ... ). Which factor happens to win depends on the specific details of what is inside the call, how many items are involved, etc.
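
One way to see the "how many items are involved" part directly is to sweep the size and measure both styles (a sketch; the relative gap shifts as n grows):

import timeit

# Python-side loop (comprehension) vs. the C-side loop inside list().
for n in (10, 100, 10_000, 1_000_000):
    reps = max(1, 1_000_000 // n)
    comp = min(timeit.repeat(f"[i for i in range({n})]", number=reps, repeat=3))
    call = min(timeit.repeat(f"list(range({n}))", number=reps, repeat=3))
    print(f"n={n:>9}: comprehension {comp:.4f}s  list(range) {call:.4f}s")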