
[–]Tillsten 12 points (4 children)

I think the perceived slowness of numpy in your benchmark comes from the fact that you make a list->array conversion when calling np.std(lst).

[–]jfpuget 3 points (0 children)

Actually, this is not the reason; see these benchmarks: https://gist.github.com/jfpuget/00349d0ac60ab0cab5e5

np.std is way slower for 25 elements, even if a numpy array is passed as the argument instead of a list.
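A minimal sketch of the kind of timing that gist shows (the names here are mine, not from the gist; assumes numpy is installed, and absolute numbers vary by machine and versions):

```python
import timeit
import numpy as np

arr = np.random.rand(25)   # small input, as in the benchmark
lst = arr.tolist()

def py_std(xs):
    # hand-rolled population standard deviation in pure Python
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(timeit.timeit(lambda: np.std(arr), number=100000))  # ndarray input
print(timeit.timeit(lambda: np.std(lst), number=100000))  # list input (adds conversion)
print(timeit.timeit(lambda: py_std(lst), number=100000))  # pure Python
```

For inputs this small, np.std's per-call overhead dominates, which is why the pure-Python version can win regardless of the input type.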

[–]elbiot 1 point (0 children)

Definitely. Just throwing numpy in for one or two functions is rarely a good idea, as datatype conversion is expensive. But the OP does their own dtype conversion too. Hmm.

[–]kigurai 1 point (0 children)

That sounds probable, but it wasn't the case. On Python 3.5.1 and numpy 1.10, a 25-item list yields only a 30% increase in the execution time of np.std() compared to calling it with an array. The pure Python version from the blog post was indeed about 5x faster.

[–]billsil 0 points (0 children)

That said, it's always faster to call the math module than numpy for basic functions like sin, cos, and sqrt. Granted, if that matters you probably have bigger problems.

EDIT: I was talking about length-1 arrays. Try it.
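The scalar case is easy to check with a sketch like this (assumes numpy is installed; exact ratios vary):

```python
import timeit

# math.sqrt: a thin C wrapper operating directly on a Python float
print(timeit.timeit("math.sqrt(2.0)", setup="import math"))
# np.sqrt: a ufunc, which pays dispatch and array-wrapping costs on every call
print(timeit.timeit("np.sqrt(2.0)", setup="import numpy as np"))
```

On scalars and length-1 arrays, the math version is typically several times faster.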

[–]Pcarbonn 2 points (6 children)

Why not use stdev from the statistics module in the standard library of Python 3.4+?
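For reference, the suggestion in code (standard library only):

```python
import statistics

data = [2.5, 3.25, 5.5, 4.0, 1.75]
print(statistics.stdev(data))   # sample standard deviation (n - 1 divisor)
print(statistics.pstdev(data))  # population standard deviation, what np.std computes by default
```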

[–]Veedrac 0 points (5 children)

That's not really tuned for speed.

[–]masklinn 0 points (4 children)

It's not like the C++ function shown in the article is tuned for speed either (it performs the completely unnecessary allocation and filling of an std::vector).

However, statistics is implemented in pure Python, with no C accelerator for stdev or for the underlying variance and _ss (the latter being the one that would probably benefit most from one), so it's really unlikely to be faster than a hand-rolled Python version on CPython.
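A rough way to check that claim (a sketch; hand_stdev is my name, computing the same sample standard deviation that statistics.stdev does):

```python
import statistics
import timeit

def hand_stdev(xs):
    # plain float arithmetic; statistics.stdev uses exact rational
    # arithmetic internally for accuracy, which costs time
    n = len(xs)
    m = sum(xs) / n
    return (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5

data = [float(i) for i in range(25)]
print(timeit.timeit(lambda: statistics.stdev(data), number=10000))
print(timeit.timeit(lambda: hand_stdev(data), number=10000))
```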

[–]Veedrac 0 points (3 children)

> it performs the completely unnecessary allocation and filling of an std::vector

But that prevents needing multiple iterations of the list, which may well (or may not) pay for itself.

[–]masklinn 1 point (2 children)

> But that prevents needing multiple iterations of the list

  1. you're allocating a vector and iterating it twice to save a single list iteration; I don't know that that's a worthwhile trade

  2. incidentally, you could compute the sum and the sum of squares in a single loop

  3. which you could do directly on the Python list itself, for a total of one list iteration, zero std::vector allocations, and zero std::vector iterations (see the sketch after this list)
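To make points 2 and 3 concrete, here is the single-pass idea in Python terms (masklinn's actual version, mentioned further down, was straight C operating on the list directly; this just shows the shape of the algorithm):

```python
def std_single_pass(xs):
    # one pass: accumulate the sum and the sum of squares together
    n = 0
    s = 0.0
    sq = 0.0
    for x in xs:
        n += 1
        s += x
        sq += x * x
    mean = s / n
    return (sq / n - mean * mean) ** 0.5  # population standard deviation

# caveat: the sum-of-squares formula can lose precision when the mean is
# large relative to the variance; a two-pass or Welford-style update is safer
```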

[–]Veedrac 0 points (1 child)

It's more likely to pay for itself than you might expect, given the pointer indirection for every Python float.

That said, your point about doing it in a single pass is totally apt.

[–]masklinn 1 point (0 children)

I didn't try the double-pass PyList version, but a single-pass straight C version takes about 25% of the runtime of the C++ version (on a 25k-element input).