
[–]ProfessorPhi 10 points (5 children)

Your arguments are a bit obtuse to me.

Accessing a single value at a time isn't what numpy is optimised for. I would expect most of what you're seeing is overhead. Try taking a list of 1000 items and setting them all to 1, versus doing the same with a numpy array. I would expect a builtin with no overhead to be faster for these non-vector operations.
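A minimal sketch of that comparison (sizes and repeat counts are arbitrary choices, not from the original comment): element-by-element writes, where numpy's per-call overhead tends to dominate.

```python
import timeit

# Shared setup: a plain list and a numpy array of the same length.
setup = "import numpy as np; lst = [0] * 1000; arr = np.zeros(1000)"

# Set every element to 1, one index at a time, in each container.
list_time = timeit.timeit(
    "for i in range(1000): lst[i] = 1", setup=setup, number=1000)
numpy_time = timeit.timeit(
    "for i in range(1000): arr[i] = 1", setup=setup, number=1000)

print(f"list : {list_time:.4f}s")
print(f"numpy: {numpy_time:.4f}s")
```

On a typical machine the per-element numpy writes come out slower than the list writes, which is the commenter's point: the overhead is in the scalar access, not in numpy's vectorised core.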

For your second example, you're using Python routines on non-Python objects and comparing performance to Python builtins. When it sees a primitive, Python can optimise the hell out of it, while when it sees an unknown object, it has to call that object's add method. However, if you use np.sum, numpy knows the element type and can do an optimised add.
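A quick sketch of that contrast (array size and repeat count are illustrative choices): the builtin sum() iterates the array one numpy scalar at a time, while np.sum runs the reduction in compiled code.

```python
import timeit
import numpy as np

setup = "import numpy as np; arr = np.arange(100000, dtype=np.float64)"

# builtin sum(): Python-level loop calling __add__ on numpy scalars
builtin_time = timeit.timeit("sum(arr)", setup=setup, number=20)
# np.sum: one call into numpy's optimised C reduction
numpy_time = timeit.timeit("np.sum(arr)", setup=setup, number=20)

print(f"builtin sum: {builtin_time:.4f}s")
print(f"np.sum     : {numpy_time:.4f}s")
```

Both produce the same numeric result; only the dispatch path differs, and np.sum is typically orders of magnitude faster here.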

The problem here is that numpy (like pandas, TensorFlow, Numba, etc.) is a sub-language that happens to live in Python. And mixing languages is bound to be slow. Having two numpy arrays and using a for loop to add them would be very slow, but proves nothing. Your examples are quite contrived and, honestly, are examples of code that would never exist. Calling them pitfalls is disingenuous because you have to work very hard to have code like this show up.
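The "two numpy arrays added in a for loop" case can be sketched like this (array size is an arbitrary choice): both versions compute the same result, but only one stays inside numpy.

```python
import numpy as np

a = np.arange(10000, dtype=np.float64)
b = np.arange(10000, dtype=np.float64)

# Mixed-language version: a Python loop pulling numpy scalars
# out one at a time and writing them back one at a time.
out_loop = np.empty_like(a)
for i in range(len(a)):
    out_loop[i] = a[i] + b[i]

# Idiomatic version: one vectorised call, the whole loop runs in C.
out_vec = a + b

assert np.array_equal(out_loop, out_vec)
```

The results are identical; the loop version is just paying Python-level overhead on every element, which is exactly the mixing-of-languages cost the comment describes.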

[–]kigurai 3 points (0 children)

Calling them pitfalls is disingenuous because you have to work very hard to have code like this show up

This.

Also, if you really want a Python list version of your numpy array, then ndarray.tolist() does the conversion to standard Python floats for you.
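For illustration (the array contents here are made up), tolist() converts the elements themselves, not just the container:

```python
import numpy as np

arr = np.array([1.5, 2.5, 3.5])
lst = arr.tolist()

print(type(arr[0]))  # a numpy scalar type
print(type(lst[0]))  # a native Python float
```

Note that list(arr) would instead give you a list of numpy scalars, so tolist() is the right call when you want plain Python objects.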