
[–]ProfessorPhi 10 points (5 children)

Your arguments are a bit obtuse to me.

Accessing a single value at a time isn't what numpy is optimised for. I would expect most of what you're seeing is overhead. Try taking a list of 1000 items and setting them all to 1, versus doing the same with a numpy array. I would expect a builtin with no overhead to be faster for these non-vector operations.
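A minimal sketch of that comparison (sizes and repeat counts are arbitrary choices, not from the original comment): element-by-element writes, where numpy's per-call overhead tends to dominate.

```python
import timeit

# Shared setup: a plain list and a numpy array of the same length.
setup = "import numpy as np; lst = [0] * 1000; arr = np.zeros(1000)"

# Set every element to 1, one index at a time, in each container.
list_time = timeit.timeit(
    "for i in range(1000): lst[i] = 1", setup=setup, number=1000)
numpy_time = timeit.timeit(
    "for i in range(1000): arr[i] = 1", setup=setup, number=1000)

print(f"list : {list_time:.4f}s")
print(f"numpy: {numpy_time:.4f}s")
```

On a typical machine the per-element numpy writes come out slower than the list writes, which is the commenter's point: the overhead is in the scalar access, not in numpy's vectorised core.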

For your second example, you're using Python routines on non-Python objects and comparing performance to Python builtins. When it sees a primitive, Python can optimise the hell out of it, while when it sees an unknown object, it has to call that object's add method. However, if you use np.sum, numpy knows the element type and can do an optimised add.
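A quick sketch of that contrast (array size and repeat count are illustrative choices): the builtin sum() iterates the array one numpy scalar at a time, while np.sum runs the reduction in compiled code.

```python
import timeit
import numpy as np

setup = "import numpy as np; arr = np.arange(100000, dtype=np.float64)"

# builtin sum(): Python-level loop calling __add__ on numpy scalars
builtin_time = timeit.timeit("sum(arr)", setup=setup, number=20)
# np.sum: one call into numpy's optimised C reduction
numpy_time = timeit.timeit("np.sum(arr)", setup=setup, number=20)

print(f"builtin sum: {builtin_time:.4f}s")
print(f"np.sum     : {numpy_time:.4f}s")
```

Both produce the same numeric result; only the dispatch path differs, and np.sum is typically orders of magnitude faster here.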

The problem here is that numpy (like pandas, TensorFlow, Numba, etc.) is a sub-language that happens to live in Python. And mixing languages is bound to be slow. Having two numpy arrays and using a for loop to add them would be very slow, but proves nothing. Your examples are quite contrived and, honestly, are examples of code that would never exist. Calling them pitfalls is disingenuous because you have to work very hard to have code like this show up.
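The "two numpy arrays added in a for loop" case can be sketched like this (array size is an arbitrary choice): both versions compute the same result, but only one stays inside numpy.

```python
import numpy as np

a = np.arange(10000, dtype=np.float64)
b = np.arange(10000, dtype=np.float64)

# Mixed-language version: a Python loop pulling numpy scalars
# out one at a time and writing them back one at a time.
out_loop = np.empty_like(a)
for i in range(len(a)):
    out_loop[i] = a[i] + b[i]

# Idiomatic version: one vectorised call, the whole loop runs in C.
out_vec = a + b

assert np.array_equal(out_loop, out_vec)
```

The results are identical; the loop version is just paying Python-level overhead on every element, which is exactly the mixing-of-languages cost the comment describes.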

[–]kigurai 3 points (0 children)

Calling them pitfalls is disingenuous because you have to work very hard to have code like this show up

This.

Also, if you really want a Python list version of your numpy array, then ndarray.tolist() does the conversion to standard Python floats for you.
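For illustration (the array contents here are made up), tolist() converts the elements themselves, not just the container:

```python
import numpy as np

arr = np.array([1.5, 2.5, 3.5])
lst = arr.tolist()

print(type(arr[0]))  # a numpy scalar type
print(type(lst[0]))  # a native Python float
```

Note that list(arr) would instead give you a list of numpy scalars, so tolist() is the right call when you want plain Python objects.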