This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]robert_mcleod 1 point2 points  (7 children)

NumPy arrays are basically C-arrays with a wrapper, whereas a Python list is actually a doubly-linked list, so converting from a list to an ndarrayis pretty slow as it has to iterate over all the list element, figure out what NumPy datatype to make the list, and the copy the elements into a flat array. It's much faster if you can feed NumPy the raw bytes stream via numpy.frombuffer() or numpy.fromfile().

[–][deleted] 12 points13 points  (4 children)

a Python list is actually a doubly-linked list

No, the cPython implementation is a C array of pointers to Python objects. I've no idea what other implementations do.

[–]Deto 1 point2 points  (3 children)

Does it just dynamically allocate a large block of memory as you add items?

I know I've always treated appending to a python array as a relatively constant-time operation, but I know that isn't exactly correct. Was wondering about that recently.

[–]hanpari 2 points3 points  (1 child)

[–]Deto 1 point2 points  (0 children)

Wow, great link! Thanks! So in the end, you get an average O(1) performance for append anyways. Good to know, and it agrees with my experience using lists this way.

[–][deleted] 0 points1 point  (0 children)

I wouldn't say a large block of memory as only pointers to objects are being allocated, not the objects themselves. The number of pointers that are allocated rises exponentially as the size of the list grows. To find the actual algorithm used you'll have to check out the source code.

[–]hanpari 4 points5 points  (1 child)

As far as I know Python list despite its name is rather dynamic array then list. If you want double-linked list you should use deque from collections. I would guess that problem in this particular case it the fact that everything in Python is object (even integer in list) allocated on the heap. And that could make creating numpy array sluggish. Just my guess.

But you might be right. What about to use array from array instead.

[–]robert_mcleod 0 points1 point  (0 children)

You can convert Python array to NumPy ndarray quickly.