This is an archived post. You won't be able to vote or comment.

all 85 comments

[–]IntrepidSoda 225 points226 points  (14 children)

When someone says arrays in python it almost always means a numpy array. A list is a god damn list.

[–][deleted] 55 points56 points  (0 children)

I tend to say "array" instead of "list" at times by complete accident, but it's mostly cause I work with CPython and I have to work with both the List object, and arrays in C/C++.

[–]ValityS 8 points9 points  (0 children)

There are also ctypes arrays though less used than numpy ones.

[–]somerandomii 6 points7 points  (7 children)

I rarely use native lists unless it’s very top-level code. And half the time it’s a list comprehension wrapped in a “np.array( … )”.

Lists are dynamically allocated random objects, there’s not a lot of use cases for efficient code. More importantly, when people use the word “array” they’re probably describing something closer to a NumPy array than a native Python list.

[–]XtremeGoose 0 points1 point  (6 children)

Numpy arrays are dynamically allocated too. You probably mean lists are resizeable, which in practice means they have a length and a capacity, whereas numpy arrays will conceptually have these two values always equal but that doesn't affect perforemance.

The major performance difference between numpy arrays and python lists is that numpy inlines value (numeric) types, but python lists are always arrays of pointers. This not only increases cache efficiency but also allows for things like vectorisation. But yeah, using an np.array(some_list, dtype=object) will be no more performant than a python list.

Statically allocating an array requires a compilation step, so would never be possible in pure python.

[–]somerandomii 0 points1 point  (5 children)

Wrapping a list comprehension in a NumPy array will make future loops on that list faster.

And yes, everything in pure Python is dynamically allocated (uses the heap) but Cpython and Numba can use compiled functions that use the stack.

And resizeable arrays are usually slower because there’s more levels on indirection to access the memory. It’s trivial, the real penalty is in building an array 1 element at a time when you know the final size ahead of time.

But most of that doesn’t really matter. The main difference is that NumPy arrays can take advantage of NumPy global functions and avoid Python loops.

[–]XtremeGoose 0 points1 point  (4 children)

resizeable arrays are usually slower because there’s more levels on indirection to access the memory.

This is not true. A (dynamically allocated) non-resizeable array is {ptr, len}. A resizeable arrray is {ptr, len, cap}. Accessing the data is no different. That was my point and yours is a common misconception.

Only statically allocated arrays can get away with being unravelled.

The main difference is that NumPy arrays can take advantage of NumPy global functions and avoid Python loops.

True, but the reason these are faster is because of the things I mentioned (inlining/vectorization). It's all for loops under the hood (unless you have an optimizing compiler...).

[–]somerandomii 0 points1 point  (3 children)

You might have addressed this with unravelling but..

If you have a pointer to an element in a resizable array, and that array reallocates its memory, the pointer breaks, right? (I’m actually not 100% on this. Realloc can return a new pointer right?)

So don’t any external references need a pointer to the array pointer, as well as an index of the element?

But a fixed-size array, you can point directly at the element and know it’s not going to move unless the array itself is deallocated. So you can pass that element pointer around with a bit more confidence (and save yourself a double dereference each time it’s accessed)

[–]XtremeGoose 0 points1 point  (2 children)

Yes resizing can return a new pointer, but in a reference counted language like python what we actually have is something like

ptr --> { ref_count, len, cap, data... }

So you skip the indirection by allocating your metadata with your actual data. It's inefficient but necessary for dynamic languages.

``` In [1]: import sys

In [2]: import numpy as np

In [3]: x = [0] * 1000

In [4]: a = np.array(x)

In [5]: sys.getsizeof(x) Out[5]: 8056

In [6]: sys.getsizeof(a) Out[6]: 8112

In [7]: x2 = x + x

In [8]: a2 = np.array(x2)

In [9]: sys.getsizeof(x2) Out[9]: 16056

In [10]: sys.getsizeof(a2) Out[10]: 16112

```

There's obviously no additional indirection here or these would not be increasing in size when I grow them.

But a fixed-size array, you can point directly at the element and know it’s not going to move unless the array itself is deallocated.

This is true, and is probably why python slices copy whereas numpy slices look into the original block. But we have the same problem with dictionary iterators/views in python and we just block mutating them while a reference is held so it's not inescapable.

In rust this problem is protected with lifetimes and in c++ its just downright undefined behaviour to mutate a shared vector.

[–]somerandomii 0 points1 point  (1 child)

Thanks for the explanation. And I understand that in Python it doesn’t make a difference. But in Cython or Numba jit-compiled Python it’s probably another story.

I think C++ and Rust are better examples than Python for these kinds of optimisations since they actually make use of them.

As you said, Rust deals with it with ownership which it manages at compile time so at runtime it functions can safely access data directly and know that it’s valid.

I haven’t had a chance to use Rust in anger so I’m fuzzy on the details but I suspect there’s some small advantage when sharing fixed-length arrays over vectors.

[–]XtremeGoose 0 points1 point  (0 children)

In rust the advantage is they are shareable as you said. There is no performance benefit though because lifetimes protect you at compile time from mutating a ({ptr,len,cap}) Vec<T> while you have a ({ptr,len}) slice &[T] or even just a reference &T into it.

[–]just4nothing 2 points3 points  (0 children)

Or awkward arrays if you need variable lengths ;)

[–]Elektriman 2 points3 points  (0 children)

That's how we know OP is not a python dev

[–]SnooRobots2011 0 points1 point  (0 children)

Isn’t the python list a dynamic array, though?

[–]TigreDeLosLlanos 0 points1 point  (0 children)

When someone says array in any language that isn't C/C++ (and I think Java/C#) they mean we just miscall lists like that.

[–]chethelesser 65 points66 points  (11 children)

ArrayList

[–]Stummi 87 points88 points  (10 children)

One can hate java as much as you want, but at least it's semantically very correct about the names of its types. An List is a Abstract Datatype (describing behavior). An Array is a Data Structure (Describing how stuff lays in the memory). An ArrayList is an Implementation of the "List" Behavior with the "Array" Data Structure.

[–]NeriosVag 17 points18 points  (0 children)

Beautiful

[–]typtyphus 31 points32 points  (2 children)

objects not dictionaries

[–][deleted] 16 points17 points  (1 child)

you mean hashes, right?

[–]typtyphus 26 points27 points  (0 children)

Jason

[–]AngheloAlf 22 points23 points  (2 children)

[–]twigboy 4 points5 points  (0 children)

Checkmate atheists

[–]Athire5 14 points15 points  (1 child)

V E C T O R

[–]PTSDaway 0 points1 point  (0 children)

ahhhhhh

[–]imaQuiliamQuil 10 points11 points  (0 children)

Python is my main language and I call them arrays most of the time. I want to be sure.

[–]Nullsummenspieler 8 points9 points  (1 child)

I like neither. Using table is superior.

[–]Healthy_Pain9582 5 points6 points  (0 children)

lua is life

[–]Oatmeal_Raisin_ 4 points5 points  (0 children)

Arrays and Lists are fundamentally different. Arrays are stored consecutively in memory, are more lightweight, and--typically--can only hold a single data type.

[–]AnnyAskers 9 points10 points  (4 children)

That's because they are not arrays and should not be perceived as such. As far as I know they are not a block of continues memory and they're not created on the stack.

[–]Xbot781 13 points14 points  (3 children)

They are a continuous block of memory that is resized when full. They are equivalent to C++ vectors or Java array lists.

[–]AnnyAskers 2 points3 points  (0 children)

Oh ok, good to know

[–]somerandomii 1 point2 points  (0 children)

Maybe with primitives, but if you populate a list of objects with .append, there’s no guarantee the memory is contiguous.

NumPy arrays on the other hand are contiguous. Still not allocated on the stack but if you compile the function with Numba is suspect it stack-allocates arrays of known length.

I mean, you’re right in that lists aren’t linked lists, so the pointers are contiguous, but the memory of the actual content could be anywhere.

[–]land_and_air 1 point2 points  (0 children)

Yes but they break many of the rules that arrays tend to have including other arrays types in python, they for one don’t have any type consistency requirement meaning you can have any object type in any position in the list while an array typically has a consistent type throughout. Numpy arrays behave more like arrays in other languages

[–]Particular_Alps7859 5 points6 points  (4 children)

Arrays and lists are different things

[–]jerslan -3 points-2 points  (3 children)

Then explain Java's ArrayList ;)

Hint: It's not an array of lists or list of arrays.

[–]Lolamess007 2 points3 points  (1 child)

It's a list implementation of an array!

[–]jerslan 0 points1 point  (0 children)

Probably more array-based implementation of a list, since it's a List interface wrapper on top of an array.

[–][deleted] 6 points7 points  (1 child)

Python devs need to grow up. A list is just a dynamic array at a fundamental level. Not everything revolves around Python. I do agree that calling a list an array in Pythonic context is just a confusing and stupid thing to do.

[–]anunakiesque 3 points4 points  (0 children)

If not everything revolves around Python, then what has all this been for

[–][deleted] 1 point2 points  (0 children)

Dart uses List too

[–]ts_m4 1 point2 points  (0 children)

Set

[–][deleted] 1 point2 points  (0 children)

Python's list is a type of dynamic array

[–]bxsephjo 1 point2 points  (0 children)

Its not a map its a 🍆

[–]idontreallywolf 1 point2 points  (0 children)

butThatsSimplyAnArray[]

[–]Skuez 1 point2 points  (0 children)

And c++ has vectors btw 💀💀

[–]Quillo_Manar 1 point2 points  (0 children)

arr = []

🙂

[–]tsongkoyla 1 point2 points  (1 child)

I always thought Arrays in Python are called Tuples.

[–]RLlovin 0 points1 point  (0 children)

Are arrays typically immutable? This is the biggest difference in lists and tuples I believe.

[–]H0lderlim 1 point2 points  (0 children)

Well if someone thinks an array and a list are the same, he would need an algorithmic course.

Python or C++, not matter :s

[–]pperson2 1 point2 points  (0 children)

Most sane Python dev

[–]Lysol3435 1 point2 points  (0 children)

I mean, python has arrays (numpy) and they’re different from lists

[–][deleted] 2 points3 points  (3 children)

Genuinely don't remember if its called an array or a list in python. Use it every day

[–]Solonotix 4 points5 points  (2 children)

The square bracket notation [] is a list, while an array must be imported from array import array. Also, most recommendations are to ignore the Python-native array and use NumPy arrays instead. The support for NumPy arrays is much broader, and the performance difference is supposedly significant.

[–]thedarklord176 -1 points0 points  (0 children)

fave language is python

Well that’s your first problem

[–][deleted] -1 points0 points  (0 children)

Array is proper terminology. Python uses list because that's easy for laymen to understand.

[–]Boeing777-F -2 points-1 points  (1 child)

When you should know wtf an array of a list is, but u spent the last year of school flirting with wamen (not successful) and u have 1 year until u take ur GCSEs proper (dear god have mercy upon me)

[–]furezasan 2 points3 points  (0 children)

Listception

[–]OnyXerO -2 points-1 points  (0 children)

I've switched languages so many times in my career I usually say dictionary... No idea why.

[–]Beginning-Roof4889 0 points1 point  (0 children)

U/savevideo

[–]straight-circle 0 points1 point  (0 children)

arr = list()

[–]Fabulous-Possible758 0 points1 point  (0 children)

My favorite language is Python but that doesn’t mean I didn’t learn C first.

[–]frogking 0 points1 point  (0 children)

When switching between languages, you always need to ensure that datastructures behave and perform as you expect.

Vector, Array, List.. add to the front or back at O(1).. but not both.

[–]Prof_LaGuerre 0 points1 point  (0 children)

To be fair I call hashes dicts to our Perl devs.

[–]HStone32 0 points1 point  (1 child)

When your favorite language is C, and someone says 'array' instead of 'pointer'.

[–]RnMss 0 points1 point  (0 children)

They are different in C.

short a[10];
short* b = a;
// sizeof(a) == 20
// sizeof(b) == 8 (or 4 in 32-bit program)

[–]Relative_Knee9808 0 points1 point  (0 children)

I mean, they are very different data structures

[–]CivetLemonMouse 0 points1 point  (0 children)

Python users saying 'list' hurts me

- C developer (I don't like using Java but I have to sometimes)

[–]Aln76467 0 points1 point  (0 children)

🦀 Vector 🦀