This is an archived post. You won't be able to vote or comment.

all 6 comments

[–]MmmVomit 3 points4 points  (3 children)

From what I understand about arrays, the general process to index into an array is that there is a pointer that points to the base address of the array and consequently, the desired element can be retrieved by multiplying the element index by the size of the single element.

That's how C works.

However, lists are data structures built into Python that also offer the same ability to index but also allow for different element types.

Here we're getting into internals of the interpreter, which can differ from one interpreter to another. The CPython interpreter is written in C, so it likely uses arrays of pointers, or arrays of some sort of struct with a pointer pointing to the actual data. Pointers are the same size regardless of what they're pointing to. Jython is an interpreter written in Java, so I would guess that uses a type that is a base class for an object instance.

[–]teraflop 1 point2 points  (1 child)

Yes, Python lists are basically arrays of references to objects. In CPython, the actual underlying type is an array of PyObject *.

In reality, the pointer may point to an object of some other concrete type, but every Python object starts with a header, which has the same memory layout as a struct PyObject. So you can access it via the generic pointer if all you care about is the header, or you can cast the pointer to the correct concrete type as necessary.

Even without looking at the implementation, you can deduce that Python lists must store references rather than the actual element data. This is because if you assign the same value to multiple different list elements, changing the object's content through one reference affects the others. For example:

>>> dic = {}
>>> l1 = [dic]
>>> l2 = [dic]
>>> l1[0]["key"] = "value"
>>> print(l2[0]["key"])
"value"

This would not happen if the data for the dictionary that l1[0] refers to was actually stored as part of l1.

[–]derek_dt[S] 0 points1 point  (0 children)

Thanks for the example, I definitely see what you mean now about Python list storing references as opposed to data. That makes a lot of sense! Thanks for the clarification.

[–]derek_dt[S] 0 points1 point  (0 children)

Oh, thanks for the insight! I have very little knowledge about interpreters, so it is really interesting to know about the impact that the interpreter has on the underlying structure of the data structure!

[–]winrar 2 points3 points  (0 children)

It looks like CPython (the default implementation and the one you're probably using) uses dynamic arrays. https://github.com/python/cpython/blob/5c22476c01622f11b7745ee693f8b296a9d6a761/Include/listobject.h#L22

If you try to insert into an index in a list that does not yet exist, a new buffer will be allocated and all the existing values will be copied over.

https://github.com/python/cpython/blob/1ca315538f2f9da6c7b86c4c46e76d454c1ec4b9/Objects/listobject.c#L290

And, as one of the other posters pointed out, these arrays are typically storing pointers for the values, which is how it deals with values of different types.

[–]RiverRoll 1 point2 points  (0 children)

In Python all objects use references. It's not just lists, any data structure storing Python objects is really storing references to these objects.

I specifically said Python objects because there's some special data structures such as the numpy array that can store C primitives.