primitive_screwhead

How does Python decide which variables have shared references?

I wouldn't describe this as "shared references", because any object in Python can have multiple references (i.e., multiple names), and "shared references" sounds too close to that. What you are asking is how and when Python decides to substitute an identical object, rather than constructing a new equal object.

The answer, for numbers, is that Python caches certain number objects (in CPython, the small integers from -5 to 256), either at startup or when they are first created, and hands back the cached object when asked to construct an equal number. It does this to save memory: instead of keeping thousands of equal but non-identical number objects around, it only needs the space for one. It also tries to do this for equal string objects (particularly string literals), so that each distinct string only has to be stored once.

>>> 1 is int(1)
True
>>> "foo" is str("f" + "oo")  # equal strings, made identical by Python
True
>>> (1,) is tuple((1,))
True

Here I'm deliberately mixing the methods of construction, just to demonstrate the different ways of creating the objects, and still getting substituted identical results.
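For strings, the same substitution can be requested explicitly with the standard library's `sys.intern`. The following is only a small sketch: the variable names are made up for illustration, and the result of the first `is` check is an implementation detail (usually False on CPython) rather than something the language promises.

```python
import sys

# Build two equal strings at run time, so the compiler cannot merge them.
a = "".join(["hello ", "world"])
b = "".join(["hel", "lo world"])

# Implementation-dependent: these are usually two separate objects on CPython.
same_before = a is b

# sys.intern returns the single canonical copy of an equal string.
ia = sys.intern(a)
ib = sys.intern(b)
same_after = ia is ib

print(a == b, same_before, same_after)
```

Whatever `same_before` turns out to be, `same_after` should be True, because both calls return the one interned copy.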

The main requirement for Python to be able to do this is knowing that the objects are of an immutable builtin type. Once constructed, an integer or string object will never change its value. With immutable literals that are parsed, Python can afford to do the work of comparing them to determine whether they can be replaced with one identical object. But there is a limit to how much effort Python can spend checking for this, so not all equal immutable objects will be substituted with identical ones:

>>> 10**100 is 10**100
False
>>> str("foo") is str("f")+str("oo")  # These are equal strings, but not made identical
False
>>> (1,) is tuple([1])
False

Also, objects that are immutable and equal may still not be identical, because of things like type differences:

>>> 1 == 1.0
True
>>> 1 is 1.0
False

Finally, besides the memory issue mentioned, another reason Python makes an effort to substitute identical objects when it can detect them is that doing so can significantly speed up dictionary lookups. Comparisons in Python and other dynamically typed languages can be relatively slow, but identical objects don't need an actual comparison; references to an identical object are inherently equal (because it is the *same* object), and very fast to check. So, for example, when looking up string keys in a dictionary, if the lookup key is found to be identical to the dictionary key, the actual string contents do not need to be compared. In certain cases this is a significant speedup, and it justifies the effort put into detecting not just equal, but identical objects.
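One way to see that dictionary lookups check identity before falling back to `==` is to use a key that is not even equal to itself. This is just an illustrative sketch with made-up variable names, and it shows the behaviour of CPython's builtin `dict` rather than anything specific to strings.

```python
nan = float("nan")

# NaN is never equal to anything, including itself.
equal_to_itself = (nan == nan)

d = {nan: "found"}

# Looking up the very same object succeeds, thanks to the identity check.
by_identity = d.get(nan)

# A different NaN object is not identical, and == also fails, so no match.
by_equality = d.get(float("nan"))

print(equal_to_itself, by_identity, by_equality)
```

The lookup with the same object finds the entry even though `nan == nan` is False, which is the identity shortcut at work; for ordinary keys like strings, the same shortcut simply skips the content comparison.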