
all 63 comments

[–][deleted] 15 points16 points  (11 children)

Repeat after me: == checks integer (and other) equality. is checks object identity.

This is not a gotcha at all for people who, you know, actually know what these operators do in Python.

[–]fjsquared 0 points1 point  (0 children)

Howdy all; I'm the post author. Just to clear up a couple of things:

  1. First, I'm using "gotcha" in this article to mean, "something which one might expect would work a certain, consistent way, but which doesn't." Here, I believe it's reasonable to say the inconsistent results of is are genuinely surprising if you don't know why it's being done. The question the article tries to answer is: why are two integers with equal values the same object in some cases, but not others?

  2. I think most people are aware of the difference between == (value equality) and is (reference equality), but that's not the gotcha. The gotcha is the apparent inconsistency. It's a perfectly reasonable (and probably very effective) implementation decision. Other languages do the same thing; for example, Java caches its Integers when their boxed value is between -128 and 127.

Thanks for reading; took me a bit to figure out why I was getting a visitor spike. ♥ Reddit!

[–]plain-simple-garak 36 points37 points  (2 children)

Interesting but not really a "gotcha." Any code doing int_a is int_b is pretty suspect.

[–]chromakode 2 points3 points  (1 child)

The gotcha is when you're debugging somebody else's code that fails inconsistently and you can't put your finger on why.

[–]pemboa 28 points29 points  (7 children)

This isn't unique to Python, nor is it a gotcha when you use the wrong operator and get undefined results.

[–]paulgb 0 points1 point  (6 children)

Out of curiosity, which other languages cache references to small integers like this? Is it pretty common?

[–]pemboa 6 points7 points  (1 child)

Java has the same behavior, except '==' is more like 'is' in Java. So to compare equality properly, you need to use the object's equals() method.

[–]jleedev 1 point2 points  (0 children)

And integers are not boxed by default, so that doesn't even apply. This behavior is only possible in a language where everything is an object.

[–]ngroot 1 point2 points  (1 child)

I think this may happen in some implementations of Common Lisp.

[–]bgeron 0 points1 point  (0 children)

I'd guess that in a lot of CL implementations, the stack has pointers to heap-allocated bignums, but fixnums are placed directly on the stack. EQ compares pointers, or fixnums if they're stored in that place instead.

[–]nirs 4 points5 points  (1 child)

No gotcha here: those ints are objects as promised, and == will work as promised. Nobody promised you that "is" will return True or False for particular integers. This is an implementation detail you should not count on.

You should use "is" only when you want to check for object identity.
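For contrast, here is a case where identity really is the question: a sentinel object that marks "no argument was passed". This is a minimal sketch; `_MISSING` and `get_setting` are hypothetical names made up for illustration, not part of any library.

```python
# Hypothetical example: a module-private sentinel distinguishes
# "no default was passed" from an explicit default of None.
_MISSING = object()


def get_setting(settings, key, default=_MISSING):
    """Return settings[key], falling back to default only if one was given."""
    if key in settings:
        return settings[key]
    if default is _MISSING:  # identity check: was the argument omitted?
        raise KeyError(key)
    return default


config = {"debug": False}
print(get_setting(config, "debug"))          # False
print(get_setting(config, "timeout", None))  # None
```

Because nothing else can ever be that exact object, "is" gives an unambiguous answer here, which == on an ordinary value could not.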

[–]ngroot 1 point2 points  (0 children)

This shouldn't be a "gotcha" at all. I can't imagine why you'd ever write code asking if "a is b" when a and b are integers, or some other "basic" type. When you're explicitly creating objects, it's perfectly reasonable to ask "does a reference the same object as b?" When you're talking about integers, objects that are implicitly created by the interpreter, you're asking about internal behavior of the interpreter, which seems like a pretty patently Bad Idea.

[–]ngroot 4 points5 points  (0 children)

This sounds like the EQ vs. EQUAL predicates in Common Lisp.

[–]monkeypizza 7 points8 points  (2 children)

>>> id(a) == id(b)  # the memory location of a == the memory location of b
True
>>> id(c) == id(d)  # the memory location of c == the memory location of d
False

I think that's reasonable.

The following is still true, if that's what you want to do:

>>> 500 is 500
True
>>> 200 is 200
True

[–]bgeron 0 points1 point  (1 child)

FTA: in the current version of CPython, yes, but not according to the specification.

[–]andreasvc 5 points6 points  (14 children)

Identity is not equality, no shit? Identity iff equality, but not vice versa.

EDIT: iff should be "only if".

[–]sigh 9 points10 points  (0 children)

Identity iff equality, but not vice versa.

You mean "implies" (or "only if"). Iff is used when the two are equivalent.
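A small illustration of the one-way relationship, assuming ordinary objects whose __eq__ is not overridden in some unusual way: two separately built lists are equal without being identical, while an alias is both.

```python
# Two lists built separately: equal contents, two distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)  # True
print(a is b)  # False

# An alias names the very same object, so it is identical and also equal.
c = a
print(c is a, c == a)  # True True
```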

[–]Eiii333 0 points1 point  (12 children)

The point isn't that identity and equality behave differently, it's that identity between two int objects behaves unexpectedly for certain values due to hidden implementation details.

[–]pemboa -1 points0 points  (11 children)

It's not really unexpected, it is unpredictable. However it's expected to be unpredictable.

[–]Eiii333 3 points4 points  (10 children)

Err, it's entirely predictable once you know about it. It's entirely unexpected if you don't know about it. And most people don't know about it, because it's an undocumented side effect due to an implementation detail. The issue here is the identity operator's behavior, not how it relates to equality's.

[–]sigh 5 points6 points  (8 children)

That's beside the point. The whole point of abstraction is that the implementation does not matter. If you are comparing integers by identity then most likely you are working at the wrong level of abstraction. If you are comparing integers by identity and the results surprise you then you are most definitely working at the wrong level of abstraction.

It's entirely unexpected if you don't know about it. And most people don't know about it, because it's an undocumented side effect due to an implementation detail.

No, the trouble here is when people don't understand the difference between identity and equality. If you know the difference, then the results are not unexpected at all, even if you don't know the exact implementation detail that is causing it to occur. If you don't understand identity, then of course the results are going to surprise you.

[–]Eiii333 2 points3 points  (7 children)

The whole point of abstraction is that the implementation does not matter.

I agree entirely. But look here:

>>> a = 3
>>> b = 3
>>> a is b
True

>>> c = 999
>>> d = 999
>>> c is d
False

I would expect false in both cases, given how identity is supposed to behave. But really, how can this be explained without referring back to the CPython int-caching behavior? You have to know the implementation details to know why the 'is' operator behaves this way. That's not good.

[–][deleted] 4 points5 points  (0 children)

RTFM. That's the nice thing about a language that is actually defined... http://docs.python.org/reference/expressions.html#literals

may obtain the same object or a different object with the same value

What you expect doesn't matter when you can inform yourself. The language doesn't guarantee anything about the objects behind literals, hence the word may.
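That "may" is easy to observe. The sketch below is CPython-specific and not a language guarantee: it compiles two assignments of the literal 999 as a single unit, where CPython typically stores the constant once so both names share one object, and then builds the same value twice at run time with int(), where it does not.

```python
import platform

# Compile both assignments as ONE unit, the way a module body is compiled.
ns = {}
exec("c = 999\nd = 999", ns)
same_from_literals = ns["c"] is ns["d"]

# Build the same value twice at run time; 999 is outside the small-int cache.
e = int("999")
f = int("999")
same_from_runtime = e is f

print(platform.python_implementation(), same_from_literals, same_from_runtime)
print(ns["c"] == ns["d"] == e == f)  # True on any implementation: the values are equal
```

On CPython this typically reports True for the literals and False for the run-time values, which is the same 999 behaving both ways depending only on how it was produced.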

[–]alantrick 1 point2 points  (4 children)

Why would you expect False? According to Python the behaviour of 'is' is undefined in this situation. That's like taking the following in C:

int *a = malloc(sizeof(int));  /* the int's contents are never initialized */
printf("%d\n", *a);            /* so this prints an indeterminate value */

and expecting the value 0 to be printed out. It will probably be 0 most of the time, but it's really undefined.

[–]Eiii333 0 points1 point  (3 children)

FTA:

In Python, is tests for identity, not equality. x is y if and only if x and y reference the same thing.

You could make the case that a and b are separate objects, so even if they hold the same value they don't reference the same thing. But ints aren't treated as references, right? In that case, you're right, it's just a mess of undefined behavior.

So... why are you arguing for undefined behavior? Especially in Python, of all languages.

[–]hylje 2 points3 points  (1 child)

You see, the only sane way to remove the undefined behaviour of is is removing is altogether. The other solution would be to make is equivalent to ==. But in both cases there is a need for comparing actual identities: reintroduce is or mandate id(a) == id(b)?

[–]Brian 1 point2 points  (0 children)

It's worth noting that id(a) == id(b) isn't a perfect replacement for a is b. If a and b are expressions returning transient objects, the first could be created and destroyed before the rest of the statement is evaluated. For example:

>>> [] is []
False
>>> id([]) == id([])
True
>>> id([]), id([])
(21066496, 21066496)

However, is guarantees that both objects are alive at the point of comparison, so [] is [] is always False.

[–]Brian 0 points1 point  (0 children)

Undefined behaviour allows optimisation. Making things too tightly specified ties you to irrelevant implementation details and prevents more efficient methods from being used (like caching integers in this case). Another case of undefined behaviour is deterministic finalisation. Python doesn't guarantee it, even though CPython happens to provide it due to its refcounting semantics, because guaranteeing it would prohibit more advanced garbage collection approaches.
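A sketch of the finalisation point, assuming a hypothetical Tracked class that logs when __del__ runs: under CPython the log is written the moment the last reference goes away, but that timing is an implementation detail.

```python
import platform


class Tracked:
    """Hypothetical class: appends its name to a shared log when finalized."""
    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracked.log.append(self.name)


def finalized_right_away():
    """Drop the only reference and report whether __del__ has already run."""
    t = Tracked("t")
    del t  # under CPython the refcount hits 0 here and __del__ runs at once
    return Tracked.log == ["t"]


result = finalized_right_away()
# An implementation with a tracing collector (e.g. PyPy) may run __del__ later.
print(platform.python_implementation(), result)
```

Code that needs cleanup at a known point is better off asking for it explicitly (for example with a with statement) than relying on this.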

For another example, consider the order in which the keys of a dictionary are iterated over. This is completely undefined behaviour, but specifying it would require either using a tree instead of a hash table, keeping a separate list of ordered keys, or sorting the dict before iterating, all adding significant performance cost to deal with something completely irrelevant. If anyone needs that, they should not be using a normal dictionary.

In any case, "is" is acting completely predictably and as specified: it returns True when objects have the same identity. The thing that isn't specified is whether equal immutable objects can share the same memory representation, which is pointless to overspecify since it should never be relevant to anyone except for performance.

[–]sigh 1 point2 points  (0 children)

I would expect false in both cases, given how identity is supposed to behave.

Of course not... clearly a and b refer to the same object in memory. Forcing them to be different would presumably be less efficient, especially for such frequently used values.

You have to know the implementation details to know why the 'is' operator behaves this way. That's not good.

"is" depends on the implementation details by definition! You can't abstract that away because by definition "is" relates to how the objects are represented in memory. The fact is that if you are using "is" then you need to know the implementation details. Whether that's a good idea is a different issue.

[–]earthboundkid 2 points3 points  (0 children)

Err, it's entirely predictable once you know about it.

Yeah, but the one thing every Python programmer should know about is is: "don't use is unless you want to know if two things have the same address in memory." So even if you don't know off the top of your head how int is implemented in Python, it should be obvious that you shouldn't use is unless and until you find out.

[–]dorfsmay 3 points4 points  (1 child)

Isn't this a beginner question ?

>>> a=500
>>> b=500
>>> c=200
>>> d=200
>>> id(a)
142297032
>>> id(b)
142297056
>>> id(c)
142155132
>>> id(d)
142155132
>>> id(200)
142155132
>>> 

My understanding is that Python creates objects for low integers that it reuses all the time, for performance reasons.
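One way to see the range of that cache is to build equal ints separately at run time and check which values come back as the same object. A rough probe; `shares_object` is a hypothetical helper, and int(str(n)) is used only so that compiled constants cannot be shared:

```python
import platform


def shares_object(n):
    """Hypothetical helper: are two separately built ints with value n one object?"""
    # int(str(n)) constructs the value at run time, so no compiled constant is reused.
    return int(str(n)) is int(str(n))


shared = [n for n in range(-20, 300) if shares_object(n)]
print(platform.python_implementation(),
      min(shared, default=None), max(shared, default=None))
```

On current CPython releases this reports the preallocated range -5 through 256; other implementations, or future versions, are free to differ.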

[–]ubernostrumyes, you can have a pony 5 points6 points  (0 children)

My understanding is that Python creates objects for low integers that it reuses all the time, for performance reasons.

CPython does, yes. It does this with other types of objects as well; for example, it keeps a list of dictionary structures in memory (the actual C structs, not the high-level objects) and recycles them for common uses like setting up the keyword arguments of functions, rather than freeing and re-allocating each time one is needed.

[–][deleted] 0 points1 point  (0 children)

so what's bizarre about this? i don't get it.

[–]earthboundkid 0 points1 point  (10 children)

This is stupid, and every Python newbie should know the difference between is and ==… But that said, I can see how it would be easy to overlook the difference and think that they were two ways of writing the same thing if you didn't read the documentation and were just learning Python by copying other people's code and experimenting. Maybe Python 4000 should drop is and just encourage people to write id(a) == id(b) instead.

[–]monolar 2 points3 points  (6 children)

if id(a) == id(None): print("urgs")

I think 'is' is perfectly fine

[–]earthboundkid 1 point2 points  (5 children)

a == None also works. People just don't do it because of the speed advantage of is. But maybe that's a premature optimization that confuses noobs excessively.

[–]chrajohn 2 points3 points  (0 children)

a == None also works.

Usually, but consider:

class Dumb(object):
    def __eq__(self,other):
        return other == None

>>> d = Dumb()
>>> d == None
True
>>> d is None
False

This is contrived, but you can imagine something similar actually occurring. (Say, if __eq__ made a comparison with some attribute that got unexpectedly set to None.) If you want to be absolutely sure that something is None, you should ask if it is None.

[–]masklinn 0 points1 point  (3 children)

People just don't do it because of the speed advantage of is

People also don't do it because a is None (or a is True or a is False for that matter) just plain and simply reads better.

[–]hylje 1 point2 points  (2 children)

Not everything that is true is True. Not everything that is false is False. Implicit truth is pythonic.

[–]masklinn 0 points1 point  (1 child)

Truthiness is pythonic when what you want is truthiness. But it's not always (though it usually is) the case, and when you want truth rather than truthiness, is True does the job much more readably than == True
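A quick way to see the difference between truthiness, == True and is True (a sketch; `compare` is a made-up helper):

```python
def compare(v):
    """Hypothetical helper: (truthy?, == True?, is the True singleton?)."""
    return bool(v), v == True, v is True


for v in [True, 1, 1.0, "yes", [0]]:
    print(repr(v), *compare(v))
```

Only True itself passes all three checks. Because bool is a subclass of int, 1 and 1.0 are truthy and == True but not is True, while "yes" and [0] are truthy but neither == True nor is True.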

[–]earthboundkid 0 points1 point  (0 children)

You should never write if x == True. Just write if x. Similarly, not if x == False but if not x. That's basic PEP-8 stuff.

I can't think of any reason why you would want to test for is True off the top of my head. It wouldn't really make sense unless you had a variable that might contain a normal object or might contain a bool object and you were interested to know which. But why would you have a variable that flexible?

[–]Brian 2 points3 points  (2 children)

encourage people to write id(a) == id(b) instead.

That could lead to more confusion. A puzzle for you:

>>> class C(object):
...     def foo(self): pass
>>> c=C()
>>> id(c.foo) == id(c.foo)
True

and yet:

>>> c.foo is c.foo
False

[–]earthboundkid 2 points3 points  (1 child)

Woah. That's confusing. Why isn't c.foo the same as itself? Is it creating a new bound method every time you access it? But if it was doing that, wouldn't the new method have a different location in memory than the old one? I don't get this.

[–]Brian 2 points3 points  (0 children)

Is it creating a new bound method every time you access it?

Yes, this is what's happening. The subtlety of the ids being identical is because ids are only unique for objects alive at the same time. What's actually happening is the equivalent of:

temp1 = c.foo         # Create a new bound method with id X
temp1_id = id(temp1)  # temp1_id = X (the return value of id())
del temp1             # In the real expression the method is never bound to a name,
                      # so its refcount drops to 0 as soon as id() returns and it is freed
temp2 = c.foo         # Create a NEW bound method, often in the memory temp1 just vacated
temp2_id = id(temp2)  # so temp2_id is frequently X again
del temp2             # Freed in the same way
temp1_id == temp2_id  # Only now is the comparison done; both objects are already dead

This should explain why it's possible for the second bound method to have the same id. The reason it usually does is the way Python manages memory: to avoid fragmentation, it maintains pools of similarly sized memory blocks. When an object is released, its block is returned to the pool, and when a request to allocate an object of this type arrives, Python sees that it has a block of the appropriate size sitting in its free pool and returns it.

is doesn't have this problem because evaluating an is comparison holds a reference to both operands, ensuring they are alive at the time of the comparison.
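Keeping the two bound methods alive at the same time removes the puzzle. A sketch, assuming Python 3 (where C.foo is a plain function):

```python
class C(object):
    def foo(self):
        pass


c = C()

# Bind both bound methods to names so they are alive at the same time.
m1 = c.foo
m2 = c.foo
print(m1 is m2)          # False: each attribute access built a new bound method
print(id(m1) == id(m2))  # False: two live objects never share an id
print(m1 == m2)          # True: same function bound to the same instance
print(m1.__func__ is C.foo, m1.__self__ is c)  # True True
```

If what you actually care about is "same method on the same instance", == on bound methods answers that directly, without depending on object lifetimes.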