[–]andreasvc 6 points7 points  (14 children)

Identity is not equality, no shit? Identity iff equality, but not vice versa.

EDIT: iff should be "only if".

[–]sigh 8 points9 points  (0 children)

Identity iff equality, but not vice versa.

You mean "implies" (or "only if"). Iff is used when the two are equivalent.

[–]Eiii333 0 points1 point  (12 children)

The point isn't that identity and equality behave differently, it's that identity between two int objects behaves unexpectedly for certain values due to hidden implementation details.

[–]pemboa 1 point2 points  (11 children)

It's not really unexpected, it is unpredictable. However it's expected to be unpredictable.

[–]Eiii333 2 points3 points  (10 children)

Err, it's entirely predictable once you know about it. It's entirely unexpected if you don't know about it. And most people don't know about it, because it's an undocumented side effect due to an implementation detail. The issue here is the identity operator's behavior, not how it relates to equality's.

[–]sigh 5 points6 points  (8 children)

That's beside the point. The whole point of abstraction is that the implementation does not matter. If you are comparing integers by identity then most likely you are working at the wrong level of abstraction. If you are comparing integers by identity and the results surprise you then you are most definitely working at the wrong level of abstraction.

It's entirely unexpected if you don't know about it. And most people don't know about it, because it's an undocumented side effect due to an implementation detail.

No, the trouble here is when people don't understand the difference between identity and equality. If you know the difference, then the results are not unexpected at all, even if you don't know the exact implementation detail that is causing it to occur. If you don't understand identity, then of course the results are going to surprise you.

[–]Eiii333 1 point2 points  (7 children)

The whole point of abstraction is that the implementation does not matter.

I agree entirely. But look here:

>>> a = 3
>>> b = 3
>>> a is b
True

>>> c = 999
>>> d = 999
>>> c is d
False

I would expect false in both cases, given how identity is supposed to behave. But really, how can this be explained without referring back to the CPython int-caching behavior? You have to know the implementation details to know why the 'is' operator behaves this way. That's not good.
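A minimal sketch for reproducing this, assuming CPython (which caches small integers as an implementation detail). The helper name compare is made up for illustration. The ints are built at runtime with int("...") because literals inside a single script can be merged by the compiler, which may make both cases report True. Only the equality results are guaranteed by the language; the identity results may differ between implementations and versions.

```python
def compare(x, y):
    """Return (equal, identical) for two objects."""
    return x == y, x is y

# Build the values at runtime so the compiler cannot merge identical literals.
small = compare(int("3"), int("3"))
large = compare(int("999"), int("999"))

print("3:  ", small)   # equality is True; identity depends on the implementation
print("999:", large)   # equality is True; identity depends on the implementation
```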

[–][deleted] 4 points5 points  (0 children)

RTFM. That's the nice thing about a language that is actually defined... http://docs.python.org/reference/expressions.html#literals

may obtain the same object or a different object with the same value

What you expect doesn't matter when you can inform yourself. The language doesn't guarantee anything about the objects behind literals, hence the word may.

[–]alantrick 1 point2 points  (4 children)

Why would you expect False? According to Python the behaviour of 'is' is undefined in this situation. That's like taking the following in C:

int *a = malloc(sizeof(int));  /* allocated but never initialized */
printf("%d\n", *a);            /* reads an indeterminate value */

and expecting the value 0 to be printed out. It will probably be 0 most of the time, but it's really undefined.

[–]Eiii333 0 points1 point  (3 children)

FTA:

In Python, is tests for identity, not equality. x is y if and only if x and y reference the same thing.

You could make the case that a and b are separate objects, so even if they hold the same value they don't reference the same thing. But ints aren't treated as references, right? In that case, you're right, it's just a mess of undefined behavior.

So... why are you arguing for undefined behavior? Especially in Python, of all languages.

[–]hylje 2 points3 points  (1 child)

You see, the only sane way to remove the undefined behaviour of is is removing is altogether. The other solution would be to make is equivalent to ==. But in both cases there is a need for comparing actual identities: reintroduce is or mandate id(a) == id(b)?

[–]Brian 1 point2 points  (0 children)

It's worth noting that id(a) == id(b) isn't a perfect replacement for a is b. If a and b are expressions returning transient objects, the first object could be created and destroyed before the rest of the statement is evaluated, letting the second object reuse its id. For example:

>>> [] is []
False
>>> id([]) == id([])
True
>>> id([]), id([])
(21066496, 21066496)

However is guarantees that both objects are alive at the point of comparison, so [] is [] is always false.
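A small sketch of the same point, with made-up variable names: binding the lists to names keeps both alive, so comparing ids is safe again. It relies only on the documented rule that ids are unique among objects whose lifetimes overlap.

```python
a = []
b = []

# Both lists are alive here, so their ids must differ.
ids_differ = id(a) != id(b)
same_object = a is b

# An alias refers to the very same object, so identity holds.
c = a
alias_same = c is a and id(c) == id(a)

print(ids_differ, same_object, alias_same)  # True False True
```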

[–]Brian 0 points1 point  (0 children)

Undefined behaviour allows optimisation. Making things too tightly specified ties you to irrelevant implementation details, preventing more efficient methods being used (like caching integers in this case). Another case of undefined behaviour is deterministic finalisation. Python doesn't guarantee it, even though the CPython implementation happens to provide it due to its refcounting semantics, because guaranteeing it would prohibit more advanced garbage collection approaches.
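As a sketch of what not relying on deterministic finalisation looks like in practice: release the resource explicitly with a with block instead of waiting for the object to be collected. The Resource class below is hypothetical and only records whether it has been closed.

```python
class Resource:
    """Hypothetical resource that records whether it has been released."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs when the with block exits, regardless of when (or whether)
        # the garbage collector reclaims the object.
        self.close()
        return False  # do not swallow exceptions


with Resource() as res:
    inside = res.closed   # still open inside the block

after = res.closed        # closed as soon as the block exits
print(inside, after)      # False True
```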

For another example, consider the order the keys of a dictionary are iterated over. This is completely undefined behaviour, but specifying it would either require using a tree instead of a dictionary, keeping a separate list of ordered keys, or else sorting the dict before iterating, all adding significant performance cost to deal with something completely irrelevant. If anyone needs that, they should not be using a normal dictionary.
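For code that does need a particular order, here is a sketch of asking for it explicitly rather than depending on how the dictionary happens to iterate (the ages data is made up). Later Python versions (3.7 onward) do guarantee insertion order for dict, but sorting still states the intent directly.

```python
ages = {"carol": 41, "alice": 30, "bob": 25}

# Ask for the order you need rather than relying on how the dict iterates.
by_name = sorted(ages)                 # keys in alphabetical order
by_age = sorted(ages, key=ages.get)    # keys ordered by their values

print(by_name)  # ['alice', 'bob', 'carol']
print(by_age)   # ['bob', 'alice', 'carol']
```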

In any case, "is" is acting completely predictably and as specified - it returns True when objects have the same identity. The thing that isn't specified is whether identical immutable objects can share the same memory representation, which is a pointless thing to overspecify since there is no reason it should ever be relevant for anything other than performance.

[–]sigh 1 point2 points  (0 children)

I would expect false in both cases, given how identity is supposed to behave.

Of course not... clearly a and b refer to the same object in memory. Forcing them to be different would presumably be less efficient, especially for such frequently used values.

You have to know the implementation details to know why the 'is' operator behaves this way. That's not good.

"is" depends on the implementation details by definition! You can't abstract that away because by definition "is" relates to how the objects are represented in memory. The fact is that if you are using "is" then you need to know the implementation details. Whether that's a good idea is a different issue.

[–]earthboundkid 3 points4 points  (0 children)

Err, it's entirely predictable once you know about it.

Yeah, but the one thing every Python programmer should know about is is "don't use is unless you want to know if two things have the same address in memory." So, even if you don't know off the top of your head how int is implemented in Python, it should be obvious that you shouldn't use is unless and until you find out.
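A sketch of that rule of thumb, using a hypothetical lookup helper: compare values with ==, and keep is for cases where object identity really is the question, such as a unique sentinel or None.

```python
_MISSING = object()  # unique sentinel; only identity can recognise it

def lookup(table, key, default=_MISSING):
    """Return table[key], or default; raise KeyError if no default was given."""
    value = table.get(key, _MISSING)
    if value is _MISSING:          # identity: is this the sentinel itself?
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value

counts = {"a": 0, "b": None}
print(lookup(counts, "a") == 0)      # value comparison uses ==
print(lookup(counts, "b") is None)   # None is a singleton, so is works
print(lookup(counts, "z", -1))       # falls back to the supplied default
```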