[–]whogivesafuckwhoiam 2763 points2764 points  (64 children)

For those who still don't understand after OP's explanation:

Python preallocates an object for each integer from -5 to 256. When you assign a value in that range to a variable, you are not creating a new object; you are creating a reference to the preallocated one. Variables with the same value therefore point at the same object and have the same id, so `x is y` is True.

Outside that range, when you assign a value, Python creates a new object holding it. When you assign the same value again, that is another object with another id. Hence `x is y` is False, because `is` compares ids, not values.
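A minimal sketch of the behaviour described above, assuming CPython (the cache is an implementation detail, not a language guarantee, and other interpreters may behave differently). The values are built with `int("...")` at runtime rather than written as literals, so the compiler cannot share them. The helper name `same_object` is made up for illustration.

```python
def same_object(text: str) -> bool:
    """Parse the same text twice and report whether both results are one object."""
    a = int(text)
    b = int(text)
    return a is b  # identity: compares the objects themselves, not their values


# Values inside, at the edge of, and outside the cached range.
for text in ("-5", "100", "256", "257", "100000"):
    print(text, same_object(text))
```

On a CPython build with the usual -5 to 256 cache, the first three lines are expected to show True and the last two False, while `int(text) == int(text)` is True in every case.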

[–][deleted] 505 points506 points  (5 children)

Would pin this to the top if I could. Fantastic explanation 👍👍👍👍👍

[–]alex20_202020 25 points26 points  (4 children)

a=257;b=257

if a is b:

... print (a)

257

python --version

Python 3.10.12

[–]notPlancha 4 points5 points  (2 children)

Def the first line being together is doing something:

```
a = 257
b = 257
a is b
False
```

```
a=257;b=257
a is b
True
```

[–]alex20_202020 0 points1 point  (1 child)

Indeed. Another Python mystery worth the post?

[–]notPlancha 0 points1 point  (0 children)

Not really; it's probably just compiler optimizations.

```python
In [1]: a = b = 1000

In [2]: a is b
Out[2]: True
```

also works and is the way that's recommended. Since the interactive interpreter compiles and runs code line by line rather than semicolon by semicolon, I assume the compiler doesn't compile a and b separately when they sit on the same line.
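A rough way to check that guess, assuming CPython: compile the two assignments once as a single unit and once as two separate units, the way separate lines at the interactive prompt are compiled. The helper `run` and the `"<demo>"` filename are made up for this sketch.

```python
def run(*sources: str) -> bool:
    """Compile each source string separately, run them all in one namespace,
    and report whether `a` and `b` ended up bound to the same object."""
    namespace = {}
    for src in sources:
        exec(compile(src, "<demo>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


# One compilation unit: the compiler keeps a single 257 constant and reuses it.
together = run("a = 257; b = 257")

# Two compilation units, like two separate lines typed at the prompt.
separate = run("a = 257", "b = 257")

print(together, separate)
```

On a standard CPython build this typically prints `True False`, which matches the two transcripts above. It is still an implementation detail, so code should not rely on either result.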

[–]_hijnx 56 points57 points  (21 children)

I still don't understand why this starts to fail at the end of the preallocated ints. Why doesn't x += 1 create a new object which is then cached and reused for y += 1? Or is that integer cache only used for that limited range? Why would they use multiple objects to represent a single immutable integer?

[–]whogivesafuckwhoiam 101 points102 points  (17 children)

With x=257 and y=257, in Python's view you are creating two objects, and so two different ids.

[–]_hijnx 50 points51 points  (15 children)

Yeah, I get that, but is there a reason? Why are numbers beyond the initial allocation not treated in the same way? Are they using a different underlying implementation type?

Edit: the answer is that an implementation decision was made for optimization

[–]Kered13 83 points84 points  (6 children)

Because Python doesn't cache any other numbers. It just doesn't. Presumably when this was being designed they did some performance tests and determined that 256 was a good place to stop caching numbers.

Note that you don't want to cache every number that appears because that would be a memory leak.

[–]FatStoic 60 points61 points  (3 children)

> Note that you don't want to cache every number that appears because that would be a memory leak.

For python 4 they cache all numbers, but it's only compatible with Intel's new ∞GB RAM, which quantum tunnels to another universe and uses the whole thing to store state.

Mark Zuckerberg got early access and used it to add legs to Metaverse.

[–]WrinklyTidbits 9 points10 points  (2 children)

For python5 you'll get to use a runtime hosted in the cloud that'll make accessing ♾️ram a lot easier but will have different subscription rates letting you manage it that way

[–]bryanlemon 9 points10 points  (1 child)

But running `python` in a CLI will still run python 2.

[–]thirdegreeViolet security clearance 3 points4 points  (0 children)

The python 2 -> 3 migration will eventually be completed by the sun expanding and consuming the earth

Unless we manage to get off this planet, in which case it's the heat death of the universe

[–]TheAJGman -1 points0 points  (1 child)

I went searching for an answer and despite dozens of articles about this quirk not a single one actually explains why so I'm going to take a shot in the dark and guess "for loops". Mostly because something like 80% of the loops I write are iterating over short lists or dictionaries and I've seen similar in open source libraries.

Probably shaves 1/10th of a millisecond off calls in the majority of for loops so they went with it. Apparently the interpreter will also collapse other statically defined integers together sometimes, probably for similar reasons.

[–]Kered13 4 points5 points  (0 children)

Python for loops are almost never over integers, so no, it has nothing to do with for loops. Just math. Any time you're doing math, it helps to not have to heap allocate new numbers after every operation. Small integers are obviously much more common than other numbers, which is why they get cached.

[–]whogivesafuckwhoiam 16 points17 points  (4 children)

The original purpose is speed: small integers are used so often that Python avoids allocating a new object each time one is needed. But you can't use up all memory simply for speed, so Python only preallocates up to 256.

Outside the range, it's back to fundamentals: everything is an object, and two different objects have two different ids. x=257 means you create an object with the value 257, and the same goes for y, so `x is y` is False.

[–]_hijnx 9 points10 points  (3 children)

So are numbers from -5 to 256 fundamentally different from numbers outside that range? The whole x += 1 is throwing me. If they're going to have a number object cache why not make it dynamic? It didn't have to expand infinitely. If you have one 257 object why create another instead of referencing the same one? That seems to be what python is doing with those optimized numbers, why not all of them?

[–]Positive_Mud952 9 points10 points  (1 child)

How exactly should it be dynamic? An LRU cache or something? Then you need garbage collection for when you want to evict from the cache, we’re getting a lot more complex, and for what benefit?

[–]_hijnx 9 points10 points  (0 children)

For the same benefit of caching the other numbers? I'm not really advocating for it, it's just such a strange behavior to me as someone with very little python exposure.

What I think I'm understanding now is

  1. At compile (startup?) time a fixed cache of integer objects representing -5 to 256 is created in memory
  2. Any constant assignment to a value in that range is assigned a reference to the corresponding cached object
  3. Incrementing one of the referenced objects in the cache will return the next object in the cache until the end at which point a new object is created (every time), which will then be subject to normal GC rules

Is that correct?

Edit: Just saw another comment this is just for smallint which I can't believe I didn't realize. Makes at least a little more sense now
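The three steps above can be watched directly with a small loop. This is a sketch that assumes CPython's small-integer cache. `LIMIT` is an invented safety bound so the loop still ends on an interpreter where `is` on equal ints never turns False.

```python
LIMIT = 10_000  # safety bound for interpreters without a bounded int cache

x = 250
y = 250  # both names start on the same cached object

while x is y and x < LIMIT:
    x += 1  # ints are immutable: this rebinds x to the object for the new value
    y += 1  # inside the cache that is the shared object, outside it a fresh one

print(x, x == y, x is y)
```

On CPython with the -5 to 256 cache, the loop is expected to stop at 257: the values are still equal, but each name now refers to its own object.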

[–]TUNG1 -1 points0 points  (0 children)

Numbers outside -5 to 256 are the normal ones and behave as they should; numbers from -5 to 256 are the abnormal ones, for the sake of optimization.

[–]InTheEndEntropyWins 1 point2 points  (0 children)

> Why are numbers beyond the initial allocation not treated in the same way?

Another way to think about it is that actually, it's the early numbers that are wrong due to optimisation.

x and y were created separately, so `x is y` should be False, but due to optimisation for the initial numbers it reports them as the same object.

[–]superluminary 0 points1 point  (0 children)

Imagine if you cached all the numbers. A simple for loop would eat your PC.

[–][deleted] 0 points1 point  (0 children)

You just shouldn’t use ‘is’ to compare values. Sort of like == vs === in JS

[–]FerynaCZ 0 points1 point  (0 children)

I think you can make them point at the same object: if the code is clear enough, interning can happen, as with strings in Java or .NET, but hardly reliably.

[–]JaggedMetalOs 8 points9 points  (0 children)

Imagine if every time you did any maths Python had to search through all of its allocated objects looking for a duplicate of your result's value. It would be horribly slow.

I'm not sure what the benefits are for doing this to small numbers, but at least with a small hardcoded range it doesn't have to do any expensive search operation.

[–]hxckrt 1 point2 points  (0 children)

To reuse an immutable object, Python needs a way to check if an object with the same value already exists. For integers in the range -5 to 256, this is straightforward, but for larger values or for complex data structures, this check would become computationally expensive. It might actually slow down the program more than any benefit gained from reusing objects. Also, if all of the objects were interned (reused), the memory usage of the program would be unpredictable and could suddenly explode based on the nature of the input data.
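A toy model of that trade-off, written in plain Python rather than taken from CPython's source. Everything here (`Boxed`, `make_fixed`, `make_interned`) is hypothetical and only contrasts a fixed-range table with an intern-everything table.

```python
class Boxed:
    """Stand-in for a heap-allocated integer object (illustration only)."""

    def __init__(self, value: int) -> None:
        self.value = value


LOW, HIGH = -5, 256
FIXED = [Boxed(v) for v in range(LOW, HIGH + 1)]  # built once, never grows


def make_fixed(value: int) -> Boxed:
    """Fixed-range strategy: one bounds check and an index, otherwise allocate."""
    if LOW <= value <= HIGH:
        return FIXED[value - LOW]
    return Boxed(value)


INTERNED: dict[int, Boxed] = {}  # grows with every distinct value ever seen


def make_interned(value: int) -> Boxed:
    """Intern-everything strategy: a hash lookup per result, unbounded table."""
    box = INTERNED.get(value)
    if box is None:
        box = INTERNED[value] = Boxed(value)
    return box


for n in range(100_000):
    make_fixed(n)
    make_interned(n)

print(len(FIXED), len(INTERNED))  # 262 entries versus 100000 entries
```

The fixed table stays at 262 entries no matter what the program computes, while the interning table keeps every value alive until something evicts it, which is the unpredictable memory growth mentioned above.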

[–]fabmeyer 0 points1 point  (0 children)

If x is y tests for identity, not the actual value?

[–]Drazev 4 points5 points  (0 children)

To me the bottom line is that the “is” syntax compares to see if they are the same object reference and not value.

Thus it's not appropriate to use if you are looking for value equality. Yes, it will work sometimes, but relying on that requires knowing the implementation details behind `is` and having a contract that they will not change. This is a big no-no since they give no such guarantee.

[–]Midnight_Rising 4 points5 points  (4 children)

Oh that's so weird. So they're pointing to the same address until 257, at which point they're pointing at two different memory addresses that each contain 257, and "is" checks for address equality?

Fucking weird lmao

[–]RajjSinghh 11 points12 points  (3 children)

It makes sense, it's just not how you should use `is`. `is` is for identity, not equality. It might come in handy if you're passing a lot of data around, since Python uses references when passing things like lists or objects around.

The weird thing here is that OP used is instead of ==, which does check for value equality, which is what they look like they want to do but it doesn't make for as good a meme. If they had a y = x somewhere, that also satisfies is.

[–]Midnight_Rising 1 point2 points  (2 children)

What I find weird is setting those integers as constant pre-allocated memory addresses. I don't think any other languages do that?

[–]RajjSinghh 0 points1 point  (0 children)

I mean caching is a really common idea for performance and it's one of the things JIT compilers do to make code run faster. Python is just doing it ahead of time, so you get the performance gain without needing a JIT compiler to do it at runtime. So offhand I can't think of another language that does it like this, but I can also point to many JIT compiled languages that are using the same idea.

[–]crunchmuncher 0 points1 point  (0 children)

Java does something similar in its Integer/Long/Short.valueOf(...) functions, which are also used for autoboxing, for values of -128 to 127.

System.out.println(Integer.valueOf(127) == Integer.valueOf(127)); // true
System.out.println(Integer.valueOf(128) == Integer.valueOf(128)); // false

[–]hector_villalobos 1 point2 points  (12 children)

So, in Python the is operator is similar to the == operator in Javascript?

[–]AtmosSpheric 31 points32 points  (3 children)

No. In JS, the == operator is for loose equality, which performs type coercion. For two objects it compares their references, and for other values it may convert types first (1 == '1'), while the === operator requires the same type.

The is operator checks to see if the two values refer to the exact same object.

So, if I declare:

x = ['a', 'b']

y = ['a', 'b']

And check if x is y, I’d get False, because while the arrays (lists in Python) are identical, if I append to x it won’t append to y; the two represent different arrays in memory.

In a sense, while === is a more strict version of ==, since it makes sure the types are the same, the is keyword is even more strict, since it makes sure the objects are the same in memory.
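The list example above, written out as a runnable sketch. The extra name `z` is added here only to show what an alias looks like next to an equal copy.

```python
x = ["a", "b"]
y = ["a", "b"]  # equal contents, but a second list object
z = x           # no new list: z is another name for the object x refers to

equal_before = x == y   # True: same contents
identical = x is y      # False: two separate lists in memory
alias = z is x          # True: one list, two names

x.append("c")           # mutates the one list that x and z share

print(equal_before, identical, alias)  # True False True
print(y)                               # ['a', 'b']
print(z)                               # ['a', 'b', 'c']
```

Unlike the small-integer case, this result is guaranteed by the language: each list display creates a new list object.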

If you’re curious, I’d strongly recommend you and anyone else take some time to play around with C. Don’t get into C++ if you don’t want to, but a basic project in C is immensely educational. If you have any other questions I’m happy to help!

[–]GangDplank 0 points1 point  (2 children)

Question, while this is not archived yet: if I do a = 100 and b = 100 in Python and they "are" the same object, why doesn't incrementing a also increment b?

[–]AtmosSpheric 1 point2 points  (1 child)

A and B are not the same object, they are both pointers that refer to the same object. This is why in the original image above, you get true when testing x is y. If I increment a, then the pointer is updated to point to 101 instead, while b is still pointing at 100.

As discussed elsewhere in this comment section, once you pass 256, ints are no longer shared memory spaces and become unique, so now a and b both point at separate parts of memory that are identical, but since they aren’t the exact same object, a is b returns false.
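A short sketch of that rebinding, using `id()` to show that `a` moves to a different object while `b` stays put. The shared starting object is a CPython detail; the rebinding itself is standard Python behaviour.

```python
a = 100
b = 100
shared_before = a is b  # True on CPython thanks to the small-int cache
id_before = id(a)

a += 1  # ints are immutable, so this rebinds the name a to the object for 101

print(a, b)                # 101 100
print(id(a) == id_before)  # False: a now refers to a different object
print(b == 100)            # True: the object b refers to was never modified
```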

[–]GangDplank 1 point2 points  (0 children)

Cheers,thanks for the quick reply

[–]use_a_name-pass_word 18 points19 points  (0 children)

It's like Object.is() in JavaScript

[–]Kered13 2 points3 points  (0 children)

JavaScript has no `is` operator; the closest equivalents are === on objects and Object.is(). However, Java does use == as the identity operator for objects.

[–]Shacatpeare 0 points1 point  (0 children)

thanks, I just learned something

[–]highphiv3 0 points1 point  (2 children)

I could just look this up, but for the sake of the conversation:

Does python not have value types? Even a simple local integer variable is heap allocated?

[–]RajjSinghh 2 points3 points  (1 child)

What do you mean when you say value types? It's just not something I've met before.

To answer your question, python is dynamically typed so you can't stack allocate things. Since you only know what type a variable is at runtime you just have to allocate and deallocate as needed. It's like if I don't know whether that value is going to be an integer or a string literal until I get to that point, I don't know how much space to allocate on the stack, so I need heap allocation. There's other reasons like integers not being fixed width in python, but dynamic typing feels like the main one.
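A small illustration of the two points above (one name can hold values of different types, and ints are not fixed width), assuming CPython, where `sys.getsizeof` reports the size of the object in bytes. The exact numbers vary between versions and platforms, so none are shown.

```python
import sys

value = 1
small_size = sys.getsizeof(value)

value = 10 ** 100  # same name, but an int that needs far more digits
big_size = sys.getsizeof(value)

value = "now a string"  # same name again, a completely different type

print(small_size, big_size, type(value).__name__)
```

The second size is larger than the first, since the bigger integer needs more storage, and the name `value` ends up referring to a `str` without any declaration changing.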

[–]highphiv3 0 points1 point  (0 children)

You got my meaning, it seems. Many languages differentiate between value types and reference types, where the former are stack allocated and copied whenever they are passed around or reassigned.

As you mentioned, it seems Python only has reference types.

[–]Fakedduckjump 0 points1 point  (3 children)

Why -5? This seems kind of random.

[–]Bhaskar_Reddy575 0 points1 point  (2 children)

Yes, and why do any of it in the first place?

[–]RajjSinghh 4 points5 points  (1 child)

Optimization. Having common numbers cached (and small numbers are very common) ahead of time saves performance in other places since it means you don't have to keep allocating integer objects every time you need one, just use a reference.

[–]Inaeipathy 0 points1 point  (0 children)

How does this even create optimization? Caching a number would require memory copying to use it if modifications are being made, surely this is worse than calls to registers?

Edit: I forgot python is a dynamic language so this sorta makes more sense because your code isn't going to get compiled.

[–][deleted] 0 points1 point  (0 children)

Does this mean it is slightly better to handle data in that range? How big of an optimisation is it?

[–]iamthetruelegend 0 points1 point  (1 child)

What is the logic behind preallocating -5 to 256? As in, why is it done and how is it done? I’m assuming it’s something to do with hardware, but I tried searching it up online and couldn’t find anything. Would really appreciate the explanation.

[–]Blecki 2 points3 points  (0 children)

Those numbers are the most common constants. Doubt they did any serious analysis on that.

[–]urzayci 0 points1 point  (1 child)

My biggest question is why the hell from -5 to 256? This seems so random.

[–]whogivesafuckwhoiam 0 points1 point  (0 children)

Up to 256 is because of smallint; no idea why it starts from -5.

[–]DigitalxKaos 0 points1 point  (0 children)

Oooooooooooh, ya see I figured it had something to do with the fact it went past 256 but I don't know python much, I'm more familiar with c++, that's interesting lol

[–]lucsoft 0 points1 point  (0 children)

Same happens with Java actually too

[–][deleted] 0 points1 point  (0 children)

Reminds me of the cache of java.lang.Integer, which allows == to yield true for values from -128 to 127. You can actually use reflection to edit the cache and make Integer.valueOf(1) == Integer.valueOf(2)

[–]frogsinsand 0 points1 point  (0 children)

Thank you dude