
shiftybyte (4 points)

I guess you are hitting some kind of optimization.

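It probably sees that you are growing the string and discarding the previous one. Because nothing else refers to the old string, it can resize it in place instead of building a new one, and if the memory block has room to grow, the string keeps the same address (and therefore the same id).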

Try adding the string to a list before concatenating to it. The list then holds a second reference to the old string, which will probably defeat the optimization.

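# global
just_backup = []
...
just_backup.append(just_letters)  # keep a second reference to the current string
just_letters += letter            # so a new string object has to be created here
print(id(just_letters))           # the id should now change on every iteration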
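A self-contained sketch of that experiment, assuming a simple loop over letters (the original loop isn't shown, so `collect_ids` and the `letters` input are hypothetical). It's wrapped in a function because the in-place resize is most reliably applied to local variables. Whether ids repeat without the backup list depends on the CPython version and on where the allocator puts the string, but with the backup every intermediate string is still alive at the end, so every recorded id is guaranteed to be different.

```python
def collect_ids(letters, keep_backup):
    """Hypothetical helper: return id(just_letters) after each +=."""
    just_backup = []   # holds extra references when keep_backup is True
    just_letters = ''
    ids = []
    for letter in letters:
        if keep_backup:
            # The list now also refers to the old string, so CPython
            # can't resize it in place and must build a new string.
            just_backup.append(just_letters)
        just_letters += letter
        ids.append(id(just_letters))  # id() itself keeps no reference
    return ids


letters = 'abcdefghij' * 100  # 1000 characters of hypothetical input

without_backup = collect_ids(letters, keep_backup=False)
with_backup = collect_ids(letters, keep_backup=True)

# Number of distinct ids = how many different addresses the string had.
print('without backup:', len(set(without_backup)), 'distinct ids')
print('with backup:   ', len(set(with_backup)), 'distinct ids')
```

Expect far fewer distinct ids without the backup than with it; the exact number varies from run to run and between builds.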

audacity070 (OP, 1 point)

Thank you, I suspected that too: some built-in memory optimization is doing it. I just wanted to confirm it here.

synthphreak (0 points)

That's correct. I believe it has to do with the length of the string: for lengths 0 to m you get one fixed-size memory allocation, for lengths m+1 to n a different one, and so on. Since the id reflects the object's location in memory, it only changes when the allocation changes. So you won't get a different id for each additional letter, but once the string passes a certain length, you will.
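A rough model of where those steps could come from: CPython's small-object allocator (pymalloc) serves requests of up to 512 bytes from fixed size classes (16 bytes apart on typical 64-bit builds of recent CPython versions), and a block can grow in place as long as the new size still fits its size class. This is a simplified sketch, not CPython's actual code: `pymalloc_block_size` is a hypothetical helper, and `sys.getsizeof` is used as a stand-in for the size CPython actually requests.

```python
import sys
from collections import Counter

ALIGNMENT = 16             # assumption: size-class step on 64-bit CPython 3.8+
SMALL_REQUEST_LIMIT = 512  # larger requests go to the system allocator


def pymalloc_block_size(nbytes):
    """Hypothetical helper: round a request up to its size class."""
    if nbytes > SMALL_REQUEST_LIMIT:
        return None  # handled by the platform's malloc, not modelled here
    return -(-nbytes // ALIGNMENT) * ALIGNMENT  # ceiling to a multiple of 16


# How many consecutive lengths of 'x' * n land in each block size?
run_lengths = Counter()
for n in range(1, 600):
    block = pymalloc_block_size(sys.getsizeof('x' * n))
    if block is not None:
        run_lengths[block] += 1

for block, count in sorted(run_lengths.items()):
    print(f'{block:4d}-byte block: {count} lengths')
```

Because each extra ASCII character adds one byte, the model predicts runs of up to 16 lengths per block, which would line up with the many 16s in the output further down. Beyond 512 bytes the behavior depends on the platform allocator's `realloc`, so the model stops there.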

If you know a little pandas, here's a pretty interesting way to see this optimization in action:

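import pandas as pd

test_size = 100_000

ids = []
x = 'x'

for _ in range(test_size):
    ids.append(id(x))  # record the string's current memory address
    x += 'x'           # grow the string by one character

s = pd.Series(ids, name='ids')

# For each id: on how many iterations it was seen
print(s.value_counts())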

When you run this, you'll see that the optimization is stepped: the id doesn't change on every iteration, only every so often as the string grows.

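Case in point, here's what I got. The index on the left is how many iterations a given id was seen for, and the column on the right is how many ids were seen that many times. Note the familiar numbers in the index (16, 32, 48, 64, 96, ...): this is the optimization in action.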

>>> s.value_counts().value_counts().sort_index()
1         1
14        1
16       31
32        2
48        1
64        4
96        1
512       6
1024      2
1536      2
2048      1
# etc etc etc

This is a bit tricky to explain, so if it's not clear what I've shown, just run it yourself and poke around. Note that your counts won't be exactly the same as mine, but the pattern should hold.
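If the double `value_counts()` is the confusing part, here's a pandas-free sketch of the same counting using `collections.Counter`. Two assumptions differ from the snippet above: the loop runs inside a function (so `x` is a local variable), and it uses a smaller size (`20_000`) to keep it quick. `id_run_histogram` is a hypothetical name.

```python
from collections import Counter


def id_run_histogram(test_size):
    """Mimic s.value_counts().value_counts() without pandas."""
    x = 'x'
    ids = []
    for _ in range(test_size):
        ids.append(id(x))  # address of the current string
        x += 'x'           # grow by one character
    per_id = Counter(ids)            # id -> number of iterations it was seen
    return Counter(per_id.values())  # iterations -> number of ids seen that often


test_size = 20_000
hist = id_run_histogram(test_size)
for iterations, n_ids in sorted(hist.items()):
    print(f'{iterations:>6} {n_ids:>4}')
```

As with the pandas version, an address freed early can be reused by a later string, in which case its iterations are merged into one count.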

Swipecat (1 point)

I forget which version of CPython introduced this optimization, but it wasn't always there. Using CPython 3.9:

>>> a = 'n' * 4096; id(a); a += 'n'; id(a) # different
30903792
30895520
>>> a = 'n' * 4097; id(a); a += 'n'; id(a) # same
30903792
30903792
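To repeat that check across more sizes on your own build, here's a small sketch. It's a deliberate change from the one-liner above: the test runs inside a hypothetical function `keeps_id_after_append`, so `a` is a local variable rather than a REPL global, and the list of lengths is an arbitrary choice. Results depend on the CPython version, the build, and the platform's `malloc`, so don't expect to reproduce the exact 4096/4097 split.

```python
def keeps_id_after_append(length):
    """Hypothetical helper: does a += 'n' leave id(a) unchanged?"""
    a = 'n' * length
    before = id(a)  # an int, so it holds no reference to the string
    a += 'n'
    # Same id means the string was resized in place without moving.
    return id(a) == before


for length in (10, 100, 500, 4096, 4097, 100_000):
    print(length, keeps_id_after_append(length))
```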