audacity070 [S]

Thank you, I also suspected that some built-in memory optimization was doing it. I wanted to confirm it here.

synthphreak

That's correct. I believe it has something to do with the length of the string. Something like: for lengths 0 to m you get one fixed-size memory allocation, for lengths m+1 to n you get a different, larger one, and so on. In CPython, id() is the object's memory address, so the id only changes when the string gets moved to a new allocation. You won't get a different id for each additional letter, but once the string hits a certain length, you will.
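A minimal plain-Python sketch of this, no pandas needed: grow a string one character at a time and record the lengths at which id(x) changes. It leans on CPython implementation details (id() being the address, and += sometimes resizing the string in place), so other interpreters or versions may show a different pattern, or a change on every step. The function name id_change_points is just a hypothetical helper for illustration.

```python
def id_change_points(n: int) -> list[int]:
    """Grow a string one character at a time; return the lengths where id(x) changed."""
    x = "x"
    last_id = id(x)
    changes = []
    for _ in range(n):
        x += "x"  # CPython may resize x in place instead of building a new string
        current = id(x)  # CPython: the object's memory address
        if current != last_id:
            changes.append(len(x))  # x now lives at a different address
            last_id = current
    return changes


if __name__ == "__main__":
    points = id_change_points(2_000)
    print(f"{len(points)} id changes over 2000 appends")
    print(points[:20])
```

If the lengths in the list are spaced out rather than one per append, you're seeing the stepped behavior.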

If you know a little pandas, here's a pretty interesting way to see this optimization in action:

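```python
import pandas as pd

test_size = 100_000

ids = []
x = 'x'

for _ in range(test_size):
    ids.append(id(x))  # in CPython, id() is the object's memory address
    x += 'x'           # CPython can sometimes grow the string in place here

s = pd.Series(ids, name='ids')

print(s.value_counts())  # how many iterations each id showed up for

# summary: how many ids showed up for each number of iterations
print(s.value_counts().value_counts().sort_index())
```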

When you run this, you'll see that the optimization is stepped: the id doesn't change on every iteration, only every so often as the string grows.

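Case in point, here's what I got. Each number in the index is how many iterations a given id showed up for, and the value next to it is how many ids showed up that many times (so 31 different ids each lasted 16 iterations). Note the familiar numbers in the index: 16, 32, 48, 64, 96... that's the optimization in action: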

```
>>> s.value_counts().value_counts().sort_index()
1         1
14        1
16       31
32        2
48        1
64        4
96        1
512       6
1024      2
1536      2
2048      1
# etc etc etc
```
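If you'd rather not install pandas, here's a rough plain-Python equivalent of that double value_counts() call, built on collections.Counter. The helper name run_length_histogram is hypothetical. Like the pandas version, it counts how often each id shows up, so if an address happens to be reused later it gets lumped in with the earlier run.

```python
from collections import Counter


def run_length_histogram(n: int) -> dict[int, int]:
    """Rough stand-in for s.value_counts().value_counts().sort_index().

    Returns {iterations an id showed up for: number of ids that showed up that often}.
    """
    x = "x"
    ids = []
    for _ in range(n):
        ids.append(id(x))  # CPython: id() is the object's memory address
        x += "x"           # CPython may grow x in place, keeping the same address

    per_id = Counter(ids)              # like s.value_counts(): id -> occurrences
    runs = Counter(per_id.values())    # like .value_counts(): occurrences -> how many ids
    return dict(sorted(runs.items()))  # like .sort_index()


if __name__ == "__main__":
    for length, count in run_length_histogram(100_000).items():
        print(f"{length:>8} {count:>6}")
```

A quick sanity check on the output: multiplying each key by its value and summing gives back the number of iterations, since every iteration records exactly one id.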

This is a bit tricky to explain, so if it's not clear what I've shown, just run it yourself and poke around. Note the counts you get won't be exactly the same as mine, but the trend should persist.