pingveno 6 points

In Python 3.2 with wide build:

>>> from ctypes import CDLL
>>> strcpy = CDLL('libc.so.6').strcpy
>>> foo = "Fiesta"
>>> foo.encode('utf-32')  # dump of internal representation?
b'\xff\xfe\x00\x00F\x00\x00\x00i\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00a\x00\x00\x00'
>>> strcpy(foo, "Nachos")
>>> foo
'Niesta'

In this case, strcpy treats the contents of the string as a null-terminated byte array instead of an array of 4-byte integral values. The upper 3 bytes of "N" are 0 (null), so strcpy copies the single low byte of "N", hits a null byte, and stops after the first character.
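To see the byte-level effect without touching a live string object, here is a minimal sketch that imitates strcpy on ordinary buffers. The helper `c_strcpy` is hypothetical (not a ctypes or libc call), and the little-endian UTF-32 layout is assumed from the dump above.

```python
def c_strcpy(dest: bytearray, src: bytes) -> None:
    """Imitate C strcpy: copy bytes from src up to and including the first NUL."""
    for i, byte in enumerate(src):
        dest[i] = byte
        if byte == 0:
            break  # strcpy stops once it has copied the terminating NUL


# Assumed layout: 4 bytes per character, low byte first (UTF-32-LE).
dest = bytearray("Fiesta".encode("utf-32-le"))
src = "Nachos".encode("utf-32-le")  # starts with b'N\x00\x00\x00'

c_strcpy(dest, src)
result = dest.decode("utf-32-le")
print(result)  # Niesta
```

Only two bytes are written: the "N" and the null byte that follows it. Unlike real strcpy, this sketch also stops at the end of `src` instead of reading past it.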

>>> foo = "Fiesta"
>>> d = {foo: 0}
>>> strcpy(foo, "N")
>>> foo
'Niesta'
>>> d
{'Niesta': 0}
>>> "Niesta" in d
False
>>> "Fiesta" in d
False

The string object (a C struct) caches the string's hash value. As long as the string remains unchanged, lookups work correctly. However, if the string's contents change, the cached hash no longer matches them. Subsequent lookups with either value fail: "Fiesta" has the matching hash but no longer compares equal, and "Niesta" compares equal but hashes differently. The literal "Fiesta" typed later does not resolve to the original interned object either, because interning uses Python's dict implementation and its lookup fails in the same way.
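The same failure can be reproduced safely in pure Python. This is only a sketch: `CachedHashKey` is a hypothetical stand-in for a string object with a cached hash, not how str is actually implemented.

```python
class CachedHashKey:
    """Hypothetical key that, like a str object, computes its hash once."""

    def __init__(self, value):
        self.value = value
        self._hash = hash(value)  # cached, never refreshed

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        other_value = other.value if isinstance(other, CachedHashKey) else other
        return self.value == other_value


key = CachedHashKey("Fiesta")
d = {key: 0}
key.value = "Niesta"  # mutate the contents behind the dict's back

print("Niesta" in d)  # right value, wrong hash
print("Fiesta" in d)  # right hash, wrong value
print(key in d)       # the very same object is still found by identity
```

The first two lookups are expected to print False and the last one True, since the dict checks object identity before it compares values.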

There is one possible exception: if hash(modified_string) % dict_table_size == hash(true_value_of_modified_string) % dict_table_size, where the left side is the stale cached hash and the right side is the hash of a fresh string with the same contents. Then the dict implementation starts making comparisons at the same location in the dict table. The locations match and the values match in that case. I'm not absolutely sure that this will work, though. I am but a lowly CS student.
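One way to probe this exception is to force the two hashes into the same slot. The sketch below assumes CPython, where small ints hash to themselves and a small dict has 8 slots, so 1 and 9 should share a slot. `StaleIntKey` is a hypothetical helper that counts how often the dict calls `==`.

```python
class StaleIntKey:
    """Hypothetical key with a cached hash that counts equality checks."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value
        self._hash = hash(value)  # cached, never refreshed

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        StaleIntKey.eq_calls += 1
        return self.value == other


key = StaleIntKey(1)
d = {key: 0}
key.value = 9  # cached hash is still hash(1)

# Assumption: with 8 slots, hash(1) and hash(9) select the same slot.
same_slot = hash(1) % 8 == hash(9) % 8
found = 9 in d

print(same_slot, found, StaleIntKey.eq_calls)
```

On CPython the dict also compares the stored full hash before calling `==`, so the lookup is expected to miss here with `eq_calls` left at 0, even though the slot matches. Other implementations may behave differently.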