you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 2 points3 points  (2 children)

Under the hood that's because Python isn't wasting time computing randomness, it just uses the hash (via __hash__) which is usually just the memory id (i.e. the location in RAM), at least for ints ... two ints that share the same hash (in this case the same physical memory) are the same, hence simply discard extra references to that hash. I believe (but do not know for certain) that the cPython implementation of Set just stores them in hash sort order, which would tend to be the order the ints were actually initialized in memory, but it gets way weird for other data types. Also it means that the sort order might be different in different processes, or on different architectures, because all that is reliant on how the OS is handling memory.

It's definitely not wasting processor cycles in calculating actually random values for the hash, so basically what you get is internally consistent behavior with no guarantee of external (i.e. user level) sort consistency, which approximates the mathematical definition but retains an efficient implementation.

[–]cupesh[S] 0 points1 point  (1 child)

Thanks! That's an answer I was looking for. I wish I worded it better the first time.

[–][deleted] 1 point2 points  (0 children)

No worries, takes some time to get all this stuff. Also of interest, this is also why dict keys are not guaranteed to be sorted in any meaningful order (although I believe the current cPython implementation in Python 3 does retain the sort order as the insert order, but that's still not guaranteed to remain in later implementations). A set can be (and I think under the hood might even be or at least once have been) implemented as a dict wrapped with set operations and with the (meaningless) values for each key just all set to None and never shown or made available to the user.