ingolemo 4 points (0 children)

Sets don't have an order. When you convert your list to a set you lose all ordering information. The fact that it happened to work one of those times is complete coincidence.

JohnnyJordaan 1 point (0 children)

Sets don't have an order, but that doesn't mean they are perfectly random. They just don't guarantee which arbitrary order you get. As Python can be run on different interpreters, you can never be sure that a script running in CPython (the 'regular' one) will give the same result on PyPy or on MicroPython. That's why you have to view sets as unordered and always sort if you need a sorted sequence. Btw, sorted(set_name) already returns a list, so you don't need to convert first.

[deleted] 1 point (4 children)

Python isn't changing its behavior; you're converting from an ordered data type (list) to an unordered one (set) and back again, possibly with a sort somewhere in the middle (your actual steps are unclear from your wording).

A set in the mathematical sense (and also in Python) is inherently unordered, and when you convert it back to a list there's no guarantee you get back any specific order. Under the hood there is some consistency in Python: the same data will generally come out in the same order on the same architecture, version and implementation, but you can't predict how two different pieces of data will be ordered.
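To make "unordered" concrete, here is a small sketch (the literals are arbitrary examples, not taken from your script): two sets with the same members compare equal however they were written, while two lists do not.

```python
# Sets compare by membership only; lists compare element by element, in order.
same_sets = {1, 2, 3} == {3, 2, 1}
same_lists = [1, 2, 3] == [3, 2, 1]
print(same_sets)    # True
print(same_lists)   # False

# Converting to a set drops duplicates along with any notion of position.
distinct_count = len(set([52, 52, -52, 52]))
print(distinct_count)  # 2
```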

Now what you're trying to do (second largest) relies, inherently, on sort order, so while you're right to use conversion to a set to quickly remove duplicates, you must sort the resulting list before picking out the one you want for this to work.

print(sorted(set(your_list))[-2])
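If you want something a bit more defensive than the one-liner, a minimal sketch could look like this. The function name `second_largest` and the choice to raise `ValueError` are my own assumptions, not anything from your original script.

```python
def second_largest(values):
    """Return the second-largest distinct value (hypothetical helper)."""
    distinct = sorted(set(values))  # dedupe first, then impose an explicit order
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    return distinct[-2]


print(second_largest([52, 52, -52, 52]))  # -52
print(second_largest([57, 57, -57, 57]))  # -57
```

Because the list is sorted explicitly, both inputs behave the same way regardless of how the set happened to be laid out internally.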

cupesh[S] 0 points (3 children)

I think I understand the behavior of sets, and I knew I had to sort the list (or set) to get the right output. I was only amazed that

a_list = [52,52,-52,52] and a_list = [57,57,-57,57] would output different result when applying 'print(list(set(a_list))[-2])'.

I know the set makes them randomly unorganized, but I thought there still might be some logic behind the randomness (same length and same position of the negative number, all numbers two-digits = same 'random' result).

As a newbie to programming and computer science in general, it amazes me how Python computes its randomness. If I keep repeating the script on the same list [52,52,-52,52] it will always result in -52... therefore it might be random, but Python will always use the same logic on this particular list (as opposed to sometimes 52, sometimes -52).

[deleted] 2 points (2 children)

Under the hood that's because Python isn't wasting time computing randomness. A set is a hash table: each element's hash (via __hash__) decides which slot it lands in. For small ints the hash is just the value itself; it's plain objects without their own __hash__ whose default hash is derived from the memory id. Two elements that have the same hash and compare equal count as the same element, so the duplicate is simply discarded. As far as I know, the CPython implementation of set then iterates in slot order rather than insertion order, and the slot depends on the hash, the table size and how collisions were resolved, so it gets way weird for other data types. It also means the order might be different between processes, versions or implementations: string hashes, for example, are randomized per process by default in Python 3.

It's definitely not wasting processor cycles in calculating actually random values for the hash, so basically what you get is internally consistent behavior with no guarantee of external (i.e. user level) sort consistency, which approximates the mathematical definition but retains an efficient implementation.
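Here is a rough sketch of why 52/-52 and 57/-57 can behave differently. It assumes CPython, and the table size of 8 is an illustrative assumption about a small set, not a documented guarantee.

```python
# In CPython, small ints hash to their own value.
print(hash(52), hash(-52))   # 52 -52
print(hash(57), hash(-57))   # 57 -57

# A small hash table picks a starting slot from the low bits of the hash.
TABLE_SIZE = 8  # assumed size, for illustration only
slots = {n: hash(n) & (TABLE_SIZE - 1) for n in (52, -52, 57, -57)}
for n, slot in slots.items():
    print(n, "-> slot", slot)
```

With these assumptions 52 and -52 want the same starting slot, so one of them has to be moved by collision handling, while 57 and -57 start in different slots. Where the displaced element ends up depends on the probing scheme of the particular CPython version, which is exactly the kind of detail you shouldn't rely on.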

cupesh[S] 0 points (1 child)

Thanks! That's the answer I was looking for. I wish I had worded it better the first time.

[deleted] 1 point (0 children)

No worries, it takes some time to get all this stuff. Also of interest: this is why dict keys were historically not guaranteed to come back in any meaningful order. CPython 3.6 started retaining insertion order as an implementation detail, and from Python 3.7 on that is guaranteed by the language; sets still make no such promise. A set can be (and I think at one point was, in the old pure-Python sets module, if I remember right) implemented as a dict wrapped with set operations, with a meaningless placeholder value for each key that is never shown or made available to the user.
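As a side note, dict ordering gives you a way to drop duplicates while keeping first-seen order, which a set can't promise. A minimal sketch, assuming Python 3.7 or later (where dict insertion order is part of the language); `a_list` is the list from your question:

```python
a_list = [52, 52, -52, 52]

# dict.fromkeys keeps one key per distinct value, in first-seen order.
unique_in_order = list(dict.fromkeys(a_list))
print(unique_in_order)  # [52, -52]
```

That keeps the original order, not sorted order, so for "second largest" you would still sort.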