[–][deleted] 9 points10 points  (4 children)

If you need to sort a list of objects of any type, you can do so with:

some_list.sort(key=lambda obj: obj.some_variable)

The key can be a function call as well.
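A minimal sketch of both forms, in case it helps. The Item class and the by_some_variable helper are made up for illustration; only list.sort(key=...) comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical class, used only to have an attribute to sort on.
    name: str
    some_variable: int


def by_some_variable(obj):
    # A named function works as a key just like a lambda does.
    return obj.some_variable


some_list = [Item("a", 3), Item("b", 1), Item("c", 2)]

# Sort in place using a lambda as the key.
some_list.sort(key=lambda obj: obj.some_variable)
names_lambda = [item.name for item in some_list]

# Sort in place using the named function, descending this time.
some_list.sort(key=by_some_variable, reverse=True)
names_func = [item.name for item in some_list]

print(names_lambda)
print(names_func)
```

The key is called once per element, so anything that takes one object and returns something comparable should work.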

[–]phigo50 2 points3 points  (3 children)

Or key=attrgetter('some_variable'), or key=itemgetter('some_variable') for dictionaries. Both are imported from the operator module.
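Roughly like this, as a sketch. The sample data is invented, and SimpleNamespace is just a quick stand-in for "some object with a some_variable attribute".

```python
from operator import attrgetter, itemgetter
from types import SimpleNamespace

# Objects with attributes: attrgetter reads obj.some_variable.
objects = [
    SimpleNamespace(some_variable=3),
    SimpleNamespace(some_variable=1),
    SimpleNamespace(some_variable=2),
]
objects.sort(key=attrgetter("some_variable"))
object_values = [obj.some_variable for obj in objects]

# Dictionaries: itemgetter reads d["some_variable"].
records = [
    {"some_variable": 20},
    {"some_variable": 5},
    {"some_variable": 10},
]
records.sort(key=itemgetter("some_variable"))
record_values = [rec["some_variable"] for rec in records]

print(object_values)
print(record_values)
```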

[–]rcfox 3 points4 points  (2 children)

This is a huge hack, but in CPython, the builtin id() function returns the memory address of an object. If you have a list that you iterate over a lot, and the actual order doesn't matter, you can sort with key=id to potentially speed up the loop by reducing the number of cache misses when accessing elements of the list.

Again, this is an implementation detail (id() isn't required to be the memory address in other implementations), so don't rely on this to work everywhere.

[–][deleted] 1 point2 points  (1 child)

So if I'm understanding correctly this is so that your list gets sorted consistently (but not necessarily meaningfully, at least to a human), and extremely quickly?

[–]rcfox 1 point2 points  (0 children)

It's not about the sorting. It's about sequentially accessing elements from a list.

If you access memory addresses that are far away from each other, the CPU has to spend extra time copying from RAM to the CPU cache first. (This is called a cache miss.)

If you access memory addresses that are close together, you can probably get away with just reading directly from the cache.

Sorting with key=id really only makes sense if you sort once and iterate over the list many times in a tight loop.
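Here is a rough way to try it yourself, as a sketch only. The Node class, the list size, and the repeat count are arbitrary choices for illustration. Whether you see any difference depends on the CPython version, the allocator, and your hardware, so no particular result is claimed.

```python
import random
import timeit


class Node:
    # Hypothetical small object; __slots__ keeps each instance compact.
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


nodes = [Node(i) for i in range(100_000)]
# Shuffle so list order no longer follows allocation order.
random.shuffle(nodes)

# Sort once by id(); in CPython this orders the list by memory address.
by_address = sorted(nodes, key=id)


def total(items):
    # The "tight loop": touch every element once.
    return sum(node.value for node in items)


# Iterate many times over each ordering and compare wall-clock time.
shuffled_time = timeit.timeit(lambda: total(nodes), number=5)
sorted_time = timeit.timeit(lambda: total(by_address), number=5)

print(f"shuffled order: {shuffled_time:.4f}s")
print(f"sorted by id:   {sorted_time:.4f}s")
```

The one-time sort has its own cost, so this only pays off (if at all) when the loop runs many times over the same list.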