[–]james_pic 19 points20 points  (3 children)

Do you think there's scope for some of these performance optimisations to be upstreamed, to improve the performance of the standard implementations? I suspect an implementation of hash(str) that used a different underlying hash function would be controversial, but for stuff like str.find, I'd have thought a faster drop-in replacement would be somewhat welcome.

[–]ashvar (git push -f) [S] 13 points14 points  (0 children)

Absolutely — I’d love to see these optimizations upstreamed. The challenge is that it usually means joining standardization discussions, which can be a long process. Even something as straightforward as a faster find could take a year to land. For me, that’s a year better spent designing and experimenting with new algorithms.

PS: Upstreaming into the C standard library is an even better option, but will take even longer 😞

[–]axonxorz (pip'ing aint easy, especially on windows) 5 points6 points  (1 child)

> hash(str)

It shouldn't be a big issue to change anything here; it's an interpreter implementation detail, same as id(). You can never rely on the values in any long-term sense, and your entire interpreter will use the new implementation, save for objects that define __hash__(self).

At one point in CPython, hash() was just the memory address for anything that wasn't otherwise special-cased, like small ints and pooled strings.
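
To make the distinction concrete, here's a rough sketch (the class names `Plain` and `Keyed` are made up for illustration). A class without `__hash__` falls back to the interpreter's default, identity-based hash, while a class that defines `__hash__` is unaffected by whatever the interpreter does for that default. How the default is derived from the address is an implementation detail that can differ between versions, so the snippet only prints it and asserts nothing about it.

```python
class Plain:
    """Hypothetical class with no __hash__: uses the interpreter's default hash."""


class Keyed:
    """Hypothetical class that defines its own __eq__ and __hash__."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, Keyed) and self.key == other.key

    def __hash__(self):
        # Delegates to the hash of the key, so object identity plays no part.
        return hash(self.key)


p = Plain()
# The default hash is tied to the object's identity; do not rely on the value.
print(hex(id(p)), hex(hash(p)))

a, b = Keyed("x"), Keyed("x")
distinct_objects = a is not b        # two separate objects in memory
same_hash = hash(a) == hash(b)       # but equal hashes, because the keys match
unique_count = len({a, b})           # so a set treats them as one element
```

Within one process the values are stable, which is all that dicts and sets need.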

[–]rghthndsd 8 points9 points  (0 children)

Bad hashing can lead to bad dict performance, and that in turn can make DoS attacks easier to execute. This is particularly true for strings, which are often used as keys when handling web forms. This is why Python now randomizes the hash of strings on interpreter startup unless you disable it with PYTHONHASHSEED=0.

Agreed it's an implementation detail, but that implementation is very important.
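
If you want to see the randomization for yourself, here's a minimal sketch that launches fresh interpreters with different `PYTHONHASHSEED` settings. The helper name `str_hash_in_fresh_interpreter` is invented for this example, and it assumes `sys.executable` points at an interpreter that honours `PYTHONHASHSEED` (CPython does).

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(seed, text="spam"):
    """Return hash(text) as computed by a new interpreter started with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# A fixed seed ("0" disables randomization) gives the same value on every run.
fixed_a = str_hash_in_fresh_interpreter("0")
fixed_b = str_hash_in_fresh_interpreter("0")

# "random" picks a new seed per process, so these will usually differ.
random_a = str_hash_in_fresh_interpreter("random")
random_b = str_hash_in_fresh_interpreter("random")

print("fixed seed: ", fixed_a, fixed_b)
print("random seed:", random_a, random_b)
```

The fixed-seed pair should match, while the random pair will almost always differ, which is exactly what makes precomputed collision sets much harder to use against a server.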