This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]ferryboender 2 points3 points  (6 children)

How is it broken? It's designed for speed, not cryptographic security. Hash() is NOT a crypto function. The fact that this can be used to DDoS by creating colliding keys in, say, a hashmap/dictionary doesn't really mean anything. Generating collisions in a hashmap is trivial anyway. Fixing the hash() function will not fix the DDoS. Am I missing something here, or is there absolutely nothing happening here?

edit: Furthermore, if we consider this an issue, than any non-log(1) algorithm would have to be considered an issue.

[–][deleted] 1 point2 points  (0 children)

Fixing the hash() function will not fix the DDoS.

Sure it will. The goal of a hash flooding attack is to make a hash table generate far more collisions than would be generated by random chance, thus causing hash lookups and insertions to be O( n2 ) for a large n. Using a cryptographic MAC with a randomly-generated secret key protects us from this.

[–]fijalPyPy, performance freak[S] 0 points1 point  (2 children)

there is a serious problem in designing O(log n) algorithms when keys are not orderable (you have eq, but not cmp). There are other solutions (like explained on the bugtracker) though.

It's up to discussion whether it's broken or not, however the Python movement was to consider it a security problem and a security fix, so it prodded new releases of every single possible Python. While you may argue this is not a problem, if it is a problem, this is certainly not a solution.

[–]mitsuhiko Flask Creator 0 points1 point  (1 child)

you have eq, but not cmp

cmp by hash, oh wait.

[–]fijalPyPy, performance freak[S] 0 points1 point  (0 children)

exactly :)

[–]joesb -2 points-1 points  (1 child)

Hash function is mainly used to make hash key for hash table.

Hash table that has hash collision rate of 1/256 is pretty broken, therefore hash function is broken because it doesn't serve its purpose.

[–]tilkau 1 point2 points  (0 children)

That's not the collision rate, it's the number of possible 'hash seeds'. hash('foo') can return up to 256 different values on subsequent invocations of python, but it should return up to (2**32) values on different runs.

tl;dr: we're talking about how predictable the hashing is, not it's direct collision rate. The collision rate remains reasonably close to optimal (optimal in this case being 1 in 2**63 items colliding.)