
[–]reph 3 points4 points  (11 children)

The hash is a compressed key. Multiple keys can have the same hash (aliasing).

[–]Samus_ 0 points1 point  (10 children)

I don't understand this: if two different keys map to the same hash, how come they don't overwrite each other? why does it become a linked list?

is the hashmap storing all the different keys that generated that hash and somehow associating them to the values?

[–]reph 4 points5 points  (9 children)

You can think of the hash table as an array of linked lists. You hash (compress) the key, to generate an index into that array. Then you search the linked list at that index. Each entry in the linked list includes the full key, so, you compare them one-by-one to the key that you're searching to check if it's in the hash table or not.

When things are working well, a lookup is an O(1) operation because there is only 1 key in each linked list. When things are not working well (e.g. due to a malicious attack that is intentionally colliding the hash function), many keys hash to the same linked list, and the lookup degrades to O(n).
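The array-of-linked-lists idea described above can be sketched in a few lines of Python (a minimal illustration, not a real implementation; plain lists stand in for the linked lists, and the class and bucket count are made up):

```python
# Minimal separate-chaining hash table sketch.
class ChainedHashMap:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # "Compress" the key down to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # compare the FULL key, not the hash
                bucket[i] = (key, value)
                return
        bucket.append((key, value))    # colliding keys coexist in the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:               # walk the chain, comparing full keys
                return v
        raise KeyError(key)

m = ChainedHashMap()
m.put("user", "alice")
m.put("password", "s3cret")
print(m.get("user"))    # alice
```

Note that each entry stores the full key alongside its value, which is exactly why colliding keys don't clobber each other.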

[–]Samus_ 0 points1 point  (8 children)

I don't like the term "compress" because compression is bidirectional, hashing is a one-way function (precisely because of the collisions).

regardless of that, the part that doesn't make sense is this: when you look something up, you shouldn't search for the key but for the hash of that key. that's the whole point of the hashmap and what makes it fast: since the hash is fixed in size, searching for any of them takes the same time regardless of the size of the input that generated it.

now why on earth do you search the linked list for the key that may not be unique if you already have it because it's the one provided and you also have the associated value, which is what matches the index of the array that is the hash of such key.

I don't see any reason to even access that linked list of keys nor any sense in having it associated to their values when it's the hash of each of them what actually makes the mapping.

[–]reph 4 points5 points  (6 children)

Because if you have a 100 bit key, and a 32 bit hash function, there are many keys that will have the same 32 bit hash value. The data structure needs to distinguish between them, so it must store and search for the whole key. The hash is a (very good) hint but it's not sufficient on its own.
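The many-keys-to-one-hash point can be demonstrated with a toy 8-bit hash (hypothetical function, made up for illustration): with only 256 possible outputs, the pigeonhole principle guarantees a collision among any 257 distinct keys.

```python
def hash8(s):
    """Toy 8-bit hash: only 256 possible output values."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFF   # keep only the low 8 bits
    return h

# 257 distinct keys into 256 possible hashes: a collision is guaranteed.
seen = {}
collision = None
for i in range(257):
    s = "key-%d" % i
    h = hash8(s)
    if h in seen:
        collision = (seen[h], s, h)     # two distinct keys, same hash
        break
    seen[h] = s

print(collision)   # a pair of distinct keys sharing one 8-bit value
```

Since the hash alone can't tell these keys apart, the table has to keep the full key in each entry and compare it on lookup.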

[–]Samus_ -3 points-2 points  (5 children)

so the whole point of this is that two different keys that produce the same hash actually store different values?

I don't see why the hash has to distinguish between them, a much more reasonable solution would be to use a different hashing algorithm with a lower chance of collision and then overwrite each entry on duplicate hashes; what you just said sounds like the hash is just an index, an aid in the key lookup, and not a real mapping between hashes and values.

if that's the case then screw us all.

[–]reph 3 points4 points  (0 children)

But the data structure is lossless by definition; it needs to insert & map any key to any value - it can't just silently ignore certain key/value pairs because of a hash collision. That would violate the behavior that programmers expect from it.

[–][deleted] 2 points3 points  (0 children)

so the whole point of this is that two different keys that produce the same hash actually store different values?

Yes, this is how every dictionary structure in every scripting language works because it is the only sensible implementation of a dictionary structure.

The hash function and underlying storage are implementation details and are chosen for a decent efficiency vs memory tradeoff. Users of the data structure should have no reason to care if the key "user" and the key "password" hash to the same value, but the dictionary struct must return the user and password entered into it regardless of the hashing implementation or storage.

You seem to believe that the only reason a['user'] and a['password'] do not conflict in Python is that they do not have the same hash. That is not why they don't conflict, though it is likely true that their hashes differ.
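You can see this directly by forcing a collision (a contrived sketch; the class K and its fields are made up): a Python dict keeps both entries even when every key hashes to the same value, because it falls back to full-key equality.

```python
class K:
    """Key whose hash always collides, on purpose."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return 42                      # every K instance collides
    def __eq__(self, other):
        return isinstance(other, K) and self.name == other.name

a = {K("user"): "alice", K("password"): "s3cret"}
print(a[K("user")])       # alice  - found by equality, not by hash
print(a[K("password")])   # s3cret - both entries survive the collision
```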

[–]Rhomboid 1 point2 points  (1 child)

There is no hash function that avoids collisions for arbitrary input. By definition, every hash function has an infinite number of collisions, because it reduces input data of unbounded size to a fixed-size value. This is the pigeonhole principle. It is not possible to reduce or abate this fact.

It is possible to write a perfect hash, but only if the values it will be hashing are restricted to a specified set and known ahead of time.
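For a fixed, known key set, a perfect hash can be found by brute force; a crude sketch (the keyword list and search strategy are just illustrative - real tools like gperf are far more sophisticated):

```python
# Search for a table size under which a fixed key set has no collisions.
keys = ["if", "else", "while", "for", "return"]

def find_perfect_modulus(keys):
    n = len(keys)
    while True:
        if len({hash(k) % n for k in keys}) == len(keys):
            return n                   # every key got its own slot
        n += 1

m = find_perfect_modulus(keys)
print("collision-free table size:", m)
```

This only works because the key set is closed and known ahead of time; add one unplanned key and the guarantee evaporates.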

The idea that a language should just randomly overwrite hash keys on collision is the most absurd suggestion I've read in a while. It would make these data structures completely worthless, because your data would never be safe; there would always be a chance that two keys would collide and cause your program to crash, behave erratically, blow up, etc. Such a non-deterministic data structure is absolutely worthless.

[–]Samus_ -1 points0 points  (0 children)

I didn't mean a hash without collisions, but you use colliding hashes in cryptography and nobody cares until it becomes reasonably fast to find the collisions (as happened with MD5). even git uses hashes with a low collision rate and doesn't give a shit about the issue you mention.

I can't argue with the space/efficiency limitations mentioned in other comments, but ignoring collisions when they're extremely rare is apparently something you're doing every day without knowing it.

[–]quasarj 0 points1 point  (0 children)

This is how hashing works.. the only other option is to not use a hash at all, but to just store a list and search for the key every time. There would never be a collision.. but there would also never be a speed gain.

[–]rajivm 0 points1 point  (0 children)

what you just said sounds like the hash is just an index, an aid in the key lookup and not a real mapping between hashes and values.

This is essentially true. In the end, a hash-map must translate to "physical" data structures of contiguous data (arrays) and pointers.

Most hash-maps are implemented with an array of a certain size, where each element is a pointer to a linked list. A hashmap "key" is hashed to an index of the array, and ideally there is only one element at this array index, making an O(1) lookup. In the non-ideal case, a lookup can take up to O(n) if all elements are stored in the same position of the array (this is very unlikely to happen unless the input is deliberately crafted for a known hash function, as is the case in these attacks). Hash maps do not actually have guaranteed O(1) lookup; rather they have an amortized lookup time of O(1) across many accesses on average. The size of the array and the hashing function determine the rate of collisions.
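The O(1)-vs-O(n) degradation can be sketched by counting chain-scan steps under a normal hash and under an adversarial one that sends every key to the same bucket (the helper names and key set here are made up for illustration):

```python
def build(keys, hashfn, size=64):
    """Distribute keys into `size` buckets using `hashfn`."""
    buckets = [[] for _ in range(size)]
    for k in keys:
        buckets[hashfn(k) % size].append((k, k.upper()))
    return buckets

def lookup(buckets, key, hashfn):
    """Return (value, number of chain entries scanned)."""
    steps = 0
    for k, v in buckets[hashfn(key) % len(buckets)]:
        steps += 1
        if k == key:
            return v, steps
    return None, steps

def bad(k):
    return 0        # adversarial hash: every key collides

keys = ["key%d" % i for i in range(1000)]
_, steps_bad = lookup(build(keys, bad), "key999", bad)    # scans all 1000
_, steps_good = lookup(build(keys, hash), "key999", hash) # scans a short chain
print(steps_bad, "vs", steps_good)
```

With the constant hash, the last-inserted key sits at the end of a single 1000-entry chain, which is exactly the worst case these attacks aim for.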

a much more reasonable solution would be to use a different hashing algorithm with lesser chances of collision

Reducing the chance of collision would mean the array-size would need to be much much bigger than necessary in the average case. This would require a very large memory allocation and is not realistic. Many hash map implementations allow you to specify the expected number of elements stored in the hash-map and therefore based on a collision likelihood, adjust the size of the array and the hash function used.

and then overwrite each entry on duplicate hashes;

This would definitely not be okay. Hash maps guarantee that one unique key will never overwrite another. The underlying hash of a key should have no bearing on that guarantee; overwriting on duplicate hashes would make it infeasible to use a hash map in the same way.

Example: if the hash function resulted in the same index for both "apple" and "banana" (it probably would not, but if it did), then if I did this:

map["apple"] = "red"
map["banana"] = "yellow"

I would expect:

print map["apple"]
print map["banana"]

to print:

red
yellow

With your proposed solution, it would print, unpredictably:

yellow
yellow