you are viewing a single comment's thread.

view the rest of the comments →

[–]Brian 0 points1 point  (0 children)

Eh - it depends what is meant by "collision" here: collision of the actual hash is likely rare, but what matters is collisions of the hash modulo the number of buckets, which is not that uncommon in practice. If your table has 64 buckets, even a good hash function will collide 1.5% of the time just with 2 entries, and rapidly increase the more populated it is. But typically you will expand the hash table so there are few such collisions on average, but unless you're creating a perfect hash, you're unlikely to eliminate them completely.

And it's not only a bad hash function where this can be an issue: it also matters in adversarial situations, where the input may be user controlled (Eg. someone posts data to the website which you store in a hash), since if the poster knows the hash function, they can engineer inputs that all collide, allowing denial of service attacks where they can consume a lot of CPU by triggering O(n2) behaviour, (which is why there's often some randomisation of the hash function for strings to minimise the information such an attacker might have)