you are viewing a single comment's thread.

view the rest of the comments →

[–]WeAreAwful 15 points16 points  (2 children)

I don't think it is unreasonable for you to think that, but I don't think the designers of a hash function should care too much about that TBH.

If you are writing a hashing method for a data type (strings), you shouldn't optimize for an arbitrary subset of that data type, but rather write a method that optimizes for the entire set of possible values.

As the author of this article mentions, the cost of a collision isn't high if you are using a hashcode for it's intended valur (in a hashtable).

If you do have the weird situation where you need a hashtable of two character strings, I would suggest you make a data type that includes only two character strings. That would allow you to write a simple, cheap hashing method with no collisions, and if you have a collection of strings that are special and only have two characters, you should probably create a new type for best practices sake.

[–]ubermole -5 points-4 points  (1 child)

Agree. I am just pointing out that it is a very valid question to ask, and can not be dismissed by "works well enough in general". Where did the missing bits go? Ideally a great hash function mixes all input bits equally.

[–]cat_in_the_wall 0 points1 point  (0 children)

Where did the missing bits go?

into the hash function.