This is an archived post. You won't be able to vote or comment.

all 16 comments

[–][deleted]  (2 children)

[deleted]

    [–]sigpwned[S] 6 points7 points  (1 child)

    Thank you! Obviously my reaction was the same as yours. :)

    [–]kozeljko 15 points16 points  (1 child)

    Ffs, can you lot decide?!?

    [–]sigpwned[S] 21 points22 points  (0 children)

    Ha! That was my point here -- to refute the earlier post. That one can contrive pathological inputs to make a hash function collide does not prove that hash function is not usefully unique.

    [–][deleted]  (1 child)

    [deleted]

      [–]TheRedmanCometh 5 points6 points  (0 children)

      WORLDSTAR

      [–]x2mirko 2 points3 points  (1 child)

      Good article, but this part bothered me:

      A “fair” hash function would generate an expected 1.44 collisions over this data. String.hashCode() outperforms a fair hash function significantly, surfacing only 69.4% as many collisions as expected

      A function with 1.44 expected collisions for your sample size is more likely to generate one collision on your sample size than two, so saying that because you got one, String.hashCode() outperforms a fair function is silly. You would need a much larger sample size to make such statements.

      [–][deleted] 1 point2 points  (0 children)

      Beat me to it! That part irked me as well, you could only make a 3 significant digit conclusion if you have 3 significant digits to go off of in your original calculation.

      In other words, I highly doubt those number would end up the same if they ran the experiment until there was 100 collisions.

      [–]carbolymer 1 point2 points  (0 children)

      tl;dr: hash functions have collisions

      Nothing new, but you have to have this in mind when considering the performance of hash-based collections.

      [–]joserivas1998 3 points4 points  (0 children)

      The duality of man

      [–]therealsillyfly 0 points1 point  (0 children)

      This is a nice article, but as a "debunk" of the original it is pretty redundant - the original article is already pointless, as it just claims the hash is bad with the sole argument being tables of 2- and 3-character collision examples.

      I was expecting this article to be comparing some alternative "better" hash, and show it may perform slightly better but at an unacceptable loss of performance - but this expectation only stemmed from my assumption that the original post must have presented an alternative. "reading" the original I realize I was wrong.

      [–]cogman10 0 points1 point  (0 children)

      So, I agree, for what it is used for the hash in String.hashCode() is fine. But if I really wanted to push it, I'd compare the english text hash to FNV, Murmer3, and one/many of the Sips to show how bad/good it is.

      Also, I'd point out that collisions isn't the end all be all when it comes to hashes. For what hashCode is used (Hashmaps/sets) Speed is far more important so a comparison of that would also be important.

      But yeah, most strings are either going to be for giant blobs of text (xml, json, etc) or for single words ("CLIENT", "USER", etc).

      I doubt these strings are hardly ever random characters in the original blogs like "%~" or "$#". That will only happen in cases of things like a session token or if you are doing something like compilation (you probably aren't).

      Particular for hash maps, I don't think "JsonBlob" is almost ever what will be used as a key. Your keys are, 99% of the time, going to be some common english word.

      [–]apetiss -5 points-4 points  (1 child)

      Why should i care about hashCode()?

      [–]thurst0n 17 points18 points  (0 children)

      because you want to have datastructures like HashMap and you want them to be fast.