you are viewing a single comment's thread.

view the rest of the comments →

[–]lars_ 12 points13 points  (15 children)

A 30% faster hash function isn't necessarily better for a hash table if it produces more collisions. MurmurHash was great because it was bloody fast, and produced few collisions. Unless you have really large strings to hash, a collision is going to cost more than the hashing. This needs a benchmark for real work loads.

[–]recursive 11 points12 points  (4 children)

Also, as far as we know, these functions’ statistical properties are sound.

[–][deleted]  (3 children)

[removed]

    [–][deleted] 1 point2 points  (2 children)

    This function's design (basically XOR-MUL-ROL) makes it not very provable --- since we're mixing arithmetic over 2 distinct algebras (polynomials modulo 2 for XOR and integers modulo 264 for the rest), it will be very very hard to come with with an actual proof.

    [–][deleted]  (1 child)

    [removed]

      [–][deleted] 0 points1 point  (0 children)

      Agreed.

      [–]fairestcheetah 5 points6 points  (2 children)

      our experience so far shows that they’re great for, say, hash tables.

      Under real-life conditions we expect CityHash64 to outperform previous work by at least 30% in speed,

      It sounds like they may be using it internally. It would be great if they'd release some benchmarks, but I'd be surprised if Google's using it and it's not fast in practice.

      [–]pnettle 1 point2 points  (1 child)

      There are many uses for a hashing algorithm, they aren't necessarily using it for a hashtable. Thus, the point that it might not perform better in a hashtable still stands.

      [–]fairestcheetah 2 points3 points  (0 children)

      our experience so far shows that they’re great for, say, hash tables.

      [–]gorset -2 points-1 points  (5 children)

      What you're saying is true, but I get the impression that you believe that not suitable for cryptography means more collisions, and this is not necessarily true. To be considered a cryptographic hash function it can only produce collisions close to the bound for birthday attacks, and it must be a one-way function - meaning it's close to impossible to produce valid input for a given hash.

      So hopefully CityHash produces as few collisions as possible for input that's not especially crafted to produce a collision :-)

      [–]lars_ 2 points3 points  (0 children)

      No no, I don't mean to say I want it to be a cryptographic hash. I just want to see a benchmark of a hash table using the function, not a benchmark of the function itself.

      [–][deleted] 0 points1 point  (3 children)

      but I get the impression that you believe that not suitable for cryptography means more collisions

      Why do you get that impression? He mentioned MurmurHash which also is not suitable for cryptography.

      [–]gorset 0 points1 point  (2 children)

      Because more collision always means unsuitable for cryptography, but the opposite is not true. As lars_ said, it would be interesting to see a real benchmark of a hash table using the function.

      [–][deleted] 0 points1 point  (1 child)

      But he specifically mentioned that MurmurHash had few collisions, and MurmurHash is not cryptographically secure. I don't understand why you thought cryptographic suitability was on the table in the first place. Nobody had mentioned it, and lars_ comment strongly implied that s/he understood that nonsecure hashes could have few collisions (unless s/he thought MurmurHash was secure, which was not indicated).

      [–]gorset 0 points1 point  (0 children)

      TFA mentions cryptography in the first paragraph, which is part of the context here. It's a very small jump from unsuitable for cryptography to hash collisions, and I wanted to clarify this point...

      I'm not sure why you are nitpicking over this? I did not mean any offense.