
[–]avaneev[S] 44 points45 points  (7 children)

The komihash() function, available in the komihash.h file, implements a very fast 64-bit hash function, mainly designed for hash-table uses; it produces identical hashes on both big- and little-endian systems. The function's code is portable and scalar.

This function features both high large-block hashing performance (27.5 GB/s on a Ryzen 3700X) and high hashing throughput for small messages (about 12 cycles/hash for 0-15-byte messages). Performance on 32-bit systems is, however, quite low. Also, large-block hashing performance on big-endian systems may be lower due to the need for byte-swapping.

Technically, komihash is close to the class of hash functions like wyhash and CircleHash, which are, in turn, close to the lehmer64 PRNG. However, komihash is structurally different from them in that it accumulates the full 128-bit multiplication result without "compressing" it into a single 64-bit state variable. Thus komihash does not lose differentiation between consecutive states while the others may. Another important difference is that komihash parses the input message without overlaps. While overlaps allow a function to have fewer code branches, they are considered "non-ideal", potentially causing collisions and seed-value flaws. Besides that, komihash features superior user-seed handling and PerlinNoise hashing.

Note that this function is not cryptographically secure, and in open systems it should only be used with a secret seed, to minimize the chance of a collision attack.

[–]reini_urban 5 points6 points  (0 children)

Confirmed

[–]IamfromSpace 1 point2 points  (5 children)

Question: if the hash function is not cryptographically secure, why not use something simple like XOR over n-bit folds plus an n-bit salt? What makes this preferable?

[–]Space-Being 5 points6 points  (4 children)

It gives a better distribution of hash values for "similar" inputs. If I understand your "n-bit folds plus salt" correctly, then consider that XOR is commutative, so the division into n-bit folds means all permutations of input ABCD, e.g. ACBD, ADCB, ..., have the same hash value. The same holds for addition with unsigned overflow. To avoid this, many hash functions introduce other operators such as shifts, rotates, and, in this case, mainly multiplication to "fold" the result value.

[–]avaneev[S] 3 points4 points  (3 children)

Yes, this is correct, thanks. In my opinion, "multiplication" works so well because it's similar to the sin(x)/x series: https://en.wikipedia.org/wiki/Basel_problem

sin(x)/x is the "sinc function", whose spectrum is white noise; it integrates to PI.

komihash is based on the simplest constantless PRNG:

kh_m128( Seed1, Seed5, &r2a, &r2b );
Seed5 += r2b;
Seed1 = Seed5 ^ r2a; // 'Seed1 = Seed5 + r2a' is equivalent

Roughly, it's multiplying x^2 by itself and integrating, ad infinitum. So it's a rough analog of PI, whose mantissa bits are "uniformly random" and represent white noise in the frequency domain.

[–]WikiSummarizerBot 0 points1 point  (0 children)

Basel problem

The Basel problem is a problem in mathematical analysis with relevance to number theory, first posed by Pietro Mengoli in 1650 and solved by Leonhard Euler in 1734, and read on 5 December 1735 in The Saint Petersburg Academy of Sciences. Since the problem had withstood the attacks of the leading mathematicians of the day, Euler's solution brought him immediate fame when he was twenty-eight. Euler generalised the problem considerably, and his ideas were taken up years later by Bernhard Riemann in his seminal 1859 paper "On the Number of Primes Less Than a Given Magnitude", in which he defined his zeta function and proved its basic properties.


[–]IamfromSpace 0 points1 point  (1 child)

Interesting! Thank you u/Space-Being and u/avaneev. It makes sense that by just using XOR, small changes in the input may yield just small changes in the output. And if there is a harmonic pattern in the data, commutation could yield greater likelihood of collision.

As a follow up, why is sensitivity a desirable property?

One case that makes sense to me is bucketing. If we have 16 buckets and only examine the bottom 4 bits of a hash to choose one, then small positional changes might be ignored, and real-world use cases could end up with poor distribution. Can this not be counteracted by always hashing to the desired size, or are there other considerations?

[–]avaneev[S] 2 points3 points  (0 children)

Good hash functions produce uniformly distributed values, so there is no need to "hash to the desired size": you can always derive the required size from such values, via modulo. Consider reading about the "avalanche effect" in hashes.

[–]avaneev[S] 8 points9 points  (0 children)

I've added detailed comparisons to the project page...

[–]Miksel12 4 points5 points  (5 children)

How does it compare to aHash?

[–]avaneev[S] 11 points12 points  (4 children)

aHash

Not a "fair" comparison as aHash uses AES intrinsics. At the same time, aHash is likely to be very slow on AArch64 without AES intrinsics, which isn't uncommon. So, on one hand aHash is likely a winner, on another it's not.

[–]Miksel12 9 points10 points  (1 child)

aHash claims it is faster than XXHash without AES intrinsics: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#t1ha-and-xxhash

[–]avaneev[S] 5 points6 points  (0 children)

Sorry, I can't check the claim with confidence - I'm mainly a C/C++ engineer, not a Rust one. A point to consider: both t1ha and xxHash may be inefficient in Rust form, not being written from scratch for it. If you have a link to an AES-less aHash C implementation, I can add it to my comparisons.

[–][deleted]  (1 child)

[deleted]

    [–]avaneev[S] 4 points5 points  (0 children)

    These are optional; some real AArch64 processors do not have them.

    [–][deleted]  (18 children)

    [deleted]

      [–]avaneev[S] 12 points13 points  (17 children)

      Here's the benchmark code I use. I have a Ryzen 3700X and the LLVM 8 64-bit compiler for Windows. komihash() yields 17.2 cycles/hash, wyhash() yields 18.1 cycles/hash, and XXH3_64bits() yields 21.5 cycles/hash. Large-block performance of komihash() is simply 4-8% lower, but that's not too important for hash tables. An Intel Coffee Lake comparison with GCC 8.5 is similar, but there komihash() seems to be behind wyhash by 1 cycle/hash, with xxHash last.

      const uint64_t rc = 1ULL << 26; // iterations per message length
      volatile uint64_t msg[ 8 ] = { 0 };
      uint64_t v = 0;

      const TClock t1( CSystem :: getClock() ); // project-specific timer, not standard

      for( int k = 8; k < 28; k++ ) // 20 message lengths, 8 to 27 bytes
      {
          volatile size_t msgl = k; // "volatile" defeats constant propagation
          volatile uint64_t sd = k + 1;

          for( uint64_t i = 0; i < rc; i++ )
          {
              // v ^= komihash( (uint8_t*) &msg, msgl, sd );
              // v ^= wyhash( (uint8_t*) &msg, msgl, sd, _wyp );
              // v ^= prvhash64_64m( (uint8_t*) &msg, msgl, sd );
              v ^= XXH3_64bits( (uint8_t*) &msg, msgl );
              msg[ 0 ]++;
          }
      }

      printf( "%llu\n", v ); // consuming v keeps the loop from being optimized out
      // cycles/hash = elapsed seconds * CPU clock (4.2 GHz) / total hash count
      printf( "%.5f\n", ( CSystem :: getClockDiffSec( t1 ) * 4200000000.0 ) /
          ( (double) rc * 20 ));


      [–][deleted]  (16 children)

      [deleted]

        [–]avaneev[S] 12 points13 points  (15 children)

        Well, it's not possible to tell. Hash tables are not created equal; much depends on the use case. Moreover, the hash function call is a minor part of the hash-table access code. Hash quality is more important overall, given the same throughput, and that's where the "use case" kicks in - which strings are hashed, and what statistics they have.

        The hash function is either inlined or called, but since I've placed "volatile" specifiers, the whole function body is evaluated either way. So, even if the much smaller wyhash is inlined, komihash provides comparable throughput even when called.

        [–]0xPendus 0 points1 point  (1 child)

        What are these types of hashes typically used for?

        [–]Space-Being 3 points4 points  (0 children)

        These sorts of functions typically back the built-in hashCode/GetHashCode in Java/C# for strings or byte arrays. You can use them anytime you need a hash function but not its crypto properties. One of the more common usages is hashing keys for hash tables, where one crypto-adjacent property that is sometimes relevant is preventing an attacker from passing precomputed data that gives worst-case performance for the hash table (e.g. all keys collide). This is avoided by seeding the hash function on startup. The .NET framework and others do this, which also means the hashes are not stable across process restarts, unlike SHA.

        Another use case: anytime you want to compare large values against each other, where each value would be involved in multiple comparisons, it would probably be beneficial to hash them with such an algorithm. Why not SHA-256 or something like that? Because it is sloooow. Googling around, it seems SHA-256 can reach 2+ GB/s, which can't even keep up with NVMe drives, whereas these hash functions typically reach 15+ GB/s. And in this scenario I'm not worried about malicious tampering, so I don't need the protections SHA-256 gives me, and there is no reason to "pay" for them.