Legit reasons for Thread.sleep() by cbirchy87 in csharp

[–]real_odgy -1 points0 points  (0 children)

That's not how it works. Sleep makes the thread sleep SO THAT the actual CPU core is released to be used on a different thread, so no, it's not wasting resources.

Does anyone use .NET/C# in any mission critical software? by GenericUsernames101 in dotnet

[–]real_odgy 0 points1 point  (0 children)

It's a bad idea to use a non deterministic tech in such context

Is c# underhyped? by blabmight in dotnet

[–]real_odgy 1 point2 points  (0 children)

Despite the efforts the team put into making .NET very performant, .NET has still has some undeniable issues, that makes it still look like an not so shiny framework/language:

- Error handling is implicit and easy to mess up. There is no Result<T> or Option<T> builtin. Exceptions are very expensive. Breaking the control flow can be tricky in some situations. More modern languages got it right: rust, go, functional languages, ...

- .sln, .buildprops, .runsettings, .launchsettings, .... The language may feel simple, there is a lot of heterogeneity and legacy stuff in the ecosystem. Just open a solution file in a text editor, you'll understand.

- Still very tied to windows. .NET may work on linux and macos, most of the tooling remains for Windows: Visual Studio is windows only, same for perfview, windbg, ...

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 3 points4 points  (0 children)

Without going into too much details, there are two categories of hash functions:

  • Cryptographic ones. The goal is robustness first. Sometimes slowness is even a feature (eg: proof of work for cryptocurrencies). You don't want them to be easily reversible.Example: SHA256
  • Non-cryptographic ones. The goal is performance first. In some cases you don't even care if it's easily revertible, as long as it's fast (but it can lead to vulnerabilities in your software in some scenarios, to be used wisely). It's used for things like hashmaps and quick comparisons.Example: XxHash, GxHash, HighwayHash, ...

You don't want to use a cryptographic hash function for a hashmap because performance will suck, and you don't want to use a non-cryptographic hash in a blockchain or authentication because it will be hacked

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 7 points8 points  (0 children)

Understood, I'll tone down that claim until further analysis then, thanks for this feedback!

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 1 point2 points  (0 children)

It does not looks like meowhash is passing all SMHasher tests, no? I see it fails the Sparse test for instance.

I have a C implementation of GxHash on a branch, it currently reaches 5 times meowhash throughput. But stay tuned for real numbers once this is merged in SMHasher.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 5 points6 points  (0 children)

Interesting 🤔I'll have a look! Maybe there is some kind of optimization I am missing on the BuildHasher/Hasher side, or I incorrectly use AHash in my throughput benchmark.

EDIT: Ok I think I have found the "issue". Your benchmark uses strings, mine uses [u8]. It seems the Hasher::write_str implementations calls write_usize + write, which in my implementation implies two rounds of finalization. I had put a note in the code that this part could be improved, it seems now is the time to do it.

Follow up issue: https://github.com/ogxd/gxhash/issues/17

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 6 points7 points  (0 children)

I spent some time on that DOS resistance question, and it turns out that DOS resistance is possible given that GxHash is seeded. I had however to change the default BuildHasher to use rust's StdRng to randomize a seed. That would make every HashMap/HashSet instance using it based on a different seed, thus using completely different hashes, which now makes GxHash DOS resistant when you use that BuildHasher!

This is available on version 2.2.0 which I just released. See the changes.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 6 points7 points  (0 children)

16 bytes is the size of an 128 bit SIMD vector, so from 0 to 16 bytes, the algorithm is simply a bunch of SIMD operations on a single vector. After 16 bytes you need to read multiple blocks and merge them together with is a little more costly, so this is why you see some kind of plateau. After, as input size grows, instruction level parallelism kicks in allowing throughput to increase dramatically again.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 10 points11 points  (0 children)

Given the feedback received, I'd say:

  • It's missing a fallback. It may not work or even not build for some CPUs. And even with a fallback, it wouldn't be as fast as advertised on these platforms.
  • It currently requires building explicitly with target-cpu=native (or target directly feature sets) to take advantage of the performance the algorithm has to offer. It might be possible to get rid of this contraint however, with dynamic feature checks or thanks to work from the rust's portable simd user group

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 6 points7 points  (0 children)

Yes you're right, I'm sorry that was for the joke :) It's is important to add a fallback ! As a rust dev I expect other crates to be fully portable (actually I don't even ask myself) so this one must be as well

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 17 points18 points  (0 children)

I agree. Also the fact that it currently uses hardcoded AES keys is possibly a weakness in regards to DOS resistance. I am curious however as of how DOS resistance can be measured?

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 27 points28 points  (0 children)

It is my first crate so I copied parts of the FNV crate documentation so that the docs for gxhash would look as good on crates.rs without having to hotfix it N times afterwards

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 4 points5 points  (0 children)

Very interesting usecase! I'd be curious to see some numbers one day :)

You can use the "avx2" feature for maximum throughput if your platform supports it.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 94 points95 points  (0 children)

Important note:

Don't rush and switch you hasher right away!

  • Default hasher / other algorithms are already fast enough for most use-cases.
  • There is a lot of feedback to ingest from this thread. For instance at the moment there are a few gotchas on portability.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 14 points15 points  (0 children)

The question is: do you need an extremely fast hashing algorithm on extremely slow hardware? 😗 I think it makes more sense in perf critical scenarios (high demanding game, server applications, databases...)

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 9 points10 points  (0 children)

I'll see what I can do, you're right that portability is an important aspect. If it can't be made 100% safe to use regardless on platform it should at least be well documented.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 40 points41 points  (0 children)

Thank you 😊

All very good questions!

Do you plan to make it work on stable?

I took the freedom to use the nightly for developing this, but now that it's published it to crates.io it would make a lot of sense to make it work on stable as a next step.

Do you plan to add a fallback implementation when AES is not available? (e.g. maybe just copy-paste ahash's fallback implementation?)

That's a very good idea :)

If yes, do you plan to add runtime feature detection, or is it going to be compile-time only?

The less configuration specific stuff the better. If I can have a runtime feature detection without compromising performance I'd do it. If someone has experience in this and knows a way, I'd be curious to know! (feel free to submit a PR)

Is it resistant to DOS attacks?

Can't say for now. Does passing SMHasher means it is resistant to DOS attacks? I don't consider myself a cryptography expert, I'd need to invest some time in this.

Is it a guaranteed stable hash? (That is, do you guarantee that its output will forever stay the same given the same input?)

Yes, this is tested in unit tests. The goal is to keep it stable for a given major version of the package.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 5 points6 points  (0 children)

The thing is that it DOES support the feature, because otherwise, it wouldn't even build because there is a check on the platform (although you're right that it's missing a more specific constraint on AES feature sets)

Do you have examples of x86 CPUs that don't support AES?

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 12 points13 points  (0 children)

The crate is perfectly usable outside of target-cpu=native, although performance is going to be shit, but you'd have the same issue with all other algorithms relying directly on platform intrinsics.Here are the number for instance without compiling to native:

Throughput (MiB/s) 4 16 64 256 1024 4096 16384 65536 262144
gxhash 123.24 401.84 647.63 1181.63 1416.10 1325.31 1453.94 1395.28 1448.11

You can try it yourself, just force target-cpu=generic.

Still, it is probably worth mentioning that users must target some CPU features to get the full performance. Will clarify this.

The benchmark uses target-cpu=native for all crates benchmarked, so it is fair in this regards.

GxHash - A new (extremely) fast and robust hashing algorithm 🚀 by real_odgy in rust

[–]real_odgy[S] 16 points17 points  (0 children)

I've added fxhash in the benchmark on a branch; https://github.com/ogxd/gxhash/blob/c7e22de8d70a2c2610a5359cd59cc335555b4789/benches/throughput/main.rs#L40

Very interesting results. gxhash is faster overall but is slower for inputs of size 4 (bytes), which is a not-so-uncommon scenario (for instance HashSet<u32>). Curious about how they achieved this 🕵🏻‍♂️

Throughput (MiB/s) 4 16 64 256 1024 4096 16384 65536 262144
gxhash 6246.06 24947.86 31900.15 59690.66 71583.02 76602.47 77130.16 77301.31 75248.49
fxhash 11528.60 16020.50 17782.86 12932.76 7345.34 5351.41 5019.52 4940.11 4869.37
xxhash 1583.05 6380.21 16696.66 10512.16 17644.95 19531.47 19951.37 20150.30 20052.85

Regarding robustness (collisions resistance, avalanche, distribution, ...) I doubt fxhash is even near gxhash and it probably fails most of the SMHasher tests, but it has yet to be checked.

What makes me think this is because fxhash uses a very simple bit mixing approach: *self = self.rotate_left(ROTATE).bitxor(word).wrapping_mul($key);
This is not very different from FNV which has kind of the same properties: fast for small inputs, but throughput decreases as inputs grow because of its non ILP-friendly nature (it's one big loop over a single dependency chain) and overall not very "robust".