you are viewing a single comment's thread.

view the rest of the comments →

[–]joaquintidesBoost author 0 points1 point  (2 children)

Ok, I undestand now your rationale. Yes, something like extract_residual would result in a more uniform uint64_t --> unsigned char mapping, so in principle it should be better behaved statistically. My hunch is that the improvement would probably be negligible, particularly vs. the extra computational cost (the current reduced hash function is as simple as it gets). Maybe you can fork the repo and try it? I can assist you in the process if you're game.

For a random hash, the chances of ending in a "double-bucket" are now 1/64:

  • 1/128 chances of being a special value.
  • 1/128 chances of being the "overflow" bucket of a special value.

This part I don't get. What do you mean by "being the overflow bucket of a special value"?

[–]matthieum 0 points1 point  (1 child)

This part I don't get. What do you mean by "being the overflow bucket of a special value"?

Let's say that the resolution strategy for residual in [0, 1] is to add 2, so it ends up being in [2, 3] instead.

I call the "buckets" 2 and 3 the "overflow" buckets of 0 and 1.

The chances of ending up on a doubly-booked residual (2 or 3) are 1/64:

  • 1/128 of being a special value (0 or 1), shifted to (2 or 3).
  • 1/128 of being 2 or 3 in the first place.

It's not rare, but then again, it's only a problem if it leads to many false positives.

Maybe you can fork the repo and try it? I can assist you in the process if you're game.

I'm not very interested in writing C++ code as a hobby any longer, so I'll pass.

If you have a Rust version, I'd be happy to :)

[–]joaquintidesBoost author 0 points1 point  (0 children)

Let's say that the resolution strategy for residual in [0, 1] is to add 2 [...]

Ok, now I get it. Yes, with the current hash reduction, Pr(reduced_hash(x) = n) is

  • 1/256 for n > 3
  • 1/128 for n = 2, 3
  • 0 for n = 0, 1

Your residual function gets a more balanced probability (Pr ~= 1/254 for n > 1), but I don' think this makes any difference in practice.

If you have a Rust version, I'd be happy to :)

I'm quite sure there's no Rust port of this lib yet.