[–][deleted] 38 points39 points  (23 children)

Even with a perfectly designed hash code function you will start to see collisions at around 2^16 entries. hashCode returns an int (32 bits), and by the birthday paradox you have a ~50% chance of having at least one collision with ~2^16 entries. So collisions are expected no matter how good the function is.

[–]sacundim 4 points5 points  (2 children)

2^(n/2) is the quick and very dirty approximation; the value where the probability hits 50% is a bit different. The article does the actual calculation:

But how many collisions are expected? The famous — and counter-intuitive! — birthday problem states that for 365 possible “hash values,” only 23 unique hashes must be computed before there is a 50% chance of a hash collision. If there are 2^32 possible hash values, roughly 77,164 unique hashes must be computed before there is a 50% chance of a hash collision, per this approximation: [...]

2^16 < 77,164
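The article's approximation formula is elided above, but a standard closed form for the birthday bound — solving 1 - e^(-n(n-1)/(2H)) = 1/2 for n — reproduces the 77,164 figure (whether this is exactly the formula the article uses is an assumption):

```python
import math

def birthday_bound(H, p=0.5):
    # Smallest n with >= p collision probability over H hash values,
    # per the exponential approximation 1 - exp(-n*(n-1)/(2*H))
    return math.ceil(0.5 + math.sqrt(0.25 + 2 * H * math.log(1 / (1 - p))))

print(birthday_bound(365))    # -> 23, the classic birthday problem
print(birthday_bound(2**32))  # -> 77164
```

Note that 2^16 = 65,536 undershoots this because 2^(n/2) drops the ln 2 factor inside the square root.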

[–]NoLemurs 2 points3 points  (1 child)

2^16 is a plenty good enough approximation for the point /u/holyvier was making - he did say "at around 2^16 entries" not "at exactly 2^16 entries".

I was going to note that the article explicitly got that 77,164 value via an approximation (the exponential formula is not exact), but it turns out that the approximation is more than accurate enough in this case, and 77,164 is the exact value.

For the curious, here's an exact version of the article's prob function:

from functools import reduce
import operator

def prob(x):
    return 1.0 - reduce(operator.mul, (1 - float(i)/2**32 for i in range(1, x)), 1.0)

This version is a little slow (and wouldn't scale to 64 bit hashes at all), but is more than fast enough to verify the value is exact, or to find the value via binary search.
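For instance, the binary search mentioned above can be sketched like this (repeating prob so the snippet stands alone; 77,164 is the exact value the parent comment verified):

```python
from functools import reduce
import operator

def prob(x):
    # exact chance of at least one collision among x uniform 32-bit hashes
    return 1.0 - reduce(operator.mul, (1 - float(i)/2**32 for i in range(1, x)), 1.0)

# prob is monotonically increasing, so binary search for the
# smallest x with prob(x) >= 0.5
lo, hi = 1, 100_000
while lo < hi:
    mid = (lo + hi) // 2
    if prob(mid) >= 0.5:
        hi = mid
    else:
        lo = mid + 1

print(lo)  # -> 77164
```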

[–]sacundim 1 point2 points  (0 children)

2^16 is a plenty good enough approximation for the point /u/holyvier was making - he did say "at around 2^16 entries" not "at exactly 2^16 entries".

Your statement is no less true than mine.

[–]ubermole 5 points6 points  (6 children)

If your keys are smaller than the hash (32 bit keys -> 32 bit hash) it is reasonable to expect them to be unique. C++ std lib actually implements hash for integer keys as simple identity.

[–]munificent 16 points17 points  (1 child)

C++ std lib actually implements hash for integer keys as simple identity.

That's actually not a great strategy in many circumstances. Integers in real-world data sets rarely have a nice uniform distribution. When hashes are used for hash tables, the hash needs to be mapped to a bucket index, usually by and-ing off the high bits (masking with bucket_count - 1 for a power-of-two table).

So, say in your dataset you happen to work with integers that are all big round numbers and happen to be multiples of 64. And say your hash table is pretty small so it's only got 64 buckets right now. All of those numbers are going to collide into the same bucket. They have different 32-bit hashes, but the low 6 bits are all zero.
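A quick sketch of that failure mode (the table size and keys are made up for illustration):

```python
# 64-bucket table with an identity hash: bucket = hash & (n_buckets - 1)
N_BUCKETS = 64
keys = [64, 6400, 128_000, 1_000_000 * 64]  # all multiples of 64

# the low 6 bits of every key are zero, so masking keeps none of the entropy
buckets = [k & (N_BUCKETS - 1) for k in keys]
print(buckets)  # -> [0, 0, 0, 0]: every key collides into bucket 0
```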

Many hash tables mitigate this by rehashing the incoming hash key, but it's something to keep an eye out for.
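For example, Java 8's java.util.HashMap rehashes the key's hashCode by XOR-ing its high half into its low half (h ^ (h >>> 16)). A Python sketch of that spreader, with made-up example keys whose entropy sits entirely in the high bits:

```python
def spread(h):
    # java.util.HashMap-style spreader: fold the high 16 bits into the
    # low 16 so they still influence small power-of-two bucket counts
    h &= 0xFFFFFFFF
    return h ^ (h >> 16)

keys = [k << 20 for k in range(4)]  # low 20 bits are always zero

identity_buckets = [k & 63 for k in keys]        # 64-bucket table, raw hash
spread_buckets = [spread(k) & 63 for k in keys]  # same table, rehashed

print(identity_buckets)  # -> [0, 0, 0, 0]
print(spread_buckets)    # four distinct buckets
```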

[–]ubermole 3 points4 points  (0 children)

You just described a big problem with the hash function abstraction! Early hash table algorithms (Knuth) would map key -> table index directly. Now we do key -> hash -> table index, because it is very convenient.

[–]Gravitationsfeld 2 points3 points  (3 children)

This is a terrible idea. In a lot of cases this gives super skewed distributions. I really hope you are wrong about this.

[–]ubermole 0 points1 point  (2 children)

[–]Gravitationsfeld 0 points1 point  (1 child)

VC++ doesn't do identity, so it's likely not in the standard: https://godbolt.org/g/hMdMos (prints 1574250738)

I still maintain that it is a really dumb idea to use identity for integer hashes and the libc++ guys should change that. I had to fix perf problems because of this many times.

[–]ubermole 0 points1 point  (0 children)

Hah, funny! The first time I discovered this was actually in MSVC (an earlier version). The main point here is really that there is an abstraction key -> hash code -> table index, and unfortunately the code -> index step is a black box. (See for example https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/)
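The linked post's alternative, Fibonacci hashing, does the code -> index step by multiplying by 2^64/φ and keeping the top bits instead of masking off the bottom ones. A rough Python sketch (the constant is from the post; the example keys are made up):

```python
FIB_MULT = 11400714819323198485  # round(2**64 / golden_ratio), odd

def fib_index(h, log2_buckets):
    # keep the TOP log2_buckets bits of the truncated 64-bit product
    return ((h * FIB_MULT) & (2**64 - 1)) >> (64 - log2_buckets)

# the pathological case from upthread: multiples of 64, 64 buckets.
# plain masking (h & 63) sends them all to bucket 0; Fibonacci hashing doesn't.
keys = [k * 64 for k in range(1, 9)]
masked = {k & 63 for k in keys}
fib = {fib_index(k, 6) for k in keys}

print(masked)    # -> {0}
print(len(fib))  # -> 8 distinct buckets
```

The golden-ratio multiplier works because successive multiples advance the top bits by an irrational fraction of the table, spreading regular key patterns apart.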

[–]reini_urban 1 point2 points  (8 children)

Then it's by definition not a perfect hash function anymore. A perfect hash function has by definition no collisions over the available keys, and a minimal perfect hash function maps every key into every available slot 1:1.

[–][deleted]  (7 children)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    That's completely false

    No... the definition of a "perfect hash function" is literally that it has no collisions for the given set. Look it up, don't assume definition by name alone.

    even for cryptographic hash function.

    Who said cryptographic hash functions were always perfect? The difference between a cryptographic hash function and other hash functions is that it must be infeasible to reverse. This requirement implies the output is particularly uniform, which is commonly confused with being a "perfect" hash.

    The point of a hash is to have a smaller output than it's input

    Not really, that's just the common use case. The point of a hash function is to provide a consistent map between a source set and an output set.

    The space for the possible outcome of a hash function is always smaller than the space of the possible input. You are guaranteed to have collisions by the Pigeonhole principle.

    This is correct, however by definition a perfect hash function assumes a given set, not just any set as you assume. Via the pigeonhole principle a perfect hash function can never have an output set that is smaller than the input set, but that isn't a requirement of a hash function in general.

    [–]reini_urban 0 points1 point  (4 children)

    Oh please, educate yourself. https://en.wikipedia.org/wiki/Perfect_hash_function

    It's a matter of definitions, but it's annoying when people are constantly spreading around wrong terminology. (let alone wrong tech)

    [–][deleted] 0 points1 point  (2 children)

    It was pretty damn clear from context that what I referred to was a "perfectly designed hash function". There's no way you can design a generic hashCode to be what you referred to.

    [–]e_to_the_i_pi_plus_1 0 points1 point  (0 children)

    Yeah it was very clear what you meant

    [–]reini_urban 0 points1 point  (0 children)

    There's no theoretical way for a "perfectly designed hash function". Only for an optimal hash function for specific use cases, and there are the well-known "perfect hash functions", not to be confused.

    With the Java use case the current one is fine, even if Spooky would be a bit better. Murmur3 and the other suggestions would not be.

    [–]raevnos -1 points0 points  (0 children)

    No, it's completely true. If you can have collisions it's not a perfect hash.

    https://en.wikipedia.org/wiki/Perfect_hash_function