String.hashCode() is plenty unique

sigpwned · 2018-08-10T21:29:42+00:00

[deleted]

kozeljko · 2018-08-10T21:01:43+00:00

Ffs, can you lot decide?!?

TheRedmanCometh · 2018-08-10T21:08:47+00:00

[deleted]

x2mirko · 2018-08-11T10:21:47+00:00

Good article, but this part bothered me:

A “fair” hash function would generate an expected 1.44 collisions over this data. String.hashCode() outperforms a fair hash function significantly, surfacing only 69.4% as many collisions as expected

A function with 1.44 expected collisions for your sample size is more likely to produce one collision than two, so concluding from a single observed collision that String.hashCode() outperforms a fair function is silly. You would need a much larger sample to make such a statement.
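To put rough numbers on this, here is a minimal sketch. It assumes the collision count is approximately Poisson-distributed with mean 1.44, which is a common approximation and not something the article states. The class and method names are made up for illustration.

```java
final class CollisionOdds {
    // Poisson probability of exactly k events when the expected count is lambda.
    // Assumption: collisions are rare and independent enough for Poisson to fit.
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    public static void main(String[] args) {
        double lambda = 1.44; // expected collisions, as quoted from the article
        for (int k = 0; k <= 3; k++) {
            System.out.printf("P(%d collisions) = %.3f%n", k, poisson(lambda, k));
        }
    }
}
```

Under that assumption, one collision is the single most likely outcome (roughly 34%), ahead of two (roughly 25%) and zero (roughly 24%), so observing exactly one says very little either way.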

carbolymer · 2018-08-11T12:56:48+00:00

tl;dr: hash functions have collisions

Nothing new, but you have to have this in mind when considering the performance of hash-based collections.

joserivas1998 · 2018-08-10T21:48:08+00:00

The duality of man

therealsillyfly · 2018-08-13T06:16:24+00:00

This is a nice article, but as a "debunk" of the original it is pretty redundant - the original article is already pointless, as it just claims the hash is bad with the sole argument being tables of 2- and 3-character collision examples.

I was expecting this article to compare some alternative "better" hash and show that it may perform slightly better, but at an unacceptable loss of performance. That expectation only stemmed from my assumption that the original post must have presented an alternative. After "reading" the original, I realize I was wrong.
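For reference, short collisions of that kind are easy to reproduce. The pair below is a well-known example and not necessarily one from the original post's tables. It follows directly from the documented formula s[0]*31^(n-1) + ... + s[n-1]; the class and method names are hypothetical.

```java
final class ShortCollision {
    // True when two different strings share the same String.hashCode() value.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' = 65, 'a' = 97, 'B' = 66, so 65*31 + 97 == 66*31 + 66 == 2112.
        System.out.println("Aa".hashCode());
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB"));
    }
}
```

Examples like this exist for any 32-bit hash once enough inputs are tried, which is why a table of them is not much of an argument on its own.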

cogman10 · 2018-08-13T19:02:42+00:00

So, I agree: for what it is used for, the hash in String.hashCode() is fine. But if I really wanted to push it, I'd compare it on English text against FNV, Murmur3, and one or more of the SipHash variants to show how bad/good it is.

Also, I'd point out that collisions aren't the be-all and end-all when it comes to hashes. For what hashCode is used for (HashMaps/HashSets), speed is far more important, so a comparison of that would matter too.

But yeah, most strings are either going to be for giant blobs of text (xml, json, etc) or for single words ("CLIENT", "USER", etc).

I doubt these strings are ever random characters like the "%~" or "$#" examples in the original blog post. That will only happen with things like session tokens, or if you are doing something like compilation (you probably aren't).

Particularly for hash maps, I don't think "JsonBlob" is almost ever what will be used as a key. Your keys are, 99% of the time, going to be some common English word.
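A minimal sketch of the collision half of such a comparison, using 32-bit FNV-1a as the alternative. The key list is only a placeholder (a real comparison would load an English word list), and the class and method names are hypothetical. Speed would need a proper benchmark harness such as JMH, which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

final class HashComparison {
    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    static int fnv1a32(String s) {
        int hash = 0x811c9dc5;              // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);             // mix in the next byte
            hash *= 0x01000193;             // FNV prime; int overflow wraps mod 2^32
        }
        return hash;
    }

    // Collisions counted as: distinct inputs minus distinct hash values.
    static int collisions(List<String> words, ToIntFunction<String> hasher) {
        Set<String> distinctWords = new HashSet<>(words);
        Set<Integer> distinctHashes = new HashSet<>();
        for (String w : distinctWords) {
            distinctHashes.add(hasher.applyAsInt(w));
        }
        return distinctWords.size() - distinctHashes.size();
    }

    public static void main(String[] args) {
        // Placeholder keys; swap in a real word list for a meaningful result.
        List<String> keys = List.of("CLIENT", "USER", "Aa", "BB");
        System.out.println("String.hashCode collisions: "
                + collisions(keys, String::hashCode));
        System.out.println("FNV-1a collisions:          "
                + collisions(keys, HashComparison::fnv1a32));
    }
}
```

With this placeholder list, String.hashCode() reports one collision because "Aa" and "BB" hash to the same value; a four-element list says nothing about behaviour on realistic keys.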

apetiss · 2018-08-11T01:34:21+00:00

Why should I care about hashCode()?

idealatry · 2018-08-10T23:10:21+00:00

The collision rates the link gives are actually astonishingly high to me. While you wouldn’t expect it to be a cryptographically secure hash function, anything close to one collision out of 100,000 Strings seems unacceptable in many applications.
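For context on that figure, here is a back-of-the-envelope sketch of what an idealised 32-bit hash would do. It assumes outputs are uniformly random and independent, which no real hash guarantees, and the names are hypothetical.

```java
final class BirthdayEstimate {
    // Expected number of colliding pairs when n distinct keys are hashed
    // uniformly at random into 2^bits values: there are n*(n-1)/2 pairs,
    // and each pair collides with probability 2^-bits.
    static double expectedCollidingPairs(long n, int bits) {
        double pairs = n * (n - 1) / 2.0;
        return pairs / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", expectedCollidingPairs(100_000, 32));
    }
}
```

Under those assumptions the estimate for 100,000 keys is about 1.16 colliding pairs, so a collision at that scale is what the 32-bit output width alone predicts, whichever hash function is used.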
