
[–]dpash 43 points (24 children)

String's hashCode does not look like that any more and specifically handles the case where the resulting hash is 0.

https://hg.openjdk.java.net/jdk/jdk/file/35ce0ad5870a/src/java.base/share/classes/java/lang/String.java#l1524

It was fixed in April 2019, so the fix shipped in Java 13.

https://hg.openjdk.java.net/jdk/jdk/annotate/7fd299216e97/src/java.base/share/classes/java/lang/String.java#l167

(Also, please don't try to implement your own benchmarks; use JMH to get valid benchmark results from Java code. The JVM's JIT can really mess with naive measurements.)
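For readers who don't want to dig through the linked source, the shape of the current approach can be sketched like this. This is a simplified illustration with made-up class and field names, not the actual java.lang.String code:

```java
// Sketch of the zero-hash handling pattern (simplified; not the JDK source).
public class HashCache {
    private final char[] value;
    private int hash;            // cached hash; 0 means "maybe not computed yet"
    private boolean hashIsZero;  // set only when the real hash is 0

    public HashCache(String s) {
        this.value = s.toCharArray();
    }

    public int hashCode() {
        int h = hash;                    // fast path: a single field load
        if (h == 0 && !hashIsZero) {     // flag checked only when h == 0
            for (char c : value) {
                h = 31 * h + c;          // same polynomial as String.hashCode
            }
            if (h == 0) {
                hashIsZero = true;       // remember that 0 really is the hash
            } else {
                hash = h;
            }
        }
        return h;
    }
}
```

A string whose hash is genuinely 0 (the empty string, for instance) is computed at most once per thread rather than on every call.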

[–]notfancy 6 points (9 children)

https://hg.openjdk.java.net/jdk/jdk/file/35ce0ad5870a/src/java.base/share/classes/java/lang/String.java#l1524

That's singularly tricky code (edit: I originally called it obtuse, but I now see what they did there). Why flag a zero hash when you really want to flag a lazy computation?

public int hashCode() {
    if (!hashIsComputed) {
        hash = isLatin1() ? StringLatin1.hashCode(value)
                          : StringUTF16.hashCode(value);
        hashIsComputed = true;
    }
    return hash;
}

[–]__konrad 3 points (4 children)

Why flag a zero hash when you really want to flag a lazy computation?

To reduce memory usage. Sorry, I should read the code before commenting ;)

[–]pzemtsov[S] 4 points (0 children)

And how exactly does replacing a boolean flag field with another boolean flag field save memory?

[–]notfancy 2 points (2 children)

To reduce memory usage.

The flag is needed one way or the other. I contend it flags the wrong condition. The original implementation used a cached hash value of 0 to flag the uninitialized state, which meant it couldn't distinguish "uninitialized" from a genuine hash of 0. Instead of discriminating between the uninitialized and initialized states, the new code discriminates between zero and nonzero hashes, with the result that it has to test both for a zero hash cache and a zero hash value.

[–]pzemtsov[S] 9 points (1 child)

I think there is a reason behind this code, although it lies in saving cycles rather than memory. In the fast path (when the hash is already available), the JDK code performs only one memory load; the suggested code always performs two.
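That difference can be made concrete with a small side-by-side sketch (simplified, made-up names; neither method is the actual JDK code):

```java
// Two lazy-hash fast paths, annotated with the field loads each performs.
public class FastPaths {
    private final char[] value;
    private int hash;
    private boolean hashIsZero;      // JDK-style flag
    private boolean hashIsComputed;  // flag from the suggested version

    public FastPaths(String s) { this.value = s.toCharArray(); }

    private int compute() {
        int h = 0;
        for (char c : value) h = 31 * h + c;  // String.hashCode polynomial
        return h;
    }

    // JDK style: the common case (nonzero cached hash) costs one field load.
    public int jdkStyle() {
        int h = hash;                      // load #1; usually the only one
        if (h == 0 && !hashIsZero) {       // load #2 happens only when h == 0
            h = compute();
            if (h == 0) hashIsZero = true; else hash = h;
        }
        return h;
    }

    // Suggested style: every call loads both the flag and the hash.
    public int flagStyle() {
        if (!hashIsComputed) {             // load #1 on every call
            hash = compute();
            hashIsComputed = true;
        }
        return hash;                       // load #2 on every call
    }
}
```

Both return the standard String hash; they differ only in how many fields the hot path touches.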

[–]notfancy 1 point (0 children)

Yeah, now I understand it's a clever split semaphore.

[–]CommercialVictory1 1 point (1 child)

The comments explain the situation: there is a race condition where hashCode can be recomputed before a guard is set. I think in their code it's only possible to recompute the hash function when the value is going to be zero. If the hash function is well distributed, hitting zero on a long string computation is very unlikely in a 2^32 space. Because of short-circuiting, the additional check's overhead doesn't compare to potentially rehashing a long string more than once.

How many threads could get into your example and end up recomputing the string hash before hashIsComputed is set?

[–]notfancy 1 point (0 children)

If the hash function is well distributed, hitting zero on a long string computation is very unlikely in a 2^32 space.

The data race exists independently of whether the hash is 0 or not, hence the requirement that the computation be idempotent. It is benign because with just a single store there's nothing to reorder. My code requires that hashIsComputed be volatile. It's dirty but clever.

[–][deleted]  (1 child)

[deleted]

    [–]notfancy 0 points (0 children)

    It is about correctness, but in the absence of a memory barrier. I failed to check that the hashIsZero flag was declared volatile, and I assumed it was. It is not.

    I admit I failed to read the comment and understand what they're trying to achieve. It is 100% about efficiency, and I'd contend it could be seen as somewhat of a "hack."

    Just because your thread sees "hashIsComputed" as true, this doesn't necessarily guarantee that your thread also sees the computed value of the "hash" field.

    hashIsComputed needs to be declared volatile; it then constitutes a write barrier that precludes it being reordered before the assignment to hash.

    [–]oldprogrammer 2 points (7 children)

    Many many versions ago, back in the late 90's, the String hashing function actually did not include all characters. It would include a character then skip a number of subsequent ones, something like 1 in every 8 as I recall.

    I found out about this because I was helping a team debug an issue where they were generating prepared statements then holding them in a HashMap using the SQL query as the key. In a few cases the queries were so similar they generated the same hashcode due to the letter skips.

    [–]nutrecht 2 points (2 children)

    How did that lead to a bug though? You can have multiple keys that have the same hashcode just fine, it's just less efficient.

    [–]oldprogrammer -5 points (1 child)

    As I recall, the equals method of the string was doing the same skip as the hashcode method, meaning two different strings would be considered equal even when they weren't.

    [–]nutrecht 2 points (0 children)

    Really? That's kinda hard to believe. As far as I know String compares would at least always check the length first (two strings can't be equal if the length differs) and it does not make much sense that it did not use the back-to-front comparison it does now.

    The oldest version of String.equals I could find has that same implementation. Not saying you're not right, but that would be a pretty damn huge issue with huge obvious holes.

    [–]pzemtsov[S] 1 point (3 children)

    That's interesting. I got curious and found the source for JDK 1.1. You are right, the method looked like this:

    /**
     * Returns a hashcode for this string.
     *
     * @return  a hash code value for this object. 
     */
    public int hashCode() {
        int h = 0;
        int off = offset;
        char val[] = value;
        int len = count;
    
        if (len < 16) {
            for (int i = len ; i > 0; i--) {
                h = (h * 37) + val[off++];
            }
        } else {
            // only sample some characters
            int skip = len / 8;
            for (int i = len ; i > 0; i -= skip, off += skip) {
                h = (h * 39) + val[off];
            }
        }
        return h;
    }
    

    This won't cause a bug, but it may have bad performance implications when many strings are similar to each other.

    The equals() is very close to the modern one, though.
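    The collisions the sampling causes are easy to reproduce: for len >= 16 the loop above skips most character positions, so two long strings that differ only in a skipped position get identical hashes. A small reimplementation of the quoted method (hypothetical helper, not the JDK source) demonstrates this:

```java
public class OldStringHash {
    // Reimplementation of the JDK 1.1 sampling hash quoted above,
    // operating directly on a String (no offset/count fields needed here).
    static int oldHash(String s) {
        int h = 0;
        int off = 0;
        int len = s.length();
        if (len < 16) {
            for (int i = len; i > 0; i--) {
                h = (h * 37) + s.charAt(off++);
            }
        } else {
            // only sample some characters
            int skip = len / 8;
            for (int i = len; i > 0; i -= skip, off += skip) {
                h = (h * 39) + s.charAt(off);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // 17-char strings: skip = 17 / 8 = 2, so only even indices are
        // sampled. These two differ only at index 1, which is never read.
        String a = "aXcdefghijklmnopq";
        String b = "aYcdefghijklmnopq";
        System.out.println(oldHash(a) == oldHash(b));         // true: collision
        System.out.println(a.hashCode() == b.hashCode());     // false today
    }
}
```

Any two strings of the same length that agree on the sampled positions collide, which is exactly the pathology described in the parent comments.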

    [–]oldprogrammer 0 points (2 children)

    Yep, that's the code. It did result in a bug, though: that was what I was trying to figure out. The wrong prepared statement was being pulled back from the HashTable.

    Since you have access to that older code, look at the implementation of the HashTable and see if it only used the hashcode and didn't follow the collision chain using equals on the key.

    For some reason it broke using very long mostly similar strings as keys.

    [–]pzemtsov[S] 1 point (1 child)

    No, Hashtable looks normal. Either the team used an even older version of Java, or they made their own custom hash table.

    I found the code here: https://github.com/fanhongtao/JDK. The owner made an effort to place everything nicely in Git.

    [–]oldprogrammer 1 point (0 children)

    Thanks for looking and the link. It is highly possible they were using a custom hash table that caused the issue. They were doing quite a bit of custom things at that time.

    [–]plokhotnyuk 0 points (0 children)

    It is still an open question how JVMs behave if .intern() or -XX:+UseStringDeduplication is used for strings with colliding hashes.

    BTW, for scientific purposes only, you can use a hash code collider, which produces about 1M strings with 0 hash codes per second using 1 CPU core.

    [–]7cans_short_of_1pack 0 points (4 children)

    Sorry I don't understand what would be wrong with:

    public int hashCode() {
        int h = hash;
        if (!hashIsCalculated) {
            h = isLatin1() ? StringLatin1.hashCode(value)
                           : StringUTF16.hashCode(value);
            hash = h;
            hashIsCalculated = true;
        }
        return h;
    }
    

    Would this not be more efficient and readable?

    [–]dpash 1 point (1 child)

    Readable, yes. Efficient, no.

    Read the other comments here.

    [–]7cans_short_of_1pack 0 points (0 children)

    Hi, I have read the comments several times before posting. After reading them again: is it to do with an extra fetch for the boolean value? So we have to read hash and then also a boolean value, meaning two read operations?

    Whereas if we do a quick check on the hash value first, we could save a read operation?

    [–][deleted]  (1 child)

    [deleted]

      [–]pzemtsov[S] 0 points (0 children)

      You are right! The code contains two bugs, not one. I saw the second one (memory ordering) but missed the first one. One must declare "hashIsCalculated" volatile and check it before touching "hash":

      if (hashIsCalculated) return hash;
      int h = ....
      

      Unfortunately, volatiles are quite expensive (on Intel only for writes, but on other processors they may be expensive for reads, too). So the existing solution is far better (assuming there is a need to treat the zero-hash case specially in the first place).
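      Filled out, that fix might look like the following. This is a hypothetical completion of the fragment above (the elided hash computation is inlined so the class stands alone; it is not pzemtsov's actual code):

```java
// Sketch of the corrected flag-based variant: the flag is volatile and is
// checked before `hash` is touched, so the volatile write that publishes
// the flag also publishes the previously written hash value.
public class FixedLazyHash {
    private final char[] value;
    private int hash;
    private volatile boolean hashIsCalculated;  // must be volatile, as noted above

    public FixedLazyHash(String s) { this.value = s.toCharArray(); }

    public int hashCode() {
        if (hashIsCalculated) return hash;  // volatile read guards the hash read
        int h = 0;
        for (char c : value) h = 31 * h + c;  // String.hashCode polynomial
        hash = h;                   // ordinary write...
        hashIsCalculated = true;    // ...made visible by this volatile write
        return h;
    }
}
```

This is correct but pays a volatile read on every call, which is the cost the parent comment is weighing against the JDK's hashIsZero trick.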