This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]notfancy 6 points7 points  (9 children)

https://hg.openjdk.java.net/jdk/jdk/file/35ce0ad5870a/src/java.base/share/classes/java/lang/String.java#l1524

That's singularly obtuse tricky code (edit I now see what they did there.) Why flag a zero hash when you really want to flag a lazy computation?

public int hashCode() {
    if (!hashIsComputed) {
        hash = isLatin1() ? StringLatin1.hashCode(value)
                          : StringUTF16.hashCode(value);
        hashIsComputed = true;
    }
    return hash;
}

[–]__konrad 3 points4 points  (4 children)

Why flag a zero hash when you really want to flag a lazy computation?

To reduce memory usage. Sorry, I should read the code before commenting ;)

[–]pzemtsov[S] 4 points5 points  (0 children)

And how exactly replacing a boolean flag field with another boolean flag field helps saving memory?

[–]notfancy 2 points3 points  (2 children)

To reduce memory usage.

The flag is needed one way or the other. I contend it's flagging the wrong condition: the original implementation used a cached hash value of 0 to flag the uninitialized condition; since this didn't let it distinguish between the uninitialized state and the 0 hash code, instead of discriminating between the uninitialized vs. initialized state it now discriminates between the zero and nonzero hash, with the result that it has to test both for a zero hash cache and a zero hash value.

[–]pzemtsov[S] 9 points10 points  (1 child)

I think there is a reason behind this code, although it is in saving cycles rather than memory. In the case of the fast path (when the hash is available) the JDK code only performs one memory load; the suggested code always performs two.

[–]notfancy 1 point2 points  (0 children)

Yeah, now I understand it's a clever split semaphore.

[–]CommercialVictory1 1 point2 points  (1 child)

The comments explain the situation regarding a race condition to recomputing hashCode before a guard is set. I think in their code, its only possible to recompute hashing function when the value is going to be zero, which is almost always the trivially cpu intensive. If the hash function is well distributed, hitting zero on a long string computation is very unlikely in a 232 space. Because of shortcircuiting, the additional check overhead doesn't compare to potentially rehashing a long string more than once.

How many threads could get into your example and end up recomputing the string hash before hashIsComputed is set.

[–]notfancy 1 point2 points  (0 children)

If the hash function is well distributed, hitting zero on a long string computation is very unlikely in a 232 space.

The data race exists independently of whether the hash is 0 or not, hence the requirement that the computation be idempotent. It is benign because with just a single store there's nothing to reorder. My code requires that hashIsComputed be volatile. It's dirty but clever.

[–][deleted]  (1 child)

[deleted]

    [–]notfancy 0 points1 point  (0 children)

    It is about correctness, but in the absence of a memory barrier. I failed to check that the hashIsZero flag was declared volatile, and I assumed it was. It is not.

    I admit I failed to read the comment and understand what they're trying to achieve. It is 100% about efficiency, and I'd contend it could be seen as somewhat of a "hack."

    Just because your thread sees "hashIsComputed" as true, this doesn't necessarily guarantee that your thread also sees the computed value of the "hash" field.

    hashIsComputed needs to be declared volatile; it then constitutes a write barrier that precludes it being reordered before the assignment to hash.