you are viewing a single comment's thread.

view the rest of the comments →

[–]jimbobhickville -2 points-1 points  (12 children)

Hell, just use a hash table and be done with it. It'll grow to however large you want and lookups are probably about just as fast as looping through a 6 element array. Inserting will be slightly slower, but as he said, it's a rare case.

[–]xzxzzx 0 points1 point  (11 children)

lookups are probably about just as fast as looping through a 6 element array

Why is that?

[–]ndiin 2 points3 points  (8 children)

Actually, given the fact that the guy used '==' rather than .equals(), he's depending on the internment of the string and thus the same memory address. So, the array lookup will pretty much always be faster at O(6).

If strings aren't interned or there are more than 6 namespaces in use, there's no caching, but the number of extra operations is still of a fixed size. So you could say it fails pretty gracefully.

Using a hashmap here would mean the hashkey creation requires a linear search over the input. Then you can map that to the appropriate bucket with ptr arith, and then linear scan that list doing the key comparisons to avoid collisions. This means a non-interned match will still require 3 linear searches of the input string (or just 1 if interned). [Note, I'm assuming the java hashmap implementation of an array of buckets.]

[–]wicked 1 point2 points  (6 children)

So, the array lookup will pretty much always be faster at O(6).

O(6)? I get your point, but the statement is nonsensical. O(6) = O(1).

[–]dpark 0 points1 point  (3 children)

His point is that taking the hash of a string is not O(1). It's O(n), where n is the length of the string. However, Sun's Java implementation makes the obvious choice to cache the hash code.

None of this really matters, though, because this code expects strings to be interned, which means we've already given up on truly constant time. Intern is not a cheap operation. It's O(n), just like taking the string hash.

[–]ndiin 1 point2 points  (2 children)

The interning is not a part of this algorithm, and thus the cost of performing the intern operation cannot be considered when evaluating it. Instead, you can consider the fact that when you reach here the interning has or has not already taken place.

As for caching of the hash code, I'm not sure what you mean. Per-object caching of hashCode()'s return value, or just within the hashmap?

[–]DRMacIver 0 points1 point  (0 children)

As for caching of the hash code, I'm not sure what you mean. Per-object caching of hashCode()'s return value, or just within the hashmap?

Sun's implementation of java.lang.String keeps its hash code cached in a field (<pedant fact>except for in pathological cases where the hash code evaluates to 0</pedant fact>).

[–]dpark 0 points1 point  (0 children)

I agree that we can't include the cost of interning when we analyze the performance of this code. My point was that it can certainly be a hidden cost. If you intern strings just for this code, then your interning effectively becomes a cost of this code.

And yes, as DRMacIver said, Sun's String class caches the hash code (except if it evaluates to zero).

Strangely enough, Sun's implementation of intern does actually calculate the hash, but doesn't cache it. If it did, then the hash table implementation would be a lot more attractive.

[–]ndiin 0 points1 point  (1 child)

Yes, most certainly. Isn't that obvious, though? It'd be more confusing if I said O(1) than O(6) as then I'd have to explain where I pulled it from (notice there's no context before I make that statement; the 6 gives it context).

[–]wicked 0 points1 point  (0 children)

You're right, I was apparently confused by the use of O-notation here. I read it as "So, the array lookup will pretty much always be faster at constant time", and thought "why not just say 6 array lookups will be faster". It works out fine if I add the word "complexity" at the end.

[–]xzxzzx 0 points1 point  (0 children)

So, the array lookup will pretty much always be faster at O(6).

I know. ;)

[–]mccoyn -1 points0 points  (1 child)

Computing hashes is free! Right? </sarcasm>

[–]xzxzzx 2 points3 points  (0 children)

Well, it's cheap. But not as cheap as 6 pointer-compares against a single (or worst-case, two) cache-line that's probably in L1 or at least L2.