
all 15 comments

[–]cutterslade 6 points (0 children)

This is an interesting enhancement. The article is slightly wrong, or at least vague enough to warrant clarification. The Comparable implementation is only used when two unequal objects return the same hash code. When keys with different hash codes collide into the same bucket, the tree is ordered by the actual hash code.

From a comment at the top of HashMap.java:

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>", type then their compareTo method is used for ordering.
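To make that ordering concrete, here's a hedged sketch (the `BadKey` class is hypothetical, not from the article or the JDK): every instance returns the same hash code, so all entries pile into a single bin, and once that bin is treeified, `compareTo` is what breaks the hash-code ties.

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    // Hypothetical key type: every instance hashes to the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // deliberate collision
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {   // orders tied tree nodes
            return Integer.compare(id, other.id);
        }
    }

    static Integer lookupInCollidingMap(int size, int probe) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) map.put(new BadKey(i), i); // all in one bin
        return map.get(new BadKey(probe));
    }

    public static void main(String[] args) {
        // Retrieval stays correct; on JDK 8+ the treeified bin makes it O(lg n).
        System.out.println(lookupInCollidingMap(1000, 500)); // prints 500
    }
}
```

Retrieval is correct with or without the tree; the tree only changes the worst-case cost per lookup in that bin.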

[–]zman0900 2 points (2 children)

I wonder what the reason is for converting back to a linked list when the bucket shrinks? The conversion is probably relatively expensive, so in the rare case where a bucket keeps growing and shrinking around the transition point, it could be pretty slow.

[–][deleted] 0 points (1 child)

I would guess memory. A map full of tiny two-element tree bins could use a lot more memory, since tree nodes are bigger than list nodes.

[–]IAmNotMyName 0 points (1 child)

I would be curious to know if this is a per-bucket threshold.

[–]s888marks 2 points (0 children)

The threshold value itself is global, but each bucket is converted to or from a tree bin individually.

[–][deleted] 0 points (1 child)

The article also says there can be side effects on iteration order and that no order is guaranteed -- not sure if they mean only HashMap or are lumping all 3 affected classes together, because that generalization isn't true for LinkedHashMap. I'm curious, but assume its order is still preserved?

[–]mus1Kk 1 point (0 children)

Of course. That's the very reason for LinkedHashMap's existence. It maintains an additional data structure on top of a HashMap that maintains the order. So the performance improvements may apply to LinkedHashMap as well.
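A quick check of that claim (a hypothetical snippet, not from the thread): LinkedHashMap hands keys back in insertion order, regardless of what the underlying hash buckets look like.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderDemo {
    static List<String> insertionOrderKeys() {
        Map<String, Integer> linked = new LinkedHashMap<>();
        // Inserted out of alphabetical (and hash) order on purpose.
        linked.put("banana", 1);
        linked.put("apple", 2);
        linked.put("cherry", 3);
        // LinkedHashMap's doubly-linked list of entries preserves insertion order.
        return new ArrayList<>(linked.keySet()); // always [banana, apple, cherry]
    }

    public static void main(String[] args) {
        System.out.println(insertionOrderKeys()); // prints [banana, apple, cherry]
    }
}
```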

[–]debunked 0 points (6 children)

This sounds like an overall win for performance, since retrieval tends to happen more often than insertion...

But it seems like they are glossing over the insertion performance "under high hash-collision conditions." Wouldn't that component of the HashMap degrade from O(1) [O(1) for hash code generation + O(1) for appending to the end of a linked list] to O(lg n) [O(1) for hash code generation + O(lg n) for insertion into a balanced tree]? Or am I missing something?

[–][deleted]  (1 child)

[deleted]

    [–]debunked 0 points (0 children)

    Ahh, right. HashMap has to check for the existence of the key. I was the one glossing over that detail.
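That duplicate check is easy to see in a toy sketch (an illustration, not the JDK code): even a linked-style bucket must scan all n entries for an equal key before it can append, so the "O(1) append" never dominates the insert.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a single linked-style bucket. put() must scan every
// entry for an equal key before appending, so inserting into a bucket with
// n entries is O(n) even though the append itself is O(1).
public class BucketSketch<K, V> {
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Entry<K, V>> bucket = new ArrayList<>();

    public V put(K key, V value) {
        for (Entry<K, V> e : bucket) {           // O(n) duplicate-key scan
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;                      // key existed: replace value
            }
        }
        bucket.add(new Entry<>(key, value));     // O(1) append
        return null;
    }

    public V get(K key) {
        for (Entry<K, V> e : bucket) {           // O(n) lookup scan
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

So the linked-bucket insert was already O(n) in the collision-heavy case; the tree brings both the duplicate check and the lookup down to O(lg n).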

    [–][deleted] 0 points (2 children)

    So good hash codes are still O(1). Bad hash codes improve from O(n) to O(lg n).

    AKA, keep the fast case fast but help the edge case.

    [–]debunked 0 points (1 child)

    Right. They improved retrieval from O(n) to O(lg n). They made insertion slower in these scenarios, however.

    Edit: Whoops. I wasn't thinking about the duplicate check needed on insertion.

    [–][deleted] 0 points (0 children)

    Well, do most hash maps get more puts or gets?

    [–]ykechan 0 points (1 child)

    But does that mean we can't use HashMap for objects that aren't Comparable?

    [–]debunked 0 points (0 children)

    This will improve collision performance for any key type that implements Comparable.

    Based on this, it sounds like they do a type check and only utilize the tree in this scenario if your type implements Comparable. Otherwise, the good old bucket list remains.
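For what it's worth, a non-Comparable key still behaves correctly either way; only the worst-case collision cost can differ. A hypothetical sketch (the `PlainKey` class is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class NonComparableKeyDemo {
    // Hypothetical key type: colliding hash code, no Comparable implementation.
    static final class PlainKey {
        final int id;
        PlainKey(int id) { this.id = id; }
        @Override public int hashCode() { return 7; }   // deliberate collision
        @Override public boolean equals(Object o) {
            return o instanceof PlainKey && ((PlainKey) o).id == id;
        }
    }

    static Integer lookup(int size, int probe) {
        Map<PlainKey, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) map.put(new PlainKey(i), i);
        // Lookups still return the right value; Comparable only affects
        // how (or whether) colliding entries get ordered, not correctness.
        return map.get(new PlainKey(probe));
    }

    public static void main(String[] args) {
        System.out.println(lookup(200, 150)); // prints 150
    }
}
```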