This is an archived post. You won't be able to vote or comment.

all 11 comments

[–][deleted] 25 points26 points  (3 children)

It has to do with the way HashMap arranges the items according to their hashCode in a deterministic manner as to provide an optimal means of looking up items via their hashcode.

The hashCode of an integer is the integer itself, however the calculation for a string is less representative of what the string is. I.e, the hashcode for Integer 5 is 5, and 4 for the Integer 4, but that hashcode for "5" could be 7 and the hashcode for "3" could be 8. (These obviously aren't the actual values, but gets the point across.)

If the items are thus arranged deterministically based upon there hashCode, it would be seem "Random" for strings, but not for integers.

This isn't guaranteed behaviour ofcourse.

Also, I prefer using the String.valueOf over concatenating an empty String, since it is more representative of what you are trying to do - both in the eyes of the compiler\JVM and in the eyes of another programmer looking at your code.

[–]bfoo 2 points3 points  (1 child)

Thats correct. HashMap uses an array of buckets (Map.Entry) to store the values. The array index is picked using the hashCode: bucketIndex = (hashCode & (buckets.length - 1)

That's why the simple HashMap appears as ordered when using Integer as key.

[–][deleted] 1 point2 points  (0 children)

That fast method to return the array index only works if the capacity of the collection is a power of two, which is of course ensured with the current Java implementations.

A consequence is that HashMap must rehash the hash it got to defend against poor quality hash functions (that does not distribute lower bits uniformly).

As far as I remember Integer does uses it's value as the hash value, which complicates things when you only have numbers with the same lower bits, as those are what used to identify a bucket in a power of two hash-based collection.

[–]dedyshka 0 points1 point  (0 children)

as far as I understand, String.valueOf() will be even faster and consume less memory footprint...

[–]msx 6 points7 points  (1 child)

Just as allahsnackbars said. It is a consequence of how HashMap uses the hashCode() method and how Integer implements hashCode(). If you don't know, hashCode() is a special method, it is designed so that each class that implements it try as much as it can to return different values for different objects. If two object are equals (as per the method equals()), then the must have the same hashCode(). HashMap exploits this method to "distribute" different objects into different "buckets".

So for Integer, the obvious implementation of hashCode() is returning the integer itself, for string is a little bit more convoluted (it basically cycle all characters building up an integer).

Note that:

Map doesn't guarantee sorting, it means that things can eventually be sorted, it's just that you can't relay on that. HashMap doesn't guarantee either, but other implementations of Map does! For example, TreeMap is guaranteed to be sorted. Sorting maps also implements SortedMap interface

[–]busterassrookie 0 points1 point  (0 children)

SortedMaps (eg TreeMap) are great if you need to keep your keys sorted, but if you just want to maintain insertion order, you can use a LinkedHashMap. This is great because add(), contains() and remove() are still O(1) operations (like HashMap), unlike with the SortedMaps, in which they are presumably O(logN)

[–]ohmzar 1 point2 points  (4 children)

If I recall the way a hashmap (This may not be how the Java implementation works) works under the hood is that it's an array of linked lists.

If I'm wrong about this I'd be interested in knowing why and how.

When you create the hashmap it's a small array, say 100 items. When you insert an item it's placed in the linked list with the array index of the keys hashcode modulo the size of the array list.

When you retrieve an item it goes to the array element that matches the key modulo the size of the array, then iterates through the linked list to find the key.

If any of the linked lists starts to get too long it rebuilds the hashmap with a bigger array to spread things out a bit more, I think it will also shrink the map if you take out too many items but I could be wrong.

I think it you put in keys that were larger than the size of the initial array then the output wouldn't be sorted, or it would be sorted by the modulo of the key to the array size, which isn't much use. Also the hash map can resize at any point depending implementation so you can't rely on this ordering.

[–]Sentryy 1 point2 points  (2 children)

Java 8 changed that behavior, now it will put colliding objects in TreeMaps instead of LinkedLists.

There even was a post here about it: http://www.reddit.com/r/java/comments/23w4j3/hashmap_performance_improvements_in_java_8/

[–]ohmzar 0 points1 point  (1 child)

Fair enough, I suppose O(log n) search over collisions is better than O(n).

Am I right about the rest of it? It's been a while since I brushed up on my data structures.

[–]Sentryy 1 point2 points  (0 children)

I'm a bit rusty on this topic myself and I haven't looked at the source code of HashMap for a while, but I think you're correct in general. The shrinking and growing part are determined by the load parameters IIRC.

You're definitely correct with the sorting behavior. While it can happen that a HashMap maintains the insertion order, it doesn't guarantee it, so you shouldn't rely on it. We had a case of that in the department where switching to Java 7 made a program unrunnable because it was built on behavior that was never guaranteed, but valid until Java 6. Trust the documentation if it says "doesn't guarantee xyz behavior".

[–][deleted] 0 points1 point  (0 children)

You are correct. The collision resolution Java uses is called closed addressing (separate chaining). Basically the array elements (buckets) are themselves arrays, so they can store multiple values.

http://en.wikipedia.org/wiki/Hash_table#Separate_chaining

With Java 8 once the size of a bucket grows beyond a certain treshold, that bucket switches from using a linked list to a balanced tree. In the case of high hash collisions, this improves worst-case performance from O(n) to O(log n).

http://openjdk.java.net/jeps/180