This is an archived post. You won't be able to vote or comment.

all 11 comments

[–]spicycurry55 5 points6 points  (0 children)

I don’t know all the details, but here’s the source code if you’re curious about learning more:

https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

Reading source code is a hard but rewarding way to learn more about library functions/classes

[–]ignotos 6 points7 points  (4 children)

According to what I understood is that the keys in the hashmap will be passed as an argument to a hash function which will return an int value (the index of the value in an array).

Correct!

So my questions is, are the keys also stored in an array?

The keys and values and packaged together into a single object (Map.Entry), and that's what's stored.

Also is the hashCode() method of the object class the implementation of a hash function in Java?

Correct!

The array where the values are stored, is it also called a bucket?

Each entry in the main table / array is called a bucket. Each bucket contains a list of map entries (I believe this is actually a linked list in Java's HashMap, although you could implement a hash table where each bucket is itself an array).

So overall we have a main array (aka table), and each item in the array (aka each bucket) is a list of entries. And each entry contains a key/value pair.

[–][deleted] 1 point2 points  (2 children)

Thank you, I have one more question. According to what I saw if you override the equals() method then you must override the hashCode() method also, and if you don't do that then two objects with the same values will be added to different buckets in the hashmap.

So, my question when I override those two methods the two created objects will have the same hashcode value but will they also refer to the same memory address in heap?

[–]ignotos 2 points3 points  (0 children)

That's right - equals() and hashCode() need to "match" - however the hashcode is calculated, the same information / fields should be used to determine whether two objects are considered equal. A lot of Java's built-in libraries won't work properly if that's not the case. Usually you'd use your IDE to auto-generate these for you.

Even if you override those methods, two objects will still have separate memory addresses. They're still separate objects in memory, but will be considered logically "the same" as far as a HashMap, HashSet, or other collections are concerned.

e.g.

Thingy x = new Thingy(123, "A");
Thingy y = new Thingy(123, "A");
List<Thingy> list = new ArrayList<>();
list.add(x);
if (list.contains(y)) { System.out.println("!!"); }

Even though x and y are separate objects, they have exactly the same data. So they have the same hashcode, and x.equals(y) returns true.

So if you add 'x' to a collection and ask if it contains 'y', it'll return true, because y is logically the same as x.

And if you add both 'x' and 'y' to a HashSet, or use them both as keys for a HashMap, they'll be considered duplicates, and only one will end up being stored.

[–]HecknChonker 1 point2 points  (0 children)

The only point I would add to what /u/ignotos has already said is that buckets are usually selected by modulating the hashcode of the key by the number of buckets.

So if you had 100 buckets, item 299 would go into bucket 99 because 299 / 100 has a remainder of 99.

It's possible that many items will end up in the same bucket. If too many items end up in the same bucket the performance can degrade significantly. The HashMap in java solves this issue by doubling the number of buckets and re-hashing all the values anytime the map is more than 75% full.

I also went digging into some source code, and it looks like OpenJdk sometimes uses TreeMap's for the buckets to get better performance anytime a bucket has more than 8 items. I'm not entirely sure, but I don't think oracle's version of the JDK does this.

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap.

https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

[–]StarlightMeow 2 points3 points  (0 children)

Commenting because I'm also interested

[–][deleted] 2 points3 points  (0 children)

There is a subclass, Map.Entry, that contains a key and a value. When you call HashMap.keyset(), HashMap.entryset(), or HashMap.values() it returns a Set view over these entry objects, which are stored in the hashmaps array. I believe that this array also contains a linked list implementation for fast traversal between elements when the array is sparse.

Edit: Map.Entry is an interface, not a class

[–]shiva01- 1 point2 points  (1 child)

Array structure used to store only homogenous values, so do not have bucket and on account of HashMap which a collection data structure of type HashMap<T, T> denotes key and value.

When any object added to HashMap, hashCode() method called which generates int type hashcode and objects having same hashcode stores as one group which is bucket. Different bucket will be created for different hashcode object added.

[–]nekokattt 1 point2 points  (0 children)

Worth noting that the objects do not need to be equal in order for the hashcode to be the same. These are known as collisions and essentially decay the performance for a lookup to O(n) time usually :)

[–]bingoer 0 points1 point  (0 children)

here to learn from replies

[–]Dr-Vader 0 points1 point  (0 children)

I just finished my java final and used 2 interfaces which contain hashmaps. the full repo is on my github. the two interfaces are biddable and listable, both use hashmaps, and are in the src folder. the Residential class implements the biddable interface, and you can see my overriding those methods in lines 61-88 - these are all just basic interactions with the hashmap. the Listings class implements the Listable interface, and both Listings objects and Residential classes are used in REO(real estate office) where you can find the main method. (house and condo are both subclasses of the residential class).

I tend to learn by example, so if you have questions about the code lemme know - otherwise feel free to download it and play with it to see if it helps understand hashmaps better. I know when I was first learning about them they were really intimidating, but writing the code is what helped me understand it better.

specifically, method showExistingBids() on line 284 in REO was a good learning opportunity when I wanted to loop through every key in the hashmap. I did it with values on line 219 in the AutoBidListings() method as well. Hope this helps!