How does HashMap Works Internally?

Table of contents

Reading Time: 7 minutes

What is HashMap?

Basically, HashMap is one of the most popular Collection classes in java. HashMap internally uses HashTable implementation. This HashMap class extends AbstractMap class that implements the Map interface.

Few important points about HashMap :

HashMap uses its static inner class Node<K,V> for storing the entries into the map.
HashMap allows at most one null key and multiple null values.
The HashMap class does not preserve the order of insertion of entries into the map.
HashMap has multiple buckets or bins which contain a head reference to a singly linked list. That means there would be as many linked lists as there are buckets. Initially, it has a bucket size of 16 which grows to 32 when the number of entries in the map crosses the 75%. (That means after inserting in 12 buckets bucket size becomes 32)
HashMap is almost similar to Hashtable except that it’s unsynchronized and allows at max one null key and multiple null values.
HashMap uses hashCode() and equals() methods on keys for the get and put operations. So HashMap key objects should provide a good implementation of these methods.
That’s why the Wrapper classes like Integer and String classes are a good choice for keys for HashMap as they are immutable and their object state won’t change over the course of the execution of the program.

Internal Working Of Hashmap:

HashMap stores the data in the form of key-value pairs. Each key-value pair is stored in an object of Entry<K, V> class. Entry<K, V> class is the static inner class of HashMap which is defined like below.

static class Entry<K,V> implements Map.Entry<K,V> 
{
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
 
        //Some methods are defined here
}

As you see, this inner class has four fields. key, value, next, and hash.

key: It stores the key of an element and its final.

value: It holds the value of an element.

next: It holds the pointer to the next key-value pair. This attribute makes the key-value pairs stored as a linked list.

hash: It holds the hashcode of the key.

These Entry objects are stored in an array called table[]. This array is initially of size 16. It is defined like below.

/* The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table;

To summarize the whole HashMap structure, each key-value pair is stored in an object of Entry<K, V> class. This class has an attribute called next which holds the pointer to next key-value pair. This makes the key-value pairs stored as a linked list. All these Entry<K, V> objects are stored in an array called table[]. The below image best describes the HashMap structure.

Hashmap Internal Structure

What Is Hashing?

The whole HashMap data structure is based on the principle of Hashing. Hashing is nothing but the function or algorithm or method which when applied on any object/variable returns a unique integer value representing that object/variable. This unique integer value is called hash code. Hash function or simply hash is said to be the best if it returns the same hash code each time it is called on the same object. Two objects can have the same hash code.

Whenever you insert new key-value pair using the put() method, HashMap blindly doesn’t allocate a slot in the table[] array. Instead, it calls a hash function on the key. HashMap has its own hash function to calculate the hash code of the key. This function is implemented so that it overcomes poorly implemented hashCode() methods. Below is the implementation code of hash().

/**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }
 
        h ^= k.hashCode();
 
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

After calculating the hash code of the key, it calls indexFor() method by passing the hash code of the key and length of the table[] array. This method returns the index in the table[] array for that particular key-value pair.

     /*
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

Note: To have a high-performance hashMap we need good implementation of hashCode() and equals() method along with hash function.

hashCode():

The hash function is a function that maps a key to an index in the hash table. It obtains an index from a key and uses that index to retrieve the value for a key.

A hash function first converts a search key (object) to an integer value (known as hash code) and then compresses the hash code into an index to the hash table.

The Object class (root class) of Java provides a hashCode method that other classes need to override. hashCode() method is used to retrieve the hash code of an object. It returns an integer hash code by default that is a memory address (memory reference) of an object.

The general syntax for the hashCode() method is as follows:

public native hashCode()

The general syntax to call hashCode() method is as follows:

int h = key.hashCode();

The value obtained from the hashCode() method is used as the bucket index number. The bucket index number is the address of the entry (element) inside the map. If the key is null then the hash value returned by the hashCode() will be 0.

equals():

The equals() method is a method of the Object class that is used to check the equality of two objects. HashMap uses the equals() method to compare Keys to whether they are equal or not.

The equals() method of the Object class can be overridden. If we override the equals() method, it is mandatory to override the hashCode() method.

Put Operation: How put() method of Hashmap works internally in Java?

The put() method of HashMap is used to store the key-value pairs. The syntax of the put() method to add key/value pair is as follows:

hashmap.put(key, value);

Let’s take an example where we will insert three (Key, Value) pairs in the HashMap.

HashMap<String, Integer> hmap = new HashMap<>();  
 
 hmap.put("John", 20);  
 hmap.put("Harry", 5);  
 hmap.put("Deep", 10);

Let’s understand at which index the key-value pairs will be stored into HashMap.

When we call the put() method to add a key-value pair to hashmap, HashMap calculates a hash value or hash code of key by calling its hashCode() method. HashMap uses that code to calculate the bucket index in which key/value pair will be placed.

The formula for calculating the index of the bucket (where n is the size of an array of the bucket) is given below:

Index = hashCode(key) & (n-1);

Suppose the hash code value for “John” is 2657860. Then the index value for “John” is:

Index = 2657860 & (16-1) = 4

The value 4 is the computed index value where the key and value will be stored in HashMap.

Note: Since HashMap allows only one null Key, the hash value returned by the hashCode(key) method will be 0 because the hashcode for null is always 0. The 0th bucket location will be used to place key/value pair.

How is Hash Collision occurred and resolved?

A hash collision occurs when the hashCode() method generates the same index value for two or more keys in the hash table. To overcome this issue, HashMap uses the technique of linked-list.

When hashCode() method produces the same index value for a new Key and the Key that already exists in the hash table, HashMap uses the same bucket index that already contains nodes in the form of a linked list.

A new node is created at the last of the linked list and connects this node object to the existing node object through the LinkedList Hence both Keys will be stored at the same index value.

When a new value object is inserted with an existing Key, HashMap replaces the old value with the current value related to the Key. To do it, HashMap uses the equals() method.

This method checks whether both Keys are equal or not. If Keys are the same, this method returns true and the value of that node is replaced with the current value.

How get() method in HashMap works internally in Java?

The get() method in HashMap is used to retrieve the value by its key. If we don’t know the Key, it will not fetch the value. The syntax for calling get() method is as follows:

value = hashmap.get(key);

When the get(K Key) method takes a Key, it calculates the index of the bucket using the method mentioned above. Then that bucket’s List is searched for the given key using the equals() method and the final result is returned.

Time Complexity of put() and get() methods

HashMap stores a key-value pair in constant time which is O(1) for insertion and retrieval. But in the worst case, it can be O(n) when all node returns the same hash value and are inserted into the same bucket.

The traversal cost of n nodes will be O(n) but after the changes made by Java 1.8 version, it can be a maximum of O(log n).

Concept of Rehashing

Rehashing is a process that occurs automatically by HashMap when the number of keys in the map reaches the threshold value. The threshold value is calculated as threshold = capacity * (load factor of 0.75).

In this case, a new size of bucket array is created with more capacity and all the existing contents are copied over to it.

For example:

Load Factor: 0.75
Initial Capacity: 16 (Available Capacity initially)
Threshold = Load Factor * Available Capacity = 0.75 * 16 = 12

When the 13th key-value pair is inserted into the HashMap, HashMap grows its bucket array size to 16*2 = 32.

Now Available capacity: 32

Threshold = Load Factor * Available Capacity = 0.75 * 32 = 24

Next time when 25th key-value pair is inserted into HashMap, HashMap grows its bucket array size to 32*2 = 64 and so on.

Key notes on internal working of HashMap

Data structure to store entry objects is an array named table of type Entry.
A particular index location in array is referred as bucket, because it can hold the first element of a linkedlist of entry objects.
Key object’s hashCode() is required to calculate the index location of Entry object.
Key object’s equals() method is used to maintain uniqueness of keys in map.
Value object’s hashCode() and equals() method are not used in HashMap’s get() and put() methods.
Hash code for null keys is always zero, and such entry object is always stored in zero index in Entry[].