The `HashMap` in Java is one of the most efficient data structures for key-value pairs, offering average O(1) time complexity for basic operations like `put()`, `get()`, and `remove()`. The efficiency of `HashMap` comes from how it manages keys using hashing and how it handles potential collisions in the key space. Here's a breakdown of how `HashMap` works internally:
1. Basic Structure
- Array of Buckets: Internally, a `HashMap` consists of an array, where each element is referred to as a bucket. This array is created when the `HashMap` is initialized, and each bucket stores a linked list or a balanced tree of entries.
- Entries (Nodes): Each entry in the `HashMap` contains a key-value pair, along with the hash value of the key and a reference to the next entry in case of collisions. An entry can be represented as:

```java
static class Node<K,V> {
    final int hash;    // Hash of the key
    final K key;       // Key object
    V value;           // Value object
    Node<K,V> next;    // Reference to the next node in case of collision
}
```
2. Key Operations
a. Hashing
When you insert a key-value pair into a `HashMap`, it computes a hash code for the key. The hash code is an integer value derived from the key's `hashCode()` method.
- Hash Code Calculation: `HashMap` uses the `hashCode()` method of the key object to calculate the initial hash code, and then applies a secondary hashing step (an XOR with a bit-shifted copy of the hash) to spread the values evenly across the array:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

The purpose of this secondary hashing is to reduce collisions and distribute the keys more evenly.
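To see the spreading step in action, here's a small sketch. The helper `spread` mirrors the JDK's `hash()` method shown above; the class name and printed format are just for illustration:

```java
public class HashSpreadDemo {
    // Mirrors java.util.HashMap.hash(): XOR the top 16 bits into the bottom 16
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // With a 16-bucket table, only the low 4 bits pick the bucket, so
        // folding the high bits down matters for keys whose hash codes
        // differ mainly in their upper bits.
        for (String key : new String[]{"Apple", "Banana", "Orange"}) {
            System.out.printf("%-7s hashCode=%08x spread=%08x bucket(16)=%d%n",
                    key, key.hashCode(), spread(key), (16 - 1) & spread(key));
        }
    }
}
```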
b. Index Calculation (Bucket Mapping)
Once the hash code is calculated, `HashMap` determines the index of the bucket where the entry should be stored. Conceptually this is a modulo operation (`hash % array.length`), but because the bucket array's length is always a power of two, Java implements it with a faster bitwise AND:

```java
int bucketIndex = (n - 1) & hash; // n is the length of the bucket array
```
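A quick sketch to confirm the equivalence (using `Math.floorMod` rather than `%`, since `%` can yield a negative result for negative hashes; the sample hash values are arbitrary):

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16; // bucket array length, always a power of two in HashMap
        int[] hashes = {42, -42, 1_000_003, Integer.MIN_VALUE};
        for (int hash : hashes) {
            int byMask = (n - 1) & hash;          // what HashMap does
            int byMod  = Math.floorMod(hash, n);  // sign-safe modulo
            System.out.println(hash + " -> bucket " + byMask);
            assert byMask == byMod; // equivalent whenever n is a power of two
        }
    }
}
```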
c. Put Operation
- Compute Hash: When you add a key-value pair using the `put()` method, the first step is to compute the hash code for the key using the `hash()` function above.
- Calculate Index: The index in the bucket array is calculated from the hash value.
- Insert into Bucket:
  - If the bucket is empty, the entry is inserted directly.
  - If there is already an entry at that index (a collision), the `HashMap` uses chaining to resolve it: the new entry is added as a node in the bucket's linked list, or in its tree if the bucket holds many nodes. If an entry with an equal key already exists, its value is simply replaced.
d. Get Operation
- Compute Hash: When you call `get()` to retrieve a value, the hash of the key is computed, just as in the `put()` operation.
- Locate Bucket: Using the hash, the bucket index is determined.
- Search Bucket: The `HashMap` then searches for the key in the linked list (or tree) of the corresponding bucket by comparing the hash and then the key itself via `equals()`. If found, it returns the associated value; otherwise it returns `null`.
e. Remove Operation
- Compute Hash and Index: Like `get()`, `remove()` starts by computing the hash of the key and locating the bucket by index.
- Traverse Bucket: The bucket is then traversed to find the node with the matching key, which is unlinked from the linked list (or removed from the tree).
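The put/get/remove flow described above can be sketched as a tiny chained hash map. This is a deliberate simplification (fixed capacity, no treeification, no resizing), and the name `ToyHashMap` is made up for illustration:

```java
import java.util.Objects;

public class ToyHashMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16]; // fixed capacity for simplicity

    private static int hash(Object key) { // same spreading step as HashMap
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key), i = (table.length - 1) & h;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value; n.value = value; return old; // replace existing
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // chain new node at head
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> n = table[(table.length - 1) & h]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    public V remove(Object key) {
        int h = hash(key), i = (table.length - 1) & h;
        Node<K, V> prev = null;
        for (Node<K, V> n = table[i]; n != null; prev = n, n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                if (prev == null) table[i] = n.next; else prev.next = n.next;
                return n.value; // unlink the node and return its value
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> m = new ToyHashMap<>();
        m.put("Apple", 1);
        m.put("Banana", 2);
        System.out.println(m.get("Apple"));  // prints 1
        m.remove("Apple");
        System.out.println(m.get("Apple"));  // prints null
    }
}
```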
3. Collision Handling
a. Chaining with Linked List (Pre-Java 8)
When two or more keys produce the same hash and map to the same bucket index, a collision occurs. In earlier versions of Java, collisions were handled using separate chaining. This means that multiple entries with the same bucket index are stored in a linked list at that index.
- If multiple keys collide, they are added as nodes in a linked list within the same bucket. When searching, `HashMap` traverses the linked list and compares the keys using the `equals()` method.
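Chaining is easy to observe with a key type whose `hashCode()` is deliberately constant. The contrived `BadKey` class below forces every entry into the same bucket, yet lookups remain correct because `equals()` still distinguishes the keys:

```java
import java.util.HashMap;

public class CollisionDemo {
    // Every instance reports hashCode 1, so all entries land in one bucket;
    // equals() still distinguishes them, so lookups stay correct.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 20; i++) map.put(new BadKey("k" + i), i);
        // Despite every key colliding, each value is still retrievable.
        System.out.println(map.get(new BadKey("k7"))); // prints 7
        System.out.println(map.size());                // prints 20
    }
}
```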
b. Chaining with Tree (Post-Java 8)
In Java 8 and later, if the number of entries in a bucket exceeds a certain threshold (8 entries) and the table has at least 64 buckets, the linked list is converted to a red-black tree. This improves the search time within that bucket from O(n) to O(log n) in cases of excessive collisions.
- Treeification: When the linked list at a bucket exceeds the threshold (default 8), `HashMap` converts its nodes into `TreeNode`s (a self-balancing red-black tree) to speed up searches and retrievals. This is called treeification. If the table is still smaller than 64 buckets, `HashMap` resizes instead of treeifying.
After treeification, the time complexity of lookups in that bucket improves to O(log n) in the worst case (compared to O(n) for a linked list).
4. Rehashing and Load Factor
a. Load Factor
`HashMap` has a default load factor of 0.75, meaning that when the number of entries reaches 75% of the current capacity, a resize operation is triggered to maintain performance.
- Load Factor: The ratio of the number of entries to the size of the bucket array.
- If the number of entries exceeds the product of the load factor and the current bucket array size, the `HashMap` resizes itself (by doubling the bucket array size) and rehashes all the entries to redistribute them across the new array.
b. Resizing and Rehashing
When a `HashMap` resizes, it:
- Creates a new bucket array, double the size of the old one.
- Recomputes the bucket index for each entry (the stored hash is reused; only the index changes, because the larger array exposes one more bit of the hash) and redistributes all the entries into the new array.
Rehashing ensures that performance remains close to O(1) as the `HashMap` grows in size.
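Because doubling the table adds exactly one bit to the index mask, each entry either keeps its old index or moves by exactly the old capacity. A small sketch (the capacities 16 and 32 and the sample hashes are illustrative):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 100, -7};
        for (int hash : hashes) {
            int oldIdx = (oldCap - 1) & hash;
            int newIdx = (newCap - 1) & hash;
            // The new index is either the old one, or old + oldCap,
            // depending on the extra bit the doubled mask exposes.
            System.out.println(hash + ": " + oldIdx + " -> " + newIdx);
            assert newIdx == oldIdx || newIdx == oldIdx + oldCap;
        }
    }
}
```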
5. Key Points on HashMap Performance
- Best Case (O(1)): If there are no collisions, basic operations like `put()`, `get()`, and `remove()` take constant time because they directly access the bucket and locate the element.
- Worst Case (O(n)): In the worst case, all keys might hash to the same bucket, causing collisions and degrading performance to O(n) if the bucket is a linked list. However, Java 8 mitigates this by converting long lists to trees, making it O(log n).
- Space Complexity: The space complexity is proportional to the number of entries plus the size of the bucket array, with extra space for the linked-list or tree nodes created by collisions.
6. Example Code:
```java
import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        // Insert elements
        map.put("Apple", 1);
        map.put("Banana", 2);
        map.put("Orange", 3);

        // Retrieve elements
        System.out.println("Apple: " + map.get("Apple"));
        System.out.println("Banana: " + map.get("Banana"));

        // Remove an element
        map.remove("Orange");

        // Check if the map contains a key
        if (map.containsKey("Apple")) {
            System.out.println("Apple is present in the map");
        }
    }
}
```
Summary of HashMap's Internal Working:
- Hashing: Converts keys into hash codes and maps them to bucket indices.
- Buckets: Stores key-value pairs in an array of buckets.
- Collision Handling: Uses chaining (linked list or tree) to handle collisions.
- Rehashing: Resizes the map when the load factor threshold is reached.
- Performance: O(1) average time for insertion, lookup, and deletion, but can degrade to O(n) in the worst case if too many collisions occur (mitigated by treeification in Java 8+).
Understanding the internal workings of `HashMap` allows you to make better design decisions and optimize performance when dealing with large datasets or performance-critical applications.