The `HashMap` in Java is one of the most efficient data structures for key-value pairs, offering average O(1) time complexity for basic operations like `put()`, `get()`, and `remove()`. The efficiency of `HashMap` comes from how it manages keys using hashing and how it handles potential collisions in the key space. Here's a breakdown of how `HashMap` works internally:
1. Basic Structure
- Array of Buckets: Internally, a `HashMap` consists of an array, where each element is referred to as a bucket. This array is created when the `HashMap` is initialized, and each bucket stores a linked list or a balanced tree of entries.
- Entries (Nodes): Each entry in the `HashMap` contains a key-value pair, along with the hash value of the key and a reference to the next entry in case of collisions. An entry can be represented as:

```java
static class Node<K,V> {
    final int hash;    // Hash of the key
    final K key;       // Key object
    V value;           // Value object
    Node<K,V> next;    // Reference to the next node in case of collision
}
```
2. Key Operations
a. Hashing
When you insert a key-value pair into a `HashMap`, it computes a hash code for the key. The hash code is an integer value derived from the key's `hashCode()` method.
- Hash Code Calculation: `HashMap` uses the `hashCode()` method of the key object to calculate the initial hash code, and then applies a secondary hashing step (an XOR with a bit-shifted copy of the hash) to spread the values evenly across the array:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

The purpose of this secondary hashing is to reduce collisions and distribute the keys more evenly.
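To see the spreading step in action, here's a small sketch. The helper `spread` mirrors the JDK's `hash()` method shown above; the class name and printed format are just for illustration:

```java
public class HashSpreadDemo {
    // Mirrors java.util.HashMap.hash(): XOR the top 16 bits into the bottom 16
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // With a 16-bucket table, only the low 4 bits pick the bucket, so
        // folding the high bits down matters for keys whose hash codes
        // differ mainly in their upper bits.
        for (String key : new String[]{"Apple", "Banana", "Orange"}) {
            System.out.printf("%-7s hashCode=%08x spread=%08x bucket(16)=%d%n",
                    key, key.hashCode(), spread(key), (16 - 1) & spread(key));
        }
    }
}
```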
b. Index Calculation (Bucket Mapping)
Once the hash code is calculated, `HashMap` determines the index of the bucket where the entry should be stored. Conceptually this is a modulo operation (`hash % array.length`), but because the bucket array's length is always a power of two, Java implements it with a faster bitwise AND:

```java
int bucketIndex = (n - 1) & hash; // n is the length of the bucket array
```
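A quick sketch to confirm the equivalence (using `Math.floorMod` rather than `%`, since `%` can yield a negative result for negative hashes; the sample hash values are arbitrary):

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16; // bucket array length, always a power of two in HashMap
        int[] hashes = {42, -42, 1_000_003, Integer.MIN_VALUE};
        for (int hash : hashes) {
            int byMask = (n - 1) & hash;          // what HashMap does
            int byMod  = Math.floorMod(hash, n);  // sign-safe modulo
            System.out.println(hash + " -> bucket " + byMask);
            assert byMask == byMod; // equivalent whenever n is a power of two
        }
    }
}
```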
c. Put Operation
- Compute Hash: When you add a key-value pair using the `put()` method, the first step is to compute the hash code for the key using the `hash()` function above.
- Calculate Index: The index in the bucket array is calculated from the hash value.
- Insert into Bucket:
  - If the bucket is empty, the entry is inserted directly.
  - If there is already an entry at that index (a collision), the `HashMap` uses chaining to resolve it: the new entry is added as a node in the bucket's linked list, or in its tree if the bucket holds many nodes. If an entry with an equal key already exists, its value is simply replaced.
d. Get Operation
- Compute Hash: When you call `get()` to retrieve a value, the hash of the key is computed, just as in the `put()` operation.
- Locate Bucket: Using the hash, the bucket index is determined.
- Search Bucket: The `HashMap` then searches for the key in the linked list (or tree) of the corresponding bucket by comparing the hash and then the key itself via `equals()`. If found, it returns the associated value; otherwise it returns `null`.
e. Remove Operation
- Compute Hash and Index: Like `get()`, `remove()` starts by computing the hash of the key and locating the bucket by index.
- Traverse Bucket: The bucket is then traversed to find the node with the matching key, which is unlinked from the linked list (or removed from the tree).
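The put/get/remove flow described above can be sketched as a tiny chained hash map. This is a deliberate simplification (fixed capacity, no treeification, no resizing), and the name `ToyHashMap` is made up for illustration:

```java
import java.util.Objects;

public class ToyHashMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16]; // fixed capacity for simplicity

    private static int hash(Object key) { // same spreading step as HashMap
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key), i = (table.length - 1) & h;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value; n.value = value; return old; // replace existing
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // chain new node at head
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> n = table[(table.length - 1) & h]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    public V remove(Object key) {
        int h = hash(key), i = (table.length - 1) & h;
        Node<K, V> prev = null;
        for (Node<K, V> n = table[i]; n != null; prev = n, n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                if (prev == null) table[i] = n.next; else prev.next = n.next;
                return n.value; // unlink the node and return its value
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> m = new ToyHashMap<>();
        m.put("Apple", 1);
        m.put("Banana", 2);
        System.out.println(m.get("Apple"));  // prints 1
        m.remove("Apple");
        System.out.println(m.get("Apple"));  // prints null
    }
}
```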
3. Collision Handling
a. Chaining with Linked List (Pre-Java 8)
When two or more keys produce the same hash and map to the same bucket index, a collision occurs. In earlier versions of Java, collisions were handled using separate chaining. This means that multiple entries with the same bucket index are stored in a linked list at that index.
- If multiple keys collide, they are added as nodes in a linked list within the same bucket. When searching, `HashMap` traverses the linked list and compares the keys using the `equals()` method.
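Chaining is easy to observe with a key type whose `hashCode()` is deliberately constant. The contrived `BadKey` class below forces every entry into the same bucket, yet lookups remain correct because `equals()` still distinguishes the keys:

```java
import java.util.HashMap;

public class CollisionDemo {
    // Every instance reports hashCode 1, so all entries land in one bucket;
    // equals() still distinguishes them, so lookups stay correct.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 1; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 20; i++) map.put(new BadKey("k" + i), i);
        // Despite every key colliding, each value is still retrievable.
        System.out.println(map.get(new BadKey("k7"))); // prints 7
        System.out.println(map.size());                // prints 20
    }
}
```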
b. Chaining with Tree (Post-Java 8)
In Java 8 and later, if the number of entries in a bucket exceeds a certain threshold (8 entries) and the table has at least 64 buckets, the linked list is converted to a red-black tree. This improves the search time within that bucket from O(n) to O(log n) in cases of excessive collisions.
- Treeification: When the linked list at a bucket exceeds the threshold (default 8), `HashMap` converts its nodes into `TreeNode`s (a self-balancing red-black tree) to speed up searches and retrievals. This is called treeification. If the table is still smaller than 64 buckets, `HashMap` resizes instead of treeifying.
After treeification, the time complexity of lookups in that bucket improves to O(log n) in the worst case (compared to O(n) for a linked list).
4. Rehashing and Load Factor
a. Load Factor
`HashMap` has a default load factor of 0.75, meaning that when the number of entries reaches 75% of the current capacity, a resize operation is triggered to maintain performance.
- Load Factor: The ratio of the number of entries to the size of the bucket array.
- If the number of entries exceeds the product of the load factor and the current bucket array size, the `HashMap` resizes itself (by doubling the bucket array size) and rehashes all the entries to redistribute them across the new array.
b. Resizing and Rehashing
When a `HashMap` resizes, it:
- Creates a new bucket array, double the size of the old one.
- Recomputes the bucket index for each entry (the stored hash is reused; only the index changes, because the larger array exposes one more bit of the hash) and redistributes all the entries into the new array.
Rehashing ensures that performance remains close to O(1) as the `HashMap` grows in size.
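Because doubling the table adds exactly one bit to the index mask, each entry either keeps its old index or moves by exactly the old capacity. A small sketch (the capacities 16 and 32 and the sample hashes are illustrative):

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 100, -7};
        for (int hash : hashes) {
            int oldIdx = (oldCap - 1) & hash;
            int newIdx = (newCap - 1) & hash;
            // The new index is either the old one, or old + oldCap,
            // depending on the extra bit the doubled mask exposes.
            System.out.println(hash + ": " + oldIdx + " -> " + newIdx);
            assert newIdx == oldIdx || newIdx == oldIdx + oldCap;
        }
    }
}
```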
5. Key Points on HashMap Performance
- Best Case (O(1)): If there are no collisions, basic operations like `put()`, `get()`, and `remove()` take constant time because they directly access the bucket and locate the element.
- Worst Case (O(n)): In the worst case, all keys might hash to the same bucket, causing collisions and degrading performance to O(n) if the bucket is a linked list. However, Java 8 mitigates this by converting long lists to trees, making it O(log n).
- Space Complexity: The space complexity is proportional to the number of entries plus the size of the bucket array, with extra space for the linked-list or tree nodes created by collisions.
6. Example Code:
```java
import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        // Insert elements
        map.put("Apple", 1);
        map.put("Banana", 2);
        map.put("Orange", 3);

        // Retrieve elements
        System.out.println("Apple: " + map.get("Apple"));
        System.out.println("Banana: " + map.get("Banana"));

        // Remove an element
        map.remove("Orange");

        // Check if the map contains a key
        if (map.containsKey("Apple")) {
            System.out.println("Apple is present in the map");
        }
    }
}
```
Summary of HashMap's Internal Working:
- Hashing: Converts keys into hash codes and maps them to bucket indices.
- Buckets: Stores key-value pairs in an array of buckets.
- Collision Handling: Uses chaining (linked list or tree) to handle collisions.
- Rehashing: Resizes the map when the load factor threshold is reached.
- Performance: O(1) average time for insertion, lookup, and deletion, but can degrade to O(n) in the worst case if too many collisions occur (mitigated by treeification in Java 8+).
Understanding the internal workings of `HashMap` allows you to make better design decisions and optimize performance when dealing with large datasets or performance-critical applications.