DEV Community

Abhishek Kumar
Abhishek Kumar

Posted on

Internal Working of Collections:

1. How HashMap Works

The HashMap class in Java is one of the most widely used implementations of the Map interface. It stores key-value pairs and allows for efficient retrieval, insertion, and deletion of elements based on the hash of the key.

Key Concepts:
  • Hashing: Hashing is the process of converting an object into an integer value (called the hash code) using a hash function. This hash code is then used to determine the index (or "bucket") where the key-value pair should be stored.

  • Buckets: A HashMap internally maintains an array of "buckets." Each bucket is simply a storage location where entries (key-value pairs) are stored. Multiple entries can exist in the same bucket, but they will be stored as a linked list in case of collisions.

  • Collisions: A collision occurs when two keys generate the same hash code and, as a result, map to the same bucket. When this happens, HashMap handles it by chaining the elements together using a linked list or, after Java 8, by using a balanced tree if the number of collisions in a bucket exceeds a threshold (typically 8 entries).

  • Rehashing: When a HashMap exceeds its load factor (default 0.75), the map resizes itself by creating a new, larger array of buckets. All existing entries are then rehashed and redistributed into this new array. The new capacity is typically double the previous capacity. Rehashing ensures that the map maintains performance as it grows.

Step-by-Step Working of HashMap:
  1. Compute Hash Code: When a key-value pair is added, the hash function computes the hash code for the key (using the hashCode() method). This hash code is then used to determine the bucket where the key-value pair should be stored.

  2. Map Hash Code to Bucket Index: The hash code is then compressed into a valid array index by performing a modulo operation or bitwise AND with (n - 1) where n is the number of buckets.

  3. Collision Handling: If there is already an entry in the bucket (i.e., a collision has occurred), the map either:

    • Stores the new entry as a linked list node in that bucket (before Java 8), or
    • Converts the list into a balanced tree (from Java 8 onwards) if the number of elements in that bucket exceeds 8 (called "treeification").
  4. Get and Remove Operations: When retrieving or removing an entry, the HashMap first determines the bucket using the hash code of the key. Then it traverses the bucket (either as a linked list or tree) to find the correct key-value pair.

Example Code:
Map<String, Integer> map = new HashMap<>();
map.put("Apple", 1);   // Computes hash, adds to the bucket
map.put("Orange", 2);  // Computes hash, adds to another bucket or same in case of collision
Integer value = map.get("Apple");  // Retrieves value by recalculating the hash and finding the key
Enter fullscreen mode Exit fullscreen mode
Performance:
  • Average Case: O(1) for put(), get(), and remove() (because we access the bucket array directly).
  • Worst Case: O(n) if all keys map to the same bucket and form a long chain, but this is mitigated in Java 8 by treeifying after a certain threshold (O(log n)).

2. Differences between ArrayList and LinkedList

Both ArrayList and LinkedList implement the List interface, but their internal workings are different, leading to varied performance characteristics for different operations.

Criteria ArrayList LinkedList
Internal Structure Backed by a dynamic array. Backed by a doubly linked list.
Access Time O(1) for accessing elements via index (random access). O(n) for accessing elements (sequential access).
Insertion Time O(n) for adding/removing in the middle (due to shifting). O(1) for adding/removing at the ends; O(n) for the middle.
Memory Usage Requires contiguous memory. Stores actual objects in array. Each node has additional memory overhead (pointers to next and previous nodes).
Iteration Faster iteration due to contiguous memory and cache locality. Slower iteration since each node needs to be traversed.
Resize Cost Needs to resize (copy to a new array) when capacity is exceeded. No resizing is needed, but more memory is used for links.
Add/Remove at End O(1) when appending (unless resizing happens). O(1) for add/remove at ends.
Best Use Case Best when frequent access to elements by index is needed. Best for frequent insertions and deletions, especially at the beginning or end.
Key Points:
  • Random Access: ArrayList provides constant-time access to elements via index (get() and set() are O(1)), while LinkedList has to traverse the list from the beginning or end, making these operations O(n).

  • Insertion/Deletion:

    • ArrayList: Inserting or removing elements in the middle of the list is slow, as it requires shifting elements to maintain the array’s order (O(n)).
    • LinkedList: Inserting or removing elements from any position is faster in LinkedList (O(1) at the ends), but locating the insertion/removal point requires traversal (O(n)).
  • Memory Usage: ArrayList uses less memory than LinkedList because it only stores the object data itself, while LinkedList needs additional memory for the pointers (next and previous nodes) for each element.

Example Code for ArrayList:
List<String> arrayList = new ArrayList<>();
arrayList.add("Apple");
arrayList.add(0, "Banana");  // Adding at a specific index involves shifting elements
System.out.println(arrayList.get(1));  // Fast access
Enter fullscreen mode Exit fullscreen mode
Example Code for LinkedList:
List<String> linkedList = new LinkedList<>();
linkedList.add("Apple");
linkedList.addFirst("Banana");  // Efficient for adding at the beginning or end
System.out.println(linkedList.get(1));  // Slow access, as traversal is required
Enter fullscreen mode Exit fullscreen mode

Summary:

HashMap:

  • Stores key-value pairs with fast access (O(1) on average) using hashing.
  • Uses buckets to store entries; handles collisions via chaining or treeifying.
  • Resizes (rehashes) when the load factor exceeds a threshold.

ArrayList vs LinkedList:

  • ArrayList: Best for random access and minimal insertions/deletions. Slower for middle insertions/removals due to element shifting.
  • LinkedList: Best for frequent insertions/removals at the ends. Slower random access due to the need to traverse nodes. Uses more memory because of node pointers.

Understanding these internal workings will help you choose the right data structure for specific use cases and optimize the performance of your Java applications.

Top comments (0)