How Hash Maps Work Under the Hood

#programming #computerscience #datastructures #beginners

https://www.youtube.com/watch?v=e9Ny-9Gwsvk

You have a million contacts in your phone. You type a name and it appears instantly. Not after scanning a thousand entries. Instantly. How?

The Core Trick: Calculate, Don't Search

A hash map takes your key — say, "Alice" — and feeds it into a hash function. The function produces a number, which gets reduced (usually modulo the array size) into an index. You don't search for where Alice lives. You calculate it. One operation, direct access. That's O(1) lookup.

But hash functions map infinite possible keys into a finite number of slots. Eventually, two keys land in the same bucket. As the table fills, collisions become unavoidable.

Handling Collisions

Chaining — each bucket holds a linked list. Collisions just append to the chain. Simple, but every pointer chase is a cache miss since list nodes are scattered across memory.

Open addressing — instead of a list, you look for the next empty slot in the array itself. Linear probing checks the next slot, then the next, forming clusters.

Robin Hood hashing — an elegant variant. When a new key collides, it checks: did I travel farther than the key sitting here? If so, it swaps in, and the displaced key continues the search. Stealing from keys with short probe distances, giving to keys with long ones. The result: every key ends up roughly the same distance from home.

Why Resizing Isn't As Expensive As It Looks

As entries pile up, the load factor (entries / buckets) rises. Every implementation draws a line — Java at 0.75, Python at 2/3, Go at ~0.8. Cross it, and the table doubles. Every entry gets rehashed into a new, larger array.

Doubling a table with a million entries sounds brutal. But think of it like a savings account — every insert secretly sets aside extra coins. When the table finally doubles, every element already has enough saved to pay for its own rehash. The bill was prepaid, one insert at a time.

That's why insertion is O(1) amortized. The rare expensive resizes are already budgeted for.

The Battle Scars

In 2011, researchers showed that a single 100KB HTTP request could freeze a web server for 90 seconds. The attack: craft parameter names that all hash to the same bucket. O(1) becomes O(n). Repeat across every lookup, and the server is pinned.

This HashDoS attack affected virtually every major language — Java, Python, Ruby, PHP, JavaScript.

The defenses are now built into the languages you use daily. Python switched to SipHash — a keyed hash function with a random secret generated at startup. Same string, different hash every time you launch Python. Go randomizes hash seeds too. Java converts long bucket chains into balanced red-black trees (linked list → tree when a chain hits 8 nodes).

These protections run silently every time you use a dictionary or a map. Most developers never know they're there.

Watch the animated explanation above to see collisions, Robin Hood hashing, and resizing visualized step by step.