DEV Community

Cover image for Consistent Hashing: The Unseen Engine
Meeth Gangwar
Meeth Gangwar

Posted on

Consistent Hashing: The Unseen Engine

๐Ÿš€ Consistent Hashing: The Secret Behind Tech Giants! ๐Ÿš€

Consistent hashing is one of the most powerful techniques in system design, and it's the hidden engine behind many major platforms you use every day, including:

1๏ธโƒฃ Amazon DynamoDB
2๏ธโƒฃ Discord
3๏ธโƒฃ Apache Cassandra
4๏ธโƒฃ Akamai CDN
5๏ธโƒฃ Google's Maglev Load Balancer

This isn't just an academic conceptโ€”it's a fundamental system that powers modern, scalable applications. ๐Ÿ’ช For any aspiring engineer, a deep understanding of consistent hashing is a surefire way to impress in technical interviews.

Ready to unlock the magic? Let's take a deep dive right now! ๐ŸŽฏ

๐Ÿ”‘ What is Hashing?

Before we dive into the magic of Consistent Hashing, we need to start with the fundamental question: What is hashing?

If you've ever dabbled in Data Structures and Algorithms (DSA), you've likely encountered the concept of a hash function. At its core, hashing is a process that takes an input (or "key") and uses a hash function to map it to a fixed-size value, typically a number or a hash code.

This powerful technique allows us to generate unique (or nearly unique) hash values. These values are incredibly versatileโ€”they can act as a unique index in a hash table for lightning-fast data retrieval ๐Ÿš€, or, as we're about to see, identify a specific node in a distributed system using consistent hashing!

Think of it as the ultimate organizer, turning complex data into a simple, manageable address.

๐ŸŽฏ Understanding the Problem: Why We Need Consistent Hashing

Imagine you have n cache servers, and you need to balance incoming requests across them. A simple approach is to assign each request to a server using a serverIndex.

We can generate this server index using a hash function:

serverIndex=hash%n

Example: With 4 servers (n=4):
Request "user_123" โ†’ hash = 15 โ†’ 15 % 4 = 3 โ†’ Server 3 ๐ŸŽฏ
Request "order_456" โ†’ hash = 22 โ†’ 22 % 4 = 2 โ†’ Server 2 ๐ŸŽฏ

๐Ÿ’ฅ The Problem: Scale Changes Break Everything!
This works perfectly... until the number of servers changes!
What happens when:

  • โž• A server is added? (n=4 โ†’ n=5)
  • โž– A server goes down? (n=4 โ†’ n=3)

Suddenly, our formula changes:

serverIndex = hash(request_key) % 3 # Now with 3 servers

The result? ๐Ÿ“Š Massive cache misses! Most keys now map to different servers, requiring expensive data reshuffling and causing system-wide disruption.

๐Ÿ”‘ The Core Issue
The fundamental problem is that changing 'n' affects nearly ALL mappings, breaking the connection between keys and their servers.

This is exactly why we need Consistent Hashing! ๐Ÿš€ It provides a smart way to handle scale changes while minimizing disruptions.

๐ŸŽช Let's Build the MAGIC HASH RING! ๐ŸŽช:

Ready for some hashing circus tricks? ๐Ÿคนโ™‚๏ธ In consistent hashing, we work with two main characters: the Hash Ring and the Hash Slot!

๐Ÿ”ฎ Meet Our Magic Wand: The SHA-1 Hash Function!
Instead of regular hash functions, we use the mighty SHA-1! This isn't your average calculatorโ€”it creates a MASSIVE hash space from:
0 to 2ยนโถโฐ - 1

๐ŸŽฏ Understanding the Hash Space:
Xโ‚€ = The starting point (our humble 0)
Xโ‚™ = The grand finale (2ยนโถโฐ - 1)

๐Ÿ”„ The Great Ring Illusion!
Here's where the magic happens! By connecting our starting point Xโ‚€ with our ending point Xโ‚™... POOF! โœจ We create a HASH RING!

Wait, but how? ๐Ÿค” Imagine taking a straight line and bending it until the ends meet! That's exactly what we do here:
๐ŸŽ  Physical Reality: It's actually a straight line (like unrolling a ring)
๐ŸŽก Mathematical Magic: We treat it as a continuous circle!

Think of it like a cosmic hula hoop where our hash values can dance around forever! ๐Ÿ’ซ The moment a value reaches the end (Xโ‚™), it seamlessly continues from the beginning (Xโ‚€)!

So remember: While it's physically linear, in our digital wonderland, it's the most magnificent ring you'll ever encounter! ๐ŸŽ‰

๐ŸŽฏ How Do We Use This Hash Ring? Let's Play Matchmaker! ๐Ÿ’:

๐ŸŽช Step 1: Mapping Our Players onto the Ring!

Using our trusty SHA-1 hash function, we place everyone on the ring:

  • ๐ŸŽญ Servers: We map each server using its name or IP address
  • ๐ŸŽฏ Keys: We map each incoming client request (the keys)

Both servers and keys now live together in our massive SHA-1 universe (that 0 to 2ยนโถโฐ -1 space we talked about)! The hash function acts as the ultimate bouncer, deciding exactly where each player gets to stand on our cosmic ring. ๐Ÿช

๐Ÿ” Step 2: The Treasure Hunt Rule! ๐Ÿ—บ๏ธ
Here's the golden rule for key-server matching:

"Look clockwise and claim the first server you find!" ๐Ÿ”

When a key arrives, it spins around the ring clockwise โฉ and the first server it encounters becomes its designated home! That's the server it will talk to for all its data needs.

The image above shows exactly how our key goes on its clockwise treasure hunt! ๐Ÿดโ€โ˜ ๏ธ

๐ŸŽช Step 3: Handling Changes - The Dynamic Dance! ๐Ÿ’ƒ
What happens when a new server joins the party? ๐Ÿ†•
We simply add it to the ring!
Keys get redistributed automatically - some will now find this new server first in their clockwise journey!

Result: Smooth scaling with minimal disruption! ๐ŸŽ‰

What if a server leaves? ๐Ÿ‘‹
No panic! The key simply continues its clockwise search and finds the next available server.
Result: The show goes on! The system self-heals like magic! โœจ

๐Ÿ’ก But the question arises:

If the server which was just removed was the only one which had the value for the keys accessing it, when it's gone the keys accessing the new server (found in the clockwise direction) will not find their data there. So how is this being handled?

๐Ÿ›ก๏ธ This is handled with REPLICATION!
In real-world systems, data is never stored on just one server. Instead, consistent hashing is combined with replication strategies where each piece of data is stored on:

The primary server (first server found clockwise)

Plus the next N servers in the clockwise direction (replica nodes)

๐ŸŽฏ Example:
If we have replication factor N=2:
Data for key "X" is stored on Server A (primary)
AND also replicated on Server B and Server C (the next 2 servers clockwise)
When Server A fails:
Key "X" now finds Server B as its new primary
But Server B already has the data because it was a replica! ๐ŸŽ‰
No data loss occurs!

This way, even when servers come and go, your data remains safe and accessible through the replicated copies! ๐Ÿ”„

But wait there still seems to be a problem. There can be a case where there are too many keys between two server while less keys between two different server. The distribution of the key is not well managed!! This is where virtual node comes into picture.

๐ŸŽช Virtual Nodes: The Ultimate Load Balancers!

You've spotted the critical flaw in basic consistent hashing! ๐Ÿšจ

๐Ÿ’ฅ The Problem: Uneven Distribution "Hotspots"
Even with our fancy hash ring, we can end up with:
๐Ÿ”ฅ Hotspots: Some servers drowning in keys while others sit idle
๐Ÿ“Š Uneven Load: Random distribution can create massive imbalances
๐Ÿ˜ญ Inefficient Scaling: New servers might not relieve the overloaded ones

๐ŸŽฏ How Virtual Nodes Save the Day!
Virtual nodes create multiple "virtual" positions for each physical server on the hash ring:

Physical Server A โ†’ Virtual Nodes: Aโ‚, Aโ‚‚, Aโ‚ƒ, Aโ‚„
Physical Server B โ†’ Virtual Nodes: Bโ‚, Bโ‚‚, Bโ‚ƒ, Bโ‚„
Physical Server C โ†’ Virtual Nodes: Cโ‚, Cโ‚‚, Cโ‚ƒ, Cโ‚„

โœจ The Magic Results:
๐ŸŽฒ Better Distribution: More positions = more chances for even spread
โš–๏ธ Load Balancing: Heavy key ranges get distributed across multiple virtual nodes
๐Ÿ”„ Smooth Scaling: Adding/removing servers affects smaller chunks of data
๐ŸŽช No More Hotspots: Keys get distributed across all virtual nodes evenly!

๐Ÿ—๏ธ Real-World Example:
Cassandra uses virtual nodes by default - each physical node has 256 virtual nodes, creating that beautiful even distribution we dream of! ๐Ÿ’ซ

Conclusion:

And that is it this is all there is to when it comes to consistent hashing. This is one of the most important concepts when it comes to system design and I think every senior software developer should have a idea about it.

That said I will keep coming up with more intresting concepts in the future until then signing off!!

Regards,
Meeth

Top comments (0)