Imagine that you launch a new product and it attracts a huge volume of traffic ($1M order per second). Your product was designed with a single server and the server couldn't keep up with the volume, causing you to lose $1M order which you could've earned.
To support a serge of a demand, what kind of mechanism could you implement in the server-side to not only distribute the traffic evenly, buy also never miss incoming orders? You also want to ensure that other working servers are protected from being overloaded? This is where consistent hashing algorithm comes into play.
Traditionally, a common way to distribute traffic could be done by following a simple formula:
serverIndex = hash(key) % N, where N is the number of servers
But, when a server is added or removed, serverIndex changes and that can possibly lead to redistribution of all hash keys. Think about caching servers as an example (the backend system would result in many cache misses because N value is different)!
One of solutions you can consider is a consistent hashing algorithm.
Consistent hashing is a hash ring formed by connecting the head and tail of a hash space with k number of virtual nodes added between each server. It ensures (uniform) minimal number of redistribution of hash keys or data, thereby prevents overloading a server (aka, "hotspot" key problem)
On the hashing ring, servers are mapped based on server IP or name. k number of virtual nodes are placed for each server. When a server goes down or new servers are consistently added to the system, it impacts hash spaces and that can easily be detected by going the hash ring anti-clockwise (from the server impacted to the server prior to the impacted server). Once broken spots are identified, only keys in that impacted space need to be remapped (not all!).
In summary, employing consistent hashing in balancing the server load has 3 benefits:
- reduces the number of keys to be distributed when a server's added or removed
- makes it easy to scale horizontally as data are more evenly distributed
- mitigates "hotspot" problem
Reference
System Design Interview – An insider's guide by Alex Xu
Top comments (0)