Table Of Contents
- What Is Hashing?
- What Is Consistent Hashing?
- What Problem Does Consistent Hashing Solve?
- How Consistent Hashing Work?
- Where is Consistent Hashing Used?
- Implementing Consistent Hashing in Golang
- Conclusion
Ever wondered how big tech websites like Netflix, Twitch, Facebook, etc. handle huge amounts of traffic and data without slowing down or crashing? The technique that makes this possible is consistent hashing. As a developer, understanding consistent hashing is a useful skill in your toolbox.
In this tutorial, we'll give you an in-depth overview of consistent hashing and walk you through implementing it in Golang.
What Is Hashing?
Hashing is a technique used to map data of an arbitrary size to data of a fixed size. In simple terms, it maps keys to values. The values are called hash codes or hash values.
A hash function is used to generate the hash code. It takes the key and returns the hash value.
A good hash function has the following properties:
Uniform distribution: It should map keys to hash values uniformly. Each hash value should have an equal chance of being the output of the hash function.
Deterministic: The same key should always map to the same hash value.
Efficiently computable: It should be fast to compute the hash value for any key.
Hashing is used for several purposes, like:
Hash tables: To map keys to values for faster lookups.
Cryptography: For message authentication and digital signatures.
Caching: To map keys to cached data.
What Is Consistent Hashing?
This is a special kind of hashing technique. It maps keys to hash values in a way that the mapping between keys and hash values remains consistent even when the number of hash values changes.
What Problem Does Consistent Hashing Solve?
Consistent hashing solves the problem of caching by randomly mapping data to physical nodes in the cluster. With a regular hash function, adding or removing one node changes the mapping of nearly every key.
Consistent hashing minimizes the number of keys that need to be remapped when a node is added or removed. It is useful in distributed caching and database systems where the number of nodes changes Frequently.
It also solves the problems caused by hash collisions. Hash collisions occur when two keys map to the same hash value. This leads to extra work to handle the collision.
How Consistent Hashing Work?
Hashing is a way of mapping data of arbitrary size to fixed-size values. A hash function takes input data and converts it into a hash value. The hash table stores the data using the hash value as the key.
Distributed hashing is an extension of hashing used in distributed data stores. It allows data to be distributed across multiple nodes.
In consistent hashing, the hash values are arranged in a ring. Each key is mapped to one of the hash values on the ring.
When a hash value is added or removed, only the keys mapped to that hash value are remapped. The other keys remain mapped to the same hash value.
The key's hash value is used to determine which node the data is stored on. As nodes add or remove from the cluster, the mapping of keys to nodes changes. This can require massive data movement.
Each node in the cluster receives one or more points on the ring. The system maps the key to a point on the ring and stores it on the node that contains that point.
When a new node is added, it takes over some points on the ring. Only the keys that map to those points move to the new node. The keys mapped to other points stay where they are.
Similarly, when a node is removed, its points on the ring are taken over by the remaining nodes. But keys mapped to other points are unaffected.
Where is Consistent Hashing Used?
Many real-world applications use consistent hashing to distribute data across a cluster of servers. Some major use cases include:
Distributed caching
Consistent hashing is a popular technique for distributed caching systems like Memcached and Dynamo. In these systems, the caches are distributed across many servers. When a cache miss occurs, consistent hashing is used to determine which server contains the required data. This allows the overall cache to scale to handle more requests.
Distributed storage
Distributed storage systems like Cassandra, DynamoDB, and Voldemort also use consistent hashing. In these systems, data is partitioned across many servers. Consistent hashing is used to map data to the servers that store the data. When new servers are added or removed, consistent hashing minimizes the amount of data that needs to be remapped to different servers.
Load Balancing
Many people commonly use consistent hashing for load balancing in distributed systems. It allows you to add or remove nodes from the cluster without affecting too many keys. With a traditional hash function, adding or removing a node changes the mapping of nearly every key. Consistent hashing minimizes the number of keys that need to be remapped.
Peer-to-Peer Networks
Some peer-to-peer networks use consistent hashing to map nodes to the key space. This allows nodes to join and leave the network with minimal disruption. New nodes can take over responsibility for keys that were previously assigned to nodes that left.
DNS
The domain name system (DNS) uses consistent hashing to map domain names to DNS servers. This fault tolerance is provided by reassigning the keys of a DNS server to other servers with minimal changes if it goes down.
Implementing Consistent Hashing in Golang
To implement consistent hashing in Golang, you'll need to have the following installed
- A code editor, VS Code Preferably
- Golang
-
hashicorp/memberlist:
go get -u github.com/hashicorp/memberlist
-
grpc-go:
go get -u google.golang.org/grpc
-
protobuf:
go get -u google.golang.org/protobuf/proto
Step 1: Define the Protocol Buffers (proto) file
Before we begin let's define the following terms:
Protocol Buffers (protobuf):
Protocol Buffers, or protobuf, is a data serialization format developed by Google. It offers a language-agnostic way to define data structures, enabling efficient serialization and deserialization across different programming languages.
gRPC:
gRPC, which stands for gRPC Remote Procedure Calls, is an open-source RPC framework, also developed by Google. It uses protobuf as its interface definition language, describing the structure of data exchanged between servers and clients.
Now, create a file named consistency.proto
and add the following code:
syntax = "proto3";
package consistency;
service Node {
rpc GetNodeForRequest (NodeRequest) returns (NodeResponse);
rpc AddNodeForRequest (NodeRequest) returns (Empty);
rpc RemoveNodeForRequest (NodeRequest) returns (Empty);
}
message NodeRequest {
string key = 1;
string node = 2;
string ip = 3;
}
message NodeResponse {
string node = 1;
}
message Empty {}
Run the following command to generate Go files from the proto file:
protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative consistency.proto
This command generates consistency.pb.go
and consistency_grpc.pb.go
.
Step 2: Implement Consistent Hashing and gRPC Service
Let's break down this step into three parts for better understanding.
Part 1: Initialization and Type Definitions
In this part, we initialize the HashRing
struct, which represents the consistent hash ring. The struct includes fields for maintaining nodes, their corresponding IP addresses, keys, and a memberlist for handling membership in the distributed system.
package main
//import these packages
import (
"context"
"fmt"
"hash/crc32"
"log"
"net"
"sort"
"sync"
"github.com/hashicorp/memberlist"
"google.golang.org/grpc"
)
// HashRing represents the consistent hash ring
type HashRing struct {
sync.RWMutex
Nodes []string
nodeToIP map[string]string
nodeToKey map[string]uint32
mlist *memberlist.Memberlist
}
// NodeService represents the gRPC service for nodes
type NodeService struct {
HashRing *HashRing
}
-
HashRing
struct: It includes a mutex for thread safety, a slice to keep track of nodes, maps for storing node-to-IP and node-to-key mappings, and a memberlist for handling membership. -
NodeService
struct: It represents the gRPC service and holds a reference to theHashRing
struct.
Part 2: The Functions
Here, we define the functions that perform actions on the hash ring, such as adding nodes, removing nodes, and getting the node responsible for a given key.
// AddNode adds a new node to the hash ring
func (hr *HashRing) AddNode(node string, ip string) {
hr.Lock()
defer hr.Unlock()
if _, ok := hr.nodeToKey[node]; ok {
return // Node already exists
}
key := crc32.ChecksumIEEE([]byte(node))
hr.nodeToIP[node] = ip
hr.nodeToKey[node] = key
hr.Nodes = append(hr.Nodes, node)
sort.Strings(hr.Nodes)
// Update memberlist
_, err := hr.mlist.Join([]string{ip})
if err != nil {
log.Fatalf("Failed to join memberlist: %v", err)
}
}
// RemoveNode removes a node from the hash ring
func (hr *HashRing) RemoveNode(node string) {
hr.Lock()
defer hr.Unlock()
if _, ok := hr.nodeToKey[node]; !ok {
return // Node does not exist
}
delete(hr.nodeToIP, node)
delete(hr.nodeToKey, node)
// Remove the node from the slice
for i := len(hr.Nodes) - 1; i >= 0; i-- {
if hr.Nodes[i] == node {
hr.Nodes = append(hr.Nodes[:i], hr.Nodes[i+1:]...)
}
}
// Update memberlist
_, err := hr.mlist.Leave(10 * memberlist.NodeMetaPreload)
if err != nil {
log.Fatalf("Failed to leave memberlist: %v", err)
}
}
// GetNode returns the node responsible for the given key
func (hr *HashRing) GetNode(key string) string {
hr.RLock()
defer hr.RUnlock()
hash := crc32.ChecksumIEEE([]byte(key))
index := sort.Search(len(hr.Nodes), func(i int) bool {
return hr.nodeToKey[hr.Nodes[i]] > hash
})
if index == len(hr.Nodes) {
index = 0
}
return hr.Nodes[index]
}
// GetNodeForRequest returns the node responsible for the given gRPC request
func (ns *NodeService) GetNodeForRequest(ctx context.Context, req *NodeRequest) (*NodeResponse, error) {
node := ns.HashRing.GetNode(req.Key)
return &NodeResponse{Node: node}, nil
}
// AddNodeForRequest adds a new node for the given gRPC request
func (ns *NodeService) AddNodeForRequest(ctx context.Context, req *NodeRequest) (*Empty, error) {
ns.HashRing.AddNode(req.Node, req.Ip)
return &Empty{}, nil
}
// RemoveNodeForRequest removes a node for the given gRPC request
func (ns *NodeService) RemoveNodeForRequest(ctx context.Context, req *NodeRequest) (*Empty, error) {
ns.HashRing.RemoveNode(req.Node)
return &Empty{}, nil
}
-
AddNode
,RemoveNode
,GetNode
: These functions handle modifications to the hash ring. -
GetNodeForRequest
,AddNodeForRequest
,RemoveNodeForRequest
: These functions are gRPC service methods for corresponding client requests.
Part 3: The Main Function
This part includes the main function where we set up the memberlist, create the gRPC server, register the service, and start the server.
func main() {
// Example usage
hashRing := &HashRing{
nodeToIP: make(map[string]string),
nodeToKey: make(map[string]uint32),
}
// Create and start memberlist
config := memberlist.DefaultLocalConfig()
config.Name = "localhost" // Set a unique name for the node
list, err := memberlist.Create(config)
if err != nil {
log.Fatalf("Failed to create memberlist: %v", err)
}
hashRing.mlist = list
// Create and register gRPC server
server := grpc.NewServer()
nodeService := &NodeService{HashRing: hashRing}
RegisterNodeServer(server, nodeService)
// Start gRPC server
listener, err := net.Listen("tcp", ":50051")
if err != nil {
log.Fatalf("Failed to listen: %v", err)
}
defer listener.Close()
log.Println("Server listening on :50051")
if err := server.Serve(listener); err != nil {
log.Fatalf("Failed to serve: %v", err)
}
}
- Initialization: We create a
HashRing
instance and configure a memberlist. - Server Setup: We create a gRPC server, register the
NodeService
, and start listening for connections on port 50051.
Create a file named consistency.go
, and add the code snippets of the 3 parts into the file for the complete implementation.
Step 3: Build and Run
Run the following command to build and run the gRPC server:
go run consistency.go
This will start the gRPC server on localhost:50051
.
Step 4: Test with a gRPC Client
Create a file named client.go
with the following code to test the gRPC server:
In this code, we will add and remove two keys ("key1" and "key2") with corresponding nodes ("Node-A" and "Node-B").
package main
import (
"context"
"fmt"
"log"
"google.golang.org/grpc"
)
func main() {
// Connect to the gRPC server
conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure())
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer conn.Close()
client := NewNodeClient(conn)
// Add nodes
client.AddNodeForRequest(context.Background(), &NodeRequest{Node: "Node-A", Ip: "127.0.0.1"})
client.AddNodeForRequest(context.Background(), &NodeRequest{Node: "Node-B", Ip: "127.0.0.1"})
// Get node for key1
key1 := "key1"
response1, err := client.GetNodeForRequest(context.Background(), &NodeRequest{Key: key1})
if err != nil {
log.Fatalf("Error getting node for key1: %v", err)
}
fmt.Printf("Key '%s' belongs to Node: %s\n", key1, response1.Node)
// Remove Node-A
client.RemoveNodeForRequest(context.Background(), &NodeRequest{Node: "Node-A"})
// Get node for key2
key2 := "key2"
response2, err := client.GetNodeForRequest(context.Background(), &NodeRequest{Key: key2})
if err != nil {
log.Fatalf("Error getting node for key2: %v", err)
}
fmt.Printf("After removing Node-A, key '%s' belongs to Node: %s\n", key2, response2.Node)
}
After running this code, your output should look like this;
Key 'key1' belongs to Node: Node-A
After removing Node-A, key 'key2' belongs to Node: Node-B
This output indicates the node assignment for each key before and after removing a node. The specific output will depend on the keys and nodes you choose during the execution.
Conclusion
Now you understand the basics of consistent hashing and have built a simple implementation of a consistent hash ring in Golang. Consistent hashing is a clever solution for mapping keys to nodes in a distributed system.
You've seen how it handles node additions and removals gracefully without disrupting the entire mapping. While this was just a basic example, the concepts you learned apply to more complex, real-world systems.
Top comments (0)