Dylan Dumont

The Raft Consensus Algorithm: Leader Election and Log Replication Explained

Raft solves the hardest problem in distributed systems: keeping replicas synchronized while nodes fail.

What We're Building

We are dissecting the Raft consensus protocol to understand how a cluster maintains a single source of truth. Unlike Paxos, Raft is designed to be understandable and easier to implement correctly. Our scope is not building a complete key-value store, but modeling the core state machine of a Raft node. We will focus on the three node roles, the heartbeat mechanism, and the safety properties that prevent split-brain scenarios. We will use Go for examples because its interfaces and struct definitions closely mirror the RPC patterns found in production Raft implementations like etcd.

Step 1 — Defining Node Roles

Raft nodes form a finite state machine: each node is in exactly one of three roles, Follower, Candidate, or Leader, and transitions between them. The leader drives log replication, while followers passively accept and apply replicated entries. This separation ensures that at most one node appends new entries to the log in any given term, preventing conflicting writes.

type NodeState uint8

const (
    StateFollower NodeState = iota
    StateCandidate
    StateLeader
)

type NodeID int

type LogEntry struct{ Term int; Command string }

type RaftNode struct {
    State          NodeState
    Term           int
    VoteFor        NodeID
    Log            map[int]LogEntry
    LastLogIndex   int
    CommittedIndex int
}

Modeling the roles as iota-based constants keeps the transition logic explicit and type-safe without external libraries.
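One rule applies to every role: a node that sees a higher term in any RPC must adopt that term and revert to Follower. Here is a minimal, self-contained sketch of that transition (the `stepDown` helper is my own name, repeating the type definitions so the snippet compiles on its own):

```go
package main

import "fmt"

type NodeState uint8

const (
	StateFollower NodeState = iota
	StateCandidate
	StateLeader
)

func (s NodeState) String() string {
	return [...]string{"Follower", "Candidate", "Leader"}[s]
}

type RaftNode struct {
	State NodeState
	Term  int
}

// stepDown is invoked whenever a node observes a higher term in any RPC:
// it adopts the new term and reverts to Follower, per Raft's rules.
func (n *RaftNode) stepDown(term int) {
	if term > n.Term {
		n.Term = term
		n.State = StateFollower
	}
}

func main() {
	n := &RaftNode{State: StateLeader, Term: 3}
	n.stepDown(5) // a higher term forces even a leader back to follower
	fmt.Println(n.State, n.Term) // Follower 5
}
```

This single rule is what lets a deposed leader rejoin the cluster cleanly instead of fighting the new one.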

Step 2 — Conducting Leader Elections

When a follower stops receiving heartbeats within its election timeout, it starts an election. It increments its term, transitions to Candidate, votes for itself, and broadcasts a RequestVote RPC to all other nodes. A node grants its vote only if the candidate's log is at least as up-to-date as its own and it has not already voted in this term. This restriction prevents a candidate with a stale log from becoming leader.

func (n *RaftNode) RequestVote(term int, candidate NodeID, lastLogIndex int) (bool, error) {
    if term < n.Term { return false, nil } // reject stale terms
    if term > n.Term { n.Term, n.State, n.VoteFor = term, StateFollower, 0 } // step down
    // Simplified: one vote per term, only for a log at least as long as ours
    if (n.VoteFor == 0 || n.VoteFor == candidate) && lastLogIndex >= n.LastLogIndex {
        n.VoteFor = candidate
        return true, nil
    }
    return false, nil
}

Together with the one-vote-per-term rule, the log comparison guarantees that a candidate can only win with a log at least as up-to-date as a majority of the cluster.
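The code above compares only log length for brevity. Full Raft compares the term of the last log entry first and falls back to the index only on a tie. A self-contained sketch of that rule (the `logUpToDate` name is my own):

```go
package main

import "fmt"

// logUpToDate reports whether a candidate's log is at least as up-to-date
// as the voter's, per Raft's election restriction: compare the terms of
// the last entries first, then the log lengths.
func logUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex int) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}

func main() {
	fmt.Println(logUpToDate(3, 7, 2, 9)) // true: a higher last term wins even with a shorter log
	fmt.Println(logUpToDate(2, 5, 2, 9)) // false: same term, but the candidate's log is shorter
}
```

Comparing terms before lengths matters: a long log full of entries from an old, deposed leader must not beat a shorter log containing newer committed entries.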

Step 3 — Log Replication via RPC

The leader appends each client command to its own log, then replicates the entry to every follower via the AppendEntries RPC. Followers append the entry and acknowledge success. Once a majority of the cluster (including the leader itself) has stored the entry, the leader advances its commit index and the entry is considered committed and safe.

func (f *RaftNode) AppendEntries(leaderTerm int, index int, entry LogEntry) (bool, error) {
    if leaderTerm < f.Term { return false, nil } // reject a stale leader
    f.Term, f.State = leaderTerm, StateFollower  // a valid leader keeps us a follower
    f.Log[index] = entry
    f.LastLogIndex = index
    return true, nil
}

This function demonstrates the AppendEntries RPC where a leader proposes a change, and followers store it locally.
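On the leader's side, commitment is decided by counting replication acknowledgements. A minimal sketch of that decision, assuming the leader tracks a match index per follower (the `committedIndex` helper is my own; real Raft additionally requires the entry to be from the leader's current term):

```go
package main

import "fmt"

// committedIndex returns the highest log index stored on a majority of
// nodes, given the match indexes reported by followers plus the leader's
// own last index.
func committedIndex(matchIndexes []int, leaderLast int) int {
	all := append([]int{leaderLast}, matchIndexes...)
	best := 0
	for _, candidate := range all {
		// count how many nodes have replicated up to this index
		count := 0
		for _, m := range all {
			if m >= candidate {
				count++
			}
		}
		if count > len(all)/2 && candidate > best {
			best = candidate
		}
	}
	return best
}

func main() {
	// 5-node cluster: leader at index 7, followers at 7, 6, 3, 2.
	// Three of five nodes hold index 6, so 6 is committed; only two hold 7.
	fmt.Println(committedIndex([]int{7, 6, 3, 2}, 7)) // 6
}
```

This is why Raft tolerates slow or crashed followers: progress only requires a majority, not the full cluster.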

Step 4 — Commit Safety and Stability

A log entry is considered committed once a majority of nodes store it. The leader includes its commit index in heartbeats so followers learn which entries are safe to apply. Crucially, Raft's election restriction guarantees that any newly elected leader already contains every committed entry, so a committed entry is never overwritten. This keeps the replicated state machine consistent across the cluster even after node failures and recoveries.

func (n *RaftNode) ApplyCommit(index int) {
    // apply every newly committed entry, strictly in log order
    for n.CommittedIndex < index {
        n.CommittedIndex++
        n.Apply(n.CommittedIndex)
    }
}

This ensures that only committed entries are executed by the state machine, preserving durability guarantees.
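The gap between "committed" and "applied" can be made concrete with two counters, as in the Raft state every node keeps. A self-contained sketch (the `Node` and `applyCommitted` names are my own, with string commands standing in for real log entries):

```go
package main

import "fmt"

// Node tracks the two indexes every Raft node maintains: the highest
// index known to be committed, and the highest index already applied.
type Node struct {
	Log         []string
	CommitIndex int // highest committed index (1-based; 0 = none)
	LastApplied int // highest index applied to the state machine
	Applied     []string
}

// applyCommitted applies entries in order until LastApplied catches up
// with CommitIndex; "applying" here just records the command.
func (n *Node) applyCommitted() {
	for n.LastApplied < n.CommitIndex {
		n.LastApplied++
		n.Applied = append(n.Applied, n.Log[n.LastApplied-1])
	}
}

func main() {
	n := &Node{Log: []string{"set x=1", "set y=2", "set z=3"}, CommitIndex: 2}
	n.applyCommitted()
	fmt.Println(n.Applied) // only the two committed entries are applied
}
```

Keeping the two counters separate is what lets a node crash after committing but before applying, then replay the committed suffix safely on restart.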

Key Takeaways

  • State Machine: Raft nodes transition between Follower, Candidate, and Leader based on received RPCs.
  • Election Safety: A candidate can win only with a log at least as up-to-date as a majority of voters, preventing nodes with stale logs from becoming leaders.
  • Log Consistency: Followers only accept entries from a current leader, ensuring global consistency.
  • Durability: Entries are durable once acknowledged by a majority before being applied to the state machine.
  • Safety First: Raft prioritizes data correctness over availability during partitions to prevent data loss.

What's Next?

Consider tuning the heartbeat interval and election timeout to balance consistency and latency. Next, explore how randomized election timeouts reduce split votes, where multiple candidates start elections in the same term. You should also investigate log compaction via snapshots, which keeps the log from growing without bound. Finally, compare Raft with other consensus algorithms to understand the trade-offs in your specific architecture.

Further Reading

Part of the Architecture Patterns series.
