Distributed Systems: Implementing the Raft Consensus Protocol from Scratch

#computerscience #javascript #typescript #distributedsystems

This is an excerpt. The full article includes a live interactive Raft consensus cluster simulator — control a 5-node distributed cluster, trigger leader elections, partition nodes to simulate network splits, commit client state updates, and watch the term-clock and replication logs resolve conflicts in real time. Read the full interactive version →

The Distributed State Problem

How do you get multiple independent computer nodes to agree on a single sequence of events, especially when individual nodes can crash, experience network delays, or drop packets?

This is the core challenge of Distributed Consensus. In a distributed database, all replicas must execute incoming commands in identical order to maintain a consistent state.
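
To see why identical ordering matters, here is a minimal sketch of a deterministic key-value state machine. The `KvCommand` type and `applyLog` function are illustrative names assumed for this example, not part of any Raft library.

```typescript
// Hypothetical command shape for a tiny key-value state machine.
type KvCommand = { key: string; value: number };

// Applying a log is a pure fold: the same log in always gives the same state out.
function applyLog(commands: KvCommand[]): Map<string, number> {
  const state = new Map<string, number>();
  for (const { key, value } of commands) {
    state.set(key, value);
  }
  return state;
}

const commandLog: KvCommand[] = [
  { key: "x", value: 5 },
  { key: "y", value: 1 },
  { key: "x", value: 7 },
];

// Two replicas that apply the same log end in the same state.
const replicaA = applyLog(commandLog);
const replicaB = applyLog(commandLog);
console.log(replicaA.get("x"), replicaB.get("x")); // 7 7

// A replica that applies the same commands in reverse order diverges.
const reordered = applyLog([...commandLog].reverse());
console.log(reordered.get("x")); // 5
```

The commands are the same in both cases; only the order differs, and that alone is enough to make the replicas disagree about x.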

For years, the industry standard was Paxos — an algorithm famously powerful but incredibly complex to understand and implement without error.

Raft was designed as an alternative. It breaks down the consensus problem into three modular sub-problems: Leader Election, Log Replication, and Safety Invariants.

The Three Raft Node States

At any given moment, every node in a Raft cluster operates in one of three distinct roles:

        ┌───────────────────────────────────────┐
        │                FOLLOWER               │
        └───────────────────────────────────────┘
          │ (Times out, starts election)   ▲
          ▼                                │ (Discovers current leader)
        ┌───────────────────────────────────────┐
        │               CANDIDATE               │
        └───────────────────────────────────────┘
          │ (Receives majority votes)      
          ▼                                
        ┌───────────────────────────────────────┐
        │                 LEADER                │
        └───────────────────────────────────────┘

Follower: Completely passive. Responds to incoming RPC calls from Candidates and Leaders. If a follower receives no communication for a randomized timeout period, it assumes the leader has crashed and transitions to a Candidate.
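Candidate: Increments its own current term, votes for itself, and sends vote requests to all other nodes. It returns to Follower if it discovers a legitimate leader or a higher term, and starts a fresh election if its timeout expires without a winner.
Leader: Handles all client writes. Coordinates log replication and sends periodic "heartbeat" RPCs (empty AppendEntries messages) to assert authority and reset follower election timers. A Leader steps down to Follower as soon as it sees a higher term, a transition the diagram above omits.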
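
The role changes described above can be sketched as a small transition function. The event names below are assumptions made for this sketch; they are not Raft RPC names.

```typescript
type Role = "Follower" | "Candidate" | "Leader";

// Hypothetical event names, chosen only for this illustration.
type RoleEvent =
  | "electionTimeout"   // no heartbeat arrived before the timer fired
  | "wonMajority"       // votes received from a majority of the cluster
  | "sawCurrentLeader"  // valid AppendEntries from a leader in the current term
  | "sawHigherTerm";    // any message carrying a term above our own

function nextRole(role: Role, event: RoleEvent): Role {
  // A higher term always demotes the node, whatever its current role.
  if (event === "sawHigherTerm") return "Follower";
  if (role === "Follower" && event === "electionTimeout") return "Candidate";
  if (role === "Candidate" && event === "wonMajority") return "Leader";
  if (role === "Candidate" && event === "sawCurrentLeader") return "Follower";
  // Every other combination leaves the role unchanged. A Candidate that
  // times out stays a Candidate and simply retries with a new term.
  return role;
}

console.log(nextRole("Follower", "electionTimeout")); // Candidate
console.log(nextRole("Candidate", "wonMajority"));    // Leader
console.log(nextRole("Leader", "sawHigherTerm"));     // Follower
```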

Sub-Problem 1: Leader Election

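To make split votes unlikely, each follower waits for its own randomized election timeout (typically between 150ms and 300ms) before initiating an election. Because the timeouts differ, one node usually times out first and wins before the others start competing.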

Once a node's timeout expires, it:

Increments its local currentTerm counter.
Transitions to the Candidate state.
Votes for itself.
Sends a RequestVote RPC to all other nodes.

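If a candidate receives votes from a majority of nodes in the cluster (e.g., 3 out of 5, counting its own vote), it immediately transitions to Leader and starts broadcasting heartbeats. If its election timeout expires with no winner, it starts a new election in a higher term.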
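
The majority rule is simple arithmetic, sketched below. `quorumSize` and `hasWonElection` are hypothetical helper names used only for this example.

```typescript
// Smallest number of votes that forms a strict majority of the cluster.
function quorumSize(clusterSize: number): number {
  return Math.floor(clusterSize / 2) + 1;
}

// Decide an election from the number of granted votes,
// counting the candidate's vote for itself.
function hasWonElection(votesGranted: number, clusterSize: number): boolean {
  return votesGranted >= quorumSize(clusterSize);
}

console.log(quorumSize(5));        // 3
console.log(quorumSize(4));        // 3
console.log(hasWonElection(2, 5)); // false
console.log(hasWonElection(3, 5)); // true
```

Since each node grants at most one vote per term, two candidates cannot both reach a quorum in the same term. Note also that a 4-node cluster needs 3 votes, just like a 5-node cluster, which is one reason odd cluster sizes are common.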

Sub-Problem 2: Log Replication

Once a Leader is elected, it acts as the single gateway for all client write requests.

Client ─── Write "X = 5" ───> Leader
                                │
          ┌─────────────────────┴─────────────────────┐
          │ (Append entry, broadcast AppendEntries)   │
          ▼                                           ▼
      Follower A                                  Follower B

The Replication Steps:

The client sends a command (e.g., set x=5) to the Leader.
The Leader appends the command to its local log.
The Leader broadcasts an AppendEntries RPC containing the new log entry to all followers.
Followers check that the entry follows on from their own log (the index and term of the preceding entry must match), append it, and send a success confirmation back.
Once the entry is stored on a majority of the cluster (the Leader plus 2 followers in a 5-node cluster), the Leader marks it committed, applies it to its state machine, and returns a success response to the client.
In subsequent heartbeats, the Leader notifies followers of the newly committed entry, prompting them to apply it to their local state machines.
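
The follower side of these steps can be sketched as follows. This is a simplified illustration rather than a complete implementation: the argument shape loosely follows the AppendEntries RPC from the Raft paper, but it uses 0-based log indexes (with -1 meaning "no entry") to match the listing later in this article, and all type and function names are assumptions.

```typescript
interface Entry { term: number; command: string }

// Hypothetical argument shape for this sketch (0-based indexes, -1 = none).
interface AppendEntriesArgs {
  term: number;
  prevLogIndex: number; // index of the entry just before the new ones
  prevLogTerm: number;  // term of that entry
  entries: Entry[];     // empty for a pure heartbeat
  leaderCommit: number; // highest index the leader has committed
}

interface FollowerState {
  currentTerm: number;
  log: Entry[];
  commitIndex: number; // -1 until something is committed
}

function handleAppendEntries(state: FollowerState, args: AppendEntriesArgs): boolean {
  // Reject leaders from a stale term.
  if (args.term < state.currentTerm) return false;
  state.currentTerm = args.term;

  // Consistency check: we must already hold the entry preceding the new ones.
  if (args.prevLogIndex >= 0) {
    const prev = state.log[args.prevLogIndex];
    if (prev === undefined || prev.term !== args.prevLogTerm) return false;
  }

  // Append new entries, truncating our log at the first conflict.
  args.entries.forEach((entry, i) => {
    const index = args.prevLogIndex + 1 + i;
    const existing = state.log[index];
    if (existing !== undefined && existing.term !== entry.term) {
      state.log.length = index; // drop the conflicting entry and all that follow
    }
    if (state.log[index] === undefined) state.log.push(entry);
  });

  // Advance the commit point, bounded by the last entry this RPC covered.
  const lastNewIndex = args.prevLogIndex + args.entries.length;
  const target = Math.min(args.leaderCommit, lastNewIndex);
  if (target > state.commitIndex) state.commitIndex = target;
  return true;
}

const follower: FollowerState = {
  currentTerm: 1,
  log: [{ term: 1, command: "set x=1" }, { term: 1, command: "set x=2" }],
  commitIndex: 0,
};

// A leader in term 2 overwrites the uncommitted entry at index 1.
const accepted = handleAppendEntries(follower, {
  term: 2,
  prevLogIndex: 0,
  prevLogTerm: 1,
  entries: [{ term: 2, command: "set x=5" }],
  leaderCommit: 1,
});
console.log(accepted, follower.log.map((e) => e.command), follower.commitIndex);
```

When the consistency check fails, the follower returns false and the leader retries with an earlier `prevLogIndex` until the two logs agree.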

TypeScript Raft Node Implementation

Here is the baseline state container and interface schema for a Raft consensus node in TypeScript:

type NodeRole = "Follower" | "Candidate" | "Leader";

interface LogEntry {
  term: number;
  command: string;
}

interface RequestVoteArgs {
  term: number;
  candidateId: string;
  lastLogIndex: number;
  lastLogTerm: number;
}

interface RequestVoteReply {
  term: number;
  voteGranted: boolean;
}

class RaftNode {
  public id: string;
  public currentTerm = 0;
  public role: NodeRole = "Follower";
  public votedFor: string | null = null;
  public log: LogEntry[] = [];

  private commitIndex = 0;
  private electionTimeout: NodeJS.Timeout | null = null;
  private peers: string[] = [];

  constructor(id: string, peers: string[]) {
    this.id = id;
    this.peers = peers;
    this.resetElectionTimeout();
  }

  private resetElectionTimeout() {
    if (this.electionTimeout) clearTimeout(this.electionTimeout);

    // Randomized timeout between 150ms and 300ms prevents split votes
    const timeout = 150 + Math.floor(Math.random() * 150);
    this.electionTimeout = setTimeout(() => this.startElection(), timeout);
  }

  // Handle incoming vote request from Candidate node
  public handleRequestVote(args: RequestVoteArgs): RequestVoteReply {
    // 1. Term check: reject candidates with stale terms
    if (args.term < this.currentTerm) {
      return { term: this.currentTerm, voteGranted: false };
    }

    // A higher term demotes this node and clears its vote for the new term
    if (args.term > this.currentTerm) {
      const wasFollower = this.role === "Follower";
      this.currentTerm = args.term;
      this.role = "Follower";
      this.votedFor = null;
      // A former Leader cleared its timer in becomeLeader(), so restart it here
      if (!wasFollower) this.resetElectionTimeout();
    }

    // 2. Log completeness check: candidate log must be at least as up-to-date as ours
    const lastIndex = this.log.length - 1;
    const lastTerm = lastIndex >= 0 ? this.log[lastIndex].term : 0;
    const logIsUpToDate =
      args.lastLogTerm > lastTerm ||
      (args.lastLogTerm === lastTerm && args.lastLogIndex >= lastIndex);

    // 3. Vote check: at most one vote per term
    const canVote = this.votedFor === null || this.votedFor === args.candidateId;

    if (canVote && logIsUpToDate) {
      this.votedFor = args.candidateId;
      // Only a granted vote resets the election timer; resetting on every
      // request would let a candidate with a stale log keep us from timing out
      this.resetElectionTimeout();
      return { term: this.currentTerm, voteGranted: true };
    }

    return { term: this.currentTerm, voteGranted: false };
  }

  private startElection() {
    this.role = "Candidate";
    this.currentTerm++;
    this.votedFor = this.id;
    this.resetElectionTimeout();

    console.log(`Node ${this.id} starting election for Term ${this.currentTerm}`);

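    let votesReceived = 1; // Vote for self
    const electionTerm = this.currentTerm; // Replies only count for this term
    const lastLogIndex = this.log.length - 1;
    const lastLogTerm = lastLogIndex >= 0 ? this.log[lastLogIndex].term : 0;

    for (const peerId of this.peers) {
      // Simulate dispatching network RPC calls to peer cluster
      this.sendRequestVoteRPC(peerId, {
        term: electionTerm,
        candidateId: this.id,
        lastLogIndex,
        lastLogTerm
      }).then((reply) => {
        // A peer in a newer term means this candidacy is stale: step down
        if (reply.term > this.currentTerm) {
          this.currentTerm = reply.term;
          this.role = "Follower";
          this.votedFor = null;
          this.resetElectionTimeout();
          return;
        }

        // Ignore replies that arrive after this election was decided or superseded
        if (this.role !== "Candidate" || this.currentTerm !== electionTerm) return;

        if (reply.voteGranted) {
          votesReceived++;
          if (votesReceived > (this.peers.length + 1) / 2) {
            this.becomeLeader();
          }
        }
      }).catch(() => {
        // A failed RPC counts as no vote; the election timer triggers a retry
      });
    }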
  }

  private becomeLeader() {
    this.role = "Leader";
    if (this.electionTimeout) clearTimeout(this.electionTimeout);
    console.log(`Node ${this.id} elected Leader for Term ${this.currentTerm}!`);
    this.startHeartbeats();
  }

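  private startHeartbeats() {
    // Send AppendEntries heartbeats at 50ms intervals to maintain authority
    const timer = setInterval(() => {
      // Stop the interval once this node is no longer Leader,
      // otherwise every election won would leave another timer running
      if (this.role !== "Leader") {
        clearInterval(timer);
        return;
      }
      for (const peerId of this.peers) {
        this.sendAppendEntriesRPC(peerId);
      }
    }, 50);
  }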

  // Mock network transporters for simulation environment
  private async sendRequestVoteRPC(peerId: string, args: RequestVoteArgs): Promise<RequestVoteReply> {
    return { term: this.currentTerm, voteGranted: Math.random() > 0.3 };
  }

  private async sendAppendEntriesRPC(peerId: string) {
    // Heartbeat logic
  }
}

Split-Brain and Minority Partitions

What happens when a network partition cuts a 5-node cluster into two segments: a majority partition (3 nodes) and a minority partition (2 nodes)?

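  MAJORITY PARTITION (Can commit)               MINORITY PARTITION (Cannot commit)
   [Node A] ─── [Node B (new Leader, term 2)]     [Node D] ─── [Node E (old Leader, term 1)]
      │                                              │
   [Node C]                                          X (Network Partition cut)

Suppose Node E was the leader when the partition occurred. It still believes it is the leader, but it cannot commit any new client writes because it can never reach the required cluster majority (3 nodes). The minority side cannot elect a leader of its own for the same reason: a candidate there can collect at most 2 votes. Meanwhile, the majority partition stops hearing from Node E, times out, and elects a new leader (Node B) in a higher term.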
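
The majority requirement in the paragraph above can be made concrete with a short sketch of how a leader might derive its commit point. `majorityMatchIndex` is a hypothetical helper; it assumes the leader tracks, for every node including itself, the highest log index known to be stored there (0-based, -1 for none).

```typescript
// Highest log index that is stored on a majority of the cluster.
function majorityMatchIndex(matchIndexes: number[]): number {
  const sorted = [...matchIndexes].sort((a, b) => b - a); // descending
  const quorum = Math.floor(matchIndexes.length / 2) + 1;
  // The quorum-th largest value is present on at least `quorum` nodes.
  return sorted[quorum - 1] ?? -1;
}

// Leader stranded in the minority: it and one follower hold index 4,
// but the three unreachable nodes were last known to hold index 2.
console.log(majorityMatchIndex([4, 4, 2, 2, 2])); // 2

// Leader in the majority: index 3 is on three nodes, so it can commit.
console.log(majorityMatchIndex([3, 3, 3, 2, 2])); // 3
```

The stranded leader can keep appending entries locally, but its commit point stays at 2. Raft also only lets a leader commit entries from its own term by counting replicas; this sketch leaves that check out.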

Once the partition heals:

The minority leader receives a heartbeat carrying a higher term from the majority leader.
The minority leader immediately steps down to Follower.
The minority nodes discard any uncommitted entries that conflict with the majority leader's log and replicate the missing entries from it.

Raft guarantees that the cluster always converges safely back to a single source of truth.
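
The step-down rule that drives this convergence is small enough to sketch on its own. `TermState` and `observeTerm` are illustrative names assumed for this example; the idea is that a node runs this check on the term carried by any incoming RPC or reply.

```typescript
interface TermState {
  currentTerm: number;
  role: "Follower" | "Candidate" | "Leader";
  votedFor: string | null;
}

// Returns true if the incoming term was newer and the node stepped down.
function observeTerm(state: TermState, incomingTerm: number): boolean {
  if (incomingTerm <= state.currentTerm) return false;
  state.currentTerm = incomingTerm;
  state.role = "Follower";
  state.votedFor = null; // the vote belonged to the old term
  return true;
}

// A stale leader from term 1 hears a heartbeat from term 2.
const staleLeader: TermState = { currentTerm: 1, role: "Leader", votedFor: "E" };
const steppedDown = observeTerm(staleLeader, 2);
console.log(steppedDown, staleLeader.role, staleLeader.currentTerm); // true Follower 2
```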

Engineering Takeaways

Raft guarantees Safety under crashes, message delays, and partitions. Liveness additionally requires that a majority of nodes can communicate; randomized timeouts help elections settle quickly.
Commitment requires a strict majority of the cluster, which protects against split-brain state divergence.
Terms act as a logical clock: any node that sees a higher term adopts it, and a stale leader steps down.


Written by Ebenezer Akinseinde — Software Developer & AI Automations Engineer.
