Farshad Nickfetrat

Why must a Kubernetes cluster have an odd number of nodes

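If you’ve spent any time setting up or managing Kubernetes, you might have come across the recommendation that clusters should have an odd number of nodes. Strictly speaking, the recommendation applies to the control plane nodes that run etcd, not to worker nodes. But why is that? Let's break it down in simple terms.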
It's All About Leader Election

Kubernetes stores its cluster state in etcd, and etcd uses Raft, a consensus algorithm (an alternative to Paxos, not a variant of it), to elect a leader and keep its members in agreement.
What is Raft consensus?

Raft is a consensus algorithm used to ensure multiple computers (or nodes) agree on shared data, even if some nodes fail. It was designed to be easier to understand than other algorithms like Paxos.

In distributed systems like etcd (used by Kubernetes), Raft makes sure the members agree on a single leader and keep their data consistent, even when some of them are down.

Imagine a group of people trying to agree on a decision (like which movie to watch). Raft works by choosing one person as the leader, who suggests a movie (or decision). The others (followers) either accept or reject the suggestion. If the leader goes away (fails), the group elects a new leader. As long as a majority of the whole group is present and agrees, the group can keep making decisions, even if some people (nodes) aren't available.
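
The movie-night analogy can be sketched in a few lines of Python. This is a simplified illustration, not how etcd implements Raft: real Raft also involves terms, log comparison, and election timeouts. The function name `wins_election` and its parameters are made up for this example. The point it shows is that the majority is counted against the full group size, not just the members who happen to be reachable.

```python
def wins_election(total_members: int, votes_received: int) -> bool:
    """Return True if a candidate has votes from a strict majority.

    The majority is measured against the full membership, so members
    that are down or unreachable still count toward the total.
    """
    if total_members < 1:
        raise ValueError("a group needs at least one member")
    if not 0 <= votes_received <= total_members:
        raise ValueError("votes must be between 0 and total_members")
    return votes_received > total_members / 2


# Five friends; two are away, the three present all vote for the candidate.
print(wins_election(total_members=5, votes_received=3))  # True

# Four friends; two are away, the two present both vote for the candidate.
# 2 is not more than half of 4, so nobody can be elected.
print(wins_election(total_members=4, votes_received=2))  # False
```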

Take a look at these examples:

  1. 4-Node System

Total Nodes: 4
Quorum Required: 3
Allowed Failed Nodes: 1

Quorum Required: To maintain consensus in this 4-node system, a majority (3 nodes) must be operational.
Allowed Failed Nodes: This system can tolerate the failure of only 1 node. If 2 nodes go down, the system loses quorum and cannot make decisions.
Scenario: If 3 nodes are up and 1 is down, the system can still function. If 2 nodes go down, the system cannot process any new transactions until at least 3 nodes are up again. A 3-node system (quorum of 2) tolerates the same single failure, so the fourth node adds no extra fault tolerance.

  2. 9-Node System

Total Nodes: 9
Quorum Required: 5
Allowed Failed Nodes: 4

Quorum Required: In this 9-node system, at least 5 nodes (more than half) must be up and running to reach a consensus.
Allowed Failed Nodes: This system can tolerate up to 4 node failures while still maintaining quorum.
Scenario: If 4 nodes fail, the remaining 5 nodes can continue to operate and maintain consensus. However, if a 5th node fails, the system loses quorum and can no longer process updates or transactions.

  3. 10-Node System

Total Nodes: 10
Quorum Required: 6
Allowed Failed Nodes: 4

Quorum Required: In a 10-node system, at least 6 nodes must be operational to maintain consensus.
Allowed Failed Nodes: This system can tolerate the failure of up to 4 nodes. If 5 nodes go down, the system loses quorum.
Scenario: If 6 nodes are operational, the system can process transactions and make decisions. However, if 5 nodes are down, the system becomes inoperable because there are not enough nodes to reach quorum. That is the same tolerance as the 9-node system, so the tenth node adds no extra fault tolerance.

  4. 11-Node System

Total Nodes: 11
Quorum Required: 6
Allowed Failed Nodes: 5

Quorum Required: For an 11-node system, a minimum of 6 nodes need to be up to form a quorum.
Allowed Failed Nodes: This system can tolerate up to 5 node failures. If 6 nodes fail, quorum is lost.
Scenario: With 6 nodes operational, the system can still reach consensus and process transactions. However, if a 6th node fails, leaving only 5 running, the system is effectively halted since it no longer has a quorum to make decisions.
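
The numbers in these examples all follow from one rule: quorum is half the nodes (rounded down) plus one, and the allowed failures are whatever is left over. A minimal Python sketch reproduces them. The function names `quorum` and `fault_tolerance` are chosen for this example and are not part of any etcd or Kubernetes API.

```python
def quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def fault_tolerance(total_nodes: int) -> int:
    """How many nodes can fail while a quorum is still available."""
    return total_nodes - quorum(total_nodes)


# Print one summary line per cluster size used in the examples.
for n in (4, 9, 10, 11):
    print(f"Total Nodes: {n:>2}  Quorum Required: {quorum(n)}  "
          f"Allowed Failed Nodes: {fault_tolerance(n)}")
```

Running it prints the same quorum and failure figures listed for the 4, 9, 10, and 11-node systems.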

Conclusion

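By increasing the number of control plane nodes, you raise the failure tolerance of the cluster, but only when you move from one odd size to the next. Going from an odd to an even count raises the quorum without raising the number of failures the cluster can survive: 4 nodes tolerate 1 failure just like 3 nodes, and 10 nodes tolerate 4 failures just like 9. Raft's majority rule is what prevents split-brain at any size. An odd count additionally guarantees that a clean two-way network partition leaves one side with a majority, whereas an even count can split evenly and leave neither side able to make decisions. That is why an odd number of control plane nodes is the recommended choice.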
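
The odd-number recommendation can be checked with a short sketch. It is plain Python with a hypothetical helper, `fault_tolerance`, that compares each odd cluster size with the next even size up.

```python
def fault_tolerance(total_nodes: int) -> int:
    """Nodes that may fail while a strict majority remains."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return (total_nodes - 1) // 2


# Compare every odd size with the even size that has one more node.
for odd in (1, 3, 5, 7):
    even = odd + 1
    print(f"{odd} nodes tolerate {fault_tolerance(odd)} failure(s); "
          f"{even} nodes tolerate {fault_tolerance(even)} failure(s)")
```

Each pair reports the same tolerance, so the extra node in the even-sized cluster only adds one more machine that can fail.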

About the Author:
Hi 👋, I’m Farshad Nick (Farshad Nickfetrat)

📝 I regularly write articles on packops.dev and packops.ir
💬 Ask me about DevOps, Cloud, Kubernetes, Linux
📫 Reach me on LinkedIn
Here is my GitHub repo