Overview
A stacked HA cluster is a topology where the distributed data storage cluster provided by etcd is stacked on top of the cluster formed by the nodes managed by kubeadm that run control plane components.
Each control plane node runs an instance of the kube-apiserver, kube-scheduler, and kube-controller-manager. The kube-apiserver is exposed to worker nodes using a load balancer.
Each control plane node creates a local etcd member and this etcd member communicates only with the kube-apiserver of this node. The same applies to the local kube-controller-manager and kube-scheduler instances.
This topology couples the control planes and etcd members on the same nodes. It is simpler to set up than a cluster with external etcd nodes, and simpler to manage for replication.
Here's what happens in a 3-node stacked cluster:
Each control plane node runs:
- etcd member
- kube-apiserver, scheduler, controller-manager
So, you have:
- 3 etcd members → quorum = 2
- 3 API servers → load balanced (can handle 1 down)
If one node fails: You still have:
- 2 etcd members → quorum maintained
- 2 control plane instances → still available
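In general, an etcd cluster of N members needs floor(N/2) + 1 members for quorum: 3 members give a quorum of 2 and tolerate 1 failure, while 5 members give a quorum of 3 and tolerate 2 failures.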
This is the default topology deployed by kubeadm. A local etcd member is created automatically on control plane nodes when using kubeadm init and kubeadm join --control-plane.
Assumptions: You have bootstrapped a cluster with kubeadm before, as this document doesn't cover every step in detail.
Setting up the machines
To set up HAProxy + Keepalived for Kubernetes High Availability (HA) with 3 master nodes and a Virtual IP (VIP), follow this structured approach:
Masters: 10.238.40.162, 10.238.40.163, 10.238.40.164
VIP: 10.238.40.166
- Install HAProxy + Keepalived on all 3 Masters
sudo apt update
sudo apt install -y haproxy keepalived
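A quick sanity check that both packages are installed (the exact versions reported will vary by distribution):
haproxy -v
keepalived --version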
- HAProxy Configuration
Edit /etc/haproxy/haproxy.cfg on all 3 master nodes:
global
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option httplog
    option dontlognull

frontend kubernetes-apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend kubernetes-apiserver

backend kubernetes-apiserver
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.238.40.162:6443 check fall 3 rise 2
    server master2 10.238.40.163:6443 check fall 3 rise 2
    server master3 10.238.40.164:6443 check fall 3 rise 2
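Before restarting the service, you can have HAProxy validate the configuration file syntax:
sudo haproxy -c -f /etc/haproxy/haproxy.cfg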
- Keepalived Configuration
Only one node at a time will "own" the VIP (managed by Keepalived), but config is present on all.
Edit /etc/keepalived/keepalived.conf on each master node:
Note: Adjust the state and priority values on each node:
- Master1: priority 110 (MASTER)
- Master2: priority 100 (BACKUP)
- Master3: priority 90 (BACKUP)
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    # the kube-apiserver serves HTTPS on 6443, so check over TLS (-k skips certificate verification)
    script "/bin/curl -sfk https://localhost:6443/healthz || exit 1"
    interval 2
    weight -2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface enp19s0
    virtual_router_id 51
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-ha-cluster
    }
    virtual_ipaddress {
        10.238.40.166/24
    }
    track_script {
        chk_haproxy
    }
}
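The interface name (enp19s0 here) is specific to this environment; list your interfaces and pick the one carrying the node's 10.238.40.x address:
ip -br addr show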
- Restart HAProxy and Keepalived
sudo systemctl restart haproxy keepalived
sudo systemctl enable haproxy keepalived
- Validate the VIP appears on one node
ip addr show | grep 10.238.40.166
- Check service status
sudo systemctl status haproxy
sudo systemctl status keepalived
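HAProxy should now be listening on port 8443 on every master, even though the backends stay down until the API servers come up:
sudo ss -tlnp | grep 8443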
Bootstrap the cluster
- Create a kubeadm-config.yaml file on the first master node. Make sure to use the VIP as the control plane endpoint, and include it in the apiServer.certSANs list.
Important: If you reuse this file on another master node, change the advertiseAddress field in InitConfiguration to that node's own IP address.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.32.6
apiServer:
  certSANs:
    - "10.238.40.166"  # VIP
    - "127.0.0.1"      # Localhost
    - "0.0.0.0"        # Wildcard
    - "172.30.0.1"     # Kubernetes service IP (first IP of serviceSubnet)
    - "10.238.40.162"
    - "10.238.40.163"
    - "10.238.40.164"
  extraArgs:
    authorization-mode: Node,RBAC
certificatesDir: /etc/kubernetes/pki
clusterName: pcai
controlPlaneEndpoint: "10.238.40.166:8443"
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
networking:
  dnsDomain: cluster.local
  podSubnet: "172.20.0.0/16"
  serviceSubnet: "172.30.0.0/16"
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.238.40.162"
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
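Optionally, you can catch mistakes in this file before making any changes to the node by running kubeadm init in dry-run mode:
kubeadm init --config kubeadm-config.yaml --dry-run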
- Initialize the cluster
kubeadm init --upload-certs --config kubeadm-config.yaml -v=5
Note: Save the output! It contains the join commands for control plane and worker nodes.
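If you lose the output, the worker join command can be regenerated at any time (the control plane join additionally needs the certificate key, covered below):
kubeadm token create --print-join-command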
- Configure kubectl access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
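A quick check that kubectl reaches the API server through the VIP (the control plane URL should show 10.238.40.166:8443):
kubectl cluster-info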
- Install your choice of networking solution (Calico is used here)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
- Wait for networking pods to be ready
kubectl wait --for=condition=ready pod -l k8s-app=calico-node -n kube-system --timeout=300s
- Run the control plane node join command (output of the kubeadm init) on the other master nodes.
kubeadm join 10.238.40.166:8443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane --certificate-key <cert-key>
Note: The certificate key is only valid for 2 hours. If it expires, generate a new one:
kubeadm init phase upload-certs --upload-certs
Verification and Health Checks
After setting up all control plane nodes, verify the cluster health:
- Check all nodes are ready
kubectl get nodes -o wide
- Verify control plane components
kubectl get pods -n kube-system
- Check etcd cluster health
kubectl exec -n kube-system etcd-<master-node-name> -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
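With the same certificate paths, you can also check the health of every member (recent etcdctl versions support the --cluster flag, which queries all endpoints found in the member list):
kubectl exec -n kube-system etcd-<master-node-name> -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health --cluster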
- Test VIP failover
# Stop keepalived on the master node that owns the VIP
sudo systemctl stop keepalived
# Verify VIP moves to another node
ip addr show | grep 10.238.40.166
# Test API access via VIP
curl -k https://10.238.40.166:8443/healthz
# Restart keepalived
sudo systemctl start keepalived
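To watch the failover as it happens, follow the Keepalived logs on one of the backup nodes while stopping the service on the current VIP owner (the exact log wording depends on the Keepalived version):
sudo journalctl -u keepalived -f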
Conclusion
Congratulations! You have successfully deployed a highly available Kubernetes cluster using a stacked etcd topology with HAProxy and Keepalived. This setup provides:
Key Benefits
- High Availability: Automatic failover with no single point of failure
- Load Distribution: Traffic distributed across all API servers via HAProxy
- Automatic Recovery: Keepalived handles VIP failover in seconds
- Simplified Architecture: Stacked topology reduces complexity compared to external etcd
Cluster Capabilities
With this 3-master node configuration:
- Tolerates 1 node failure while maintaining full cluster functionality
- Maintains etcd quorum with 2 out of 3 members
- Continues serving API requests through the remaining healthy masters
- Automatically fails over VIP to operational nodes

