Overview
A stacked HA cluster is a topology where the distributed data storage cluster provided by etcd is stacked on top of the cluster formed by the nodes managed by kubeadm that run control plane components.
Each control plane node runs an instance of the kube-apiserver, kube-scheduler, and kube-controller-manager. The kube-apiserver is exposed to worker nodes using a load balancer.
Each control plane node creates a local etcd member and this etcd member communicates only with the kube-apiserver of this node. The same applies to the local kube-controller-manager and kube-scheduler instances.
This topology couples the control planes and etcd members on the same nodes. It is simpler to set up than a cluster with external etcd nodes, and simpler to manage for replication.
Here's what happens in a 3-node stacked cluster:
Each control plane node runs:
- etcd member
- kube-apiserver, scheduler, controller-manager
So, you have:
- 3 etcd members → quorum = 2
- 3 API servers → load balanced (can handle 1 down)
If one node fails: You still have:
- 2 etcd members → quorum maintained
- 2 control plane instances → still available
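In general, an etcd cluster of N members needs floor(N/2) + 1 members for quorum: 3 members give a quorum of 2 and tolerate 1 failure, while 5 members give a quorum of 3 and tolerate 2 failures.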
This is the default topology deployed by kubeadm. A local etcd member is created automatically on control plane nodes when using kubeadm init and kubeadm join --control-plane.
Assumptions: You have bootstrapped a cluster with kubeadm before, as this document doesn't cover every step in detail.
Setting up the machines
To set up HAProxy + Keepalived for Kubernetes High Availability (HA) with 3 master nodes and a Virtual IP (VIP), follow this structured approach:
Masters: 10.238.40.162, 10.238.40.163, 10.238.40.164
VIP: 10.238.40.166
- Install HAProxy + Keepalived on all 3 Masters
sudo apt update
sudo apt install -y haproxy keepalived
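A quick sanity check that both packages are installed (the exact versions reported will vary by distribution):
haproxy -v
keepalived --version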
- HAProxy Configuration
Edit /etc/haproxy/haproxy.cfg on all 3 master nodes:
global
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option httplog
    option dontlognull

frontend kubernetes-apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend kubernetes-apiserver

backend kubernetes-apiserver
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.238.40.162:6443 check fall 3 rise 2
    server master2 10.238.40.163:6443 check fall 3 rise 2
    server master3 10.238.40.164:6443 check fall 3 rise 2
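Before restarting the service, you can have HAProxy validate the configuration file syntax:
sudo haproxy -c -f /etc/haproxy/haproxy.cfg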
- Keepalived Configuration
Only one node at a time will "own" the VIP (managed by Keepalived), but config is present on all.
Edit /etc/keepalived/keepalived.conf on each master node:
Note: Adjust the state and priority values on each node:
- Master1: priority 110 (MASTER)
- Master2: priority 100 (BACKUP)
- Master3: priority 90 (BACKUP)
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    # the kube-apiserver serves HTTPS on 6443, so check over TLS (-k skips certificate verification)
    script "/bin/curl -sfk https://localhost:6443/healthz || exit 1"
    interval 2
    weight -2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface enp19s0
    virtual_router_id 51
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-ha-cluster
    }
    virtual_ipaddress {
        10.238.40.166/24
    }
    track_script {
        chk_haproxy
    }
}
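The interface name (enp19s0 here) is specific to this environment; list your interfaces and pick the one carrying the node's 10.238.40.x address:
ip -br addr show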
- Restart HAProxy and Keepalived
sudo systemctl restart haproxy keepalived
sudo systemctl enable haproxy keepalived
- Validate the VIP appears on one node
ip addr show | grep 10.238.40.166
- Check service status
sudo systemctl status haproxy
sudo systemctl status keepalived
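HAProxy should now be listening on port 8443 on every master, even though the backends stay down until the API servers come up:
sudo ss -tlnp | grep 8443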
Bootstrap the cluster
- Create a kubeadm-config.yaml file on the first master node. Make sure to use the VIP as the control plane endpoint, and include it in the apiServer.certSANs list.
Important: If you reuse this file on another master node, change the advertiseAddress field in InitConfiguration to that node's own IP address.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.32.6
apiServer:
  certSANs:
    - "10.238.40.166"  # VIP
    - "127.0.0.1"      # Localhost
    - "0.0.0.0"        # Wildcard
    - "172.30.0.1"     # Kubernetes service IP (first IP of serviceSubnet)
    - "10.238.40.162"
    - "10.238.40.163"
    - "10.238.40.164"
  extraArgs:
    authorization-mode: Node,RBAC
certificatesDir: /etc/kubernetes/pki
clusterName: pcai
controlPlaneEndpoint: "10.238.40.166:8443"
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
networking:
  dnsDomain: cluster.local
  podSubnet: "172.20.0.0/16"
  serviceSubnet: "172.30.0.0/16"
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.238.40.162"
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
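Optionally, you can catch mistakes in this file before making any changes to the node by running kubeadm init in dry-run mode:
kubeadm init --config kubeadm-config.yaml --dry-run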
- Initialize the cluster
kubeadm init --upload-certs --config kubeadm-config.yaml -v=5
Note: Save the output! It contains the join commands for control plane and worker nodes.
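If you lose the output, the worker join command can be regenerated at any time (the control plane join additionally needs the certificate key, covered below):
kubeadm token create --print-join-command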
- Configure kubectl access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
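A quick check that kubectl reaches the API server through the VIP (the control plane URL should show 10.238.40.166:8443):
kubectl cluster-info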
- Install your choice of networking solution (Calico is used here)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
- Wait for networking pods to be ready
kubectl wait --for=condition=ready pod -l k8s-app=calico-node -n kube-system --timeout=300s
- Run the control plane node join command (output of the kubeadm init) on the other master nodes.
kubeadm join 10.238.40.166:8443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane --certificate-key <cert-key>
Note: The certificate key is only valid for 2 hours. If it expires, generate a new one:
kubeadm init phase upload-certs --upload-certs
Verification and Health Checks
After setting up all control plane nodes, verify the cluster health:
- Check all nodes are ready
kubectl get nodes -o wide
- Verify control plane components
kubectl get pods -n kube-system
- Check etcd cluster health
kubectl exec -n kube-system etcd-<master-node-name> -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
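With the same certificate paths, you can also check the health of every member (recent etcdctl versions support the --cluster flag, which queries all endpoints found in the member list):
kubectl exec -n kube-system etcd-<master-node-name> -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health --cluster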
- Test VIP failover
# Stop keepalived on the master node that owns the VIP
sudo systemctl stop keepalived
# Verify VIP moves to another node
ip addr show | grep 10.238.40.166
# Test API access via VIP
curl -k https://10.238.40.166:8443/healthz
# Restart keepalived
sudo systemctl start keepalived
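To watch the failover as it happens, follow the Keepalived logs on one of the backup nodes while stopping the service on the current VIP owner (the exact log wording depends on the Keepalived version):
sudo journalctl -u keepalived -f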
Conclusion
Congratulations! You have successfully deployed a highly available Kubernetes cluster using a stacked etcd topology with HAProxy and Keepalived. This setup provides:
Key Benefits
- High Availability: Automatic failover with no single point of failure
- Load Distribution: Traffic distributed across all API servers via HAProxy
- Automatic Recovery: Keepalived handles VIP failover in seconds
- Simplified Architecture: Stacked topology reduces complexity compared to external etcd
Cluster Capabilities
With this 3-master node configuration:
- Tolerates 1 node failure while maintaining full cluster functionality
- Maintains etcd quorum with 2 out of 3 members
- Continues serving API requests through the remaining healthy masters
- Automatically fails over VIP to operational nodes

