Running Kubernetes in the cloud provides flexibility, but for I/O- and network-heavy workloads, hypervisor overhead can become a serious bottleneck. Moving to bare-metal Kubernetes offers direct access to PCIe lanes, raw compute, and complete data sovereignty.
But there's a catch: running Kubernetes on a general-purpose Linux distribution (Ubuntu, Debian) means the CIS hardening burden is yours. You spend countless hours managing SSH keys, applying OS-level patches, and fighting configuration drift.
Enter Talos Linux—the modern datacenter standard for immutable Kubernetes.
🛡️ What is Talos Linux? The Immutable Paradigm
A common question among platform engineers is, "What is Talos Linux based on?" While it utilizes the Linux kernel, it is an immutable, API-driven operating system designed explicitly for Kubernetes from the ground up.
It drastically reduces the OS-level attack surface by eliminating SSH, the shell, and package managers entirely. Every interaction happens via a mutually authenticated gRPC API (talosctl).
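Since there is no shell, day-two operations that would normally need SSH become authenticated API calls instead. A few illustrative talosctl invocations (the node IP matches the addressing plan used later in this guide):

```bash
# Stream service logs from a node -- no shell login required
talosctl logs kubelet --nodes 10.10.10.11

# List the services Talos is supervising on that node
talosctl services --nodes 10.10.10.11

# Reboot the node through the mutually authenticated gRPC API
talosctl reboot --nodes 10.10.10.11
```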
⚙️ High Availability Architecture & The etcd Quorum
Running a single Control Plane is a lab experiment. Kubernetes' backing datastore, etcd, needs a quorum (a strict majority of members) to accept writes, so a production-grade cluster requires a minimum of 3 Control Plane nodes.
- The Quorum Risk: In a 3-node cluster, the quorum is 2. If one node fails, the cluster survives. If two nodes fail, etcd loses quorum and the control plane stops accepting writes.
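The arithmetic behind that risk generalizes: an n-member etcd cluster needs floor(n/2) + 1 members alive. A quick sketch:

```bash
# etcd quorum = floor(n/2) + 1; fault tolerance = n - quorum
for n in 1 3 4 5 7; do
  q=$(( n / 2 + 1 ))
  echo "$n members: quorum=$q, tolerates $(( n - q )) failure(s)"
done
```

Note that 4 members tolerate only 1 failure, the same as 3, which is why control planes come in odd sizes.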
Infrastructure & The Layer 2 VIP
To expose the API securely, Talos uses a Virtual IP (VIP) backed by gratuitous ARP. The limitation: ARP announcements do not cross routers, so all Control Plane nodes must reside in the exact same Layer 2 subnet.
Deploying this architecture on dedicated bare-metal servers provides the necessary physical Layer 2 networking capabilities without cloud routing restrictions.
- 3x Control Plane Nodes: (e.g., 10.10.10.11, .12, .13)
- 1x Private L2 VIP for API Server: (e.g., 10.10.10.100)
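Once the VIP is live, a quick way to confirm L2 adjacency is to watch ARP resolution from any Linux host on the same segment; the resolved MAC should change when the VIP fails over to another node. This is a diagnostic sketch, assuming a host on the 10.10.10.0/24 segment with `arping` installed:

```bash
# Confirm the VIP answers ARP on this segment
arping -c 3 10.10.10.100

# Inspect the neighbor cache entry; the MAC moves with the VIP on failover
ip neigh show 10.10.10.100
```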
The Deployment Blueprint
Step 1: OS Installation via IPMI
In a true datacenter environment, bare metal provisioning relies on remote Out-of-Band (OOB) management.
- Download the Talos Linux Metal ISO from the official GitHub releases.
- Log into your server's IPMI / iKVM Console.
- Mount the ISO via Virtual Media and power cycle. The system will boot into Talos Maintenance Mode.
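In maintenance mode the node accepts unauthenticated API calls, so you can inspect the hardware before committing a configuration. A sketch, using the first Control Plane from the addressing plan above:

```bash
# Enumerate disks to pick an install target
talosctl get disks --insecure --nodes 10.10.10.11

# Enumerate network links to confirm which interface is eth1
talosctl get links --insecure --nodes 10.10.10.11
```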
Step 2: Generating the HA Configuration
Generate the foundational machine configuration, binding the cluster endpoint to our Private VIP (10.10.10.100).
```bash
talosctl gen config my-ha-cluster https://10.10.10.100:6443
# Generated files: controlplane.yaml, worker.yaml, talosconfig
```
Step 3: Layer 2 VIP & VLAN Patching
Configure Talos to announce the Layer 2 VIP across the Control Planes for seamless failover. We also disable the default kube-proxy as we will replace it with Cilium eBPF.
Create patch-cp.yaml:
```yaml
machine:
  network:
    interfaces:
      - interface: eth1
        vip:
          ip: 10.10.10.100 # The L2 Shared API Endpoint
cluster:
  network:
    cni:
      name: none # We will install Cilium manually
  proxy:
    disabled: true # Cilium will replace kube-proxy
```
Merge the patch:
```bash
talosctl machineconfig patch controlplane.yaml --patch @patch-cp.yaml -o cp-patched.yaml
```
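Before touching hardware, it is worth validating the merged document; talosctl can check it against the metal platform rules:

```bash
# Fails fast on config errors instead of producing an unbootable node
talosctl validate --config cp-patched.yaml --mode metal
```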
Step 4: Bootstrapping the Cluster
Apply the patched configuration to all three Control Plane nodes.
```bash
talosctl apply-config --insecure --nodes 10.10.10.11 --file cp-patched.yaml
talosctl apply-config --insecure --nodes 10.10.10.12 --file cp-patched.yaml
talosctl apply-config --insecure --nodes 10.10.10.13 --file cp-patched.yaml
```
Bootstrap etcd on the first node only; run this exactly once, against a single node:

```bash
# The VIP only comes up after etcd is healthy, so target the first node directly
talosctl config endpoint 10.10.10.11 --talosconfig ./talosconfig
talosctl config node 10.10.10.11 --talosconfig ./talosconfig
talosctl bootstrap --talosconfig ./talosconfig

# Once the cluster is up, switch the endpoint to the HA VIP
talosctl config endpoint 10.10.10.100 --talosconfig ./talosconfig

talosctl kubeconfig ./kubeconfig --talosconfig ./talosconfig
export KUBECONFIG=$(pwd)/kubeconfig
```
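Before moving on, verify that etcd has formed and all three members have joined. The nodes will report NotReady at this point; that is expected, since the patch set `cni: none` and Cilium is not installed yet.

```bash
# Cluster-wide health check via the Talos API
talosctl health --talosconfig ./talosconfig

# All three Control Planes should appear as etcd members
talosctl etcd members --nodes 10.10.10.11 --talosconfig ./talosconfig

# NotReady until a CNI is installed -- expected with cni: none
kubectl get nodes -o wide
```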
Step 5: Cilium CNI (Native L2 Announcements)
Modern eBPF-based CNIs like Cilium natively support L2 announcements and BGP, which removes the need for a separate load-balancer announcer such as MetalLB in this setup.
1. Install Cilium (replacing kube-proxy):

```bash
helm repo add cilium https://charts.cilium.io
helm repo update

helm install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.10.10.100 \
  --set k8sServicePort=6443 \
  --set l2announcements.enabled=true \
  --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
  --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
  --set cgroup.autoMount.enabled=false \
  --set cgroup.hostRoot=/sys/fs/cgroup
```
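After the install, confirm the agents are healthy before defining IP pools. The rollout check uses plain kubectl; the fuller summary assumes you also have the separate Cilium CLI installed:

```bash
# Wait for the agent DaemonSet to roll out on every node
kubectl -n kube-system rollout status daemonset/cilium

# With the Cilium CLI installed, get a full health summary
cilium status --wait
```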
2. Define the IP Pool: apply the IP pool and L2 announcement policy. (Replace the RFC 5737 documentation addresses with your actual assigned public IP block.)
```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: public-ip-pool
spec:
  blocks:
    - cidr: "198.51.100.8/29" # REPLACE WITH YOUR REAL IPs (must be a valid, aligned CIDR)
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2-policy
spec:
  interfaces:
    - eth0
  externalIPs: true
  loadBalancerIPs: true
```
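To exercise the pool end to end, expose a throwaway workload as a LoadBalancer Service; Cilium should allocate an address from the pool and start answering ARP for it on eth0. The deployment name and image here are illustrative placeholders:

```bash
# Placeholder workload purely for testing the LB path
kubectl create deployment whoami --image=traefik/whoami
kubectl expose deployment whoami --port=80 --type=LoadBalancer

# EXTERNAL-IP should be allocated from your CiliumLoadBalancerIPPool
kubectl get svc whoami
```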
Wrap Up
Your bare metal cluster is now online, highly available, and networking natively via eBPF. You have successfully eliminated the hypervisor tax and OS-level attack vectors.