What You'll Learn
- How Linux systemd supervises the kubelet process
- The role of static pods in bootstrapping the control plane
- How the controller manager implements reconciliation loops
- How the first three layers fit into the complete 4-layer supervision model
Previously
In Part 1, we investigated why a Grafana ingress disappeared after GKE node upgrades. The fix was straightforward: use Helm-managed resources instead of manual kubectl apply.
But that raised a deeper question: How do controllers themselves survive pod evictions?
The answer is a hierarchical supervision model—each layer watches the layer above it, ensuring continuous operation despite failures.
The Four Layers of Kubernetes Supervision
- Layer 1: systemd, the Linux init system that supervises the kubelet
- Layer 2: kubelet, the node agent that supervises pods
- Layer 3: the control plane, running as static pods and reconciling cluster state
- Layer 4: application controllers such as NGINX Ingress, cert-manager, and the Prometheus Operator
In this post, we'll explore Layers 1-3. Part 3 covers Layer 4 and the complete resilience model.
Layer 1: The Linux Foundation
systemd — The Root Supervisor
At the very bottom of the stack is systemd, the init system running as PID 1 on most modern Linux distributions.
# On a Kubernetes node
ps aux | head -5
# USER PID COMMAND
# root 1 /sbin/init (systemd)
# root ... /usr/bin/kubelet
systemd's job is simple but critical:
- Start services in the correct order at boot
- Monitor services and restart them if they crash
- Provide dependency management between services
The kubelet runs as a systemd service:
# /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet \
--config=/var/lib/kubelet/config.yaml \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
The key line: Restart=always
If kubelet crashes, systemd waits 10 seconds (RestartSec=10) and starts it again. This is the foundation of Kubernetes resilience: the node agent is supervised by the operating system itself.
# View kubelet status
systemctl status kubelet
# Watch kubelet restart after killing it (don't do this in production!)
sudo kill $(pgrep kubelet)
# systemd will restart it automatically
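# Follow the kubelet unit's logs to watch systemd bring it back
journalctl -u kubelet -f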
Layer 2: The Node Agent
kubelet — The Pod Supervisor
kubelet is the Kubernetes agent running on every node. It has two critical responsibilities:
1. Running Static Pods
kubelet watches a directory (typically /etc/kubernetes/manifests/) for pod manifests and runs them directly—no API server required.
ls /etc/kubernetes/manifests/
# etcd.yaml
# kube-apiserver.yaml
# kube-controller-manager.yaml
# kube-scheduler.yaml
This is how the control plane bootstraps itself: nothing can be scheduled through the API server before the API server exists, so kubelet runs these components directly from files on disk.
# /etc/kubernetes/manifests/kube-apiserver.yaml (simplified)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0
    command:
    - kube-apiserver
    - --etcd-servers=https://127.0.0.1:2379
    - --service-cluster-ip-range=10.96.0.0/12
    # ... more flags
2. Running API-Scheduled Pods
Once the control plane is running, kubelet also:
- Watches the API server for pods scheduled to its node (see the sketch after this list)
- Starts containers via the container runtime (containerd)
- Reports pod status back to the API server
- Restarts failed containers based on restartPolicy
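To make the first of those responsibilities concrete, here's a minimal client-go sketch of that node-scoped view. This is not how kubelet is actually implemented (it uses watches and informers rather than one-off lists), and the node name worker-1 and the kubeconfig path are placeholders:
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from a kubeconfig file (placeholder path).
    config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubelet.conf")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Ask for every pod, in any namespace, bound to one node.
    // "worker-1" is a placeholder node name.
    pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=worker-1",
    })
    if err != nil {
        panic(err)
    }
    for _, pod := range pods.Items {
        fmt.Printf("%s/%s restartPolicy=%s\n", pod.Namespace, pod.Name, pod.Spec.RestartPolicy)
    }
}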
Layer 3: The Control Plane
Static Pods — The Bootstrap Layer
The control plane runs as static pods managed directly by kubelet:
| Component | Role |
|---|---|
| etcd | Distributed key-value store; holds all cluster state |
| kube-apiserver | REST API frontend; all components communicate through it |
| kube-controller-manager | Runs built-in controllers (Deployment, ReplicaSet, etc.) |
| kube-scheduler | Assigns pods to nodes |
These components form a supervision loop:
- kubelet ensures static pods are running
- Control plane components use etcd for persistence
- If a component crashes, kubelet restarts it
- State is never lost because it's in etcd (see the sketch below)
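To see where that state actually lives, here's a rough sketch using the etcd v3 Go client to list the keys Kubernetes keeps under its /registry prefix. It assumes an etcd endpoint reachable at 127.0.0.1:2379 without client certificates, which a real kubeadm cluster won't allow, so treat it purely as an illustration:
package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    // Connect to etcd. A real cluster requires client TLS certificates here;
    // this plain connection only keeps the sketch short.
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    // Kubernetes stores every object under /registry/...; the values are
    // protobuf-encoded, so only the keys are listed.
    resp, err := cli.Get(context.TODO(), "/registry/deployments/",
        clientv3.WithPrefix(), clientv3.WithKeysOnly())
    if err != nil {
        panic(err)
    }
    for _, kv := range resp.Kvs {
        fmt.Println(string(kv.Key)) // e.g. /registry/deployments/default/my-app
    }
}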
kube-controller-manager — The Reconciliation Engine
The controller manager runs dozens of controllers, each implementing the reconciliation pattern:
// Simplified reconciliation loop
func (c *DeploymentController) Run() {
    for {
        // 1. Get desired state from API server (backed by etcd)
        deployment := c.client.GetDeployment(name)
        desiredReplicas := deployment.Spec.Replicas

        // 2. Get current state
        replicaSets := c.client.ListReplicaSets(deployment.Selector)
        currentReplicas := countReadyReplicas(replicaSets)

        // 3. Reconcile
        if currentReplicas < desiredReplicas {
            c.scaleUp(deployment)
        } else if currentReplicas > desiredReplicas {
            c.scaleDown(deployment)
        }

        // 4. Repeat
        time.Sleep(reconciliationInterval)
    }
}
A real controller is driven by watch events and work queues rather than a fixed sleep, but the shape is the same: observe, compare, correct.
Key controllers and what they manage:
| Controller | Watches | Ensures |
|---|---|---|
| Deployment | Deployments | Correct ReplicaSets exist |
| ReplicaSet | ReplicaSets | Correct number of pods exist |
| StatefulSet | StatefulSets | Pods with stable identities |
| DaemonSet | DaemonSets | One pod per matching node |
| Job | Jobs | Pods run to completion |
| Endpoints | Services + Pods | Endpoints objects track the ready pods behind each Service |
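One way to see this chain of ownership in a live cluster is through ownerReferences: a Deployment's pods point at a ReplicaSet, which in turn points at the Deployment. Here's a minimal client-go sketch (the kubeconfig path and the default namespace are placeholders):
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Placeholder kubeconfig path.
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Print which controller owns each pod in the namespace. A Deployment's
    // pods are owned by a ReplicaSet, which is owned by the Deployment.
    pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, pod := range pods.Items {
        for _, owner := range pod.OwnerReferences {
            fmt.Printf("%s is owned by %s %s\n", pod.Name, owner.Kind, owner.Name)
        }
    }
}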
The Foundation is Set
We've now covered the first three layers:
- systemd supervises kubelet (Restart=always)
- kubelet runs static pods from /etc/kubernetes/manifests/
- Control plane components persist state in etcd and reconcile continuously
But what about your controllers—NGINX Ingress, cert-manager, Prometheus Operator? How do they survive pod evictions?
In Part 3, we'll explore:
- How application controllers persist through evictions
- The complete persistence chain from hardware to application
- Why controllers are stateless (and why that matters)
- What survives pod evictions vs. what doesn't
Next in this series: Part 3: Controllers & Resilience — Why Kubernetes Self-Heals