
Vincent Du

Kubernetes Persistence Series Part 2: The Foundation — From systemd to Control Plane

What You'll Learn

  • How Linux systemd supervises the kubelet process
  • The role of static pods in bootstrapping the control plane
  • How the controller manager implements reconciliation loops
  • The complete 4-layer supervision model

Previously

In Part 1, we investigated why a Grafana ingress disappeared after GKE node upgrades. The fix was straightforward: use Helm-managed resources instead of manual kubectl apply.

But that raised a deeper question: How do controllers themselves survive pod evictions?

The answer is a hierarchical supervision model—each layer watches the layer above it, ensuring continuous operation despite failures.

The Four Layers of Kubernetes Supervision


In this post, we'll explore Layers 1-3. Part 3 covers Layer 4 and the complete resilience model.


Layer 1: The Linux Foundation

systemd — The Root Supervisor

At the very bottom of the stack is systemd, the init system running as PID 1 on most modern Linux distributions.

# On a Kubernetes node
ps aux | head -5
# USER  PID  COMMAND
# root    1  /sbin/init (systemd)
# root  ...  /usr/bin/kubelet

systemd's job is simple but critical:

  • Start services in the correct order at boot
  • Monitor services and restart them if they crash
  • Provide dependency management between services

The kubelet runs as a systemd service:

# /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/kubelet \
    --config=/var/lib/kubelet/config.yaml \
    --kubeconfig=/etc/kubernetes/kubelet.conf \
    --container-runtime-endpoint=unix:///run/containerd/containerd.sock
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

The key line: Restart=always

If the kubelet crashes, systemd restarts it after 10 seconds (RestartSec=10). This is the foundation of Kubernetes resilience: the node agent is supervised by the operating system itself.

# View kubelet status
systemctl status kubelet

# Watch kubelet restart after killing it (don't do this in production!)
sudo kill $(pgrep kubelet)
# systemd will restart it automatically

Layer 2: The Node Agent

kubelet — The Pod Supervisor

kubelet is the Kubernetes agent running on every node. It has two critical responsibilities:

1. Running Static Pods

kubelet watches a directory (typically /etc/kubernetes/manifests/) for pod manifests and runs them directly—no API server required.

ls /etc/kubernetes/manifests/
# etcd.yaml
# kube-apiserver.yaml
# kube-controller-manager.yaml
# kube-scheduler.yaml

This is how the control plane bootstraps itself. Pods can't be scheduled through an API server that doesn't exist yet, so kubelet runs these components directly from files on disk.

# /etc/kubernetes/manifests/kube-apiserver.yaml (simplified)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0
    command:
    - kube-apiserver
    - --etcd-servers=https://127.0.0.1:2379
    - --service-cluster-ip-range=10.96.0.0/12
    # ... more flags

2. Running API-Scheduled Pods

Once the control plane is running, kubelet also:

  • Watches the API server for pods scheduled to its node
  • Starts containers via the container runtime (containerd)
  • Reports pod status back to the API server
  • Restarts failed containers based on restartPolicy
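The restartPolicy decision in that last bullet can be sketched as a small function (a simplification of the kubelet's container-sync logic, but the three policies behave as shown):

```go
package main

import "fmt"

// shouldRestart applies Kubernetes restartPolicy semantics:
// Always restarts regardless of exit code, OnFailure restarts only
// non-zero exits, and Never restarts nothing.
func shouldRestart(policy string, exitCode int) bool {
	switch policy {
	case "Always":
		return true
	case "OnFailure":
		return exitCode != 0
	default: // "Never"
		return false
	}
}

func main() {
	fmt.Println(shouldRestart("Always", 0))    // true
	fmt.Println(shouldRestart("OnFailure", 0)) // false
	fmt.Println(shouldRestart("OnFailure", 1)) // true
	fmt.Println(shouldRestart("Never", 1))     // false
}
```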

[Diagram: Pod scheduling sequence]


Layer 3: The Control Plane

Static Pods — The Bootstrap Layer

The control plane runs as static pods managed directly by kubelet:

Component                 Role
-----------------------   --------------------------------------------------------
etcd                      Distributed key-value store; holds all cluster state
kube-apiserver            REST API frontend; all components communicate through it
kube-controller-manager   Runs built-in controllers (Deployment, ReplicaSet, etc.)
kube-scheduler            Assigns pods to nodes

These components form a supervision loop:

  • kubelet ensures static pods are running
  • Control plane components use etcd for persistence
  • If a component crashes, kubelet restarts it
  • State is never lost because it's in etcd

kube-controller-manager — The Reconciliation Engine

The controller manager runs dozens of controllers, each implementing the reconciliation pattern:

// Simplified reconciliation loop (illustrative; real controllers use
// watch-based informers and work queues rather than sleep-based polling)
func (c *DeploymentController) Run() {
    for {
        // 1. Get desired state from the API server (backed by etcd)
        deployment := c.client.GetDeployment(name)
        desiredReplicas := deployment.Spec.Replicas

        // 2. Get current state
        replicaSets := c.client.ListReplicaSets(deployment.Selector)
        currentReplicas := countReadyReplicas(replicaSets)

        // 3. Reconcile: drive current state toward desired state
        if currentReplicas < desiredReplicas {
            c.scaleUp(deployment)
        } else if currentReplicas > desiredReplicas {
            c.scaleDown(deployment)
        }

        // 4. Repeat
        time.Sleep(reconciliationInterval)
    }
}
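To watch the loop converge, here is a self-contained toy version with in-memory state standing in for the API server (all names are illustrative):

```go
package main

import "fmt"

// reconcile runs one pass of the loop, moving current one step toward
// desired. Real controllers similarly make incremental changes (create
// or delete one object) and rely on requeueing to finish the job.
func reconcile(current, desired int) int {
	if current < desired {
		return current + 1 // scale up
	}
	if current > desired {
		return current - 1 // scale down
	}
	return current // already converged; nothing to do
}

func main() {
	current, desired := 0, 3
	for pass := 1; current != desired; pass++ {
		current = reconcile(current, desired)
		fmt.Printf("pass %d: current=%d desired=%d\n", pass, current, desired)
	}
}
```

The important property: each pass only reads state and nudges it toward the spec, so the loop is safe to restart at any point. That is what lets kubelet kill and restart the controller manager without losing work.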

Key controllers and what they manage:

Controller    Watches           Ensures
-----------   ---------------   ------------------------------------
Deployment    Deployments       Correct ReplicaSets exist
ReplicaSet    ReplicaSets       Correct number of pods exist
StatefulSet   StatefulSets      Pods with stable identities
DaemonSet     DaemonSets        One pod per matching node
Job           Jobs              Pods run to completion
Endpoints     Services + Pods   Endpoints objects stay up to date

The Foundation is Set

We've now covered the first three layers:

  1. systemd supervises kubelet (Restart=always)
  2. kubelet runs static pods from /etc/kubernetes/manifests/
  3. Control plane components persist state in etcd and reconcile continuously

But what about your controllers—NGINX Ingress, cert-manager, Prometheus Operator? How do they survive pod evictions?

In Part 3, we'll explore:

  • How application controllers persist through evictions
  • The complete persistence chain from hardware to application
  • Why controllers are stateless (and why that matters)
  • What survives pod evictions vs. what doesn't

Next in this series: Part 3: Controllers & Resilience — Why Kubernetes Self-Heals
