What You'll Learn
- How Linux systemd supervises the kubelet process
- The role of static pods in bootstrapping the control plane
- How the controller manager implements reconciliation loops
- How the first three layers fit into the complete 4-layer supervision model
Previously
In Part 1, we investigated why a Grafana ingress disappeared after GKE node upgrades. The fix was straightforward: use Helm-managed resources instead of manual kubectl apply.
But that raised a deeper question: How do controllers themselves survive pod evictions?
The answer is a hierarchical supervision model—each layer watches the layer above it, ensuring continuous operation despite failures.
The Four Layers of Kubernetes Supervision
- Layer 1: systemd, the Linux init system that supervises the kubelet
- Layer 2: kubelet, the node agent that supervises pods
- Layer 3: the control plane, running as static pods and reconciling cluster state
- Layer 4: application controllers such as NGINX Ingress, cert-manager, and the Prometheus Operator
In this post, we'll explore Layers 1-3. Part 3 covers Layer 4 and the complete resilience model.
Layer 1: The Linux Foundation
systemd — The Root Supervisor
At the very bottom of the stack is systemd, the init system running as PID 1 on most modern Linux distributions.
# On a Kubernetes node
ps aux | head -5
# USER PID COMMAND
# root 1 /sbin/init (systemd)
# root ... /usr/bin/kubelet
systemd's job is simple but critical:
- Start services in the correct order at boot
- Monitor services and restart them if they crash
- Provide dependency management between services
The kubelet runs as a systemd service:
# /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet \
--config=/var/lib/kubelet/config.yaml \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
The key line: Restart=always
If kubelet crashes, systemd waits 10 seconds (RestartSec=10) and starts it again. This is the foundation of Kubernetes resilience: the node agent is supervised by the operating system itself.
# View kubelet status
systemctl status kubelet
# Watch kubelet restart after killing it (don't do this in production!)
sudo kill $(pgrep kubelet)
# systemd will restart it automatically
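# Follow the kubelet unit's logs to watch systemd bring it back
journalctl -u kubelet -f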
Layer 2: The Node Agent
kubelet — The Pod Supervisor
kubelet is the Kubernetes agent running on every node. It has two critical responsibilities:
1. Running Static Pods
kubelet watches a directory (typically /etc/kubernetes/manifests/) for pod manifests and runs them directly—no API server required.
ls /etc/kubernetes/manifests/
# etcd.yaml
# kube-apiserver.yaml
# kube-controller-manager.yaml
# kube-scheduler.yaml
This is how the control plane bootstraps itself: nothing can be scheduled through the API server before the API server exists, so kubelet runs these components directly from files on disk.
# /etc/kubernetes/manifests/kube-apiserver.yaml (simplified)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0
    command:
    - kube-apiserver
    - --etcd-servers=https://127.0.0.1:2379
    - --service-cluster-ip-range=10.96.0.0/12
    # ... more flags
2. Running API-Scheduled Pods
Once the control plane is running, kubelet also:
- Watches the API server for pods scheduled to its node (see the sketch after this list)
- Starts containers via the container runtime (containerd)
- Reports pod status back to the API server
- Restarts failed containers based on restartPolicy
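To make the first of those responsibilities concrete, here's a minimal client-go sketch of that node-scoped view. This is not how kubelet is actually implemented (it uses watches and informers rather than one-off lists), and the node name worker-1 and the kubeconfig path are placeholders:
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from a kubeconfig file (placeholder path).
    config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubelet.conf")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Ask for every pod, in any namespace, bound to one node.
    // "worker-1" is a placeholder node name.
    pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=worker-1",
    })
    if err != nil {
        panic(err)
    }
    for _, pod := range pods.Items {
        fmt.Printf("%s/%s restartPolicy=%s\n", pod.Namespace, pod.Name, pod.Spec.RestartPolicy)
    }
}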
Layer 3: The Control Plane
Static Pods — The Bootstrap Layer
The control plane runs as static pods managed directly by kubelet:
| Component | Role |
|---|---|
| etcd | Distributed key-value store; holds all cluster state |
| kube-apiserver | REST API frontend; all components communicate through it |
| kube-controller-manager | Runs built-in controllers (Deployment, ReplicaSet, etc.) |
| kube-scheduler | Assigns pods to nodes |
These components form a supervision loop:
- kubelet ensures static pods are running
- Control plane components use etcd for persistence
- If a component crashes, kubelet restarts it
- State is never lost because it's in etcd (see the sketch below)
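To see where that state actually lives, here's a rough sketch using the etcd v3 Go client to list the keys Kubernetes keeps under its /registry prefix. It assumes an etcd endpoint reachable at 127.0.0.1:2379 without client certificates, which a real kubeadm cluster won't allow, so treat it purely as an illustration:
package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    // Connect to etcd. A real cluster requires client TLS certificates here;
    // this plain connection only keeps the sketch short.
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    // Kubernetes stores every object under /registry/...; the values are
    // protobuf-encoded, so only the keys are listed.
    resp, err := cli.Get(context.TODO(), "/registry/deployments/",
        clientv3.WithPrefix(), clientv3.WithKeysOnly())
    if err != nil {
        panic(err)
    }
    for _, kv := range resp.Kvs {
        fmt.Println(string(kv.Key)) // e.g. /registry/deployments/default/my-app
    }
}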
kube-controller-manager — The Reconciliation Engine
The controller manager runs dozens of controllers, each implementing the reconciliation pattern:
// Simplified reconciliation loop
func (c *DeploymentController) Run() {
    for {
        // 1. Get desired state from API server (backed by etcd)
        deployment := c.client.GetDeployment(name)
        desiredReplicas := deployment.Spec.Replicas

        // 2. Get current state
        replicaSets := c.client.ListReplicaSets(deployment.Selector)
        currentReplicas := countReadyReplicas(replicaSets)

        // 3. Reconcile
        if currentReplicas < desiredReplicas {
            c.scaleUp(deployment)
        } else if currentReplicas > desiredReplicas {
            c.scaleDown(deployment)
        }

        // 4. Repeat
        time.Sleep(reconciliationInterval)
    }
}
A real controller is driven by watch events and work queues rather than a fixed sleep, but the shape is the same: observe, compare, correct.
Key controllers and what they manage:
| Controller | Watches | Ensures |
|---|---|---|
| Deployment | Deployments | Correct ReplicaSets exist |
| ReplicaSet | ReplicaSets | Correct number of pods exist |
| StatefulSet | StatefulSets | Pods with stable identities |
| DaemonSet | DaemonSets | One pod per matching node |
| Job | Jobs | Pods run to completion |
| Endpoints | Services + Pods | Endpoints objects track the ready pods behind each Service |
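One way to see this chain of ownership in a live cluster is through ownerReferences: a Deployment's pods point at a ReplicaSet, which in turn points at the Deployment. Here's a minimal client-go sketch (the kubeconfig path and the default namespace are placeholders):
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Placeholder kubeconfig path.
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Print which controller owns each pod in the namespace. A Deployment's
    // pods are owned by a ReplicaSet, which is owned by the Deployment.
    pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, pod := range pods.Items {
        for _, owner := range pod.OwnerReferences {
            fmt.Printf("%s is owned by %s %s\n", pod.Name, owner.Kind, owner.Name)
        }
    }
}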
The Foundation is Set
We've now covered the first three layers:
- systemd supervises kubelet (Restart=always)
- kubelet runs static pods from /etc/kubernetes/manifests/
- Control plane components persist state in etcd and reconcile continuously
But what about your controllers—NGINX Ingress, cert-manager, Prometheus Operator? How do they survive pod evictions?
In Part 3, we'll explore:
- How application controllers persist through evictions
- The complete persistence chain from hardware to application
- Why controllers are stateless (and why that matters)
- What survives pod evictions vs. what doesn't
Next in this series: Part 3: Controllers & Resilience — Why Kubernetes Self-Heals