Vincent Du

Kubernetes Persistence Series Part 3: Controllers & Resilience — Why Kubernetes Self-Heals

What You'll Learn

  • How application controllers (NGINX Ingress, cert-manager) persist through evictions
  • Why controllers are stateless and can restart anywhere
  • The complete persistence chain from hardware to application
  • What survives pod evictions vs. what doesn't

Previously

In Part 1, we debugged a missing ingress after GKE node upgrades. In Part 2, we explored how systemd supervises kubelet, and how kubelet bootstraps the control plane through static pods.

Now we reach the final layer: your application controllers—and the elegant insight that makes Kubernetes truly resilient.


Layer 4: Application Controllers

How Application Controllers Persist

Controllers like NGINX Ingress, cert-manager, and Prometheus Operator are deployed as Deployments or StatefulSets:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.9.0

When this pod is evicted:

  1. kubelet terminates the pod and reports it as failed (Evicted) to the API server
  2. The ReplicaSet controller notices: current replicas (0) < desired (1)
  3. It creates a new Pod object through the API server
  4. Scheduler assigns the pod to a healthy node
  5. kubelet on that node starts the container
  6. NGINX controller reconnects to API server and resumes watching ingresses

The controller itself doesn't store state—it reads everything from the API server (backed by etcd).
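
You can watch this loop run in a live cluster (assuming ingress-nginx is installed in the ingress-nginx namespace; adjust labels and namespaces to match your setup):

# In one terminal, watch the pods
kubectl get pods -n ingress-nginx -w

# In another, delete the controller pod; the ReplicaSet immediately
# creates a replacement, which reconnects to the API server and picks
# up every existing Ingress resource.
kubectl delete pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

# The Ingress objects themselves never left etcd
kubectl get ingress -A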

Helm Release Persistence

Helm stores release information in Kubernetes secrets:

kubectl get secret -n monitoring -l owner=helm -o yaml
apiVersion: v1
kind: Secret
metadata:
  name: sh.helm.release.v1.prometheus.v3
  namespace: monitoring
  labels:
    owner: helm
    name: prometheus
    version: "3"
type: helm.sh/release.v1
data:
  release: H4sIAAAAAAAAA... # Base64-encoded, gzipped release payload

This secret contains:

  • The chart that was installed
  • The values that were used
  • The computed manifest of all resources

Because this is stored in etcd via the API server, Helm releases survive any pod eviction.
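
You can verify this yourself. Helm wraps a gzipped JSON payload in its own layer of base64 inside the Secret's base64 encoding, so decoding takes two base64 passes plus gunzip (the release and namespace names below are just the ones from the example above):

# The release history is intact after any pod eviction
helm history prometheus -n monitoring

# Decode the Secret directly: Kubernetes base64 -> Helm base64 -> gzip -> JSON
kubectl get secret sh.helm.release.v1.prometheus.v3 -n monitoring \
  -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip \
  | jq '{name: .name, version: .version, chart: .chart.metadata.name}'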


The Complete Persistence Chain

┌─────────────────────────────────────────────────────────────────────┐
│                     Linux Host (Physical/VM)                        │
├─────────────────────────────────────────────────────────────────────┤
│  systemd (PID 1)                                                    │
│  ├── Supervises all system services                                 │
│  ├── Restarts failed services automatically                         │
│  └── Config: /etc/systemd/system/                                   │
│      │                                                              │
│      └── kubelet.service                                            │
│          ├── Started and supervised by systemd                      │
│          ├── Watches /etc/kubernetes/manifests/ for static pods     │
│          ├── Watches API server for scheduled pods                  │
│          └── Ensures containers match pod specs                     │
│              │                                                      │
│              ├── Static Pods (/etc/kubernetes/manifests/)           │
│              │   ├── etcd ──────────────────┐                       │
│              │   ├── kube-apiserver ◄───────┤ Persistent            │
│              │   ├── kube-controller-manager│ State Store           │
│              │   └── kube-scheduler         │                       │
│              │                              │                       │
│              └── Regular Pods ◄─────────────┘                       │
│                  │                 (scheduled via API server)       │
│                  │                                                  │
│                  ├── kube-system namespace                          │
│                  │   ├── CoreDNS                                    │
│                  │   ├── kube-proxy                                 │
│                  │   └── CNI plugins                                │
│                  │                                                  │
│                  ├── ingress-nginx namespace                        │
│                  │   └── NGINX Ingress Controller                   │
│                  │       └── Watches Ingress resources              │
│                  │                                                  │
│                  └── Application namespaces                         │
│                      ├── cert-manager                               │
│                      ├── Prometheus Operator                        │
│                      └── Your applications                          │
└─────────────────────────────────────────────────────────────────────┘

The Critical Insight: Controllers Are Stateless

This is the elegant core of the design: controllers don't store state.

Every controller:

  1. Reads desired state from the API server (backed by etcd)
  2. Watches for changes via the API server
  3. Makes changes through the API server
  4. Can be restarted anywhere, anytime, without losing information

The API server + etcd is the single source of truth, not the controllers.
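
One way to convince yourself of this, assuming an ingress-nginx install with the default Deployment name: look at what the controller pod actually mounts. You will typically see only projected service account tokens and webhook certificates, never a PersistentVolumeClaim holding controller state.

# Inspect the controller's volumes; for a typical install there is no
# persistent storage attached to the controller itself.
kubectl get deployment ingress-nginx-controller -n ingress-nginx -o json \
  | jq '.spec.template.spec.volumes'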

[Diagram: Stateless Controllers Architecture]

This is why you can:

  • Delete any controller pod → it restarts and catches up
  • Move controllers between nodes → they just reconnect
  • Scale controllers to multiple replicas → they coordinate via the API server
  • Upgrade controllers → new version reads the same state
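
For example, with more than one NGINX Ingress controller replica, leader election happens through a Lease object stored via the API server, so even the coordination state lives in etcd rather than in the controller processes. A quick way to check (the Lease name mentioned in the comment is what recent ingress-nginx releases use; yours may differ):

# Leader election for ingress-nginx is backed by a Lease object in the
# API server; recent releases typically name it "ingress-nginx-leader".
# The HOLDER column shows which replica currently holds the lock.
kubectl get lease -n ingress-nginx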

What Survives vs. What Doesn't

Survives Any Pod Eviction

Resource                   | Why it survives
---------------------------|------------------------------------------
Kubernetes objects in etcd | Stored independently of pods
Helm releases              | Stored as Secrets in etcd
Operator-managed CRDs      | Continuously reconciled by the operator
PersistentVolumes          | Backing storage exists outside the cluster
ConfigMaps/Secrets         | Stored in etcd

Doesn't Survive Without Help

Resource                                              | Why it doesn't survive
------------------------------------------------------|--------------------------------------------------
Pod-local emptyDir volumes                            | Deleted along with the pod
Manually applied resources with missing dependencies  | Validating webhooks can reject them on recreation
In-memory caches                                      | Lost when the process restarts
Node-local state                                      | Lost when the node is replaced, unless explicitly persisted
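
A minimal sketch of that difference, with illustrative names: the cache volume below vanishes with the pod, while the data volume survives because it is backed by a PersistentVolumeClaim that exists independently of the pod.

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx:1.27
    volumeMounts:
    - name: cache     # emptyDir: deleted together with this pod
      mountPath: /cache
    - name: data      # PVC-backed: a replacement pod can reattach it
      mountPath: /data
  volumes:
  - name: cache
    emptyDir: {}
  - name: data
    persistentVolumeClaim:
      claimName: demo-data   # the PVC (and its PV) outlive the pod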

The Elegance of the Design

The Kubernetes architecture embodies several design principles:

  1. Declarative over imperative — Describe desired state, not steps to get there
  2. Reconciliation over transactions — Continuously converge to desired state
  3. Stateless controllers — State lives in etcd, not in components
  4. Hierarchical supervision — Each layer supervises the layer below it
  5. Failure is normal — Design for recovery, not prevention

This is why Kubernetes clusters can:

  • Lose nodes unexpectedly
  • Have pods evicted for resource pressure
  • Experience network partitions
  • Undergo rolling upgrades

...and still maintain application availability.


Conclusion

The journey from debugging a missing ingress to understanding the complete supervision hierarchy revealed the sophisticated machinery that makes Kubernetes resilient.

systemd → kubelet → static pods → control plane → controllers → your apps

Each layer supervises the next, with etcd as the persistent memory that survives any component failure.

The key insight: Kubernetes doesn't prevent failures—it recovers from them automatically through layers of supervision, persistent state in etcd, and continuous reconciliation loops.

This is the true power of Kubernetes: not that things don't fail, but that when they do, the system knows how to restore itself to the desired state.


Series Recap

  1. Part 1: When Our Ingress Vanished — The incident that started it all
  2. Part 2: The Foundation — systemd → kubelet → control plane
  3. Part 3: Controllers & Resilience — Why Kubernetes self-heals


Found this series useful? Follow for more Kubernetes internals content!
