If you've made it past the basics of Kubernetes — you know what a cluster is, you've spun up a local environment with kind or minikube — the next wall you hit is understanding the workload objects. There are seven of them: Pods, ReplicaSets, Deployments, DaemonSets, StatefulSets, Jobs, and CronJobs. They look similar in YAML. They all involve containers. But each exists for a very different reason.
## The Quick Reference: Which Workload Do You Need?

Before diving in, here's the decision tree you'll use every day:
| Workload | Use When |
|---|---|
| Deployment | Running stateless apps, APIs, web servers, workers |
| StatefulSet | Databases, message brokers, anything needing stable identity |
| DaemonSet | Log agents, monitoring — one pod per node |
| Job | One-off batch tasks: migrations, reports, cleanup |
| CronJob | Scheduled recurring tasks |
| Pod (bare) | Almost never in production — use one of the above |
Now let's understand why.
## Pods: The Atomic Unit

Everything in Kubernetes ultimately runs as a Pod — the smallest deployable unit. You don't run containers directly; you run Pods that contain containers.

Think of a Pod as a tiny apartment that one or more containers share:

- They share a single IP address
- Containers within a Pod talk to each other via `localhost`
- They can share volumes (disk storage)
- They always land on the same node — never split
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server-pod
spec:
  containers:
  - name: api
    image: mycompany/api:2.1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "1000m"
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 15
      periodSeconds: 20
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
```
### The Most Important Thing About Pods

Pods are ephemeral. When a Pod dies, a new one is created with a completely new name and a completely new IP. Any code that hardcoded the old IP breaks. This is why Services exist (Phase 3) — but it's the mental model you need to carry through everything else.
### Two Probes You Can't Skip

**Liveness probe** — "Is this container still alive?" If it fails, Kubernetes restarts the container. Use it for crash detection and deadlocks.

**Readiness probe** — "Is this container ready for traffic?" If it fails, the Pod is removed from the Service's endpoints — it stops receiving traffic — but it is not restarted. Use it for startup warmup and temporary overload.

These two probes do different jobs. Never confuse them.
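Probes aren't limited to HTTP checks, either — `tcpSocket` and `exec` probes are also supported. A sketch of the other two types (the Redis image and port here are illustrative, not from the article):

```yaml
# Hypothetical container spec fragment showing the non-HTTP probe types.
containers:
- name: cache
  image: redis:7
  livenessProbe:
    tcpSocket:        # liveness: can we open a TCP connection to the port?
      port: 6379
    periodSeconds: 10
  readinessProbe:
    exec:             # readiness: run a command inside the container; exit 0 = ready
      command: ["redis-cli", "ping"]
    initialDelaySeconds: 5
    periodSeconds: 5
```

Use `httpGet` when the app exposes a health endpoint, `tcpSocket` when it only listens on a port, and `exec` when readiness is best checked with the app's own CLI.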
### Pod Status Lifecycle

```
Pending → Running → Succeeded
              ↓
   Failed / CrashLoopBackOff / OOMKilled
```

When you see CrashLoopBackOff, run `kubectl logs my-pod --previous` — that's where your crash output lives.
### Quick Debugging Playbook

```bash
# Pod stuck in Pending?
kubectl describe pod my-pod            # read the Events section
# "Insufficient cpu" → reduce requests or add nodes

# ImagePullBackOff?
kubectl describe pod my-pod            # wrong image name? typo? private registry?

# CrashLoopBackOff?
kubectl logs my-pod --previous         # read the crash

# OOMKilled?
kubectl describe pod my-pod            # see Last State → increase memory limit

# Running but not working?
kubectl exec -it my-pod -- /bin/bash   # investigate from inside
```
### 🧪 Practice — Pods

**Lab 1: Your first pod**

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hello-nginx
  labels:
    app: hello
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "200m"
EOF

kubectl get pods --watch
kubectl describe pod hello-nginx   # read the Events section top to bottom
kubectl exec -it hello-nginx -- /bin/bash
# inside: curl localhost

kubectl port-forward pod/hello-nginx 8080:80 &
curl http://localhost:8080
kill %1
```
**Lab 2: Multi-container pod — shared localhost**

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: multi-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
  - name: sidecar
    image: busybox
    command: ['sh', '-c', 'while true; do echo sidecar running; sleep 5; done']
EOF

kubectl logs multi-pod -c app
kubectl logs multi-pod -c sidecar
kubectl exec -it multi-pod -c sidecar -- /bin/sh
# inside: wget -q -O- localhost   ← hits the nginx container via the shared IP
```
**Lab 3: Trigger and observe CrashLoopBackOff**

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: crasher
spec:
  containers:
  - name: app
    image: busybox
    command: ['sh', '-c', 'echo crashing; exit 1']
EOF

kubectl get pods --watch          # watch it go into CrashLoopBackOff
kubectl logs crasher --previous   # see the crash output
```

**Cleanup**

```bash
kubectl delete pod hello-nginx multi-pod crasher
```
## Namespaces: Logical Isolation

A Namespace is a virtual cluster inside your real cluster. It's how you divide one Kubernetes cluster into isolated sections — by team, environment, or application.

Four namespaces exist by default:

- `default` — where objects land if you don't specify a namespace
- `kube-system` — Kubernetes system components (etcd, apiserver, coredns)
- `kube-public` — publicly readable data (rarely used)
- `kube-node-lease` — node heartbeat objects (internal, ignore)
### What Namespaces Give You

**Name reuse** — you can have a webapp Deployment in both staging and production with no conflict.

**RBAC scoping** — give team A access only to their namespace.

**Resource quotas** — cap a namespace to 4 CPUs and 8Gi memory total.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    pods: "50"
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
```
### One Critical Misconception

Namespaces do not provide network isolation by default. A pod in dev can still reach a pod in prod if it knows the IP or DNS name. For actual network isolation, you need NetworkPolicies.
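As an illustration, a NetworkPolicy that locks a namespace down to in-namespace traffic could look like the sketch below. Note this assumes your CNI plugin actually enforces NetworkPolicies (Calico and Cilium do; kind's default CNI does not):

```yaml
# Deny all ingress into the "prod" namespace except from pods in "prod" itself.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: prod
spec:
  podSelector: {}         # empty selector = applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}     # only pods from this same namespace may connect
```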
Cross-namespace DNS format: `<service>.<namespace>.svc.cluster.local`
### 🧪 Practice — Namespaces

**Lab: Isolation, quotas, and the -n flag**

```bash
# Create two environments
kubectl create namespace dev
kubectl create namespace prod

# Deploy the same app name in both — no conflict
kubectl create deployment webapp --image=nginx:1.25 -n dev
kubectl create deployment webapp --image=nginx:1.25 -n prod
kubectl get deployments -A   # see both side by side

# Apply a quota to prod
# (Pod-count only: quotas on requests.cpu/requests.memory would also require
#  every pod to declare resource requests, which these bare nginx pods don't.)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: prod
spec:
  hard:
    pods: "5"
EOF

# Try to exceed it
kubectl scale deployment webapp --replicas=10 -n prod
kubectl get pods -n prod                            # stops at the quota limit
kubectl describe resourcequota prod-quota -n prod   # usage vs hard

# Set dev as your default namespace
kubectl config set-context --current --namespace=dev
kubectl get pods   # now implicitly reads from dev

# Switch back
kubectl config set-context --current --namespace=default
```

**Cleanup**

```bash
kubectl delete namespace dev prod
```
## Labels, Selectors & Annotations

**Labels** are key-value pairs on any Kubernetes object. They're the connective tissue — how Services find Pods, how Deployments track their ReplicaSets, how you filter resources.

**Annotations** are also key-value pairs, but for non-identifying metadata: descriptions, tool config, URLs. They cannot be used in selectors.

```yaml
metadata:
  labels:
    app: api
    env: production
    version: "2.1.0"   # always quote values that look like numbers
  annotations:
    description: "Payments API server"
    prometheus.io/scrape: "true"
```
### How a Service Uses Labels

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  selector:
    app: api           # matches pods with app=api
    env: production    # AND env=production
  ports:
  - port: 80
    targetPort: 3000
```

The Service routes traffic to any pod with both labels. Remove a label from a pod and it's immediately removed from the Service's endpoints. Add it back and it rejoins. This is live — no restart required.
### Set-Based Selectors

Beyond simple equality, you can use expressions:

```yaml
selector:
  matchExpressions:
  - key: env
    operator: In
    values: [production, staging]
  - key: tier
    operator: NotIn
    values: [frontend]
  - key: app
    operator: Exists
```

One gotcha: always quote version numbers in YAML. `version: 2.0` is parsed as a YAML float, not a string — depending on your tooling it either fails API validation (label values must be strings) or gets rendered as an unexpected value like `2`, quietly breaking your selectors.
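The fix is a pair of quotes:

```yaml
# BAD — YAML reads this as the float 2.0, not the string "2.0"
labels:
  version: 2.0

# GOOD — quoting forces a string, which is what label values must be
labels:
  version: "2.0"
```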
### 🧪 Practice — Labels & Selectors

**Lab: Live label manipulation and Service routing**

```bash
# Create three pods with different label combos
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: frontend-v1
  labels:
    app: frontend
    env: prod
    version: "1.0"
spec:
  containers:
  - name: nginx
    image: nginx:1.25
---
apiVersion: v1
kind: Pod
metadata:
  name: backend-v1
  labels:
    app: backend
    env: prod
    version: "1.0"
spec:
  containers:
  - name: nginx
    image: nginx:1.24
---
apiVersion: v1
kind: Pod
metadata:
  name: backend-staging
  labels:
    app: backend
    env: staging
    version: "2.0"
spec:
  containers:
  - name: nginx
    image: nginx:1.26
EOF

# Filter practice
kubectl get pods --show-labels
kubectl get pods -l app=backend
kubectl get pods -l app=backend,env=prod
kubectl get pods -l 'env in (prod,staging)'
kubectl get pods -l 'version notin (1.0)'

# Create a Service that selects app=backend + env=prod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
spec:
  selector:
    app: backend
    env: prod
  ports:
  - port: 80
    targetPort: 80
EOF

kubectl describe service backend-svc   # check the Endpoints section

# Live label surgery — remove env from backend-v1
kubectl label pod backend-v1 env-
kubectl describe service backend-svc   # endpoint disappears immediately

# Add it back
kubectl label pod backend-v1 env=prod
kubectl describe service backend-svc   # endpoint returns
```

**Cleanup**

```bash
kubectl delete pod frontend-v1 backend-v1 backend-staging
kubectl delete service backend-svc
```
## ReplicaSets: Self-Healing

A ReplicaSet ensures a specified number of identical Pods are always running.

```
spec.replicas: 3

Pod-2 crashes   → ReplicaSet creates Pod-4 → back to 3
Rogue pod added → ReplicaSet deletes one   → back to 3
```

The ReplicaSet continuously counts the pods matching its selector. Actual count ≠ desired count → it creates or deletes pods until they match.
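That control loop is simple enough to sketch in pseudocode (a simplification — the real controller also handles graceful deletion, creation expectations, and owner references):

```
loop forever:
    desired = replicaset.spec.replicas
    actual  = count(pods matching replicaset.spec.selector)

    if actual < desired:
        create (desired - actual) pods from the pod template
    else if actual > desired:
        delete (actual - desired) pods

    wait for the next change event (or the periodic resync)
```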
### Why You Almost Never Create ReplicaSets Directly

ReplicaSets don't support rolling updates or rollbacks. That's exactly what Deployments add. In practice, you create a Deployment and it creates and manages a ReplicaSet for you.

You still need to understand ReplicaSets because:

- Deployments own them under the hood
- Debugging often means inspecting the ReplicaSet
- It explains the self-healing behavior you see

The ReplicaSet name format when created by a Deployment is `<deployment-name>-<pod-template-hash>`. That hash changes every time the pod template changes — which is exactly how rolling updates work.
### 🧪 Practice — ReplicaSets

**Lab: Self-healing in action**

```bash
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: demo-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
EOF

kubectl get pods -l app=demo
# Open a watch in a second terminal: kubectl get pods --watch

# Kill one pod — the RS heals immediately
kubectl delete pod $(kubectl get pods -l app=demo -o jsonpath='{.items[0].metadata.name}')
kubectl get pods -l app=demo   # new pod already created

# Scale up
kubectl scale rs demo-rs --replicas=5
kubectl get pods -l app=demo

# Try creating a stray pod with the same label
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: stray-pod
  labels:
    app: demo   # matches the RS selector!
spec:
  containers:
  - name: nginx
    image: nginx:1.25
EOF

kubectl get pods -l app=demo   # RS sees 6, wants 5 → deletes one
```

**Cleanup**

```bash
kubectl delete rs demo-rs
```
## Deployments: The Standard Way to Run Apps

A Deployment is what you use 90% of the time. It wraps a ReplicaSet and adds rolling updates, rollbacks, and version history.

```
You → Deployment → manages → ReplicaSets → manages → Pods
           │
           ├── RS v1 (old image) → 0 pods
           └── RS v2 (new image) → 3 pods ✅
```
### How a Rolling Update Actually Works

With `maxSurge: 1`, `maxUnavailable: 0`, and 3 replicas:

```
[v1][v1][v1]       # start
[v1][v1][v1][v2]   # spin up 1 new pod (surge)
[v1][v1][v2]       # v2 passes readiness → kill 1 v1
[v1][v1][v2][v2]   # spin up another v2
[v1][v2][v2]       # v2 passes → kill a v1
[v1][v2][v2][v2]   # final new pod
[v2][v2][v2]       # all updated, zero downtime ✅
```

The readiness probe gates each step. Without it, Kubernetes can't tell when a new pod is actually ready.
### Production Deployment YAML

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # never go below the desired count
  minReadySeconds: 10
  revisionHistoryLimit: 5
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
      - name: api
        image: mycompany/orders-api:3.0.1
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
```
### The Commands You'll Use Daily

```bash
# Update the image → triggers a rolling update
kubectl set image deployment/my-app app=nginx:1.26

# Watch it happen
kubectl rollout status deployment/my-app

# Something went wrong?
kubectl rollout undo deployment/my-app

# Check history
kubectl rollout history deployment/my-app

# Restart all pods (rolling, with zero downtime)
kubectl rollout restart deployment/my-app
```

`kubectl rollout undo` is your emergency brake. Run it first, debug second.
### 🧪 Practice — Deployments

**Lab: Full deployment lifecycle — deploy, update, break, rollback**

```bash
# Deploy v1
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  annotations:
    kubernetes.io/change-cause: "initial deployment nginx 1.24"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5
EOF

kubectl get pods --watch   # Ctrl+C when all 3 are Running
kubectl get rs             # see the RS created by the Deployment

# Rolling update to 1.25
kubectl set image deployment/webapp nginx=nginx:1.25
kubectl annotate deployment/webapp kubernetes.io/change-cause="upgrade to nginx 1.25" --overwrite
kubectl rollout status deployment/webapp
kubectl rollout history deployment/webapp   # see 2 revisions
kubectl get rs   # old RS now has 0 pods but still exists for rollback

# Simulate a bad update — nonexistent image
kubectl set image deployment/webapp nginx=nginx:NONEXISTENT
kubectl rollout status deployment/webapp   # hangs — new pods can't start
kubectl get pods   # old pods still serving! new pods in ErrImagePull

# Emergency rollback
kubectl rollout undo deployment/webapp
kubectl rollout status deployment/webapp   # watch the recovery
kubectl get pods                           # all healthy again

# Scale
kubectl scale deployment webapp --replicas=6
kubectl scale deployment webapp --replicas=2
```

**Cleanup**

```bash
kubectl delete deployment webapp
```
## Imperative vs Declarative: The Mental Model

There are two ways to work with Kubernetes.

**Imperative** — tell Kubernetes what to do, step by step:

```bash
kubectl create deployment web --image=nginx:1.25
kubectl scale deployment web --replicas=3
kubectl set image deployment/web nginx=nginx:1.26
```

**Declarative** — tell Kubernetes what you want, and let it figure out how:

```bash
kubectl apply -f web.yaml
# edit the file
kubectl apply -f web.yaml   # kubernetes computes the diff
```

The key advantage of declarative: idempotency.

```bash
# Imperative — fails on the second run
kubectl create deployment web --image=nginx
# Error: deployments.apps "web" already exists

# Declarative — safe to run forever
kubectl apply -f web.yaml
# deployment.apps/web created     (first run)
# deployment.apps/web unchanged   (no changes)
# deployment.apps/web configured  (file changed)
```

In production: always declarative. Store YAML in Git. CI/CD runs `kubectl apply`. This is GitOps — Git is the source of truth, and the cluster always mirrors it.
### Generate YAML Fast

```bash
kubectl create deployment web --image=nginx:1.25 --replicas=3 \
  --dry-run=client -o yaml > deployment.yaml
```

`--dry-run=client -o yaml` generates the YAML locally without sending anything to the API server. Edit it, then apply. This is extremely useful in the CKA/CKAD exams.
### 🧪 Practice — Imperative vs Declarative

**Lab: Generate, edit, and apply**

```bash
# Generate Deployment YAML without applying
kubectl create deployment web --image=nginx:1.25 --replicas=3 \
  --dry-run=client -o yaml > deployment.yaml
cat deployment.yaml   # inspect what was generated

# Add a resources block manually, then apply
kubectl apply -f deployment.yaml
kubectl apply -f deployment.yaml   # safe to run again — "unchanged"

# Edit the file — change replicas to 5
sed -i 's/replicas: 3/replicas: 5/' deployment.yaml
kubectl apply -f deployment.yaml   # "configured" — only the diff applied
kubectl get deployment web

# Generate Pod and Service YAML for practice
kubectl run my-pod --image=busybox --dry-run=client -o yaml > pod.yaml
kubectl expose deployment web --port=80 --type=ClusterIP \
  --dry-run=client -o yaml > service.yaml
```

**Cleanup**

```bash
kubectl delete deployment web
```
## DaemonSets: One Pod per Node

A DaemonSet ensures exactly one Pod runs on every node in the cluster. New node joins → pod auto-created. Node removed → pod garbage-collected.

```
Node 1   Node 2   Node 3   Node 4 (new)
[log]    [log]    [log]    [log] ← auto-created
```
### When to Use DaemonSets
| Use Case | Examples |
|---|---|
| Log collection | Fluentd, Filebeat, Promtail |
| Monitoring agents | Prometheus node-exporter, Datadog agent |
| Network plugins | Calico, Cilium, kube-proxy |
| Security agents | Falco, Sysdig |
kube-proxy itself is a DaemonSet. Verify with `kubectl get ds -n kube-system`.
### Running on Control-Plane Nodes

By default, DaemonSet pods won't run on control-plane nodes because of a taint: `node-role.kubernetes.io/control-plane:NoSchedule`. If you need them there, add a toleration to the pod template:

```yaml
spec:
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```
### Update Strategies

- **RollingUpdate** (default): automatically updates pods one node at a time
- **OnDelete**: only updates pods when you manually delete them — useful for per-node maintenance windows
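For reference, both strategies are set via `spec.updateStrategy` — a minimal sketch of the two fragments (pick one):

```yaml
# Default: roll through nodes automatically.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1   # how many nodes' pods may be down during the update

# Alternative: pods are only replaced when you delete them yourself.
updateStrategy:
  type: OnDelete
```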
### 🧪 Practice — DaemonSets

**Lab: Per-node coverage and system DaemonSets**

```bash
# See existing DaemonSets in the cluster
kubectl get ds -n kube-system -o wide
# kube-proxy is a DaemonSet — one pod per node

# Create a simple node-monitoring DaemonSet
# (note the \$ escapes — they stop your local shell from expanding
#  these before the manifest reaches kubectl)
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-monitor
spec:
  selector:
    matchLabels:
      app: node-monitor
  template:
    metadata:
      labels:
        app: node-monitor
    spec:
      containers:
      - name: monitor
        image: busybox
        command:
        - sh
        - -c
        - while true; do echo "Node \$NODE_NAME alive at \$(date)"; sleep 10; done
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          requests:
            memory: "20Mi"
            cpu: "10m"
EOF

# Verify one pod per worker node
kubectl get pods -l app=node-monitor -o wide

# Check logs from one
POD=$(kubectl get pods -l app=node-monitor -o jsonpath='{.items[0].metadata.name}')
kubectl logs $POD
```

**Cleanup**

```bash
kubectl delete ds node-monitor
```
## Jobs & CronJobs: Batch and Scheduled Work

**Job** — runs pods to completion. Once finished successfully, it stops. For one-off batch tasks.

**CronJob** — runs a Job on a schedule (cron syntax). For recurring tasks.

```
Deployment: "Run this forever"
Job:        "Run this ONCE until it succeeds"
CronJob:    "Run this Job every day at 2am"
```
### Job YAML

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3               # retry up to 3 times
  activeDeadlineSeconds: 300    # kill after 5 minutes regardless
  ttlSecondsAfterFinished: 60   # auto-delete 60s after completion
  template:
    spec:
      restartPolicy: OnFailure  # REQUIRED — Always is invalid for Jobs
      containers:
      - name: migration
        image: mycompany/api:3.0.1
        command: ["node", "scripts/migrate.js"]
```

A `restartPolicy` of `OnFailure` (or `Never`) is mandatory — Jobs cannot use `Always`, because they'd never complete.

Set `ttlSecondsAfterFinished`, or finished Jobs pile up in your cluster forever.
### CronJob YAML

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"       # every day at 2:00 AM
  concurrencyPolicy: Forbid   # skip if the previous run is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2
      ttlSecondsAfterFinished: 300
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: reporter
            image: mycompany/reporter:1.0
            command: ["python", "generate_report.py"]
```
### Cron Schedule Quick Reference

```
0 2 * * *     Every day at 2:00 AM
*/5 * * * *   Every 5 minutes
0 9 * * 1     Every Monday at 9:00 AM
0 0 1 * *     1st of every month at midnight
@daily        Shortcut for 0 0 * * *
@hourly       Shortcut for 0 * * * *
```
### Trigger a CronJob Manually

```bash
kubectl create job --from=cronjob/nightly-report manual-run-01
```

This creates a Job immediately using the CronJob's template, without affecting scheduled runs.
### 🧪 Practice — Jobs & CronJobs

**Lab 1: Simple one-off Job**

```bash
# (\$(date) is escaped so it expands inside the container, not in your shell)
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 60
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: hello
        image: busybox
        command: ['sh', '-c', 'echo "Job running at \$(date)"; sleep 5; echo Done!']
EOF

kubectl get pods --watch    # watch the pod go: Pending → Running → Completed
kubectl logs -l job-name=hello-job
kubectl get job hello-job   # COMPLETIONS should show 1/1
```

**Lab 2: CronJob that runs every minute**

```bash
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: minute-job
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 60
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: echo
            image: busybox
            command: ['sh', '-c', 'echo "Scheduled run at \$(date)"']
EOF

# Wait 1-2 minutes
kubectl get jobs
kubectl get cronjob minute-job

# Trigger it manually right now
kubectl create job --from=cronjob/minute-job manual-run-01
kubectl logs -l job-name=manual-run-01

# Suspend future runs without deleting
kubectl patch cronjob minute-job -p '{"spec":{"suspend":true}}'
kubectl get cronjob minute-job   # SUSPEND should show True
```

**Cleanup**

```bash
kubectl delete cronjob minute-job
```
## StatefulSets: Stateful Applications

A StatefulSet is like a Deployment, but for applications that need three guarantees:

1. **Stable, predictable names** — pods are always `pod-0`, `pod-1`, `pod-2`
2. **Stable storage** — each pod gets its own PVC that sticks with it across restarts
3. **Ordered startup/shutdown** — pods start in order (0→1→2) and stop in reverse

```
DEPLOYMENT:                STATEFULSET:
my-app-abc12 (random)      mysql-0 (always 0)
my-app-def34 (random)      mysql-1 (always 1)
my-app-ghi56 (random)      mysql-2 (always 2)

Pod dies → new name        Pod dies → comes back as mysql-1
Pod dies → new IP          Pod dies → same storage re-attached
Starts in any order        Starts 0 first, then 1, then 2
```
### The Headless Service Requirement

StatefulSets require a headless Service (`clusterIP: None`) to provide a stable DNS name per pod:

```
mysql-0.mysql-svc.default.svc.cluster.local → 10.244.1.5
mysql-1.mysql-svc.default.svc.cluster.local → 10.244.2.8
```

A normal Service load-balances across pods. A headless Service returns an individual DNS record for each pod — so applications can say "connect to the primary at mysql-0.mysql-svc" and mean it.
StatefulSet YAML
# Headless Service — create this FIRST
apiVersion: v1
kind: Service
metadata:
name: mysql-svc
spec:
clusterIP: None # ← this makes it headless
selector:
app: mysql
ports:
- port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: mysql-svc # links to headless service above
replicas: 3
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
volumeMounts:
- name: data
mountPath: /var/lib/mysql
volumeClaimTemplates: # ← key difference from Deployment
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
volumeClaimTemplates creates individual PVCs per pod: data-mysql-0, data-mysql-1, data-mysql-2.
### The PVC Retention Gotcha

When you delete a StatefulSet, the PVCs are not deleted. This is intentional — data is precious. But it means they stick around until you clean them up manually:

```bash
kubectl delete sts mysql
kubectl delete pvc -l app=mysql   # must be done separately
```
### Canary Updates with partition

The `partition` field in `rollingUpdate` is powerful for staged rollouts:

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2   # only update pods with ordinal >= 2
```

Set `partition: 2` to update only mysql-2 first. Test it. Lower it to 1. Lower it to 0 to complete. This is native canary deployment for StatefulSets.
### 🧪 Practice — StatefulSets

**Lab: Stable identity, persistent storage, and ordered startup**

```bash
# Create headless service + StatefulSet
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 80
    name: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-svc
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
EOF

# Watch ordered startup — web-0 first, then web-1, then web-2
kubectl get pods --watch

# Confirm stable names
kubectl get pods -l app=web   # always web-0, web-1, web-2

# See the per-pod PVCs created
kubectl get pvc               # www-web-0, www-web-1, www-web-2

# Write data to web-0's volume
kubectl exec web-0 -- sh -c 'echo "pod-0 data" > /usr/share/nginx/html/index.html'

# Delete web-0 — watch it come back with the SAME name
kubectl delete pod web-0
kubectl get pods --watch      # web-0 comes back, not a random name

# Data persists — the PVC was reattached
kubectl exec web-0 -- cat /usr/share/nginx/html/index.html   # still "pod-0 data"

# Reach a specific pod by stable DNS from another pod
kubectl run tmp --image=busybox --rm -it --restart=Never -- \
  wget -q -O- web-0.web-svc.default.svc.cluster.local
```

**Cleanup — note PVCs must be deleted separately**

```bash
kubectl delete sts web
kubectl delete svc web-svc
kubectl delete pvc -l app=web   # PVCs are NOT auto-deleted
```
## The Master Cheat Sheet

```bash
# LIST
kubectl get pods / deploy / rs / ds / sts / jobs / cronjobs

# USEFUL FLAGS
kubectl get pods -o wide            # +IP +NODE
kubectl get pods --show-labels      # show label columns
kubectl get pods -l app=my-app      # filter by label
kubectl get pods -A                 # all namespaces

# INSPECT (always read the Events section)
kubectl describe pod/deploy/sts <name>

# LOGS
kubectl logs <pod>
kubectl logs <pod> --previous       # after a crash
kubectl logs <pod> -c <container>   # multi-container pod

# SHELL
kubectl exec -it <pod> -- /bin/bash

# DEPLOYMENTS
kubectl scale deploy <name> --replicas=5
kubectl set image deploy/<name> <container>=<image>:<tag>
kubectl rollout status deploy/<name>
kubectl rollout undo deploy/<name>
kubectl rollout restart deploy/<name>

# JOBS
kubectl create job test --from=cronjob/<name>   # manual trigger
kubectl patch cronjob <name> -p '{"spec":{"suspend":true}}'

# GENERATE YAML BOILERPLATE
kubectl create deployment web --image=nginx:1.25 --replicas=3 \
  --dry-run=client -o yaml > deployment.yaml
```
## API Versions Quick Reference

| Workload | apiVersion | kind |
|---|---|---|
| Pod | `v1` | `Pod` |
| Namespace | `v1` | `Namespace` |
| ReplicaSet | `apps/v1` | `ReplicaSet` |
| Deployment | `apps/v1` | `Deployment` |
| DaemonSet | `apps/v1` | `DaemonSet` |
| StatefulSet | `apps/v1` | `StatefulSet` |
| Job | `batch/v1` | `Job` |
| CronJob | `batch/v1` | `CronJob` |
## Restart Policy Rules

| Policy | Use In | Behaviour |
|---|---|---|
| `Always` | Deployments, DaemonSets | Restart on any exit — the default |
| `OnFailure` | Jobs | Restart only on non-zero exit |
| `Never` | Jobs | Never restart — a new pod on each retry |
## 🧪 Final Practice — End-to-End Scenario

This lab ties everything together. You'll deploy a full application stack using every concept from this article.

**Scenario:** Deploy a web application with a background worker, a nightly cleanup job, and per-node logging.
```bash
# Step 1: Create a dedicated namespace
kubectl create namespace myapp

# Step 2: Deploy the web app (Deployment)
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: myapp
  labels:
    app: web
    tier: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
        tier: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests: { memory: "64Mi", cpu: "50m" }
          limits: { memory: "128Mi", cpu: "200m" }
        readinessProbe:
          httpGet: { path: /, port: 80 }
          initialDelaySeconds: 3
EOF

# Step 3: Deploy a background worker (Deployment, different labels)
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: myapp
  labels:
    app: worker
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
        tier: backend
    spec:
      containers:
      - name: worker
        image: busybox
        command: ['sh', '-c', 'while true; do echo "worker processing..."; sleep 10; done']
        resources:
          requests: { memory: "32Mi", cpu: "25m" }
EOF

# Step 4: Per-node log collector (DaemonSet)
# (\$NODE_NAME is escaped so it expands inside the container, not in your shell)
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: busybox
        command: ['sh', '-c', 'while true; do echo "collecting logs from \$NODE_NAME"; sleep 15; done']
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          requests: { memory: "20Mi", cpu: "10m" }
EOF

# Step 5: Nightly cleanup Job (CronJob)
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup
  namespace: myapp
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 120
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: busybox
            command: ['sh', '-c', 'echo "cleaning up old data..."; sleep 3; echo done']
EOF

# Inspect the whole namespace
kubectl get all -n myapp

# Check labels
kubectl get pods -n myapp --show-labels

# Filter by tier
kubectl get pods -n myapp -l tier=frontend
kubectl get pods -n myapp -l tier=backend

# Trigger the cleanup job manually
kubectl create job --from=cronjob/cleanup manual-cleanup-01 -n myapp
kubectl logs -l job-name=manual-cleanup-01 -n myapp

# Simulate a rolling update on web
kubectl set image deployment/web nginx=nginx:1.26 -n myapp
kubectl rollout status deployment/web -n myapp

# Scale the worker up
kubectl scale deployment worker --replicas=3 -n myapp
kubectl get pods -n myapp
```
**Cleanup**

```bash
kubectl delete namespace myapp   # deletes everything inside it
```