Aisalkyn Aidarova

Project: One App — Three Probes — Real Failures

Visual mental model (keep this in mind first)

Traffic rule (critical):

  • ❌ Startup probe FAIL → liveness/readiness stay disabled; once failureThreshold is exhausted, the container is RESTARTED
  • ❌ Readiness probe FAIL → pod gets NO traffic from Services
  • ❌ Liveness probe FAIL → container is RESTARTED (the pod itself is kept)

What you will demonstrate

You will run one app that:

  • Starts slowly (startup probe needed)
  • Can become temporarily unavailable (readiness probe needed)
  • Can become stuck (liveness probe needed)

This mirrors real production behavior.


Step 0 — Prerequisites

minikube start
kubectl get nodes

Step 1 — Demo application (intentionally imperfect)

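We’ll use podinfo, a small HTTP app, as a stand-in for a service that:

  • Takes about 20 seconds to start (podinfo itself normally starts much faster, so treat the delay as the scenario being modeled)
  • Has /readyz and /healthz endpoints (podinfo's readiness and health checks)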

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: probe-demo
  template:
    metadata:
      labels:
        app: probe-demo
    spec:
      containers:
      - name: app
        image: ghcr.io/stefanprodan/podinfo:6.5.3
        ports:
        - containerPort: 9898

Apply:

kubectl apply -f deployment.yaml
kubectl get pods

At this point:

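  • No probes are defined, so Kubernetes treats the container as ready the moment its process starts
  • A slow-starting app would be marked Ready before it can serve requests
  • Once a Service is added, traffic could hit it too early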

Step 2 — Expose the app

apiVersion: v1
kind: Service
metadata:
  name: probe-demo-svc
spec:
  selector:
    app: probe-demo
  ports:
  - port: 80
    targetPort: 9898
  type: NodePort

Save this as service.yaml, then apply it and open the service:

kubectl apply -f service.yaml
minikube service probe-demo-svc

Now the REAL learning starts


STARTUP PROBE

What it is

“Don’t judge this container until it has fully started.”

Why DevOps needs it

Without startup probe:

  • A liveness probe can start failing before a slow app has finished booting, and restart it endlessly
  • JVM, Python, DB-connected apps suffer most

Add startup probe

startupProbe:
  httpGet:
    path: /healthz   # podinfo serves its health check at /healthz
    port: 9898
  failureThreshold: 30
  periodSeconds: 1

Meaning:

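  • Kubernetes allows up to 30 seconds for startup (failureThreshold 30 × periodSeconds 1); if the probe has still not succeeded, the container is killed and restarted
  • Liveness & readiness are disabled until startup succeeds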
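
For placement, the probe sits inside the container entry, at the same level as image and ports. The following is a minimal sketch against the Deployment from Step 1, assuming podinfo's /healthz endpoint; the comments spell out the startup budget arithmetic.

```yaml
# Excerpt of deployment.yaml (spec.template.spec), not a complete manifest.
containers:
- name: app
  image: ghcr.io/stefanprodan/podinfo:6.5.3
  ports:
  - containerPort: 9898
  startupProbe:
    httpGet:
      path: /healthz       # assumption: podinfo's health endpoint
      port: 9898
    periodSeconds: 1       # probe once per second
    failureThreshold: 30   # tolerate 30 consecutive failures
    # Startup budget = failureThreshold x periodSeconds = 30 x 1s = 30s.
    # The first success ends the startup phase; if the budget runs out,
    # the kubelet kills the container and restartPolicy applies.
```

For an app that needs longer, raising failureThreshold or periodSeconds widens the budget without slowing down detection once the app is up.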

Demonstrate failure (remove it)

After removing the startupProbe from deployment.yaml and re-applying it, recreate the pod and inspect it:

kubectl delete pod -l app=probe-demo
kubectl describe pod -l app=probe-demo

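What you’ll see when the app is genuinely slow and a liveness probe starts checking before it is up:

  • The container is restarted before it finishes starting
  • The pod ends up in the classic CrashLoopBackOff of slow apps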

READINESS PROBE

What it is

“Can this pod receive traffic RIGHT NOW?”

Why DevOps cares

  • Keeps traffic away from new pods during deploys until they are ready
  • Prevents broken pods from serving users
  • Required for zero-downtime rolling updates
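
How readiness feeds into a rolling update can be sketched with the Deployment's strategy fields. The values below are illustrative assumptions, not settings from this project: with maxUnavailable set to 0, an old pod is only taken down after a new one has passed its readiness probe.

```yaml
# Excerpt of deployment.yaml (top-level spec), illustrative values only.
spec:
  replicas: 1
  minReadySeconds: 5        # a new pod must stay Ready this long to count as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # start one extra pod during the rollout
      maxUnavailable: 0     # never drop below the desired number of available pods
```

Without a readiness probe, a new pod counts as ready as soon as its container starts, so these settings cannot protect users on their own.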

Add readiness probe

readinessProbe:
  httpGet:
    path: /readyz   # podinfo serves its readiness check at /readyz
    port: 9898
  initialDelaySeconds: 5
  periodSeconds: 3

Observe

kubectl get pods
kubectl describe pod
kubectl get endpoints probe-demo-svc

When readiness fails:

  • Pod stays Running, but READY shows 0/1
  • Its IP is removed from the Service endpoints
  • No traffic is sent to it, and the container is not restarted
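
How quickly a failing pod leaves the endpoints depends on the probe's thresholds. Here is a sketch with the Kubernetes defaults written out explicitly (timeoutSeconds 1, failureThreshold 3, successThreshold 1); the /readyz path assumes podinfo's readiness endpoint.

```yaml
readinessProbe:
  httpGet:
    path: /readyz          # assumption: podinfo's readiness endpoint
    port: 9898
  periodSeconds: 3
  timeoutSeconds: 1        # default: each check must answer within 1s
  failureThreshold: 3      # default: 3 consecutive failures -> NotReady (roughly 3 x 3s)
  successThreshold: 1      # default: 1 success -> Ready again, back in the endpoints
```

Lower thresholds pull a bad pod out of rotation faster, at the cost of flapping on brief hiccups.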

Demonstrate issue if missing

Remove the readiness probe → deploy a new version:

  • Traffic hits half-started pods
  • Users see connection errors or empty responses (or 502s from a proxy or ingress in front of the Service)

LIVENESS PROBE

What it is

“Is this container still healthy or stuck?”

Why DevOps needs it

  • Apps can hang forever while the process keeps running
  • Memory leaks can leave them unresponsive
  • Deadlocks
  • Kubernetes must restart them automatically

Add liveness probe

livenessProbe:
  httpGet:
    path: /healthz   # podinfo serves its health check at /healthz
    port: 9898
  initialDelaySeconds: 20
  periodSeconds: 5

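Demonstrate restart

Simulate failure by killing the main process (replace pod-name with the real name from kubectl get pods):

kubectl exec -it pod-name -- kill 1

Note that this makes the container exit, so the kubelet restarts it through the pod's restartPolicy whether or not a liveness probe exists. A restart triggered by the liveness probe itself shows up as "Liveness probe failed" events in kubectl describe pod.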

Observe:

kubectl get pods

You will see:

  • Container restarted automatically (the RESTARTS count goes up)
  • No manual intervention

Without liveness probe (app hangs but the process keeps running)

  • Pod stays Running
  • App is dead
  • Kubernetes does NOTHING
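
To watch a liveness-triggered restart without touching the app, one option is to point the probe at something that cannot succeed. This is a sketch for demonstration only; port 65000 is a hypothetical choice, picked on the assumption that nothing in the container listens there, so every check fails with a connection error.

```yaml
# Demo only: a liveness probe that always fails.
livenessProbe:
  httpGet:
    path: /healthz
    port: 65000            # hypothetical unused port -> connection refused
  periodSeconds: 5
  failureThreshold: 3      # default: restart after 3 consecutive failures
```

After applying this, kubectl describe pod should list Unhealthy events for the liveness probe, and the RESTARTS column in kubectl get pods should climb. Restore the real port afterwards.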

Final combined (production-correct)

startupProbe:
  httpGet:
    path: /healthz
    port: 9898
  failureThreshold: 30
  periodSeconds: 1

readinessProbe:
  httpGet:
    path: /readyz
    port: 9898
  periodSeconds: 3

livenessProbe:
  httpGet:
    path: /healthz
    port: 9898
  initialDelaySeconds: 20
  periodSeconds: 5

DevOps engineer checklist (this is interview-level)

You MUST know:

  • Startup → protects slow boot apps
  • Readiness → controls traffic
  • Liveness → auto-recovery
  • Readiness ≠ Liveness (most common mistake)
  • Missing readiness = downtime
  • Missing liveness = silent failure
  • Missing or too-tight startup probe = endless restarts

How to explain this in production terms

“Startup protects initialization, readiness protects users, liveness protects the platform.”
