Venkata BhumiReddy

Kubernetes Probe Anti-Pattern: Stop Restarting Pods That Don't Need It

Have you ever watched your pod restart counter climb during a MongoDB re-election, an LDAP connection timeout, or some other external-system failure, even though the JVM was perfectly fine? That's not a MongoDB problem or an external-system problem. It's a probe configuration problem, and it's one of the most common anti-patterns in Kubernetes deployments.

This post walks through the problem, the live simulation we ran to prove it, and the exact fix using Spring Boot Actuator health groups.


What Are Kubernetes Probes?

Kubernetes uses three probe types to monitor container health:

| Probe | Question it answers | Action on failure |
|-----------|--------------------------------------|------------------------------------|
| Liveness | Is the JVM alive and not deadlocked? | Kills and restarts the pod |
| Readiness | Is the app ready to receive traffic? | Removes pod from Service endpoints |
| Startup | Has the app finished initializing? | Kills pod if startup is too slow |

The difference between liveness and readiness is the most important thing to understand before you configure either one:

Liveness says: "this process is broken beyond self-repair — kill it."
Readiness says: "this process isn't ready right now — don't send it traffic."


The Anti-Pattern

Here's what a lot of teams ship to production:

livenessProbe:
  httpGet:
    path: /actuator/health   # ❌ includes MongoDB, diskSpace, ALL deps
    port: 8080
  periodSeconds: 5
  failureThreshold: 2

readinessProbe:
  httpGet:
    path: /actuator/health   # ❌ same endpoint as liveness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
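Before looking at what the endpoint returns, it's worth doing the arithmetic on how fast these settings kill a pod. A quick sanity check in shell, with the values copied from the manifest above:

```shell
# With periodSeconds: 5 and failureThreshold: 2, Kubernetes kills the
# pod roughly 10 seconds after the first failed liveness check.
period_seconds=5
failure_threshold=2
time_to_kill=$((period_seconds * failure_threshold))
echo "pod killed ~${time_to_kill}s after the dependency goes down"
```

Ten seconds is far shorter than a typical MongoDB primary re-election, which is exactly why this configuration guarantees unnecessary restarts.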

Both probes point to /actuator/health. That endpoint aggregates everything:

{
  "status": "UP",
  "components": {
    "diskSpace":      { "status": "UP" },
    "livenessState":  { "status": "UP" },
    "mongo":          { "status": "UP" },
    "mongoDB":        { "status": "UP" },
    "ping":           { "status": "UP" },
    "readinessState": { "status": "UP" }
  }
}

The moment MongoDB goes down — even temporarily during a normal primary re-election — /actuator/health returns DOWN. The liveness probe fails. Kubernetes kills the pod.

The pod restart does nothing to fix MongoDB. The JVM was healthy. You just killed a healthy process for no reason.
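The reason one component takes everything down is Spring Boot's default status aggregation: the aggregate status (and therefore the HTTP status code of `/actuator/health`) is DOWN as soon as any single component reports DOWN. A simplified pure-shell model of that rule, with mocked component values:

```shell
# Simplified model of Spring Boot's default health aggregation:
# one DOWN component flips the whole endpoint to DOWN.
components="diskSpace=UP livenessState=UP mongo=DOWN ping=UP readinessState=UP"
overall=UP
for entry in $components; do
  case "$entry" in
    *=DOWN) overall=DOWN ;;   # a single unhealthy dependency wins
  esac
done
echo "aggregate status: $overall"
```

That is reasonable behavior for a readiness signal. Wired into a liveness probe, it turns every dependency blip into a kill order.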


The Failure Cascade

Here's the timeline when this anti-pattern hits a MongoDB re-election:

t=0s    MongoDB primary pod deleted (normal Kubernetes rolling update / failure)
t=2s    Spring Boot MongoDB driver loses connection
t=2s    /actuator/health → mongo: DOWN → overall: DOWN
t=5s    Liveness probe check #1 → FAIL
t=10s   Liveness probe check #2 → FAIL  ← failureThreshold: 2 reached
t=10s   Kubernetes KILLS the pod
t=40s   Pod still restarting ... MongoDB finishes re-election ✓
t=70s   Pod finally UP — but RESTARTS counter now shows 1, 2, 3...

If MongoDB stays down long enough, you get a restart loop. The pod restarts repeatedly, failing health checks each time, never getting a chance to recover on its own.
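The loop can be modeled in a few lines of shell. This is a toy simulation of the probe accounting, not real kubelet behavior: while the dependency stays down, every probe fails, the counter keeps hitting the threshold, and the restart count climbs.

```shell
# Toy restart-loop simulation: the dependency is DOWN for 8 probe
# cycles and failureThreshold is 2. Every threshold hit "restarts"
# the pod and resets the failure counter for the new container.
failure_threshold=2
consecutive_failures=0
restarts=0
for cycle in 1 2 3 4 5 6 7 8; do
  consecutive_failures=$((consecutive_failures + 1))
  if [ "$consecutive_failures" -ge "$failure_threshold" ]; then
    restarts=$((restarts + 1))
    consecutive_failures=0
  fi
done
echo "restarts after 8 failed probes: $restarts"
```

Four restarts in eight probe cycles, and not one of them did anything to help MongoDB recover.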


The Fix: Split Your Probes

Spring Boot has shipped dedicated probe endpoints since version 2.3; all you need to do is enable health groups.

Spring Boot configuration

# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true             # enables /actuator/health/liveness and /readiness
      show-details: always
      group:
        liveness:
          include: livenessState        # ✅ JVM only
        readiness:
          include: mongo,readinessState # ✅ DB failure → remove from LB
  endpoints:
    web:
      exposure:
        include: health,info

Kubernetes deployment

livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # ✅ JVM only
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  failureThreshold: 2

readinessProbe:
  httpGet:
    path: /actuator/health/readiness  # ✅ DB failure stops traffic, not pod
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 5
  failureThreshold: 2

What each endpoint returns now

GET /actuator/health/liveness — MongoDB DOWN? Still 200 UP.

{
  "status": "UP",
  "components": {
    "livenessState": { "status": "UP" }
  }
}

GET /actuator/health/readiness — MongoDB DOWN? Returns 503 DOWN.

{
  "status": "DOWN",
  "components": {
    "readinessState": { "status": "UP" },
    "mongo":          { "status": "DOWN" }
  }
}

Now when MongoDB goes down:

  • Readiness fails → pod is removed from the Service endpoints (no traffic)
  • Liveness stays UP → pod is never killed
  • MongoDB recovers in ~20-30s → readiness passes → pod automatically rejoins
  • RESTARTS counter: 0

Live Demo

Check out the code from https://github.com/codebhumi/app-kubernetes-probes and follow the instructions in README.md to compile and build the application.

Now set this up on Docker Desktop Kubernetes with:

  • MongoDB Community Operator (3-node replica set)
  • Spring Boot 3.3 / Java 21
  • Priority-weighted replica set so pod-0 is always the preferred primary

Weighted priority — makes the demo reproducible

memberConfig:
  - votes: 1
    priority: "2"    # pod-0: always preferred primary
  - votes: 1
    priority: "1"
  - votes: 1
    priority: "1"

Now you always know which pod to kill to trigger a re-election.

The kill command

# Confirm who is primary
kubectl exec -it mongodb-replicaset-0 -n mongodb -c mongod -- mongosh \
  -u admin -p MyMongoExperiment --authenticationDatabase admin \
  --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'

# Simulate full outage — kill all three pods
kubectl delete pod mongodb-replicaset-0 \
                    mongodb-replicaset-1 \
                    mongodb-replicaset-2 -n mongodb

What to watch

# Terminal 1 — pod status + endpoints (your "load balancer view")
watch -n 2 '
echo "=== PODS ==="
kubectl get pods -n mongodb | grep app-kubernetes-probes
echo ""
echo "=== ENDPOINTS ==="
kubectl get endpoints app-kubernetes-probes-svc -n mongodb
'

# Terminal 2 — health probe responses
while true; do
  echo "--- $(date +%H:%M:%S) ---"
  echo -n "LIVENESS:  "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/liveness
  echo ""
  echo -n "READINESS: "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/readiness
  echo ""
  sleep 3
done

# Terminal 3 — API traffic
while true; do
  echo "$(date +%H:%M:%S) $(curl -s -o /dev/null -w '%{http_code}' http://localhost:30080/api/products)"
  sleep 2
done

Anti-pattern result

PODS:
app-kubernetes-probes   0/1   Running   1    ← killed once
app-kubernetes-probes   0/1   Running   2    ← killed again
app-kubernetes-probes   1/1   Running   3    ← back but 3 restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.15:8080   ← NEW IP (pod was killed)

Correct pattern result

PODS:
app-kubernetes-probes   0/1   Running   0    ← removed from LB, NOT killed
app-kubernetes-probes   1/1   Running   0    ← rejoined, ZERO restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.16:8080   ← SAME IP (pod survived!)

The same IP rejoining is the smoking gun. It proves the pod was never killed — just temporarily removed from rotation.


Before vs After

| Scenario | Anti-pattern | Correct pattern |
|---|---|---|
| MongoDB goes down | Liveness fails → pod killed | Readiness fails → pod removed from LB |
| RESTARTS counter | Climbs: 1, 2, 3... | Stays at 0 |
| Recovery time | 60-90s (restart + initialDelay) | 20-30s (just re-election time) |
| Pod IP after recovery | New IP (pod was killed) | Same IP (pod survived) |
| Alert noise | CrashLoopBackOff fires | No alerts; expected transient state |
| Root cause addressed? | No: restart doesn't fix MongoDB | N/A: pod never restarted |

The Rule of Thumb

Put only livenessState in your liveness group. That's almost always sufficient. If the JVM is alive and not deadlocked, liveness should pass — regardless of what external dependencies are doing.

Put external dependencies (mongo, redis, db) in your readiness group. Their failure means "I can't serve requests right now" — not "kill me."

Liveness  → am I broken?      → livenessState only
Readiness → am I ready?       → readinessState + all your dependencies
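The same rule extends naturally as you add backends. A sketch of the group config for a service that also talks to Redis and a relational database, assuming those health indicators (`redis`, `db`) are registered, which Spring Boot auto-configures when the corresponding starters are on the classpath:

```yaml
management:
  endpoint:
    health:
      group:
        liveness:
          include: livenessState                  # never add dependencies here
        readiness:
          include: readinessState,mongo,redis,db  # every external dependency
```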

TL;DR

  • Both probes pointing to /actuator/health is an anti-pattern
  • When MongoDB goes down, liveness fails, pod gets killed unnecessarily
  • Enable probes.enabled: true in Spring Boot
  • Configure group.liveness.include: livenessState
  • Configure group.readiness.include: mongo,readinessState
  • Switch probe paths to /actuator/health/liveness and /actuator/health/readiness
  • Add serverSelectionTimeoutMS=3000 to your MongoDB URI
  • Watch your RESTARTS counter drop to zero
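The `serverSelectionTimeoutMS` bullet deserves a note: the MongoDB driver's default server-selection timeout is 30 seconds, so without it the mongo health indicator can block a readiness request for that long before reporting DOWN. Lowering it makes the probe fail fast instead of hanging. A sketch, where host, database, and credentials are placeholders:

```yaml
spring:
  data:
    mongodb:
      # fail fast: the mongo health check reports DOWN in ~3s instead of
      # blocking for the driver's 30s default server-selection timeout
      uri: mongodb://<user>:<pass>@mongodb-svc:27017/demo?serverSelectionTimeoutMS=3000
```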


Demonstrated on Docker Desktop Kubernetes with MongoDB Community Operator, Spring Boot 3.3, Java 21. Production target: OpenShift.
