Venkata BhumiReddy

Kubernetes Probe Anti-Pattern: Stop Restarting Pods That Don't Need It

Have you ever watched your pod restart counter climb during a MongoDB re-election, an LDAP connection timeout, or some other external-system failure, even though the JVM was perfectly fine? That's not a MongoDB problem or an external-system problem. It's a probe configuration problem, and it's one of the most common anti-patterns in Kubernetes deployments.

This post walks through the problem, the live simulation we ran to prove it, and the exact fix using Spring Boot Actuator health groups.


What Are Kubernetes Probes?

Kubernetes uses three probe types to monitor container health:

| Probe | Question it answers | Action on failure |
|-----------|--------------------------------------|------------------------------------|
| Liveness | Is the JVM alive and not deadlocked? | Kills and restarts the pod |
| Readiness | Is the app ready to receive traffic? | Removes pod from Service endpoints |
| Startup | Has the app finished initializing? | Kills pod if startup is too slow |

The difference between liveness and readiness is the most important thing to understand before you configure either one:

Liveness says: "this process is broken beyond self-repair — kill it."
Readiness says: "this process isn't ready right now — don't send it traffic."


The Anti-Pattern

Here's what a lot of teams ship to production:

livenessProbe:
  httpGet:
    path: /actuator/health   # ❌ includes MongoDB, diskSpace, ALL deps
    port: 8080
  periodSeconds: 5
  failureThreshold: 2

readinessProbe:
  httpGet:
    path: /actuator/health   # ❌ same endpoint as liveness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
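Before looking at what the endpoint returns, it's worth doing the arithmetic on how fast these settings kill a pod. A quick sanity check in shell, with the values copied from the manifest above:

```shell
# With periodSeconds: 5 and failureThreshold: 2, Kubernetes kills the
# pod roughly 10 seconds after the first failed liveness check.
period_seconds=5
failure_threshold=2
time_to_kill=$((period_seconds * failure_threshold))
echo "pod killed ~${time_to_kill}s after the dependency goes down"
```

Ten seconds is far shorter than a typical MongoDB primary re-election, which is exactly why this configuration guarantees unnecessary restarts.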

Both probes point to /actuator/health. That endpoint aggregates everything:

{
  "status": "UP",
  "components": {
    "diskSpace":      { "status": "UP" },
    "livenessState":  { "status": "UP" },
    "mongo":          { "status": "UP" },
    "mongoDB":        { "status": "UP" },
    "ping":           { "status": "UP" },
    "readinessState": { "status": "UP" }
  }
}

The moment MongoDB goes down — even temporarily during a normal primary re-election — /actuator/health returns DOWN. The liveness probe fails. Kubernetes kills the pod.

The pod restart does nothing to fix MongoDB. The JVM was healthy. You just killed a healthy process for no reason.
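The reason one component takes everything down is Spring Boot's default status aggregation: the aggregate status (and therefore the HTTP status code of `/actuator/health`) is DOWN as soon as any single component reports DOWN. A simplified pure-shell model of that rule, with mocked component values:

```shell
# Simplified model of Spring Boot's default health aggregation:
# one DOWN component flips the whole endpoint to DOWN.
components="diskSpace=UP livenessState=UP mongo=DOWN ping=UP readinessState=UP"
overall=UP
for entry in $components; do
  case "$entry" in
    *=DOWN) overall=DOWN ;;   # a single unhealthy dependency wins
  esac
done
echo "aggregate status: $overall"
```

That is reasonable behavior for a readiness signal. Wired into a liveness probe, it turns every dependency blip into a kill order.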


The Failure Cascade

Here's the timeline when this anti-pattern hits a MongoDB re-election:

t=0s    MongoDB primary pod deleted (normal Kubernetes rolling update / failure)
t=2s    Spring Boot MongoDB driver loses connection
t=2s    /actuator/health → mongo: DOWN → overall: DOWN
t=5s    Liveness probe check #1 → FAIL
t=10s   Liveness probe check #2 → FAIL  ← failureThreshold: 2 reached
t=10s   Kubernetes KILLS the pod
t=40s   Pod still restarting ... MongoDB finishes re-election ✓
t=70s   Pod finally UP — but RESTARTS counter now shows 1, 2, 3...

If MongoDB stays down long enough, you get a restart loop. The pod restarts repeatedly, failing health checks each time, never getting a chance to recover on its own.
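The loop can be modeled in a few lines of shell. This is a toy simulation of the probe accounting, not real kubelet behavior: while the dependency stays down, every probe fails, the counter keeps hitting the threshold, and the restart count climbs.

```shell
# Toy restart-loop simulation: the dependency is DOWN for 8 probe
# cycles and failureThreshold is 2. Every threshold hit "restarts"
# the pod and resets the failure counter for the new container.
failure_threshold=2
consecutive_failures=0
restarts=0
for cycle in 1 2 3 4 5 6 7 8; do
  consecutive_failures=$((consecutive_failures + 1))
  if [ "$consecutive_failures" -ge "$failure_threshold" ]; then
    restarts=$((restarts + 1))
    consecutive_failures=0
  fi
done
echo "restarts after 8 failed probes: $restarts"
```

Four restarts in eight probe cycles, and not one of them did anything to help MongoDB recover.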


The Fix: Split Your Probes

Spring Boot has shipped dedicated probe endpoints since version 2.3; all you need to do is enable health groups.

Spring Boot configuration

# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true             # enables /actuator/health/liveness and /readiness
      show-details: always
      group:
        liveness:
          include: livenessState        # ✅ JVM only
        readiness:
          include: mongo,readinessState # ✅ DB failure → remove from LB
  endpoints:
    web:
      exposure:
        include: health,info

Kubernetes deployment

livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # ✅ JVM only
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  failureThreshold: 2

readinessProbe:
  httpGet:
    path: /actuator/health/readiness  # ✅ DB failure stops traffic, not pod
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 5
  failureThreshold: 2

What each endpoint returns now

GET /actuator/health/liveness — MongoDB DOWN? Still 200 UP.

{
  "status": "UP",
  "components": {
    "livenessState": { "status": "UP" }
  }
}

GET /actuator/health/readiness — MongoDB DOWN? Returns 503 DOWN.

{
  "status": "DOWN",
  "components": {
    "readinessState": { "status": "UP" },
    "mongo":          { "status": "DOWN" }
  }
}

Now when MongoDB goes down:

  • Readiness fails → pod is removed from the Service endpoints (no traffic)
  • Liveness stays UP → pod is never killed
  • MongoDB recovers in ~20-30s → readiness passes → pod automatically rejoins
  • RESTARTS counter: 0

Live Demo

Check out the code from https://github.com/codebhumi/app-kubernetes-probes and follow the instructions in README.md to compile and build the application.

Now set this up on Docker Desktop Kubernetes with:

  • MongoDB Community Operator (3-node replica set)
  • Spring Boot 3.3 / Java 21
  • Priority-weighted replica set so pod-0 is always the preferred primary

Weighted priority — makes the demo reproducible

memberConfig:
  - votes: 1
    priority: "2"    # pod-0: always preferred primary
  - votes: 1
    priority: "1"
  - votes: 1
    priority: "1"

Now you always know which pod to kill to trigger a re-election.

The kill command

# Confirm who is primary
kubectl exec -it mongodb-replicaset-0 -n mongodb -c mongod -- mongosh \
  -u admin -p MyMongoExperiment --authenticationDatabase admin \
  --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'

# Simulate full outage — kill all three pods
kubectl delete pod mongodb-replicaset-0 \
                    mongodb-replicaset-1 \
                    mongodb-replicaset-2 -n mongodb

What to watch

# Terminal 1 — pod status + endpoints (your "load balancer view")
watch -n 2 '
echo "=== PODS ==="
kubectl get pods -n mongodb | grep app-kubernetes-probes
echo ""
echo "=== ENDPOINTS ==="
kubectl get endpoints app-kubernetes-probes-svc -n mongodb
'

# Terminal 2 — health probe responses
while true; do
  echo "--- $(date +%H:%M:%S) ---"
  echo -n "LIVENESS:  "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/liveness
  echo ""
  echo -n "READINESS: "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/readiness
  echo ""
  sleep 3
done

# Terminal 3 — API traffic
while true; do
  echo "$(date +%H:%M:%S) $(curl -s -o /dev/null -w '%{http_code}' http://localhost:30080/api/products)"
  sleep 2
done

Anti-pattern result

PODS:
app-kubernetes-probes   0/1   Running   1    ← killed once
app-kubernetes-probes   0/1   Running   2    ← killed again
app-kubernetes-probes   1/1   Running   3    ← back but 3 restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.15:8080   ← NEW IP (pod was killed)

Correct pattern result

PODS:
app-kubernetes-probes   0/1   Running   0    ← removed from LB, NOT killed
app-kubernetes-probes   1/1   Running   0    ← rejoined, ZERO restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.16:8080   ← SAME IP (pod survived!)

The same IP rejoining is the smoking gun. It proves the pod was never killed — just temporarily removed from rotation.


Before vs After

| Scenario | Anti-pattern | Correct pattern |
|---|---|---|
| MongoDB goes down | Liveness fails → pod killed | Readiness fails → pod removed from LB |
| RESTARTS counter | Climbs: 1, 2, 3... | Stays at 0 |
| Recovery time | 60-90s (restart + initialDelay) | 20-30s (just re-election time) |
| Pod IP after recovery | New IP (pod was killed) | Same IP (pod survived) |
| Alert noise | CrashLoopBackOff fires | No alerts; expected transient state |
| Root cause addressed? | No: restart doesn't fix MongoDB | N/A: pod never restarted |

The Rule of Thumb

Put only livenessState in your liveness group. That's almost always sufficient. If the JVM is alive and not deadlocked, liveness should pass — regardless of what external dependencies are doing.

Put external dependencies (mongo, redis, db) in your readiness group. Their failure means "I can't serve requests right now" — not "kill me."

Liveness  → am I broken?      → livenessState only
Readiness → am I ready?       → readinessState + all your dependencies
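The same rule extends naturally as you add backends. A sketch of the group config for a service that also talks to Redis and a relational database, assuming those health indicators (`redis`, `db`) are registered, which Spring Boot auto-configures when the corresponding starters are on the classpath:

```yaml
management:
  endpoint:
    health:
      group:
        liveness:
          include: livenessState                  # never add dependencies here
        readiness:
          include: readinessState,mongo,redis,db  # every external dependency
```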

TL;DR

  • Both probes pointing to /actuator/health is an anti-pattern
  • When MongoDB goes down, liveness fails, pod gets killed unnecessarily
  • Enable probes.enabled: true in Spring Boot
  • Configure group.liveness.include: livenessState
  • Configure group.readiness.include: mongo,readinessState
  • Switch probe paths to /actuator/health/liveness and /actuator/health/readiness
  • Add serverSelectionTimeoutMS=3000 to your MongoDB URI
  • Watch your RESTARTS counter drop to zero
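The `serverSelectionTimeoutMS` bullet deserves a note: the MongoDB driver's default server-selection timeout is 30 seconds, so without it the mongo health indicator can block a readiness request for that long before reporting DOWN. Lowering it makes the probe fail fast instead of hanging. A sketch, where host, database, and credentials are placeholders:

```yaml
spring:
  data:
    mongodb:
      # fail fast: the mongo health check reports DOWN in ~3s instead of
      # blocking for the driver's 30s default server-selection timeout
      uri: mongodb://<user>:<pass>@mongodb-svc:27017/demo?serverSelectionTimeoutMS=3000
```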


Demonstrated on Docker Desktop Kubernetes with MongoDB Community Operator, Spring Boot 3.3, Java 21. Production target: OpenShift.
