Have you ever watched your pod restart counter climb during a MongoDB re-election, an LDAP connection timeout, or some other external system failure — even though the JVM was perfectly fine? That's not a MongoDB or external system problem. That's a probe configuration problem, and it's one of the most common anti-patterns we see across Kubernetes deployments.
This post walks through the problem, the live simulation we ran to prove it, and the exact fix using Spring Boot Actuator health groups.
## What Are Kubernetes Probes?
Kubernetes uses three probe types to monitor container health:
| Probe | Question it answers | Action on failure |
|---|---|---|
| Liveness | Is the JVM alive and not deadlocked? | Kills and restarts the pod |
| Readiness | Is the app ready to receive traffic? | Removes pod from Service endpoints |
| Startup | Has the app finished initializing? | Kills pod if startup is too slow |
The difference between liveness and readiness is the most important thing to understand before you configure either one:

- **Liveness** says: "this process is broken beyond self-repair — kill it."
- **Readiness** says: "this process isn't ready right now — don't send it traffic."
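The startup probe from the table also deserves a quick illustration, since slow-booting JVMs are exactly where it helps: while the startup probe is failing, Kubernetes holds off the other two probes. A minimal sketch — the path and thresholds here are illustrative assumptions, not taken from the demo app:

```yaml
# Hypothetical startup probe: allows up to 30 × 5s = 150s for boot
# before liveness/readiness checks take over.
startupProbe:
  httpGet:
    path: /actuator/health/liveness   # any cheap endpoint works here
    port: 8080
  periodSeconds: 5
  failureThreshold: 30
```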
## The Anti-Pattern
Here's what a lot of teams ship to production:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health   # ❌ includes MongoDB, diskSpace, ALL deps
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
readinessProbe:
  httpGet:
    path: /actuator/health   # ❌ same endpoint as liveness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
```
Both probes point to `/actuator/health`. That endpoint aggregates everything:
```json
{
  "status": "UP",
  "components": {
    "diskSpace": { "status": "UP" },
    "livenessState": { "status": "UP" },
    "mongo": { "status": "UP" },
    "mongoDB": { "status": "UP" },
    "ping": { "status": "UP" },
    "readinessState": { "status": "UP" }
  }
}
```
The moment MongoDB goes down — even temporarily during a normal primary re-election — `/actuator/health` returns `DOWN`. The liveness probe fails. Kubernetes kills the pod.
The pod restart does nothing to fix MongoDB. The JVM was healthy. You just killed a healthy process for no reason.
## The Failure Cascade

Here's the timeline when this anti-pattern hits a MongoDB re-election:

```text
t=0s   MongoDB primary pod deleted (normal Kubernetes rolling update / failure)
t=2s   Spring Boot MongoDB driver loses connection
t=2s   /actuator/health → mongo: DOWN → overall: DOWN
t=5s   Liveness probe check #1 → FAIL
t=10s  Liveness probe check #2 → FAIL  ← failureThreshold: 2 reached
t=10s  Kubernetes KILLS the pod
t=40s  Pod still restarting ... MongoDB finishes re-election ✓
t=70s  Pod finally UP — but RESTARTS counter now shows 1, 2, 3...
```
If MongoDB stays down long enough, you get a restart loop. The pod restarts repeatedly, failing health checks each time, never getting a chance to recover on its own.
## The Fix: Split Your Probes
Spring Boot has had dedicated probe endpoints since 2.3. All you need to do is enable health groups.
### Spring Boot configuration

```yaml
# application.yaml
management:
  endpoint:
    health:
      probes:
        enabled: true                      # enables the liveness and readiness groups
      show-details: always
      group:
        liveness:
          include: livenessState           # ✅ JVM only
        readiness:
          include: mongo,readinessState    # ✅ DB failure → remove from LB
  endpoints:
    web:
      exposure:
        include: health,info
```
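If the service has more external dependencies than MongoDB, the same pattern extends naturally: add each one to the readiness group and leave liveness alone. A hedged sketch, where `redis` is a hypothetical second dependency rather than part of the demo app:

```yaml
management:
  endpoint:
    health:
      group:
        readiness:
          # list every dependency whose failure should stop traffic, not the pod
          include: mongo,redis,readinessState
```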
### Kubernetes deployment

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness    # ✅ JVM only
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  failureThreshold: 2
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # ✅ DB failure stops traffic, not pod
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 5
  failureThreshold: 2
```
### What each endpoint returns now

GET `/actuator/health/liveness` — MongoDB DOWN? Still `200 UP`.

```json
{
  "status": "UP",
  "components": {
    "livenessState": { "status": "UP" }
  }
}
```
GET `/actuator/health/readiness` — MongoDB DOWN? Returns `503 DOWN`.

```json
{
  "status": "DOWN",
  "components": {
    "readinessState": { "status": "UP" },
    "mongo": { "status": "DOWN" }
  }
}
```
Now when MongoDB goes down:
- Readiness fails → pod is removed from the Service endpoints (no traffic)
- Liveness stays UP → pod is never killed
- MongoDB recovers in ~20-30s → readiness passes → pod automatically rejoins
- RESTARTS counter: 0
## Live Demo
Check out the code from https://github.com/codebhumi/app-kubernetes-probes and follow the instructions in README.md to compile and build the application.
Now set this up on Docker Desktop Kubernetes with:
- MongoDB Community Operator (3-node replica set)
- Spring Boot 3.3 / Java 21
- Priority-weighted replica set so pod-0 is always the preferred primary
### Weighted priority — makes the demo reproducible

```yaml
memberConfig:
  - votes: 1
    priority: "2"   # pod-0: always preferred primary
  - votes: 1
    priority: "1"
  - votes: 1
    priority: "1"
```
Now you always know which pod to kill to trigger a re-election.
### The kill command

```shell
# Confirm who is primary
kubectl exec -it mongodb-replicaset-0 -n mongodb -c mongod -- mongosh \
  -u admin -p MyMongoExperiment --authenticationDatabase admin \
  --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'

# Simulate full outage — kill all three pods
kubectl delete pod mongodb-replicaset-0 \
                   mongodb-replicaset-1 \
                   mongodb-replicaset-2 -n mongodb
```
### What to watch

```shell
# Terminal 1 — pod status + endpoints (your "load balancer view")
watch -n 2 '
  echo "=== PODS ==="
  kubectl get pods -n mongodb | grep app-kubernetes-probes
  echo ""
  echo "=== ENDPOINTS ==="
  kubectl get endpoints app-kubernetes-probes-svc -n mongodb
'
```

```shell
# Terminal 2 — health probe responses
while true; do
  echo "--- $(date +%H:%M:%S) ---"
  echo -n "LIVENESS: "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/liveness
  echo ""
  echo -n "READINESS: "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:30080/actuator/health/readiness
  echo ""
  sleep 3
done
```

```shell
# Terminal 3 — API traffic
while true; do
  echo "$(date +%H:%M:%S) $(curl -s -o /dev/null -w '%{http_code}' http://localhost:30080/api/products)"
  sleep 2
done
```
### Anti-pattern result

```text
PODS:
app-kubernetes-probes       0/1   Running   1   ← killed once
app-kubernetes-probes       0/1   Running   2   ← killed again
app-kubernetes-probes       1/1   Running   3   ← back, but 3 restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.15:8080       ← NEW IP (pod was killed)
```
### Correct pattern result

```text
PODS:
app-kubernetes-probes       0/1   Running   0   ← removed from LB, NOT killed
app-kubernetes-probes       1/1   Running   0   ← rejoined, ZERO restarts

ENDPOINTS:
app-kubernetes-probes-svc   10.1.0.16:8080       ← SAME IP (pod survived!)
```
The same IP rejoining is the smoking gun. It proves the pod was never killed — just temporarily removed from rotation.
## Before vs After
| Scenario | Anti-pattern | Correct pattern |
|---|---|---|
| MongoDB goes down | Liveness fails → pod killed | Readiness fails → pod removed from LB |
| RESTARTS counter | Climbs: 1, 2, 3... | Stays at 0 |
| Recovery time | 60-90s (restart + initialDelay) | 20-30s (just re-election time) |
| Pod IP after recovery | New IP — pod was killed | Same IP — pod survived |
| Alert noise | CrashLoopBackOff fires | No alerts — expected transient state |
| Root cause addressed? | No — restart doesn't fix MongoDB | N/A — pod never restarted |
## The Rule of Thumb
Put only `livenessState` in your liveness group. That's almost always sufficient. If the JVM is alive and not deadlocked, liveness should pass — regardless of what external dependencies are doing.

Put external dependencies (`mongo`, `redis`, `db`) in your readiness group. Their failure means "I can't serve requests right now" — not "kill me."

```text
Liveness  → am I broken? → livenessState only
Readiness → am I ready?  → readinessState + all your dependencies
```
## TL;DR

- Both probes pointing to `/actuator/health` is an anti-pattern
- When MongoDB goes down, liveness fails and the pod gets killed unnecessarily
- Enable `probes.enabled: true` in Spring Boot
- Configure `group.liveness.include: livenessState`
- Configure `group.readiness.include: mongo,readinessState`
- Switch probe paths to `/actuator/health/liveness` and `/actuator/health/readiness`
- Add `serverSelectionTimeoutMS=3000` to your MongoDB URI
- Watch your RESTARTS counter drop to zero
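The `serverSelectionTimeoutMS=3000` item above caps how long the MongoDB driver waits for a reachable server; the driver default is 30 seconds, which makes the Mongo health indicator hang and readiness flip to DOWN slowly. A sketch of where the option goes — the host, database, and credentials below are placeholders, not values from the demo:

```yaml
spring:
  data:
    mongodb:
      # placeholder URI; replace host/credentials with your own
      uri: mongodb://admin:changeme@mongodb-svc.mongodb.svc.cluster.local:27017/demo?serverSelectionTimeoutMS=3000
```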
## Resources
- Kubernetes Probe Documentation
- Spring Boot — Kubernetes Probes
- Spring Boot — Health Groups
- MongoDB Community Operator
- Kubernetes Production Best Practices
Demonstrated on Docker Desktop Kubernetes with MongoDB Community Operator, Spring Boot 3.3, Java 21. Production target: OpenShift.