It's Tuesday morning. The platform team starts draining nodes for a Kubernetes upgrade. Sixty seconds later, Slack explodes: the payment service is fully down. All 3 replicas had landed on the same two nodes, and both were drained simultaneously. There was nothing wrong with the app. The cluster did exactly what it was told.
This is what PodDisruptionBudgets prevent.
The Problem
Kubernetes has two kinds of pod disruptions:
- Involuntary: Node crashes, OOM kills, hardware failures. Unpredictable. You handle these with replicas and health checks.
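- Voluntary: Node drains, cluster upgrades, autoscaler scale-downs. Planned and controlled. Spot instance reclaims belong here only when a termination handler drains the node before it disappears; otherwise they behave like a node failure.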
For voluntary disruptions, tools such as kubectl drain and the cluster autoscaler remove pods through the Eviction API. Unless a PDB exists, the Eviction API knows nothing about your application's availability requirements. It will happily evict every replica of your service at once if they're all on the node being drained.
Replicas don't help if the system removes all of them simultaneously.
What Is a PodDisruptionBudget?
A PDB is a simple declaration: "During voluntary disruptions, always keep at least N pods (or at most M pods unavailable) for this application."
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: production
spec:
  minAvailable: 2  # OR use maxUnavailable: 1
  selector:
    matchLabels:
      app: payment-service
That's it. This tells the eviction API: "You may not evict a payment-service pod if doing so would drop the available count below 2."
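The check behind that rule is simple arithmetic. Here is a minimal Python sketch of it, assuming the budget is evaluated the way the PDB status fields `currentHealthy`, `desiredHealthy`, and `disruptionsAllowed` suggest. The function names are illustrative and are not Kubernetes code.

```python
def disruptions_allowed(current_healthy, desired_healthy):
    """Number of pods that may still be evicted without breaking the budget."""
    return max(0, current_healthy - desired_healthy)


def may_evict(current_healthy, min_available):
    """True if evicting one more pod keeps at least min_available pods healthy."""
    return disruptions_allowed(current_healthy, min_available) >= 1


# payment-service with minAvailable: 2
print(may_evict(3, 2))  # True: 3 healthy, one can go
print(may_evict(2, 2))  # False: evicting would leave only 1
```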
How It Works
┌─────────────────────────────────────────────────────────────┐
│ WITHOUT PDB │
│ │
│ kubectl drain node-2 │
│ │ │
│ ▼ │
│ Evict pod-A ──── ✓ Gone │
│ Evict pod-B ──── ✓ Gone │
│ Evict pod-C ──── ✓ Gone │
│ │
│ Result: 0/3 replicas running. Service DOWN. │
│ (New pods schedule eventually, but there's a gap) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ WITH PDB (minAvailable: 2) │
│ │
│ kubectl drain node-2 │
│ │ │
│ ▼ │
│ Evict pod-A ──── ✓ Allowed (3→2, still ≥ 2) │
│ Evict pod-B ──── ✗ BLOCKED (would go 2→1, violates PDB) │
│ │ │
│ ▼ (waits...) │
│ pod-A reschedules on node-3 ──── ✓ Running │
│ │ │
│ ▼ (now 3 available again) │
│ Evict pod-B ──── ✓ Allowed (3→2, still ≥ 2) │
│ │
│ Result: Always ≥ 2 replicas running. Service STAYS UP. │
└─────────────────────────────────────────────────────────────┘
The drain operation becomes serialized and respectful: each blocked eviction is retried until a replacement pod is Ready, and only then does the drain move on to the next pod.
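To make the sequence concrete, the following Python sketch replays a drain under a minAvailable budget. It is a toy model, not how kubectl is implemented: it assumes every replica sits on the node being drained and that each replacement becomes Ready before the next retry. `simulate_drain` is a hypothetical helper.

```python
def simulate_drain(replicas, min_available):
    """Drain a node that holds every replica, one eviction attempt at a time.

    Returns (events, lowest_healthy). Assumes each evicted pod's replacement
    becomes Ready on another node before the next retry.
    """
    if min_available >= replicas:
        raise ValueError("budget allows zero disruptions; drain would never finish")

    events = []
    healthy = replicas        # Ready pods across the cluster
    on_node = replicas        # pods still waiting to be evicted
    lowest_healthy = healthy

    while on_node > 0:
        if healthy - 1 >= min_available:
            healthy -= 1
            on_node -= 1
            lowest_healthy = min(lowest_healthy, healthy)
            events.append("evicted")
        else:
            events.append("blocked")
            healthy += 1      # a replacement pod becomes Ready
            events.append("replacement ready")
    return events, lowest_healthy


events, lowest = simulate_drain(replicas=3, min_available=2)
print(events)
print("lowest healthy count:", lowest)  # 2
```

With 3 replicas and minAvailable: 2, the healthy count never drops below 2, at the price of two blocked attempts along the way.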
minAvailable vs maxUnavailable
Two ways to express the same idea:
| Field | Meaning | Example (5 replicas) |
|---|---|---|
| minAvailable: 3 | At least 3 must be running at all times | Can evict up to 2 at once |
| maxUnavailable: 2 | At most 2 can be down at once | Same effect |
You can also use percentages:
spec:
  maxUnavailable: "25%"  # For a 4-replica app: max 1 pod down
Rule of thumb: Use maxUnavailable for large deployments, because the budget stays meaningful as the replica count changes (a percentage scales with it). Use minAvailable when you have a hard quorum requirement (e.g., a 3-member etcd cluster needs 2 members alive).
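The difference shows up once the replica count changes. The sketch below is a rough Python model of how each form translates into "pods that may be down at once". It assumes percentages round up to whole pods, as the Kubernetes documentation describes, and the helper names are invented for this example.

```python
def to_pod_count(value, replicas):
    """Resolve an integer or "N%" budget value to a whole number of pods.

    Percentages are rounded up (assumption based on the Kubernetes docs).
    """
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (replicas * percent + 99) // 100  # integer ceiling
    return int(value)


def evictable_at_once(replicas, min_available=None, max_unavailable=None):
    """Pods that may be down simultaneously when all replicas start healthy."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if max_unavailable is not None:
        return min(replicas, to_pod_count(max_unavailable, replicas))
    return max(0, replicas - to_pod_count(min_available, replicas))


# Columns: replicas, minAvailable: 3, maxUnavailable: 2, maxUnavailable: "25%"
for replicas in (5, 10, 20):
    print(
        replicas,
        evictable_at_once(replicas, min_available=3),
        evictable_at_once(replicas, max_unavailable=2),
        evictable_at_once(replicas, max_unavailable="25%"),
    )
```

At 5 replicas the first two columns agree, matching the table. At 20 replicas, minAvailable: 3 would permit 17 simultaneous evictions, while maxUnavailable: "25%" permits 5.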
When You Need One
- Any production service with > 1 replica
- Stateful workloads with quorum (etcd, ZooKeeper, Kafka)
- During cluster upgrades (nodes drain one by one)
- When using cluster autoscaler (it respects PDBs during scale-down)
- Spot/preemptible instances, where a termination handler drains the node when the cloud provider announces a reclaim
When to Be Careful
- Don't set minAvailable equal to your replica count. A PDB of minAvailable: 3 on a 3-replica deployment means nothing can ever be evicted. Node drains will hang forever.
- Don't forget PDBs block node drains. If your PDB is too strict and pods can't reschedule (due to resource pressure, node affinity, etc.), your drain operation will be stuck indefinitely.
- Single-replica deployments: A PDB with minAvailable: 1 on a 1-replica app means the pod can never be evicted. Either accept downtime or add replicas.
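One way to catch these cases before a drain is to list the budgets that currently allow zero disruptions. Here is a rough Python sketch, assuming dictionaries shaped like the `currentHealthy` and `desiredHealthy` fields of a PDB's status. The sample data and the function name are made up for illustration.

```python
def blocking_budgets(pdbs):
    """Return names of PDBs that would reject any eviction right now."""
    blocked = []
    for pdb in pdbs:
        allowed = pdb["currentHealthy"] - pdb["desiredHealthy"]
        if allowed <= 0:
            blocked.append(pdb["name"])
    return blocked


# Hypothetical snapshot of three budgets
snapshot = [
    {"name": "payment-service-pdb", "currentHealthy": 3, "desiredHealthy": 2},
    {"name": "strict-pdb", "currentHealthy": 3, "desiredHealthy": 3},  # minAvailable == replicas
    {"name": "single-replica-pdb", "currentHealthy": 1, "desiredHealthy": 1},
]
print(blocking_budgets(snapshot))  # ['strict-pdb', 'single-replica-pdb']
```

On a live cluster, the same number appears in the ALLOWED DISRUPTIONS column of kubectl get pdb.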
The Minimum Viable PDB for Every Service
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <app>-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: <app>
A few lines of config. It guarantees that voluntary disruptions take down at most one pod at a time, so a service with two or more replicas keeps at least one pod running through a drain. The cost: drains take slightly longer because they wait for rescheduling. The benefit: you never get paged because a routine node drain cascaded into an outage.
PDBs don't prevent disruptions. They civilize them, turning a shotgun blast into a controlled, one-at-a-time handoff. The five minutes it takes to add one is significantly less than the five hours spent debugging why a cluster upgrade took down production at 2 AM.