<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Naman</title>
    <description>The latest articles on DEV Community by Naman (@namansharma18899).</description>
    <link>https://dev.to/namansharma18899</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1225761%2F9830ef98-993c-43d9-832b-f485ee5ff2ca.jpeg</url>
      <title>DEV Community: Naman</title>
      <link>https://dev.to/namansharma18899</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/namansharma18899"/>
    <language>en</language>
    <item>
      <title>PodDisruptionBudgets: Your Kubernetes Outage Insurance</title>
      <dc:creator>Naman</dc:creator>
      <pubDate>Fri, 26 Jun 2026 05:05:11 +0000</pubDate>
      <link>https://dev.to/namansharma18899/poddisruptionbudgets-your-kubernetes-outage-insurance-36da</link>
      <guid>https://dev.to/namansharma18899/poddisruptionbudgets-your-kubernetes-outage-insurance-36da</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;It's Tuesday morning&lt;/em&gt;&lt;/strong&gt;. The platform team starts draining nodes for a Kubernetes upgrade. Sixty seconds later, Slack explodes: the payment service is fully down. All 3 replicas had landed on the same two nodes, and both were drained at the same time. There was nothing wrong with the app. The cluster did exactly what it was told.&lt;/p&gt;

&lt;p&gt;This is what PodDisruptionBudgets prevent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Kubernetes has two kinds of pod disruptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Involuntary&lt;/strong&gt;: Node crashes, OOM kills, hardware failures. Unpredictable. You handle these with replicas and health checks.&lt;/li&gt;
&lt;li&gt;
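&lt;strong&gt;Voluntary&lt;/strong&gt;: Node drains, cluster upgrades, autoscaler scale-downs. Planned and controlled. Spot instance reclaims fall in between: the cloud provider forces them, but they go through the same eviction path when a termination handler drains the node first.&lt;/li&gt;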
&lt;/ul&gt;

&lt;p&gt;For voluntary disruptions, tools such as &lt;code&gt;kubectl drain&lt;/code&gt; remove pods through the Eviction API. Without a PDB, the Eviction API has &lt;strong&gt;zero awareness&lt;/strong&gt; of your application's availability requirements. It will happily evict every replica of your service at once if they're all on the node being drained.&lt;/p&gt;

&lt;p&gt;Replicas don't help if the system removes all of them simultaneously.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Is a PodDisruptionBudget?
&lt;/h3&gt;

&lt;p&gt;A PDB is a simple declaration: &lt;strong&gt;"During voluntary disruptions, always keep at least N pods (or at most M pods unavailable) for this application."&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PodDisruptionBudget&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-service-pdb&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;minAvailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;          &lt;span class="c1"&gt;# OR use maxUnavailable: 1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. This tells the eviction API: "You may not evict a &lt;code&gt;payment-service&lt;/code&gt; pod if doing so would drop the available count below 2."&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    WITHOUT PDB                              │
│                                                             │
│  kubectl drain node-2                                       │
│       │                                                     │
│       ▼                                                     │
│  Evict pod-A ──── ✓ Gone                                    │
│  Evict pod-B ──── ✓ Gone                                    │
│  Evict pod-C ──── ✓ Gone                                    │
│                                                             │
│  Result: 0/3 replicas running. Service DOWN.                │
│  (New pods schedule eventually, but there's a gap)          │
└─────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                     WITH PDB (minAvailable: 2)               │
│                                                              │
│  kubectl drain node-2                                        │
│       │                                                      │
│       ▼                                                      │
│  Evict pod-A ──── ✓ Allowed (3→2, still ≥ 2)                 │
│  Evict pod-B ──── ✗ BLOCKED (would go 2→1, violates PDB)     │
│       │                                                      │
│       ▼ (waits...)                                           │
│  pod-A reschedules on node-3 ──── ✓ Running                  │
│       │                                                      │
│       ▼ (now 3 available again)                              │
│  Evict pod-B ──── ✓ Allowed (3→2, still ≥ 2)                 │
│                                                              │
│  Result: Always ≥ 2 replicas running. Service STAYS UP.      │
└──────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The drain operation becomes &lt;strong&gt;serialized and respectful&lt;/strong&gt;: it waits for replacements to become healthy before continuing.&lt;/p&gt;
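
&lt;p&gt;The numbers behind each allow-or-block decision are published in the PDB's &lt;code&gt;status&lt;/code&gt;. As a rough sketch, the status for the 3-replica example above might look like this before the drain starts. The field names are the standard &lt;code&gt;policy/v1&lt;/code&gt; status fields; the values are illustrative, not captured from a real cluster:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative status for payment-service-pdb (minAvailable: 2, 3 healthy replicas)
status:
  expectedPods: 3         # pods matched by the selector
  currentHealthy: 3       # matched pods that are currently Ready
  desiredHealthy: 2       # the floor derived from minAvailable
  disruptionsAllowed: 1   # currentHealthy - desiredHealthy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An eviction is admitted only while &lt;code&gt;disruptionsAllowed&lt;/code&gt; is above zero. After pod-A is evicted the value drops to 0, and it returns to 1 once the replacement pod is Ready.&lt;/p&gt;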




&lt;h2&gt;
  
  
  minAvailable vs maxUnavailable
&lt;/h2&gt;

&lt;p&gt;Two ways to express the same idea:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example (5 replicas)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minAvailable: 3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;At least 3 must stay healthy during voluntary disruptions&lt;/td&gt;
&lt;td&gt;Can evict up to 2 at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxUnavailable: 2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;At most 2 can be down at once&lt;/td&gt;
&lt;td&gt;Same effect while the replica count stays at 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can also use percentages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;maxUnavailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;25%"&lt;/span&gt;    &lt;span class="c1"&gt;# For a 4-replica app: max 1 pod down&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rule of thumb&lt;/strong&gt;: Use &lt;code&gt;maxUnavailable&lt;/code&gt; for large deployments (it scales naturally with replica count). Use &lt;code&gt;minAvailable&lt;/code&gt; when you have a hard quorum requirement (e.g., a 3-member etcd cluster needs 2 members alive to keep quorum).&lt;/p&gt;
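
&lt;p&gt;For the quorum case, a minimal sketch might look like the following. It assumes a hypothetical 3-member ZooKeeper StatefulSet whose pods carry the label &lt;code&gt;app: zookeeper&lt;/code&gt;; the name, namespace, and label are placeholders, not taken from a real setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pdb        # hypothetical name
  namespace: production      # assumed namespace
spec:
  minAvailable: 2            # majority of a 3-member ensemble
  selector:
    matchLabels:
      app: zookeeper         # assumed pod label
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the floor is an absolute number, it does not follow the replica count: if the ensemble grows to 5 members, the value would need to be raised to 3 by hand to keep protecting a majority.&lt;/p&gt;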




&lt;h2&gt;
  
  
  &lt;strong&gt;When You Need One&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Any production service with &amp;gt; 1 replica&lt;/li&gt;
&lt;li&gt;Stateful workloads with quorum (etcd, ZooKeeper, Kafka)&lt;/li&gt;
&lt;li&gt;During cluster upgrades (nodes drain one by one)&lt;/li&gt;
&lt;li&gt;When using cluster autoscaler (it respects PDBs during scale-down)&lt;/li&gt;
&lt;li&gt;Spot/preemptible instances (the cloud provider can reclaim nodes; a PDB helps when the node is drained before it disappears)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When to Be Careful&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't set &lt;code&gt;minAvailable&lt;/code&gt; equal to your replica count.&lt;/strong&gt; A PDB of &lt;code&gt;minAvailable: 3&lt;/code&gt; on a 3-replica deployment means &lt;em&gt;nothing can ever be evicted&lt;/em&gt;. Node drains will hang forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't forget PDBs block node drains.&lt;/strong&gt; If your PDB is too strict and pods can't reschedule (due to resource pressure, node affinity, etc.), your drain operation will be stuck indefinitely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-replica deployments&lt;/strong&gt;: A PDB with &lt;code&gt;minAvailable: 1&lt;/code&gt; on a 1-replica app means the pod can never be evicted. Either accept downtime or add replicas.&lt;/li&gt;
&lt;/ul&gt;
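
&lt;p&gt;The first pitfall is easy to spot from the PDB's &lt;code&gt;status&lt;/code&gt;. As an illustrative sketch (the values are made up for a 3-replica deployment with &lt;code&gt;minAvailable: 3&lt;/code&gt;), a budget that can never be satisfied reports zero allowed disruptions even when every pod is healthy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative status for a PDB with minAvailable: 3 on a 3-replica deployment
status:
  expectedPods: 3
  currentHealthy: 3
  desiredHealthy: 3       # equals the replica count
  disruptionsAllowed: 0   # every eviction request is refused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking that &lt;code&gt;disruptionsAllowed&lt;/code&gt; is at least 1 while all replicas are healthy is a quick sanity test before a maintenance window.&lt;/p&gt;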




&lt;h2&gt;
  
  
  &lt;strong&gt;The Minimum Viable PDB for Every Service&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PodDisruptionBudget&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;app&amp;gt;-pdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;maxUnavailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;app&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few lines of config. For any service with two or more replicas, it guarantees that voluntary disruptions take down at most one pod at a time. The cost: drains take slightly longer because they wait for rescheduling. The benefit: you never get paged because a routine node drain cascaded into an outage.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;PDBs don't prevent disruptions&lt;/strong&gt;. They civilize them, turning a shotgun blast into a controlled, one-at-a-time handoff. The five minutes it takes to add one is far less than the five hours you would spend debugging why a cluster upgrade took down production at 2 AM.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>podcast</category>
      <category>outage</category>
    </item>
  </channel>
</rss>
