<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: aman kohli</title>
    <description>The latest articles on DEV Community by aman kohli (@aman_kohli_6a14e8f0da3d37).</description>
    <link>https://dev.to/aman_kohli_6a14e8f0da3d37</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3832978%2Fb33d6d27-395b-4a51-97ad-5048749a4d9c.png</url>
      <title>DEV Community: aman kohli</title>
      <link>https://dev.to/aman_kohli_6a14e8f0da3d37</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aman_kohli_6a14e8f0da3d37"/>
    <language>en</language>
    <item>
      <title>Argo Rollouts: Stop Gambling with Kubernetes Deployments</title>
      <dc:creator>aman kohli</dc:creator>
      <pubDate>Sun, 22 Mar 2026 10:18:30 +0000</pubDate>
      <link>https://dev.to/aman_kohli_6a14e8f0da3d37/argo-rollouts-stop-gambling-with-kubernetes-deployments-ea</link>
      <guid>https://dev.to/aman_kohli_6a14e8f0da3d37/argo-rollouts-stop-gambling-with-kubernetes-deployments-ea</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln9mx9v3kv7t2yz1xmrl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln9mx9v3kv7t2yz1xmrl.webp" alt="Argo Rollouts"&gt;&lt;/a&gt;&lt;br&gt;
Kubernetes is the de facto standard for running containerized workloads at scale.&lt;/p&gt;

&lt;p&gt;But when it comes to &lt;strong&gt;deploying safely&lt;/strong&gt;, its default approach is surprisingly limited.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Readiness probes&lt;/li&gt;
&lt;li&gt;Rolling updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that's about it.&lt;/p&gt;

&lt;p&gt;For production systems, that's not enough.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Rolling updates don't control risk — they just distribute it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's exactly the gap &lt;strong&gt;Argo Rollouts&lt;/strong&gt; is designed to solve.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Problem with Rolling Updates
&lt;/h2&gt;

&lt;p&gt;Kubernetes rolling updates provide a basic safety net:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods are replaced gradually&lt;/li&gt;
&lt;li&gt;Health checks ensure they're alive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they &lt;strong&gt;don't give you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control over &lt;em&gt;who sees the new version&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Visibility into real production impact&lt;/li&gt;
&lt;li&gt;Metric-based validation before proceeding&lt;/li&gt;
&lt;li&gt;Automatic rollback on failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So most deployments still look like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deploy → Wait → Hope nothing breaks&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And when something does go wrong, nothing stops the update from carrying the bad version to &lt;strong&gt;every user&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Argo Rollouts Actually Does
&lt;/h2&gt;

&lt;p&gt;Argo Rollouts brings &lt;strong&gt;progressive delivery&lt;/strong&gt; to Kubernetes.&lt;/p&gt;

&lt;p&gt;Instead of pushing changes globally, it lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradually expose changes to a subset of users&lt;/li&gt;
&lt;li&gt;Measure real-world impact using your existing metrics&lt;/li&gt;
&lt;li&gt;Automatically decide whether to proceed or roll back&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It introduces a new custom resource:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rollout&lt;/strong&gt; — a drop-in replacement for &lt;code&gt;Deployment&lt;/code&gt; with progressive delivery built in&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Note:&lt;/strong&gt; Argo Rollouts does not interfere with existing &lt;code&gt;Deployment&lt;/code&gt; resources. It only acts on &lt;code&gt;Rollout&lt;/code&gt; objects — so you can introduce it incrementally, one service at a time.&lt;/p&gt;
&lt;/blockquote&gt;
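
&lt;p&gt;To make the "drop-in" idea concrete, here is a minimal sketch of a &lt;code&gt;Rollout&lt;/code&gt;. The name &lt;code&gt;my-app&lt;/code&gt; and the image are placeholders for illustration only. Apart from &lt;code&gt;apiVersion&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, and the &lt;code&gt;strategy&lt;/code&gt; block, the spec has the same shape as a &lt;code&gt;Deployment&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: Rollout               # a Deployment would use apps/v1 and kind: Deployment
metadata:
  name: my-app              # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:                 # ordinary pod template, same as in a Deployment
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25   # placeholder image
  strategy:                 # takes the place of the Deployment's rollingUpdate settings
    canary:
      steps:
      - setWeight: 20
      - pause: {}           # no duration: wait until someone promotes the rollout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;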


&lt;h2&gt;
  
  
  How Argo Rollouts Works (Under the Hood)
&lt;/h2&gt;

&lt;p&gt;It's not a single tool — it's a system of components working together.&lt;/p&gt;
&lt;h3&gt;
  
  
  Rollout (CRD)
&lt;/h3&gt;

&lt;p&gt;The core resource. Defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategy (blue-green or canary)&lt;/li&gt;
&lt;li&gt;Step-by-step rollout logic&lt;/li&gt;
&lt;li&gt;Analysis and traffic rules&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Controller
&lt;/h3&gt;

&lt;p&gt;The brain of the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Watches &lt;code&gt;Rollout&lt;/code&gt; changes&lt;/li&gt;
&lt;li&gt;Creates and manages &lt;code&gt;ReplicaSets&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Progresses or aborts deployments&lt;/li&gt;
&lt;li&gt;Ignores standard &lt;code&gt;Deployment&lt;/code&gt; objects completely&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  ReplicaSets
&lt;/h3&gt;

&lt;p&gt;Managed automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Old version → scaled down&lt;/li&gt;
&lt;li&gt;New version → scaled up&lt;/li&gt;
&lt;li&gt;You never touch these directly&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Services &amp;amp; Ingress
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Control traffic routing between versions&lt;/li&gt;
&lt;li&gt;Enable canary and blue-green switching&lt;/li&gt;
&lt;li&gt;Integrate with service meshes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  AnalysisTemplate
&lt;/h3&gt;

&lt;p&gt;Defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What metrics to check&lt;/li&gt;
&lt;li&gt;How often to check them&lt;/li&gt;
&lt;li&gt;What counts as success or failure&lt;/li&gt;
&lt;li&gt;Reusable across multiple rollouts&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  AnalysisRun
&lt;/h3&gt;

&lt;p&gt;The live execution of those checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Success&lt;/strong&gt; → rollout continues&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;Failure&lt;/strong&gt; → automatic rollback&lt;/li&gt;
&lt;li&gt;⏸ &lt;strong&gt;Inconclusive&lt;/strong&gt; → rollout pauses for human judgement&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Experiment
&lt;/h3&gt;

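&lt;p&gt;Launches short-lived ReplicaSets, typically a baseline built from the stable spec and a canary built from the new spec, and runs analysis against both for a fixed duration. Because the two start together and run under the same conditions, the comparison avoids the timing bias of measuring fresh canary pods against long-running stable ones.&lt;/p&gt;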
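
&lt;p&gt;As a rough sketch, an experiment can be declared as a step inside a canary strategy. The template names and the &lt;code&gt;baseline-vs-canary&lt;/code&gt; AnalysisTemplate below are hypothetical; only the field layout is the point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  canary:
    steps:
    - experiment:
        duration: 30m                  # how long both ReplicaSets run
        templates:
        - name: baseline
          specRef: stable              # pod spec of the current stable version
        - name: canary
          specRef: canary              # pod spec of the new version
        analyses:
        - name: compare
          templateName: baseline-vs-canary   # hypothetical AnalysisTemplate
    - setWeight: 20                    # only reached if the experiment succeeds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;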


&lt;h2&gt;
  
  
  Progressive Delivery in Plain Language
&lt;/h2&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deploy → Hope&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You move to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deploy → Observe → Decide&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You define the signals that indicate a healthy deploy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP success rate&lt;/li&gt;
&lt;li&gt;Error rate&lt;/li&gt;
&lt;li&gt;Latency P99&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the system enforces them automatically. If metrics degrade, the rollout stops. No pager. No incident call. No scrambling.&lt;/p&gt;
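
&lt;p&gt;For instance, a P99 latency signal could be expressed roughly like this. It is a sketch that assumes Istio's standard request-duration histogram is scraped by Prometheus; the template name, the Prometheus address, and the 500 ms budget are all made up for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99                      # hypothetical name
spec:
  args:
  - name: service-name
  metrics:
  - name: latency-p99
    interval: 1m
    successCondition: result[0] &amp;lt;= 500    # assumed budget: P99 of 500 ms or less
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.example.com:9090   # placeholder address
        query: |
          histogram_quantile(0.99,
            sum(rate(
              istio_request_duration_milliseconds_bucket{
                destination_service=~"{{args.service-name}}"
              }[5m]
            )) by (le)
          )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;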


&lt;h2&gt;
  
  
  🔵 Blue-Green vs 🟡 Canary: Which Should You Use?
&lt;/h2&gt;

&lt;p&gt;This is where most teams overcomplicate things.&lt;/p&gt;
&lt;h3&gt;
  
  
  Blue-Green
&lt;/h3&gt;

&lt;p&gt;Two versions run simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Old&lt;/strong&gt; → serves 100% of live traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New&lt;/strong&gt; → idle, being tested via a preview service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you're satisfied, traffic switches instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Only one version serves live traffic at any time, so you avoid version-conflict issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Shared databases, queue workers, legacy applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; The switch is all-or-nothing. No gradual exposure.&lt;/p&gt;


&lt;h3&gt;
  
  
  Canary
&lt;/h3&gt;

&lt;p&gt;Traffic is gradually shifted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5% → 25% → 50% → 100%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At each step, metrics are evaluated. The rollout proceeds, pauses, or aborts based on what it sees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Limits the blast radius of failures to a small percentage of users.&lt;/p&gt;

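&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; An app that can safely run two versions simultaneously and, for exact percentages, a traffic routing layer (Istio, NGINX, etc.). Without one, Argo Rollouts can only approximate the weight through the ratio of canary pods to stable pods.&lt;/p&gt;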

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; More complexity, but far lower risk per release.&lt;/p&gt;
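
&lt;p&gt;The 5% → 25% → 50% → 100% progression above could be written as canary steps along these lines. The pause durations are arbitrary choices for the sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  canary:
    steps:
    - setWeight: 5
    - pause: {duration: 10m}   # hold at 5% for ten minutes
    - setWeight: 25
    - pause: {duration: 10m}
    - setWeight: 50
    - pause: {}                # no duration: wait for a manual promote
    # once the last step completes, the new version takes 100% of traffic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;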




&lt;h3&gt;
  
  
  Quick Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Blue-Green&lt;/th&gt;
&lt;th&gt;Canary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium–High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk (blast radius)&lt;/td&gt;
&lt;td&gt;High (instant switch)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traffic control&lt;/td&gt;
&lt;td&gt;0% or 100%&lt;/td&gt;
&lt;td&gt;Gradual %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works with legacy systems&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ Often no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works with queue workers&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requires traffic manager&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Usually&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; Start with blue-green. No extra infrastructure, works everywhere, immediate improvement over rolling updates. Evolve to canary once you trust your metrics and your system supports dual versions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Traffic Management: The Layer Kubernetes Is Missing
&lt;/h2&gt;

&lt;p&gt;Native Kubernetes &lt;code&gt;Services&lt;/code&gt; can only route traffic based on pod selectors. They can't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split traffic by exact percentage&lt;/li&gt;
&lt;li&gt;Route based on HTTP headers&lt;/li&gt;
&lt;li&gt;Mirror traffic silently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's where service meshes and ingress controllers come in. Argo Rollouts integrates with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Istio&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NGINX Ingress&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS ALB&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traefik&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambassador, Kong, Apache APISIX, SMI, Google Cloud&lt;/strong&gt; and more&lt;/li&gt;
&lt;/ul&gt;
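
&lt;p&gt;Wiring a provider into a &lt;code&gt;Rollout&lt;/code&gt; happens under &lt;code&gt;trafficRouting&lt;/code&gt;. Here is a minimal sketch for NGINX Ingress, assuming an existing Ingress named &lt;code&gt;my-app&lt;/code&gt; and two Services whose names are invented for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  canary:
    stableService: my-app-stable    # hypothetical Service for the stable pods
    canaryService: my-app-canary    # hypothetical Service for the canary pods
    trafficRouting:
      nginx:
        stableIngress: my-app       # assumed existing Ingress that routes to the stable Service
    steps:
    - setWeight: 10                 # the controller asks NGINX to send 10% to the canary
    - pause: {duration: 5m}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;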




&lt;h3&gt;
  
  
  Three Advanced Routing Techniques
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Percentage-Based Routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The baseline. Route N% to canary, the rest to stable. Works with all providers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;90% → stable service
10% → canary service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Header-Based Routing&lt;/strong&gt; &lt;em&gt;(Istio only)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Route internal users, QA teams, or beta testers to the new version based on a custom HTTP header — regardless of the overall traffic percentage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setHeaderRoute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal-test"&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;headerName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;X-Canary-User&lt;/span&gt;
      &lt;span class="na"&gt;headerValue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
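
&lt;p&gt;One detail that is easy to miss: a header route has to be listed under &lt;code&gt;managedRoutes&lt;/code&gt; before a step can set it. A sketch of how the pieces might fit together, with the Service and VirtualService names as placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  canary:
    stableService: my-app-stable      # placeholder Service names
    canaryService: my-app-canary
    trafficRouting:
      managedRoutes:
      - name: internal-test           # must match the setHeaderRoute name below
      istio:
        virtualService:
          name: my-app-vsvc           # placeholder VirtualService
    steps:
    - setWeight: 10                   # brings up canary pods before routing to them
    - setHeaderRoute:
        name: "internal-test"
        match:
        - headerName: X-Canary-User
          headerValue:
            exact: "true"
    - pause: {}                       # hold here while testers use the header
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;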



&lt;p&gt;&lt;strong&gt;3. Traffic Mirroring&lt;/strong&gt; &lt;em&gt;(Istio only)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Copy real production traffic to the canary silently. Users always see the stable response — the canary's response is discarded. This lets you validate the new version under real load with &lt;strong&gt;zero user impact&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setMirrorRoute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mirror-route&lt;/span&gt;
    &lt;span class="na"&gt;percentage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;35&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 Mirroring is one of the safest ways to validate changes before exposing them to any users.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Automated Analysis: Where the Real Power Is
&lt;/h2&gt;

&lt;p&gt;This is what separates Argo Rollouts from basic deployment tools.&lt;/p&gt;

&lt;p&gt;You define rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;successCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;result[0] &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;0.95&lt;/span&gt;   &lt;span class="c1"&gt;# success rate &amp;gt;= 95%&lt;/span&gt;
&lt;span class="na"&gt;failureLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;                        &lt;span class="c1"&gt;# tolerate 3 failed measurements; one more fails the analysis&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Argo will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continue&lt;/strong&gt; the rollout if metrics are healthy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback automatically&lt;/strong&gt; if metrics fail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pause&lt;/strong&gt; if the picture is unclear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No manual intervention is needed unless the result is inconclusive.&lt;/p&gt;




&lt;h3&gt;
  
  
  Types of Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Background Analysis&lt;/strong&gt; — runs continuously during the canary steps. Fails at any point → rollout aborts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inline Analysis&lt;/strong&gt; — a blocking step in your rollout sequence. Rollout waits until this completes before proceeding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blue-Green Pre-Promotion&lt;/strong&gt; — validates the new version &lt;em&gt;before&lt;/em&gt; traffic switches. Fails → traffic never switches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blue-Green Post-Promotion&lt;/strong&gt; — validates the new version &lt;em&gt;after&lt;/em&gt; traffic switches. Fails → traffic switches back automatically.&lt;/p&gt;
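
&lt;p&gt;As a sketch of where each type lives in the spec: inline analysis is a step in the canary sequence, while the blue-green variants are fields on the strategy itself. The template names &lt;code&gt;smoke-tests&lt;/code&gt; and &lt;code&gt;success-rate&lt;/code&gt; and the Service names are assumptions for the example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Inline analysis: a blocking step between weight changes
strategy:
  canary:
    steps:
    - setWeight: 20
    - pause: {duration: 5m}
    - analysis:                       # rollout waits here until the run finishes
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: my-service.default.svc.cluster.local
    - setWeight: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Blue-green: analysis before and after the traffic switch
strategy:
  blueGreen:
    activeService: my-app-active      # placeholder Service names
    previewService: my-app-preview
    prePromotionAnalysis:             # must pass before the active Service switches
      templates:
      - templateName: smoke-tests     # hypothetical template
    postPromotionAnalysis:            # runs after the switch; failure switches back
      templates:
      - templateName: success-rate
      args:
      - name: service-name
        value: my-service.default.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;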




&lt;h3&gt;
  
  
  Full Example: Background Analysis with Prometheus
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Rollout&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rollout&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;canary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;templateName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
        &lt;span class="na"&gt;startingStep&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;   &lt;span class="c1"&gt;# don't start analysis until 40% traffic&lt;/span&gt;
        &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-name&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-service.default.svc.cluster.local&lt;/span&gt;
      &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10m&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10m&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10m&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10m&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# AnalysisTemplate&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AnalysisTemplate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-name&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
    &lt;span class="na"&gt;successCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;result[0] &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;0.95&lt;/span&gt;
    &lt;span class="na"&gt;failureLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus.example.com:9090&lt;/span&gt;
        &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;sum(irate(&lt;/span&gt;
            &lt;span class="s"&gt;istio_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;destination_service=~"{{args.service-name}}",&lt;/span&gt;
              &lt;span class="s"&gt;response_code!~"5.*"&lt;/span&gt;
            &lt;span class="s"&gt;}[5m]&lt;/span&gt;
          &lt;span class="s"&gt;)) /&lt;/span&gt;
          &lt;span class="s"&gt;sum(irate(&lt;/span&gt;
            &lt;span class="s"&gt;istio_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;destination_service=~"{{args.service-name}}"&lt;/span&gt;
            &lt;span class="s"&gt;}[5m]&lt;/span&gt;
          &lt;span class="s"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Measurements run every 5 minutes. Once more than &lt;strong&gt;three&lt;/strong&gt; of them report a success rate below 95% (they do not have to be consecutive) → rollout aborts, canary weight resets to zero, old version continues serving 100% of traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supported Metric Providers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Most common, full query support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;default()&lt;/code&gt; to handle nil results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Relic&lt;/td&gt;
&lt;td&gt;NRQL queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;AWS-native metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wavefront&lt;/td&gt;
&lt;td&gt;Tanzu environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graphite&lt;/td&gt;
&lt;td&gt;Self-hosted metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;InfluxDB&lt;/td&gt;
&lt;td&gt;Time-series metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kayenta&lt;/td&gt;
&lt;td&gt;Canary analysis from Spinnaker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web (HTTP)&lt;/td&gt;
&lt;td&gt;Custom webhook endpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes Jobs&lt;/td&gt;
&lt;td&gt;Run arbitrary analysis as a Job&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
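
&lt;p&gt;Not every check is a metrics query. As an illustration of the Kubernetes Job provider, the sketch below treats a successful Job as a successful measurement. The image tag and the &lt;code&gt;/healthz&lt;/code&gt; URL are placeholders, not something the rest of this article defines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test                          # hypothetical name
spec:
  metrics:
  - name: smoke-test
    provider:
      job:
        spec:                               # a regular batch/v1 Job spec
          backoffLimit: 0                   # do not retry a failed check
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke
                image: curlimages/curl:8.8.0                          # placeholder image
                command: ["curl", "-fsS", "http://my-app-preview/healthz"]   # placeholder URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;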




&lt;h2&gt;
  
  
  Deploying with Helm: A Real Walkthrough
&lt;/h2&gt;

&lt;p&gt;In real systems you won't apply raw YAML manually. Here's a complete Helm-based setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install the Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace argo-rollouts
kubectl apply &lt;span class="nt"&gt;-n&lt;/span&gt; argo-rollouts &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Install the kubectl Plugin
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;argoproj/tap/kubectl-argo-rollouts

&lt;span class="c"&gt;# Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-LO&lt;/span&gt; https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x kubectl-argo-rollouts-linux-amd64
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Scaffold a Chart
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm create my-app
&lt;span class="nb"&gt;cd &lt;/span&gt;my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Delete &lt;code&gt;templates/deployment.yaml&lt;/code&gt; and create &lt;code&gt;templates/rollout.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Blue-Green Rollout Template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# templates/rollout.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rollout&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.name" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.name" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;blueGreen&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;activeService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-active&lt;/span&gt;
      &lt;span class="na"&gt;previewService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-preview&lt;/span&gt;
      &lt;span class="na"&gt;autoPromotionEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Define Two Services
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# templates/service-active.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-active&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.name" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# templates/service-preview.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.fullname" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;-preview&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include "my-app.name" .&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
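
&lt;p&gt;Both Services start with the same selector on purpose. As a rough sketch of what happens at runtime, the Argo Rollouts controller adds a &lt;code&gt;rollouts-pod-template-hash&lt;/code&gt; entry to each Service's selector so that active and preview resolve to different ReplicaSets. The &lt;code&gt;my-app&lt;/code&gt; label assumes the chart name resolves to that value, and the hash is made up for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Live selector on the active Service after the controller has patched it.
# The hash is a hypothetical value; the real one is derived from the pod template.
spec:
  selector:
    app: my-app
    rollouts-pod-template-hash: "6d4f8c9b7"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The controller manages that key, so it stays out of the chart.&lt;/p&gt;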



&lt;h3&gt;
  
  
  Step 6: Deploy and Upgrade
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initial deploy&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;my-app ./my-app

&lt;span class="c"&gt;# Deploy new version&lt;/span&gt;
helm upgrade my-app ./my-app &lt;span class="nt"&gt;--set&lt;/span&gt; image.tag&lt;span class="o"&gt;=&lt;/span&gt;v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Argo Rollouts starts v2 as a new ReplicaSet and points the &lt;strong&gt;preview&lt;/strong&gt; service at it. The active service, and with it production traffic, stays on v1 untouched.&lt;/p&gt;

&lt;p&gt;Test against &lt;code&gt;my-app-preview&lt;/code&gt;. When satisfied, promote the rollout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts promote my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The active service now points at the v2 pods, so production traffic switches to v2 in a single step.&lt;/p&gt;
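
&lt;p&gt;Switching back is just as fast only while the v1 ReplicaSet is still running. As a sketch, assuming the blue-green strategy from the earlier steps and Service names that resolve to &lt;code&gt;my-app-active&lt;/code&gt; and &lt;code&gt;my-app-preview&lt;/code&gt;, &lt;code&gt;scaleDownDelaySeconds&lt;/code&gt; controls how long v1 stays up after promotion. The 600-second value is an arbitrary example, not a recommendation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
    # Keep the previous ReplicaSet running for 10 minutes after promotion
    # (illustrative value; the default is 30 seconds).
    scaleDownDelaySeconds: 600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;During that window, a rollback is a selector switch back to pods that are already serving.&lt;/p&gt;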




&lt;h3&gt;
  
  
  Graduating to Canary
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;canary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;60s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



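&lt;p&gt;Traffic shifts 20% → 50% → 100%, pausing for 30 seconds at 20% and 60 seconds at 50% so you can observe each stage. Without a traffic provider, Argo Rollouts approximates these weights through the ratio of canary pods to stable pods.&lt;/p&gt;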
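
&lt;p&gt;With a traffic provider, the weights become real traffic percentages rather than pod ratios. A minimal sketch with the NGINX ingress controller, assuming two extra Services and an existing Ingress whose names (&lt;code&gt;my-app-canary&lt;/code&gt;, &lt;code&gt;my-app-stable&lt;/code&gt;, &lt;code&gt;my-app&lt;/code&gt;) are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;strategy:
  canary:
    # Placeholder Service names; both must exist alongside the Rollout.
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        # Placeholder name of the Ingress that routes to the stable Service.
        stableIngress: my-app
    steps:
      - setWeight: 20
      - pause: {duration: 30s}
      - setWeight: 50
      - pause: {}   # no duration: wait here until someone promotes the rollout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The empty &lt;code&gt;pause&lt;/code&gt; turns the 50% stage into a manual gate, which can be a comfortable first step before trusting timed pauses.&lt;/p&gt;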

&lt;h3&gt;
  
  
  Adding Metric-Driven Analysis
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# templates/analysis-template.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AnalysisTemplate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
      &lt;span class="na"&gt;successCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;result[0] &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;0.95&lt;/span&gt;
      &lt;span class="na"&gt;failureLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus:9090&lt;/span&gt;
          &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;sum(rate(http_requests_total{status!~"5.."}[1m])) /&lt;/span&gt;
            &lt;span class="s"&gt;sum(rate(http_requests_total[1m]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Attach it to your rollout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;canary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;templateName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;1m&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



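&lt;p&gt;Now if the success rate falls below 95% on more than three measurements (the &lt;code&gt;failureLimit&lt;/code&gt;), the rollout aborts automatically and traffic returns to the stable version. No human needed.&lt;/p&gt;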
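
&lt;p&gt;The query in the template measures every workload that exports &lt;code&gt;http_requests_total&lt;/code&gt;. One way to narrow it, sketched here under the assumption that your metrics carry a &lt;code&gt;service&lt;/code&gt; label (that label and the &lt;code&gt;my-app-canary&lt;/code&gt; value are hypothetical), is to pass the service name in as an argument:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# AnalysisTemplate spec: declare an argument and use it in the query.
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] &amp;gt;= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[1m])) /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[1m]))
---
# Rollout strategy: supply the value where the template is attached.
strategy:
  canary:
    analysis:
      templates:
        - templateName: success-rate
      args:
        - name: service-name
          value: my-app-canary   # hypothetical label value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside a Helm chart, write the placeholder as &lt;code&gt;{{ "{{args.service-name}}" }}&lt;/code&gt; so Helm emits it literally instead of trying to evaluate it.&lt;/p&gt;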




&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Rolling updates are enough"
&lt;/h3&gt;

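&lt;p&gt;They're not. You're still guessing. A rolling update only waits for readiness probes, so a version that starts cleanly but misbehaves under real traffic can reach all your users before you know it's bad.&lt;/p&gt;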

&lt;h3&gt;
  
  
  "Canary is always better"
&lt;/h3&gt;

&lt;p&gt;Not if your system can't handle two versions serving production traffic at the same time. Incompatible schema changes in a shared database, queue workers, and exclusively locked resources can all rule it out. In those cases, start with blue-green.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Metrics are optional"
&lt;/h3&gt;

&lt;p&gt;They're not optional — they're the entire decision-making system. Without metrics, Argo Rollouts is just a more complicated way to do a rolling update.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Start with the full setup"
&lt;/h3&gt;

&lt;p&gt;Don't introduce a traffic provider, metric analysis, header routing, and canary all at once. You'll be debugging the infrastructure instead of your application. Layer in complexity one step at a time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recommended Adoption Path
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What to add&lt;/th&gt;
&lt;th&gt;What you gain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Blue-green + Helm&lt;/td&gt;
&lt;td&gt;Instant rollback, zero infrastructure change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Manual promotion gates&lt;/td&gt;
&lt;td&gt;Human review before traffic switches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Metric-based analysis&lt;/td&gt;
&lt;td&gt;Automatic rollback on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Traffic provider (Istio/NGINX)&lt;/td&gt;
&lt;td&gt;Exact percentage splits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Canary strategy&lt;/td&gt;
&lt;td&gt;Low blast radius per release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Header routing + mirroring&lt;/td&gt;
&lt;td&gt;Zero-impact production validation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Stop at whichever step still feels worth the complexity for your team. Step 3 alone is a meaningful improvement for most organisations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;The shift Argo Rollouts enables isn't just technical. It's this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You're no longer deploying code. You're managing risk.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Argo Rollouts gives you control, visibility, and automation. But it doesn't define "safe" for you — that's still your responsibility. You set the thresholds. You define what success looks like. You decide how aggressive your rollout steps are.&lt;/p&gt;

&lt;p&gt;The system executes your definition of safe, automatically, at every deploy.&lt;/p&gt;

&lt;p&gt;When rollbacks are instant, the cost of a bad deploy drops dramatically. When canaries give you real signal before full exposure, the fear that slows release cadence starts to lift. Teams that once deployed weekly because deployments were scary start deploying daily — not because they became less careful, but because the system became more forgiving.&lt;/p&gt;

&lt;p&gt;That's the real promise of progressive delivery.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cicd</category>
      <category>docker</category>
      <category>argocd</category>
    </item>
    <item>
      <title>Cilium &amp; Hubble: How eBPF Is Replacing kube-proxy in Modern Kubernetes</title>
      <dc:creator>aman kohli</dc:creator>
      <pubDate>Thu, 19 Mar 2026 02:37:56 +0000</pubDate>
      <link>https://dev.to/aman_kohli_6a14e8f0da3d37/cilium-hubble-how-ebpf-is-replacing-kube-proxy-in-modern-kubernetes-ha4</link>
      <guid>https://dev.to/aman_kohli_6a14e8f0da3d37/cilium-hubble-how-ebpf-is-replacing-kube-proxy-in-modern-kubernetes-ha4</guid>
      <description>&lt;h2&gt;
  
  
  Cilium &amp;amp; Hubble: The eBPF-Powered Nervous System of Your Kubernetes Cluster
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;How modern Kubernetes networking is moving into the kernel — and why it matters for performance, security, and observability.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2f2a1l7yvl7vmy8p470.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2f2a1l7yvl7vmy8p470.webp" alt="Architecture Diagram" width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If your Kubernetes cluster is still relying on iptables and kube-proxy, you’re paying a hidden tax — in latency, scalability, and visibility.&lt;/p&gt;

&lt;p&gt;Cilium replaces that entire model with eBPF, moving networking, security, and observability directly into the Linux kernel. Paired with Hubble, it gives you real-time insight into every packet flowing through your cluster — without sidecars, instrumentation, or application changes.&lt;/p&gt;

&lt;p&gt;This isn’t just an incremental improvement. It’s a fundamental shift in how Kubernetes networking works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Teams Are Switching to Cilium
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Performance at Scale
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Eliminates iptables rule traversal&lt;/li&gt;
&lt;li&gt;Kernel-level packet processing reduces latency&lt;/li&gt;
&lt;li&gt;Constant-time lookups even with thousands of services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Built-in Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Native L3/L4 policy enforcement&lt;/li&gt;
&lt;li&gt;Identity-aware security (labels instead of IPs)&lt;/li&gt;
&lt;li&gt;No reliance on perimeter-based security&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deep Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Real-time, per-flow visibility via Hubble&lt;/li&gt;
&lt;li&gt;No sidecars or instrumentation required&lt;/li&gt;
&lt;li&gt;Instant insight into drops, retries, and service dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Simplicity &amp;amp; Efficiency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No kube-proxy required&lt;/li&gt;
&lt;li&gt;Reduced moving parts (no iptables tuning)&lt;/li&gt;
&lt;li&gt;Lower CPU and memory overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Future-Proof Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;eBPF is becoming a Linux standard&lt;/li&gt;
&lt;li&gt;Automatically adapts to kernel capabilities&lt;/li&gt;
&lt;li&gt;Aligns with modern cloud-native trends&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Big Picture: How Cilium Works
&lt;/h2&gt;

&lt;p&gt;At a high level, Cilium acts as a programmable data plane powered by eBPF:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is intercepted at the Linux kernel level&lt;/li&gt;
&lt;li&gt;Policies and load balancing are applied inline&lt;/li&gt;
&lt;li&gt;Observability data is emitted directly from the kernel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This eliminates the need for iptables-based rule processing and drastically reduces overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes Networking (Quick Refresher)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container ↔ Container&lt;/strong&gt; → Same pod (localhost)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod ↔ Pod&lt;/strong&gt; → Routed via CNI using veth pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod ↔ Service&lt;/strong&gt; → kube-proxy programs iptables rules that rewrite the Service IP to a pod IP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingress/Egress&lt;/strong&gt; → Load balancers + NAT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model works — but it doesn’t scale efficiently.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The problem isn’t correctness. It’s that iptables evaluates rules sequentially, so per-packet cost grows linearly with the number of Services and endpoints in the cluster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Cilium Architecture Explained
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Cilium Agent (Per Node Brain)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Runs as a DaemonSet&lt;/li&gt;
&lt;li&gt;Watches Kubernetes API&lt;/li&gt;
&lt;li&gt;Compiles policies into eBPF programs&lt;/li&gt;
&lt;li&gt;Enforces networking rules at packet level&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Cilium Operator (Cluster Coordinator)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Handles IPAM and cluster-wide state&lt;/li&gt;
&lt;li&gt;Not in the data path&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Cilium CNI (Pod On-Ramp)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Configures networking when pods start/stop&lt;/li&gt;
&lt;li&gt;Attaches eBPF programs to pod interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmn95aarce81jf0n2m5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmn95aarce81jf0n2m5i.png" alt="Observability Architecture Diagram" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hubble: Observability Without Instrumentation
&lt;/h2&gt;

&lt;p&gt;Hubble builds on top of Cilium and provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time flow visibility&lt;/li&gt;
&lt;li&gt;Service-to-service communication mapping&lt;/li&gt;
&lt;li&gt;Drop/forward decisions at packet level&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hubble Relay&lt;/strong&gt; → Aggregates data across nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hubble UI&lt;/strong&gt; → Visual service graph&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Extended Observability Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VictoriaMetrics&lt;/strong&gt; → Stores time-series metrics from Cilium and Hubble&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana&lt;/strong&gt; → Visualizes metrics via dashboards and alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;You get deep observability without sidecars, per-application agents, or code changes.&lt;/p&gt;
&lt;/blockquote&gt;
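
&lt;p&gt;As a sketch of how that stack is usually switched on, assuming Cilium is installed from its Helm chart, the values below enable Hubble, Relay, the UI, and Prometheus-format metrics endpoints that VictoriaMetrics can scrape. The metric list is an example selection, not a recommendation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# values.yaml for the cilium Helm chart (illustrative selection)
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    # Flow metrics exposed by Hubble; trim or extend as needed.
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http
prometheus:
  enabled: true     # Cilium agent metrics
operator:
  prometheus:
    enabled: true   # Cilium operator metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;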

&lt;h2&gt;
  
  
  Why eBPF Changes Everything
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Traditional Networking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Relies on iptables rules&lt;/li&gt;
&lt;li&gt;Requires walking large rule chains&lt;/li&gt;
&lt;li&gt;Adds latency as rules grow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  eBPF Approach
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Runs JIT-compiled programs in the kernel&lt;/li&gt;
&lt;li&gt;Executes logic at packet hook points&lt;/li&gt;
&lt;li&gt;Avoids user-space transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mental Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iptables → "Which rule matches this packet?"
eBPF    → "I already know what to do."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Replacing kube-proxy with Cilium
&lt;/h2&gt;

&lt;p&gt;Cilium can fully replace kube-proxy by handling service routing in eBPF.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9zvi6rjc9c8vi3snsoy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9zvi6rjc9c8vi3snsoy.png" alt="Difference between KubeProxy and Cilium" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Differences
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rule-based vs kernel-native routing&lt;/li&gt;
&lt;li&gt;Linear scaling vs constant-time lookups&lt;/li&gt;
&lt;li&gt;Full netfilter chain traversal vs execution at early kernel hook points&lt;/li&gt;
&lt;li&gt;Limited vs per-flow observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enable kube-proxy replacement
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Requires modern kernels (5.x+) and proper validation in production setups.&lt;/p&gt;
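
&lt;p&gt;Without kube-proxy, the Cilium agent cannot reach the API server through the &lt;code&gt;kubernetes&lt;/code&gt; Service, so the chart needs the API server address directly. A hedged sketch of the surrounding Helm values; the host and port are placeholders for your own control plane endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# values.yaml for the cilium Helm chart
kubeProxyReplacement: true
# Placeholder address of the Kubernetes API server, reachable without kube-proxy.
k8sServiceHost: api.example.internal
k8sServicePort: 6443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;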

&lt;h2&gt;
  
  
  Hands-On: Enforcing a Network Policy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Goal
&lt;/h3&gt;

&lt;p&gt;Allow only the &lt;code&gt;checkout&lt;/code&gt; service to access &lt;code&gt;payments&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Apply Policy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-checkout-to-payments&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;checkout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; allow-checkout-to-payments.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Observe with Hubble
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hubble observe &lt;span class="nt"&gt;--namespace&lt;/span&gt; payments &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;checkout → payments  FORWARDED
frontend → payments  DROPPED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Enforcement happens at the kernel level — and is instantly observable.&lt;/p&gt;
&lt;/blockquote&gt;
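
&lt;p&gt;The policy in Step 1 matches &lt;code&gt;checkout&lt;/code&gt; pods in the policy's own namespace and allows them on every port. A tighter variant, sketched under the assumptions that &lt;code&gt;checkout&lt;/code&gt; runs in a namespace of the same name and that &lt;code&gt;payments&lt;/code&gt; listens on TCP 8080 (both invented for this example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-checkout-to-payments
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: checkout
            # Hypothetical namespace; needed when the caller lives outside "payments".
            k8s:io.kubernetes.pod.namespace: checkout
      toPorts:
        - ports:
            # Hypothetical port for the payments API.
            - port: "8080"
              protocol: TCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Anything the rule does not match, such as &lt;code&gt;frontend&lt;/code&gt;, is dropped because selecting an endpoint with an ingress rule puts it into default-deny for ingress.&lt;/p&gt;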

&lt;h2&gt;
  
  
  Cilium vs Istio: Do You Need Both?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cilium replaces infrastructure. Istio augments application behavior.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cilium
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;L3/L4 networking and security&lt;/li&gt;
&lt;li&gt;Kernel-native performance&lt;/li&gt;
&lt;li&gt;No sidecars&lt;/li&gt;
&lt;li&gt;Built-in observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Istio
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;L7 traffic management (retries, routing)&lt;/li&gt;
&lt;li&gt;mTLS between services&lt;/li&gt;
&lt;li&gt;Sidecar-based architecture&lt;/li&gt;
&lt;li&gt;Higher overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rule of Thumb
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Cilium by default&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Istio only if you need advanced L7 features&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
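
&lt;p&gt;Where the line falls depends on how much L7 you need. As a rough sketch, a CiliumNetworkPolicy can already filter HTTP by method and path without a mesh; the port and path below are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# ingress section of a CiliumNetworkPolicy
ingress:
  - fromEndpoints:
      - matchLabels:
          app: checkout
    toPorts:
      - ports:
          - port: "8080"
            protocol: TCP
        rules:
          http:
            # Only POST requests to /charge are allowed through.
            - method: "POST"
              path: "/charge"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Retries, traffic shifting, and similar behavior remain the reason to add Istio.&lt;/p&gt;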

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;Cilium isn’t just another CNI plugin — it’s a shift in how Kubernetes networking operates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User space → Kernel space&lt;/strong&gt;&lt;br&gt;
Networking and policy enforcement move into eBPF programs in the kernel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Static rules → Programmable logic&lt;/strong&gt;&lt;br&gt;
The data plane becomes dynamic and adaptable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coarse metrics → Per-flow visibility&lt;/strong&gt;&lt;br&gt;
Every connection, drop, and retry is observable in real time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CRDs → etcd at scale&lt;/strong&gt;&lt;br&gt;
Cilium keeps its state in Kubernetes CRDs by default, which works for most clusters; a dedicated etcd key-value store helps at massive scale&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;If Kubernetes was the control plane revolution, eBPF is quietly becoming the data plane revolution.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>networking</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
