<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: saiyam1814</title>
    <description>The latest articles on DEV Community by saiyam1814 (@saiyam1814).</description>
    <link>https://dev.to/saiyam1814</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F225861%2F37ded022-03b0-4576-a65e-efd05897799d.jpeg</url>
      <title>DEV Community: saiyam1814</title>
      <link>https://dev.to/saiyam1814</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saiyam1814"/>
    <language>en</language>
    <item>
      <title>A Kubeconfig for GKE That Doesn't Need gcloud</title>
      <dc:creator>saiyam1814</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:23:30 +0000</pubDate>
      <link>https://dev.to/saiyam1814/a-kubeconfig-for-gke-that-doesnt-need-gcloud-5b8m</link>
      <guid>https://dev.to/saiyam1814/a-kubeconfig-for-gke-that-doesnt-need-gcloud-5b8m</guid>
      <description>&lt;p&gt;When you run &lt;code&gt;gcloud container clusters get-credentials&lt;/code&gt;, the kubeconfig it writes looks innocent — until you hand it to a teammate and they hit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1beta1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…or the classic &lt;code&gt;gke-gcloud-auth-plugin: executable not found&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's because the generated kubeconfig doesn't actually contain a credential. It contains an &lt;code&gt;exec:&lt;/code&gt; block that shells out to &lt;code&gt;gke-gcloud-auth-plugin&lt;/code&gt;, which in turn calls &lt;code&gt;gcloud&lt;/code&gt; to mint a fresh OAuth token on every kubectl call. If you look at the &lt;code&gt;users&lt;/code&gt; section of a stock GKE kubeconfig, this is what's in there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke_saiyam-project_us-east1-b_demo-test&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;client.authentication.k8s.io/v1beta1&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gke-gcloud-auth-plugin&lt;/span&gt;
      &lt;span class="na"&gt;installHint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install gke-gcloud-auth-plugin for use with kubectl by following&lt;/span&gt;
        &lt;span class="s"&gt;https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin&lt;/span&gt;
      &lt;span class="na"&gt;interactiveMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfAvailable&lt;/span&gt;
      &lt;span class="na"&gt;provideClusterInfo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No token. No cert. Just "run this plugin and ask it for auth." No gcloud on the machine, no access.&lt;/p&gt;

&lt;p&gt;If you want a kubeconfig that &lt;em&gt;anyone&lt;/em&gt; can use — a CI runner, a contractor's laptop, a script on a VM — you need to swap that exec-plugin auth for something self-contained. The cleanest answer: a Kubernetes ServiceAccount and a bearer token.&lt;/p&gt;

&lt;p&gt;Here's the full flow, run end-to-end against a live GKE cluster.&lt;/p&gt;

&lt;h2&gt;The mental model&lt;/h2&gt;

&lt;p&gt;Four pieces, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity&lt;/strong&gt; — a ServiceAccount in the cluster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permissions&lt;/strong&gt; — a (Cluster)RoleBinding attaching a role to that SA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential&lt;/strong&gt; — a token the SA can present to the API server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portable config&lt;/strong&gt; — a kubeconfig file wrapping the token + cluster endpoint + CA cert&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The API server validates the token itself. No Google, no gcloud, no OAuth round-trip.&lt;/p&gt;

&lt;h2&gt;Step 1: Identity and permissions&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create serviceaccount shared-access &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system

kubectl create clusterrolebinding shared-access-binding &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--clusterrole&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cluster-admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--serviceaccount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kube-system:shared-access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;serviceaccount/shared-access created
clusterrolebinding.rbac.authorization.k8s.io/shared-access-binding created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The SA lives in &lt;code&gt;kube-system&lt;/code&gt; because it's a cluster-wide utility identity. The namespace doesn't restrict its access — RBAC does.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cluster-admin&lt;/code&gt; is &lt;code&gt;*&lt;/code&gt; on &lt;code&gt;*&lt;/code&gt;. Scope it down in production. &lt;code&gt;view&lt;/code&gt;, &lt;code&gt;edit&lt;/code&gt;, or a custom ClusterRole are usually what you actually want. If you only need namespace-scoped access, use a &lt;code&gt;RoleBinding&lt;/code&gt; in that namespace instead of a &lt;code&gt;ClusterRoleBinding&lt;/code&gt;, as in the sketch below.&lt;/li&gt;
&lt;/ul&gt;
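
&lt;p&gt;A minimal sketch of that namespace-scoped variant, assuming a hypothetical &lt;code&gt;staging&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Grants the SA the built-in "edit" role inside "staging" only;
# nothing cluster-wide.
kubectl create rolebinding shared-access-staging \
  --clusterrole=edit \
  --serviceaccount=kube-system:shared-access \
  --namespace=staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;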

&lt;h2&gt;Step 2: Mint a long-lived token&lt;/h2&gt;

&lt;p&gt;Before Kubernetes 1.24, creating a ServiceAccount automatically created a companion Secret with a non-expiring token. That was removed — long-lived bearer tokens are a security footgun — so now you opt in explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
apiVersion: v1
kind: Secret
metadata:
  name: shared-access-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: shared-access
type: kubernetes.io/service-account-token
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;secret/shared-access-token created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The magic is in two fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;type: kubernetes.io/service-account-token&lt;/code&gt;&lt;/strong&gt; — tells the token controller (built into &lt;code&gt;kube-controller-manager&lt;/code&gt;) "I'm a Secret you should populate."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kubernetes.io/service-account.name&lt;/code&gt; annotation&lt;/strong&gt; — tells it &lt;em&gt;which&lt;/em&gt; ServiceAccount's identity to embed in the token.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wait a couple of seconds, then inspect the Secret — the controller has filled in the data for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get secret shared-access-token &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ca.crt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVMVENDQXBXZ0F3SUJB...&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;a3ViZS1zeXN0ZW0=&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklrWnNZMkk0VFRkWmFrVjN...&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes.io/service-account.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared-access&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes.io/service-account.uid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;9e8d4bdb-46ea-4893-9306-d56bea6aa304&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared-access-token&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/service-account-token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three fields got populated by the controller:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.data.token&lt;/code&gt; — a signed JWT, the actual bearer credential&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.data.ca.crt&lt;/code&gt; — the cluster's CA certificate (so your client can trust the API server's TLS)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.data.namespace&lt;/code&gt; — the SA's namespace&lt;/li&gt;
&lt;/ul&gt;
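
&lt;p&gt;Curious what's inside that JWT? A sketch that decodes the claims payload (the second dot-separated segment; it's base64url and may lack padding, hence the &lt;code&gt;tr&lt;/code&gt; and the silenced stderr):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get secret shared-access-token -n kube-system \
  -o jsonpath='{.data.token}' | base64 -d \
  | cut -d '.' -f2 | tr '_-' '/+' | base64 -d 2&amp;gt;/dev/null; echo
# Expect claims like "iss": "kubernetes/serviceaccount" and
# "sub": "system:serviceaccount:kube-system:shared-access".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;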

&lt;blockquote&gt;
&lt;p&gt;If you'd rather have a short-lived token, skip the Secret and run &lt;code&gt;kubectl create token shared-access -n kube-system --duration=24h&lt;/code&gt;. Good for automation that rotates. Bad for a "hand someone a file" use case, which is what we're doing here.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Step 3: Extract the three things a kubeconfig needs&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;SERVER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl config view &lt;span class="nt"&gt;--minify&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.clusters[0].cluster.server}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;CA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get secret shared-access-token &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.data.ca\.crt}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get secret shared-access-token &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.data.token}'&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"SERVER = &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"CA     = &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CA&lt;/span&gt;:0:60&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"TOKEN  = &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;:0:40&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;SERVER&lt;/span&gt; = &lt;span class="n"&gt;https&lt;/span&gt;://&lt;span class="m"&gt;35&lt;/span&gt;.&lt;span class="m"&gt;196&lt;/span&gt;.&lt;span class="m"&gt;129&lt;/span&gt;.&lt;span class="m"&gt;174&lt;/span&gt;
&lt;span class="n"&gt;CA&lt;/span&gt;     = &lt;span class="n"&gt;LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0WERQWERk1JSUVMVENDQXBXZ0F3SUJB&lt;/span&gt;...
&lt;span class="n"&gt;TOKEN&lt;/span&gt;  = &lt;span class="n"&gt;eyJhbGciOiJSUzIsImtpZCI6IkZsY2I4TTdZ&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SERVER&lt;/code&gt; — the GKE API endpoint, pulled straight from your current context&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CA&lt;/code&gt; — already base64, drops straight into the kubeconfig as-is&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TOKEN&lt;/code&gt; — we decode it because kubeconfig wants the raw JWT string, not base64&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Step 4: Assemble the kubeconfig&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/shared-kubeconfig.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: v1
kind: Config
clusters:
- name: cluster-1
  cluster:
    server: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
    certificate-authority-data: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CA&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
contexts:
- name: cluster-1
  context:
    cluster: cluster-1
    user: shared-access
current-context: cluster-1
users:
- name: shared-access
  user:
    token: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A kubeconfig is three independent lists — &lt;code&gt;clusters&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;contexts&lt;/code&gt; — glued together by a &lt;code&gt;context&lt;/code&gt; that names one cluster + one user. Nothing more.&lt;/p&gt;

&lt;p&gt;Notice what's &lt;em&gt;not&lt;/em&gt; in the &lt;code&gt;users&lt;/code&gt; block: no &lt;code&gt;auth-provider&lt;/code&gt;, no &lt;code&gt;exec&lt;/code&gt;. kubectl has nothing to shell out to. It just sends &lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt; on every request and the API server validates the JWT.&lt;/p&gt;
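
&lt;p&gt;You can reproduce that request with no kubectl at all. A sketch reusing the Step 3 variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Trust the cluster CA, present the bearer token, hit the API directly.
echo "${CA}" | base64 -d &amp;gt; /tmp/ca.crt
curl --cacert /tmp/ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  "${SERVER}/api/v1/namespaces/kube-system/pods?limit=1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;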

&lt;h2&gt;Step 5: Prove it works without gcloud&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/shared-kubeconfig.yaml kubectl get nodes
&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/shared-kubeconfig.yaml kubectl auth &lt;span class="nb"&gt;whoami
&lt;/span&gt;&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/shared-kubeconfig.yaml kubectl auth can-i &lt;span class="s1"&gt;'*'&lt;/span&gt; &lt;span class="s1"&gt;'*'&lt;/span&gt; &lt;span class="nt"&gt;--all-namespaces&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                                       STATUS   ROLES    AGE   VERSION
gke-demo-test-default-pool-a5aaa3f4-jcnk   Ready    &amp;lt;none&amp;gt;   18h   v1.35.1-gke.1396002

ATTRIBUTE   VALUE
Username    system:serviceaccount:kube-system:shared-access
UID         9e8d4bdb-46ea-4893-9306-d56bea6aa304
Groups      [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated]

yes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole proof. The API server sees &lt;code&gt;system:serviceaccount:kube-system:shared-access&lt;/code&gt;, not your Google identity. You can put this file on a machine that has never seen &lt;code&gt;gcloud&lt;/code&gt; in its life, and it works.&lt;/p&gt;
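
&lt;p&gt;For a harsher version of that proof, run it from a container that ships only kubectl (a sketch, assuming the &lt;code&gt;bitnami/kubectl&lt;/code&gt; image and network reach to the endpoint):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Mount the kubeconfig read-only; the image's entrypoint is kubectl,
# and there is no gcloud anywhere in it.
docker run --rm -v /tmp/shared-kubeconfig.yaml:/kc:ro \
  bitnami/kubectl:latest --kubeconfig /kc get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;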

&lt;h2&gt;Things to know before you ship this&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Private clusters still need network reachability.&lt;/strong&gt; The kubeconfig removes the auth dependency, not the network one. If your control plane is private, the recipient still needs VPN, authorized networks, or a public endpoint. The token won't help if they can't reach the API server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The kubeconfig is a credential.&lt;/strong&gt; Anyone with the file has whatever RBAC you bound. Store it like you'd store an SSH key or an API token. Don't commit it to Git.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revocation is deletion.&lt;/strong&gt; To kill access, delete the Secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete secret shared-access-token &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To kill it harder, also delete the binding and the SA. There's no "rotate" — you mint a new Secret and redistribute the new kubeconfig.&lt;/p&gt;
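
&lt;p&gt;A quick way to confirm the revocation took effect (the exact error wording varies by client and server version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;KUBECONFIG=/tmp/shared-kubeconfig.yaml kubectl get nodes
# error: You must be logged in to the server (Unauthorized)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;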

&lt;p&gt;&lt;strong&gt;Scope down.&lt;/strong&gt; &lt;code&gt;cluster-admin&lt;/code&gt; is the demo default, not the production default. A &lt;code&gt;RoleBinding&lt;/code&gt; to &lt;code&gt;edit&lt;/code&gt; in a single namespace is usually closer to what a real sharing use case needs. &lt;code&gt;ClusterRoleBinding&lt;/code&gt; + &lt;code&gt;cluster-admin&lt;/code&gt; only when you truly mean it.&lt;/p&gt;

&lt;h2&gt;Wrap&lt;/h2&gt;

&lt;p&gt;The trick isn't really about GKE — it's about understanding what a kubeconfig &lt;em&gt;is&lt;/em&gt;. Once you see it as a glue file between a cluster endpoint and any credential the API server will accept, the exec-plugin auth stops feeling magical and the bearer-token swap becomes obvious.&lt;/p&gt;

&lt;p&gt;Same approach works for EKS (where the plugin is &lt;code&gt;aws-iam-authenticator&lt;/code&gt; / &lt;code&gt;aws eks get-token&lt;/code&gt;), AKS (&lt;code&gt;kubelogin&lt;/code&gt;), and anything else that ships exec-based auth. Replace the &lt;code&gt;user:&lt;/code&gt; block, keep the &lt;code&gt;cluster:&lt;/code&gt; block, and you've got a kubeconfig that travels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpafjnzrsj2637fd2i4t5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpafjnzrsj2637fd2i4t5.png" alt="The swap: only the users: block changes" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>gke</category>
      <category>kubeconfig</category>
      <category>kubectl</category>
    </item>
    <item>
      <title>The Ingress NGINX Migration Just Got Easier: 119 Annotations, 3 Targets, Impact Ratings</title>
      <dc:creator>saiyam1814</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:23:28 +0000</pubDate>
      <link>https://dev.to/saiyam1814/the-ingress-nginx-migration-just-got-easier-119-annotations-3-targets-impact-ratings-27mj</link>
      <guid>https://dev.to/saiyam1814/the-ingress-nginx-migration-just-got-easier-119-annotations-3-targets-impact-ratings-27mj</guid>
      <description>&lt;p&gt;A few months ago, I built &lt;a href="https://github.com/saiyam1814/ing-switch" rel="noopener noreferrer"&gt;ing-switch&lt;/a&gt; and &lt;a href="https://blog.kubesimplify.com/ing-switch-migrate-from-ingress-nginx-to-traefik-or-gateway-api-in-minutes-not-days" rel="noopener noreferrer"&gt;wrote about it on kubesimplify&lt;/a&gt;. The response was incredible -- people loved the annotation mapping and the visual dashboard.&lt;/p&gt;

&lt;p&gt;Since then, &lt;strong&gt;ingress-nginx was officially archived&lt;/strong&gt; (March 24, 2026). March 31 is end of life -- zero security patches after that date.&lt;/p&gt;

&lt;p&gt;Based on community feedback from KubeCon, this is the biggest update yet: &lt;strong&gt;119 annotations&lt;/strong&gt; (up from 50), &lt;strong&gt;Gateway API with Traefik as the provider&lt;/strong&gt; (the #1 request), and &lt;strong&gt;impact ratings&lt;/strong&gt; on every annotation so you know exactly what matters.&lt;/p&gt;

&lt;p&gt;This post walks through a &lt;strong&gt;complete end-to-end migration&lt;/strong&gt; on a &lt;a href="https://github.com/loft-sh/vind" rel="noopener noreferrer"&gt;vind&lt;/a&gt; cluster with actual command outputs.&lt;/p&gt;

&lt;h2&gt;Why You Need to Migrate Now&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nov 11, 2025:&lt;/strong&gt; Kubernetes SIG Network announces ingress-nginx retirement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jan 29, 2026:&lt;/strong&gt; Joint statement from Kubernetes Steering + Security Response Committees urging immediate migration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mar 24, 2026:&lt;/strong&gt; GitHub repository archived (read-only)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mar 31, 2026:&lt;/strong&gt; End of life -- zero support from this date&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chainguard maintains a fork for CVE-level fixes only -- no features, no community PRs, no pre-built images. You're on your own.&lt;/p&gt;

&lt;h2&gt;The Three Migration Paths&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd35nbr6q84q6ltm368g8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd35nbr6q84q6ltm368g8.png" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;What Changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traefik v3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fastest migration, lowest friction&lt;/td&gt;
&lt;td&gt;Keep Ingress API, swap annotations to Middleware CRDs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gateway API (Envoy)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Future-proof standard&lt;/td&gt;
&lt;td&gt;Replace Ingresses with HTTPRoutes, Envoy policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gateway API (Traefik)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rancher / k3s users&lt;/td&gt;
&lt;td&gt;Standard HTTPRoutes + Gateway resources, with Traefik as the controller implementation. Advanced features (rate limiting, auth, IP filtering) use Traefik Middleware CRDs as extension policies.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;The Annotation Problem&lt;/h2&gt;

&lt;p&gt;The real complexity isn't swapping controllers -- it's the &lt;strong&gt;annotations&lt;/strong&gt;. A typical production Ingress has 10-15 NGINX annotations for SSL, auth, rate limiting, CORS, session affinity, and more.&lt;/p&gt;

&lt;p&gt;ing-switch maps &lt;strong&gt;119 annotations&lt;/strong&gt; with impact ratings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Traefik&lt;/th&gt;
&lt;th&gt;Gateway API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supported (direct equivalent)&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial (needs minor adjustment)&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unsupported (with impact notes)&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every unsupported annotation gets an &lt;strong&gt;impact rating&lt;/strong&gt;: &lt;code&gt;NONE&lt;/code&gt; (safe to ignore), &lt;code&gt;LOW&lt;/code&gt; (better defaults), &lt;code&gt;MEDIUM&lt;/code&gt; (needs workaround), or &lt;code&gt;VARIES&lt;/code&gt; (review your snippets). Most teams discover &lt;strong&gt;70%+ of "unsupported" annotations are safe to ignore&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;End-to-End Demo: vCluster + ing-switch&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://asciinema.org/a/nOYDQukAC4bzdSVI" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fasciinema.org%2Fa%2FnOYDQukAC4bzdSVI.svg" alt="asciicast" width="690" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's walk through a complete migration on a real cluster. We'll use &lt;a href="https://www.vcluster.com/" rel="noopener noreferrer"&gt;vCluster&lt;/a&gt; to spin up a Kubernetes cluster in Docker, deploy 3 services with NGINX annotations, and migrate them to Gateway API with Traefik.&lt;/p&gt;

&lt;h3&gt;Step 1: Create a Cluster&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vcluster create demo &lt;span class="nt"&gt;--driver&lt;/span&gt; docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;info  Using vCluster driver 'docker' to create your virtual clusters
info  Ensuring environment for vCluster demo...
done  Created network vcluster.demo
info  Starting vCluster standalone demo
done  Successfully created virtual cluster demo
info  Waiting for vCluster to become ready...
done  vCluster is ready
done  Switched active kube context to vcluster-docker_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get namespaces
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                 STATUS   AGE
default              Active   16s
kube-flannel         Active   6s
kube-node-lease      Active   16s
kube-public          Active   16s
kube-system          Active   16s
local-path-storage   Active   6s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Install Ingress NGINX&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm &lt;span class="nb"&gt;install &lt;/span&gt;ingress-nginx ingress-nginx/ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; controller.service.type&lt;span class="o"&gt;=&lt;/span&gt;ClusterIP &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; controller.admissionWebhooks.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 120s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME: ingress-nginx
LAST DEPLOYED: Sun Mar 29 11:15:57 2026
NAMESPACE: ingress-nginx
STATUS: deployed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; ingress-nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                                        READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-5486dbd97f-vc9wv   1/1     Running   0          54s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 3: Deploy 3 Apps with NGINX Annotations&lt;/h3&gt;

&lt;p&gt;We deploy three services, each with different annotation patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;App 1 -- Basic web app&lt;/strong&gt; (SSL redirect + timeouts):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-read-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-connect-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web-app&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;App 2 -- API with CORS + rate limiting&lt;/strong&gt; (10 annotations):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-cors&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/force-ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/enable-cors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/cors-allow-origin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://app.example.com,https://admin.example.com"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/cors-allow-methods&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;POST,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PUT,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;DELETE,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OPTIONS"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/cors-allow-headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Authorization,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;X-API-Key"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/cors-allow-credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/cors-max-age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;86400"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/limit-rps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/limit-burst-multiplier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/proxy-body-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5m"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;App 3 -- Auth-protected dashboard&lt;/strong&gt; (external auth + IP allowlist + session affinity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/auth-url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://auth.example.com/verify"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/auth-response-headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-User-ID,X-User-Email"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/whitelist-source-range&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/8,172.16.0.0/12"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cookie"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/session-cookie-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dashboard-session"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/session-cookie-max-age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After applying all three:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get ingress &lt;span class="nt"&gt;-n&lt;/span&gt; demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME        CLASS   HOSTS                   ADDRESS   PORTS   AGE
api-cors    nginx   api.example.com                   80      5s
dashboard   nginx   dashboard.example.com             80      5s
web-app     nginx   web.example.com                   80      5s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                           READY   STATUS    RESTARTS   AGE
api-service-5f99b6d99d-x7vmn   1/1     Running   0          24s
dashboard-9ddbf867-7dbgf       1/1     Running   0          24s
web-app-969c76b7c-7wqw5        1/1     Running   0          24s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3 ingresses, 20 NGINX annotations, 3 services running. Now let's see what ing-switch makes of this.&lt;/p&gt;

&lt;h3&gt;Step 4: Scan the Cluster&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch scan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ing-switch -- Cluster Scan Results
  Cluster: vcluster-docker_demo

  Ingress Controller Detected
  Type:      ingress-nginx
  Version:   unknown
  Namespace: ingress-nginx

  Found 3 Ingress resource(s)

  NAMESPACE   NAME        HOSTS                   ANNOTATIONS   TLS   COMPLEXITY
  ---------   ----        -----                   -----------   ---   ----------
  demo        api-cors    api.example.com         10            no    unsupported
  demo        dashboard   dashboard.example.com   7             no    complex
  demo        web-app     web.example.com         3             no    complex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ing-switch detected the NGINX controller and found all 3 ingresses with their annotation counts and complexity scores.&lt;/p&gt;

&lt;h3&gt;Step 5: Analyze Compatibility&lt;/h3&gt;

&lt;p&gt;Let's compare all three targets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traefik v3:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch analyze &lt;span class="nt"&gt;--target&lt;/span&gt; traefik
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Summary
  -------
  Total ingresses:      3
  Fully compatible:     1
  Needs workarounds:    2
  Has unsupported:      0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gateway API (Envoy):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch analyze &lt;span class="nt"&gt;--target&lt;/span&gt; gateway-api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Summary
  -------
  Total ingresses:      3
  Fully compatible:     0
  Needs workarounds:    3
  Has unsupported:      0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gateway API (Traefik):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch analyze &lt;span class="nt"&gt;--target&lt;/span&gt; gateway-api-traefik
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Summary
  -------
  Total ingresses:      3
  Fully compatible:     0
  Needs workarounds:    3
  Has unsupported:      0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key insight: &lt;strong&gt;Traefik is the highest-compatibility target&lt;/strong&gt; for this workload (1 fully compatible out of 3). The CORS annotations map directly to Traefik's Headers middleware. For Gateway API, CORS is now also fully supported thanks to the native CORS filter in Gateway API v1.5.&lt;/p&gt;

&lt;p&gt;Here's the detailed annotation mapping for the API with CORS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  demo/api-cors
  -------------
  ANNOTATION               STATUS        TARGET RESOURCE                    NOTES
  enable-cors              [supported]   HTTPRoute (CORS filter)            Native CORS filter (GA in Gateway API v1.5)
  cors-allow-origin        [supported]   HTTPRoute (CORS filter)            allowOrigins in CORS filter
  cors-allow-methods       [supported]   HTTPRoute (CORS filter)            allowMethods in CORS filter
  cors-allow-headers       [supported]   HTTPRoute (CORS filter)            allowHeaders in CORS filter
  cors-allow-credentials   [supported]   HTTPRoute (CORS filter)            allowCredentials in CORS filter
  cors-max-age             [supported]   HTTPRoute (CORS filter)            maxAge in CORS filter
  force-ssl-redirect       [supported]   HTTPRoute (RequestRedirect filter) 301 redirect to HTTPS
  limit-rps                [partial]     BackendTrafficPolicy (RateLimit)   Envoy Gateway BackendTrafficPolicy
  limit-burst-multiplier   [partial]     BackendTrafficPolicy (RateLimit)   Burst configurable but uses tokens
  proxy-body-size          [partial]     BackendTrafficPolicy               requestBuffer.limit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;7 out of 10 annotations are fully supported. The 3 "partial" ones work -- they just use a slightly different API.&lt;/p&gt;

&lt;h3&gt;Step 6: Generate Migration Files&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch migrate &lt;span class="nt"&gt;--target&lt;/span&gt; gateway-api-traefik &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./migration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ing-switch -- Generating Migration Files
  Target:     gateway-api-traefik
  Output dir: ./migration

  + 00-migration-report.md
  + 01-install-gateway-api-crds/install.sh
  + 02-install-traefik-gateway/helm-install.sh
  + 02-install-traefik-gateway/values.yaml
  + 03-gateway/gatewayclass.yaml
  + 03-gateway/gateway.yaml
  + 04-httproutes/demo-api-cors.yaml
  + 04-httproutes/demo-dashboard.yaml
  + 04-httproutes/demo-web-app.yaml
  + 05-policies/demo-api-cors-ratelimit.yaml
  + 05-policies/demo-dashboard-forwardauth.yaml
  + 05-policies/demo-dashboard-ipallowlist.yaml
  + 06-verify.sh
  + 07-cleanup/remove-nginx.sh
  Generated 13 files in ./migration/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 7: Inspect the Generated YAML&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;GatewayClass -- points to Traefik, not Envoy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GatewayClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;controllerName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/gateway-controller&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;HTTPRoute with native CORS filter&lt;/strong&gt; (no more ResponseHeaderModifier hacks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPRoute&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-cors&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parentRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ing-switch-gateway&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.example.com"&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PathPrefix&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1"&lt;/span&gt;
    &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CORS&lt;/span&gt;
      &lt;span class="na"&gt;cors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;allowOrigins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://app.example.com"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://admin.example.com"&lt;/span&gt;
        &lt;span class="na"&gt;allowMethods&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PUT"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPTIONS"&lt;/span&gt;
        &lt;span class="na"&gt;allowHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key"&lt;/span&gt;
        &lt;span class="na"&gt;allowCredentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;maxAge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;86400s"&lt;/span&gt;
    &lt;span class="na"&gt;backendRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Traefik Middleware CRDs&lt;/strong&gt; (not Envoy-specific policies):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Rate Limiting&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Middleware&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-api-cors-ratelimit&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rateLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;average&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
    &lt;span class="na"&gt;burst&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ForwardAuth (external authentication)&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Middleware&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-dashboard-forwardauth&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forwardAuth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://auth.example.com/verify"&lt;/span&gt;
  &lt;span class="na"&gt;authResponseHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-User-ID"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-User-Email"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# IP AllowList&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Middleware&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-dashboard-ipallowlist&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ipAllowList&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sourceRange&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/8"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;172.16.0.0/12"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8: Review the Migration Report
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;migrate&lt;/code&gt; command automatically generates &lt;code&gt;00-migration-report.md&lt;/code&gt; in the output directory. Open it to see the full summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ./migration/00-migration-report.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# ing-switch Migration Report&lt;/span&gt;
&lt;span class="gs"&gt;**Target Controller:**&lt;/span&gt; gateway-api-traefik

&lt;span class="gu"&gt;## Summary&lt;/span&gt;
| Metric | Count |
|--------|-------|
| Total Ingresses | 3 |
| Fully Compatible | 0 |
| Needs Workarounds | 3 |
| Has Unsupported Annotations | 0 |

&lt;span class="gu"&gt;## demo/api-cors -- Needs workaround&lt;/span&gt;
| Annotation | Status | Target Resource | Notes |
|-----------|--------|-----------------|-------|
| enable-cors | OK | HTTPRoute (CORS filter) | Native CORS filter (GA in v1.5) |
| cors-allow-origin | OK | HTTPRoute (CORS filter) | allowOrigins in CORS filter |
| limit-rps | WARN | BackendTrafficPolicy | Envoy Gateway BackendTrafficPolicy |
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 9: Apply (Dry-Run First)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko8uwedoqnd5cwsbg14u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko8uwedoqnd5cwsbg14u.png" width="800" height="514"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Gateway API CRDs&lt;/span&gt;
bash ./migration/01-install-gateway-api-crds/install.sh

&lt;span class="c"&gt;# Install Traefik with Gateway API provider&lt;/span&gt;
bash ./migration/02-install-traefik-gateway/helm-install.sh

&lt;span class="c"&gt;# Dry-run all resources first&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migration/03-gateway/ &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;server
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migration/04-httproutes/ &lt;span class="nt"&gt;--dry-run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;server

&lt;span class="c"&gt;# If dry-run passes, apply for real&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migration/03-gateway/
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migration/04-httproutes/
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migration/05-policies/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, &lt;strong&gt;both NGINX and Traefik are running side by side&lt;/strong&gt;. DNS still points to NGINX. Production traffic is untouched.&lt;/p&gt;
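
&lt;p&gt;A quick way to sanity-check that state from the terminal. The namespaces and service names below are the usual chart defaults, so adjust them for your install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Both data planes should own their own LoadBalancer IP
kubectl get svc -n ingress-nginx ingress-nginx-controller
kubectl get svc -n traefik traefik

# The new Gateway should report Programmed=True
kubectl get gateway -A
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;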

&lt;h3&gt;
  
  
  Step 10: Verify and Cutover
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run the generated verification script&lt;/span&gt;
bash ./migration/06-verify.sh

&lt;span class="c"&gt;# Once verified, update DNS to Traefik's IP&lt;/span&gt;
&lt;span class="c"&gt;# Then clean up NGINX&lt;/span&gt;
bash ./migration/07-cleanup/remove-nginx.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
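
&lt;p&gt;Before moving DNS, you can also smoke-test Traefik directly by sending a &lt;code&gt;Host&lt;/code&gt; header at its IP and comparing against the live NGINX path. The IP and hostname below are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Substitute Traefik's external IP and your real hostname
TRAEFIK_IP=203.0.113.10
curl -si -H "Host: app.example.com" "http://${TRAEFIK_IP}/" | head -n 5

# Production traffic still flows through NGINX via DNS
curl -si "https://app.example.com/" | head -n 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;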



&lt;h3&gt;
  
  
  Step 11: Use the Web UI
&lt;/h3&gt;

&lt;p&gt;For teams that prefer a visual workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ing-switch ui
&lt;span class="c"&gt;# Opens http://localhost:8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboard provides four pages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detect&lt;/strong&gt; -- Scan your cluster and see all ingresses with annotation counts and complexity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vw8va3h6ikmk8lrv0ey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vw8va3h6ikmk8lrv0ey.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyze&lt;/strong&gt; -- Choose between 3 targets and see the full annotation compatibility matrix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g2uym5x6f5b507duym8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g2uym5x6f5b507duym8.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrate&lt;/strong&gt; -- One-click generation with step-by-step checklist and dry-run buttons:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2809i0ear8nw6su69hnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2809i0ear8nw6su69hnp.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;View all generated files inline with syntax highlighting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtoiacddrvk1bl3mgxfx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtoiacddrvk1bl3mgxfx.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See migration gaps with impact ratings and fix instructions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllhtxig606joibkv8va1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllhtxig606joibkv8va1.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validate&lt;/strong&gt; -- Run live cluster checks to confirm your migration phase:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8u2a34jhmpxce4e1x7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8u2a34jhmpxce4e1x7a.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vcluster delete demo &lt;span class="nt"&gt;--driver&lt;/span&gt; docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;done  Successfully deleted virtual cluster demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Makes ing-switch Different
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ing-switch&lt;/th&gt;
&lt;th&gt;ingress2gateway&lt;/th&gt;
&lt;th&gt;Manual&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Annotation coverage&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;td&gt;30+&lt;/td&gt;
&lt;td&gt;You count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traefik Ingress target&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway API (Traefik)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway API (Envoy)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Impact ratings&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web UI&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Install scripts&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification scripts&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DNS migration guide&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dry-run mode&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Ecosystem Is Ready
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gateway API v1.5&lt;/strong&gt; -- CORS filter, TLSRoute, BackendTLSPolicy all GA&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ingress2gateway v1.0&lt;/strong&gt; -- Official tool with emitter architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traefik v3.7&lt;/strong&gt; -- Native NGINX annotation provider (80+ annotations)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Envoy Gateway v1.7&lt;/strong&gt; -- XListenerSet, enhanced policies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;cert-manager v1.20&lt;/strong&gt; -- Gateway API ListenerSet support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kubernetes 1.36&lt;/strong&gt; -- Ships April 22, first release post-NGINX archival&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tools exist. The standards are stable. The only thing left is to actually run the migration.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Star it, fork it, migrate today:&lt;/strong&gt; &lt;a href="https://github.com/saiyam1814/ing-switch" rel="noopener noreferrer"&gt;github.com/saiyam1814/ing-switch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ing-switch is open source under the MIT license. PRs welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>gatewayapi</category>
      <category>traefik</category>
    </item>
    <item>
      <title>What Actually Happens When You Run kubectl run nginx</title>
      <dc:creator>saiyam1814</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:23:26 +0000</pubDate>
      <link>https://dev.to/saiyam1814/what-actually-happens-when-you-run-kubectl-run-nginx-34bh</link>
      <guid>https://dev.to/saiyam1814/what-actually-happens-when-you-run-kubectl-run-nginx-34bh</guid>
      <description>&lt;p&gt;So you type &lt;code&gt;kubectl run nginx --image nginx&lt;/code&gt;. One line, one pod. About a second later on a warm cluster, the pod is Running. But what actually happens behind the scenes? Let us walk through it, step by step, step by step.&lt;/p&gt;

&lt;p&gt;%[&lt;a href="https://www.youtube.com/watch?v=LLuUhU3SwJo&amp;amp;t=4s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=LLuUhU3SwJo&amp;amp;t=4s&lt;/a&gt;] &lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR, the 23 steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;kubectl&lt;/code&gt; parses argv and builds a minimal Pod object.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It reads &lt;code&gt;~/.kube/config&lt;/code&gt; for cluster, user, and context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A TCP connection is opened to the API server. TLS 1.3 negotiates keys in one round trip with mutual cert auth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;kubectl&lt;/code&gt; sends &lt;code&gt;POST /api/v1/namespaces/default/pods&lt;/code&gt; with a JSON body over HTTP/2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The API server authenticates the caller (x509, bearer token, OIDC, or webhook).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It authorizes the request against RBAC. Can this user create pods in default?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mutating admission runs. &lt;code&gt;ServiceAccount&lt;/code&gt; injects a projected token volume, &lt;code&gt;LimitRanger&lt;/code&gt; fills in default requests and limits, and so on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The API server defaults missing fields (DNS policy, restart policy, termination grace period) and then validates against the OpenAPI schema.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Validating admission runs. &lt;code&gt;ResourceQuota&lt;/code&gt;, &lt;code&gt;PodSecurity&lt;/code&gt;, any &lt;code&gt;ValidatingAdmissionWebhook&lt;/code&gt;, and the built in &lt;code&gt;ValidatingAdmissionPolicy&lt;/code&gt; CEL engine (GA since 1.30).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The API server writes to etcd via Raft. Leader replicates, followers fsync, a majority acks, and only then does the pod exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The API server returns &lt;code&gt;201 Created&lt;/code&gt;. &lt;code&gt;kubectl&lt;/code&gt; prints &lt;code&gt;pod/nginx created&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Watch fanout. Every component holding an open watch stream (scheduler, kubelets, controllers) is notified within milliseconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The scheduler runs Filter plugins. &lt;code&gt;NodeResourcesFit&lt;/code&gt;, &lt;code&gt;NodeAffinity&lt;/code&gt;, &lt;code&gt;TaintToleration&lt;/code&gt;, &lt;code&gt;PodTopologySpread&lt;/code&gt;, &lt;code&gt;VolumeBinding&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It runs Score plugins. &lt;code&gt;NodeResourcesBalancedAllocation&lt;/code&gt;, &lt;code&gt;ImageLocality&lt;/code&gt;, &lt;code&gt;InterPodAffinity&lt;/code&gt;, &lt;code&gt;NodeAffinity&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The winning node gets picked. Scheduler POSTs to &lt;code&gt;/pods/nginx/binding&lt;/code&gt;, which updates &lt;code&gt;spec.nodeName&lt;/code&gt;. One more etcd write.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The kubelet on that node sees the bound pod through its watch. &lt;code&gt;syncPod&lt;/code&gt; fires.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet calls the container runtime over CRI (&lt;code&gt;RunPodSandbox&lt;/code&gt;). containerd creates the pause container, PID 1, calling &lt;code&gt;pause(2)&lt;/code&gt; and holding the pod's network namespace.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CNI plugin (Calico, Flannel, Cilium, your choice) runs ADD. It creates a veth pair, allocates an IP from the pod CIDR, programs routes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Image pull. containerd fetches the manifest, then the layers, verifying each with SHA-256.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Container create. The runtime stacks image layers with overlayfs, writes the OCI runtime spec, and asks runc to create.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;runc takes over. &lt;code&gt;clone3&lt;/code&gt; with namespace flags (PID, mount, UTS, IPC), &lt;code&gt;setns&lt;/code&gt; into the sandbox's network namespace, mount &lt;code&gt;/proc&lt;/code&gt;, &lt;code&gt;pivot_root&lt;/code&gt;, drop capabilities, apply the seccomp filter, &lt;code&gt;execve&lt;/code&gt; nginx.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet's PLEG notices the container started. Most clusters still poll the runtime every second. Evented PLEG is the newer event stream version but it is still alpha in 1.36, so don't assume it is on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The status manager patches &lt;code&gt;pod.status&lt;/code&gt; to Running back to the API server. Done.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Setting the stage
&lt;/h2&gt;

&lt;p&gt;I teach Kubernetes on the &lt;a href="https://www.youtube.com/@kubesimplify" rel="noopener noreferrer"&gt;Kubesimplify YouTube&lt;/a&gt; channel, and I still get asked the same question in workshops. What actually happens when I run &lt;code&gt;kubectl run&lt;/code&gt;? Most answers stop at "the API server writes to etcd and the scheduler picks a node." That is true, but it is the one line summary of a story that has twenty-three chapters.&lt;/p&gt;

&lt;p&gt;So this post is the long form of the six-minute video I just shipped, paired with an &lt;a href="https://kubernetes-explained.vercel.app/pod" rel="noopener noreferrer"&gt;interactive site&lt;/a&gt; you can scrub through step by step. If you are a platform engineer who already knows what a pod is, my goal is that by the end of this you can name the plugins, the syscalls, the admission chain order, and the CRI calls. And you should be able to point at the Kubernetes source tree when you need to go deeper.&lt;/p&gt;

&lt;p&gt;Everything below is checked against Kubernetes 1.36.0, which shipped on April 22, 2026. Where a feature gate matters, I call the version out explicitly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1, the client side (kubectl)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: kubectl parses your command
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;kubectl run&lt;/code&gt; is a subcommand whose job is to take sparse user input and build a valid Pod object. The code lives in &lt;code&gt;staging/src/k8s.io/kubectl/pkg/cmd/run/run.go&lt;/code&gt;. For &lt;code&gt;kubectl run nginx --image nginx&lt;/code&gt;, the object kubectl builds locally is roughly this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So notice what is not there. No &lt;code&gt;restartPolicy&lt;/code&gt;, no &lt;code&gt;dnsPolicy&lt;/code&gt;, no &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt;, no &lt;code&gt;serviceAccountName&lt;/code&gt;, no &lt;code&gt;imagePullPolicy&lt;/code&gt;. kubectl deliberately sends a minimal object. All those fields are filled in by the API server during defaulting, which happens after mutating admission and before schema validation. This is the first real insight: the object you POST and the object etcd ends up storing are not the same.&lt;/p&gt;
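
&lt;p&gt;You can see the gap yourself on any cluster by diffing the client-side object against what the server stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The object kubectl builds locally, before any server-side defaulting
kubectl run nginx --image nginx --dry-run=client -o yaml

# Create it for real, then look at what the API server filled in
kubectl run nginx --image nginx
kubectl get pod nginx -o yaml | grep -E 'restartPolicy|dnsPolicy|terminationGracePeriod|serviceAccountName|imagePullPolicy'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;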

&lt;h3&gt;
  
  
  Step 2: Reading kubeconfig
&lt;/h3&gt;

&lt;p&gt;kubectl needs to know where to send the request. It reads &lt;code&gt;~/.kube/config&lt;/code&gt; (or whatever &lt;code&gt;$KUBECONFIG&lt;/code&gt; points at) and resolves three things. The cluster (API server URL, CA bundle), the user (client certs, token, exec plugin), and the context (which cluster and user pair plus a default namespace). The logic sits in &lt;code&gt;client-go/tools/clientcmd&lt;/code&gt;. If you run &lt;code&gt;kubectl --v=8&lt;/code&gt;, you can watch this resolution happen inline.&lt;/p&gt;
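
&lt;p&gt;Two commands make that resolution visible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Which context is active, and what it resolves to
kubectl config current-context
kubectl config view --minify        # only the active cluster, user, and context

# Watch client-go load the kubeconfig and assemble the request
kubectl get pods --v=8 2&gt;&amp;1 | head -n 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;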

&lt;h3&gt;
  
  
  Step 3: TCP plus TLS 1.3 handshake
&lt;/h3&gt;

&lt;p&gt;kubectl opens a TCP connection to the API server on port 6443 and runs a TLS 1.3 handshake. TLS 1.3 is important here. It negotiates keys in a single round trip (TLS 1.2 needed two), and it does so with mutual authentication when you are using a client certificate. Both sides present certs, both sides verify against a CA. Same primitives your browser uses, nothing exotic. But worth noticing because every subsequent byte rides this mTLS tunnel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: HTTP/2 POST to the API server
&lt;/h3&gt;

&lt;p&gt;kubectl serializes the pod object to JSON, not YAML. YAML is a client side convenience, the wire format is JSON by default. Then it sends &lt;code&gt;POST /api/v1/namespaces/default/pods&lt;/code&gt; over HTTP/2. Content-Type is &lt;code&gt;application/json&lt;/code&gt;. HTTP/2 matters because all the watch streams later in the story will multiplex over the same connection.&lt;/p&gt;
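
&lt;p&gt;At verbosity 9, kubectl prints the request body and a curl-style trace of this exact POST, so you can watch it happen on your own machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run nginx --image nginx --v=9 2&gt;&amp;1 \
  | grep -E 'POST|Request Body' | head -n 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;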

&lt;h3&gt;
  
  
  Step 5: Request lands at the API server
&lt;/h3&gt;

&lt;p&gt;The request hits kube-apiserver. The code path is the generic API server filter chain in &lt;code&gt;staging/src/k8s.io/apiserver/pkg/server/filters&lt;/code&gt;. Every inbound request goes through the same stack of filters in order. Panic recovery, request deadline, auditing, authentication, impersonation, authorization, admission, validation. Most of the next phase is those filters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2, the API server gate
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 6: Authentication, "who are you?"
&lt;/h3&gt;

&lt;p&gt;So the API server asks the first question. Who are you? The API server has four authenticator backends chained together. x509 client certificates, bearer tokens (static, service account, or OIDC), OIDC directly (with JWT verification against the configured issuer), and authentication webhooks (the TokenReview API). The first one that returns a positive identity wins.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;kubectl&lt;/code&gt; with a standard kubeconfig, you are usually on x509. The cert you presented in the TLS handshake is reused to populate &lt;code&gt;user.Info&lt;/code&gt; with the CN as the username and the O values as groups. Code: &lt;code&gt;staging/src/k8s.io/apiserver/pkg/authentication&lt;/code&gt;.&lt;/p&gt;
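
&lt;p&gt;If your kubeconfig carries a client certificate, you can decode exactly the identity the API server will see. The jsonpath below assumes the first &lt;code&gt;users&lt;/code&gt; entry is your active one, so adjust the index if needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' \
  | base64 -d | openssl x509 -noout -subject
# The CN becomes the username, each O value becomes a group
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;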

&lt;h3&gt;
  
  
  Step 7: Authorization, "can you do this?"
&lt;/h3&gt;

&lt;p&gt;With identity established, the next question. Can this user perform create on the resource pods in the namespace default? The default authorizer is RBAC, backed by &lt;code&gt;Role&lt;/code&gt;, &lt;code&gt;ClusterRole&lt;/code&gt;, &lt;code&gt;RoleBinding&lt;/code&gt;, &lt;code&gt;ClusterRoleBinding&lt;/code&gt; objects. Multiple authorizers can be chained. In managed clusters you will often see &lt;code&gt;Node,RBAC&lt;/code&gt;. The Node authorizer restricts what a kubelet can ask for, RBAC handles everything else. A single "allow" is enough. Explicit denies don't exist in RBAC.&lt;/p&gt;
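
&lt;p&gt;You can put exactly this question to the authorizer with &lt;code&gt;kubectl auth can-i&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The question the API server is answering for this request
kubectl auth can-i create pods --namespace default

# Test someone else's RBAC via impersonation
kubectl auth can-i create pods -n default --as jane --as-group dev-team

# Everything the current identity may do in the namespace
kubectl auth can-i --list --namespace default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;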

&lt;h3&gt;
  
  
  Step 8: Mutating admission
&lt;/h3&gt;

&lt;p&gt;This is the fun one. Mutating admission plugins run first, before schema validation, and they are allowed to change the object. Built-in mutators that fire for a pod create include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ServiceAccount&lt;/code&gt;. Injects the projected service account token volume and the &lt;code&gt;automountServiceAccountToken&lt;/code&gt; default.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;DefaultStorageClass&lt;/code&gt;, &lt;code&gt;DefaultTolerationSeconds&lt;/code&gt;, &lt;code&gt;PodNodeSelector&lt;/code&gt;, &lt;code&gt;RuntimeClass&lt;/code&gt;, depending on cluster config.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;LimitRanger&lt;/code&gt;. Applies default &lt;code&gt;resources.requests&lt;/code&gt; and limits when a &lt;code&gt;LimitRange&lt;/code&gt; exists in the namespace.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Every &lt;code&gt;MutatingAdmissionWebhook&lt;/code&gt; you have registered. Service meshes like Istio inject their sidecar here.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;MutatingAdmissionPolicy&lt;/code&gt;. The CEL based in-process alternative to webhooks. This went GA (v1) in 1.36, so you no longer need a feature gate for the stable path.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each plugin runs sequentially. The order that ships in the API server defaults matters. &lt;code&gt;ServiceAccount&lt;/code&gt; before &lt;code&gt;LimitRanger&lt;/code&gt;, for example. Source: &lt;code&gt;plugin/pkg/admission&lt;/code&gt; in kubernetes/kubernetes.&lt;/p&gt;
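
&lt;p&gt;A minimal sketch of &lt;code&gt;LimitRanger&lt;/code&gt; at work, in a throwaway namespace with arbitrary values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace lr-demo
kubectl apply -n lr-demo -f - &lt;&lt;EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 64Mi
    default:
      cpu: 200m
      memory: 128Mi
EOF

# The pod you create with no resources at all...
kubectl run nginx --image nginx -n lr-demo
# ...comes back with requests and limits injected by LimitRanger
kubectl get pod nginx -n lr-demo -o jsonpath='{.spec.containers[0].resources}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;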

&lt;h3&gt;
  
  
  Step 9: Schema validation
&lt;/h3&gt;

&lt;p&gt;After mutation, the API server defaults remaining missing fields (&lt;code&gt;restartPolicy: Always&lt;/code&gt;, &lt;code&gt;dnsPolicy: ClusterFirst&lt;/code&gt;, &lt;code&gt;terminationGracePeriodSeconds: 30&lt;/code&gt;, &lt;code&gt;serviceAccountName: default&lt;/code&gt;) and validates the now complete object against the OpenAPI v3 schema published at &lt;code&gt;/openapi/v3&lt;/code&gt;. Invalid names, empty required fields, wrong field types, all rejected here with a &lt;code&gt;422 Invalid&lt;/code&gt;.&lt;/p&gt;
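
&lt;p&gt;Easy to trigger on purpose. Pod names must be lowercase RFC 1123 labels, so this fails at exactly this step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run Nginx_Pod --image nginx
# Rejected with a 422: metadata.name: Invalid value: "Nginx_Pod" ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;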

&lt;h3&gt;
  
  
  Step 10: Validating admission
&lt;/h3&gt;

&lt;p&gt;Validating admission is a second admission pass that cannot mutate. Built-ins include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ResourceQuota&lt;/code&gt;. Do the namespace's quotas have room for this pod's requests?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodSecurity&lt;/code&gt;. Does the pod meet the restricted, baseline, or privileged profile the namespace is labeled with?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Every &lt;code&gt;ValidatingAdmissionWebhook&lt;/code&gt; you have registered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ValidatingAdmissionPolicy&lt;/code&gt;. CEL based in-process validation, GA since 1.30. A great replacement for Kyverno or OPA in many cases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So here is the subtle bit. Mutating admission runs before validating admission. If a user's webhook mutates a field, the validating chain sees the mutated value, not the original. This ordering is easy to get wrong in your head, and it matters when you are writing policy.&lt;/p&gt;
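
&lt;p&gt;As a sketch of the CEL path (assuming a 1.30+ cluster; the policy name and the &lt;code&gt;team&lt;/code&gt; label are made up for illustration), a &lt;code&gt;ValidatingAdmissionPolicy&lt;/code&gt; plus its binding looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply -f - &lt;&lt;EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-team-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  validations:
  - expression: "has(object.metadata.labels) &amp;&amp; 'team' in object.metadata.labels"
    message: "every pod needs a team label"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-team-label
spec:
  policyName: require-team-label
  validationActions: ["Deny"]
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;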

&lt;h3&gt;
  
  
  Step 11: etcd plus Raft quorum
&lt;/h3&gt;

&lt;p&gt;Now the API server persists the pod. This is not a plain disk write. etcd is a Raft replicated key value store. The leader appends the entry to its Raft log, replicates to followers, each node fsyncs to disk, and only after a majority (3 of 5 in a typical HA setup) acks does the leader commit. The API server's storage layer blocks on that commit.&lt;/p&gt;

&lt;p&gt;So if you ever see API latency spike, it is almost always etcd disk latency. Check &lt;code&gt;etcd_disk_wal_fsync_duration_seconds&lt;/code&gt;. Keep that metric in your head for the next time you are debugging a slow cluster.&lt;/p&gt;
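
&lt;p&gt;On a kubeadm-style control plane, etcd usually serves its metrics on localhost. The port depends on your &lt;code&gt;--listen-metrics-urls&lt;/code&gt; flag, so treat this as a sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run on a control-plane node
curl -s http://127.0.0.1:2381/metrics \
  | grep etcd_disk_wal_fsync_duration_seconds | tail -n 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;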

&lt;h3&gt;
  
  
  Step 12: 201 Created
&lt;/h3&gt;

&lt;p&gt;The API server responds &lt;code&gt;201 Created&lt;/code&gt; with the full defaulted and mutated pod object in the body. kubectl prints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pod/nginx created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From your terminal's perspective, it is done. From the cluster's perspective, the real work has not started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3, the control loop hands off
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 13: Watch fanout
&lt;/h3&gt;

&lt;p&gt;Every long running component in Kubernetes holds an HTTP/2 watch stream to the API server. The scheduler watches unscheduled pods. Every kubelet watches pods bound to its node. Controllers watch their respective resources.&lt;/p&gt;

&lt;p&gt;So when a new pod is written to etcd, the API server's watch cache broadcasts the event to all subscribers. No polling, no round trips, just a chunked HTTP/2 frame per event. Milliseconds. Source: &lt;code&gt;staging/src/k8s.io/apiserver/pkg/storage/cacher&lt;/code&gt;.&lt;/p&gt;
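
&lt;p&gt;You can hold one of these streams yourself and watch the fanout happen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Terminal 1: hold a watch, printing the event type for each update
kubectl get pods --watch --output-watch-events

# Terminal 2: create the pod and watch ADDED / MODIFIED events stream in
kubectl run nginx --image nginx

# Or speak to the watch cache directly over the raw API
kubectl get --raw "/api/v1/namespaces/default/pods?watch=true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;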

&lt;h3&gt;
  
  
  Step 14: Scheduler, Filter
&lt;/h3&gt;

&lt;p&gt;kube-scheduler receives the event. The pod has no &lt;code&gt;spec.nodeName&lt;/code&gt;, so it is scheduler's problem. The scheduler runs a configurable pipeline of plugins, grouped into extension points. &lt;code&gt;PreFilter&lt;/code&gt;, &lt;code&gt;Filter&lt;/code&gt;, &lt;code&gt;PostFilter&lt;/code&gt;, &lt;code&gt;PreScore&lt;/code&gt;, &lt;code&gt;Score&lt;/code&gt;, &lt;code&gt;Reserve&lt;/code&gt;, &lt;code&gt;Permit&lt;/code&gt;, &lt;code&gt;PreBind&lt;/code&gt;, &lt;code&gt;Bind&lt;/code&gt;, &lt;code&gt;PostBind&lt;/code&gt;. For filter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeResourcesFit&lt;/code&gt;. The node has enough allocatable CPU, memory, and ephemeral storage for the pod's requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeAffinity&lt;/code&gt;. The pod's &lt;code&gt;nodeAffinity&lt;/code&gt; and &lt;code&gt;nodeSelector&lt;/code&gt; match the node's labels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;TaintToleration&lt;/code&gt;. The pod tolerates the node's taints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodTopologySpread&lt;/code&gt;. The placement respects any topology spread constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;VolumeBinding&lt;/code&gt;. All unbound PVCs can be bound to volumes reachable from this node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;InterPodAffinity&lt;/code&gt; (at the filter level for hard constraints).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any node that fails any filter is eliminated. Plugin source: &lt;code&gt;pkg/scheduler/framework/plugins&lt;/code&gt;.&lt;/p&gt;
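
&lt;p&gt;The fastest way to see a filter verdict in the wild is to make every node fail one. The selector below matches nothing on purpose, and the node counts in the event will reflect your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run stuck --image nginx \
  --overrides='{"spec":{"nodeSelector":{"disktype":"does-not-exist"}}}'
kubectl describe pod stuck | tail -n 3
# Warning  FailedScheduling ... 0/3 nodes are available:
# 3 node(s) didn't match Pod's node affinity/selector. ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;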

&lt;h3&gt;
  
  
  Step 15: Scheduler, Score
&lt;/h3&gt;

&lt;p&gt;Surviving nodes get scored by a second set of plugins.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeResourcesBalancedAllocation&lt;/code&gt;. Prefers nodes with balanced CPU and memory utilization, so you don't pack a CPU heavy pod onto an already CPU saturated node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ImageLocality&lt;/code&gt;. Prefers nodes that already have the container image cached locally. This saves image pull time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;InterPodAffinity&lt;/code&gt;. Soft affinity and anti-affinity preferences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeAffinity&lt;/code&gt;. Soft (preferred) affinity terms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;TaintToleration&lt;/code&gt;. Soft toleration scoring.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each plugin returns a score 0 to 100 per node. Scores are normalized, weighted, and summed. Highest total wins. Ties are broken with a random pick using Go's &lt;code&gt;rand.Int()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One thing to flag here. Kubernetes 1.36 graduated Dynamic Resource Allocation (DRA) to GA. If you are scheduling GPU workloads or other devices through DRA, the scheduler's resource claim handling is now stable. Worth reading the KEP if you are running AI workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 16: Scheduler, Bind
&lt;/h3&gt;

&lt;p&gt;The scheduler POSTs to the binding subresource. &lt;code&gt;POST /api/v1/namespaces/default/pods/nginx/binding&lt;/code&gt; with &lt;code&gt;target.name=node-1&lt;/code&gt;. This is what actually updates &lt;code&gt;spec.nodeName&lt;/code&gt; in etcd. One more Raft write.&lt;/p&gt;

&lt;p&gt;So here is a fun detail. The scheduler never writes &lt;code&gt;spec.nodeName&lt;/code&gt; directly on the pod. It always goes through binding. This exists precisely because binding is a separate privilege you can RBAC.&lt;/p&gt;
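
&lt;p&gt;You can drive the subresource by hand. A sketch: park a pod by pointing it at a scheduler that does not exist, then bind it yourself (&lt;code&gt;node-1&lt;/code&gt; is a placeholder for a real node name):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl run pinned --image nginx \
  --overrides='{"spec":{"schedulerName":"no-such-scheduler"}}'

# POST a Binding object to the subresource, exactly like kube-scheduler
kubectl create --raw /api/v1/namespaces/default/pods/pinned/binding -f - &lt;&lt;EOF
{"apiVersion":"v1","kind":"Binding","metadata":{"name":"pinned"},"target":{"apiVersion":"v1","kind":"Node","name":"node-1"}}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;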

&lt;h2&gt;
  
  
  Phase 4, the kubelet brings the pod to life
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 17: Kubelet SyncPod
&lt;/h3&gt;

&lt;p&gt;Kubelet on the bound node has been watching &lt;code&gt;pods?fieldSelector=spec.nodeName=node-1&lt;/code&gt; since startup. It sees the update, runs its pod admission checks (eviction pressure, kubelet level &lt;code&gt;PodSecurityContext&lt;/code&gt; sanity), and calls &lt;code&gt;syncPod&lt;/code&gt; in &lt;code&gt;pkg/kubelet/kubelet.go&lt;/code&gt;. SyncPod is the reconciliation loop. It compares the desired pod spec with the current runtime state and issues CRI calls to bring them into alignment.&lt;/p&gt;
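
&lt;p&gt;You can reproduce the kubelet's filtered view as a one-off list (&lt;code&gt;node-1&lt;/code&gt; is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods --all-namespaces --field-selector spec.nodeName=node-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;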

&lt;h3&gt;
  
  
  Step 18: CRI, sandbox and the pause container
&lt;/h3&gt;

&lt;p&gt;Before any app container runs, the kubelet creates a pod sandbox. It calls &lt;code&gt;RunPodSandbox&lt;/code&gt; over the CRI gRPC API on the runtime's socket (&lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt; by default). containerd launches the pause container. A tiny statically linked binary whose entire job is to call &lt;code&gt;pause(2)&lt;/code&gt; and block forever as PID 1.&lt;/p&gt;

&lt;p&gt;But why? Because the pause container is what owns the pod's Linux namespaces, especially the network namespace. When you add more containers to the pod, they &lt;code&gt;setns&lt;/code&gt; into the pause container's namespaces. If an app container dies and restarts, the namespaces (and the IP) survive because pause is still there.&lt;/p&gt;
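
&lt;p&gt;With node access and &lt;code&gt;crictl&lt;/code&gt; installed, the sandbox and its pause process are easy to spot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On the node that won the binding
crictl pods --name nginx            # the sandbox, with its own ID
ps -ef | grep -w /pause | grep -v grep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;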

&lt;h3&gt;
  
  
  Step 19: CNI, pod gets networking
&lt;/h3&gt;

&lt;p&gt;With the sandbox up, the runtime invokes the CNI plugin specified in &lt;code&gt;/etc/cni/net.d/*.conflist&lt;/code&gt; (whichever is lexically first). Calico, Flannel, Cilium, Weave, the plugin you installed. CNI's contract is simple. A binary that reads JSON from stdin, takes an action (&lt;code&gt;ADD&lt;/code&gt;, &lt;code&gt;DEL&lt;/code&gt;, &lt;code&gt;CHECK&lt;/code&gt;), and returns JSON to stdout. The &lt;code&gt;ADD&lt;/code&gt; call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Creates a veth pair. One end in the pod's network namespace, one end on the node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allocates an IP from the pod CIDR. IPAM is either a local store, Kubernetes IPAM, or an external controller.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Programs routes and iptables or eBPF rules on the host.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optionally sets up sysctls inside the pod's netns.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When this returns, &lt;code&gt;kubectl get pod -o wide&lt;/code&gt; will start showing &lt;code&gt;podIP&lt;/code&gt;.&lt;/p&gt;
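
&lt;p&gt;Both halves of that handoff are observable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pod nginx -o wide       # podIP populated once ADD returns

# On the node: the host side of each veth pair
ip -brief link show type veth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;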

&lt;h3&gt;
  
  
  Step 20: Image pull
&lt;/h3&gt;

&lt;p&gt;Kubelet calls &lt;code&gt;PullImage&lt;/code&gt; over CRI. containerd resolves the reference (&lt;code&gt;nginx&lt;/code&gt; to &lt;code&gt;docker.io/library/nginx:latest&lt;/code&gt;), fetches the manifest, then pulls each layer in parallel, verifying SHA-256 digests on every chunk. First pull for a popular image over broadband is a few seconds. Cached? About 100 ms. containerd just revalidates the manifest and returns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 21: Container create
&lt;/h3&gt;

&lt;p&gt;With the image unpacked, the runtime assembles the container.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stacks the image layers as read only lower layers and adds a writable upper layer using overlayfs. The result is the container's rootfs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Writes the OCI runtime spec (&lt;code&gt;config.json&lt;/code&gt;). A JSON document describing every mount, every namespace flag, every capability, the seccomp profile, the apparmor profile, the cgroup limits, the user, the entrypoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Creates a bundle directory containing the rootfs and &lt;code&gt;config.json&lt;/code&gt; and hands it to runc with &lt;code&gt;runc create&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OCI runtime spec lives in the &lt;code&gt;opencontainers/runtime-spec&lt;/code&gt; repo. This is the same spec Podman, CRI-O, and gVisor use. It is the portability boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 5, runc, namespaces, and the first breath
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 22: runc
&lt;/h3&gt;

&lt;p&gt;So this is the single coolest part of the whole pipeline. runc takes the bundle and does the following.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Calls &lt;code&gt;clone3&lt;/code&gt; with flags &lt;code&gt;CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC&lt;/code&gt;. On a modern kernel, &lt;code&gt;clone3&lt;/code&gt; is preferred over the older &lt;code&gt;clone&lt;/code&gt; because it takes a structured argument and supports more namespace flags cleanly. The network namespace is not created here. Instead, runc uses &lt;code&gt;setns&lt;/code&gt; to enter the sandbox's network namespace that CNI created earlier, so the new container shares the pod IP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inside the new process, mounts &lt;code&gt;/proc&lt;/code&gt; for the new PID namespace.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pivot_root&lt;/code&gt; into the overlayfs rootfs, then unmounts the old root.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Drops Linux capabilities to the OCI spec's bounding set. The default for a non-privileged container is a tight whitelist. No &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt;, no &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Applies the seccomp filter. The runtime default profile blocks around 40 syscalls, like &lt;code&gt;kexec_load&lt;/code&gt;, certain &lt;code&gt;unshare&lt;/code&gt; flags, and &lt;code&gt;bpf&lt;/code&gt; without capability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joins the cgroup v2 hierarchy with the configured CPU and memory limits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Calls &lt;code&gt;execve&lt;/code&gt; on the container's entrypoint, &lt;code&gt;nginx -g daemon off;&lt;/code&gt;. &lt;code&gt;execve&lt;/code&gt; is the syscall that replaces the current process image with a new program while keeping the PID. This is the moment nginx is alive.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you &lt;code&gt;strace -f&lt;/code&gt; runc during create, you will see this whole dance. It is worth doing once.&lt;/p&gt;
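
&lt;p&gt;If strace feels heavy, &lt;code&gt;lsns&lt;/code&gt; on the nginx process shows the resulting namespace split. This assumes nginx runs only in this one container on the node, otherwise pick the PID by hand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On the node
NGINX_PID=$(pgrep -o -x nginx)
lsns -p "$NGINX_PID"
# pid/mnt/uts/ipc are private; net is shared with the pause container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;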

&lt;h3&gt;
  
  
  Step 23: PLEG and the Running status
&lt;/h3&gt;

&lt;p&gt;Kubelet needs to know the container started. Historically, kubelet's PLEG (Pod Lifecycle Event Generator) polled the runtime every second via &lt;code&gt;ListContainers&lt;/code&gt;, diffed the result, and emitted events. On a big node with hundreds of pods, this was a measurable source of CPU load.&lt;/p&gt;

&lt;p&gt;So there is a newer path called Evented PLEG. It opens a CRI event stream (&lt;code&gt;ContainerEventsRequest&lt;/code&gt;) so containerd pushes events like &lt;code&gt;CONTAINER_STARTED_EVENT&lt;/code&gt; and &lt;code&gt;CONTAINER_STOPPED_EVENT&lt;/code&gt; as they happen. But here is the thing. Evented PLEG is still alpha in 1.36. It was alpha in 1.25, promoted to beta in 1.27, then reverted to alpha in 1.30 after a static pod bug. It is disabled by default. So if you are reading kubelet code today, assume the polling path is what is actually running on your cluster.&lt;/p&gt;

&lt;p&gt;When kubelet sees a new container has started (through polling or evented), the status manager computes the pod's phase as Running and patches &lt;code&gt;pod.status&lt;/code&gt; back to the API server via a JSON merge patch. Watchers (you, with &lt;code&gt;kubectl get pod -w&lt;/code&gt;) see the transition. The status patch is also the signal to any controller waiting on this pod. For example, the endpoints controller, which is about to add the pod's IP to a Service's &lt;code&gt;EndpointSlice&lt;/code&gt;.&lt;/p&gt;
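
&lt;p&gt;The transition is easy to catch from the outside (the IP below is made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pod nginx --watch \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,IP:.status.podIP
# nginx   Pending   &lt;none&gt;
# nginx   Running   10.244.1.23
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;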

&lt;p&gt;And that is the whole journey. From &lt;code&gt;argv[1]&lt;/code&gt; in your shell to nginx serving on port 80, about a second on a warm cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/kubernetes/kubernetes" rel="noopener noreferrer"&gt;kubernetes/kubernetes&lt;/a&gt;. The source tree. Start in &lt;code&gt;pkg/kubelet&lt;/code&gt;, &lt;code&gt;pkg/scheduler&lt;/code&gt;, &lt;code&gt;staging/src/k8s.io/apiserver&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/kubernetes/cri-api" rel="noopener noreferrer"&gt;CRI spec&lt;/a&gt;. The gRPC contract between kubelet and the runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/containernetworking/cni" rel="noopener noreferrer"&gt;CNI spec&lt;/a&gt;. The plugin contract for pod networking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/opencontainers/runtime-spec" rel="noopener noreferrer"&gt;OCI runtime spec&lt;/a&gt;. The container bundle and config format runc consumes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/opencontainers/image-spec" rel="noopener noreferrer"&gt;OCI image spec&lt;/a&gt;. Manifests, layers, and the SHA-256 content addressable model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3386-kubelet-evented-pleg" rel="noopener noreferrer"&gt;KEP-3386 Evented PLEG&lt;/a&gt;. The design doc for the CRI event driven PLEG, still alpha in 1.36.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins" rel="noopener noreferrer"&gt;kube-scheduler plugin docs&lt;/a&gt;. The official list of in-tree plugins and their extension points.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Watch and play
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Video (6 min): &lt;a href="https://youtu.be/LLuUhU3SwJo?si=GyN5qYp71OgXMWFA" rel="noopener noreferrer"&gt;What Actually Happens When You Run kubectl run nginx (23 Steps)&lt;/a&gt; on the Kubesimplify YouTube channel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive site: &lt;a href="https://kubernetes-explained.vercel.app/pod" rel="noopener noreferrer"&gt;kubernetes-explained.vercel.app/pod&lt;/a&gt;. Pause, scrub, jump to any step, copy the code for yourself.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you liked this, the next one in the series is the scheduler deep-dive. How &lt;code&gt;kube-scheduler&lt;/code&gt; actually decides. Subscribe on the channel so you catch it, and tell me in the comments which step surprised you. That is how I know what to unpack next.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>containers</category>
    </item>
    <item>
      <title>What Actually Happens When kube-scheduler Picks a Node (13 Stages Inside Kubernetes)</title>
      <dc:creator>saiyam1814</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:23:24 +0000</pubDate>
      <link>https://dev.to/saiyam1814/what-actually-happens-when-kube-scheduler-picks-a-node-13-stages-inside-kubernetes-3g6o</link>
      <guid>https://dev.to/saiyam1814/what-actually-happens-when-kube-scheduler-picks-a-node-13-stages-inside-kubernetes-3g6o</guid>
      <description>

&lt;p&gt;Your pod has just been written to etcd. The API server returned &lt;code&gt;201 Created&lt;/code&gt;. The pod exists. But &lt;code&gt;spec.nodeName&lt;/code&gt; is still empty, and that is the entire reason this post exists.&lt;/p&gt;

&lt;p&gt;A pod with no node is not a real workload. It is a row in a database. Something has to look at it, decide which machine should run it, and atomically claim that machine. That something is &lt;code&gt;kube-scheduler&lt;/code&gt;, and the way it makes the decision is more interesting than "pick the node with the most free CPU."&lt;/p&gt;

&lt;p&gt;There are thirteen separate stages in modern scheduling. The Filter stage alone runs fourteen in-tree plugins, each one capable of disqualifying a candidate node with a single &lt;code&gt;Unschedulable&lt;/code&gt; verdict. There is no appeal, no second chance, no "best effort." Either every plugin says yes, or that node is out.&lt;/p&gt;

&lt;p&gt;This post walks every stage end-to-end against the v1.36 source code, with verbatim outputs from a real cluster at the bottom.&lt;/p&gt;

&lt;p&gt;%[&lt;a href="https://youtu.be/N-dDSCVWdqU" rel="noopener noreferrer"&gt;https://youtu.be/N-dDSCVWdqU&lt;/a&gt;] &lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR, the 13 stages
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cub5gs94ac0vgw1c5ub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cub5gs94ac0vgw1c5ub.png" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PreEnqueue.&lt;/strong&gt; Gating plugins decide if the pod is even allowed into the queue. SchedulingGates lives here. If a gate is set, the pod waits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QueueSort.&lt;/strong&gt; The activeQ orders pods by priority. Higher priority first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PreFilter.&lt;/strong&gt; Eleven plugins precompute what the pod actually wants. Resources, affinity terms, topology spread, all stashed in CycleState. Compute once, read many times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Filter.&lt;/strong&gt; Fourteen plugins each test every node in parallel. NodeUnschedulable, NodeName, TaintToleration, NodeAffinity, NodePorts, NodeResourcesFit, VolumeRestrictions, NodeVolumeLimits, VolumeBinding, VolumeZone, PodTopologySpread, InterPodAffinity, DynamicResources, NodeDeclaredFeatures. One Unschedulable verdict and the node is out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostFilter.&lt;/strong&gt; Only fires if every node failed Filter. DefaultPreemption asks, "if I evicted some lower priority pods, could this one fit?" If yes, it picks victims and the pod retries next cycle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PreScore.&lt;/strong&gt; Same trick as PreFilter. Plugins that do heavy per node work during scoring precompute once and cache.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Score.&lt;/strong&gt; Nine plugins rate every surviving node, zero to one hundred. In parallel. Each plugin has a weight. TaintToleration is three. NodeAffinity, InterPodAffinity, PodTopologySpread, DynamicResources are all two. Rest are one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NormalizeScore.&lt;/strong&gt; Rescales every plugin's output. Then for each node, multiply scores by weights, add it all up. Highest sum wins. Ties? Go's &lt;code&gt;rand.Int&lt;/code&gt;. Yes, random. Deterministic ties would hot spot the same node every time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reserve.&lt;/strong&gt; The scheduler subtracts the pod's requests from the winning node's in memory snapshot. So the next pod in the same cycle sees that node as already loaded.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Permit.&lt;/strong&gt; A hook. A plugin can Approve, Wait, or Reject. Stock cluster, no op. But Kueue, Volcano, Coscheduling all wait here for gang scheduling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PreBind.&lt;/strong&gt; Last chance to do work before the API server gets told. VolumeBinding finalizes PVC binds here.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bind.&lt;/strong&gt; The DefaultBinder updates &lt;code&gt;spec.nodeName&lt;/code&gt; on the pod via the API server. Now etcd has the assignment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostBind.&lt;/strong&gt; Cleanup. The pod is gone from the scheduler's queue.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the whole walkthrough. The rest of this post is the part that does not fit in a tweet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scheduling framework
&lt;/h2&gt;

&lt;p&gt;Since Kubernetes 1.19, kube-scheduler has been built on top of the &lt;strong&gt;scheduling framework&lt;/strong&gt; (KEP-624, beta in 1.18, GA in 1.19). The core of the binary is small and intentionally dumb. All of the actual decision-making lives in plugins, registered at well-defined extension points.&lt;/p&gt;

&lt;p&gt;This separation is what makes the rest of the ecosystem possible. You can disable plugins. You can write your own as a Go module or behind a webhook. You can run multiple scheduler profiles side by side and let pods pick one with &lt;code&gt;spec.schedulerName&lt;/code&gt;. Most installations never touch the configuration, but if you have ever wondered how Volcano, Kueue, or Coscheduling plug into the scheduler without forking it, this is the answer: they register against the framework's extension points and the core just calls them at the right time.&lt;/p&gt;

&lt;p&gt;The thirteen extension points are not arbitrary. Each one corresponds to a moment in the pod's lifecycle where it makes sense to ask plugins a question. &lt;em&gt;Should this pod even enter the queue?&lt;/em&gt; That is &lt;code&gt;PreEnqueue&lt;/code&gt;. &lt;em&gt;Is this node a candidate?&lt;/em&gt; That is &lt;code&gt;Filter&lt;/code&gt;. &lt;em&gt;Among the candidates, which one is the best fit?&lt;/em&gt; That is &lt;code&gt;Score&lt;/code&gt;. The framework gives you the seam; the plugin fills in the logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three queues, before any plugin runs
&lt;/h2&gt;

&lt;p&gt;Before any plugin gets called, the pod has to make it into the right queue. The scheduler maintains three of them, and they each serve a different purpose.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;activeQ&lt;/strong&gt; is a priority heap. Unscheduled pods are ordered by &lt;code&gt;spec.priority&lt;/code&gt;, and the scheduler always pops from the head. Higher-priority pods cut in line, which is exactly what you want for things like critical control-plane pods or paid-tier workloads.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;backoffQ&lt;/strong&gt; holds pods that just failed a scheduling attempt. They sit there for a small (and exponentially growing) timeout before being promoted back into the activeQ. This is not laziness; it is a correctness property. If a pod could not be scheduled in this cycle, retrying it immediately almost always fails the same way. Backoff lets the cluster state change first.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;unschedulableQ&lt;/strong&gt; (the source actually calls it &lt;code&gt;unschedulablePods&lt;/code&gt;, but the docs and the metrics use the queue suffix) is an indexed map of pods that have been declared unschedulable for now. These pods do not retry on a timer. They retry on &lt;em&gt;events&lt;/em&gt;. If a new node is added, an informer event fires &lt;code&gt;MoveAllToActiveOrBackoffQueue&lt;/code&gt; and they all get a fresh shot. Same thing if a pod is deleted and frees up resources. There is also a five-minute fallback timer for pods that have been waiting too long, in case the event stream missed an update.&lt;/p&gt;

&lt;p&gt;All three queues live in &lt;code&gt;pkg/scheduler/backend/queue/scheduling_queue.go&lt;/code&gt;. Their names are also exposed as labels on the &lt;code&gt;scheduler_pending_pods&lt;/code&gt; metric, which is the easiest way to debug a stuck cluster: a queue full of pods in &lt;code&gt;Unschedulable&lt;/code&gt; is telling you something different than a queue full of pods in &lt;code&gt;Backoff&lt;/code&gt;.&lt;/p&gt;
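
&lt;p&gt;A hedged sketch of reading that metric straight off the scheduler's secure port. This assumes a kubeadm-style layout, and the service account is hypothetical, stand in whatever identity you have bound to the &lt;code&gt;/metrics&lt;/code&gt; nonResourceURL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On a control-plane node
TOKEN=$(kubectl create token metrics-reader -n kube-system)   # hypothetical SA
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  https://127.0.0.1:10259/metrics | grep scheduler_pending_pods
# scheduler_pending_pods{queue="active"} 0
# scheduler_pending_pods{queue="backoff"} 2
# scheduler_pending_pods{queue="unschedulable"} 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;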

&lt;h2&gt;
  
  
  Stage 1: PreEnqueue (the gate)
&lt;/h2&gt;

&lt;p&gt;PreEnqueue plugins decide whether a pod is even allowed into the activeQ. If any plugin returns &lt;code&gt;Unschedulable&lt;/code&gt;, the pod sits in the unschedulableQ until something causes its gate to clear.&lt;/p&gt;

&lt;p&gt;The canonical example is the &lt;code&gt;SchedulingGates&lt;/code&gt; plugin. By setting &lt;code&gt;spec.schedulingGates&lt;/code&gt; on a pod, you can create the pod object now but defer its scheduling until you explicitly remove the gate. This pattern shows up in batch workloads, in cost-aware scheduling controllers, and in anything that wants to express "this pod exists but isn't ready to run yet."&lt;/p&gt;

&lt;p&gt;Most pods sail through PreEnqueue with no gates set, but it is the very first checkpoint and worth knowing about.&lt;/p&gt;
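
&lt;p&gt;A minimal gated pod, and the patch that lifts the gate (the gate name is arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply -f - &lt;&lt;EOF
apiVersion: v1
kind: Pod
metadata:
  name: gated
spec:
  schedulingGates:
  - name: example.com/quota-check
  containers:
  - name: app
    image: nginx
EOF

kubectl get pod gated     # STATUS shows SchedulingGated
kubectl patch pod gated --type json \
  -p '[{"op":"remove","path":"/spec/schedulingGates/0"}]'
kubectl get pod gated     # now it enters the activeQ and schedules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;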

&lt;h2&gt;
  
  
  Stage 2: QueueSort (the order)
&lt;/h2&gt;

&lt;p&gt;Pods waiting in the activeQ have to be ordered somehow. QueueSort plugins define that order. The default is &lt;code&gt;PrioritySort&lt;/code&gt;: it ranks pods by &lt;code&gt;spec.priority&lt;/code&gt; (an integer) descending, and falls back to creation timestamp for ties. Older pod with the same priority wins.&lt;/p&gt;

&lt;p&gt;There is one plugin, it does one thing, and you almost never want to change it. Worth a sentence in the model, not much more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 3: PreFilter (cache once)
&lt;/h2&gt;

&lt;p&gt;Once a pod is popped off the activeQ, the scheduler's first real job is to look at what the pod actually wants. That is PreFilter, and it runs exactly once per pod per cycle.&lt;/p&gt;

&lt;p&gt;The default profile registers eleven PreFilter plugins, each one extracting a different facet of the pod's requirements: &lt;code&gt;NodeResourcesFit&lt;/code&gt; pulls out CPU, memory, and extended-resource requests; &lt;code&gt;NodeAffinity&lt;/code&gt; normalizes the affinity term tree; &lt;code&gt;PodTopologySpread&lt;/code&gt; builds its per-topology-key constraint sets; &lt;code&gt;InterPodAffinity&lt;/code&gt; walks the affinity and anti-affinity rules; &lt;code&gt;VolumeBinding&lt;/code&gt; figures out which PVCs still need binding; and so on.&lt;/p&gt;

&lt;p&gt;All of this work is cached in a &lt;code&gt;framework.CycleState&lt;/code&gt; object. Think of &lt;code&gt;CycleState&lt;/code&gt; as a per-pod scratch pad: compute the expensive things once, read them many times. The reason it matters becomes obvious in the next stage, where each plugin is about to be called several thousand times in tight loops.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bhj96i2wtgu1qcbqcjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bhj96i2wtgu1qcbqcjt.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 4: Filter (every node, every plugin, in parallel)
&lt;/h2&gt;

&lt;p&gt;Filter is where the bulk of the scheduling work actually happens. Fourteen plugins are run against every candidate node, with the nodes evaluated in parallel, and any single &lt;code&gt;Unschedulable&lt;/code&gt; verdict eliminates that node from the rest of the cycle.&lt;/p&gt;

&lt;p&gt;Here is the verified list, straight from &lt;code&gt;pkg/scheduler/apis/config/testing/defaults/defaults.go&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeUnschedulable&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeName&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;TaintToleration&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeAffinity&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodePorts&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeResourcesFit&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;VolumeRestrictions&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeVolumeLimits&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;VolumeBinding&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;VolumeZone&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodTopologySpread&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;InterPodAffinity&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;DynamicResources&lt;/code&gt; (went GA in 1.36)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeDeclaredFeatures&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each plugin receives the pod, the candidate node's info, and the &lt;code&gt;CycleState&lt;/code&gt; that PreFilter built up. Each plugin returns &lt;code&gt;Success&lt;/code&gt; or &lt;code&gt;Unschedulable&lt;/code&gt;. If any of them says &lt;code&gt;Unschedulable&lt;/code&gt;, that node is gone. There is no aggregation, no scoring at this stage, no "well, three out of four said yes." It is binary, and that is what makes Filter fast: the scheduler evaluates candidate nodes in parallel (sixteen goroutines by default) and, on each node, runs the 14 plugins in sequence, short-circuiting on the first failure.&lt;/p&gt;

&lt;p&gt;Most engineers will only ever care about a handful of these by name. A quick tour:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NodeUnschedulable&lt;/code&gt; is the first line of defense. If the node has &lt;code&gt;spec.unschedulable: true&lt;/code&gt;, this plugin filters it out. That is exactly what &lt;code&gt;kubectl cordon&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NodeName&lt;/code&gt; is the simplest possible filter. If the pod has &lt;code&gt;spec.nodeName&lt;/code&gt; set (you can set it manually and skip the scheduler entirely), only that node passes; the scheduler effectively becomes a no-op.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TaintToleration&lt;/code&gt; is the one most engineers will recognize. The node has taints, the pod has tolerations, and any unmatched &lt;code&gt;NoSchedule&lt;/code&gt; or &lt;code&gt;NoExecute&lt;/code&gt; taint kills candidacy. The "GPU node" pattern in the demo at the bottom of this post is just a &lt;code&gt;NoSchedule&lt;/code&gt; taint that nothing tolerates.&lt;/p&gt;
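&lt;p&gt;In YAML terms, the two halves of that contract look like this; names are illustrative, and the taint mirrors the demo's fake GPU node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# node side, applied with: kubectl taint nodes NODE workload=gpu:NoSchedule
# pod side, a pod that tolerates the taint and stays a candidate:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                      # hypothetical pod
spec:
  tolerations:
  - key: workload
    operator: Equal
    value: gpu
    effect: NoSchedule
  containers:
  - name: main
    image: nginx:1.27
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;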

&lt;p&gt;&lt;code&gt;NodeAffinity&lt;/code&gt; evaluates the pod's &lt;code&gt;spec.affinity.nodeAffinity&lt;/code&gt; rules. Required affinity terms must match here at Filter; preferred terms get scored later.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NodeResourcesFit&lt;/code&gt; is the one most people intuitively understand. Does the node have enough free CPU, memory, and any other Kubernetes resource (hugepages, GPUs, custom resources) to fit the pod's requests? Notably, only requests are considered, not limits, which is why a node can be massively over-subscribed on limits and the scheduler still happily places more pods.&lt;/p&gt;
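&lt;p&gt;Concretely: in the sketch below, only the &lt;code&gt;requests&lt;/code&gt; block enters the scheduler's math. The &lt;code&gt;limits&lt;/code&gt; block is enforced later, at runtime, by the kubelet and the container runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: requests-vs-limits           # hypothetical pod
spec:
  containers:
  - name: app
    image: nginx:1.27
    resources:
      requests:                      # what NodeResourcesFit counts against node capacity
        cpu: 500m
        memory: 256Mi
      limits:                        # invisible to the scheduler
        cpu: "4"
        memory: 2Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;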

&lt;p&gt;&lt;code&gt;VolumeBinding&lt;/code&gt; deserves a paragraph of its own. If the pod has PVCs that are not yet bound, VolumeBinding has to decide whether each unbound PVC &lt;em&gt;could&lt;/em&gt; be bound on this specific node. For a &lt;code&gt;WaitForFirstConsumer&lt;/code&gt; storage class the answer depends on zone, on the storage backend's topology, and on which PVs exist. VolumeBinding doesn't just filter; it also remembers the provisioning plan it chose, and that plan gets locked in during the Reserve stage further down.&lt;/p&gt;
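&lt;p&gt;The storage-class field that triggers this node-aware path is a single line. A sketch, with an invented name and a provisioner that depends on your platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-ssd           # hypothetical name
provisioner: pd.csi.storage.gke.io   # example CSI driver; substitute your own
volumeBindingMode: WaitForFirstConsumer   # defer binding until a pod is scheduled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;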

&lt;p&gt;&lt;code&gt;DynamicResources&lt;/code&gt; is the new kid on the block. It implements the DRA framework, which went GA in 1.36. If your pod uses ResourceClaims (the modern way to ask for GPUs and other devices), DynamicResources is the plugin that figures out whether a node can satisfy the claim.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NodeDeclaredFeatures&lt;/code&gt; is newer still. It compares features the node has declared against the pod's required features, and is feature-gated in some configurations.&lt;/p&gt;

&lt;p&gt;Run the chain against every node, collect the verdicts, and whatever survives all 14 checks moves on. If nothing survives, the scheduler doesn't give up: it runs PostFilter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 5: PostFilter (preemption, the expensive escape hatch)
&lt;/h2&gt;

&lt;p&gt;If every node failed Filter, the scheduler is in trouble. The pod is unschedulable on the cluster as it stands today. PostFilter exists for exactly this case, and the default plugin is &lt;code&gt;DefaultPreemption&lt;/code&gt;. It asks a single question: &lt;em&gt;if I evicted some lower-priority pods, could this one fit?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The algorithm sounds simple but is genuinely expensive. For each node, the scheduler:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Gathers all pods on the node with priority lower than the pending pod.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simulates evicting them one at a time, lowest priority first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After each simulated eviction, re-runs Filter on the hypothetical state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the pod becomes schedulable, the node is a candidate, and the minimum set of pods that need to die is recorded.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After all candidate nodes have been evaluated, the scheduler picks the "best" one: fewest victims, lowest victim priority, latest creation time as a tiebreaker. The exact ordering lives in &lt;code&gt;pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once a candidate is picked, two things happen. The scheduler sets &lt;code&gt;nominatedNodeName&lt;/code&gt; on the pending pod, so anyone watching the API can see this pod is targeting a specific node. Then it gracefully deletes the victims through the API server, respecting their &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt;. The pending pod itself goes back into the activeQ to be retried in the next cycle.&lt;/p&gt;

&lt;p&gt;This whole process is &lt;em&gt;expensive&lt;/em&gt;. The first Filter sweep already touched every node. Now the scheduler is running Filter again, multiple times, against simulated state per candidate. Tens to hundreds of milliseconds easily, seconds on large clusters. The good news is that the vast majority of pods never hit this path; they schedule cleanly on the first try.&lt;/p&gt;

&lt;p&gt;PostFilter has a second plugin now: &lt;code&gt;DynamicResources&lt;/code&gt;. Same idea, but for ResourceClaims rather than pods. If a Filter cycle failed because of a claim that is busy, DynamicResources' PostFilter can deallocate idle claims to make room.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 6: PreScore (cache once again)
&lt;/h2&gt;

&lt;p&gt;Filter has done its work. Maybe four nodes are left, maybe forty. Either way, it is time to score them, and the scheduler reuses the same precompute trick from PreFilter.&lt;/p&gt;

&lt;p&gt;Some Score plugins do expensive per-node work. To avoid recomputing the same inputs for every node, those plugins do the shared work once in PreScore and stash the result in &lt;code&gt;CycleState&lt;/code&gt;. The default PreScore plugins are &lt;code&gt;TaintToleration&lt;/code&gt;, &lt;code&gt;NodeAffinity&lt;/code&gt;, &lt;code&gt;NodeResourcesFit&lt;/code&gt;, &lt;code&gt;VolumeBinding&lt;/code&gt;, &lt;code&gt;PodTopologySpread&lt;/code&gt;, &lt;code&gt;InterPodAffinity&lt;/code&gt;, and &lt;code&gt;NodeResourcesBalancedAllocation&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;InterPodAffinity&lt;/code&gt; is the heaviest of the bunch. It walks the cluster's existing pods, builds a topology map of where each pod sits, and converts the new pod's affinity rules into an indexed structure. &lt;code&gt;PodTopologySpread&lt;/code&gt; does similar work, building per-topology-key counts.&lt;/p&gt;

&lt;p&gt;After PreScore, each individual Score call becomes effectively O(1): a lookup against precomputed state. Without it, scoring large clusters would be unworkable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 7: Score (the leaderboard, weighted)
&lt;/h2&gt;

&lt;p&gt;Now the actual ranking. Every Score plugin rates every surviving node from 0 to 100, in parallel.&lt;/p&gt;

&lt;p&gt;The default Score plugins, with weights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;TaintToleration&lt;/code&gt;, weight 3&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeAffinity&lt;/code&gt;, weight 2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeResourcesFit&lt;/code&gt;, weight 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;VolumeBinding&lt;/code&gt;, weight 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodTopologySpread&lt;/code&gt;, weight 2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;InterPodAffinity&lt;/code&gt;, weight 2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;DynamicResources&lt;/code&gt;, weight 2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NodeResourcesBalancedAllocation&lt;/code&gt;, weight 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ImageLocality&lt;/code&gt;, weight 1&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is nine plugins, all verified against &lt;code&gt;defaults.go&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The weights are not arbitrary. The source comments explain the reasoning: &lt;code&gt;TaintToleration&lt;/code&gt; is tripled because user-expressed taint preference is a strong signal. &lt;code&gt;NodeAffinity&lt;/code&gt;, &lt;code&gt;PodTopologySpread&lt;/code&gt;, &lt;code&gt;InterPodAffinity&lt;/code&gt;, and &lt;code&gt;DynamicResources&lt;/code&gt; are doubled because they encode user intent. The rest are weight 1 because they are infrastructure-level signals (balance, cache hits) that should influence the decision but not dominate it.&lt;/p&gt;

&lt;p&gt;It is worth zooming in on &lt;code&gt;ImageLocality&lt;/code&gt; for a moment. Once you understand it, you start noticing its effect everywhere.&lt;/p&gt;

&lt;p&gt;ImageLocality asks one question per node: how much of the container image is already cached there? A node holding all of the image's layers scores at or near 100; a cold node scores 0. (The raw score is proportional to the image bytes already present, scaled by how widely the image is spread across the cluster, but the intuition is binary: warm is good, cold is zero.)&lt;/p&gt;

&lt;p&gt;It matters because on a cold node, the kubelet has to pull the image over the network: seconds for a small image, tens of seconds for a fat ML or LLM image. On a warm node, the pod starts in milliseconds. ImageLocality is a soft preference (it doesn't filter), but it nudges the scheduler toward already-warm nodes when other things are equal, and the cumulative effect on workload startup latency is huge.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NodeResourcesFit&lt;/code&gt; also scores, and its scoring strategy is the knob you've probably tuned at some point. By default it uses &lt;code&gt;LeastAllocated&lt;/code&gt;, which prefers nodes with more free capacity (spreading the load). You can flip it to &lt;code&gt;MostAllocated&lt;/code&gt; for bin-packing, or to &lt;code&gt;RequestedToCapacityRatio&lt;/code&gt; for custom curves, via &lt;code&gt;KubeSchedulerConfiguration&lt;/code&gt;.&lt;/p&gt;
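&lt;p&gt;A sketch of the bin-packing flip, assuming the scheduler is started with &lt;code&gt;--config&lt;/code&gt; pointing at a file like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated          # pack nodes tightly instead of spreading
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;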

&lt;p&gt;&lt;code&gt;NodeResourcesBalancedAllocation&lt;/code&gt; is subtler. It rewards nodes whose CPU and memory utilization are balanced. A node at 80% CPU and 20% memory scores &lt;em&gt;worse&lt;/em&gt; than a node at 50%/50%, because imbalanced nodes are more likely to fragment future scheduling decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 8: NormalizeScore and picking a winner
&lt;/h2&gt;

&lt;p&gt;All nine plugins have scored every surviving node. The scheduler now picks the winner.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NormalizeScore&lt;/code&gt; rescales every plugin's output to a uniform 0 to 100 range. Some plugins return raw counts or other custom scales; this stage brings everything to the same units.&lt;/p&gt;

&lt;p&gt;For each node, the scheduler then sums &lt;code&gt;score × weight&lt;/code&gt; across all nine plugins. Highest sum wins.&lt;/p&gt;
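&lt;p&gt;A toy worked example, with invented numbers: suppose only &lt;code&gt;TaintToleration&lt;/code&gt; (weight 3) and &lt;code&gt;NodeAffinity&lt;/code&gt; (weight 2) are in play. Node A scores 100 and 0; node B scores 100 and 60. A totals 100×3 + 0×2 = 300; B totals 100×3 + 60×2 = 420. B wins, even though both matched the taint signal equally well, because the preferred-affinity signal broke the tie.&lt;/p&gt;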

&lt;p&gt;The interesting question is what happens when two nodes have exactly the same total. The scheduler picks one uniformly at random, and the randomness is deliberate.&lt;/p&gt;

&lt;p&gt;Random tie-breaking exists to prevent hot-spotting. Imagine two equally suitable nodes for a workload. If the scheduler always picked the first one in some deterministic order, every pod from that workload would pile onto the same node and the other one would sit empty. Randomization spreads the load.&lt;/p&gt;

&lt;p&gt;Uniformity is the point. A biased tiebreak (a naive modulo over a non-power-of-two range, say) would systematically favor some nodes, and over millions of scheduling decisions that skew turns into a real load imbalance. Go's standard-library random primitives are careful to avoid exactly that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 9: Reserve (claim the resources in memory, before the API knows)
&lt;/h2&gt;

&lt;p&gt;The winner is picked, but at this point the API server still does not know about the decision. As far as etcd is concerned, the pod is still unscheduled.&lt;/p&gt;

&lt;p&gt;Reserve fixes that locally. The scheduler takes the winning node's in-memory snapshot and subtracts whatever the pod requested: CPU, memory, extended resources, PVCs that need binding.&lt;/p&gt;

&lt;p&gt;A critical detail: the scheduler operates on &lt;strong&gt;requests&lt;/strong&gt;, not limits. And if your pod has no requests at all, the scheduler does not invent defaults; that is &lt;code&gt;LimitRanger&lt;/code&gt;'s job, much earlier at admission time. Here, Reserve subtracts whatever requests the pod has, even if it is zero. The scheduler's view of node capacity is purely request-based; a node could be massively over-subscribed on limits and the scheduler would never know or care.&lt;/p&gt;

&lt;p&gt;The reason Reserve happens in memory &lt;em&gt;before&lt;/em&gt; the bind is so the next pod in the same scheduling cycle sees this node as already loaded. Picture scaling a deployment to twenty replicas all at once: without Reserve, the scheduler's cache would still show the same node as fully free for every pod, and they would all pile onto it. Reserve makes the cache reflect the scheduler's intent immediately, even before etcd has acknowledged anything.&lt;/p&gt;

&lt;p&gt;If anything fails after Reserve, &lt;code&gt;Unreserve&lt;/code&gt; rolls it back. The in-memory subtraction is undone and the node looks free again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 10: Permit (the gang-scheduling hook)
&lt;/h2&gt;

&lt;p&gt;Permit is a hook with three possible outcomes per plugin. &lt;code&gt;Approve&lt;/code&gt; lets the bind proceed (the default). &lt;code&gt;Wait&lt;/code&gt; parks the pod, waiting for an external signal. &lt;code&gt;Reject&lt;/code&gt; fails scheduling outright.&lt;/p&gt;

&lt;p&gt;A stock cluster has no Permit plugins registered, so most pods sail through. But Permit is the seam where gang scheduling lives. Kueue, Volcano, and Coscheduling all register Permit plugins, and the pattern is the same: when the first pod of a gang arrives, return &lt;code&gt;Wait&lt;/code&gt; and park it. When the last pod of the gang arrives, signal all the parked pods to proceed. They all bind together, atomically.&lt;/p&gt;

&lt;p&gt;Without Permit, gang scheduling on Kubernetes would be effectively impossible. You would have to bind each pod individually and then evict the rest when one failed. Permit lets you wait at the right point, before any pod is bound, so failures cost nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stages 11, 12, 13: PreBind, Bind, PostBind (commit and clean up)
&lt;/h2&gt;

&lt;p&gt;Permit returned &lt;code&gt;Approve&lt;/code&gt;. Three stages left, all of them short.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PreBind&lt;/strong&gt; is the last opportunity to do work before the API server is told. The biggest user is &lt;code&gt;VolumeBinding&lt;/code&gt;: for dynamically provisioned PVCs, this is where the PV is actually created and the PVC's &lt;code&gt;spec.volumeName&lt;/code&gt; is set. By the time Bind runs, the PVC is bound and ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bind&lt;/strong&gt; does the actual API call. The default is &lt;code&gt;DefaultBinder&lt;/code&gt;, which calls &lt;code&gt;pods.Bind()&lt;/code&gt; on the API server: a special subresource that accepts a &lt;code&gt;Binding&lt;/code&gt; object and sets the pod's &lt;code&gt;spec.nodeName&lt;/code&gt;. etcd persists it via Raft, followers fsync, and the pod is now officially assigned.&lt;/p&gt;

&lt;p&gt;The kubelet on the chosen node has been watching the API server for pods with its own &lt;code&gt;nodeName&lt;/code&gt;. The instant the bind lands, the kubelet's informer fires. The pod is no longer the scheduler's concern; it now belongs to a different deep-dive (image pull, runc, the five syscalls).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostBind&lt;/strong&gt; is cleanup. The scheduler removes the pod from its internal queue, and that scheduling cycle is done.&lt;/p&gt;

&lt;h2&gt;
  
  
The live demo: preemption in action
&lt;/h2&gt;

&lt;p&gt;Theory only carries so far. To watch the scheduler actually preempt a pod, we ran this against a real cluster (Kubernetes 1.36.1, three workers, one tainted). What follows are verbatim outputs from the live recording.&lt;/p&gt;

&lt;p&gt;The setup: three worker nodes, with &lt;code&gt;kube-worker-3&lt;/code&gt; tainted as a fake GPU node so the scheduler refuses to put general workloads there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
kube-cp-01      Ready    control-plane   41d   v1.36.1
kube-worker-1   Ready    &amp;lt;none&amp;gt;          41d   v1.36.1
kube-worker-2   Ready    &amp;lt;none&amp;gt;          41d   v1.36.1
kube-worker-3   Ready    &amp;lt;none&amp;gt;          12d   v1.36.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl describe node kube-worker-3 | grep -E 'Taints|cpu:|memory:'
Taints:             workload=gpu:NoSchedule
  cpu:                8
  memory:             32852Mi
  cpu:                7800m
  memory:             30100Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
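&lt;p&gt;The setup command itself was not shown in the recording, but that taint was presumably applied with the standard one-liner (the trailing-dash form removes it again):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl taint nodes kube-worker-3 workload=gpu:NoSchedule
node/kube-worker-3 tainted

$ kubectl taint nodes kube-worker-3 workload=gpu:NoSchedule-   # undo, later
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;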



&lt;p&gt;We deploy a regular nginx pod requesting eight CPU. It schedules cleanly onto &lt;code&gt;kube-worker-1&lt;/code&gt; and starts up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl apply -f nginx-pod.yaml
pod/nginx-demo created

$ kubectl get events --sort-by=.lastTimestamp | tail -5
LAST SEEN   TYPE     REASON      OBJECT           MESSAGE
6s          Normal   Scheduled   pod/nginx-demo   Successfully assigned default/nginx-demo to kube-worker-1
5s          Normal   Pulling     pod/nginx-demo   Pulling image "nginx:1.27"
3s          Normal   Pulled      pod/nginx-demo   Successfully pulled image "nginx:1.27" in 1.812s
2s          Normal   Created     pod/nginx-demo   Created container: nginx
2s          Normal   Started     pod/nginx-demo   Started container nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl get pod nginx-demo -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-demo   1/1     Running   0          18s   10.244.2.47  kube-worker-1   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cluster is now in a deliberately uncomfortable state. &lt;code&gt;kube-worker-1&lt;/code&gt; is mostly full. &lt;code&gt;kube-worker-2&lt;/code&gt; is similarly loaded. &lt;code&gt;kube-worker-3&lt;/code&gt; is empty but tainted. Then we apply a critical pod that asks for the same eight CPU, with priority &lt;code&gt;1,000,000&lt;/code&gt;, and no taint toleration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl apply -f payments-high-prio.yaml
pod/payments-critical created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
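&lt;p&gt;The manifest itself did not appear in the recording. A plausible reconstruction, matching the priority class name, priority value, image, and Guaranteed QoS visible in the events below (the memory figure is invented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-payments
value: 1000000
---
apiVersion: v1
kind: Pod
metadata:
  name: payments-critical
spec:
  priorityClassName: high-priority-payments
  containers:
  - name: payments
    image: payments:v2.4.1
    resources:
      requests:
        cpu: "8"                     # the same eight CPUs nginx-demo holds
        memory: 4Gi                  # invented figure
      limits:                        # limits == requests gives QoS class Guaranteed
        cpu: "8"
        memory: 4Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;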



&lt;p&gt;The first scheduling cycle has nothing to give it. Three nodes are insufficient, the fourth has the wrong taint. The scheduler turns to PostFilter, which walks each node looking for a preemption victim. The tainted node is no help. The non-tainted nodes each have a candidate to evict. The scheduler picks one, sets &lt;code&gt;nominatedNodeName&lt;/code&gt;, and gracefully evicts the lower-priority nginx pod.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl describe pod payments-critical | tail -14
QoS Class:        Guaranteed
Priority:         1000000
Priority Class:   high-priority-payments
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  14s   default-scheduler  0/4 nodes are available: 3 Insufficient cpu, 1 node(s) had untolerated taint {workload: gpu}. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod.
  Normal   Preempted         9s    default-scheduler  Preempted by default/nginx-demo on node kube-worker-1
  Warning  FailedScheduling  9s    default-scheduler  0/4 nodes are available: 3 Insufficient cpu. preemption: 0/4 nodes are available.
  Normal   Scheduled         4s    default-scheduler  Successfully assigned default/payments-critical to kube-worker-1
  Normal   Pulling           3s    kubelet            Pulling image "payments:v2.4.1"
  Normal   Pulled            1s    kubelet            Successfully pulled image "payments:v2.4.1" in 1.632s
  Normal   Created           1s    kubelet            Created container: payments
  Normal   Started           1s    kubelet            Started container payments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read those events carefully. There is a &lt;code&gt;FailedScheduling&lt;/code&gt; at 14s, then &lt;code&gt;Preempted by default/nginx-demo on node kube-worker-1&lt;/code&gt; at 9s, then another &lt;code&gt;FailedScheduling&lt;/code&gt; (the cycle right after preemption, where the nginx pod was still terminating), then &lt;code&gt;Scheduled&lt;/code&gt; at 4s. From request to running, on a real cluster, about ten seconds. That includes the graceful eviction of the victim, which is the slow part.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl get pods -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
payments-critical   1/1     Running   0          22s   10.244.2.58  kube-worker-1   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is preemption working as designed. A higher-priority pod arrives, the scheduler refuses to leave it pending when there is a lower-priority pod that could be moved, and the cluster reshuffles. No human intervention. No alert at 3 a.m.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three takeaways
&lt;/h2&gt;

&lt;p&gt;If only three things from this post stick with you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The scheduler is plugins all the way down.&lt;/strong&gt; Since 1.19, every meaningful decision is delegated to a plugin at one of thirteen extension points. You can write your own, disable the defaults, run multiple profiles in parallel. Volcano, Kueue, and Coscheduling exist because of this design; they did not have to fork the scheduler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Filter is binary; Score is weighted.&lt;/strong&gt; A single &lt;code&gt;Unschedulable&lt;/code&gt; verdict from any of fourteen Filter plugins kills a node's candidacy. But Score is a weighted vote across nine plugins, and the weights are not equal. &lt;code&gt;TaintToleration&lt;/code&gt; (×3) is the strongest single signal at scoring time, followed by the four ×2 plugins (&lt;code&gt;NodeAffinity&lt;/code&gt;, &lt;code&gt;PodTopologySpread&lt;/code&gt;, &lt;code&gt;InterPodAffinity&lt;/code&gt;, &lt;code&gt;DynamicResources&lt;/code&gt;). Weights matter much more than most engineers realize.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Reserve is why your scheduling is consistent.&lt;/strong&gt; When you scale a deployment from one to twenty replicas and they all hit the scheduling queue in the same one-second window, Reserve's in-memory subtraction is what stops them from piling onto the same node. The scheduler commits an opinion before the API server even confirms the bind, and that opinion is visible to the next pod's scheduling cycle immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to go from here
&lt;/h2&gt;

&lt;p&gt;The full scheduler walkthrough on YouTube has the live demo, every stage animated, the preemption flow shown end-to-end. Link is at the top of this post.&lt;/p&gt;

&lt;p&gt;If you want to step through it yourself rather than watch, the interactive version at &lt;a href="https://kubernetes-explained.vercel.app/scheduler" rel="noopener noreferrer"&gt;https://kubernetes-explained.vercel.app/scheduler&lt;/a&gt; walks every internal step with annotations and lets you pause anywhere.&lt;/p&gt;

&lt;p&gt;Sources for every claim in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pkg/scheduler/apis/config/testing/defaults/defaults.go&lt;/code&gt;: plugin lists and weights&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pkg/scheduler/framework/plugins/&lt;/code&gt;: individual plugin implementations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pkg/scheduler/backend/queue/scheduling_queue.go&lt;/code&gt;: the three queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go&lt;/code&gt;: the preemption algorithm&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;KEP-624: the scheduling framework's graduation history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;kubectl describe pod&lt;/code&gt; events shown in the demo above are verbatim from a real Kubernetes 1.36.1 cluster, captured for this post.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>scheduler</category>
      <category>internals</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
