<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pendela BhargavaSai</title>
    <description>The latest articles on DEV Community by Pendela BhargavaSai (@pendelabhargavasai).</description>
    <link>https://dev.to/pendelabhargavasai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F862755%2Fdccb0f1a-a7eb-46c5-a5c7-c0d4514eaae6.png</url>
      <title>DEV Community: Pendela BhargavaSai</title>
      <link>https://dev.to/pendelabhargavasai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pendelabhargavasai"/>
    <language>en</language>
    <item>
      <title>Kubernetes CNI Complete Guide: Flannel vs Cilium vs Calico + Cloud Provider CNIs</title>
      <dc:creator>Pendela BhargavaSai</dc:creator>
      <pubDate>Tue, 12 May 2026 03:30:00 +0000</pubDate>
      <link>https://dev.to/pendelabhargavasai/kubernetes-cni-complete-guide-flannel-vs-cilium-vs-calico-cloud-provider-cnis-5c6c</link>
      <guid>https://dev.to/pendelabhargavasai/kubernetes-cni-complete-guide-flannel-vs-cilium-vs-calico-cloud-provider-cnis-5c6c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiu2b1dngo2gmr0srlif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiu2b1dngo2gmr0srlif.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;K3s&lt;/strong&gt; v1.29+  |  &lt;strong&gt;Flannel&lt;/strong&gt; v0.24+  |  &lt;strong&gt;Cilium&lt;/strong&gt; v1.15+  |  &lt;strong&gt;Calico&lt;/strong&gt; v3.27+  |  &lt;strong&gt;AWS VPC CNI&lt;/strong&gt; v1.18+  |  &lt;strong&gt;Azure CNI&lt;/strong&gt; v1.5+  |  &lt;strong&gt;GKE Dataplane V2&lt;/strong&gt; (Cilium-based)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A definitive comparison of every major Kubernetes CNI — open-source plugins (Flannel, Calico, Cilium, Weave, Antrea, Multus) and cloud-managed defaults (AWS VPC CNI on EKS, Azure CNI on AKS, and Dataplane V2 on GKE) — across architecture, performance, network policy, observability, encryption, and when to choose each.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CNI&lt;/th&gt;
&lt;th&gt;Identity&lt;/th&gt;
&lt;th&gt;Core Approach&lt;/th&gt;
&lt;th&gt;Default On&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🟢 &lt;strong&gt;Flannel&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Simple Overlay&lt;/td&gt;
&lt;td&gt;VXLAN tunnel, zero policy&lt;/td&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟠 &lt;strong&gt;Calico&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Policy Powerhouse&lt;/td&gt;
&lt;td&gt;BGP routing, iptables/eBPF&lt;/td&gt;
&lt;td&gt;Self-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔵 &lt;strong&gt;Cilium&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;eBPF Native&lt;/td&gt;
&lt;td&gt;Kernel eBPF, replaces kube-proxy&lt;/td&gt;
&lt;td&gt;GKE (Dataplane V2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟡 &lt;strong&gt;Weave Net&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Mesh Overlay&lt;/td&gt;
&lt;td&gt;Gossip-based mesh routing&lt;/td&gt;
&lt;td&gt;Self-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟣 &lt;strong&gt;Antrea&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;VMware-backed&lt;/td&gt;
&lt;td&gt;OVS dataplane, Antrea policies&lt;/td&gt;
&lt;td&gt;Self-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔶 &lt;strong&gt;AWS VPC CNI&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Cloud-native&lt;/td&gt;
&lt;td&gt;Native VPC IP assignment&lt;/td&gt;
&lt;td&gt;EKS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔷 &lt;strong&gt;Azure CNI&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Cloud-native&lt;/td&gt;
&lt;td&gt;Azure VNET IP assignment&lt;/td&gt;
&lt;td&gt;AKS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;♦️ &lt;strong&gt;GKE CNI / Dataplane V2&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Cloud-native + eBPF&lt;/td&gt;
&lt;td&gt;Cilium-based eBPF on GKE&lt;/td&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;


&lt;ol&gt;
&lt;li&gt;What Is a CNI?&lt;/li&gt;
&lt;li&gt;
Open Source CNIs

&lt;ul&gt;
&lt;li&gt;2.1 Flannel — Simple Overlay
&lt;/li&gt;
&lt;li&gt;2.2 Cilium — eBPF Native
&lt;/li&gt;
&lt;li&gt;2.3 Calico — BGP + Flexible Dataplane
&lt;/li&gt;
&lt;li&gt;2.4 Weave Net — Mesh Overlay
&lt;/li&gt;
&lt;li&gt;2.5 Antrea — OVS-based CNI
&lt;/li&gt;
&lt;li&gt;2.6 Multus — Meta CNI
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Cloud Provider CNIs

&lt;ul&gt;
&lt;li&gt;3.1 AWS VPC CNI — EKS Default
&lt;/li&gt;
&lt;li&gt;3.2 Azure CNI — AKS Default
&lt;/li&gt;
&lt;li&gt;3.3 GKE Dataplane V2 — GKE Default
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Plane Comparison&lt;/li&gt;
&lt;li&gt;Network Policy&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Performance Benchmarks&lt;/li&gt;
&lt;li&gt;Encryption&lt;/li&gt;
&lt;li&gt;Multi-Cluster&lt;/li&gt;
&lt;li&gt;Resource Usage&lt;/li&gt;
&lt;li&gt;Full Feature Comparison&lt;/li&gt;
&lt;li&gt;When to Choose Each&lt;/li&gt;
&lt;li&gt;K3s-Specific Setup&lt;/li&gt;
&lt;li&gt;Migration Guide on K3s&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. What Is a CNI and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Container Network Interface&lt;/strong&gt; (CNI) is the plugin layer every Kubernetes cluster depends on for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assigning IP addresses to pods from a defined CIDR range&lt;/li&gt;
&lt;li&gt;Creating virtual Ethernet (veth) pairs between pod namespaces and the host&lt;/li&gt;
&lt;li&gt;Programming cross-node routing so pods on Node A can reach pods on Node B&lt;/li&gt;
&lt;li&gt;Optionally enforcing &lt;code&gt;NetworkPolicy&lt;/code&gt; resources to control traffic flow&lt;/li&gt;
&lt;/ul&gt;
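
&lt;p&gt;Concretely, the kubelet invokes whichever plugin is described in &lt;code&gt;/etc/cni/net.d/&lt;/code&gt;. An illustrative Flannel conflist (simplified from the real manifest) looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    { "type": "flannel", "delegate": { "isDefaultGateway": true } },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;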

&lt;p&gt;Cloud providers like AWS, Azure, and GCP have built proprietary CNI plugins that deeply integrate with their underlying VPC/VNET networking primitives — providing native IP assignment, cloud-aware routing, and tight integration with cloud IAM, load balancers, and security groups.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;K3s Key Flag&lt;/strong&gt;&lt;br&gt;
To replace the default CNI on K3s, install with &lt;code&gt;--flannel-backend=none --disable-network-policy&lt;/code&gt;. This leaves the CNI slot open for Calico or Cilium to fill.&lt;/p&gt;
&lt;/blockquote&gt;
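
&lt;p&gt;As a sketch, a fresh K3s install with the CNI slot left open uses exactly those flags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install K3s without Flannel or the built-in policy controller
curl -sfL https://get.k3s.io | sh -s - \
  --flannel-backend=none \
  --disable-network-policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;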




&lt;h2&gt;
  
  
  2. Open Source CNIs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Flannel — Simple Overlay
&lt;/h3&gt;

&lt;p&gt;Flannel's design philosophy: do one thing well. A user-space daemon (&lt;code&gt;flanneld&lt;/code&gt;) manages subnet allocation, while the kernel's own VXLAN and bridge code handles all actual forwarding. No policy, no observability — just connectivity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod A (eth0: 10.244.0.2)          Pod B (eth0: 10.244.0.5)
        │                                  │
        │ veth pair                        │ veth pair
        ▼                                  ▼
           cni0 Linux bridge (kernel)
                    │
      iptables PREROUTING / FORWARD / POSTROUTING
                    │
         VXLAN encapsulation — UDP 8472
                    │
     flanneld (user-space) ← etcd / K8s API
                    │
          Physical NIC → Node B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyp31b0v2tfpvqia6vgb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyp31b0v2tfpvqia6vgb.png" alt="Flannel Architecture" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available backends:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vxlan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UDP encap (default)&lt;/td&gt;
&lt;td&gt;Works across any network, even routers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;host-gw&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Direct routing&lt;/td&gt;
&lt;td&gt;Fastest, requires L2 adjacency between nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wireguard-native&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Encrypted WireGuard tunnel&lt;/td&gt;
&lt;td&gt;When you need encryption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;udp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Legacy user-space&lt;/td&gt;
&lt;td&gt;Fallback only — very slow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Network Policy:&lt;/strong&gt; Flannel does not enforce &lt;code&gt;NetworkPolicy&lt;/code&gt; at all — resources are silently ignored. You must pair it with Calico (the Canal combination) to get policy, which adds a second DaemonSet, version-compatibility risk, and split ownership between two projects.&lt;/p&gt;
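
&lt;p&gt;For instance, this standard default-deny policy is accepted by the API server but enforces nothing on a Flannel-only cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Valid NetworkPolicy — silently ignored without a policy-capable CNI
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;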

&lt;p&gt;&lt;strong&gt;Flannel Encryption:&lt;/strong&gt; With the WireGuard backend, Flannel encrypts cross-node traffic only — pod-to-pod traffic on the same node travels through the &lt;code&gt;cni0&lt;/code&gt; bridge unencrypted. There is no automatic key rotation; restart &lt;code&gt;flanneld&lt;/code&gt; to rotate keys.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Network"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.244.0.0/16"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wireguard"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Dev/CI clusters, Raspberry Pi, edge nodes, K3s defaults.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.2 Cilium — eBPF Native
&lt;/h3&gt;

&lt;p&gt;Cilium compiles and injects eBPF programs into the Linux kernel at TC/XDP hook points. There is no bridge, no iptables — packets are forwarded via &lt;code&gt;bpf_redirect()&lt;/code&gt; at line rate, and policy is enforced via O(1) BPF map lookups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod A (eth0)                         Pod B (eth0)
       │                                  │
       │ veth pair                        │
       ▼                                  ▼
TC eBPF hook ──── bpf_redirect() ──── TC eBPF hook
                  │
BPF maps: identity · policy · NAT · LB
                  │
cilium-agent — compiles eBPF, watches K8s API
                  │
  Physical NIC — VXLAN / GENEVE / native routing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftei3oiyqewc7l9s5tk6p.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftei3oiyqewc7l9s5tk6p.webp" alt="K8S Network vs Cilium" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;
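
&lt;p&gt;Because service load-balancing also runs in eBPF, Cilium can replace kube-proxy entirely. A sketch (Helm value name as of recent Cilium releases):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run without kube-proxy; BPF maps handle Service translation
cilium install --set kubeProxyReplacement=true

# Verify the datapath status
cilium status --wait
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;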

&lt;p&gt;&lt;strong&gt;Datapath modes:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Encapsulation&lt;/th&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tunnel: vxlan&lt;/code&gt; / &lt;code&gt;tunnel: geneve&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;VXLAN (default) or GENEVE&lt;/td&gt;
&lt;td&gt;Any network topology&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;native-routing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;L2 adjacency or BGP underlay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wireguard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;WireGuard transparent&lt;/td&gt;
&lt;td&gt;Kernel ≥ 5.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ipsec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;IPsec&lt;/td&gt;
&lt;td&gt;FIPS-regulated environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
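
&lt;p&gt;Selecting a datapath mode at install time can be sketched as follows (Helm value names as of Cilium ~1.15; the CIDR is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Native routing — no encapsulation overhead
cilium install \
  --set routingMode=native \
  --set ipv4NativeRoutingCIDR=10.0.0.0/8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;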

&lt;p&gt;&lt;strong&gt;Network Policy:&lt;/strong&gt; L3 through L7, no sidecar&lt;/p&gt;

&lt;p&gt;Cilium enforces standard NetworkPolicy and extends it with &lt;code&gt;CiliumNetworkPolicy&lt;/code&gt; (CNP) for Layer 7 rules — no sidecar required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CiliumNetworkPolicy — L7 HTTP rule&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-get-only&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
    &lt;span class="na"&gt;toPorts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080"&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/v1/.*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🔭 Cilium + Hubble
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Per-flow visibility on every packet&lt;/li&gt;
&lt;li&gt;✅ Live service dependency map (Hubble UI)&lt;/li&gt;
&lt;li&gt;✅ L7 HTTP / DNS / Kafka / gRPC flows&lt;/li&gt;
&lt;li&gt;✅ Drop reason per endpoint&lt;/li&gt;
&lt;li&gt;✅ Rich Prometheus metrics
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Hubble and UI&lt;/span&gt;
cilium hubble &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--ui&lt;/span&gt;

&lt;span class="c"&gt;# Watch live flows in a namespace&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--namespace&lt;/span&gt; production &lt;span class="nt"&gt;--follow&lt;/span&gt;

&lt;span class="c"&gt;# Show only policy drops with reason&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--verdict&lt;/span&gt; DROPPED &lt;span class="nt"&gt;--follow&lt;/span&gt;

&lt;span class="c"&gt;# Sample output:&lt;/span&gt;
&lt;span class="c"&gt;# 12:34:01: default/frontend → default/backend  FORWARDED  TCP:SYN&lt;/span&gt;
&lt;span class="c"&gt;# 12:34:02: default/attacker → default/backend  DROPPED    Policy denied&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cilium Encryption:&lt;/strong&gt; WireGuard or IPsec&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# WireGuard with strict mode (drops unencrypted packets)&lt;/span&gt;
cilium &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--encryption&lt;/span&gt; wireguard &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--encryption-strict-mode&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# IPsec for FIPS-regulated environments&lt;/span&gt;
cilium &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--encryption&lt;/span&gt; ipsec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Large-scale production, L7 policy, observability (Hubble), zero-trust, multi-cluster.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.3 Calico — BGP + Flexible Dataplane
&lt;/h3&gt;

&lt;p&gt;Calico uses &lt;strong&gt;BGP&lt;/strong&gt; (Border Gateway Protocol) to distribute pod routes across nodes — no encapsulation by default. Each node acts as a BGP peer, advertising its pod CIDR to other nodes and upstream routers. Calico's data plane is pluggable: iptables, eBPF, or even Windows HNS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod A (eth0: 192.168.0.2)          Pod B (eth0: 192.168.1.2)
        │                                  │
        │ veth pair                        │ veth pair
        ▼                                  ▼
      Host routing table (no bridge needed)
                    │
      iptables / eBPF policy enforcement
                    │
     Felix (per-node agent) ← Typha (fan-out)
                    │
     BIRD (BGP daemon) — peers with other nodes
                    │
    Physical NIC — direct IP routing (no encap)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2yz0fuiiezgssjwfb85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2yz0fuiiezgssjwfb85.png" alt="Calico Architecture" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Calico components:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Felix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-node agent; programs iptables/eBPF rules and routes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BIRD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-source BGP daemon; advertises pod subnets to peers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typha&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fan-out proxy for the K8s datastore; recommended at 50+ nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;calico-kube-controllers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Garbage-collects stale Calico resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
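
&lt;p&gt;Peering a cluster with a top-of-rack router can be sketched with a &lt;code&gt;BGPPeer&lt;/code&gt; resource (the peer IP and AS number are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Advertise pod routes to an external BGP peer
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor
spec:
  peerIP: 192.0.2.1
  asNumber: 64512
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;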

&lt;p&gt;&lt;strong&gt;Network Policy:&lt;/strong&gt; the L3/L4 policy leader&lt;/p&gt;

&lt;p&gt;Calico is widely regarded as the gold standard for L3/L4 NetworkPolicy. It supports standard &lt;code&gt;NetworkPolicy&lt;/code&gt; resources plus its own &lt;code&gt;GlobalNetworkPolicy&lt;/code&gt; and &lt;code&gt;NetworkSet&lt;/code&gt; CRDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calico GlobalNetworkPolicy — cluster-wide deny-all&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projectcalico.org/v3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GlobalNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default-deny-all&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all()&lt;/span&gt;
  &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Egress&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calico NetworkSet — group external CIDRs&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projectcalico.org/v3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trusted-external&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;203.0.113.0/24&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;198.51.100.0/24&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ Calico does &lt;strong&gt;not&lt;/strong&gt; support L7 HTTP/gRPC policy natively in OSS. For that you need its optional Envoy-based Application Layer Policy (ALP), which adds a sidecar and complexity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Calico Encryption:&lt;/strong&gt; Calico supports WireGuard for node-to-node encryption, enabled with a single patch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch felixconfiguration default &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt; merge &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--patch&lt;/span&gt; &lt;span class="s1"&gt;'{"spec":{"wireguardEnabled":true}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starting in Calico v3.26, same-node pod traffic encryption is also supported via host-to-pod WireGuard options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; BGP-integrated DCs, Windows node support, bare-metal L3, robust L3/L4 policy.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.4 Weave Net — Mesh Overlay
&lt;/h3&gt;

&lt;p&gt;Weave Net uses a gossip protocol to build a full mesh between all cluster nodes without any central store. It forwards packets through its user-space "sleeve" tunnel or, with Fast Datapath, kernel VXLAN, and can optionally encrypt all traffic with NaCl. Weave is simpler to operate than Calico or Cilium but is no longer under active development (archived by Weaveworks in 2023).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod A (eth0)
       │
    weave bridge
       │
  weave daemon (gossip mesh peer discovery)
       │
  Sleeve (user-space) / Fast Datapath (kernel VXLAN)
       │
    Node B weave daemon
       │
    Pod B (eth0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gossip — no external etcd needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Datapath&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sleeve (user-space) or Fast Datapath (kernel VXLAN)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NaCl (enabled per peer connection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Standard K8s policy supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Status&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Archived/maintenance mode (use Cilium or Calico for new clusters)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Weaveworks ceased active development in 2023. Weave Net is community-maintained but no longer receives feature updates. It is &lt;strong&gt;not recommended&lt;/strong&gt; for new clusters — migrate to Cilium or Calico.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Legacy clusters already running Weave with migration on the roadmap.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.5 Antrea — OVS-based CNI
&lt;/h3&gt;

&lt;p&gt;Antrea is a CNI backed by VMware (now Broadcom) that uses &lt;strong&gt;Open vSwitch (OVS)&lt;/strong&gt; as its dataplane. It supports both Linux and Windows nodes and provides its own &lt;code&gt;AntreaNetworkPolicy&lt;/code&gt; and &lt;code&gt;ClusterNetworkPolicy&lt;/code&gt; CRDs with tiered policy enforcement. Antrea integrates with NSX-T for enterprise SDN environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod A (eth0)
       │
   OVS (Open vSwitch) bridge
       │
   antrea-agent (per-node DaemonSet)
       │
   antrea-controller (centralized)
       │
   Encap: Geneve / VXLAN / GRE (configurable)
       │
   Node B OVS bridge → Pod B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataplane&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open vSwitch (OVS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Full (OVS on Windows)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ K8s standard + AntreaNetworkPolicy CRDs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tiered policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ (Emergency / Security / Application tiers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ IPsec / WireGuard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Antrea Octant plugin, Prometheus metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NSX-T integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Enterprise add-on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;eBPF support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ AntreaProxy (partial eBPF)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
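
&lt;p&gt;A tiered Antrea-native policy can be sketched like this (API version assumes a recent Antrea release; names and labels are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Antrea-native policy in the securityops tier — evaluated before application tiers
apiVersion: crd.antrea.io/v1beta1
kind: NetworkPolicy
metadata:
  name: drop-legacy
  namespace: default
spec:
  tier: securityops
  priority: 5
  appliedTo:
  - podSelector:
      matchLabels:
        app: web
  ingress:
  - action: Drop
    from:
    - podSelector:
        matchLabels:
          app: legacy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;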

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; VMware/NSX-T environments, Windows-heavy clusters, tiered network policy.&lt;/p&gt;




&lt;h3&gt;
  
  
  2.6 Multus — Meta CNI
&lt;/h3&gt;

&lt;p&gt;Multus is not a standalone CNI — it is a &lt;strong&gt;meta CNI&lt;/strong&gt; that allows pods to attach multiple network interfaces simultaneously. A pod can have its primary network (managed by Flannel/Calico/Cilium) and secondary interfaces (SR-IOV, DPDK, Macvlan) for specialized workloads like telco NFV or HPC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pod with Multiple NICs:
  eth0 (primary) ← Flannel/Calico/Cilium (cluster network)
  net1 (secondary) ← SR-IOV (high-throughput direct NIC)
  net2 (secondary) ← Macvlan (storage network)

Multus reads NetworkAttachmentDefinition CRDs and delegates
to the correct CNI for each interface.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NetworkAttachmentDefinition for secondary interface&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s.cni.cncf.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkAttachmentDefinition&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sriov-net&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"type": "sriov",&lt;/span&gt;
      &lt;span class="s"&gt;"name": "sriov-net",&lt;/span&gt;
      &lt;span class="s"&gt;"ipam": { "type": "static" }&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
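
&lt;p&gt;A pod then requests the secondary interface through the standard Multus annotation (pod name and image are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# eth0 comes from the primary CNI; net1 from the sriov-net attachment
apiVersion: v1
kind: Pod
metadata:
  name: nfv-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net
spec:
  containers:
  - name: app
    image: registry.example.com/nfv-app:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;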



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Telco/NFV workloads, HPC, pods that need to straddle multiple network segments.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Cloud Provider CNIs
&lt;/h2&gt;

&lt;p&gt;Cloud-managed Kubernetes services ship their own CNI plugins that are deeply integrated with the underlying cloud networking fabric. These provide first-class VPC routing, cloud IAM integration, and managed lifecycle — but are typically locked to their respective cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 AWS VPC CNI — EKS Default
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Amazon EKS&lt;/strong&gt; uses the &lt;strong&gt;Amazon VPC CNI plugin&lt;/strong&gt; (&lt;code&gt;aws-node&lt;/code&gt; DaemonSet) by default. Instead of an overlay, it assigns &lt;strong&gt;real VPC secondary IP addresses&lt;/strong&gt; directly to pods from Elastic Network Interfaces (ENIs) attached to the worker node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker Node (EC2 instance)
    │
    ├── Primary ENI (node IP: 10.0.1.10)
    │      └── eth0
    │
    ├── Secondary ENI (attached by vpc-cni)
    │      ├── 10.0.1.20 → Pod A (eth0 via veth)
    │      ├── 10.0.1.21 → Pod B (eth0 via veth)
    │      └── 10.0.1.22 → Pod C (eth0 via veth)
    │
    └── vpc-cni (aws-node DaemonSet)
           manages ENI lifecycle via EC2 API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How pod IPs work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each EC2 instance can attach multiple ENIs; each ENI holds multiple secondary IPs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vpc-cni&lt;/code&gt; pre-warms a pool of secondary IPs per node via EC2 API calls&lt;/li&gt;
&lt;li&gt;Pods receive a real VPC IP — &lt;strong&gt;routable natively&lt;/strong&gt; across the VPC, peered VPCs, VPNs, and Direct Connect — with no overlay&lt;/li&gt;
&lt;/ul&gt;
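&lt;p&gt;The pod limits below follow directly from the ENI math; a quick sketch (the m5.large limits are assumptions taken from AWS's published instance tables):&lt;/p&gt;

```shell
# EKS default pod limit = ENIs * (IPv4 addresses per ENI - 1) + 2
# (each ENI's primary IP is unusable for pods; +2 covers host-network pods)
enis=3          # m5.large: max ENIs (assumed from AWS instance limits)
ips_per_eni=10  # m5.large: IPv4 addresses per ENI
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "max pods: $max_pods"
```

&lt;p&gt;This reproduces the m5.large row of the table below (29 pods).&lt;/p&gt;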

&lt;p&gt;&lt;strong&gt;Pod density limits per node (examples):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Type&lt;/th&gt;
&lt;th&gt;Max ENIs&lt;/th&gt;
&lt;th&gt;Max pods (default limit)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;t3.medium&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m5.large&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m5.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m5.4xlarge&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;234&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;c5.18xlarge&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;737&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Default pod density is capped by the ENI/IP limit per instance type. For IP-constrained environments, use &lt;strong&gt;VPC CNI with prefix delegation&lt;/strong&gt; (&lt;code&gt;ENABLE_PREFIX_DELEGATION=true&lt;/code&gt;) to assign /28 prefixes instead of individual IPs, dramatically increasing pod density.&lt;/p&gt;
&lt;/blockquote&gt;
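&lt;p&gt;Enabling prefix delegation is an environment-variable change on the &lt;code&gt;aws-node&lt;/code&gt; DaemonSet; a sketch for an existing cluster (the warm-prefix value is illustrative):&lt;/p&gt;

```shell
# Switch the VPC CNI to /28 prefix assignment instead of individual IPs
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

# Optionally keep one spare /28 attached so pod startup never waits on EC2 calls
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1
```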

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IP assignment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native VPC secondary IPs from ENIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overlay&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗ None — native VPC routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗ Not built-in — requires Calico or Cilium add-on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security Groups&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Security Groups for Pods (SGP) — per-pod AWS SGs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IPv6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prefix delegation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ /28 prefix per ENI (more pods per node)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom networking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Pods in different subnet than node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;eBPF acceleration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ via Cilium add-on (EKS + Cilium mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Enabling Network Policy on EKS:&lt;/strong&gt;&lt;br&gt;
AWS VPC CNI itself does not enforce NetworkPolicy. You must add one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calico&lt;/strong&gt; (most common) — install as an add-on alongside vpc-cni&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cilium in chained mode&lt;/strong&gt; — replaces policy enforcement, keeps VPC IP routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon VPC CNI Network Policy&lt;/strong&gt; (AWS-native, GA as of 2024) — uses eBPF for policy enforcement
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable AWS-native network policy controller (EKS add-on)&lt;/span&gt;
aws eks create-addon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--addon-name&lt;/span&gt; vpc-cni &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--configuration-values&lt;/span&gt; &lt;span class="s1"&gt;'{"nodeAgent":{"enablePolicyEventLogs":"true"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;When to choose AWS VPC CNI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running EKS — it is the default and AWS-managed&lt;/li&gt;
&lt;li&gt;✅ Need pods directly reachable from on-premises via Direct Connect / VPN&lt;/li&gt;
&lt;li&gt;✅ Need per-pod AWS Security Groups (SGP feature)&lt;/li&gt;
&lt;li&gt;✅ Compliance requires no overlay network&lt;/li&gt;
&lt;li&gt;⚠️ Watch instance type ENI limits for large pod densities&lt;/li&gt;
&lt;/ul&gt;
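&lt;p&gt;For the SGP feature mentioned above, security groups are bound to pods through a &lt;code&gt;SecurityGroupPolicy&lt;/code&gt; resource; a minimal sketch (namespace, label, and group ID are placeholders):&lt;/p&gt;

```yaml
# Attach an AWS security group to pods matching the selector
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: payments-sgp
  namespace: payments          # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: payments-api        # hypothetical label
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0   # placeholder security group ID
```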


&lt;h3&gt;
  
  
  3.2 Azure CNI — AKS Default
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Azure Kubernetes Service (AKS)&lt;/strong&gt; offers multiple CNI modes. The default for most production clusters is &lt;strong&gt;Azure CNI&lt;/strong&gt;, which assigns pod IPs directly from the Azure Virtual Network (VNET) subnet — similar in concept to AWS VPC CNI but using Azure's networking primitives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AKS CNI Modes:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Default?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;kubenet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic bridge networking; nodes get VNET IPs, pods get private IPs routed via node route tables (outbound NAT)&lt;/td&gt;
&lt;td&gt;Legacy default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure CNI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pods get real VNET IPs from a pre-allocated subnet&lt;/td&gt;
&lt;td&gt;Current recommended default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure CNI Overlay&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pods get overlay IPs (larger scale, fewer VNET IPs needed)&lt;/td&gt;
&lt;td&gt;Recommended for large clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Azure CNI + Cilium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Azure CNI routing + Cilium eBPF dataplane + Hubble&lt;/td&gt;
&lt;td&gt;Recommended for policy/observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bring Your Own CNI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Disable Azure CNI; install Calico, Flannel, etc.&lt;/td&gt;
&lt;td&gt;Advanced&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Azure CNI (traditional):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AKS Worker Node (Azure VM)
    │
    ├── Primary NIC (node IP: 10.240.0.4)
    │      └── VNET: 10.240.0.0/16
    │
    └── Pod IPs pre-allocated from subnet:
           ├── 10.240.0.10 → Pod A
           ├── 10.240.0.11 → Pod B
           └── 10.240.0.12 → Pod C

azure-vnet (CNI plugin) programs routes in Azure SDN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure CNI Overlay (recommended for scale):&lt;/strong&gt;&lt;br&gt;
Introduced to solve IP exhaustion. Pods get IPs from a private overlay CIDR (e.g., 10.244.0.0/16) while nodes get real VNET IPs. Azure SDN handles the translation — no overlay encap at the packet level from the VM's perspective.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create AKS cluster with Azure CNI Overlay + Cilium dataplane&lt;/span&gt;
az aks create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; myRG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; myAKS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-plugin&lt;/span&gt; azure &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-plugin-mode&lt;/span&gt; overlay &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-dataplane&lt;/span&gt; cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 192.168.0.0/16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;kubenet&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;Azure CNI Overlay&lt;/th&gt;
&lt;th&gt;Azure CNI + Cilium&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pod IPs&lt;/td&gt;
&lt;td&gt;Overlay (NAT)&lt;/td&gt;
&lt;td&gt;Real VNET IPs&lt;/td&gt;
&lt;td&gt;Overlay (Azure SDN)&lt;/td&gt;
&lt;td&gt;Overlay (Azure SDN)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP exhaustion risk&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct pod routing&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (via Azure SDN)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NetworkPolicy&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Azure Network Policy / Calico&lt;/td&gt;
&lt;td&gt;Azure NP / Calico&lt;/td&gt;
&lt;td&gt;✅ Cilium (eBPF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows nodes&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hubble observability&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max pods/node&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Network Policy options on AKS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure Network Policy Manager (NPM)&lt;/strong&gt; — iptables-based, Azure-native, limited feature set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico&lt;/strong&gt; — add-on, full L3/L4 policy, most commonly used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cilium&lt;/strong&gt; — available with Azure CNI Overlay mode, eBPF enforcement + Hubble&lt;/li&gt;
&lt;/ul&gt;
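&lt;p&gt;The engine is picked at cluster creation via &lt;code&gt;--network-policy&lt;/code&gt;; a sketch (resource group and cluster names are illustrative):&lt;/p&gt;

```shell
# Select the NetworkPolicy engine when creating the AKS cluster
az aks create \
  --resource-group myRG \
  --name myAKS \
  --network-plugin azure \
  --network-policy calico    # alternatives: azure (NPM) or cilium
```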

&lt;p&gt;&lt;strong&gt;When to choose Azure CNI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running AKS — Azure CNI Overlay is the modern recommended choice&lt;/li&gt;
&lt;li&gt;✅ Need pods directly reachable from on-premises via ExpressRoute&lt;/li&gt;
&lt;li&gt;✅ Want Hubble observability → use Azure CNI Overlay + Cilium dataplane&lt;/li&gt;
&lt;li&gt;✅ Large clusters (100+ nodes) → use Overlay mode to avoid VNET IP exhaustion&lt;/li&gt;
&lt;li&gt;⚠️ Traditional Azure CNI requires pre-allocating pod IPs per node — plan subnet size carefully&lt;/li&gt;
&lt;/ul&gt;
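&lt;p&gt;The subnet-sizing caveat above can be estimated with simple arithmetic; a sketch assuming an illustrative 50-node cluster at the AKS default of 30 pods per node:&lt;/p&gt;

```shell
# Traditional Azure CNI pre-allocates every potential pod IP:
# IPs required = nodes * (max pods per node + 1 for the node's own NIC)
nodes=50
max_pods_per_node=30   # AKS default max pods for Azure CNI
ips_required=$(( nodes * (max_pods_per_node + 1) ))
echo "IPs required: $ips_required"
```

&lt;p&gt;At 1,550 IPs before Azure's own per-subnet reservations, a /21 (2,048 addresses) is already tight, which is why Overlay mode is recommended for large clusters.&lt;/p&gt;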




&lt;h3&gt;
  
  
  3.3 GKE Dataplane V2 — GKE Default
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Google Kubernetes Engine (GKE)&lt;/strong&gt; introduced &lt;strong&gt;Dataplane V2&lt;/strong&gt; in 2021, which is based on &lt;strong&gt;Cilium's eBPF engine&lt;/strong&gt;. It is the default for new GKE clusters and brings production-grade eBPF networking, built-in NetworkPolicy enforcement, and a subset of Hubble observability — all managed by Google.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GKE networking modes:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Default?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legacy (iptables)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;kube-proxy + iptables, no Dataplane V2&lt;/td&gt;
&lt;td&gt;Older clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataplane V2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cilium eBPF, managed by GKE, no full Cilium control plane&lt;/td&gt;
&lt;td&gt;Default for new clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataplane V2 + Hubble&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same + network telemetry via Hubble&lt;/td&gt;
&lt;td&gt;Optional add-on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GKE Node (GCE VM)
    │
    ├── Alias IP range (VPC-native pod CIDRs)
    │     Pods get real VPC IPs, routed via Google SDN
    │
    └── Dataplane V2 (Cilium eBPF engine)
           ├── TC eBPF hooks on veth interfaces
           ├── BPF maps for policy, NAT, LB
           ├── kube-proxy replaced by eBPF
           └── Hubble telemetry (if enabled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GKE uses &lt;strong&gt;VPC-native networking&lt;/strong&gt; (alias IP ranges) — pods get real VPC CIDRs routed natively through Google's Andromeda SDN. Dataplane V2 sits on top, adding eBPF policy enforcement and observability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabling Dataplane V2 on GKE:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create GKE cluster with Dataplane V2 (default for new clusters)&lt;/span&gt;
gcloud container clusters create my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-dataplane-v2&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-ip-alias&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; us-central1

&lt;span class="c"&gt;# Enable Hubble observability add-on&lt;/span&gt;
gcloud container clusters update my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-dataplane-v2-flow-observability&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
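&lt;p&gt;You can confirm Dataplane V2 is active by looking for its node agent; to the best of our knowledge the Cilium-based agent ships on GKE as the &lt;code&gt;anetd&lt;/code&gt; DaemonSet:&lt;/p&gt;

```shell
# Dataplane V2 clusters run the eBPF node agent as the "anetd" DaemonSet
kubectl get daemonset anetd -n kube-system
```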



&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;GKE Dataplane V2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataplane&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cilium eBPF (managed subset)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;kube-proxy replacement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ eBPF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ eBPF-enforced (L3/L4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FQDN policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ (GKE 1.28+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hubble observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Optional add-on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L7 policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Not exposed (managed limitations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pod IPs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real VPC IPs (alias ranges)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ via GKE Fleet / Anthos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Managed lifecycle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Google manages upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Dataplane V2 vs self-managed Cilium on GKE:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;GKE Dataplane V2&lt;/th&gt;
&lt;th&gt;Self-managed Cilium on GKE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Management&lt;/td&gt;
&lt;td&gt;Google-managed&lt;/td&gt;
&lt;td&gt;You manage Helm values/upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature exposure&lt;/td&gt;
&lt;td&gt;Subset of Cilium&lt;/td&gt;
&lt;td&gt;Full Cilium feature set&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hubble&lt;/td&gt;
&lt;td&gt;Basic (add-on)&lt;/td&gt;
&lt;td&gt;Full Hubble UI + Relay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Mesh&lt;/td&gt;
&lt;td&gt;✗ (use GKE Fleet)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 CNP&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support&lt;/td&gt;
&lt;td&gt;GKE SLA&lt;/td&gt;
&lt;td&gt;Community / Isovalent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;GKE Recommendation:&lt;/strong&gt; For most workloads, &lt;strong&gt;Dataplane V2 is the right choice&lt;/strong&gt; — Google manages it, it's eBPF-based, and it covers L3/L4 policy. If you need full CiliumNetworkPolicy L7 rules or Cluster Mesh, consider installing self-managed Cilium on GKE (Cilium's documentation covers a GKE install) in place of Dataplane V2.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;When to choose GKE Dataplane V2:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running GKE — it is the default and Google-managed&lt;/li&gt;
&lt;li&gt;✅ Want eBPF performance without managing Cilium yourself&lt;/li&gt;
&lt;li&gt;✅ NetworkPolicy enforcement at scale (eBPF O(1) lookups)&lt;/li&gt;
&lt;li&gt;✅ Need basic Hubble network telemetry&lt;/li&gt;
&lt;li&gt;⚠️ For full L7 policy or Cluster Mesh, self-manage Cilium on GKE instead&lt;/li&gt;
&lt;/ul&gt;
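&lt;p&gt;Since Dataplane V2 enforces standard NetworkPolicy out of the box, the usual default-deny baseline needs no extra components; a minimal sketch (the namespace is illustrative):&lt;/p&gt;

```yaml
# Default-deny ingress for one namespace; DPv2 enforces this in eBPF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod              # hypothetical namespace
spec:
  podSelector: {}              # all pods in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed => all ingress denied
```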




&lt;h2&gt;
  
  
  4. Data Plane Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Service Scalability — All CNIs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Services&lt;/th&gt;
&lt;th&gt;Flannel (iptables)&lt;/th&gt;
&lt;th&gt;Calico (iptables)&lt;/th&gt;
&lt;th&gt;Calico (eBPF)&lt;/th&gt;
&lt;th&gt;Cilium (eBPF)&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;~80 ms&lt;/td&gt;
&lt;td&gt;~80 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;~80 ms&lt;/td&gt;
&lt;td&gt;~80 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;~800 ms&lt;/td&gt;
&lt;td&gt;~800 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;~800 ms&lt;/td&gt;
&lt;td&gt;~800 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;⚠️ drops&lt;/td&gt;
&lt;td&gt;⚠️ drops&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;td&gt;⚠️ drops&lt;/td&gt;
&lt;td&gt;⚠️ drops&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5. Network Policy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Policy Feature Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Policy Feature&lt;/th&gt;
&lt;th&gt;Flannel&lt;/th&gt;
&lt;th&gt;Calico&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Weave&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard NetworkPolicy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (add-on)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Egress Policy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GlobalNetworkPolicy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ CCNP&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ ClusterNetworkPolicy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FQDN / DNS policy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;⚠️ Enterprise&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (1.28+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 HTTP method/path&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;⚠️ ALP&lt;/td&gt;
&lt;td&gt;✅ no sidecar&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka / gRPC policy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered policy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security Groups (cloud)&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ SGP&lt;/td&gt;
&lt;td&gt;✅ NSG&lt;/td&gt;
&lt;td&gt;✅ Firewall rules&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
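&lt;p&gt;As a concrete example of the L7 row, Cilium's &lt;code&gt;CiliumNetworkPolicy&lt;/code&gt; can match HTTP method and path without a sidecar; a minimal sketch (labels and port are illustrative):&lt;/p&gt;

```yaml
# Allow only GET /healthz from frontend pods to api pods on 8080
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: api                 # hypothetical label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend      # hypothetical label
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /healthz
```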




&lt;h2&gt;
  
  
  6. Observability
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Flannel&lt;/th&gt;
&lt;th&gt;Calico&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Weave&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L3/L4 flow logs&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ VPC Flow Logs&lt;/td&gt;
&lt;td&gt;✅ NSG Flow Logs&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 HTTP flows&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗ (OSS)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Live service map&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ Hubble UI&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ Octant&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (add-on)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drop reason&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus metrics&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ Rich&lt;/td&gt;
&lt;td&gt;✅ Basic&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ CloudWatch&lt;/td&gt;
&lt;td&gt;✅ Azure Monitor&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built-in UI&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗ (OSS)&lt;/td&gt;
&lt;td&gt;✅ Hubble UI&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ Octant&lt;/td&gt;
&lt;td&gt;✅ CloudWatch&lt;/td&gt;
&lt;td&gt;✅ Azure Monitor&lt;/td&gt;
&lt;td&gt;✅ Cloud Console&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  7. Performance Benchmarks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  TCP Throughput — iperf3, Pod-to-Pod Same Node
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CNI&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flannel&lt;/td&gt;
&lt;td&gt;VXLAN&lt;/td&gt;
&lt;td&gt;~8 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flannel&lt;/td&gt;
&lt;td&gt;host-gw&lt;/td&gt;
&lt;td&gt;~9.5 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calico&lt;/td&gt;
&lt;td&gt;BGP direct (iptables)&lt;/td&gt;
&lt;td&gt;~9.3 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calico&lt;/td&gt;
&lt;td&gt;BGP direct (eBPF)&lt;/td&gt;
&lt;td&gt;~9.7 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cilium&lt;/td&gt;
&lt;td&gt;GENEVE tunnel&lt;/td&gt;
&lt;td&gt;~8.5 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cilium&lt;/td&gt;
&lt;td&gt;native-routing&lt;/td&gt;
&lt;td&gt;~9.8 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cilium&lt;/td&gt;
&lt;td&gt;XDP&lt;/td&gt;
&lt;td&gt;line rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS VPC CNI&lt;/td&gt;
&lt;td&gt;Native VPC routing&lt;/td&gt;
&lt;td&gt;~9.5 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure CNI&lt;/td&gt;
&lt;td&gt;Native VNET routing&lt;/td&gt;
&lt;td&gt;~9.4 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE Dataplane V2&lt;/td&gt;
&lt;td&gt;Alias IP + eBPF&lt;/td&gt;
&lt;td&gt;~9.7 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Results are representative — hardware, kernel version, and NIC driver all affect real-world numbers.&lt;/p&gt;
&lt;/blockquote&gt;
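&lt;p&gt;Numbers like these come from iperf3 runs between two pods; a sketch of one way to reproduce them (image and pod names are illustrative):&lt;/p&gt;

```shell
# Start an iperf3 server pod, then grab its pod IP
kubectl run iperf3-server --image=networkstatic/iperf3 -- -s
SERVER_IP=$(kubectl get pod iperf3-server -o jsonpath='{.status.podIP}')

# Run the client for 30 seconds against it
kubectl run iperf3-client --rm -it --image=networkstatic/iperf3 -- -c "$SERVER_IP" -t 30
```

&lt;p&gt;Pin both pods to the same node (or different nodes) with a nodeSelector to separate the same-node and cross-node cases.&lt;/p&gt;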

&lt;h3&gt;
  
  
  p99 Latency — Same Node
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CNI&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;p99 Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flannel&lt;/td&gt;
&lt;td&gt;VXLAN&lt;/td&gt;
&lt;td&gt;~0.35 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flannel&lt;/td&gt;
&lt;td&gt;host-gw&lt;/td&gt;
&lt;td&gt;~0.18 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calico&lt;/td&gt;
&lt;td&gt;BGP direct (eBPF)&lt;/td&gt;
&lt;td&gt;~0.15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cilium&lt;/td&gt;
&lt;td&gt;native-routing&lt;/td&gt;
&lt;td&gt;~0.16 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS VPC CNI&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;~0.17 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure CNI&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;~0.18 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE Dataplane V2&lt;/td&gt;
&lt;td&gt;eBPF&lt;/td&gt;
&lt;td&gt;~0.15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  8. Encryption
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Flannel WG&lt;/th&gt;
&lt;th&gt;Calico WG&lt;/th&gt;
&lt;th&gt;Cilium WG&lt;/th&gt;
&lt;th&gt;Cilium IPsec&lt;/th&gt;
&lt;th&gt;Antrea WG/IPsec&lt;/th&gt;
&lt;th&gt;AWS CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cross-node encryption&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (NLB/TLS)&lt;/td&gt;
&lt;td&gt;✅ (Azure Firewall)&lt;/td&gt;
&lt;td&gt;✅ (WireGuard, beta)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Same-node encryption&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (v3.26+)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strict drop mode&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto key rotation&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Managed&lt;/td&gt;
&lt;td&gt;Managed&lt;/td&gt;
&lt;td&gt;Managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FIPS compliance&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ IPsec&lt;/td&gt;
&lt;td&gt;✅ (AWS FIPS)&lt;/td&gt;
&lt;td&gt;✅ (Azure FIPS)&lt;/td&gt;
&lt;td&gt;✅ (Google FIPS)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
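&lt;p&gt;As one concrete example from the table, Cilium's WireGuard mode is a Helm toggle; a sketch assuming a Helm-managed Cilium install:&lt;/p&gt;

```shell
# Turn on transparent WireGuard encryption for an existing Cilium install
helm upgrade cilium cilium/cilium -n kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

# Verify: the agent should report WireGuard under "Encryption"
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i encryption
```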




&lt;h2&gt;
  
  
  9. Multi-Cluster
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Flannel&lt;/th&gt;
&lt;th&gt;Calico&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;th&gt;AWS EKS&lt;/th&gt;
&lt;th&gt;Azure AKS&lt;/th&gt;
&lt;th&gt;GKE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Native multi-cluster&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ BGP&lt;/td&gt;
&lt;td&gt;✅ Cluster Mesh&lt;/td&gt;
&lt;td&gt;✅ Antrea Multi-cluster&lt;/td&gt;
&lt;td&gt;✅ EKS Connector&lt;/td&gt;
&lt;td&gt;✅ AKS Fleet&lt;/td&gt;
&lt;td&gt;✅ GKE Fleet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unified service DNS&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ (manual)&lt;/td&gt;
&lt;td&gt;⚠️ (manual)&lt;/td&gt;
&lt;td&gt;✅ (Anthos)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-cluster NetworkPolicy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗ (OSS)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (Anthos)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-cluster observability&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ Hubble&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ CloudWatch&lt;/td&gt;
&lt;td&gt;✅ Azure Monitor&lt;/td&gt;
&lt;td&gt;✅ Cloud Ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max clusters&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
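&lt;p&gt;The Cilium Cluster Mesh entry above corresponds to a short CLI flow; a sketch with illustrative kubectl context names:&lt;/p&gt;

```shell
# Enable Cluster Mesh on both clusters
cilium clustermesh enable --context cluster-1
cilium clustermesh enable --context cluster-2

# Connect them (the connection is established bidirectionally)
cilium clustermesh connect --context cluster-1 --destination-context cluster-2

# Check mesh health
cilium clustermesh status --context cluster-1
```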




&lt;h2&gt;
  
  
  10. Resource Usage
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Flannel&lt;/th&gt;
&lt;th&gt;Calico&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Weave&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DaemonSet CPU (idle)&lt;/td&gt;
&lt;td&gt;~5 mCPU&lt;/td&gt;
&lt;td&gt;~20–60 mCPU&lt;/td&gt;
&lt;td&gt;~30–80 mCPU&lt;/td&gt;
&lt;td&gt;~10–30 mCPU&lt;/td&gt;
&lt;td&gt;~20–50 mCPU&lt;/td&gt;
&lt;td&gt;~10–25 mCPU&lt;/td&gt;
&lt;td&gt;~10–30 mCPU&lt;/td&gt;
&lt;td&gt;~30–80 mCPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DaemonSet RAM (idle)&lt;/td&gt;
&lt;td&gt;~30 MB&lt;/td&gt;
&lt;td&gt;~60–150 MB&lt;/td&gt;
&lt;td&gt;~100–300 MB&lt;/td&gt;
&lt;td&gt;~50–100 MB&lt;/td&gt;
&lt;td&gt;~50–100 MB&lt;/td&gt;
&lt;td&gt;~30–80 MB&lt;/td&gt;
&lt;td&gt;~40–80 MB&lt;/td&gt;
&lt;td&gt;~100–300 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;~5s&lt;/td&gt;
&lt;td&gt;~10–20s&lt;/td&gt;
&lt;td&gt;~30–60s&lt;/td&gt;
&lt;td&gt;~10s&lt;/td&gt;
&lt;td&gt;~10–15s&lt;/td&gt;
&lt;td&gt;~5–10s&lt;/td&gt;
&lt;td&gt;~5–10s&lt;/td&gt;
&lt;td&gt;Managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional CRDs&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;~8&lt;/td&gt;
&lt;td&gt;~15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;~10&lt;/td&gt;
&lt;td&gt;0–2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum kernel&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any / ≥5.3 (eBPF)&lt;/td&gt;
&lt;td&gt;≥4.9&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;GKE-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operator required&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ tigera-operator&lt;/td&gt;
&lt;td&gt;✅ cilium-operator&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ antrea-controller&lt;/td&gt;
&lt;td&gt;AWS-managed&lt;/td&gt;
&lt;td&gt;Azure-managed&lt;/td&gt;
&lt;td&gt;GKE-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  11. Full Feature Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Flannel&lt;/th&gt;
&lt;th&gt;Calico&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Weave&lt;/th&gt;
&lt;th&gt;Antrea&lt;/th&gt;
&lt;th&gt;AWS VPC CNI&lt;/th&gt;
&lt;th&gt;Azure CNI&lt;/th&gt;
&lt;th&gt;GKE DPv2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data plane&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bridge + iptables&lt;/td&gt;
&lt;td&gt;BGP + iptables/eBPF&lt;/td&gt;
&lt;td&gt;eBPF kernel-native&lt;/td&gt;
&lt;td&gt;Mesh: VXLAN (fastdp)/sleeve&lt;/td&gt;
&lt;td&gt;OVS&lt;/td&gt;
&lt;td&gt;VPC native&lt;/td&gt;
&lt;td&gt;VNET native&lt;/td&gt;
&lt;td&gt;eBPF (Cilium)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;kube-proxy replacement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (eBPF)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ AntreaProxy&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Encapsulation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VXLAN&lt;/td&gt;
&lt;td&gt;None/IPIP/VXLAN&lt;/td&gt;
&lt;td&gt;VXLAN/Geneve/None&lt;/td&gt;
&lt;td&gt;VXLAN (fastdp)/sleeve&lt;/td&gt;
&lt;td&gt;Geneve/VXLAN&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BGP routing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ native&lt;/td&gt;
&lt;td&gt;✅ optional&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L3/L4 NetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (add-on)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L7 HTTP/gRPC policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;⚠️ ALP&lt;/td&gt;
&lt;td&gt;✅ no sidecar&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FQDN-based policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (Enterprise)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (1.28+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GlobalNetworkPolicy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ CCNP&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ CNP&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flow observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ flow logs&lt;/td&gt;
&lt;td&gt;✅ Hubble&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ Theia&lt;/td&gt;
&lt;td&gt;✅ VPC Flow&lt;/td&gt;
&lt;td&gt;✅ NSG Flow&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L7 flow visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗ (OSS)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-node encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ WG&lt;/td&gt;
&lt;td&gt;✅ WG&lt;/td&gt;
&lt;td&gt;✅ WG/IPsec&lt;/td&gt;
&lt;td&gt;✅ NaCl&lt;/td&gt;
&lt;td&gt;✅ WG/IPsec&lt;/td&gt;
&lt;td&gt;Cloud-layer&lt;/td&gt;
&lt;td&gt;Cloud-layer&lt;/td&gt;
&lt;td&gt;✅ WG (beta)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Same-node encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ (v3.26+)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FIPS encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ IPsec&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ IPsec&lt;/td&gt;
&lt;td&gt;✅ (AWS)&lt;/td&gt;
&lt;td&gt;✅ (Azure)&lt;/td&gt;
&lt;td&gt;✅ (GCP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅ BGP&lt;/td&gt;
&lt;td&gt;✅ Cluster Mesh&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;EKS Connector&lt;/td&gt;
&lt;td&gt;AKS Fleet&lt;/td&gt;
&lt;td&gt;GKE Fleet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;✅ HNS&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;EKS&lt;/td&gt;
&lt;td&gt;AKS&lt;/td&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM per node (idle)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~30 MB&lt;/td&gt;
&lt;td&gt;~60–150 MB&lt;/td&gt;
&lt;td&gt;~100–300 MB&lt;/td&gt;
&lt;td&gt;~50–100 MB&lt;/td&gt;
&lt;td&gt;~50–100 MB&lt;/td&gt;
&lt;td&gt;~30–80 MB&lt;/td&gt;
&lt;td&gt;~40–80 MB&lt;/td&gt;
&lt;td&gt;~100–300 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operational complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium–High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low (managed)&lt;/td&gt;
&lt;td&gt;Low (managed)&lt;/td&gt;
&lt;td&gt;Low (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active development&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ Archived&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  12. When to Choose Each
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🟢 Choose Flannel when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Dev, CI, or home lab cluster with no production traffic&lt;/li&gt;
&lt;li&gt;✅ No NetworkPolicy requirement whatsoever&lt;/li&gt;
&lt;li&gt;✅ RAM-constrained nodes (Raspberry Pi, 1 GB edge devices)&lt;/li&gt;
&lt;li&gt;✅ You want the absolute lowest operational overhead&lt;/li&gt;
&lt;li&gt;✅ Running a legacy kernel (RHEL 7 / CentOS 7)&lt;/li&gt;
&lt;li&gt;✅ Already using a service mesh (Istio, Linkerd) for policy and observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟠 Choose Calico when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ NetworkPolicy is required and Cilium feels like overkill&lt;/li&gt;
&lt;li&gt;✅ You need BGP peering with upstream physical routers&lt;/li&gt;
&lt;li&gt;✅ Windows nodes exist in your cluster&lt;/li&gt;
&lt;li&gt;✅ No-encap direct routing is preferred for performance&lt;/li&gt;
&lt;li&gt;✅ Your team already has Calico expertise&lt;/li&gt;
&lt;li&gt;✅ Medium cluster size (10–200 nodes) with moderate policy complexity&lt;/li&gt;
&lt;/ul&gt;
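For the BGP-peering case, peering is declared as a Calico resource. A minimal sketch, with placeholder peer IP and AS numbers (not values from any real deployment):

```yaml
# Hypothetical example: peer every node with a top-of-rack router.
# 192.0.2.1 and AS 64512 are placeholder values.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-router
spec:
  peerIP: 192.0.2.1        # upstream router address (placeholder)
  asNumber: 64512          # upstream router's AS (placeholder)
```

Apply it with `calicoctl apply -f` (or `kubectl` when the Calico API server is installed); a `nodeSelector` can scope the peering to a single rack.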

&lt;h3&gt;
  
  
  🔵 Choose Cilium when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ L7 HTTP/gRPC/Kafka policy without a service mesh sidecar&lt;/li&gt;
&lt;li&gt;✅ Hubble observability and a live service map are needed&lt;/li&gt;
&lt;li&gt;✅ 100+ services with high service churn (eBPF O(1) matters)&lt;/li&gt;
&lt;li&gt;✅ End-to-end pod traffic encryption including same-node&lt;/li&gt;
&lt;li&gt;✅ Multi-cluster federation with unified DNS and policy&lt;/li&gt;
&lt;li&gt;✅ Building toward zero-trust networking inside the cluster&lt;/li&gt;
&lt;/ul&gt;
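The sidecar-free L7 point can be made concrete with a CiliumNetworkPolicy sketch; the labels, port, and path below are illustrative, not from any real cluster:

```yaml
# Hypothetical policy: only pods labeled app=frontend may issue
# GET /api/* to app=backend on 8080; other L7 requests are denied.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-l7-allow
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/.*"
```

Cilium enforces the HTTP rule through a node-local proxy, so no sidecar container is injected into the workload pods.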

&lt;h3&gt;
  
  
  🟡 Choose Weave when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Generally not recommended for new clusters&lt;/strong&gt; — Weaveworks is archived&lt;/li&gt;
&lt;li&gt;✅ Only if migrating from an existing Weave deployment with no immediate migration path&lt;/li&gt;
&lt;li&gt;✅ Simple overlay needed with built-in NaCl encryption (short term)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟣 Choose Antrea when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ VMware NSX-T / Tanzu environment requiring deep SDN integration&lt;/li&gt;
&lt;li&gt;✅ Tiered network policy enforcement (Emergency / Security / Application tiers)&lt;/li&gt;
&lt;li&gt;✅ Windows and Linux mixed clusters in an enterprise VMware stack&lt;/li&gt;
&lt;li&gt;✅ OVS dataplane is a hard requirement (telco, NFV)&lt;/li&gt;
&lt;/ul&gt;
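Antrea's tiered enforcement can be sketched with a ClusterNetworkPolicy; this assumes the `crd.antrea.io/v1beta1` API (older releases use `v1alpha1`), and the label and priority are illustrative:

```yaml
# Hypothetical emergency rule: drop all ingress to quarantined pods,
# evaluated before any policy in lower tiers such as application.
apiVersion: crd.antrea.io/v1beta1
kind: ClusterNetworkPolicy
metadata:
  name: quarantine-drop
spec:
  priority: 1
  tier: emergency          # static tiers: emergency, securityops, ... application
  appliedTo:
  - podSelector:
      matchLabels:
        quarantine: "true"
  ingress:
  - action: Drop
    from:
    - podSelector: {}      # traffic from any pod
```

Because tiers are evaluated in order, a security team can own the emergency/securityops tiers while application teams manage only their own tier.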

&lt;h3&gt;
  
  
  🔶 Choose AWS VPC CNI (EKS) when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running EKS — it is the default AWS-recommended CNI&lt;/li&gt;
&lt;li&gt;✅ Pods must be natively routable across VPC, VPN, or Direct Connect&lt;/li&gt;
&lt;li&gt;✅ Per-pod AWS Security Groups are required (SGP feature)&lt;/li&gt;
&lt;li&gt;✅ Compliance mandates no overlay network&lt;/li&gt;
&lt;li&gt;✅ Integration with AWS services that need pod-level VPC routing&lt;/li&gt;
&lt;/ul&gt;
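The per-pod Security Group feature (SGP) is driven by a SecurityGroupPolicy resource. A hedged sketch; the namespace, label, and security group ID are placeholders:

```yaml
# Hypothetical: attach a dedicated security group to payment pods only.
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: payments-sgp
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments
  securityGroups:
    groupIds:
    - sg-0123456789abcdef0   # placeholder security group ID
```

This path requires pod ENI support to be enabled on the VPC CNI (`ENABLE_POD_ENI=true`) and Nitro-based instance types.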

&lt;h3&gt;
  
  
  🔷 Choose Azure CNI (AKS) when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running AKS — use Azure CNI Overlay mode for most production workloads&lt;/li&gt;
&lt;li&gt;✅ Pods need to be reachable from on-prem via ExpressRoute&lt;/li&gt;
&lt;li&gt;✅ Want eBPF performance + Hubble → choose Azure CNI Overlay + Cilium dataplane&lt;/li&gt;
&lt;li&gt;✅ Large clusters → Azure CNI Overlay avoids VNET IP exhaustion&lt;/li&gt;
&lt;li&gt;✅ Windows node support is required (all Azure CNI modes support it)&lt;/li&gt;
&lt;/ul&gt;
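The recommended combination (Overlay mode with the Cilium dataplane) is selected at cluster creation. A provisioning sketch; resource group, cluster name, and CIDR are placeholders:

```shell
# Hypothetical cluster: Azure CNI Overlay + Cilium eBPF dataplane.
az aks create \
  --resource-group my-rg \
  --name my-aks \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --pod-cidr 192.168.0.0/16
```

The `--pod-cidr` is the overlay pod space and is independent of the VNET, which is what avoids the IP-exhaustion problem of traditional Azure CNI.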

&lt;h3&gt;
  
  
  ♦️ Choose GKE Dataplane V2 (GKE) when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Running GKE — it is the default for new clusters&lt;/li&gt;
&lt;li&gt;✅ Want eBPF-based policy without managing Cilium yourself&lt;/li&gt;
&lt;li&gt;✅ Need Hubble network telemetry (enable as add-on)&lt;/li&gt;
&lt;li&gt;✅ FQDN-based NetworkPolicy (GKE 1.28+)&lt;/li&gt;
&lt;li&gt;✅ Google-managed lifecycle and upgrades are preferred&lt;/li&gt;
&lt;li&gt;⚠️ For L7 CNP or Cluster Mesh, self-manage Cilium on GKE instead&lt;/li&gt;
&lt;/ul&gt;
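Dataplane V2 is likewise a create-time choice. A provisioning sketch with placeholder cluster name and region (it is the default on current GKE versions, shown explicitly here):

```shell
# Hypothetical cluster with the eBPF (Cilium-based) dataplane enabled.
gcloud container clusters create my-gke \
  --region us-central1 \
  --enable-dataplane-v2
```

Hubble-based telemetry is enabled separately as a GKE add-on; the exact option varies by GKE release, so check the current docs for your version.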




&lt;h2&gt;
  
  
  13. K3s-Specific Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Flannel — Built-In, Nothing to Do
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Flannel ships with K3s — just install&lt;/span&gt;
curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | sh -

&lt;span class="c"&gt;# Change backend in /etc/rancher/k3s/config.yaml&lt;/span&gt;
flannel-backend: host-gw   &lt;span class="c"&gt;# vxlan | host-gw | wireguard-native | none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Installing Calico on K3s
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Install K3s without Flannel:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_EXEC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--flannel-backend=none &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  --disable-network-policy &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  --cluster-cidr=192.168.0.0/16"&lt;/span&gt; sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2 — Install Calico operator:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3 — Apply Installation CR:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;operator.tigera.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Installation&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;calicoNetwork&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ipPools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;192.168.0.0/16&lt;/span&gt;
      &lt;span class="na"&gt;encapsulation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VXLANCrossSubnet&lt;/span&gt;
      &lt;span class="na"&gt;natOutgoing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enabled&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Installing Cilium on K3s
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Install K3s without Flannel:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_EXEC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--flannel-backend=none &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  --disable-network-policy &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  --disable=servicelb"&lt;/span&gt; sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2 — Install Cilium via Helm:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add cilium https://helm.cilium.io/
helm &lt;span class="nb"&gt;install &lt;/span&gt;cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; operator.replicas&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServiceHost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_K3S_API_IP&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServicePort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; bpf.masquerade&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; ipam.mode&lt;span class="o"&gt;=&lt;/span&gt;kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Minimum Kernel Requirements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Cilium&lt;/th&gt;
&lt;th&gt;Calico eBPF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic CNI&lt;/td&gt;
&lt;td&gt;≥ 4.9&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kube-proxy replacement&lt;/td&gt;
&lt;td&gt;≥ 5.2&lt;/td&gt;
&lt;td&gt;≥ 5.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WireGuard encryption&lt;/td&gt;
&lt;td&gt;≥ 5.6&lt;/td&gt;
&lt;td&gt;≥ 5.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XDP acceleration&lt;/td&gt;
&lt;td&gt;≥ 5.10&lt;/td&gt;
&lt;td&gt;≥ 5.10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Ubuntu 22.04 ships kernel 5.15, Debian 12 ships 6.1, Raspberry Pi OS Bookworm ships 6.1 — all satisfy every requirement.&lt;/p&gt;
&lt;/blockquote&gt;
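To check a node against the table above, compare `uname -r` to the required minimum. A small POSIX-sh sketch (the 5.6 threshold is the WireGuard row; the function name is ours, not a standard tool):

```shell
# kernel_at_least VERSION REQ_MAJOR REQ_MINOR -> prints "yes" or "no"
kernel_at_least() {
  v=${1%%-*}        # strip the distro suffix: 6.1.0-13-amd64 -> 6.1.0
  maj=${v%%.*}      # major version
  rest=${v#*.}
  min=${rest%%.*}   # minor version
  if [ "$maj" -gt "$2" ] || { [ "$maj" -eq "$2" ] && [ "$min" -ge "$3" ]; }; then
    echo yes
  else
    echo no
  fi
}

kernel_at_least "$(uname -r)" 5 6   # WireGuard encryption threshold
```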




&lt;h2&gt;
  
  
  14. Migration Guide on K3s
&lt;/h2&gt;

&lt;p&gt;All migrations follow the same pattern:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;drain → clean CNI state → restart K3s with &lt;code&gt;--flannel-backend=none&lt;/code&gt; → install new CNI → uncordon&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Flannel → Calico
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Drain the node&lt;/span&gt;
kubectl drain &amp;lt;node&amp;gt; &lt;span class="nt"&gt;--ignore-daemonsets&lt;/span&gt; &lt;span class="nt"&gt;--delete-emptydir-data&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Remove Flannel state on the node&lt;/span&gt;
systemctl stop k3s
ip &lt;span class="nb"&gt;link &lt;/span&gt;delete flannel.1 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
&lt;/span&gt;ip &lt;span class="nb"&gt;link &lt;/span&gt;delete cni0 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/cni /etc/cni/net.d

&lt;span class="c"&gt;# Step 3: Set flannel-backend: none in /etc/rancher/k3s/config.yaml, then restart&lt;/span&gt;
systemctl start k3s

&lt;span class="c"&gt;# Step 4: Install Calico operator&lt;/span&gt;
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml

&lt;span class="c"&gt;# Step 5: Uncordon&lt;/span&gt;
kubectl uncordon &amp;lt;node&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Flannel → Cilium
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Steps 1–3 same as above (drain, clean, restart with flannel-backend=none)&lt;/span&gt;

&lt;span class="c"&gt;# Step 4: Install Cilium&lt;/span&gt;
helm repo add cilium https://helm.cilium.io/
helm &lt;span class="nb"&gt;install &lt;/span&gt;cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServiceHost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;API_IP&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServicePort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6443

&lt;span class="c"&gt;# Step 5: Uncordon&lt;/span&gt;
kubectl uncordon &amp;lt;node&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip:&lt;/strong&gt; For single-node K3s lab environments, a clean reinstall is almost always faster and safer than a live migration. Run &lt;code&gt;k3s-uninstall.sh&lt;/code&gt;, reinstall with the correct flags, then Helm install your chosen CNI — total time is about 10 minutes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  15. Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Open-Source CNIs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🟢 Flannel&lt;/strong&gt; — A masterpiece of minimalism. One job, done perfectly, with near-zero operational overhead. The right choice when simplicity and RAM constraints matter more than policy or observability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🟠 Calico&lt;/strong&gt; — The policy-first CNI. BGP-native routing, mature L3/L4 NetworkPolicy, Windows node support, and a pluggable data plane. The right choice when you need robust policy enforcement, prefer no-encap routing, or operate in an environment with existing BGP infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔵 Cilium&lt;/strong&gt; — The platform CNI. eBPF-native with O(1) service lookup, L7-aware policy with no sidecar, Hubble observability, full pod-traffic encryption, and Cluster Mesh multi-cluster. The most capable networking layer available in Kubernetes today.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🟡 Weave Net&lt;/strong&gt; — Once a popular choice for simplicity and built-in encryption. Now archived — migrate to Cilium or Calico for any new or long-running cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🟣 Antrea&lt;/strong&gt; — The VMware-native CNI. OVS dataplane, tiered policy, Windows support, and NSX-T integration. The right choice in Tanzu or NSX environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔷 Multus&lt;/strong&gt; — Not a CNI replacement but a CNI multiplier. Essential for telco/NFV workloads needing multiple pod network interfaces.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud Provider CNIs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔶 AWS VPC CNI (EKS)&lt;/strong&gt; — Native VPC IP assignment with no overlay. Pods are first-class VPC citizens. Add Calico or the AWS-native policy controller for NetworkPolicy. Choose prefix delegation for high pod density.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔷 Azure CNI (AKS)&lt;/strong&gt; — Use &lt;strong&gt;Azure CNI Overlay&lt;/strong&gt; for most production workloads to avoid IP exhaustion, and add the &lt;strong&gt;Cilium dataplane&lt;/strong&gt; for eBPF policy + Hubble observability. Azure CNI traditional still works, but requires careful subnet pre-planning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;♦️ GKE Dataplane V2 (GKE)&lt;/strong&gt; — Google's managed Cilium eBPF layer. The default for new GKE clusters. Handles NetworkPolicy at scale with eBPF O(1) lookups. Add the Hubble observability add-on for network telemetry. Self-manage Cilium on GKE only if you need L7 CNP or Cluster Mesh.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; If you run a managed Kubernetes service, use the cloud-default CNI and layer policy/observability on top. If you run self-managed clusters, Cilium is the most capable long-term investment, with Calico as the pragmatic choice if BGP integration or Windows nodes are required.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The networking layer of your cluster is not where you want to cut corners at scale.&lt;br&gt;
&lt;strong&gt;Choose based on where your cluster is going — not just where it is today.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cilium.io/en/stable/installation/k3s/" rel="noopener noreferrer"&gt;Cilium K3s Installation Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cilium.io/blog/2021/05/11/cni-benchmark/" rel="noopener noreferrer"&gt;Cilium Network Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.tigera.io/calico/latest/getting-started/kubernetes/k3s/" rel="noopener noreferrer"&gt;Calico on K3s — Official Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/flannel-io/flannel" rel="noopener noreferrer"&gt;Flannel GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mvallim.github.io/kubernetes-under-the-hood/documentation/kube-flannel.html" rel="noopener noreferrer"&gt;Flannel Networking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/antrea-io/antrea" rel="noopener noreferrer"&gt;Antrea GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws/amazon-vpc-cni-k8s" rel="noopener noreferrer"&gt;AWS VPC CNI GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/aks/azure-cni-overlay" rel="noopener noreferrer"&gt;Azure CNI Overlay Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2" rel="noopener noreferrer"&gt;GKE Dataplane V2 Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cilium.io/en/stable/network/clustermesh/" rel="noopener noreferrer"&gt;Cilium Cluster Mesh Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.tigera.io/calico/latest/reference/resources/globalnetworkpolicy" rel="noopener noreferrer"&gt;Calico GlobalNetworkPolicy Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://antrea.io/docs/main/docs/antrea-network-policy/" rel="noopener noreferrer"&gt;Antrea Network Policy Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Written for K3s v1.29+, Cilium v1.15+, Calico v3.27+, Flannel v0.24+, AWS VPC CNI v1.18+, Azure CNI v1.5+, GKE 1.28+. Benchmark figures are representative — always test with your own hardware and workload before production decisions.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>networking</category>
      <category>cni</category>
      <category>devops</category>
    </item>
    <item>
      <title>🔴 Supply Chain Attacks Are Breaking the Internet in 2026 — Every Major Hack Explained</title>
      <dc:creator>Pendela BhargavaSai</dc:creator>
      <pubDate>Tue, 05 May 2026 04:00:00 +0000</pubDate>
      <link>https://dev.to/pendelabhargavasai/supply-chain-attacks-are-breaking-the-internet-in-2026-every-major-hack-explained-3bln</link>
      <guid>https://dev.to/pendelabhargavasai/supply-chain-attacks-are-breaking-the-internet-in-2026-every-major-hack-explained-3bln</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Your vulnerability scanner is hacking you. Your password manager got weaponized. Your AI coding tool is the new attack surface. Welcome to 2026.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeo5gx8i2y64sy3z9z3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeo5gx8i2y64sy3z9z3i.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Year Everything Became a Weapon
&lt;/h2&gt;

&lt;p&gt;In 2025, supply chain attacks were a concern. In 2026, they became the dominant threat vector in software security.&lt;/p&gt;

&lt;p&gt;The numbers are staggering: a single compromised maintainer account poisoned a library with &lt;strong&gt;100 million weekly downloads&lt;/strong&gt;. A misconfigured CI/CD workflow cascaded into &lt;strong&gt;five separate tool compromises&lt;/strong&gt; within days. A developer downloaded Roblox exploit scripts, and that mistake eventually exposed &lt;strong&gt;Vercel's internal database&lt;/strong&gt; — which was listed for sale at $2 million on BreachForums.&lt;/p&gt;

&lt;p&gt;This isn't theoretical risk. This is what happened between January and April 2026.&lt;/p&gt;

&lt;p&gt;In this post, I'm going to break down every major supply chain attack that hit the IT and software ecosystem this year — what got compromised, how the attackers did it, what the real blast radius looked like, and most importantly, &lt;strong&gt;what you need to do right now&lt;/strong&gt; to protect your pipelines.&lt;/p&gt;

&lt;p&gt;Let's start with what a supply chain attack actually is — because most explanations bury the lede.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Software Supply Chain Attack?
&lt;/h2&gt;

&lt;p&gt;Here's the mental model that matters:&lt;/p&gt;

&lt;p&gt;Instead of breaking into your house, the attacker bribes your locksmith.&lt;/p&gt;

&lt;p&gt;When you run &lt;code&gt;npm install&lt;/code&gt; or &lt;code&gt;pip install&lt;/code&gt;, you're implicitly trusting thousands of strangers who maintain open-source packages. You're trusting their accounts, their CI/CD pipelines, their GitHub credentials, and their judgment. Every single one of those trust relationships is an attack surface.&lt;/p&gt;

&lt;p&gt;A supply chain attack exploits that trust. Instead of targeting you directly — which requires defeating your firewall, your endpoint detection, your access controls — attackers target the &lt;strong&gt;supplier&lt;/strong&gt;. Compromise one maintainer account, and you've just compromised every developer who installs that package.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The attack chain looks like this:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Identify a maintainer of a widely-used package
2. Phish their npm/GitHub credentials, or exploit a misconfigured CI/CD workflow
3. Push backdoored versions — the malware runs at install time or on startup
4. Harvest: cloud credentials, SSH keys, API tokens, Kubernetes configs
5. Cascade: use stolen tokens to compromise more repos, more pipelines, more packages
6. Monetize: ransomware, data sale on BreachForums, cryptomining
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The asymmetry is what makes this so devastating. The attacker breaks in once, at one point in the supply chain, and inherits access to thousands of downstream organizations simultaneously.&lt;/p&gt;
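Step 3 of the chain, "malware runs at install time," works because npm executes package lifecycle scripts automatically. A sketch of the mechanism and the standard mitigation (the `steal-creds.js` script name is hypothetical):

```shell
# A malicious package only needs a lifecycle hook in its package.json:
#   "scripts": { "postinstall": "node steal-creds.js" }   # hypothetical
# That script runs with your user's permissions the moment you install.

# Mitigation: refuse lifecycle scripts unless you opt in deliberately.
npm install --ignore-scripts

# Or make that the default for every install on the machine:
npm config set ignore-scripts true
```

pip has an analogous surface (arbitrary code in `setup.py` at build time), so the same "don't run code at install" discipline applies across ecosystems.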

&lt;p&gt;Now let's talk about what actually happened in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  January 2026 — Cisco Unified Communications Zero-Day (CVE-2026-20045)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What got compromised
&lt;/h3&gt;

&lt;p&gt;Cisco's entire enterprise voice stack: Unified Communications Manager, IM &amp;amp; Presence Service, Unity Connection, and Webex Calling Dedicated Instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it happened
&lt;/h3&gt;

&lt;p&gt;A critical zero-day in the web-based management interface allowed unauthenticated remote attackers to send crafted HTTP requests and execute arbitrary commands on the underlying OS — then escalate straight to &lt;strong&gt;root&lt;/strong&gt;. No credentials needed. No user interaction required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it's a supply chain risk (not just a vulnerability)
&lt;/h3&gt;

&lt;p&gt;This one is subtler than the package ecosystem attacks below, but it's a textbook supply chain risk: &lt;strong&gt;managed service providers&lt;/strong&gt;. Thousands of organizations outsource their voice and UC infrastructure to third parties. If your managed service provider is running vulnerable Cisco UC components, your business communications become a pivot point into your environment — even if your own perimeter is airtight.&lt;/p&gt;

&lt;p&gt;This is the definition of inherited risk. You didn't deploy the vulnerable software. You didn't configure it. But you're exposed because you trusted someone who did.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to protect yourself
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Apply Cisco's emergency patch immediately (see &lt;a href="https://tools.cisco.com/security/center/publicationListing.x" rel="noopener noreferrer"&gt;Cisco Security Advisory cisco-sa-20260115-uc&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Implement continuous vendor monitoring — when a critical advisory drops, you need instant visibility into which of your vendors is exposed&lt;/li&gt;
&lt;li&gt;Restrict management interface access to known IP ranges only&lt;/li&gt;
&lt;li&gt;Map which applications and data flows depend on your vendors' UC components so you can assess blast radius before an attack, not after&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  February 2026 — GitHub Actions: The Misconfiguration That Started Everything
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What got compromised
&lt;/h3&gt;

&lt;p&gt;This is the origin point of the largest multi-tool supply chain campaign of 2026. A threat actor operating under the GitHub handle &lt;strong&gt;hackerbot-claw&lt;/strong&gt; (account created February 20, 2026) ran an automated campaign scanning public repositories for a specific GitHub Actions misconfiguration: the &lt;code&gt;pull_request_target&lt;/code&gt; event trigger with excessive token permissions.&lt;/p&gt;

&lt;p&gt;On &lt;strong&gt;February 27–28&lt;/strong&gt;, the attacker successfully exploited this misconfiguration in Aqua Security's Trivy repository, exfiltrating the &lt;code&gt;aqua-bot&lt;/code&gt; service account's Personal Access Token (PAT). This PAT had write access to release automation — which is everything the attacker needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it happened
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pull_request_target&lt;/code&gt; workflow is a GitHub Actions feature that lets CI pipelines trigger automatically on pull requests from external contributors. The problem: when misconfigured, external code gets access to the repository's internal secrets. The workflow essentially hands an untrusted contributor the keys to your pipeline.&lt;/p&gt;

&lt;p&gt;Aqua detected the intrusion and attempted credential rotation. But here's the critical failure: &lt;strong&gt;the rotation was not atomic&lt;/strong&gt;. Sequential token replacement left a window during which newly issued tokens may have been captured. As Aqua's VP of Open Source, Itay Shakury, later confirmed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We rotated secrets and tokens, but the process wasn't atomic, and attackers may have been privy to refreshed tokens."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This residual access enabled everything that followed in March.&lt;/p&gt;

&lt;h3&gt;
  
  
  The lesson about &lt;code&gt;pull_request_target&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is a well-documented dangerous pattern, but it keeps getting deployed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ⚠️ DANGEROUS — external PRs can access your secrets&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request_target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ci&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;  &lt;span class="c1"&gt;# ← This is the mistake&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ SAFE — pin to SHA, restrict permissions&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ci&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;  &lt;span class="c1"&gt;# minimum required&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to protect yourself
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Never&lt;/strong&gt; use &lt;code&gt;pull_request_target&lt;/code&gt; with write permissions for workflows triggered by external contributors&lt;/li&gt;
&lt;li&gt;Pin all GitHub Actions to full 40-character commit SHAs — not version tags (more on why this matters below)&lt;/li&gt;
&lt;li&gt;Rotate credentials atomically — revoke all, reissue all, in a single synchronized operation&lt;/li&gt;
&lt;li&gt;Limit service account tokens to minimum required permissions and scope&lt;/li&gt;
&lt;/ul&gt;
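&lt;p&gt;The first two checklist items are easy to automate in CI. A minimal sketch, assuming workflows live under the standard &lt;code&gt;.github/workflows&lt;/code&gt; path (the string heuristics here are illustrative, not an official scanner):&lt;/p&gt;

```python
import re
from pathlib import Path

def audit_workflow(text):
    """Flag the two risky patterns from the checklist, via string heuristics."""
    findings = []
    if "pull_request_target" in text:
        if re.search(r"contents:\s*write", text):
            findings.append("pull_request_target combined with contents: write")
        if "secrets." in text:
            findings.append("pull_request_target workflow references secrets")
    for action in re.findall(r"uses:\s*(\S+)", text):
        ref = action.split("@")[-1]
        # A version tag can be force-pushed; a 40-hex-char commit SHA cannot
        if not re.fullmatch(r"[0-9a-f]{40}", ref):
            findings.append("action not pinned to a commit SHA: " + action)
    return findings

if __name__ == "__main__":
    wf_dir = Path(".github/workflows")
    if wf_dir.is_dir():
        for path in wf_dir.glob("*.y*ml"):
            for finding in audit_workflow(path.read_text()):
                print(path, "-", finding)
```

&lt;p&gt;Run it in a scheduled job so the check also catches workflows merged after the initial audit.&lt;/p&gt;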




&lt;h2&gt;
  
  
  March 2026 — The Month Everything Went Wrong
&lt;/h2&gt;

&lt;p&gt;March 2026 will go down as the most significant month in software supply chain history. Four major compromises in under two weeks. A cascade that went from a misconfigured GitHub workflow to a ransomware operation targeting 1,000+ enterprise SaaS environments.&lt;/p&gt;

&lt;p&gt;Let me break each one down.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔍 Trivy (Aqua Security) — March 19–20, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;CVE-2026-33634 | Severity: CRITICAL&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;At approximately 17:43 UTC on March 19, 2026, an attacker with residual access from the February compromise force-pushed malicious code to &lt;strong&gt;75 of 77 version tags&lt;/strong&gt; in &lt;code&gt;aquasecurity/trivy-action&lt;/code&gt; — the official GitHub Action for Trivy, one of the most widely deployed open-source vulnerability scanners in the world.&lt;/p&gt;

&lt;p&gt;Simultaneously, all 7 tags in &lt;code&gt;aquasecurity/setup-trivy&lt;/code&gt; were poisoned, and a weaponized Trivy binary (v0.69.4) was published to GitHub Releases, Docker Hub, GHCR, ECR Public, and deb/rpm repositories.&lt;/p&gt;

&lt;p&gt;Safe versions: only &lt;code&gt;trivy-action v0.35.0&lt;/code&gt;, &lt;code&gt;setup-trivy v0.2.6&lt;/code&gt;, and &lt;code&gt;trivy v0.69.3&lt;/code&gt; were unaffected.&lt;/p&gt;

&lt;h4&gt;
  
  
  The attack was elegant and terrifying
&lt;/h4&gt;

&lt;p&gt;The malicious &lt;code&gt;entrypoint.sh&lt;/code&gt; ran the credential-harvesting payload &lt;strong&gt;first&lt;/strong&gt;, then ran the legitimate Trivy scan. Workflows completed normally. No errors. No indication of compromise. Developers watching their CI logs saw a clean vulnerability scan — while their secrets were being exfiltrated in the background.&lt;/p&gt;

&lt;p&gt;The malware (named "TeamPCP Cloud Stealer") performed three operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dumped &lt;code&gt;Runner.Worker&lt;/code&gt; process memory to extract GitHub PATs and CI secrets&lt;/li&gt;
&lt;li&gt;Swept SSH keys, cloud credentials (AWS, GCP, Azure), Kubernetes tokens, Docker configs, Git credentials&lt;/li&gt;
&lt;li&gt;Encrypted the bundle with AES-256 + RSA-4096 and exfiltrated to attacker-controlled servers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the primary C2 channel failed, the malware fell back to &lt;strong&gt;creating a repository called &lt;code&gt;tpcp-docs&lt;/code&gt; inside the victim's own GitHub organization&lt;/strong&gt; to store stolen secrets. Check your org for that repo right now.&lt;/p&gt;

&lt;h4&gt;
  
  
  The forensic tells (that most teams missed)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Each malicious commit had an impossible timestamp:&lt;/span&gt;
&lt;span class="gh"&gt;# - Claimed to be from 2021/2022&lt;/span&gt;
&lt;span class="gh"&gt;# - But parent commit was dated March 2026&lt;/span&gt;

&lt;span class="gh"&gt;# Additionally:&lt;/span&gt;
&lt;span class="gh"&gt;# - Only entrypoint.sh was modified per commit&lt;/span&gt;
&lt;span class="gh"&gt;# - Original commits touched multiple files&lt;/span&gt;
&lt;span class="gh"&gt;# - GitHub's "Immutable" release badge was present (but meaningless)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
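&lt;p&gt;The timestamp anomaly is mechanically detectable: in an honest history, a commit is never authored before its parent. A minimal sketch of that check over parsed &lt;code&gt;git log&lt;/code&gt; output (the tuple format is an assumption for illustration, not Aqua's tooling):&lt;/p&gt;

```python
from datetime import datetime

def find_time_travel(commits):
    """commits: (sha, iso_author_date, parent_sha) tuples, e.g. parsed from
    `git log --format='%H %aI %P'`. In honest history a commit's author date
    is never earlier than its parent's; the trivy-action commits broke this."""
    dates = {sha: datetime.fromisoformat(date) for sha, date, _ in commits}
    return [sha for sha, date, parent in commits
            if parent in dates and dates[sha] < dates[parent]]

history = [
    ("c3", "2021-06-14T09:00:00", "c2"),  # claims to be authored in 2021...
    ("c2", "2026-03-19T17:43:00", "c1"),  # ...but its parent is dated 2026
    ("c1", "2026-03-01T12:00:00", ""),
]
print(find_time_travel(history))  # → ['c3']
```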



&lt;h4&gt;
  
  
  How to protect yourself
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ VULNERABLE — tag can be rewritten silently&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@v0.34.2&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ SECURE — commit SHA is immutable&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@f781cce5aab226378d021711787766a7d423d18d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;If you ran Trivy between 17:43 and 23:13 UTC on March 19, 2026:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search your GitHub org for any repo named &lt;code&gt;tpcp-docs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check DNS/network logs for connections to &lt;code&gt;scan.aquasecurtiy[.]org&lt;/code&gt; (note the typo — deliberate)&lt;/li&gt;
&lt;li&gt;Check for connections to &lt;code&gt;45.148.10.212&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Treat all CI/CD secrets from that window as fully compromised — rotate everything&lt;/li&gt;
&lt;/ul&gt;
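&lt;p&gt;Sweeping DNS or proxy logs for those indicators is straightforward to script. A minimal sketch using the IOCs listed above (the line-oriented log format is an assumption; adapt to whatever your logging stack emits):&lt;/p&gt;

```python
# Indicators of compromise from the trivy-action incident
IOCS = (
    "scan.aquasecurtiy.org",  # typosquatted C2 domain ("securtiy" is deliberate)
    "45.148.10.212",          # exfiltration server
    "tpcp-docs",              # fallback dead-drop repo name
)

def flag_ioc_lines(lines):
    """Return (line_number, matched_ioc) pairs for log lines containing an IOC."""
    hits = []
    for n, line in enumerate(lines, start=1):
        for ioc in IOCS:
            if ioc in line:
                hits.append((n, ioc))
    return hits

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:  # usage: python ioc_scan.py dns.log
        with open(sys.argv[1]) as f:
            for n, ioc in flag_ioc_lines(f):
                print("line", n, "matched", ioc)
```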




&lt;h3&gt;
  
  
  🧠 LiteLLM — March 24, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Severity: CRITICAL | ~3.4M daily downloads | 40-minute exposure window&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;LiteLLM is a Python package providing a unified interface for 100+ LLM APIs — OpenAI, Anthropic, Google, AWS Bedrock, Azure OpenAI. Because it sits between your applications and multiple AI providers, it has access to API keys and cloud credentials for all of them. That's exactly why it was targeted.&lt;/p&gt;

&lt;p&gt;The compromise was a cascade from Trivy: LiteLLM's CI/CD pipeline used Trivy for security scanning. When Trivy was poisoned on March 19, the malware in LiteLLM's pipeline exfiltrated its &lt;strong&gt;PyPI publish token&lt;/strong&gt; to TeamPCP. Five days later, attackers used that token to upload two malicious versions directly to PyPI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;litellm==1.82.7&lt;/code&gt; — published 10:39 UTC&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;litellm==1.82.8&lt;/code&gt; — published 10:52 UTC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both were live for approximately &lt;strong&gt;40 minutes&lt;/strong&gt; before PyPI quarantined them. During that window, they accumulated tens of thousands of downloads.&lt;/p&gt;

&lt;h4&gt;
  
  
  The 3-stage payload
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stage 1: Credential Harvesting
# Exfiltrates to models.litellm.cloud (attacker-controlled, not official BerriAI domain)
&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM API keys (OpenAI, Anthropic, Google...)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cloud credentials (AWS, GCP, Azure)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SSH keys, shell history, .env files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Crypto wallets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kubernetes configs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 2: Kubernetes Lateral Movement
# Deploys privileged DaemonSets → full cluster access
&lt;/span&gt;
&lt;span class="c1"&gt;# Stage 3: Persistence
# Installs ~/.config/systemd/user/sysmon.service
# Polls attacker server for additional payloads
# Survives package removal
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.pth&lt;/code&gt; file mechanism in &lt;code&gt;v1.82.8&lt;/code&gt; was particularly nasty: it placed a &lt;code&gt;litellm_init.pth&lt;/code&gt; file that executed on &lt;strong&gt;every Python interpreter startup&lt;/strong&gt; — meaning the payload fired even when LiteLLM wasn't explicitly imported.&lt;/p&gt;
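&lt;p&gt;That startup hook works because CPython's &lt;code&gt;site&lt;/code&gt; module executes any line in a site-packages &lt;code&gt;.pth&lt;/code&gt; file that begins with &lt;code&gt;import&lt;/code&gt;. A minimal audit sketch that surfaces such lines for human review (most are legitimate, e.g. from setuptools, so this is a triage aid, not a verdict):&lt;/p&gt;

```python
import site
from pathlib import Path

def executable_pth_lines(site_dir):
    """CPython's site module runs any .pth line beginning with 'import' at
    every interpreter startup; this is the hook litellm_init.pth abused.
    Return (filename, line) pairs so a human can review them."""
    hits = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        for line in pth.read_text().splitlines():
            if line.startswith("import"):
                hits.append((pth.name, line))
    return hits

if __name__ == "__main__":
    for d in site.getsitepackages() + [site.getusersitepackages()]:
        if Path(d).is_dir():
            for name, line in executable_pth_lines(d):
                print(d + "/" + name + ":", line)
```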

&lt;h4&gt;
  
  
  Disclosure suppression
&lt;/h4&gt;

&lt;p&gt;When the community opened GitHub issue #24512 to report the compromise, TeamPCP deployed &lt;strong&gt;88 bots from 73 unique compromised developer accounts in a 102-second window&lt;/strong&gt; to spam the thread. They used the compromised maintainer account to close the issue as "not planned." This is one of the first documented uses of AI-assisted bot networks for supply chain attack disclosure suppression.&lt;/p&gt;

&lt;h4&gt;
  
  
  Immediate action
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if you're affected&lt;/span&gt;
pip show litellm | &lt;span class="nb"&gt;grep &lt;/span&gt;Version
&lt;span class="c"&gt;# v1.82.7 or v1.82.8 = COMPROMISED&lt;/span&gt;

&lt;span class="c"&gt;# Check for persistence&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; ~/.config/systemd/user/sysmon.service
&lt;span class="nb"&gt;ls&lt;/span&gt; ~/.config/sysmon/sysmon.py

&lt;span class="c"&gt;# In Kubernetes&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"node-setup"&lt;/span&gt;

&lt;span class="c"&gt;# Purge cache&lt;/span&gt;
pip cache purge
&lt;span class="c"&gt;# or&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; ~/.cache/uv

&lt;span class="c"&gt;# Safe version&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;litellm&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;1.82.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  📦 Axios (npm) — March 30–31, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Severity: CRITICAL | ~100M weekly downloads | Attributed: UNC1069 (North Korea)&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;Axios is one of the most depended-upon libraries in the JavaScript ecosystem. At the time of the attack, it was present in approximately 80% of cloud and code environments. The attack didn't exploit any code vulnerability — it was a straightforward account takeover.&lt;/p&gt;

&lt;p&gt;Attackers took over the npm account of &lt;strong&gt;jasonsaayman&lt;/strong&gt;, Axios's primary maintainer, and changed its associated email from &lt;code&gt;jasonsaayman@gmail.com&lt;/code&gt; to &lt;code&gt;ifstap@proton.me&lt;/code&gt;. Publishing directly from the hijacked account bypassed the GitHub Actions OIDC publish flow entirely.&lt;/p&gt;

&lt;p&gt;The attack timeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-03-30 05:57 UTC — plain-crypto-js@4.2.0 published (clean decoy, builds registry history)
2026-03-30 23:59 UTC — plain-crypto-js@4.2.1 published (malicious postinstall backdoor)
2026-03-31 00:21 UTC — axios@1.14.1 published (MALICIOUS, tagged: latest)
2026-03-31 01:00 UTC — axios@0.30.4 published (MALICIOUS, tagged: legacy)
2026-03-31 03:29 UTC — Detected and removed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;39 minutes. Two malicious versions. Both tagged as the default install.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The payload
&lt;/h4&gt;

&lt;p&gt;The malicious dependency &lt;code&gt;plain-crypto-js&lt;/code&gt; contained a &lt;code&gt;postinstall&lt;/code&gt; hook that silently downloaded and executed platform-specific stage-2 RAT implants from &lt;code&gt;sfrclak[.]com:8000&lt;/code&gt;. Cross-platform: macOS, Windows, Linux.&lt;/p&gt;

&lt;p&gt;Google's Threat Intelligence Group attributed this to &lt;strong&gt;UNC1069&lt;/strong&gt;, a financially motivated North Korean threat actor. OpenAI was sufficiently exposed via Axios's dependency chain that it revoked its macOS code-signing certificate on March 31, 2026 as a precaution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Check your lockfiles now
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check for compromised versions&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"axios.*(1&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;14&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;1|0&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;30&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;4)"&lt;/span&gt; package-lock.json
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"plain-crypto-js"&lt;/span&gt; package-lock.json yarn.lock bun.lockb

&lt;span class="c"&gt;# Safe versions&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;axios@1.14.0  &lt;span class="c"&gt;# Last legitimate 1.x with SLSA provenance&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  How to protect yourself
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Enable phishing-resistant MFA on npm, GitHub, and all cloud platforms — no exceptions&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;npm ci&lt;/code&gt; with strict lockfiles instead of &lt;code&gt;npm install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Monitor npm for maintainer email changes on critical dependencies&lt;/li&gt;
&lt;li&gt;Audit and block postinstall scripts in CI environments where possible&lt;/li&gt;
&lt;li&gt;Never run &lt;code&gt;npm install&lt;/code&gt; on production systems from ephemeral runners without lockfile pinning&lt;/li&gt;
&lt;/ul&gt;
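&lt;p&gt;The &lt;code&gt;grep&lt;/code&gt; check above is quick, but a lockfile-aware scan also catches nested transitive copies. A minimal sketch against the &lt;code&gt;packages&lt;/code&gt; map used by npm lockfile v2/v3 (the known-bad version list comes from this incident; extend it as advisories land):&lt;/p&gt;

```python
import json

# Known-bad versions from the March 30-31 campaign
COMPROMISED = {
    "axios": {"1.14.1", "0.30.4"},
    "plain-crypto-js": {"4.2.0", "4.2.1"},
}

def scan_lockfile(lock):
    """Scan an npm lockfile (v2/v3 'packages' map) for known-bad versions,
    including transitive entries like node_modules/foo/node_modules/axios."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        name = path.split("node_modules/")[-1] if path else lock.get("name", "")
        if meta.get("version") in COMPROMISED.get(name, set()):
            hits.append((name, meta["version"]))
    return hits

if __name__ == "__main__":
    import os
    if os.path.exists("package-lock.json"):
        with open("package-lock.json") as f:
            for name, version in scan_lockfile(json.load(f)):
                print("COMPROMISED:", name + "@" + version)
```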




&lt;h3&gt;
  
  
  🤖 Anthropic Claude Code — March 31, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Severity: HIGH | ~512,000 lines of proprietary source code | Root cause: Human error&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;This one is different from the others — it wasn't a malicious actor compromising a third party. Anthropic accidentally shipped the &lt;strong&gt;entire source code of Claude Code&lt;/strong&gt; to the public npm registry.&lt;/p&gt;

&lt;p&gt;When Anthropic published &lt;code&gt;@anthropic-ai/claude-code&lt;/code&gt; version 2.1.88, a missing exclusion rule in the build configuration caused a 59.8 MB JavaScript source map file (&lt;code&gt;cli.js.map&lt;/code&gt;) to be bundled into the package. That source map pointed to a zip archive on Anthropic's Cloudflare R2 storage containing the full, unobfuscated TypeScript source — &lt;strong&gt;512,000 lines across 1,906 files&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Security researcher Chaofan Shou spotted it on X within hours. By the time Anthropic pulled the package at ~08:00 UTC, the code had been downloaded from their own cloud storage, mirrored to GitHub, and forked tens of thousands of times.&lt;/p&gt;

&lt;h4&gt;
  
  
  What was exposed
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Complete multi-agent orchestration architecture&lt;/li&gt;
&lt;li&gt;Self-healing memory system (&lt;code&gt;MEMORY.md&lt;/code&gt; architecture with lazy-load topic files)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Undercover Mode"&lt;/strong&gt; — suppresses Anthropic-internal metadata in commits to public repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-distillation controls&lt;/strong&gt; — injects fake tool definitions into API responses to poison competitor training data&lt;/li&gt;
&lt;li&gt;44 feature flags, including an unreleased Tamagotchi easter egg planned for April 1–7&lt;/li&gt;
&lt;li&gt;Bidirectional CLI-to-IDE communication layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The cascading danger
&lt;/h4&gt;

&lt;p&gt;The leak happened, by pure coincidence, during the same window as the Axios RAT attack. Anyone who updated Claude Code via npm between &lt;strong&gt;00:21 and 03:29 UTC on March 31&lt;/strong&gt; may have simultaneously pulled a trojanized version of Axios.&lt;/p&gt;

&lt;p&gt;Additionally, attackers immediately registered npm packages mimicking Anthropic's internal tooling (&lt;code&gt;audio-capture-napi&lt;/code&gt;, &lt;code&gt;color-diff-napi&lt;/code&gt;, &lt;code&gt;image-processor-napi&lt;/code&gt;) to stage dependency confusion attacks against developers trying to compile the leaked source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not download, fork, build, or run any GitHub repository claiming to be "leaked Claude Code."&lt;/strong&gt; Many of these repositories are active malware lures delivering Vidar Stealer and GhostSocks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic's official statement
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We're rolling out measures to prevent this from happening again."&lt;/em&gt;&lt;br&gt;
— Anthropic Spokesperson&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  What this means for your build pipeline
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# The failure point: Bun generates source maps by default.&lt;/span&gt;
&lt;span class="gh"&gt;# A single missing line in build config exposed 512K lines of IP.&lt;/span&gt;

&lt;span class="gh"&gt;# Lesson: Add this to your CI/CD pre-publish checklist:&lt;/span&gt;
✓ Verify .npmignore excludes &lt;span class="err"&gt;*&lt;/span&gt;.map files
✓ Verify &lt;span class="sb"&gt;`files`&lt;/span&gt; field in package.json is allowlist-based, not denylist
✓ Run &lt;span class="sb"&gt;`npm pack --dry-run`&lt;/span&gt; and inspect the manifest before every publish
✓ Set up automated secret/source scanning on all npm publish workflows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
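&lt;p&gt;That &lt;code&gt;npm pack --dry-run&lt;/code&gt; step can be made a hard gate rather than a manual eyeball check. A minimal sketch that rejects a tarball manifest containing source maps or secrets files (the forbidden patterns are illustrative; adapt them to your project):&lt;/p&gt;

```python
import fnmatch

# Patterns that should never ship in a published tarball; extend to taste
FORBIDDEN = ("*.map", ".env*", "*.pem", "id_rsa*")

def forbidden_files(paths):
    """Given file paths from a pack manifest, return those matching a
    forbidden pattern, e.g. the cli.js.map that carried Claude Code's source."""
    bad = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if any(fnmatch.fnmatch(name, pat) for pat in FORBIDDEN):
            bad.append(path)
    return bad

# In CI: feed it the file list from `npm pack --dry-run --json`, e.g.
#   paths = [f["path"] for f in json.loads(output)[0]["files"]]
#   if forbidden_files(paths): fail the build before publish runs
```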






&lt;h2&gt;
  
  
  April 2026 — The Attacks Keep Coming
&lt;/h2&gt;




&lt;h3&gt;
  
  
  ▲ Vercel — April 19, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Severity: CRITICAL | Entry point: AI productivity tool | Dwell time: ~2 months&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;This attack is a masterclass in how OAuth trust relationships create invisible lateral movement paths.&lt;/p&gt;

&lt;p&gt;The chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;February 2026&lt;/strong&gt;: A Context.ai employee downloaded Roblox game exploit scripts. Those scripts installed &lt;strong&gt;Lumma Stealer&lt;/strong&gt; malware.&lt;/li&gt;
&lt;li&gt;Lumma Stealer exfiltrated the employee's Google Workspace OAuth tokens.&lt;/li&gt;
&lt;li&gt;Context.ai's Chrome Extension had been granted full Google Drive read access by users during onboarding.&lt;/li&gt;
&lt;li&gt;A Vercel enterprise employee had used Context.ai and connected their Vercel Google account.&lt;/li&gt;
&lt;li&gt;Attackers pivoted from the stolen tokens → Context.ai's AWS environment → OAuth tokens for their product → the Vercel employee's workspace → Vercel's internal systems.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Vercel disclosed the breach on &lt;strong&gt;April 19, 2026&lt;/strong&gt;. By then, the attacker had approximately 2 months of dwell time. Vercel's CEO Guillermo Rauch confirmed the attack chain publicly on X and named Context.ai as the compromised third party.&lt;/p&gt;

&lt;p&gt;The stolen Vercel internal database was listed for sale at &lt;strong&gt;$2 million on BreachForums&lt;/strong&gt; by ShinyHunters.&lt;/p&gt;

&lt;h4&gt;
  
  
  The env variable problem
&lt;/h4&gt;

&lt;p&gt;Vercel's environment variable model left variables not explicitly marked as "sensitive" unencrypted at rest. Once an attacker had team-scoped OAuth access, they could read all non-sensitive environment variables — connection strings, API keys, third-party service credentials — stored by developers who assumed they were protected.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key takeaway for developers
&lt;/h4&gt;

&lt;p&gt;You can have perfect security in your own systems and still get breached because an AI productivity tool you gave full Drive access to got compromised via an employee who downloaded Roblox scripts.&lt;/p&gt;

&lt;p&gt;This is the supply chain threat model in its purest form. The attack surface is no longer just your code — it's every OAuth permission you've ever granted.&lt;/p&gt;

&lt;h4&gt;
  
  
  How to protect yourself
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Immediate actions:
✓ Audit all OAuth app permissions in your Google Workspace — revoke apps with excessive access
✓ Mark ALL Vercel environment variables as "sensitive" explicitly (not just secrets)
✓ Query database connection logs for IPs outside known egress ranges, Feb–Apr 2026 window
✓ Rotate all API keys and secrets stored in Vercel project environment variables

Systemic changes:
✓ Never grant AI tools full-read workspace access — use scoped permissions
✓ Implement OAuth token monitoring to detect abnormal access patterns
✓ Treat third-party AI tools with the same vendor risk assessment as any SaaS platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
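&lt;p&gt;For the OAuth audit, the useful signal is breadth of scope, not the number of apps. A minimal sketch that flags apps holding workspace-wide scopes, given a grants export (the app names and export format are illustrative; the scope URLs are standard Google OAuth scopes):&lt;/p&gt;

```python
# Workspace-wide scopes that turn a third-party app into a single point of failure
BROAD_SCOPES = {
    "https://www.googleapis.com/auth/drive",           # full Drive access
    "https://www.googleapis.com/auth/drive.readonly",  # full Drive, read-only
    "https://mail.google.com/",                        # full Gmail access
}

def risky_grants(grants):
    """grants: (app_name, scopes) pairs as exported from the admin console
    (format assumed). Return the apps holding any workspace-wide scope."""
    return [app for app, scopes in grants
            if any(s in BROAD_SCOPES for s in scopes)]

grants = [
    ("ai-notes-extension", ["https://www.googleapis.com/auth/drive"]),
    ("calendar-widget", ["https://www.googleapis.com/auth/calendar.readonly"]),
]
print(risky_grants(grants))  # → ['ai-notes-extension']
```

&lt;p&gt;Anything the check flags should either be rescoped to per-file access (e.g. &lt;code&gt;drive.file&lt;/code&gt;) or go through the same vendor review as any other SaaS platform.&lt;/p&gt;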






&lt;h3&gt;
  
  
  🔐 Bitwarden CLI — April 22, 2026
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Severity: CRITICAL | Window: 90 minutes | Notable: First supply chain attack targeting AI coding tools&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What happened
&lt;/h4&gt;

&lt;p&gt;The Shai-Hulud worm's "Third Coming." At 5:57 PM ET on April 22, 2026, attackers published &lt;code&gt;@bitwarden/cli@2026.4.0&lt;/code&gt; — a malicious version of the CLI tool for the world's most popular open-source password manager (10M+ users, 50,000 business customers). By 7:30 PM ET, it was gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;90 minutes.&lt;/strong&gt; That's the entire attack window.&lt;/p&gt;

&lt;p&gt;The attack vector: Bitwarden's repository uses &lt;code&gt;checkmarx/ast-github-action&lt;/code&gt; — one of the GitHub Actions compromised in the ongoing Checkmarx supply chain campaign (also attributed to TeamPCP). Attackers hijacked Bitwarden's CI/CD pipeline, editing the &lt;code&gt;publish-cli.yml&lt;/code&gt; workflow five consecutive times to inject a prebuilt malicious tarball containing the payload &lt;code&gt;bw1.js&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Bitwarden confirmed: no user vault data was accessed. The web extension, desktop apps, and all other clients were unaffected. Only the CLI npm package was compromised.&lt;/p&gt;

&lt;h4&gt;
  
  
  The payload was remarkable
&lt;/h4&gt;

&lt;p&gt;The malware targeted &lt;strong&gt;six distinct credential surfaces&lt;/strong&gt; and introduced two novel capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Credential targets:&lt;/span&gt;
&lt;span class="nx"&gt;targets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS access keys + SSM/Secrets Manager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Azure credentials + Key Vault&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GCP service account keys + Secret Manager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GitHub PATs + npm publish tokens&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SSH keys + shell history + .env files&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI coding assistant configurations&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// ← NEW in 2026&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;// Novel capability 1: AI tool targeting&lt;/span&gt;
&lt;span class="c1"&gt;// Explicitly probed for: Claude, Cursor, Codex CLI, Aider&lt;/span&gt;
&lt;span class="c1"&gt;// If authenticated session found → extract credentials + inject persistence&lt;/span&gt;

&lt;span class="c1"&gt;// Novel capability 2: Self-propagating worm&lt;/span&gt;
&lt;span class="c1"&gt;// Uses victim's npm publish tokens to backdoor ALL packages they can publish to&lt;/span&gt;
&lt;span class="c1"&gt;// Exfiltrates to public GitHub repos (RSA-encrypted) as dead-drop C2&lt;/span&gt;
&lt;span class="c1"&gt;// GitHub traffic not flagged by security tools → effective evasion&lt;/span&gt;

&lt;span class="c1"&gt;// Kill switch: skips if Russian locale detected&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  This changes the threat model for AI coding tools
&lt;/h4&gt;

&lt;p&gt;The Bitwarden CLI attack — combined with the Vercel breach via Context.ai — confirms a clear pattern that security teams need to internalize:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI coding tools (Claude, Cursor, Copilot, Aider) sit at the intersection of everything attackers want: source code access, command execution, API credentials, and cloud service connections.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These tools are now explicitly named in supply chain attack malware. Your AI coding assistant's authentication state is a credential worth stealing.&lt;/p&gt;

&lt;h4&gt;
  
  
  Immediate response
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if affected (installed between 5:57–7:30 PM ET, April 22)&lt;/span&gt;
npm list @bitwarden/cli  &lt;span class="c"&gt;# 2026.4.0 = COMPROMISED&lt;/span&gt;

&lt;span class="c"&gt;# Clean install&lt;/span&gt;
npm uninstall &lt;span class="nt"&gt;-g&lt;/span&gt; @bitwarden/cli
npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @bitwarden/cli@2026.4.1  &lt;span class="c"&gt;# verified clean&lt;/span&gt;

&lt;span class="c"&gt;# Find C2 artifacts&lt;/span&gt;
find / &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"bw1.js"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"bw_setup.js"&lt;/span&gt; 2&amp;gt;/dev/null

&lt;span class="c"&gt;# Search for data exfil repos&lt;/span&gt;
&lt;span class="c"&gt;# Check public GitHub for repos containing: "Shai-Hulud: The Third Coming"&lt;/span&gt;

&lt;span class="c"&gt;# Rotate if affected:&lt;/span&gt;
&lt;span class="c"&gt;# → GitHub PATs&lt;/span&gt;
&lt;span class="c"&gt;# → npm tokens  &lt;/span&gt;
&lt;span class="c"&gt;# → AWS access keys&lt;/span&gt;
&lt;span class="c"&gt;# → GCP service account keys&lt;/span&gt;
&lt;span class="c"&gt;# → Azure credentials&lt;/span&gt;
&lt;span class="c"&gt;# → SSH keys&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Big Picture: TeamPCP and the Campaign Architecture
&lt;/h2&gt;

&lt;p&gt;Most of the March–April attacks trace back to a single threat group: &lt;strong&gt;TeamPCP&lt;/strong&gt; (also operating as DeadCatx3, PCPcat, Persy_PCP, ShellForce, and CipherForce).&lt;/p&gt;

&lt;p&gt;TeamPCP first appeared in late December 2025 as a group focused on cloud-native infrastructure exploitation. Their 2026 campaign was methodical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1 (Feb 27–28):  Exploit pull_request_target in Trivy → steal aqua-bot PAT
Phase 2 (Mar 1):      Aqua rotates credentials → incomplete rotation
Phase 3 (Mar 19):     Use residual access → poison 75 Trivy tags + Docker images
Phase 4 (Mar 21):     Use stolen PATs from Trivy → poison KICS GitHub Actions
Phase 5 (Mar 24):     Use LiteLLM CI's Trivy → steal PyPI token → poison LiteLLM
Phase 6 (Mar 27):     Telnyx Python SDK compromised
Phase 7 (Mar 30–31):  Axios npm package poisoned (separate North Korean actor)
Phase 8 (Apr 15):     Vect ransomware lists first victim from Trivy campaign
Phase 9 (Apr 22):     Bitwarden CLI poisoned via Checkmarx GitHub Action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The campaign spanned &lt;strong&gt;PyPI, npm, Docker Hub, GitHub Actions, and OpenVSX&lt;/strong&gt; in a single coordinated multi-ecosystem operation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Systemic Defenses: What Actually Works
&lt;/h2&gt;

&lt;p&gt;After cataloging all of this, here's what the evidence shows actually works:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pin to commit SHAs, not version tags
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is the single highest-impact change you can make:&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ VULNERABLE (both of these)&lt;/span&gt;
&lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;some-action@v2.0&lt;/span&gt;
&lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;some-action@main&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ IMMUTABLE — cannot be silently changed&lt;/span&gt;
&lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;some-action@a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 2025 tj-actions attack and the 2026 Trivy attack both succeeded because developers referenced actions by tag. Both would have been completely immune with SHA pinning. One line of config change. That's it.&lt;/p&gt;
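&lt;p&gt;This is also easy to enforce mechanically. A minimal sketch (the helper name is illustrative) that flags any &lt;code&gt;uses:&lt;/code&gt; reference in a workflow file that is not pinned to a full 40-character commit SHA:&lt;/p&gt;

```shell
# Print `uses:` lines from a workflow file that are NOT pinned to a
# full 40-character commit SHA (tags and branches are mutable refs).
find_mutable_refs() {
  grep -En 'uses:[[:space:]]*[^[:space:]]+@' "$1" \
    | grep -Ev '@[0-9a-f]{40}["[:space:]]*$' || true
}

# Example: find_mutable_refs .github/workflows/ci.yml
```

&lt;p&gt;A pinned line followed by a trailing comment will still be flagged, so treat this as a first-pass audit rather than a gate.&lt;/p&gt;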

&lt;h3&gt;
  
  
  2. Use lockfiles strictly
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In CI/CD pipelines:&lt;/span&gt;
npm ci          &lt;span class="c"&gt;# NOT npm install&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--require-hashes&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Never allow unpinned transitive dependencies in production&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
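&lt;p&gt;A rough way to verify the lockfile is actually doing its job: every resolved entry in &lt;code&gt;package-lock.json&lt;/code&gt; should carry an &lt;code&gt;integrity&lt;/code&gt; hash. A sketch (the helper name is illustrative; &lt;code&gt;grep -c&lt;/code&gt; counts matching lines, which is close enough for typical one-entry-per-line lockfiles):&lt;/p&gt;

```shell
# Count "resolved" entries vs. "integrity" hashes in a package-lock.json.
# A gap between the two numbers means some packages would install
# without hash verification.
audit_lock() {
  resolved=$(grep -c '"resolved"' "$1" || true)
  hashed=$(grep -c '"integrity"' "$1" || true)
  echo "resolved=$resolved integrity=$hashed"
}
```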



&lt;h3&gt;
  
  
  3. Atomic credential rotation
&lt;/h3&gt;

&lt;p&gt;When you detect a compromise and rotate credentials, the rotation must be a single synchronized operation — revoke all active tokens, generate new ones, update all consumers simultaneously. Sequential rotation leaves a window. TeamPCP exploited exactly this window in the Trivy incident.&lt;/p&gt;
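&lt;p&gt;The ordering is the whole point, so it is worth spelling out. A sketch with stub functions standing in for your secret manager's real calls (every function name here is hypothetical):&lt;/p&gt;

```shell
# Stub functions standing in for your secret manager's real calls;
# every name here is hypothetical.
generate_token()       { echo "new-token"; }
update_all_consumers() { echo "consumers -> $1"; }
revoke_token()         { echo "revoked $1"; }

# Atomic rotation order: mint the replacement first, repoint every
# consumer at it, and only then revoke the old credential. Revoking
# first, or rotating consumers one by one, leaves exactly the window
# the incident above describes.
rotate() {
  old_token=$1
  new_token=$(generate_token)        # 1. create replacement
  update_all_consumers "$new_token"  # 2. swap all consumers at once
  revoke_token "$old_token"          # 3. revoke the old credential last
}
```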

&lt;h3&gt;
  
  
  4. Principle of least privilege for service accounts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Your CI service account should not have:&lt;/span&gt;
&lt;span class="c1"&gt;# - write access to multiple repositories&lt;/span&gt;
&lt;span class="c1"&gt;# - admin access to package registries&lt;/span&gt;
&lt;span class="c1"&gt;# - broad cloud IAM roles&lt;/span&gt;

&lt;span class="c1"&gt;# It should have exactly:&lt;/span&gt;
&lt;span class="c1"&gt;# - read access to the specific repos needed for this job&lt;/span&gt;
&lt;span class="c1"&gt;# - publish access to the specific package this job publishes&lt;/span&gt;
&lt;span class="c1"&gt;# - no persistent credentials (use OIDC/short-lived tokens)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
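&lt;p&gt;For GitHub Actions specifically, the short-lived-token point maps to OIDC: grant the job permission to request an identity token and drop stored cloud keys entirely. A minimal job-level fragment (the cloud-side trust configuration is separate and not shown):&lt;/p&gt;

```yaml
# Job-level permissions for OIDC-based cloud auth in GitHub Actions:
# the job requests a short-lived identity token instead of reading a
# long-lived secret from the repository.
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read    # least-privilege default for checkout
```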



&lt;h3&gt;
  
  
  5. Behavior-based CI monitoring
&lt;/h3&gt;

&lt;p&gt;The LiteLLM incident was caught first by a developer whose machine started stuttering — their CPU was pegged because the malware's fork bomb behavior crashed the system. That's not monitoring; that's luck.&lt;/p&gt;

&lt;p&gt;What you actually need: alerts for Python processes making &lt;strong&gt;outbound POST requests at install time&lt;/strong&gt;. Package installation should pull from PyPI — it should never POST encrypted binary payloads to external endpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Alert rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;process&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python (via pip subprocess)&lt;/span&gt;
  &lt;span class="na"&gt;direction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;outbound&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;
  &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;encrypted binary&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ALERT + BLOCK&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
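&lt;p&gt;Before a real detection pipeline exists, you can approximate this by post-processing a connection snapshot (for example from &lt;code&gt;ss -tnp&lt;/code&gt; on Linux) and flagging python-owned sockets that are not loopback. A sketch, not a substitute for the alert rule above:&lt;/p&gt;

```shell
# Filter a `ss -tnp`-style snapshot down to python-owned connections
# that are not loopback: the install-time egress the alert rule targets.
flag_python_egress() {
  grep -i 'python' | grep -v '127\.0\.0\.1' || true
}

# Example: ss -tnp | flag_python_egress
```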



&lt;h3&gt;
  
  
  6. Audit OAuth permissions regularly
&lt;/h3&gt;

&lt;p&gt;The Vercel breach started with a productivity tool that was granted full Google Drive read access. Every OAuth integration in your organization is a potential pivot point. Audit them. Scope them to minimum required permissions. Revoke anything that hasn't been used recently.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Treat your AI coding tools as high-privilege systems
&lt;/h3&gt;

&lt;p&gt;Given the Bitwarden CLI attack explicitly targeted Claude, Cursor, Codex, and Aider credentials, it's time to treat AI coding assistant authentication state with the same security posture as cloud access keys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't leave AI tools authenticated in unmonitored environments&lt;/li&gt;
&lt;li&gt;Rotate AI tool API keys on the same schedule as cloud credentials&lt;/li&gt;
&lt;li&gt;Monitor for abnormal AI tool usage patterns (large data transfers, unusual API calls)&lt;/li&gt;
&lt;li&gt;Be aware that your AI coding assistant may have access to your entire codebase, your git credentials, and your cloud service connections simultaneously&lt;/li&gt;
&lt;/ul&gt;
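&lt;p&gt;A starting point for the first bullet is simply knowing where authenticated state lives. A sketch that inventories common AI-tool config locations (the paths are illustrative assumptions; adjust them to the tools you actually run):&lt;/p&gt;

```shell
# Report which common AI-tool config directories exist under a home
# directory. The paths below are illustrative assumptions.
scan_ai_state() {
  home=$1
  for d in "$home/.claude" "$home/.cursor" "$home/.aider" "$home/.codex"; do
    [ -e "$d" ] && echo "present: $d"
  done
  return 0
}

# Example: scan_ai_state "$HOME"
```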




&lt;h2&gt;
  
  
  The Developer's Quick Reference Checklist
&lt;/h2&gt;

&lt;p&gt;Here's a condensed action list you can use right now:&lt;/p&gt;

&lt;h3&gt;
  
  
  For your CI/CD pipelines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;☐ Pin all GitHub Actions to full commit SHAs
☐ Use npm ci / pip install --require-hashes (not npm install / pip install)  
☐ Audit pull_request_target workflows for excessive permissions
☐ Limit service account tokens to minimum required scope
☐ Enable GitHub's SHA pinning organizational policy
☐ Set up behavior-based alerts for install-time network requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  For your dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;☐ Check for plain-crypto-js in any lockfile (Axios RAT indicator)
☐ Check for litellm==1.82.7 or 1.82.8 in any Python environment
☐ Check @bitwarden/cli for version 2026.4.0 (rotate if found)
☐ Search your GitHub org for repos named "tpcp-docs" (Trivy compromise indicator)
☐ Audit all GitHub Actions for recent unexpected workflow edits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
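&lt;p&gt;This checklist lends itself to a script. A sketch that greps a repo checkout for the package-level indicators above (the helper name and file layout are illustrative):&lt;/p&gt;

```shell
# Grep a checkout for the package-level indicators in the checklist.
# Prints any matches; prints nothing when clean.
check_indicators() {
  dir=$1
  grep -rn 'plain-crypto-js' "$dir" --include='package-lock.json' || true
  grep -rnE 'litellm==1\.82\.[78]' "$dir" --include='requirements*.txt' || true
  grep -rn '"@bitwarden/cli": *"2026\.4\.0"' "$dir" --include='package*.json' || true
  return 0
}
```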



&lt;h3&gt;
  
  
  For your organization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;☐ Audit all OAuth app permissions — revoke excessive access
☐ Mark all Vercel environment variables as "sensitive"
☐ Rotate credentials from any CI pipeline that ran Trivy on March 19, 2026
☐ Implement vendor monitoring with automated CVE-to-vendor mapping
☐ Document your complete dependency tree (can you answer: what packages did production install in the last 30 days?)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;The pattern across all of these attacks is the same: &lt;strong&gt;attackers are targeting trust, not systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They're not breaking through your firewall. They're getting invited through the front door — via a trusted package, a trusted OAuth app, a trusted GitHub Action, a trusted vulnerability scanner.&lt;/p&gt;

&lt;p&gt;The question isn't whether your perimeter is secure. The question is whether you know every entity you trust, what access you've granted them, and what happens to your environment if any one of them is compromised.&lt;/p&gt;

&lt;p&gt;Supply chain security in 2026 isn't a specialized discipline anymore. It's table stakes for any team that ships software.&lt;/p&gt;

&lt;p&gt;The next compromised package is already on its way to your CI pipeline. The question is whether you'll see it land.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources and Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.aquasec.com/blog/trivy-supply-chain-attack-what-you-need-to-know/" rel="noopener noreferrer"&gt;Aqua Security Trivy Incident Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.litellm.ai/blog/security-update-march-2026" rel="noopener noreferrer"&gt;LiteLLM Official Security Update&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.elastic.co/security-labs/axios-one-rat-to-rule-them-all" rel="noopener noreferrer"&gt;Elastic Security Labs: Axios Supply Chain Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vercel.com/kb/bulletin/vercel-april-2026-security-incident" rel="noopener noreferrer"&gt;Vercel April 2026 Security Bulletin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://community.bitwarden.com/t/bitwarden-statement-on-checkmarx-supply-chain-incident/96127" rel="noopener noreferrer"&gt;Bitwarden Statement on Checkmarx Supply Chain Incident&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://snyk.io/articles/trivy-github-actions-supply-chain-compromise/" rel="noopener noreferrer"&gt;Snyk: Trivy GitHub Actions Supply Chain Compromise&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;Claude Code Unpacked&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.endorlabs.com/learn/shai-hulud-the-third-coming" rel="noopener noreferrer"&gt;Endor Labs: Shai-Hulud Third Coming Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.group-ib.com/blog/supply-chain-attack-groups-2026/" rel="noopener noreferrer"&gt;Group-IB: Supply Chain Attack Groups 2026&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you're responsible for a CI/CD pipeline, share this with your team — the SHA pinning point alone is worth the read.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All technical details sourced from public security disclosures, vendor incident reports, and independent researcher analysis.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#security&lt;/code&gt; &lt;code&gt;#cybersecurity&lt;/code&gt; &lt;code&gt;#devops&lt;/code&gt; &lt;code&gt;#opensource&lt;/code&gt; &lt;code&gt;#supplychain&lt;/code&gt; &lt;code&gt;#javascript&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#npm&lt;/code&gt; &lt;code&gt;#github&lt;/code&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Definitive Guide to Lightweight Kubernetes: KIND, Minikube, MicroK8s, K3s, Vcluster, k0s, and RKE2 Compared</title>
      <dc:creator>Pendela BhargavaSai</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:18:00 +0000</pubDate>
      <link>https://dev.to/pendelabhargavasai/the-definitive-guide-to-lightweight-kubernetes-kind-minikube-microk8s-k3s-vcluster-k0s-and-3be1</link>
      <guid>https://dev.to/pendelabhargavasai/the-definitive-guide-to-lightweight-kubernetes-kind-minikube-microk8s-k3s-vcluster-k0s-and-3be1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — There is no single "best" lightweight Kubernetes. KIND wins CI/CD, Minikube wins local dev UX, MicroK8s wins on Ubuntu, K3s wins edge and production, Vcluster wins multi-tenancy, k0s wins zero-dependency ops, and RKE2 wins enterprise compliance. This post explains why — with architecture diagrams, feature tables, and real-world guidance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ixolu1jgi9xokd9cw9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ixolu1jgi9xokd9cw9k.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Why Lightweight Kubernetes Matters&lt;/li&gt;
&lt;li&gt;The Contenders at a Glance&lt;/li&gt;
&lt;li&gt;KIND — Kubernetes IN Docker&lt;/li&gt;
&lt;li&gt;Minikube — The Developer's Workhorse&lt;/li&gt;
&lt;li&gt;MicroK8s — Zero-Ops by Canonical&lt;/li&gt;
&lt;li&gt;K3s — Production-Grade at the Edge&lt;/li&gt;
&lt;li&gt;Vcluster — Kubernetes Inside Kubernetes&lt;/li&gt;
&lt;li&gt;k0s — Zero Dependencies, Zero Friction&lt;/li&gt;
&lt;li&gt;RKE2 — Security-First Enterprise K8s&lt;/li&gt;
&lt;li&gt;Scoring Across 8 Dimensions&lt;/li&gt;
&lt;li&gt;Use Case Decision Guide&lt;/li&gt;
&lt;li&gt;Final Verdict&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why Lightweight Kubernetes Matters
&lt;/h2&gt;

&lt;p&gt;Full-fat Kubernetes — the kind you run on a 3-master, 6-worker production cluster — is extraordinary infrastructure. It is also deeply impractical when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spin up a throwaway cluster in a GitHub Actions runner in under 30 seconds&lt;/li&gt;
&lt;li&gt;Run Kubernetes on a Raspberry Pi with 1 GB of RAM&lt;/li&gt;
&lt;li&gt;Give every developer on your team their own isolated cluster without buying new hardware&lt;/li&gt;
&lt;li&gt;Deploy to a factory floor where the "server" is an ARM SBC with no internet access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Kubernetes ecosystem responded by producing a rich family of lightweight distributions, each making different trade-offs. By 2025, the major players are &lt;strong&gt;KIND&lt;/strong&gt;, &lt;strong&gt;Minikube&lt;/strong&gt;, &lt;strong&gt;MicroK8s&lt;/strong&gt;, &lt;strong&gt;K3s&lt;/strong&gt;, &lt;strong&gt;Vcluster&lt;/strong&gt;, &lt;strong&gt;k0s&lt;/strong&gt;, and &lt;strong&gt;RKE2&lt;/strong&gt; — and choosing between them is genuinely consequential.&lt;/p&gt;

&lt;p&gt;This guide gives you the full picture: architecture, components, features, limitations, scoring, and concrete use-case guidance, all in one place.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Contenders at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Creator&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Primary Use Case&lt;/th&gt;
&lt;th&gt;Min RAM&lt;/th&gt;
&lt;th&gt;Binary Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;KIND&lt;/td&gt;
&lt;td&gt;Kubernetes SIG Testing&lt;/td&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;CI/CD testing&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;N/A (uses Docker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minikube&lt;/td&gt;
&lt;td&gt;Kubernetes Community&lt;/td&gt;
&lt;td&gt;2016&lt;/td&gt;
&lt;td&gt;Local development&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;~100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MicroK8s&lt;/td&gt;
&lt;td&gt;Canonical (Ubuntu)&lt;/td&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;Ubuntu / Edge&lt;/td&gt;
&lt;td&gt;540 MB&lt;/td&gt;
&lt;td&gt;Snap package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;td&gt;Rancher Labs (SUSE)&lt;/td&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;Edge / Production&lt;/td&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;td&gt;&amp;lt; 100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vcluster&lt;/td&gt;
&lt;td&gt;Loft Labs&lt;/td&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;Multi-tenancy&lt;/td&gt;
&lt;td&gt;Host-dependent&lt;/td&gt;
&lt;td&gt;Helm chart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k0s&lt;/td&gt;
&lt;td&gt;Mirantis&lt;/td&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;Zero-dependency ops&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;~230 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RKE2&lt;/td&gt;
&lt;td&gt;Rancher (SUSE)&lt;/td&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;Enterprise / Compliance&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;~300 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each of these is CNCF-compatible and capable of running real Kubernetes workloads. The differences are in &lt;em&gt;where&lt;/em&gt;, &lt;em&gt;how&lt;/em&gt;, and &lt;em&gt;at what cost&lt;/em&gt; they do it.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. KIND — Kubernetes IN Docker
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;KIND (Kubernetes IN Docker) was built by the Kubernetes SIG Testing team for one purpose: to test Kubernetes itself. Every node in a KIND cluster is a Docker container. The control plane runs in one container, worker nodes in others, and they communicate over a Docker bridge network, with the purpose-built &lt;code&gt;kindnet&lt;/code&gt; CNI handling pod-to-pod traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KIND&lt;/strong&gt; runs every Kubernetes node as a Docker container. There is no VM, no hypervisor, no separate OS. The &lt;code&gt;kindnet&lt;/code&gt; CNI is a purpose-built bridge that understands this container-as-node topology. The practical effect is that KIND clusters are disposable, fast, and completely ephemeral — perfect for testing but incapable of persistence.&lt;/p&gt;

&lt;p&gt;Because there is no VM involved, KIND clusters start in about 30 seconds and use only Docker's existing networking and storage. You can run a dozen isolated clusters on a single laptop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
┌────────────────────────────────────────────────────────┐
│                   Docker Host                          │
│                                                        │
│  ┌─────────────────────┐   ┌──────────────────────┐    │
│  │   Control Plane     │   │     Worker 1         │    │
│  │   (container)       │──▶│     (container)      │    │
│  │                     │   │                      │    │
│  │  • API Server       │   │  • kubelet           │    │
│  │  • etcd             │   │  • kube-proxy        │    │
│  │  • Scheduler        │──▶│  • Pod A  • Pod B    │    │
│  │  • Controller Mgr   │   └──────────────────────┘    │
│  │  • kindnet CNI      │     ┌──────────────────────┐  │
│  └─────────────────────┘     │     Worker 2         │  │
│                              │     (container)      │  │
│  ┌──────────────────┐        │  • kubelet + pods    │  │
│  │  Port-forwarding │        └──────────────────────┘  │
│  │  localhost:6443  │                                  │
│  └──────────────────┘                                  │
└────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kindnet&lt;/strong&gt; — Custom CNI using a kernel bridge, purpose-built for KIND's container-as-node model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;etcd&lt;/strong&gt; — Full etcd running inside the control-plane container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;containerd&lt;/strong&gt; — Container runtime inside each node-container (Docker-in-Docker)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kubeadm&lt;/strong&gt; — KIND uses kubeadm internally to bootstrap the cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;True multi-node clusters (control plane + N workers) on a single host&lt;/li&gt;
&lt;li&gt;Custom node images — test against any Kubernetes version&lt;/li&gt;
&lt;li&gt;Rootless mode via rootless Docker/Podman&lt;/li&gt;
&lt;li&gt;IPv6 and dual-stack support&lt;/li&gt;
&lt;li&gt;Create multiple isolated clusters simultaneously, with parallel cluster creation&lt;/li&gt;
&lt;li&gt;KUBECONFIG auto-export&lt;/li&gt;
&lt;li&gt;Optimised for GitHub Actions, GitLab CI, and Jenkins&lt;/li&gt;
&lt;/ul&gt;
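&lt;p&gt;The "any Kubernetes version" point works through node images: the cluster config can pin the exact &lt;code&gt;kindest/node&lt;/code&gt; image per node. A minimal example (in CI you can additionally pin an image digest for full immutability):&lt;/p&gt;

```yaml
# kind cluster config pinning the Kubernetes version via the node image.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.2
- role: worker
  image: kindest/node:v1.29.2
```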

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install&lt;/span&gt;

curl  &lt;span class="nt"&gt;-Lo&lt;/span&gt;  ./kind  &amp;lt;https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64&amp;gt;

&lt;span class="nb"&gt;chmod&lt;/span&gt;  +x  ./kind &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo  mv&lt;/span&gt;  ./kind  /usr/local/bin/kind

&lt;span class="c"&gt;# Create a single-node cluster&lt;/span&gt;

kind  create  cluster

&lt;span class="c"&gt;# Create a multi-node cluster&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kind  create  cluster  --config=-

kind: Cluster

apiVersion: kind.x-k8s.io/v1alpha4

nodes:

- role: control-plane

- role: worker

- role: worker
&lt;/span&gt;&lt;span class="no"&gt;
EOF

&lt;/span&gt;&lt;span class="c"&gt;# Delete cluster&lt;/span&gt;

kind  delete  cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Blazing fast — 30-second cluster creation, no hypervisor boot time&lt;/li&gt;
&lt;li&gt;Zero VM overhead — runs entirely inside Docker containers&lt;/li&gt;
&lt;li&gt;True multi-node topology on one host&lt;/li&gt;
&lt;li&gt;Exact Kubernetes version control via node images&lt;/li&gt;
&lt;li&gt;Perfect for ephemeral CI environments&lt;/li&gt;
&lt;li&gt;No LoadBalancer hacks needed for testing (use NodePort)&lt;/li&gt;
&lt;li&gt;Widely supported in CI platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires Docker or Podman to be running&lt;/li&gt;
&lt;li&gt;Not production-ready under any circumstances&lt;/li&gt;
&lt;li&gt;No GPU passthrough&lt;/li&gt;
&lt;li&gt;LoadBalancer type services need MetalLB or similar&lt;/li&gt;
&lt;li&gt;Volumes are lost when the cluster is deleted&lt;/li&gt;
&lt;li&gt;No addon ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;CI/CD pipelines&lt;/strong&gt; — specifically integration testing that needs a real multi-node Kubernetes topology without the boot time of a VM-based solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Minikube — The Developer's Workhorse
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;Minikube is the original local Kubernetes project, released in 2016 and still the most feature-rich local development option. It runs a Kubernetes cluster inside a VM, a container, or directly on the host, and brings an unmatched addon ecosystem of 30+ pre-packaged integrations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minikube&lt;/strong&gt; is the only distribution that abstracts over &lt;em&gt;drivers&lt;/em&gt; — it runs identically whether the underlying host is a VM (VirtualBox, HyperKit, KVM), a container (Docker, Podman), or bare metal. This flexibility comes at the cost of startup time and memory, but it means Minikube works for every developer on every operating system.&lt;/p&gt;

&lt;p&gt;If you've ever run &lt;code&gt;kubectl apply -f&lt;/code&gt; on your laptop, you've probably used Minikube.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────-┐
│             VM / Docker / Podman Driver                   │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │            Single Node (All-in-One)                 │  │
│  │                                                     │  │
│  │  Control Plane                  Data Plane          │  │
│  │  ┌──────────┐  ┌────────────┐   ┌─────────────────┐ │  │
│  │  │API Server│  │etcd        │   │kubelet          │ │  │
│  │  └──────────┘  └────────────┘   │kube-proxy       │ │  │
│  │  ┌──────────┐  ┌────────────┐   │Pod A • Pod B    │ │  │
│  │  │Scheduler │  │Ctrl Manager│   └─────────────────┘ │  │
│  │  └──────────┘  └────────────┘                       │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │                   Addons Layer                      │  │
│  │  Dashboard │ Ingress │ Metrics │ Registry │ Istio   │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple drivers&lt;/strong&gt; — HyperKit, VirtualBox, KVM2, Docker, Podman, SSH&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;etcd&lt;/strong&gt; — Full etcd as the backing store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico or Flannel&lt;/strong&gt; — CNI (configurable per driver)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addon controller&lt;/strong&gt; — Manages the 30+ available addon services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;30+ addons including Istio, Knative, Linkerd, GPU operator, registry, and more&lt;/li&gt;
&lt;li&gt;Built-in Kubernetes dashboard (&lt;code&gt;minikube dashboard&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;GPU passthrough in VM mode&lt;/li&gt;
&lt;li&gt;LoadBalancer via &lt;code&gt;minikube tunnel&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Multiple profile management (run several clusters simultaneously)&lt;/li&gt;
&lt;li&gt;Image caching to speed up repeated pulls&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minikube service&lt;/code&gt; command for easy port access&lt;/li&gt;
&lt;li&gt;Built-in image loading (&lt;code&gt;minikube image load&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
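
&lt;p&gt;The profile feature from the list above is worth seeing in action. A minimal sketch, with illustrative profile names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Run two independent clusters side by side&lt;/span&gt;
minikube start -p dev --driver=docker
minikube start -p staging --driver=docker --kubernetes-version=v1.29.0

&lt;span class="c"&gt;# List profiles and switch the active one&lt;/span&gt;
minikube profile list
minikube profile dev

&lt;span class="c"&gt;# Each profile gets its own kubectl context&lt;/span&gt;
kubectl config use-context staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;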

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install (Linux)&lt;/span&gt;

curl  &lt;span class="nt"&gt;-LO&lt;/span&gt;  https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64

&lt;span class="nb"&gt;sudo  install  &lt;/span&gt;minikube-linux-amd64  /usr/local/bin/minikube

&lt;span class="c"&gt;# Start with Docker driver&lt;/span&gt;

minikube  start  &lt;span class="nt"&gt;--driver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker

&lt;span class="c"&gt;# Enable addons&lt;/span&gt;

minikube  addons  &lt;span class="nb"&gt;enable  &lt;/span&gt;ingress

minikube  addons  &lt;span class="nb"&gt;enable  &lt;/span&gt;metrics-server

minikube  addons  &lt;span class="nb"&gt;enable  &lt;/span&gt;dashboard

&lt;span class="c"&gt;# Open dashboard&lt;/span&gt;

minikube  dashboard

&lt;span class="c"&gt;# LoadBalancer support&lt;/span&gt;

minikube  tunnel  &lt;span class="c"&gt;# Run in separate terminal&lt;/span&gt;

&lt;span class="c"&gt;# Delete&lt;/span&gt;

minikube  delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Easiest getting-started experience of any K8s tool&lt;/li&gt;
&lt;li&gt;Unmatched addon ecosystem (30+ addons)&lt;/li&gt;
&lt;li&gt;GPU passthrough support (VirtualBox/KVM drivers)&lt;/li&gt;
&lt;li&gt;Built-in dashboard requires zero configuration&lt;/li&gt;
&lt;li&gt;Works on macOS, Linux, and Windows&lt;/li&gt;
&lt;li&gt;Multiple profiles = multiple clusters&lt;/li&gt;
&lt;li&gt;Best documentation and community support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slow startup in VM mode (~2 minutes)&lt;/li&gt;
&lt;li&gt;High memory consumption, especially with VM driver&lt;/li&gt;
&lt;li&gt;Primarily a single-node environment&lt;/li&gt;
&lt;li&gt;Not production-ready&lt;/li&gt;
&lt;li&gt;LoadBalancer requires keeping &lt;code&gt;minikube tunnel&lt;/code&gt; running separately&lt;/li&gt;
&lt;li&gt;Battery-intensive on laptops&lt;/li&gt;
&lt;li&gt;Multi-node support exists but is limited and buggy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Local development&lt;/strong&gt; — especially developers who want a full Kubernetes experience with addons, dashboards, and GPU support without deep infrastructure expertise.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. MicroK8s - Zero-Ops by Canonical
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;MicroK8s is Canonical's packaging of Kubernetes as a snap. It installs as a single command, self-heals via systemd, updates automatically through snap channels, and has the lowest memory footprint of any full-featured Kubernetes distribution at just 540 MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MicroK8s&lt;/strong&gt; is unique in using &lt;strong&gt;dqlite&lt;/strong&gt; — a distributed SQLite engine developed by Canonical — as an alternative to etcd for HA mode. This dramatically simplifies the operational burden of running a multi-master cluster: no external etcd cluster needed, just &lt;code&gt;microk8s add-node&lt;/code&gt; on each machine.&lt;/p&gt;

&lt;p&gt;Unlike KIND and Minikube, MicroK8s is designed for both development &lt;em&gt;and&lt;/em&gt; light production workloads. Its dqlite-backed HA mode supports multi-node clustering without requiring a separate etcd deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
┌───────────────────────────────────────────────────────────┐
│                  Snap Package (systemd)                   │
│                                                           │
│  ┌───────────────────────┐   ┌───────────────────────┐    │
│  │    Node 1 (Master)    │   │        Node 2         │    │
│  │                       │   │                       │    │
│  │  • API Server         │──▶│  • kubelet            │    │
│  │  • dqlite (HA store)  │   │  • kube-proxy         │    │
│  │  • Scheduler          │   │  • Calico CNI         │    │
│  │  • Controller Manager │   │  • Pods               │    │
│  │  • Calico CNI         │   └───────────────────────┘    │
│  │  • Auto-updater       │                                │
│  └───────────────────────┘                                │
│                                                           │
│  ┌──────────────────────────────────────────────────────┐ │
│  │         Addon Engine (microk8s enable &amp;lt;addon&amp;gt;)       │ │
│  │  Istio │ Knative │ GPU │ Registry │ Dashboard │ More │ │
│  └──────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;dqlite&lt;/strong&gt; — Distributed SQLite for HA without the operational burden of etcd&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico CNI&lt;/strong&gt; — Production-grade networking with network policy support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snap daemon&lt;/strong&gt; — Manages the entire lifecycle including automatic updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addon engine&lt;/strong&gt; — &lt;code&gt;microk8s enable &amp;lt;name&amp;gt;&lt;/code&gt; installs curated addons&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lowest memory footprint: 540 MB minimum&lt;/li&gt;
&lt;li&gt;HA clustering via &lt;code&gt;microk8s add-node&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Automatic channel-based updates with rollback&lt;/li&gt;
&lt;li&gt;GPU operator addon for ML/AI workloads&lt;/li&gt;
&lt;li&gt;Strict snap confinement for security&lt;/li&gt;
&lt;li&gt;ARM64 and x86 native support&lt;/li&gt;
&lt;li&gt;Observability stack addon (Prometheus, Grafana)&lt;/li&gt;
&lt;li&gt;Built-in image registry&lt;/li&gt;
&lt;/ul&gt;
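
&lt;p&gt;The built-in registry is handy for local image workflows. A sketch, assuming Docker runs on the same host (the image name is illustrative; the registry addon serves on localhost:32000):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Enable the local registry&lt;/span&gt;
microk8s enable registry

&lt;span class="c"&gt;# Push a locally built image, then run it in the cluster&lt;/span&gt;
docker tag myapp:latest localhost:32000/myapp:latest
docker push localhost:32000/myapp:latest
microk8s kubectl create deployment myapp --image=localhost:32000/myapp:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;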

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install via snap&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;snap  &lt;span class="nb"&gt;install  &lt;/span&gt;microk8s  &lt;span class="nt"&gt;--classic&lt;/span&gt;

&lt;span class="c"&gt;# Add your user to the microk8s group&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;usermod  &lt;span class="nt"&gt;-aG&lt;/span&gt;  microk8s  &lt;span class="nv"&gt;$USER&lt;/span&gt;

newgrp  microk8s

&lt;span class="c"&gt;# Check status&lt;/span&gt;

microk8s  status  &lt;span class="nt"&gt;--wait-ready&lt;/span&gt;

&lt;span class="c"&gt;# Enable core addons&lt;/span&gt;

microk8s  &lt;span class="nb"&gt;enable  &lt;/span&gt;dns  ingress  metrics-server  dashboard

&lt;span class="c"&gt;# Use kubectl&lt;/span&gt;

microk8s  kubectl  get  nodes

&lt;span class="c"&gt;# Add worker node (run on master, then copy join command to worker)&lt;/span&gt;

microk8s  add-node

&lt;span class="c"&gt;# Uninstall&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;snap  remove  microk8s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lowest RAM usage of all full-featured distributions (540 MB)&lt;/li&gt;
&lt;li&gt;Best Ubuntu and Linux integration through the snap ecosystem&lt;/li&gt;
&lt;li&gt;Self-healing via systemd — restarts automatically on failure&lt;/li&gt;
&lt;li&gt;HA multi-node with a simple &lt;code&gt;add-node&lt;/code&gt; workflow&lt;/li&gt;
&lt;li&gt;Automatic updates through snap channels (stable, candidate, beta)&lt;/li&gt;
&lt;li&gt;Production-capable for light workloads&lt;/li&gt;
&lt;li&gt;ARM64 support for Raspberry Pi and ARM servers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Snap packaging limits portability to non-Ubuntu systems&lt;/li&gt;
&lt;li&gt;Ubuntu-centric design — snap is not available everywhere&lt;/li&gt;
&lt;li&gt;Addon conflicts can occur (Istio + other service meshes, for example)&lt;/li&gt;
&lt;li&gt;Strict snap confinement can block some host filesystem operations&lt;/li&gt;
&lt;li&gt;dqlite is still maturing compared to battle-tested etcd&lt;/li&gt;
&lt;li&gt;Automatic updates can cause unplanned restarts without configuration&lt;/li&gt;
&lt;/ul&gt;
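
&lt;p&gt;The last point is manageable: snap refreshes can be pinned to a channel and temporarily held. A sketch, assuming a recent snapd:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Track a specific Kubernetes minor version&lt;/span&gt;
sudo snap refresh microk8s --channel=1.29/stable

&lt;span class="c"&gt;# Hold automatic refreshes during a maintenance-sensitive window&lt;/span&gt;
sudo snap refresh --hold=72h microk8s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;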

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Ubuntu workstations and edge servers&lt;/strong&gt; — if you're on Ubuntu, MicroK8s is the most native Kubernetes experience available.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. K3s - Production-Grade at the Edge
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;K3s is the single most consequential lightweight Kubernetes project of the past five years. Released by Rancher Labs (now SUSE) in 2019, it packs a complete, CNCF-certified Kubernetes distribution into a single binary under 100 MB. It runs on 512 MB of RAM, boots in 30 seconds, and behaves identically on a Raspberry Pi, a factory-floor ARM controller, and a cloud VM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;K3s&lt;/strong&gt; achieves its sub-100 MB size by bundling everything into a single Go binary with no external dependencies, using SQLite as a default backing store (which requires no cluster management), and removing upstream K8s features that aren't needed in its target environments (Windows nodes, cloud-provider integrations, certain alpha features).&lt;/p&gt;
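
&lt;p&gt;The datastore choice is a single flag at install time. A sketch using the documented &lt;code&gt;--datastore-endpoint&lt;/code&gt; flag; the PostgreSQL connection string is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Default: embedded SQLite, no flags needed&lt;/span&gt;
curl -sfL https://get.k3s.io | sh -

&lt;span class="c"&gt;# Same binary, external PostgreSQL datastore instead&lt;/span&gt;
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="postgres://k3s:secret@db.example.com:5432/k3s"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;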

&lt;p&gt;K3s is not a toy. It is used in production by thousands of organisations worldwide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
┌────────────────────────────────────────────────────────────────┐
│                      k3s binary (&amp;lt; 100 MB)                     │
│                                                                │
│  ┌─────────────────────────────────┐                           │
│  │          k3s Server             │                           │
│  │  (Control Plane + Optional DP)  │──────────┐                │
│  │                                 │          │                │
│  │  • API Server                   │          ▼                │
│  │  • SQLite (default) / etcd / PG │   ┌─────────────────┐     │
│  │  • Scheduler                    │   │   k3s Agent 1   │     │
│  │  • Controller Manager           │   │   (Worker Node) │     │
│  │  • Flannel CNI (built-in)       │   │  • kubelet      │     │
│  │  • Traefik Ingress              │   │  • kube-proxy   │     │
│  │  • CoreDNS                      │──▶│  • Flannel      │     │
│  │  • local-path-provisioner       │   │  • Pods         │     │
│  │  • Helm controller              │   └─────────────────┘     │
│  └─────────────────────────────────┘          │                │
│                                               ▼                │
│                                        ┌─────────────────┐     │
│                                        │ k3s Agent 2     │     │
│                                        │ (ARM / IoT)     │     │
│                                        └─────────────────┘     │
└────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single binary&lt;/strong&gt; — Packages containerd, CNI plugins, CoreDNS, Traefik, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite&lt;/strong&gt; — Default data store, ideal for single-server or small clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded etcd&lt;/strong&gt; — Available for HA clusters (3+ servers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External DB&lt;/strong&gt; — PostgreSQL, MySQL, or etcd for larger deployments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flannel CNI&lt;/strong&gt; — Built-in overlay networking, zero extra configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traefik&lt;/strong&gt; — Ingress controller included out of the box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helm controller&lt;/strong&gt; — Manage Helm charts via CRDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;local-path-provisioner&lt;/strong&gt; — Dynamic PVC provisioning on local disk&lt;/li&gt;
&lt;/ul&gt;
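
&lt;p&gt;The Helm controller works through a CRD: drop a &lt;code&gt;HelmChart&lt;/code&gt; manifest into the server's manifests directory and K3s installs the chart automatically. A sketch using the public Grafana chart (chart values and namespaces are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Manifests in this directory are auto-applied by K3s&lt;/span&gt;
sudo tee /var/lib/rancher/k3s/server/manifests/grafana.yaml &amp;lt;&amp;lt;'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: grafana
  namespace: kube-system
spec:
  repo: https://grafana.github.io/helm-charts
  chart: grafana
  targetNamespace: monitoring
  createNamespace: true
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;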

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CNCF-certified — passes full Kubernetes conformance tests&lt;/li&gt;
&lt;li&gt;Single binary &amp;lt; 100 MB with everything bundled&lt;/li&gt;
&lt;li&gt;Multiple storage backends: SQLite, etcd, PostgreSQL, MySQL&lt;/li&gt;
&lt;li&gt;ARM64 and ARMv7 first-class support&lt;/li&gt;
&lt;li&gt;Air-gap / offline install support (critical for edge deployments)&lt;/li&gt;
&lt;li&gt;Auto TLS with Let's Encrypt for Traefik&lt;/li&gt;
&lt;li&gt;Server + Agent role split for control/data plane separation&lt;/li&gt;
&lt;li&gt;Automatic certificate rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install server (master) — one command&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | sh -

&lt;span class="c"&gt;# Check status&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;systemctl  status  k3s

&lt;span class="nb"&gt;sudo  &lt;/span&gt;kubectl  get  nodes

&lt;span class="c"&gt;# Get the node join token&lt;/span&gt;

&lt;span class="nb"&gt;sudo  cat&lt;/span&gt;  /var/lib/rancher/k3s/server/node-token

&lt;span class="c"&gt;# Join a worker node (run on the worker)&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://&amp;lt;SERVER_IP&amp;gt;:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;NODE_TOKEN&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  sh -

&lt;span class="c"&gt;# Use kubectl without sudo&lt;/span&gt;

&lt;span class="nb"&gt;mkdir&lt;/span&gt;  &lt;span class="nt"&gt;-p&lt;/span&gt;  ~/.kube

&lt;span class="nb"&gt;sudo  cp&lt;/span&gt;  /etc/rancher/k3s/k3s.yaml  ~/.kube/config

&lt;span class="nb"&gt;sudo  chown&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;  &lt;span class="nt"&gt;-u&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;  &lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; ~/.kube/config

&lt;span class="c"&gt;# Uninstall&lt;/span&gt;

/usr/local/bin/k3s-uninstall.sh  &lt;span class="c"&gt;# server&lt;/span&gt;

/usr/local/bin/k3s-agent-uninstall.sh  &lt;span class="c"&gt;# agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HA Setup (Embedded etcd)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# First server node&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-init&lt;/span&gt;

&lt;span class="c"&gt;# Additional server nodes&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--server&lt;/span&gt; https://&amp;lt;FIRST_SERVER_IP&amp;gt;:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--token&lt;/span&gt; &amp;lt;NODE_TOKEN&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CNCF-certified — genuine, conformant Kubernetes, not a cut-down imitation&lt;/li&gt;
&lt;li&gt;Single binary under 100 MB — deploy to anything&lt;/li&gt;
&lt;li&gt;512 MB RAM minimum — runs on Raspberry Pi 3&lt;/li&gt;
&lt;li&gt;30-second cold start&lt;/li&gt;
&lt;li&gt;SQLite for small clusters, etcd for HA — right tool for every scale&lt;/li&gt;
&lt;li&gt;Traefik ingress out of the box — production workloads with zero extra config&lt;/li&gt;
&lt;li&gt;ARM64 and ARMv7 native — best IoT Kubernetes support in the market&lt;/li&gt;
&lt;li&gt;Air-gap install — works in completely offline environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SQLite backend not suitable for clusters exceeding ~50 nodes&lt;/li&gt;
&lt;li&gt;Some upstream Kubernetes features are stripped (Alpha features, some cloud integrations)&lt;/li&gt;
&lt;li&gt;Default CNI is Flannel only (using Calico requires additional configuration)&lt;/li&gt;
&lt;li&gt;No built-in dashboard&lt;/li&gt;
&lt;li&gt;Less rich addon ecosystem than Minikube or MicroK8s&lt;/li&gt;
&lt;li&gt;Limited Windows node support&lt;/li&gt;
&lt;/ul&gt;
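
&lt;p&gt;Several of these defaults can be switched off at install time if you want a different CNI or ingress. A sketch using documented K3s flags:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Disable the bundled Flannel and Traefik&lt;/span&gt;
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend=none \
  --disable-network-policy \
  --disable=traefik
&lt;span class="c"&gt;# ...then install Calico or Cilium via their own manifests or Helm charts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;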

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Edge computing, IoT, production on resource-constrained hardware, and any environment where the binary size and startup time of a traditional Kubernetes distribution is prohibitive.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Vcluster — Kubernetes Inside Kubernetes
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;Vcluster takes a completely different approach to "lightweight Kubernetes." Rather than running directly on a host operating system, it runs &lt;em&gt;inside&lt;/em&gt; an existing Kubernetes cluster. Each virtual cluster is a set of pods in a namespace, but from the user's perspective it is a completely isolated Kubernetes cluster with its own API server, etcd, and full Kubernetes API.&lt;/p&gt;

&lt;p&gt;This makes Vcluster the definitive answer to the multi-tenancy problem: instead of giving teams namespace isolation (which shares the API server and exposes blast radius), you give each team their own cluster for the cost of a few pods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vcluster&lt;/strong&gt; is architecturally unique in the field. Its virtual control plane (API server + etcd + scheduler + controller manager) runs as pods &lt;em&gt;inside&lt;/em&gt; a host cluster namespace. A component called the &lt;strong&gt;Syncer&lt;/strong&gt; watches the virtual cluster's API and translates virtual resources into real host resources — a virtual Pod becomes a real Pod in the host namespace with a remapped name.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────────┐
│               Host Kubernetes Cluster (any provider)         │
│                                                              │
│  ┌────────────────────┐  ┌────────────────────┐              │
│  │    vcluster 1      │  │    vcluster 2       │             │
│  │  (Team A ns)       │  │  (Team B ns)        │             │
│  │                    │  │                     │             │
│  │  Virtual API Srv   │  │  Virtual API Srv    │             │
│  │  In-process etcd   │  │  In-process etcd    │             │
│  │  Syncer pod        │  │  Syncer pod         │             │
│  │  ┌────┐  ┌────┐    │  │  ┌────┐  ┌────┐    │              │
│  │  │PodA│  │PodB│    │  │  │PodC│  │PodD│    │              │
│  │  └─┬──┘  └─┬──┘    │  │  └─┬──┘  └─┬──┘    │              │
│  │    │sync   │sync   │  │    │sync   │sync    │             │
│  └────┼───────┼───────┘  └────┼───────┼────────┘             │
│       ▼       ▼               ▼       ▼                      │
│  ┌──────────────────────────────────────────────────────┐    │
│  │  Shared Worker Nodes — Host CNI, Storage, Hardware   │    │
│  └──────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the resources that need real infrastructure, such as Pods, Services, and the ConfigMaps and Secrets they mount, are synced down by the &lt;strong&gt;Syncer&lt;/strong&gt;; higher-level objects like Deployments and CRDs live entirely in the virtual cluster's own store. The name remapping on synced resources prevents collisions when many virtual namespaces flatten into a single host namespace.&lt;/p&gt;
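
&lt;p&gt;You can watch the Syncer at work from the host side. A sketch, assuming a vcluster named &lt;code&gt;my-vcluster&lt;/code&gt; in namespace &lt;code&gt;team-a&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Inside the vcluster: the pod has its normal name&lt;/span&gt;
vcluster connect my-vcluster --namespace team-a
kubectl get pods

&lt;span class="c"&gt;# From the host cluster: the same pod appears with a remapped name,&lt;/span&gt;
&lt;span class="c"&gt;# roughly following a &amp;lt;pod&amp;gt;-x-&amp;lt;namespace&amp;gt;-x-&amp;lt;vcluster&amp;gt; pattern&lt;/span&gt;
vcluster disconnect
kubectl get pods -n team-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;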

&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Virtual API Server&lt;/strong&gt; — Full Kubernetes API, runs as a pod in the host cluster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-process etcd&lt;/strong&gt; — Embedded etcd for the virtual cluster's state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syncer&lt;/strong&gt; — Reconciles virtual resources to host cluster resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vcluster CLI&lt;/strong&gt; — Manages lifecycle: create, connect, delete, list&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full Kubernetes API isolation per virtual cluster&lt;/li&gt;
&lt;li&gt;Works on top of any Kubernetes (EKS, GKE, AKS, K3s, RKE2, etc.)&lt;/li&gt;
&lt;li&gt;~10 second spin-up time — fastest of all solutions&lt;/li&gt;
&lt;li&gt;No extra hardware — uses existing cluster nodes&lt;/li&gt;
&lt;li&gt;CRD isolation — each vcluster has its own CRDs&lt;/li&gt;
&lt;li&gt;RBAC isolation — separate RBAC per vcluster&lt;/li&gt;
&lt;li&gt;Helm chart deployment — deploy via standard Helm&lt;/li&gt;
&lt;li&gt;On-demand creation and deletion&lt;/li&gt;
&lt;/ul&gt;
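
&lt;p&gt;Besides the CLI, a vcluster can be installed with plain Helm, which is useful in GitOps pipelines. A sketch (release name and namespace are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Deploy a vcluster from the official chart repository&lt;/span&gt;
helm upgrade --install my-vcluster vcluster \
  --repo https://charts.loft.sh \
  --namespace team-a --create-namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;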

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install vcluster CLI&lt;/span&gt;

curl  &lt;span class="nt"&gt;-L&lt;/span&gt;  &lt;span class="nt"&gt;-o&lt;/span&gt;  vcluster  &lt;span class="s2"&gt;"https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64"&lt;/span&gt;

&lt;span class="nb"&gt;chmod&lt;/span&gt;  +x  vcluster &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo  mv  &lt;/span&gt;vcluster  /usr/local/bin

&lt;span class="c"&gt;# Create a virtual cluster&lt;/span&gt;

vcluster  create  my-vcluster  &lt;span class="nt"&gt;--namespace&lt;/span&gt;  team-a

&lt;span class="c"&gt;# Connect to it (sets KUBECONFIG automatically)&lt;/span&gt;

vcluster  connect  my-vcluster  &lt;span class="nt"&gt;--namespace&lt;/span&gt;  team-a

&lt;span class="c"&gt;# Now kubectl talks to the vcluster&lt;/span&gt;

kubectl  get  nodes

kubectl  create  deployment  nginx  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nginx

&lt;span class="c"&gt;# Disconnect&lt;/span&gt;

vcluster  disconnect

&lt;span class="c"&gt;# Delete&lt;/span&gt;

vcluster  delete  my-vcluster  &lt;span class="nt"&gt;--namespace&lt;/span&gt;  team-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full Kubernetes API isolation per tenant — no shared API server blast radius&lt;/li&gt;
&lt;li&gt;10-second spin-up — fastest cluster creation of all solutions reviewed&lt;/li&gt;
&lt;li&gt;No extra hardware — reuses host cluster's nodes entirely&lt;/li&gt;
&lt;li&gt;Works on any cloud or on-premises Kubernetes&lt;/li&gt;
&lt;li&gt;Cost-efficient multi-tenancy at scale&lt;/li&gt;
&lt;li&gt;Each team gets the full &lt;code&gt;kubectl&lt;/code&gt; experience&lt;/li&gt;
&lt;li&gt;Easy to create and delete on demand for short-lived environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not standalone — requires a host Kubernetes cluster to exist first&lt;/li&gt;
&lt;li&gt;Cannot create real nodes — virtual only&lt;/li&gt;
&lt;li&gt;Advanced networking between vclusters is complex&lt;/li&gt;
&lt;li&gt;Some cluster-scoped resources (like ClusterRoles and CRDs) are not fully isolated&lt;/li&gt;
&lt;li&gt;Requires privileged pod access on the host cluster&lt;/li&gt;
&lt;li&gt;Newer project — less battle-tested than K3s or Minikube&lt;/li&gt;
&lt;li&gt;Node-level debugging is limited&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multi-tenant development environments, per-team isolated clusters, and CI/CD environments where many short-lived clusters need to be spun up and torn down rapidly on existing infrastructure.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6. k0s — Zero Dependencies, Zero Friction
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;k0s (pronounced "kay-zero-ess") from Mirantis lives up to its name: zero host OS dependencies. It is a single binary that includes everything needed to run Kubernetes without requiring any specific kernel modules, swap configuration, or package manager. It works on any Linux distribution out of the box.&lt;/p&gt;

&lt;p&gt;k0s uses an eBPF-based CNI called kube-router, includes Autopilot for automated upgrades, and offers FIPS 140-2 compliance — a feature set that appeals strongly to regulated industries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;k0s&lt;/strong&gt; prioritises deployment universality. By bundling containerd and all CNI plugins into the binary itself and requiring no kernel module configuration from the host OS, it can be dropped onto virtually any Linux system and run. The eBPF-based kube-router CNI offers modern packet processing without iptables overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
┌──────────────────────────────────────────────────────┐
│             k0s binary (systemd / OpenRC)            │
│                                                      │
│  ┌──────────────────────────┐                        │
│  │      k0s controller      │                        │
│  │     (Control Plane)      │───────────────┐        │
│  │                          │               │        │
│  │  • API Server            │               ▼        │
│  │  • etcd (embedded)       │   ┌─────────────────┐  │
│  │  • Scheduler             │   │  k0s worker 1   │  │
│  │  • Controller Manager    │   │                 │  │
│  │  • containerd            │   │  • kubelet      │  │
│  │  • kube-router (eBPF)    │──▶│  • kube-router  │  │
│  │  • Autopilot updater     │   │  • containerd   │  │
│  └──────────────────────────┘   │  • Pods         │  │
│                                 └─────────────────┘  │
│  k0sctl tool → manages cluster lifecycle             │
└──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Truly zero host OS dependencies — no kernel module requirements&lt;/li&gt;
&lt;li&gt;FIPS 140-2 compliance mode available&lt;/li&gt;
&lt;li&gt;eBPF-based networking via kube-router&lt;/li&gt;
&lt;li&gt;Autopilot automated upgrades&lt;/li&gt;
&lt;li&gt;k0sctl for full cluster lifecycle management&lt;/li&gt;
&lt;li&gt;ARM64 native support&lt;/li&gt;
&lt;li&gt;Air-gap install support&lt;/li&gt;
&lt;li&gt;Works on any Linux OS (Debian, RHEL, Alpine, CoreOS, etc.)&lt;/li&gt;
&lt;/ul&gt;
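
&lt;p&gt;k0sctl drives a whole cluster from one declarative file over SSH. A minimal sketch; the host addresses and SSH user are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Describe the cluster once, then apply it&lt;/span&gt;
cat &amp;gt; k0sctl.yaml &amp;lt;&amp;lt;'EOF'
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: demo
spec:
  hosts:
    - role: controller
      ssh:
        address: 10.0.0.1
        user: root
    - role: worker
      ssh:
        address: 10.0.0.2
        user: root
EOF

k0sctl apply --config k0sctl.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;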

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Download k0s&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sSLf&lt;/span&gt;  https://get.k0s.sh | &lt;span class="nb"&gt;sudo  &lt;/span&gt;sh

&lt;span class="c"&gt;# Install and start as a service&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  &lt;span class="nb"&gt;install  &lt;/span&gt;controller  &lt;span class="nt"&gt;--single&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  start

&lt;span class="c"&gt;# Get kubeconfig&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  kubeconfig  admin &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.kube/config

&lt;span class="c"&gt;# Check cluster&lt;/span&gt;

kubectl  get  nodes

&lt;span class="c"&gt;# Add a worker node — generate join token on controller&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  token  create  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;worker

&lt;span class="c"&gt;# On the worker node&lt;/span&gt;

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  &lt;span class="nb"&gt;install  &lt;/span&gt;worker  &lt;span class="nt"&gt;--token-file&lt;/span&gt;  /path/to/token

&lt;span class="nb"&gt;sudo  &lt;/span&gt;k0s  start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Truly zero host OS dependencies — works on any Linux, no special kernel configuration&lt;/li&gt;
&lt;li&gt;FIPS 140-2 compliance for regulated industries&lt;/li&gt;
&lt;li&gt;eBPF-based networking with kube-router is modern and efficient&lt;/li&gt;
&lt;li&gt;Autopilot handles automated upgrades safely&lt;/li&gt;
&lt;li&gt;k0sctl provides a proper cluster lifecycle management tool&lt;/li&gt;
&lt;li&gt;No swap or kernel module pre-requirements&lt;/li&gt;
&lt;li&gt;Air-gap support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Smaller community than K3s or MicroK8s&lt;/li&gt;
&lt;li&gt;Less rich addon ecosystem&lt;/li&gt;
&lt;li&gt;k0sctl adds an additional tool to the workflow&lt;/li&gt;
&lt;li&gt;Some CNI plugins need manual configuration beyond kube-router&lt;/li&gt;
&lt;li&gt;Enterprise support is a paid product from Mirantis&lt;/li&gt;
&lt;li&gt;Fewer third-party integrations and tutorials&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Environments where host OS diversity is a challenge&lt;/strong&gt; — mixed Linux distributions, heavily locked-down servers, or compliance-driven deployments needing FIPS 140-2.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. RKE2 — Security-First Enterprise K8s
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;RKE2 (Rancher Kubernetes Engine 2) is the enterprise evolution of K3s. Where K3s optimises for minimal resource usage and edge deployability, RKE2 optimises for security hardening and compliance. It ships hardened by default with CIS Kubernetes Benchmark compliance, FIPS 140-2 support, automatic etcd snapshots, and deep Rancher integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RKE2&lt;/strong&gt; starts from K3s's architecture and adds a hardening layer: Pod Security Admission enforced by default, etcd encryption at rest, CIS-compliant API server flags, audit logging enabled, and Canal CNI with network policy enforcement. It is Kubernetes made appropriate for government and financial sector requirements.&lt;/p&gt;

&lt;p&gt;If K3s is the lightweight sports car, RKE2 is the armoured vehicle: heavier and more resource-intensive, but far harder to damage.&lt;/p&gt;
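
&lt;p&gt;Most of the hardening described above is driven from a single config file, and etcd snapshots can also be taken on demand. A sketch; the profile and CNI values shown are documented RKE2 options, and subcommand names follow recent releases:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# /etc/rancher/rke2/config.yaml: enable the CIS profile and pick a CNI&lt;/span&gt;
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/config.yaml &amp;lt;&amp;lt;'EOF'
profile: cis
cni: cilium
EOF

&lt;span class="c"&gt;# On-demand etcd snapshot, plus listing existing ones&lt;/span&gt;
sudo rke2 etcd-snapshot save --name pre-upgrade
sudo rke2 etcd-snapshot list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;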

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────────────────────────────────┐
│              RKE2 Server (Hardened Control Plane)         │
│                                                           │
│  ┌──────────────────────────────────────────────────────┐ │
│  │  CIS-Hardened Kubernetes                             │ │
│  │                                                      │ │
│  │  • Hardened API Server (PSA enforced)                │ │
│  │  • etcd with automated snapshots                     │ │
│  │  • Hardened Scheduler &amp;amp; Controller Manager           │ │
│  │  • Canal / Calico / Cilium CNI (configurable)        │ │
│  │  • containerd runtime                                │ │
│  │  • Cert-manager + auto rotation                      │ │
│  └──────────────────────────────────────────────────────┘ │
│                    │                                      │
│          ┌─────────┴──────────┐                           │
│          ▼                    ▼                           │
│  ┌────────────────┐  ┌────────────────┐                   │
│  │  RKE2 Agent 1  │  │  RKE2 Agent 2  │                   │
│  │  (Worker)      │  │  (Worker)      │                   │
│  └────────────────┘  └────────────────┘                   │
│                                                           │
│  ┌──────────────────────────────────────────────────────┐ │
│  │  Rancher Management Plane (optional)                 │ │
│  └──────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CIS Kubernetes Benchmark compliance by default (hardening profile selectable at install)&lt;/li&gt;
&lt;li&gt;FIPS 140-2 cryptographic compliance&lt;/li&gt;
&lt;li&gt;etcd with automated periodic snapshots and restoration&lt;/li&gt;
&lt;li&gt;Multiple CNI options: Canal (default), Calico, Cilium&lt;/li&gt;
&lt;li&gt;Automated certificate rotation&lt;/li&gt;
&lt;li&gt;Helm chart integration&lt;/li&gt;
&lt;li&gt;Air-gap install support&lt;/li&gt;
&lt;li&gt;Deep Rancher management platform integration&lt;/li&gt;
&lt;li&gt;Role-based node configuration&lt;/li&gt;
&lt;/ul&gt;
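
&lt;p&gt;The CNI choice, hardening profile, and snapshot cadence are all driven from a single config file read by &lt;code&gt;rke2-server&lt;/code&gt; at startup. A minimal sketch of &lt;code&gt;/etc/rancher/rke2/config.yaml&lt;/code&gt; (these keys are documented RKE2 options, but the exact profile name varies by release, so treat the values as illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# /etc/rancher/rke2/config.yaml (server node)
cni: cilium                                 # default is canal
profile: cis                                # apply the CIS hardening profile
etcd-snapshot-schedule-cron: "0 */6 * * *"  # snapshot every 6 hours
etcd-snapshot-retention: 10                 # keep the last 10 snapshots
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;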

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Install RKE2 server&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.rke2.io | sh  -

systemctl  &lt;span class="nb"&gt;enable  &lt;/span&gt;rke2-server.service

systemctl  start  rke2-server.service

&lt;span class="c"&gt;# Get kubeconfig&lt;/span&gt;

&lt;span class="nb"&gt;export  &lt;/span&gt;&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/etc/rancher/rke2/rke2.yaml

&lt;span class="c"&gt;# Get join token for workers&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt;  /var/lib/rancher/rke2/server/node-token

&lt;span class="c"&gt;# On worker nodes&lt;/span&gt;

curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.rke2.io | &lt;span class="nv"&gt;INSTALL_RKE2_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"agent"&lt;/span&gt;  sh  -

&lt;span class="nb"&gt;mkdir&lt;/span&gt;  &lt;span class="nt"&gt;-p&lt;/span&gt;  /etc/rancher/rke2/

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/rancher/rke2/config.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;

server: https://&amp;lt;SERVER_IP&amp;gt;:9345

token: &amp;lt;NODE_TOKEN&amp;gt;
&lt;/span&gt;&lt;span class="no"&gt;
EOF

&lt;/span&gt;systemctl  &lt;span class="nb"&gt;enable  &lt;/span&gt;rke2-agent.service

systemctl  start  rke2-agent.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CIS Kubernetes Benchmark compliance out of the box — no manual hardening&lt;/li&gt;
&lt;li&gt;FIPS 140-2 for regulated environments (finance, government, healthcare)&lt;/li&gt;
&lt;li&gt;Automated etcd snapshots — point-in-time restore capability&lt;/li&gt;
&lt;li&gt;Multiple CNI choices (Canal, Calico, Cilium) for varied network requirements&lt;/li&gt;
&lt;li&gt;Excellent Rancher multi-cluster management integration&lt;/li&gt;
&lt;li&gt;Automated certificate rotation&lt;/li&gt;
&lt;li&gt;Strong air-gap support for isolated environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;4 GB RAM minimum makes it unsuitable for edge/IoT&lt;/li&gt;
&lt;li&gt;Longer startup time (~2 minutes)&lt;/li&gt;
&lt;li&gt;More operationally complex than K3s&lt;/li&gt;
&lt;li&gt;Overkill for non-compliance use cases&lt;/li&gt;
&lt;li&gt;Tightly coupled to the Rancher ecosystem&lt;/li&gt;
&lt;li&gt;Larger binary and resource footprint&lt;/li&gt;
&lt;li&gt;etcd only — no SQLite lightweight option&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best For
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Enterprise, compliance-driven, and government workloads&lt;/strong&gt; where security hardening and audit-readiness are non-negotiable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scoring Across 8 Dimensions
&lt;/h2&gt;




&lt;p&gt;Scores are relative (1–10, higher is better for most dimensions):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;KIND&lt;/th&gt;
&lt;th&gt;Minikube&lt;/th&gt;
&lt;th&gt;MicroK8s&lt;/th&gt;
&lt;th&gt;K3s&lt;/th&gt;
&lt;th&gt;Vcluster&lt;/th&gt;
&lt;th&gt;k0s&lt;/th&gt;
&lt;th&gt;RKE2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ease of use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production readiness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-node support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Addon ecosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edge / IoT fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-tenancy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD suitability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Use Case Decision Guide
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Situation&lt;/th&gt;
&lt;th&gt;Best Choice&lt;/th&gt;
&lt;th&gt;Runner-Up&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Actions / GitLab CI pipelines&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KIND&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vcluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local development on macOS/Windows/Linux&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Minikube&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MicroK8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer on Ubuntu workstation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MicroK8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raspberry Pi cluster at home&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MicroK8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial IoT / factory floor&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;k0s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARM-based edge server&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MicroK8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production workload on lightweight infra&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MicroK8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Government / regulated enterprise&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RKE2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;k0s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FIPS 140-2 compliance required&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RKE2 or k0s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-tenant dev environments&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Vcluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Namespace isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-team isolated clusters&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Vcluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;KIND&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed Linux OS fleet&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;k0s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Air-gap / offline environment&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;k0s or RKE2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing Kubernetes itself&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KIND&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HA on bare metal with minimal ops&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MicroK8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K3s embedded etcd&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes with Rancher management&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RKE2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  The Decision Tree
&lt;/h3&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Do you need production-grade?
├── No → Is it for CI/CD testing?
│         ├── Yes → KIND
│         └── No  → Are you on Ubuntu?
│                   ├── Yes → MicroK8s
│                   └── No  → Minikube
└── Yes → Do you need compliance (FIPS/CIS)?
          ├── Yes → RKE2 (CIS+FIPS) or k0s (FIPS)
          └── No  → Is it edge/IoT/ARM?
                    ├── Yes → K3s
                    └── No  → Need multi-tenancy?
                              ├── Yes → Vcluster
                              └── No  → K3s or MicroK8s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;




&lt;p&gt;After a thorough review, the landscape shakes out clearly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;K3s&lt;/strong&gt; is the most remarkable project in the lightweight Kubernetes space. It delivers a complete, CNCF-certified Kubernetes distribution in under 100 MB, runs on 512 MB of RAM, and works in air-gapped ARM environments. For the vast majority of production lightweight Kubernetes use cases, K3s is the correct answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vcluster&lt;/strong&gt; solves a problem no other distribution addresses: genuine Kubernetes API-level multi-tenancy without dedicated hardware. If you need to give 10 teams their own isolated clusters, Vcluster is the only sensible approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KIND&lt;/strong&gt; is indispensable for CI/CD. If you run Kubernetes integration tests in any CI system, KIND's 30-second, Docker-native, multi-node clusters are the right tool with no close competitor.&lt;/p&gt;
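
&lt;p&gt;As a rough illustration of the CI fit, a throwaway KIND cluster in GitHub Actions takes only a few lines. This sketch assumes the community &lt;code&gt;helm/kind-action&lt;/code&gt; and a hypothetical &lt;code&gt;manifests/&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: helm/kind-action@v1          # spins up a disposable KIND cluster
      - run: kubectl apply -f manifests/   # hypothetical manifest directory
      - run: kubectl wait --for=condition=Ready pods --all --timeout=120s
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;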

&lt;p&gt;&lt;strong&gt;Minikube&lt;/strong&gt; remains the best onboarding experience for developers who are new to Kubernetes. The addon ecosystem and built-in dashboard lower the barrier to entry substantially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MicroK8s&lt;/strong&gt; is the best Kubernetes for Ubuntu. If your team lives on Ubuntu workstations and servers, snap-based installation, self-healing, and dqlite HA make it the most frictionless operational experience on that platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;k0s&lt;/strong&gt; fills an important niche: mixed Linux fleets and environments where zero host OS dependencies matter more than community size or addon richness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RKE2&lt;/strong&gt; is the right answer when your compliance officer needs CIS Kubernetes Benchmark and FIPS 140-2. The resource overhead is the price of admission to heavily regulated sectors.&lt;/p&gt;




&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;KIND Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://minikube.sigs.k8s.io/docs/" rel="noopener noreferrer"&gt;Minikube Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://microk8s.io/docs" rel="noopener noreferrer"&gt;MicroK8s Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.k3s.io/" rel="noopener noreferrer"&gt;K3s Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.vcluster.com/docs" rel="noopener noreferrer"&gt;Vcluster Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.k0sproject.io/" rel="noopener noreferrer"&gt;k0s Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rke2.io/" rel="noopener noreferrer"&gt;RKE2 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/certification/software-conformance/" rel="noopener noreferrer"&gt;CNCF Certified Kubernetes Conformance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This post was written in April 2025. Kubernetes moves fast — always check the official documentation for the latest version information.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt;  &lt;code&gt;kubernetes&lt;/code&gt;  &lt;code&gt;k8s&lt;/code&gt;  &lt;code&gt;k3s&lt;/code&gt;  &lt;code&gt;kind&lt;/code&gt;  &lt;code&gt;minikube&lt;/code&gt;  &lt;code&gt;microk8s&lt;/code&gt;  &lt;code&gt;vcluster&lt;/code&gt;  &lt;code&gt;k0s&lt;/code&gt;  &lt;code&gt;rke2&lt;/code&gt;  &lt;code&gt;devops&lt;/code&gt;  &lt;code&gt;infrastructure&lt;/code&gt;  &lt;code&gt;edge-computing&lt;/code&gt;  &lt;code&gt;cloud-native&lt;/code&gt;  &lt;code&gt;containers&lt;/code&gt;  &lt;code&gt;cncf&lt;/code&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Running k3s on Proxmox: A Multi-Node Cluster with a VM and LXC Worker — The Hard Way and Back</title>
      <dc:creator>Pendela BhargavaSai</dc:creator>
      <pubDate>Tue, 21 Apr 2026 03:30:00 +0000</pubDate>
      <link>https://dev.to/pendelabhargavasai/running-k3s-on-proxmox-a-multi-node-cluster-with-a-vm-and-lxc-worker-the-hard-way-and-back-1cb4</link>
      <guid>https://dev.to/pendelabhargavasai/running-k3s-on-proxmox-a-multi-node-cluster-with-a-vm-and-lxc-worker-the-hard-way-and-back-1cb4</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide covering installation, troubleshooting, and the real story of getting k3s to run inside an LXC container&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvo4jx0xy1tzo3m10e1pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvo4jx0xy1tzo3m10e1pj.png" alt=" " width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;




&lt;p&gt;Kubernetes is powerful but notorious for being heavy. k3s, the lightweight Kubernetes distribution from Rancher, fixes that. It strips out legacy APIs, bundles containerd, and ships as a single binary under 100MB. It is perfect for homelabs, edge deployments, and resource-constrained environments.&lt;br&gt;
(more about k3s: &lt;a href="https://traefik.io/glossary/k3s-explained/?ref=adventuresintech.org" rel="noopener noreferrer"&gt;https://traefik.io/glossary/k3s-explained/&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This is the first of a series of posts describing how to bootstrap a Kubernetes cluster on &lt;a href="https://proxmox.com/?ref=adventuresintech.org" rel="noopener noreferrer"&gt;Proxmox&lt;/a&gt; using an Ubuntu VM and LXC containers. By the end of the series, the aim is to have a fully working Kubernetes (&lt;a href="https://k3s.io/?ref=adventuresintech.org" rel="noopener noreferrer"&gt;K3S&lt;/a&gt;) install including the &lt;a href="https://metallb.universe.tf/?ref=adventuresintech.org" rel="noopener noreferrer"&gt;MetalLB&lt;/a&gt; load balancer, a &lt;a href="https://gateway-api.sigs.k8s.io/guides/getting-started/" rel="noopener noreferrer"&gt;Gateway API&lt;/a&gt; controller, and an Istio service mesh. I’ll also install some sample applications for good measure.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Do I Need a Kubernetes Cluster?
&lt;/h2&gt;



&lt;p&gt;At work, I’ve used large K8s clusters in production environments (AWS), but those clusters are abstracted away behind platform teams. That is efficient for delivery, but it leaves gaps in understanding how scheduling, networking, storage, and controllers really behave under the hood. Setting up your own cluster gives you that missing layer of operational intuition: you get to break things, debug them, and understand why they broke. For someone already running a fairly complex home setup, using Kubernetes as a unifying platform to experiment with, whether or not you fully migrate all your Docker Compose stacks, is less about necessity and more about building practical, transferable expertise.&lt;/p&gt;

&lt;p&gt;In this post I document how I built a three-node k3s cluster on Proxmox VE with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1 master node&lt;/strong&gt; — a Proxmox VM running Ubuntu&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1 VM worker node&lt;/strong&gt; — a standard Proxmox VM (worker1)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1 LXC worker node&lt;/strong&gt; — a Proxmox LXC container (worker2)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The VM setup was straightforward. The LXC setup was not. This post focuses heavily on the LXC journey — the errors, the fixes, the Linux internals involved, and what it finally took to make it work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkgf71akebq4knbgx4fi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkgf71akebq4knbgx4fi.png" width="476" height="106"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Part 1: Setting Up the Master Node
&lt;/h2&gt;


&lt;h3&gt;
  
  
  &lt;em&gt;Installing k3s Server&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;On the master VM, installing k3s is a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | sh  -

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;k3s sets up a systemd service, installs containerd, and bootstraps a single-node Kubernetes cluster automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;u&gt;Fixing kubectl Access&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;After installation, running &lt;code&gt;kubectl get nodes&lt;/code&gt; immediately fails:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;The connection to the server localhost:8080 was refused&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;This happens because kubectl defaults to &lt;code&gt;localhost:8080&lt;/code&gt; when no kubeconfig is set. k3s stores its kubeconfig at &lt;code&gt;/etc/rancher/k3s/k3s.yaml&lt;/code&gt;. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt;  &lt;span class="nt"&gt;-p&lt;/span&gt;  ~/.kube

&lt;span class="nb"&gt;sudo  cp&lt;/span&gt;  /etc/rancher/k3s/k3s.yaml  ~/.kube/config

&lt;span class="nb"&gt;sudo  chown&lt;/span&gt;  &lt;span class="nv"&gt;$USER&lt;/span&gt;:&lt;span class="nv"&gt;$USER&lt;/span&gt;  ~/.kube/config

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or export it permanently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt;  &lt;span class="s1"&gt;'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc

&lt;span class="nb"&gt;source&lt;/span&gt;  ~/.bashrc

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Retrieve the Node Token
&lt;/h3&gt;

&lt;p&gt;Worker nodes need a token to join the cluster. Grab it from the master:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nb"&gt;sudo  cat&lt;/span&gt;  /var/lib/rancher/k3s/server/node-token

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep this value — it is used in every worker join command.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Adding the VM Worker (worker1)
&lt;/h2&gt;




&lt;h3&gt;
  
  
  &lt;em&gt;Joining the Cluster&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;On the worker VM, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | &lt;span class="se"&gt;\&lt;/span&gt;

&lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://192.168.1.44:6443  &lt;span class="se"&gt;\&lt;/span&gt;

&lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;node-token&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;

sh  -

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;u&gt;Problem: Node Password Rejected&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;The agent started but immediately logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Node password rejected, duplicate hostname or contents of

'/etc/rancher/node/password' may not match server node-passwd entry

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This happened because the worker VM had previously joined the cluster. k3s stores a node password on both the node (&lt;code&gt;/etc/rancher/node/password&lt;/code&gt;) and the master (as a Kubernetes secret). When they don't match, the server rejects the node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix — on the worker:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nb"&gt;sudo  &lt;/span&gt;systemctl  stop  k3s-agent

&lt;span class="nb"&gt;sudo  rm&lt;/span&gt;  &lt;span class="nt"&gt;-f&lt;/span&gt;  /etc/rancher/node/password

&lt;span class="nb"&gt;sudo  rm&lt;/span&gt;  &lt;span class="nt"&gt;-rf&lt;/span&gt;  /var/lib/rancher/k3s/agent/

&lt;span class="nb"&gt;sudo  &lt;/span&gt;systemctl  start  k3s-agent

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix — on the master, delete the stale secret:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
kubectl  get  secrets  &lt;span class="nt"&gt;-n&lt;/span&gt;  kube-system | &lt;span class="nb"&gt;grep  &lt;/span&gt;node-password

kubectl  delete  secret  worker1.node-password.k3s  &lt;span class="nt"&gt;-n&lt;/span&gt;  kube-system

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;u&gt;Problem: Duplicate Hostname&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;Both the master and worker had the hostname &lt;code&gt;k3s&lt;/code&gt;. k3s uses the hostname as the node name, so the server rejected the second node as a duplicate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix — rename the worker:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nb"&gt;sudo  &lt;/span&gt;hostnamectl  set-hostname  worker1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After renaming and cleaning up the stale secret, the worker joined successfully.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: The LXC Worker — The Real Story
&lt;/h2&gt;




&lt;h3&gt;
  
  
  What is an LXC Container?
&lt;/h3&gt;

&lt;p&gt;LXC (Linux Containers) is a lightweight virtualisation technology. Unlike VMs, which emulate full hardware, LXC containers share the host kernel directly. They use Linux namespaces for isolation and cgroups for resource control. They are faster and more efficient than VMs, but offer less isolation.&lt;/p&gt;

&lt;p&gt;Proxmox LXC containers can be &lt;strong&gt;privileged&lt;/strong&gt; (root inside = root on host) or &lt;strong&gt;unprivileged&lt;/strong&gt; (root inside maps to a regular user on host via UID namespacing). Unprivileged is the default and more secure option.&lt;/p&gt;
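
&lt;p&gt;You can see this UID mapping from inside any process by reading &lt;code&gt;/proc/self/uid_map&lt;/code&gt;. On the host (or in a privileged container) it is the identity map; in an unprivileged container, UID 0 maps to a high host UID:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Columns: UID inside the namespace, UID outside, range length.
# "0 0 4294967295" is the identity map (no UID shifting).
cat /proc/self/uid_map
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;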

&lt;h3&gt;
  
  
  Creating the LXC Container
&lt;/h3&gt;

&lt;p&gt;In Proxmox, I created a Debian Trixie LXC container with the following settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xq11x12p45pm40cx5l9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xq11x12p45pm40cx5l9.png" width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Joining the Cluster&lt;/em&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
curl  &lt;span class="nt"&gt;-sfL&lt;/span&gt;  https://get.k3s.io | &lt;span class="se"&gt;\&lt;/span&gt;

&lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://192.168.1.44:6443  &lt;span class="se"&gt;\&lt;/span&gt;

&lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;node-token&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;

sh  -

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The install script ran and printed &lt;code&gt;[INFO] systemd: Starting k3s-agent&lt;/code&gt; — and then nothing. It just hung.&lt;/p&gt;

&lt;p&gt;Checking the journal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
journalctl  &lt;span class="nt"&gt;-u&lt;/span&gt;  k3s-agent  &lt;span class="nt"&gt;-f&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;u&gt;Error 1: &lt;code&gt;/dev/kmsg: no such file or directory&lt;/code&gt;&lt;/u&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;
Error: failed to run Kubelet: failed to create kubelet: open /dev/kmsg: no such file or directory

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;/dev/kmsg&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/dev/kmsg&lt;/code&gt; is the kernel message buffer device. The Linux kernel uses it to log messages (this is what &lt;code&gt;dmesg&lt;/code&gt; reads). kubelet uses it to watch for OOM (Out of Memory) kill events via the &lt;code&gt;oomWatcher&lt;/code&gt;. Without it, kubelet refuses to start.&lt;/p&gt;

&lt;p&gt;In an unprivileged LXC container, &lt;code&gt;/dev/kmsg&lt;/code&gt; does not exist because the container does not have access to kernel devices.&lt;/p&gt;
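
&lt;p&gt;A quick preflight inside the container, before installing the agent, saves a round of journal spelunking (a small sketch; the messages are mine, not k3s output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# kubelet needs /dev/kmsg to exist and be openable for reading.
if [ -e /dev/kmsg ]; then
    echo "/dev/kmsg exists"
else
    echo "/dev/kmsg missing: add an lxc.mount.entry bind mount"
fi
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;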

&lt;p&gt;&lt;strong&gt;Fix — bind mount from host:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;/etc/pve/lxc/209.conf&lt;/code&gt; on the Proxmox host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;
&lt;span class="n"&gt;lxc&lt;/span&gt;.&lt;span class="n"&gt;mount&lt;/span&gt;.&lt;span class="n"&gt;entry&lt;/span&gt;: /&lt;span class="n"&gt;dev&lt;/span&gt;/&lt;span class="n"&gt;kmsg&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;/&lt;span class="n"&gt;kmsg&lt;/span&gt; &lt;span class="n"&gt;none&lt;/span&gt; &lt;span class="n"&gt;bind&lt;/span&gt;,&lt;span class="n"&gt;create&lt;/span&gt;=&lt;span class="n"&gt;file&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bind mounts the host's &lt;code&gt;/dev/kmsg&lt;/code&gt; into the container. Stop and start (not restart) the LXC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
pct  stop  209

pct  start  209

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;u&gt;Error 2: &lt;code&gt;/dev/kmsg: operation not permitted&lt;/code&gt;&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;After adding the bind mount, the error changed slightly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;
open /dev/kmsg: operation not permitted

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file now existed in the container, but the process was not allowed to open it. The container was still running unprivileged in a user namespace, and AppArmor was blocking access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix — disable AppArmor restriction:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;
&lt;span class="n"&gt;lxc&lt;/span&gt;.&lt;span class="n"&gt;apparmor&lt;/span&gt;.&lt;span class="n"&gt;profile&lt;/span&gt;: &lt;span class="n"&gt;unconfined&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AppArmor is a Linux Security Module that applies mandatory access control policies. The default Proxmox LXC AppArmor profile blocks access to kernel devices like &lt;code&gt;/dev/kmsg&lt;/code&gt;. Setting it to &lt;code&gt;unconfined&lt;/code&gt; removes all AppArmor restrictions for this container.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;u&gt;Error 3: &lt;code&gt;/proc/sys/kernel/panic: read-only file system&lt;/code&gt;&lt;/u&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;
Failed to start ContainerManager:

open /proc/sys/kernel/panic: read-only file system

open /proc/sys/kernel/panic_on_oops: read-only file system

open /proc/sys/vm/overcommit_memory: read-only file system

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;/proc/sys&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/proc&lt;/code&gt; is a virtual filesystem the kernel exposes so userspace can read and write kernel parameters. &lt;code&gt;/proc/sys/&lt;/code&gt; specifically contains sysctl values — tuneable kernel settings.&lt;/p&gt;

&lt;p&gt;kubelet needs to write to these on startup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;kernel/panic&lt;/code&gt; — configure kernel panic timeout&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;kernel/panic_on_oops&lt;/code&gt; — whether a kernel oops causes a panic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;vm/overcommit_memory&lt;/code&gt; — memory overcommit policy&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In an unprivileged LXC container, &lt;code&gt;/proc&lt;/code&gt; is mounted read-only for safety. Any process inside the container (even root inside) cannot modify these values.&lt;/p&gt;
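&lt;p&gt;These sysctls are plain files, so you can inspect the values kubelet wants to change before touching any config (a quick sketch; reads still work when &lt;code&gt;/proc&lt;/code&gt; is mounted read-only, it is only the writes that fail):&lt;/p&gt;

```shell
# Read the three sysctls kubelet needs to set at startup.
for k in kernel/panic kernel/panic_on_oops vm/overcommit_memory; do
  printf '%-24s = %s\n' "$k" "$(cat /proc/sys/$k)"
done
```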

&lt;p&gt;&lt;strong&gt;Fix — mount proc and sys as read-write:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;
&lt;span class="n"&gt;lxc&lt;/span&gt;.&lt;span class="n"&gt;mount&lt;/span&gt;.&lt;span class="n"&gt;auto&lt;/span&gt;: &lt;span class="s2"&gt;"proc:rw sys:rw"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells LXC to mount &lt;code&gt;/proc&lt;/code&gt; and &lt;code&gt;/sys&lt;/code&gt; with read-write access instead of the default read-only.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;u&gt;Error 4: Various Permission Denied Errors&lt;/u&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;
write /proc/self/oom_score_adj: permission denied

Failed to set sysctl: open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These were caused by the container still running as unprivileged — the process was root inside the container but mapped to a normal user on the host, so many privileged operations were blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix — switch to privileged container:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;
&lt;span class="py"&gt;unprivileged&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the most significant change. A privileged container maps root inside to actual root on the host. This removes the UID namespace remapping that caused most of the permission errors.&lt;/p&gt;
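&lt;p&gt;The remapping is easy to see from inside the container: &lt;code&gt;/proc/self/uid_map&lt;/code&gt; shows how container UIDs translate to host UIDs (a quick check, not specific to Proxmox):&lt;/p&gt;

```shell
# Each line maps a range of UIDs:
#   &lt;first-uid-inside&gt; &lt;first-uid-on-host&gt; &lt;count&gt;
cat /proc/self/uid_map
# An unprivileged Proxmox container typically prints "0 100000 65536"
# (root inside = UID 100000 on the host); a privileged container prints
# an identity mapping starting at "0 0".
```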

&lt;p&gt;&lt;strong&gt;Also needed:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;
&lt;span class="n"&gt;lxc&lt;/span&gt;.&lt;span class="n"&gt;cgroup2&lt;/span&gt;.&lt;span class="n"&gt;devices&lt;/span&gt;.&lt;span class="n"&gt;allow&lt;/span&gt;: &lt;span class="n"&gt;a&lt;/span&gt;

&lt;span class="n"&gt;lxc&lt;/span&gt;.&lt;span class="n"&gt;cap&lt;/span&gt;.&lt;span class="n"&gt;drop&lt;/span&gt;:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;cgroup2.devices.allow: a&lt;/code&gt; — allows the container access to all devices via the cgroup device controller&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;cap.drop:&lt;/code&gt; (empty) — prevents Proxmox from dropping any Linux capabilities. By default, Proxmox drops capabilities like &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt;, &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt;, and &lt;code&gt;CAP_SYS_PTRACE&lt;/code&gt; from LXC containers. k3s needs these.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
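&lt;p&gt;You can confirm what a process is actually allowed to do by reading its capability masks from &lt;code&gt;/proc&lt;/code&gt; (a sketch; the hex masks can be decoded into capability names with &lt;code&gt;capsh --decode&lt;/code&gt; from the libcap tools):&lt;/p&gt;

```shell
# CapEff = capabilities the process currently holds,
# CapBnd = the bounding set it can never exceed.
# Capabilities dropped by the container runtime show up as cleared bits.
grep -E '^Cap(Eff|Bnd)' /proc/self/status
```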

&lt;h3&gt;
  
  
  Also needed: &lt;code&gt;features: keyctl=1,nesting=1&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;keyctl=1&lt;/code&gt; — enables the Linux kernel keyring inside the container. containerd uses this to securely store credentials and keys for image pulls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;nesting=1&lt;/code&gt; — enables nested containerisation. k3s runs containerd inside the LXC container, and containerd runs pods (more containers) inside itself. Without nesting enabled, Proxmox blocks the inner container creation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;u&gt;Final Working LXC Config&lt;/u&gt;
&lt;/h3&gt;
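&lt;p&gt;Putting the pieces together, the additions to &lt;code&gt;/etc/pve/lxc/&amp;lt;CTID&amp;gt;.conf&lt;/code&gt; discussed above look like this (a consolidated sketch; the container ID and your base options will differ):&lt;/p&gt;

```conf
# /etc/pve/lxc/&lt;CTID&gt;.conf — the additions described in this post
unprivileged: 0
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: "proc:rw sys:rw"
```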

&lt;p&gt;After applying all these changes and doing a full &lt;code&gt;pct stop&lt;/code&gt; / &lt;code&gt;pct start&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; k3s-agent &lt;span class="nt"&gt;-f&lt;/span&gt;

&lt;span class="c"&gt;# ... containerd is now running&lt;/span&gt;

&lt;span class="c"&gt;# ... Server ACTIVE&lt;/span&gt;

&lt;span class="c"&gt;# ... Started kubelet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary: What Each Modification Does
&lt;/h2&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjchnlx7oapagd55avh30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjchnlx7oapagd55avh30.png" alt="Summary table: what each LXC configuration change does" width="800" height="1022"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: LXC as a k3s Worker — Features and Limitations
&lt;/h2&gt;




&lt;h3&gt;
  
  
  &lt;u&gt;&lt;em&gt;Features / Advantages&lt;/em&gt;&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Resource efficiency&lt;/strong&gt; — LXC containers consume significantly less memory and CPU than VMs. A VM needs a full OS kernel in memory. An LXC container shares the host kernel, so the overhead is minimal. worker2 running k3s uses around 250–300MB RAM idle versus a VM which would use 500MB+ for the OS alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fast startup&lt;/strong&gt; — LXC containers start in 1–3 seconds versus 15–30 seconds for a VM. For ephemeral worker nodes or autoscaling scenarios this matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage efficiency&lt;/strong&gt; — LXC uses the host filesystem directly (with a root filesystem overlay). No separate virtual disk emulation layer. I/O is closer to bare metal performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple networking&lt;/strong&gt; — LXC containers participate in the same Proxmox bridge (&lt;code&gt;vmbr0&lt;/code&gt;) as VMs. No extra networking configuration is needed for k3s to communicate between the master VM and the LXC worker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Density&lt;/strong&gt; — you can run more LXC containers on the same Proxmox host than VMs, making it ideal for testing multi-node cluster topologies on limited hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;u&gt;&lt;em&gt;Limitations&lt;/em&gt;&lt;/u&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Shared kernel — no kernel version isolation&lt;/strong&gt; — all LXC containers on a host run the same kernel version as the host. You cannot run a different kernel inside an LXC container. This matters if you need a specific kernel feature or version for your workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privileged mode is a security trade-off&lt;/strong&gt; — to get k3s working we had to switch to a privileged container and disable AppArmor. In a privileged container, a root escape inside the container gives root on the host. For a homelab or trusted environment this is acceptable; for production or multi-tenant setups it is a significant risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No hardware virtualisation&lt;/strong&gt; — LXC containers cannot run nested VMs. If your workloads need hardware-level isolation or GPU passthrough in the container, a VM is required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kernel module limitations&lt;/strong&gt; — the LXC container cannot load kernel modules that aren't already loaded on the host. During setup we saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
modprobe: FATAL: Module br_netfilter not found

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These modules need to be loaded on the Proxmox host, not inside the container.&lt;/p&gt;
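&lt;p&gt;On the Proxmox host, the fix is to load the module there and make it persistent across reboots: run &lt;code&gt;modprobe br_netfilter&lt;/code&gt; once, then drop the module name into &lt;code&gt;modules-load.d&lt;/code&gt; (a sketch; the file name &lt;code&gt;k3s.conf&lt;/code&gt; is just a convention):&lt;/p&gt;

```conf
# /etc/modules-load.d/k3s.conf on the Proxmox host
# (modules listed here are loaded automatically at boot)
br_netfilter
```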

&lt;p&gt;&lt;strong&gt;Some syscalls are blocked&lt;/strong&gt; — even in privileged mode, certain syscalls that could affect the host are restricted. This can cause subtle compatibility issues with some container workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not suitable for untrusted workloads&lt;/strong&gt; — because the kernel is shared, a kernel exploit inside an LXC container could theoretically affect the host and all other containers. Never run untrusted code in a privileged LXC container.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;




&lt;p&gt;Getting k3s running on a Proxmox LXC container is absolutely possible, but it requires understanding why each restriction exists and selectively removing the ones that conflict with k3s's requirements. The journey from a blank LXC to a working cluster node touched on AppArmor, Linux capabilities, cgroups, kernel device access, namespace nesting, and virtual filesystem permissions.&lt;/p&gt;

&lt;p&gt;The key takeaway: LXC containers are not VMs. They share the host kernel, and every security restriction that makes them safe is also a potential blocker for complex software like k3s that expects a full OS environment. The solution is not to blindly disable everything — it is to understand each error, trace it to the underlying Linux feature, and make the minimal change required to unblock it.&lt;/p&gt;

&lt;p&gt;The final cluster — one control plane VM and two workers (one VM, one LXC) — runs stably with k3s managing scheduling, networking, and DNS across all three nodes via CoreDNS.&lt;/p&gt;

&lt;p&gt;I now have a vanilla multi-node Kubernetes cluster running across an Ubuntu VM and an LXC container, accessible from my machine. It’s got nothing deployed inside it yet, but that’s easily fixed... see you in Part 2.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built on Proxmox VE with k3s v1.34.6+k3s1 — Debian Trixie LXC — Ubuntu VM nodes&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>linux</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>"Why can’t I just mount S3 like a drive?” AWS finally answering that question in 2026</title>
      <dc:creator>Pendela BhargavaSai</dc:creator>
      <pubDate>Sun, 12 Apr 2026 13:35:35 +0000</pubDate>
      <link>https://dev.to/pendelabhargavasai/why-cant-i-just-mount-s3-like-a-drive-aws-finally-answering-that-question-in-2026-4g00</link>
      <guid>https://dev.to/pendelabhargavasai/why-cant-i-just-mount-s3-like-a-drive-aws-finally-answering-that-question-in-2026-4g00</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;From "why can't I just mount S3 like a drive?" to AWS finally answering that question in 2026.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;I've had that conversation more times than I can count.&lt;/p&gt;

&lt;p&gt;A developer joins a new AWS project, looks at the architecture, and asks: &lt;em&gt;"We're already storing everything in S3 — why do we also need EFS? Can't we just mount S3 directly?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And every time, the answer was the same patient explanation about object storage vs file systems, why they're fundamentally different, and why you need separate services for separate workloads. It was the right answer. It just wasn't a satisfying one.&lt;/p&gt;

&lt;p&gt;That changed in April 2026 when AWS launched &lt;strong&gt;S3 Files&lt;/strong&gt; — and suddenly that conversation got a lot shorter.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne1ezqqr8ls1axsuyqwh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne1ezqqr8ls1axsuyqwh.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But before we get there, let's start from the beginning. Because understanding &lt;em&gt;why&lt;/em&gt; S3 Files matters requires understanding the problem it's solving. And that means understanding the full AWS storage landscape.&lt;/p&gt;


&lt;h2&gt;
  
  
  The AWS Storage Trinity (Before S3 Files)
&lt;/h2&gt;

&lt;p&gt;AWS has three primary storage services, each built for a completely different purpose. Engineers often get confused because on the surface they all seem to do the same thing: store data. But the &lt;em&gt;way&lt;/em&gt; they store it — and who can access it and how — is completely different.&lt;/p&gt;

&lt;p&gt;Here's the simplest way I know to think about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt; is like a giant library. You can store billions of books (objects), and anyone with the right access can retrieve any book. But to fix a typo on page 47, you have to reprint the entire book.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EBS&lt;/strong&gt; is like a hard drive physically attached to your computer. Super fast, but only your computer can use it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EFS&lt;/strong&gt; is like a shared office filing cabinet on a network. Anyone in the office can open a drawer, pull out a folder, and edit a document — at the same time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's go deeper on each one.&lt;/p&gt;


&lt;h2&gt;
  
  
  Amazon S3 — Object Storage Built for Scale
&lt;/h2&gt;

&lt;p&gt;S3 (Simple Storage Service) launched in 2006 and fundamentally changed how the world thinks about storing data. The core idea is simple: you have &lt;strong&gt;buckets&lt;/strong&gt;, and inside buckets you store &lt;strong&gt;objects&lt;/strong&gt;. Each object is just a file plus its metadata, stored at a unique key (think of it like a URL).&lt;/p&gt;
&lt;h3&gt;
  
  
  What makes S3 special
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Virtually unlimited scale.&lt;/strong&gt; S3 stores more than 500 trillion objects across hundreds of exabytes today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11 nines of durability (99.999999999%).&lt;/strong&gt; AWS automatically replicates your data across at least three Availability Zones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pay only for what you use.&lt;/strong&gt; No minimum capacity, no infrastructure to manage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple storage classes.&lt;/strong&gt; From S3 Standard (~$0.023/GB) down to Glacier Deep Archive (~$0.00099/GB) for data you almost never touch.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The one thing S3 cannot do
&lt;/h3&gt;

&lt;p&gt;Here's the catch that trips everyone up: &lt;strong&gt;S3 is not a file system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you store something in S3, it becomes an immutable object. If you want to change even a single character in a file, you have to download the entire object, make your change, and re-upload the whole thing as a new object. There's no such thing as "open this file and edit line 47." That's just not how object storage works.&lt;/p&gt;

&lt;p&gt;This isn't a bug — it's by design. The immutability of objects is part of what makes S3 so durable and scalable. But it creates real friction for any workload that needs to &lt;em&gt;work with&lt;/em&gt; data the way normal applications do: open a file, read some bytes, write some bytes, save.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What you can do with S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;myfile.txt s3://my-bucket/myfile.txt    &lt;span class="c"&gt;# upload&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;s3://my-bucket/myfile.txt ./myfile.txt  &lt;span class="c"&gt;# download&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://my-bucket/myfile.txt               &lt;span class="c"&gt;# delete&lt;/span&gt;

&lt;span class="c"&gt;# What you CANNOT do&lt;/span&gt;
&lt;span class="c"&gt;# Open myfile.txt and append a line — impossible without full re-upload&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihtdjmavdki0i9x7xit0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihtdjmavdki0i9x7xit0.jpg" alt=" " width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Amazon EBS — The Fast Attached Drive
&lt;/h2&gt;

&lt;p&gt;EBS (Elastic Block Store) is block storage — the AWS equivalent of an SSD attached directly to your server. When you launch an EC2 instance, the root volume (where the operating system lives) is an EBS volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  What EBS is good at
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed.&lt;/strong&gt; EBS delivers single-digit millisecond latency because it behaves like a local disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;POSIX semantics.&lt;/strong&gt; You can open files, write individual bytes, seek to specific positions — everything a normal file system supports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency.&lt;/strong&gt; What you write is immediately readable. No eventual consistency concerns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The hard limit of EBS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;EBS volumes can only be attached to one EC2 instance at a time&lt;/strong&gt; (with some multi-attach exceptions for specific use cases). &lt;/p&gt;

&lt;p&gt;This means if you have a cluster of 10 EC2 instances all running your application, each one needs its own EBS volume. They can't share data through EBS. If instance A writes a file, instance B can't see it without some kind of sync mechanism.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EC2 Instance A  →  EBS Volume A  (can't share)
EC2 Instance B  →  EBS Volume B  (separate, isolated)
EC2 Instance C  →  EBS Volume C  (separate, isolated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For single-instance workloads — databases, operating system volumes, single-server applications — EBS is excellent. The moment you need shared storage across multiple servers, you hit a wall.&lt;/p&gt;




&lt;h2&gt;
  
  
  Amazon EFS — The Shared Network Drive
&lt;/h2&gt;

&lt;p&gt;EFS (Elastic File System) is AWS's managed Network File System (NFS). Think of it as a shared drive that any number of EC2 instances, containers, or Lambda functions can mount simultaneously and use like a local file system.&lt;/p&gt;

&lt;h3&gt;
  
  
  What EFS solves
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent access.&lt;/strong&gt; Thousands of compute resources can mount and use the same EFS volume at the same time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full POSIX semantics.&lt;/strong&gt; Open files, edit bytes in-place, file locking, directory operations — everything works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales automatically.&lt;/strong&gt; The file system grows and shrinks as you add or remove files. No capacity planning required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond latency&lt;/strong&gt; on Standard tier.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EC2 Instance A  ──┐
EC2 Instance B  ──┤──→  EFS Volume  (all share the same files)
EC2 Instance C  ──┤
Lambda Function ──┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
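&lt;p&gt;Mounting EFS looks like any other network file system. Here is a sketch of an &lt;code&gt;/etc/fstab&lt;/code&gt; entry using the &lt;code&gt;amazon-efs-utils&lt;/code&gt; mount helper (the file system ID is a placeholder, and the &lt;code&gt;tls&lt;/code&gt; option turns on in-transit encryption):&lt;/p&gt;

```conf
# /etc/fstab — mount an EFS file system at boot
# fs-12345678 is a placeholder file system ID
fs-12345678:/  /mnt/efs  efs  _netdev,tls  0  0
```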



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F368aftu96o0epx2bty0g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F368aftu96o0epx2bty0g.jpg" alt=" " width="800" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where EFS falls short
&lt;/h3&gt;

&lt;p&gt;The pricing model. &lt;strong&gt;EFS charges you for every gigabyte stored, whether you touched it this month or not.&lt;/strong&gt; Standard tier is $0.30/GB-month — roughly 13x more expensive than S3 Standard per gigabyte.&lt;/p&gt;

&lt;p&gt;This is fine when your data is "hot" (actively accessed). It's painful when you have petabytes of data where only a fraction is actively used at any time. You end up paying full file system prices for data that's sitting idle.&lt;/p&gt;
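&lt;p&gt;To make the 13x concrete, here is the storage-only arithmetic for 1 TB at the list prices above (a sketch; real bills add throughput and request charges):&lt;/p&gt;

```shell
# Monthly storage cost for 1 TB (1024 GB) at the per-GB list prices above
awk 'BEGIN {
  gb = 1024
  printf "EFS Standard: $%.2f/month\n", gb * 0.30    # $307.20
  printf "S3 Standard:  $%.2f/month\n", gb * 0.023   # $23.55
}'
```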

&lt;p&gt;And the other problem: &lt;strong&gt;EFS has zero native integration with S3.&lt;/strong&gt; They're completely separate systems. Your data lake is in S3. Your compute needs EFS. So you write sync scripts to copy data back and forth — and now you have two copies of everything, two storage bills, and a manual process that breaks at the worst possible times.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old Workflow Pain (The Problem All of This Creates)
&lt;/h2&gt;

&lt;p&gt;Before S3 Files, a typical ML or data engineering team's workflow looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S3 Data Lake
    ↓  (manual copy — takes time, costs money)
EFS Volume
    ↓  (mount on EC2)
EC2 Training Job
    ↓  (output back to EFS)
    ↓  (another manual copy)
S3 Data Lake  ← results stored here for analytics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every arrow in that diagram is a point of failure. Every copy step is a delay, a cost, and a potential for the two copies to drift out of sync. Engineers were spending real engineering hours maintaining these sync pipelines — hours that weren't building anything valuable.&lt;/p&gt;

&lt;p&gt;This is the problem that s3fs tried to solve, years before AWS had an official answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  s3fs-fuse — The Community's Workaround
&lt;/h2&gt;

&lt;p&gt;If you've been working with AWS for a few years, you've probably encountered &lt;code&gt;s3fs-fuse&lt;/code&gt;. It's an open-source FUSE (Filesystem in Userspace) tool that lets you mount an S3 bucket as a local directory on Linux, macOS, or FreeBSD.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;s3fs

&lt;span class="c"&gt;# Configure credentials&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ACCESS_KEY_ID:SECRET_ACCESS_KEY"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.passwd-s3fs
&lt;span class="nb"&gt;chmod &lt;/span&gt;600 ~/.passwd-s3fs

&lt;span class="c"&gt;# Mount your bucket&lt;/span&gt;
s3fs my-bucket /mnt/s3-data &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;passwd_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/.passwd-s3fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, you can run &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cp&lt;/code&gt;, &lt;code&gt;cat&lt;/code&gt; — your S3 bucket looks like a local folder. For a quick demo or a simple use case, it feels magical.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's actually happening under the hood
&lt;/h3&gt;

&lt;p&gt;Here's the thing nobody tells you upfront: s3fs isn't &lt;em&gt;really&lt;/em&gt; giving you file system access to S3. It's translating file commands into S3 API calls — and the translation has serious limitations.&lt;/p&gt;

&lt;p&gt;When you "edit" a file through s3fs, this is what actually happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: nano myfile.txt  (make a small change, save)
     ↓
s3fs: GET entire object from S3 → download to local temp cache
s3fs: You edit the local temp copy
s3fs: On file close → PUT entire object back to S3 (full re-upload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change one character in a 10GB file? s3fs downloads all 10GB, makes the change, and uploads all 10GB again. Every time.&lt;/p&gt;
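&lt;p&gt;That write amplification is easy to put a number on: a one-byte edit moves the whole object twice, once down and once up (a sketch):&lt;/p&gt;

```shell
# Bytes moved by s3fs for a 1-byte edit to a 10 GiB object:
# one full GET (download) plus one full PUT (re-upload)
awk 'BEGIN {
  size = 10 * 1024 * 1024 * 1024      # 10 GiB in bytes
  printf "bytes changed:     1\n"
  printf "bytes transferred: %.0f\n", 2 * size
}'
# → bytes transferred: 21474836480 (20 GiB of traffic for one character)
```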

&lt;h3&gt;
  
  
  The real limitations you need to know
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No file locking.&lt;/strong&gt; If two processes try to write to the same file through s3fs at the same time, you get data corruption. Not an error message — silent data corruption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No atomic renames.&lt;/strong&gt; Renaming a file in s3fs copies it to a new key and deletes the old one. Any application that relies on atomic renames (which includes most databases and many log processors) will break.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow directory listings.&lt;/strong&gt; Every &lt;code&gt;ls&lt;/code&gt; is a &lt;code&gt;ListObjects&lt;/code&gt; API call to S3. On a bucket with millions of objects, this is painfully slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No hard links or symbolic links.&lt;/strong&gt; S3 simply doesn't support them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Operation          | What s3fs does              | Problem
-------------------|-----------------------------|-----------------------
Read file          | GET entire object           | Slow for large files
Edit file          | Download → edit → full PUT  | Expensive re-upload
Append to file     | Rewrite entire object       | Very expensive
Rename file        | Copy + Delete               | Not atomic
File lock          | Not supported               | Data corruption risk
List directory     | ListObjects API call        | Slow on large buckets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;s3fs works well for lightweight, read-heavy, single-process use cases. But the moment you need multi-process access, in-place edits, or production reliability — it starts breaking down. The community built it because AWS didn't have a better answer. Eventually, AWS tried building their own version.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mountpoint for S3 — AWS's Open-Source Attempt (2023)
&lt;/h2&gt;

&lt;p&gt;In 2023, AWS released &lt;strong&gt;Mountpoint for S3&lt;/strong&gt;, their own open-source FUSE client. It was faster than s3fs-fuse and better optimised for cloud-native read-heavy workloads.&lt;/p&gt;

&lt;p&gt;But it still couldn't do in-place edits, directory renames, or file locking. It was better than s3fs-fuse, but it still hit the same fundamental ceiling: you can't make S3's API behave like a real file system by pretending.&lt;/p&gt;

&lt;p&gt;AWS knew this. Internally, they'd been trying to solve it properly for years.&lt;/p&gt;




&lt;h2&gt;
  
  
  Amazon S3 Files — The Real Solution (April 2026)
&lt;/h2&gt;

&lt;p&gt;On April 7, 2026, AWS launched &lt;strong&gt;S3 Files&lt;/strong&gt; — and it's the most significant S3 update since the service launched.&lt;/p&gt;

&lt;p&gt;The internal project was even called "EFS3" at one point. One engineer on the team described the design process as &lt;em&gt;"a battle of unpalatable compromises."&lt;/em&gt; Getting object storage and file system semantics to truly coexist is genuinely hard engineering. Every design decision forced a tradeoff where either the file presentation or the object presentation had to give something up.&lt;/p&gt;

&lt;p&gt;What they landed on is clever: instead of trying to make the S3 API &lt;em&gt;behave&lt;/em&gt; like a file system (which is what s3fs does), they did the opposite — they took a real, production-grade file system (EFS) and connected it directly to S3 storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  How S3 Files actually works
&lt;/h3&gt;

&lt;p&gt;S3 Files uses a &lt;strong&gt;two-tier architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1 — EFS Cache Layer (hot data)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores your active working set: recently written files, recently read files, metadata&lt;/li&gt;
&lt;li&gt;Delivers ~1ms latency&lt;/li&gt;
&lt;li&gt;Serves small files (under 128KB by default) entirely from cache&lt;/li&gt;
&lt;li&gt;Handles all NFS file operations — open, read, write, rename, lock&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tier 2 — S3 Bucket (your full dataset)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Holds your complete data at normal S3 prices (~$0.023/GB)&lt;/li&gt;
&lt;li&gt;Large reads (1MB+) bypass the cache entirely and stream directly from S3 for free&lt;/li&gt;
&lt;li&gt;Changes made through the file system sync back to S3 automatically within minutes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Application
      ↓  (NFS mount — standard Linux file operations)
EFS Cache Layer  ←→  Smart Router
      ↓                    ↓
   Hot data            Cold/large data
   (~1ms)              (streams from S3, free)
      ↓                    ↓
      └────────────────────┘
                  ↓
            S3 Bucket
       (your data, always here)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;your data never leaves S3.&lt;/strong&gt; The EFS cache is just a smart caching layer on top. You're not maintaining two copies — you have one copy in S3, accessible via both the S3 API and the file system mount simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk0a728p2arun7vntopk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk0a728p2arun7vntopk.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Old Way vs New Way
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F662i5jvty3f4qfmx1swi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F662i5jvty3f4qfmx1swi.png" alt=" " width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting started in 3 steps
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create an S3 file system&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the AWS Console → S3 → File Systems → Create file system. Enter your bucket name, done.&lt;/p&gt;

&lt;p&gt;Or via CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api create-file-system &lt;span class="nt"&gt;--bucket&lt;/span&gt; my-bucket
aws s3api create-mount-target &lt;span class="nt"&gt;--file-system-id&lt;/span&gt; fs-xxxx &lt;span class="nt"&gt;--subnet-id&lt;/span&gt; subnet-xxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Mount it on your EC2 instance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Make sure the &lt;code&gt;amazon-efs-utils&lt;/code&gt; package is installed (preinstalled on AWS AMIs), then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; /mnt/s3files
&lt;span class="nb"&gt;sudo &lt;/span&gt;mount &lt;span class="nt"&gt;-t&lt;/span&gt; s3files fs-0aa860d05df9afdfe:/ /mnt/s3files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Use it like any local directory&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a file&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Hello S3 Files"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /mnt/s3files/hello.txt

&lt;span class="c"&gt;# Edit it in place&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"New line added"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /mnt/s3files/hello.txt

&lt;span class="c"&gt;# List files&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /mnt/s3files/

&lt;span class="c"&gt;# The same data is accessible via S3 API too&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://my-bucket/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changes you make through the file system mount appear in S3 within minutes. Changes made directly to the S3 bucket appear in the file system within seconds.&lt;/p&gt;
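The practical consequence: don't assume a write through the mount is instantly visible through the S3 API. The usual pattern is a poll with a timeout. The sketch below uses a local stub probe so it runs anywhere; in real use the probe would be something like `aws s3api head-object --bucket my-bucket --key hello.txt`.

```shell
# Generic poll-with-timeout helper (bash). In practice the probe command
# would be an AWS CLI check such as `aws s3api head-object ...`.
wait_until() {   # usage: wait_until TIMEOUT_SECONDS probe-cmd [args...]
  local deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # gave up: the change never became visible
    fi
    sleep 1
  done
}

# Stand-in probe so the sketch is runnable without AWS: it "succeeds"
# on the third poll, mimicking a change that takes a moment to propagate.
polls=0
probe() {
  polls=$((polls + 1))
  [ "$polls" -ge 3 ]
}

wait_until 10 probe
echo "visible after $polls polls"
```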

&lt;h3&gt;
  
  
  Security — what you need to know
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IAM integration for access control at both file system and object level&lt;/li&gt;
&lt;li&gt;Data encrypted in transit using TLS 1.3&lt;/li&gt;
&lt;li&gt;Data encrypted at rest using SSE-S3 (or KMS if you prefer customer-managed keys)&lt;/li&gt;
&lt;li&gt;POSIX permissions (UID/GID) stored as S3 object metadata&lt;/li&gt;
&lt;li&gt;Monitor via CloudWatch metrics and CloudTrail logs&lt;/li&gt;
&lt;/ul&gt;
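To make the UID/GID point concrete, here's a rough sketch of what that mapping could look like. The metadata key names (`file-uid`, etc.) are illustrative guesses, not the documented schema, and `stat -c` assumes GNU coreutils.

```shell
# Illustrative only: show how a mounted file's POSIX ownership and mode
# could be represented as S3 object metadata. Key names are hypothetical.
f=$(mktemp)
chmod 640 "$f"

# stat -c is GNU coreutils; %u/%g/%a are numeric uid, gid, and octal mode
set -- $(stat -c '%u %g %a' "$f")
uid=$1 gid=$2 mode=$3

echo "x-amz-meta-file-uid: $uid"
echo "x-amz-meta-file-gid: $gid"
echo "x-amz-meta-file-mode: $mode"

rm -f "$f"
```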

&lt;h3&gt;
  
  
  Pricing — the part that actually makes sense
&lt;/h3&gt;

&lt;p&gt;S3 Files charges EFS-level rates, but &lt;strong&gt;only on the fraction of data you're actively working with&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What you pay for&lt;/th&gt;
&lt;th&gt;Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-performance storage (hot data)&lt;/td&gt;
&lt;td&gt;$0.30/GB-month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reads (small files served from cache)&lt;/td&gt;
&lt;td&gt;$0.03/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writes&lt;/td&gt;
&lt;td&gt;$0.06/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everything else in your S3 bucket&lt;/td&gt;
&lt;td&gt;Standard S3 rates (~$0.023/GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you have a 100TB dataset but only 1TB is actively used at any time — you pay EFS rates on 1TB and S3 rates on the other 99TB. AWS claims up to 90% cost savings compared to the old pattern of cycling data between S3 and a dedicated EFS volume.&lt;/p&gt;
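Running the table's rates through that example as a back-of-envelope sketch (the "duplicate everything onto EFS" baseline is a simplifying assumption, and request charges are ignored):

```shell
# Rough monthly cost for a 100 TB dataset with a 1 TB working set,
# using the per-GB rates from the table above. Ignores request charges.
HOT_GB=1024                       # 1 TB at the high-performance tier
TOTAL_GB=102400                   # 100 TB total in the bucket
COLD_GB=$((TOTAL_GB - HOT_GB))    # the rest stays at standard S3 rates

# S3 Files: EFS-level rate on the hot fraction only
s3files=$(awk -v h="$HOT_GB" -v c="$COLD_GB" \
  'BEGIN { printf "%.2f", h*0.30 + c*0.023 }')

# Old pattern (worst case): full dataset in S3 plus a duplicate on EFS
old=$(awk -v t="$TOTAL_GB" 'BEGIN { printf "%.2f", t*0.023 + t*0.30 }')

savings=$(awk -v n="$s3files" -v o="$old" \
  'BEGIN { printf "%.0f", (1 - n/o) * 100 }')

echo "S3 Files: \$$s3files/mo   old pattern: \$$old/mo   savings: $savings%"
```

That lands in the same ballpark as the "up to 90%" claim when the baseline duplicates the whole dataset onto EFS; with a smaller EFS working copy the gap narrows.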




&lt;h2&gt;
  
  
  Putting It All Together — Which Service Should You Use?
&lt;/h2&gt;

&lt;p&gt;Here's the honest answer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use this&lt;/th&gt;
&lt;th&gt;When you need&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bulk storage, backups, data lakes, analytics, static assets, anything accessed via API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OS volumes, databases, single-instance high-performance storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared file system for legacy NAS migration, on-premises workloads moving to cloud, apps that need pure NFS without S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3 Files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML pipelines, agentic AI workflows, data engineering, any workload where both S3 API and file system access are needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;s3fs-fuse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quick prototypes, read-heavy single-process scripts, legacy apps where you can't change the architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The quick comparison
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy897f8eanwniy70iw975.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy897f8eanwniy70iw975.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters for ML and AI Workloads
&lt;/h2&gt;

&lt;p&gt;If you're building machine learning pipelines or agentic AI systems, S3 Files is worth paying close attention to.&lt;/p&gt;

&lt;p&gt;The old workflow was: data lives in S3 → copy to EFS before training → run training job → copy results back to S3. For large datasets, that copy step alone could take hours. You were also paying double storage costs during the transition.&lt;/p&gt;

&lt;p&gt;With S3 Files, your training job mounts the S3 bucket directly. The EFS cache warms up as your training reads data. No copy step. No sync script. No duplicate storage.&lt;/p&gt;

&lt;p&gt;For agentic AI systems specifically — where multiple agents need to coordinate through shared files, read from each other's outputs, maintain shared state — S3 Files provides exactly the concurrent NFS access with close-to-open consistency that these workloads need. Standard Python file operations, standard shell tools, all working against data that lives in S3.&lt;/p&gt;
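As a sketch of that coordination pattern, here are two "agents" handing off work through shared files. A local temp directory stands in for an S3 Files mount such as /mnt/s3files so the sketch runs anywhere.

```shell
# Two processes coordinating through a shared directory. A local temp dir
# stands in for an S3 Files mount (e.g. /mnt/s3files) so this runs anywhere.
SHARED=$(mktemp -d)

# Agent A: produce an artifact. With close-to-open consistency, the whole
# file becomes visible to other clients once A closes it.
echo "step-1 complete" > "$SHARED/agent-a.out"

# Agent B: wait until A's output exists and is non-empty, then build on it
until [ -s "$SHARED/agent-a.out" ]; do
  sleep 0.1
done
{
  cat "$SHARED/agent-a.out"
  echo "step-2 complete"
} > "$SHARED/agent-b.out"

cat "$SHARED/agent-b.out"   # prints both steps, in order
```

Everything here is standard shell and standard file operations; nothing about the pattern changes when `$SHARED` is a real mount point.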




&lt;h2&gt;
  
  
  The Short Version
&lt;/h2&gt;

&lt;p&gt;For a decade, AWS storage was a choice: pay S3 prices and lose file system semantics, or pay EFS prices and lose S3 integration. Teams wrote sync scripts, maintained duplicate data, and spent engineering time on storage plumbing instead of actual product work.&lt;/p&gt;

&lt;p&gt;s3fs-fuse was the community's best attempt at a workaround — and it worked, up to a point. But it was always emulating file system behavior on top of an API that wasn't designed for it.&lt;/p&gt;

&lt;p&gt;S3 Files is the first time AWS has genuinely solved this at the right layer. Real NFS semantics, real S3 storage, real production reliability. One bucket, two protocols, no compromises.&lt;/p&gt;

&lt;p&gt;If you've ever maintained a sync script between your data lake and your compute layer — you know exactly what problem this solves. And you know exactly how good it feels to delete that script.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/s3/features/files/" rel="noopener noreferrer"&gt;Amazon S3 Files product page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" rel="noopener noreferrer"&gt;AWS Blog: Launching S3 Files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer"&gt;S3 Files documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/s3fs-fuse/s3fs-fuse" rel="noopener noreferrer"&gt;s3fs-fuse on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/s3/pricing/" rel="noopener noreferrer"&gt;Amazon S3 pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/efs/pricing/" rel="noopener noreferrer"&gt;Amazon EFS pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=zb8TdNJhZCk" rel="noopener noreferrer"&gt;Intro to S3 Files by Darko Mesaros&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Published April 2026. All pricing figures reflect us-east-1 as of the time of writing.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this helped you, drop a reaction or leave a comment — curious what storage patterns others are running into in the wild.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>machinelearning</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
