DEV Community

Pendela BhargavaSai

K3s vs Kubernetes: A Deep Dive into Control Plane Architecture

Not just "what's different" — but WHY it's different, HOW each component works under the hood, and WHEN to choose which.


🧠 Why This Post Exists

Every "K3s vs K8s" article you've read probably gave you a table with checkmarks and said "K3s is lightweight." That's true — but why is it lightweight? What did Rancher actually strip out, merge, or replace? What are the architectural trade-offs you inherit when you deploy K3s in production?

This post tears open both control planes component by component. We'll go deep into what each piece actually does at the byte level, then see how K3s reimagines it.


🏗️ The Kubernetes Control Plane: A Ground-Up Look

Before comparing, let's build a mental model of each standard Kubernetes control plane component. Not the 30-second version — the real one.


1. 🔵 kube-apiserver — The Brain's Frontal Lobe

What It Actually Does

The API server is not just a REST endpoint. It is the only component in Kubernetes that talks directly to etcd. Every other component — scheduler, controller-manager, kubelet — communicates exclusively through the API server. This is a deliberate architectural decision called the hub-and-spoke pattern.

When you run kubectl apply -f deployment.yaml, here's what actually happens:

kubectl → HTTPS → kube-apiserver
                    │
                    ├── 1. Authentication (Who are you?)
                    │       └── x509 certs / Bearer tokens / OIDC / Webhook
                    │
                    ├── 2. Authorization (Can you do this?)
                    │       └── RBAC / ABAC / Node / Webhook evaluators
                    │
                    ├── 3. Mutating Admission (Should this be changed?)
                    │       └── Mutating Webhooks   ← can MODIFY the object
                    │
                    ├── 4. Schema Validation
                    │       └── OpenAPI v3 schema enforcement per GVK
                    │
                    ├── 5. Validating Admission (Should this be allowed?)
                    │       └── Validating Webhooks ← can REJECT the object
                    │
                    └── 6. Persist to etcd
                            └── /registry/deployments/default/my-app
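To make the request flow concrete, here is a minimal sketch of the pipeline as a chain of stages. Everything in it is hypothetical (the `Request` type, the tiny RBAC table, the stage functions); it is not the API server's code. It assumes the upstream ordering in which mutating admission runs first, schema validation second, and validating admission last before the object is persisted.

```python
from dataclasses import dataclass, field


class Rejected(Exception):
    """Raised when a stage refuses the request."""


@dataclass
class Request:
    user: str
    verb: str
    obj: dict
    trail: list = field(default_factory=list)  # names of stages passed, in order


def authenticate(req):
    if not req.user:
        raise Rejected("401 Unauthorized")


def authorize(req):
    # Hypothetical RBAC table: which verbs each user may use.
    allowed = {"alice": {"create", "get"}, "bob": {"get"}}
    if req.verb not in allowed.get(req.user, set()):
        raise Rejected("403 Forbidden")


def mutate(req):
    # A mutating admission step may modify the object, e.g. inject a label.
    req.obj.setdefault("labels", {})["injected"] = "true"


def validate_schema(req):
    if "name" not in req.obj:
        raise Rejected("422 Invalid: name is required")


def validate_policy(req):
    # A validating admission step can only accept or reject.
    if req.obj.get("privileged"):
        raise Rejected("403 denied by admission policy")


STAGES = [authenticate, authorize, mutate, validate_schema, validate_policy]


def handle(req, store):
    """Run every stage in order; persist only if none of them rejects."""
    for stage in STAGES:
        stage(req)
        req.trail.append(stage.__name__)
    store[f"/registry/deployments/default/{req.obj['name']}"] = req.obj
    return req
```

Calling `handle(Request("alice", "create", {"name": "my-app"}), {})` walks all five stages and writes one key; the same call as `bob` stops at authorization and nothing reaches the store.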

The Watch Mechanism — The Heartbeat of Kubernetes

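The API server implements a streaming watch mechanism: the client opens one long-lived HTTP request (a chunked response over HTTP/1.1, or a stream over HTTP/2) and the server pushes change events down it as they happen. This is what makes Kubernetes reactive rather than polling-based.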

# You can see this yourself
kubectl get pods --watch -v=9
# Watch the raw HTTP stream — it's a chunked HTTP response that stays open

Every controller, scheduler, and kubelet maintains a persistent informer — a cached watch stream from the API server. The informer pattern:

  1. Does an initial LIST to populate local cache
  2. Starts a WATCH from the resource version of that LIST
  3. On disconnect, re-watches from the last known resourceVersion
  4. The API server buffers events in a watchCache in memory (configurable with --watch-cache-sizes)
                    ┌─────────────────────────┐
                    │      kube-apiserver     │
                    │                         │
                    │  ┌─────────────────┐    │
                    │  │   etcd watch    │    │
                    │  └────────┬────────┘    │
                    │           │             │
                    │  ┌────────▼────────┐    │
                    │  │   watchCache    │    │  ← In-memory ring buffer
                    │  └────────┬────────┘    │
                    │           │             │
                    └───────────┼─────────────┘
                                │
              ┌─────────────────┼──────────────────┐
              │                 │                  │
         ┌────▼────┐      ┌─────▼─────┐     ┌─────▼─────┐
         │Scheduler│      │Controller │     │  kubelet  │
         │Informer │      │  Informer │     │  Informer │
         └─────────┘      └───────────┘     └───────────┘
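The LIST-then-WATCH steps above can be sketched with a toy server. `FakeAPIServer` and `Informer` are illustrative names, not client-go types, and the "watch" here is a plain function call rather than an open HTTP stream; the point is only how the resourceVersion lets a client resume without missing events.

```python
class FakeAPIServer:
    """Toy stand-in for the API server: current state plus an event log."""

    def __init__(self):
        self.revision = 0
        self.objects = {}   # name -> revision of last change
        self.events = []    # (revision, event_type, name)

    def apply(self, event_type, name):
        self.revision += 1
        if event_type == "DELETED":
            self.objects.pop(name, None)
        else:
            self.objects[name] = self.revision
        self.events.append((self.revision, event_type, name))

    def list(self):
        # Returns a snapshot and the revision it was taken at.
        return dict(self.objects), self.revision

    def watch(self, since):
        # Returns every event newer than `since`.
        return [e for e in self.events if e[0] > since]


class Informer:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.resource_version = 0

    def sync(self):
        # Step 1: the initial LIST populates the local cache.
        self.cache, self.resource_version = self.server.list()

    def poll_watch(self):
        # Steps 2-3: consume events after the last seen resourceVersion.
        for rev, event_type, name in self.server.watch(self.resource_version):
            if event_type == "DELETED":
                self.cache.pop(name, None)
            else:
                self.cache[name] = rev
            self.resource_version = rev
```

After `sync()`, any number of calls to `poll_watch()` keep the cache equal to the server's state, even if the informer was "disconnected" for several events in between.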

Aggregation Layer & CRDs

The API server can extend itself via two mechanisms:

  • CRDs (Custom Resource Definitions): Schema is stored in etcd, handled natively by the API server itself
  • Aggregation Layer (AA): Proxy traffic to an external API server (used by metrics-server, KEDA, etc.)
# CRD: the API server owns the storage (spec omitted for brevity)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
---
# AA: the API server proxies to an external server (other required fields omitted)
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server
    namespace: kube-system

Production Tuning Knobs

# --max-requests-inflight:          max concurrent non-mutating requests
# --max-mutating-requests-inflight: max concurrent mutating requests
# --watch-cache-sizes:              per-resource watch cache sizes
kube-apiserver \
  --max-requests-inflight=400 \
  --max-mutating-requests-inflight=200 \
  --watch-cache-sizes=pods#1000 \
  --enable-admission-plugins=NodeRestriction,PodSecurity \
  --audit-log-path=/var/log/audit.log \
  --audit-policy-file=/etc/k8s/audit-policy.yaml

2. 🟣 etcd — The Distributed Brain's Memory

What etcd Actually Is

etcd is a distributed key-value store built on the Raft consensus algorithm. It's not a database in the traditional sense — it's a fault-tolerant state machine where every write must be agreed upon by a quorum of nodes before it's committed.

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   etcd-0    │     │   etcd-1    │     │   etcd-2    │
│  (LEADER)   │◄────│  (FOLLOWER) │     │  (FOLLOWER) │
│             │────►│             │     │             │
└──────┬──────┘     └─────────────┘     └──────▲──────┘
       │                                       │
       └───────────────────────────────────────┘
                    Raft Heartbeats

Raft in Plain English

  1. Leader Election: One node becomes leader and sends periodic heartbeats. If a follower hears no heartbeat within its election timeout, it becomes a candidate and starts an election; the candidate that collects votes from a majority becomes the new leader.
  2. Log Replication: Every write goes to the leader. Leader appends it to its log and replicates it to followers. Once a majority acknowledges, the write is committed.
  3. Quorum Math: floor(n/2) + 1 nodes must agree. For 3 nodes: 2. For 5 nodes: 3.
etcd write path:
Client → Leader APPEND entry to log
         Leader SEND AppendEntries RPC to all followers
         Followers ACKNOWLEDGE
         Leader COMMITS when the majority ack
         Leader RESPONDS to client
         Leader NOTIFIES followers of the commit
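The quorum arithmetic is worth running once, because it explains why etcd clusters are sized at 3 or 5 rather than 4. This is a small sketch; the function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A 4-node cluster needs 3 votes and still tolerates only one failure, the same as a 3-node cluster, so the extra node adds replication cost without adding fault tolerance.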

How Kubernetes Data Lives in etcd

All Kubernetes objects are stored under /registry/ with the structure:

/registry/{resource-type}/{namespace}/{name}

Examples:
/registry/pods/default/nginx-7d8b9f-xyz
/registry/deployments/kube-system/coredns
/registry/secrets/default/my-secret
/registry/events/default/pod-scheduled-event

Built-in resources are serialized using protobuf (not JSON!) for efficiency; custom resources are the exception and are stored as JSON. You can inspect the stored value:

# Decode an etcd value
etcdctl get /registry/pods/default/nginx \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  | auger decode  # github.com/jpbetz/auger

MVCC — Multi-Version Concurrency Control

etcd uses MVCC, meaning it keeps multiple historical versions of every key. Each write increments a global revision counter. The API server exposes that revision as the object's resourceVersion and uses it for watch ordering and conflict detection.

# See the revision
etcdctl get /registry/pods/default/nginx -w json | jq .header.revision

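etcd does not keep old revisions forever. The API server asks etcd to compact its history on a timer (every 5 minutes by default, --etcd-compaction-interval), which deletes older revisions. This is why a watch that resumes from a very old resourceVersion can fail with a "compacted" error. Separately, etcd enforces a storage quota (2GB by default, --quota-backend-bytes): once the database exceeds it, etcd raises a NOSPACE alarm and rejects further writes until the data is compacted and defragmented.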
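A toy model helps show how revisions, historical reads and compaction fit together. `MVCCStore` below is a hypothetical in-memory sketch, not etcd's storage engine: every `put` bumps one global revision, reads can target an older revision, and `compact` discards history so that reads below the compaction point fail.

```python
class MVCCStore:
    def __init__(self):
        self.revision = 0    # global counter, bumped on every write
        self.compacted = 0   # reads below this revision are refused
        self.history = {}    # key -> list of (revision, value), oldest first

    def put(self, key, value):
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, at=None):
        at = self.revision if at is None else at
        if at < self.compacted:
            raise LookupError("required revision has been compacted")
        versions = [v for r, v in self.history.get(key, []) if r <= at]
        return versions[-1] if versions else None

    def compact(self, revision):
        # For each key, keep the newest version at or below `revision`
        # (it is still the live value at that point) and drop anything older.
        self.compacted = revision
        for key, versions in self.history.items():
            old = [(r, v) for r, v in versions if r <= revision]
            new = [(r, v) for r, v in versions if r > revision]
            self.history[key] = old[-1:] + new
```

After two writes to the same key, `get(key, at=1)` returns the first value; once `compact(2)` has run, that same read raises, which is the toy equivalent of a watch failing because its starting revision is gone.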

etcd Failure Modes You Must Know

| Scenario | What Happens |
| --- | --- |
| 1 node fails (3-node cluster) | Cluster continues. Writes still work. |
| 2 nodes fail (3-node cluster) | Quorum is lost. CLUSTER STOPS ACCEPTING WRITES and API server requests start failing (typically with 5xx errors). |
| Leader fails | Election happens. Writes stall until a new leader is elected, roughly one election timeout (etcd's default is 1000ms). |
| Network partition | Minority side loses quorum and cannot commit writes. Majority continues. |
| etcd OOM | API server loses state store. Catastrophic. |

⚠️ This is the critical difference with K3s. If you're running K3s with embedded SQLite, you get zero HA for the datastore by default.


3. 🟡 kube-scheduler — The CPU-Time Auctioneer

What It Actually Does

The scheduler watches for Pods with no nodeName assigned (they show up as Pending) and decides which Node they should run on. It does NOT start the pod. It only records its decision by creating a Binding through the API server, which sets spec.nodeName on the Pod in etcd. The kubelet on that node then sees a Pod assigned to it and starts it.

Pod created (nodeName: "")  →  Scheduler sees it via watch
                            →  Runs filtering + scoring
                            →  Writes nodeName to Pod
                            →  kubelet on that node sees the Pod
                            →  kubelet pulls image + starts container

The Scheduling Framework — Two-Phase Deep Dive

Scheduling happens in two phases: Filtering and Scoring.

Phase 1: Filtering (Hard Constraints — binary pass/fail)

All Nodes
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  Filter Plugins (run in parallel, any fail = remove) │
│                                                     │
│  • NodeUnschedulable  — node.spec.unschedulable?    │
│  • NodeAffinity       — matchLabels on node?        │
│  • TaintToleration    — pod tolerates node taints?  │
│  • PodTopologySpread  — spread constraints met?     │
│  • VolumeBinding      — PVC can bind to this node?  │
│  • NodeResourcesFit   — enough CPU/mem/GPU?         │
│  • NodePorts          — hostPort conflicts?         │
└─────────────────────────────────────────────────────┘
    │
    ▼
Feasible Nodes (subset)

Phase 2: Scoring (Soft Preferences — 0-100 score)

Feasible Nodes
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  Score Plugins (weighted sum)                        │
│                                                     │
│  • LeastAllocated       — prefer less loaded nodes  │
│  • NodeAffinity         — preferred affinities      │
│  • InterPodAffinity     — co-locate or spread pods  │
│  • ImageLocality        — prefer nodes with image   │
│  • TaintToleration      — fewer preferred taints    │
│  • PodTopologySpread    — balance spread            │
└─────────────────────────────────────────────────────┘
    │
    ▼
Highest Score Node → Binding (nodeName written)
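The two phases above reduce to "drop every node that fails a filter, then pick the best weighted score". The sketch below shows that shape with two filters and one scorer. The `Node` and `Pod` types and the scoring formula are simplified assumptions for illustration; the real plugins take many more inputs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float     # cores still unallocated
    mem_free: float     # GiB still unallocated
    cpu_total: float
    mem_total: float
    unschedulable: bool = False


@dataclass
class Pod:
    cpu: float
    mem: float


def node_schedulable(pod, node):
    return not node.unschedulable


def resources_fit(pod, node):
    return pod.cpu <= node.cpu_free and pod.mem <= node.mem_free


def least_allocated(pod, node):
    # Score 0-100: average fraction of CPU and memory left free after placement.
    cpu = (node.cpu_free - pod.cpu) / node.cpu_total
    mem = (node.mem_free - pod.mem) / node.mem_total
    return 100 * (cpu + mem) / 2


FILTERS = [node_schedulable, resources_fit]
SCORERS = [(least_allocated, 1)]  # (plugin, weight)


def schedule(pod, nodes):
    """Return the chosen node name, or None if no node passes filtering."""
    feasible = [n for n in nodes if all(f(pod, n) for f in FILTERS)]
    if not feasible:
        return None  # this is where the real scheduler would consider preemption
    best = max(feasible, key=lambda n: sum(w * s(pod, n) for s, w in SCORERS))
    return best.name
```

Filters are hard constraints, so a cordoned node is never chosen no matter how empty it is; scorers only rank the nodes that survive.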

Preemption — What Happens When No Node Passes Filtering

If no node can fit the Pod, the scheduler checks if lower priority pods can be evicted to make room:

  1. Find nodes where evicting lower-priority pods creates enough room
  2. Pick the node that requires evicting the fewest/lowest-priority pods
  3. Send eviction requests → evicted pods are deleted → pending pod is scheduled
# Priority classes matter here
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
---
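# Built-in system classes sit far above user-defined ones:
#   system-cluster-critical = 2000000000, system-node-critical = 2000001000
# Pods in those classes will preempt your workloads if nodes are tight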

The Binding Cache — Optimistic Concurrency

The scheduler maintains an assumed pod cache. After scoring but before the API server confirms the bind, the scheduler optimistically assumes the pod is placed and accounts for that node's capacity. This lets it move on to the next pod without waiting for the bind call to return, and keeps it from handing the same capacity to two pods in high-throughput clusters.


4. 🟢 kube-controller-manager — The Reconciliation Engine

What It Actually Is

The controller manager is a single binary that runs ~30+ independent control loops as goroutines. Each controller watches specific resource types and reconciles desired state vs actual state.

// The reconciliation loop in pseudocode (every controller)
for {
    desired := get_desired_state_from_api_server()
    actual  := get_actual_state_from_world()

    if desired != actual {
        take_action_to_make_actual_match_desired()
    }

    wait_for_next_event_or_resync()  // resync floor: --min-resync-period, default 12h
}

Key Controllers and What They Actually Do

ReplicaSet Controller

Watches: ReplicaSets, Pods
Loop:
  current_pods = list pods with matching selector
  delta = replicaset.spec.replicas - len(current_pods)
  if delta > 0: create `delta` pods
  if delta < 0: delete abs(delta) pods (by priority: unscheduled first)
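The ReplicaSet loop above is small enough to write out as a runnable sketch. The function below is a hypothetical, in-memory version of one reconcile pass: pods are plain dicts and "create" and "delete" are list operations, whereas the real controller issues API calls and ranks deletion candidates on more criteria than just "unscheduled first".

```python
import itertools

_pod_ids = itertools.count(1)  # hypothetical name generator for new pods


def reconcile_replicaset(desired_replicas, pods):
    """Return the pod list after one reconcile pass.

    `pods` is a list of dicts like {"name": "a", "node": "n1"};
    node=None means the pod has not been scheduled yet.
    """
    pods = list(pods)
    delta = desired_replicas - len(pods)
    if delta > 0:
        # Too few pods: create the missing ones (unscheduled at first).
        for _ in range(delta):
            pods.append({"name": f"pod-{next(_pod_ids)}", "node": None})
    elif delta < 0:
        # Too many pods: put unscheduled pods first, then drop from the front.
        # The sort is stable and False sorts before True.
        pods.sort(key=lambda p: p["node"] is not None)
        pods = pods[-delta:]
    return pods
```

Running the function twice with the same inputs changes nothing the second time, which is the property that makes a controller safe to re-run after a crash or a missed event.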

Deployment Controller

Watches: Deployments, ReplicaSets
Loop:
  desired_rs = compute_hash(deployment.spec.template)
  if no RS with that hash: create new RS
  scale up new RS, scale down old RS (by strategy: RollingUpdate or Recreate)
  update deployment.status (readyReplicas, conditions, etc.)

Node Controller — This one is critical to understand

Watches: Nodes
Loop:
  for each node:
    if no heartbeat for node-monitor-grace-period (default 40s):
      set NodeReady=Unknown
      taint node with node.kubernetes.io/unreachable:NoExecute
      (pods that do not tolerate the taint are evicted by the taint manager;
       most pods get a default toleration with tolerationSeconds=300,
       so in practice they are evicted after ~5min)

EndpointSlice Controller — How Services actually work

Watches: Services, Pods
Loop:
  for each service:
    pods = list pods matching service.spec.selector where pod.status.ready=true
    build EndpointSlices (groups of 100 endpoints each)
    write EndpointSlices to API server
    (kube-proxy watches EndpointSlices and updates iptables/ipvs rules)
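The selection-and-chunking step can be sketched in a few lines. This is an illustrative function, not the controller's code: pods are dicts with hypothetical `ip`, `labels` and `ready` fields, and the slice names are simple counters (real EndpointSlices get generated names).

```python
def build_endpoint_slices(service_name, selector, pods, max_per_slice=100):
    """Group the IPs of ready, selector-matching pods into fixed-size slices."""
    ready_ips = [
        p["ip"]
        for p in pods
        # A pod matches when every selector label is present with the same value.
        if p["ready"] and selector.items() <= p["labels"].items()
    ]
    return [
        {
            "name": f"{service_name}-{i // max_per_slice}",
            "endpoints": ready_ips[i:i + max_per_slice],
        }
        for i in range(0, len(ready_ips), max_per_slice)
    ]
```

With 250 ready pods behind a Service this yields slices of 100, 100 and 50 endpoints, so a single pod change only rewrites one slice instead of one huge object.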

Informer + WorkQueue Architecture

Every controller is built on the same pattern:

API Server Watch
      │
      ▼
   Informer
   (local cache)
      │
      ▼  (on change event)
  WorkQueue  ←──── rate-limited, deduplicated
      │
      ▼
  Worker goroutines (usually 1-5)
      │
      ▼
  Reconcile function
      │
      ├── Success → remove from queue
      └── Failure → re-queue with exponential backoff
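Two properties of the work queue do most of the heavy lifting: duplicate keys collapse into one work item, and failures are retried with growing delays. The class below is a minimal single-threaded sketch with made-up names. It computes the backoff delay instead of sleeping, and the 5ms base and 1000s cap are assumptions chosen to resemble client-go's default item rate limiter.

```python
class WorkQueue:
    """Deduplicating FIFO queue with per-key exponential backoff."""

    def __init__(self, base_delay=0.005, max_delay=1000.0):
        self.queue = []        # keys waiting, in FIFO order
        self.pending = set()   # keys currently in the queue
        self.failures = {}     # key -> consecutive failure count
        self.base_delay = base_delay
        self.max_delay = max_delay

    def add(self, key):
        if key not in self.pending:   # dedupe: many events, one work item
            self.pending.add(key)
            self.queue.append(key)

    def get(self):
        key = self.queue.pop(0)
        self.pending.discard(key)
        return key

    def forget(self, key):
        # Called on success: the next failure starts from the base delay again.
        self.failures.pop(key, None)

    def backoff(self, key):
        n = self.failures.get(key, 0)
        self.failures[key] = n + 1
        return min(self.base_delay * 2 ** n, self.max_delay)


def process_next(q, reconcile):
    """Run one item. Returns the retry delay on failure, None on success."""
    key = q.get()
    try:
        reconcile(key)
    except Exception:
        delay = q.backoff(key)
        q.add(key)   # a real queue would re-add the key after `delay` seconds
        return delay
    q.forget(key)
    return None
```

Because the queue holds keys rather than events, the reconcile function always reads the latest state from the informer cache, which is why skipping intermediate events is safe.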

This pattern means controllers are eventually consistent — they don't act on every single event, they converge to the desired state over time.


5. 🔴 cloud-controller-manager — The Cloud API Bridge

What It Actually Does

The CCM was introduced as an alpha feature in Kubernetes 1.6 and promoted to beta in 1.11. It moves cloud-specific logic out of kube-controller-manager specifically to decouple Kubernetes from cloud provider APIs. It runs cloud-specific control loops:

Node Controller (cloud variant)

On new Node joining:
  1. Fetch instance metadata from cloud API
     (AWS EC2 DescribeInstances / GCP ComputeInstances)
  2. Apply cloud provider labels:
     - topology.kubernetes.io/zone = us-east-1a
     - node.kubernetes.io/instance-type = m5.xlarge
  3. Set node addresses (internal/external IP from cloud metadata)
  4. Check if instance still exists periodically
     → If terminated in cloud: delete the Node object

Route Controller (AWS/GCP specific)

For each node:
  ensure cloud routing table has route:
  pod-cidr (e.g., 10.244.1.0/24) → node instance-id

This is how pod-to-pod routing works across nodes
WITHOUT an overlay network on supported clouds

Service Controller — The LoadBalancer Magic

Watch Services with type=LoadBalancer:
  on CREATE: call cloud API → create load balancer
             update service.status.loadBalancer.ingress with external IP
  on UPDATE: update LB listener rules / health checks
  on DELETE: delete cloud load balancer

This is why kubectl get svc shows <pending> for LoadBalancer services until the cloud LB is provisioned.


⚡ The K3s Control Plane: Architectural Reimagination

Now let's look at what K3s does differently — not just "it's smaller" but architecturally why.


K3s Single Binary Philosophy

K3s ships as a single ~70MB binary (k3s) that embeds:

k3s binary
├── k3s-server (control plane)
│   ├── kube-apiserver
│   ├── kube-controller-manager
│   ├── kube-scheduler
│   ├── kubelet
│   ├── kube-proxy
│   ├── embedded containerd
│   ├── embedded CoreDNS
│   ├── embedded Flannel (CNI)
│   ├── embedded Traefik (ingress)
│   ├── embedded ServiceLB (load balancer)
│   └── embedded local-path-provisioner (storage)
└── k3s-agent (worker)
    ├── kubelet
    ├── kube-proxy
    └── embedded containerd

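The core components (API server, controller manager, scheduler, kubelet, kube-proxy, Flannel) are not containerized: they are linked as Go packages and run inside a single process. containerd and its helpers are bundled into the same binary and unpacked at startup, while CoreDNS, Traefik and local-path-provisioner ship as manifests that K3s applies to the cluster, so they run as ordinary pods. In practice, startup drops from minutes for a typical multi-component K8s install to around 30 seconds.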


1. K3s API Server — Same Core, Slimmer Defaults

The K3s API server is still the upstream kube-apiserver — but K3s wraps it with:

Removed/Disabled by Default:

  • Alpha feature gates are disabled
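  • Cloud provider plugins: in-tree cloud providers and storage drivers are compiled out; a minimal embedded cloud controller stands in for a provider CCM (pass --disable-cloud-controller to bring your own)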
  • Several admission plugins that assume cloud infra

The K3s Tunnel Proxy — Reversing the API Server → Kubelet Connection

K3s introduces a reverse tunnel from agent → server. In standard K8s, the API server connects to the kubelet for exec/logs/port-forward. In K3s:

Standard K8s:
  kube-apiserver → kubelet:10250  (API server initiates)
  Requires API server to reach all nodes directly

K3s:
  k3s-agent → k3s-server:6443 (agent initiates)
  ┌────────────────────────────────────────────────┐
  │  WebSocket tunnel maintained by agent          │
  │  All kubelet traffic flows THROUGH this tunnel │
  └────────────────────────────────────────────────┘

This is why K3s works behind NAT without special networking — agents reach out, not the server. This is a fundamental architectural shift that enables edge/IoT deployments.


2. SQLite / Kine — The etcd Abstraction Layer

This is the most significant architectural difference.

K3s introduces Kine ("Kine is not etcd") — a shim that translates etcd's gRPC API into SQL queries.

kube-apiserver
      │
      │  etcd gRPC v3 protocol (Range, Txn, Watch, etc.)
      ▼
┌──────────────┐
│     Kine     │  ← translation layer
│  (etcd shim) │
└──────┬───────┘
       │  SQL queries
       ▼
┌──────────────┐
│   SQLite /   │  ← actual datastore
│  PostgreSQL  │
│    MySQL     │
└──────────────┘

How Kine Implements the etcd Watch API:

etcd's watch is event-driven via gRPC streams. SQL databases don't natively support this. Kine implements it via:

-- Kine's core table
CREATE TABLE kine (
  id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- acts as etcd revision
  name    TEXT,                               -- the key (/registry/pods/...)
  created INTEGER,
  deleted INTEGER,
  create_revision INTEGER,
  prev_revision   INTEGER,
  lease   INTEGER,
  value   BLOB,                              -- the protobuf-encoded object
  old_value BLOB
);

-- Watch is implemented as polling:
-- SELECT * FROM kine WHERE id > last_seen_id ORDER BY id
-- Run every ~100ms — NOT event-driven like real etcd
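The polling idea is easy to try with Python's built-in sqlite3 module. The sketch below uses a cut-down version of the table above (only `id`, `name`, `deleted`, `value`) and two hypothetical helpers, `put` and `poll`. It illustrates the "SELECT everything newer than the last id I saw" pattern, not Kine's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kine (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as the revision
        name    TEXT,
        deleted INTEGER DEFAULT 0,
        value   BLOB
    )
""")


def put(name, value, deleted=0):
    """Append a new row; every write, including a delete, is an insert."""
    cur = conn.execute(
        "INSERT INTO kine (name, deleted, value) VALUES (?, ?, ?)",
        (name, deleted, value),
    )
    conn.commit()
    return cur.lastrowid


def poll(last_seen_id):
    """Return rows newer than `last_seen_id` and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, deleted, value FROM kine WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last
```

A watcher calls `poll` in a loop and carries the returned high-water mark into the next call. The delay between a write and the next poll is the extra watch latency discussed above.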

The Implications:

  • For small clusters: unnoticeable
  • For large clusters: polling adds latency to watch events
  • SQLite: single-writer, no HA (single server node only; agents can still join)
  • PostgreSQL/MySQL with Kine: HA possible but watch latency higher than etcd

Embedded etcd — HA Without an External DB

For HA without an external DB, K3s can run an embedded etcd cluster inside the k3s-server processes. Early releases shipped an experimental Dqlite backend (distributed SQLite using Raft) for this purpose; embedded etcd replaced it in K3s v1.19. In this mode the API server talks to real etcd and Kine is not in the path, so the usual quorum rules apply: run an odd number of servers, three or more.

# K3s with embedded HA using embedded etcd
k3s server --cluster-init   # First server (bootstrap)
k3s server --server https://first-server:6443 --token <token>  # Join as HA peer

3. K3s Controller Manager — Pruned and Extended

K3s runs the upstream kube-controller-manager with several modifications:

Removed Controllers:

  • cloud-node controller (no cloud metadata fetching)
  • cloud-node-lifecycle controller
  • route controller (no cloud routes)
  • service controller (replaced by ServiceLB)

Added: ServiceLB (a.k.a. Klipper LoadBalancer)

Instead of calling a cloud API to provision a load balancer, K3s runs a DaemonSet-based solution:

Service type=LoadBalancer created
        │
        ▼
ServiceLB Controller watches for it
        │
        ▼
Creates a DaemonSet:
  - Runs a pod on every node with hostPort matching service ports
  - The pod does iptables DNAT → service ClusterIP
        │
        ▼
Every node's IP becomes a valid entry point
(no external LB needed)

# Simplified view of what ServiceLB deploys under the hood
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: svclb-my-service
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: lb-port-80
        image: rancher/klipper-lb:latest   # real deployments pin a version tag
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]   # needed to program iptables rules
        ports:
        - hostPort: 80       # binds on every node
          containerPort: 80
        env:
        - name: SRC_PORT
          value: "80"
        - name: DEST_PROTO
          value: TCP
        - name: DEST_IP
          value: "10.43.100.50"  # ClusterIP (K3s default service CIDR is 10.43.0.0/16)
        - name: DEST_PORT
          value: "80"

4. K3s Scheduler — Unchanged but Co-located

The scheduler in K3s is the unmodified upstream kube-scheduler. However, it runs as a goroutine inside the k3s-server binary rather than as a separate process.

The key difference is operational:

  • In K8s: scheduler can be independently scaled, upgraded, or replaced (e.g., with Volcano, Yunikorn)
  • In K3s: scheduler is embedded — to replace it, start the server with --disable-scheduler and run your own scheduler in the cluster

5. The Flannel CNI — Embedded Networking

Standard K8s requires you to install a CNI (Calico, Cilium, Flannel, Weave) separately. K3s embeds Flannel with VXLAN as the default backend.

Pod on Node 1 (10.42.1.5) → Pod on Node 2 (10.42.2.7)

Standard K8s + Calico:
  10.42.1.5 → BGP route → 10.42.2.7 (no encapsulation on supported networks)

K3s + Flannel VXLAN:
  10.42.1.5 → VXLAN encapsulate → eth0:8472 → Node 2 → decapsulate → 10.42.2.7
  (works everywhere, slight overhead from encapsulation)
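The "slight overhead" can be put into numbers. The sketch below assumes an IPv4 underlay with the common 1500-byte MTU and counts the headers that VXLAN wraps around each pod packet; the constant and function names are just for illustration.

```python
# Bytes added in front of every inner IP packet on an IPv4 underlay.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the encapsulated frame keeps its own Ethernet header


def vxlan_overhead():
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def pod_mtu(underlay_mtu=1500):
    """Largest pod IP packet that fits in one underlay packet."""
    return underlay_mtu - vxlan_overhead()


def payload_efficiency(underlay_mtu=1500):
    """Fraction of each full-size underlay packet that carries pod traffic."""
    return pod_mtu(underlay_mtu) / underlay_mtu


print(vxlan_overhead(), pod_mtu(), round(payload_efficiency() * 100, 1))
```

That works out to 50 bytes per packet, which is why pod interfaces on a VXLAN overlay typically use an MTU of 1450 on a 1500-byte network. For full-size packets the cost is a little over 3% of bandwidth, and it grows for small packets.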

K3s also supports swapping Flannel for Cilium or Calico if you disable the built-in:

k3s server --flannel-backend=none --disable-network-policy
# Then install Cilium/Calico manually

📊 Side-by-Side Deep Comparison

| Dimension | Standard Kubernetes | K3s |
| --- | --- | --- |
| Deployment model | Separate processes (+ etcd cluster) | Single binary, all-in-one |
| API server | Full upstream, all features | Full upstream, conservative defaults |
| Datastore | etcd (Raft, event-driven watch) | SQLite/Kine (SQL polling) or embedded etcd |
| Watch latency | ~10ms (event-driven) | ~100ms (polling on SQL backends) |
| HA datastore | etcd cluster (3/5 nodes) | External DB + Kine OR embedded etcd |
| Control plane HA | Multiple API server replicas | Multiple k3s-server nodes possible |
| Cloud integration | cloud-controller-manager | No provider CCM (minimal embedded stub), uses ServiceLB + node-ip flags |
| LoadBalancer | Cloud LB (AWS ELB, GCP GLB) | ServiceLB DaemonSet (hostPort) |
| Ingress | Bring your own (nginx, traefik) | Traefik bundled |
| CNI | Bring your own | Flannel (VXLAN) embedded |
| DNS | Bring your own CoreDNS | CoreDNS bundled |
| Storage | Bring your own CSI | local-path-provisioner bundled |
| Kubelet location | Separate binary on worker | Embedded in k3s binary |
| API server → kubelet | Direct connection (port 10250) | Reverse WebSocket tunnel |
| Memory (control plane) | ~2GB+ (separate processes) | ~512MB (single process) |
| Startup time | 2-5 minutes | 20-30 seconds |
| Alpha feature gates | Available | Disabled by default |
| Admission webhooks | Full support | Full support |
| CRDs | Full support | Full support |
| RBAC | Full support | Full support |
| Audit logging | Configurable | Configurable |
| Scheduler extensibility | Scheduler profiles, plugins | Embedded; replace with external |
| Controller extensibility | Separate binary, hot-swap | Embedded goroutine |
| Upgrades | Independent component upgrades | Single binary upgrade |
| Edge/NAT traversal | Requires direct reachability | Native via reverse tunnel |
| ARM support | Separate builds | Native multi-arch in single release |

🔑 When to Choose What

Choose Standard Kubernetes When:

✅ 100+ node clusters
✅ Financial / regulated workloads requiring etcd for compliance
✅ You need independent control plane component upgrades
✅ You're using cloud-managed control planes (EKS, GKE, AKS)
✅ You need custom scheduler profiles (ML batch, GPU scheduling)
✅ Multi-tenancy with strong isolation requirements
✅ You need external etcd for ultra-high availability
✅ Team has K8s expertise and infra budget

Choose K3s When:

✅ Edge computing (retail, industrial, remote sites)
✅ IoT / ARM devices (Raspberry Pi clusters)
✅ CI/CD ephemeral clusters (fast startup is critical)
✅ Development environments (minimal resource usage)
✅ Single-node homelab or small on-prem clusters
✅ Clusters behind NAT (reverse tunnel is a killer feature)
✅ Teams that want "it just works" with less Ops overhead
✅ Bare metal without a cloud provider
✅ Air-gapped environments (single binary, easy to ship)

🔭 The Architecture Decision Tree

Do you need >50 nodes?
├── YES → Standard K8s (EKS/GKE/AKS or kubeadm)
└── NO
    ├── Are you on edge/IoT/ARM?
    │   └── YES → K3s (purpose-built for this)
    ├── Do you need cloud LoadBalancer integration?
    │   └── YES → Standard K8s with CCM
    ├── Is startup speed critical? (CI/CD, dev envs)
    │   └── YES → K3s
    ├── Do you need etcd for compliance/audit?
    │   └── YES → Standard K8s
    └── Default recommendation for <20 nodes on-prem?
        └── K3s (less to manage, same K8s API)
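If you want to drop the tree into a script or a platform-intake form, it translates directly into a function. This is a literal encoding of the questions above in the order they appear; the parameter names are made up, and treating every cluster of 50 nodes or fewer as falling through to the default is an assumption, since the tree only spells out the default for small on-prem clusters.

```python
def recommend(nodes, edge=False, needs_cloud_lb=False,
              fast_startup=False, needs_etcd_compliance=False):
    """Walk the decision tree top to bottom and return a recommendation."""
    if nodes > 50:
        return "Standard K8s"
    if edge:                      # edge / IoT / ARM
        return "K3s"
    if needs_cloud_lb:            # cloud LoadBalancer integration via CCM
        return "Standard K8s"
    if fast_startup:              # CI/CD, dev environments
        return "K3s"
    if needs_etcd_compliance:     # etcd required for compliance/audit
        return "Standard K8s"
    return "K3s"                  # default for small on-prem clusters
```

Because the checks run in order, an edge site that also wants a cloud load balancer still lands on K3s; reorder the branches if your priorities differ.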

🎯 Closing Thoughts

K3s isn't "Kubernetes with stuff removed." It's a purpose-built reimagining of the control plane for constrained environments. Rancher made deliberate trade-offs:

  • etcd → Kine/SQLite: Sacrificed watch latency and native HA for operational simplicity
  • Separate binaries → Single binary: Sacrificed independent upgradeability for atomic deployments
  • CCM → ServiceLB: Sacrificed cloud-native LB for zero-dependency load balancing
  • Direct kubelet access → Reverse tunnel: Sacrificed simplicity for NAT traversal capability

The result is a distribution that runs the full Kubernetes API on a Raspberry Pi with 512MB of RAM, starts in 30 seconds, and works behind NAT — things standard K8s simply wasn't designed for.

Both are Kubernetes. Both run your workloads. The control plane is where the real difference lives.

