James Lee

Posted on May 19

Kubernetes Resource Orchestration: How kubelet Prepares Storage, Network & Compute for Every Pod

#architecture #devops #infrastructure #kubernetes

In the previous article we covered how kube-scheduler selects the optimal node for a Pod through its three-phase pipeline (Filter → Score → Preemption). Once that decision is written to etcd, the baton passes to kubelet. This article covers what happens next: resource orchestration.

1. What Is Resource Orchestration?

Resource orchestration is the process by which a worker node's kubelet, upon receiving a scheduling result, organizes and prepares all the resources a workload needs before its containers can start.

These resources fall into three categories:

Category	What gets prepared
Storage	Persistent volumes (PVCs) and ephemeral (temporary) storage
Network	Shared Linux network stack (network namespace) for the Pod, joined by every container
Compute	CPU and memory allocation and management via cgroups

2. Where Orchestration Fits in the Control Flow

Resource orchestration is step ⑤ of the resource creation pipeline. It begins the moment kubelet, through its watch on kube-apiserver, sees a Pod newly bound to its node:

┌─────────────────────────────────────────────────────────────────────┐
│   kubectl / REST Request                                            │
│          │ ①                                                        │
│          ▼                                                          │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │                    kube-apiserver                            │  │
│   └──┬───────────────────────┬──────────────────────────────┬───┘  │
│    ② │                     ③ │                            ④ │      │
│      ▼                       ▼                              ▼      │
│    etcd          kube-controller-manager            kube-scheduler  │
│                                                    (binding → etcd) │
│                                                           │ ⑤       │
│                                                           ▼         │
│                                           ┌──────────────────────┐  │
│                                           │       kubelet        │  │
│                                           │  RESOURCE            │  │
│                                           │  ORCHESTRATION  ←    │  │
│                                           │        │ ⑥           │  │
│                                           │  ┌─────▼──────────┐  │  │
│                                           │  │ Pod [C]...[C]  │  │  │
│                                           │  └────────────────┘  │  │
│                                           └──────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Trigger: kubelet does not read etcd directly. It watches kube-apiserver for Pods whose spec.nodeName matches its own node. When a newly bound Pod appears, orchestration begins immediately.
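The trigger amounts to a filter: out of all Pods in the cluster, a kubelet only acts on those whose spec.nodeName names its own node. A minimal sketch of that selection, using plain dictionaries in place of real API objects (the function name and sample Pods are made up for illustration):

```python
from typing import Iterable


def pods_for_node(pods: Iterable[dict], node_name: str) -> list:
    """Return the Pods whose spec.nodeName equals node_name.

    Pods without spec.nodeName have not been scheduled yet and are skipped.
    """
    return [p for p in pods if p.get("spec", {}).get("nodeName") == node_name]


# Hypothetical sample data: two scheduled Pods and one still pending.
sample_pods = [
    {"metadata": {"name": "web-1"}, "spec": {"nodeName": "node-d"}},
    {"metadata": {"name": "web-2"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "pending"}, "spec": {}},
]

bound_here = pods_for_node(sample_pods, "node-d")
print([p["metadata"]["name"] for p in bound_here])
```

A real kubelet receives this as a stream of watch events rather than a list, but the selection criterion is the same.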

3. The Resource Orchestration Pipeline

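The three resource categories are presented here as a sequence in which each phase builds on the previous one being ready. A real kubelet interleaves some of this work (Pod-level cgroups, for instance, are created early), so treat the order as a conceptual model: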

Scheduling result received by kubelet
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 1: STORAGE                                           │
│                                                             │
│  ┌──────────────────────┐  ┌──────────────────────────┐    │
│  │  Persistent Storage  │  │   Ephemeral Storage      │    │
│  │  (PVC → PV mount)    │  │   (emptyDir, configMap,  │    │
│  │                      │  │    secret, etc.)          │    │
│  └──────────────────────┘  └──────────────────────────┘    │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 2: NETWORK                                           │
│                                                             │
│  Step A: Create shared Linux network stack for the Pod      │
│          (network namespace) + connect to host network      │
│                    │                                        │
│                    ▼                                        │
│  Step B: Start each container inside that namespace         │
│          (no per-container network device is created)       │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  PHASE 3: COMPUTE                                           │
│                                                             │
│  ┌──────────────────────┐  ┌──────────────────────────┐    │
│  │  Memory Management   │  │    CPU Management        │    │
│  │  (cgroup limits,     │  │    (cgroup CPU shares,   │    │
│  │   OOM priority)      │  │     CPU pinning for      │    │
│  │                      │  │     guaranteed QoS)      │    │
│  └──────────────────────┘  └──────────────────────────┘    │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
                All resources ready → containers start  ✅
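The ordering in the diagram can be sketched as a small phase runner: each phase runs only if the previous one succeeded, and a failure stops the Pod before any container starts. This is an illustrative model, not kubelet's actual code. The phase functions, the dictionary keys, and the IP address are placeholders.

```python
from typing import Callable


def prepare_storage(pod: dict) -> None:
    pod["volumes_mounted"] = True  # placeholder for attach/mount work


def prepare_network(pod: dict) -> None:
    pod["pod_ip"] = "10.244.1.7"  # placeholder address, not a real allocation


def prepare_compute(pod: dict) -> None:
    pod["cgroups_configured"] = True  # placeholder for cgroup setup


PHASES = [
    ("storage", prepare_storage),
    ("network", prepare_network),
    ("compute", prepare_compute),
]


def orchestrate(pod: dict, phases=PHASES) -> list:
    """Run phases in order and return the names of the completed ones.

    The first failing phase aborts the run, so later phases never execute.
    """
    done = []
    for name, prepare in phases:
        try:
            prepare(pod)
        except Exception as exc:
            raise RuntimeError(f"{name} phase failed after {done}: {exc}") from exc
        done.append(name)
    return done


print(orchestrate({"name": "web-1"}))
```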

4. Deep Dive: Storage Orchestration

Storage is prepared first because containers may need to mount volumes before their processes start.

Persistent Storage

Persistent storage survives Pod restarts and rescheduling. The orchestration steps are:

PersistentVolumeClaim (PVC) in Pod spec
      │
      ▼
kubelet checks if PVC is already bound to a PV
      │
      ├── Not bound → kube-controller-manager's
      │               PersistentVolume controller binds PVC → PV
      │
      └── Bound → volume is attached to the node where required
                  (e.g. an EBS disk), then kubelet mounts it (e.g. NFS)
                        │
                        ▼
                  Volume mounted into Pod's filesystem
                  at the path specified in volumeMounts  ✅
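The lookup chain in this flow (volumeMounts → Pod volume → PVC → PV) can be sketched as a pure function. The field names follow the Pod spec (volumes, persistentVolumeClaim.claimName, volumeMounts, mountPath). The function itself and the pvc_to_pv mapping are assumptions made for illustration, standing in for what the API server would report about each claim.

```python
from typing import Optional


def resolve_mounts(pod_spec: dict, pvc_to_pv: dict) -> dict:
    """Map each container mountPath to the name of the PV backing it.

    pvc_to_pv maps a claim name to its bound PV name, or None if unbound.
    Raises LookupError when a referenced claim is not bound yet.
    """
    volume_to_pv = {}
    for vol in pod_spec.get("volumes", []):
        claim = vol.get("persistentVolumeClaim")
        if claim is None:
            continue  # not a PVC-backed volume (e.g. emptyDir)
        pv: Optional[str] = pvc_to_pv.get(claim["claimName"])
        if pv is None:
            raise LookupError(f"PVC {claim['claimName']} is not bound yet")
        volume_to_pv[vol["name"]] = pv

    mounts = {}
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] in volume_to_pv:
                mounts[mount["mountPath"]] = volume_to_pv[mount["name"]]
    return mounts


# Hypothetical Pod spec with one PVC-backed volume and one emptyDir.
spec = {
    "volumes": [
        {"name": "data", "persistentVolumeClaim": {"claimName": "db-claim"}},
        {"name": "scratch", "emptyDir": {}},
    ],
    "containers": [
        {"name": "db", "volumeMounts": [
            {"name": "data", "mountPath": "/var/lib/db"},
            {"name": "scratch", "mountPath": "/tmp/scratch"},
        ]},
    ],
}
print(resolve_mounts(spec, {"db-claim": "pv-0001"}))
```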

Common volume types mounted in this phase:

Type	Use case
`PersistentVolumeClaim`	Statically or dynamically provisioned volumes (cloud disks, NFS, Ceph)
`hostPath`	Direct mount from node filesystem (dev/testing only)
`nfs`	Shared network filesystem across Pods
`configMap` / `secret`	Configuration and credentials injected as files (not persistent: they live only as long as the Pod)

Ephemeral Storage

Temporary storage exists only for the lifetime of the Pod:

Type	Description
`emptyDir`	Empty directory created when Pod starts, deleted when Pod ends
Container writable layer	Each container's own writable overlay filesystem (lost when that container is restarted)
`downwardAPI`	Exposes Pod metadata (labels, annotations) as files

5. Deep Dive: Network Orchestration

Network setup is a two-step process, coordinated between kubelet, the container runtime, and the CNI (Container Network Interface) plugin.

Step A — Create the Pod Network Namespace

kubelet asks the container runtime (via CRI) to create the Pod sandbox
      │
      ▼
Runtime creates a new Linux network namespace for the Pod
(all containers in the Pod share this namespace)
and invokes the CNI plugin (e.g. Flannel, Calico, Cilium)
      │
      ▼
CNI creates a virtual ethernet pair (veth pair):
  - One end: inside the Pod network namespace (eth0)
  - One end: on the host network bridge
      │
      ▼
Pod gets an IP address from the cluster CIDR
Routing rules updated so other Pods can reach this IP  ✅
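The address assignment step can be illustrated with Python's standard ipaddress module. This is a simplified first-free allocator, not the IPAM logic of any particular CNI plugin. The CIDR and the in-use set are example values.

```python
import ipaddress


def allocate_pod_ip(pod_cidr: str, in_use: set) -> str:
    """Return the first usable host address in pod_cidr that is not in in_use."""
    network = ipaddress.ip_network(pod_cidr)
    for host in network.hosts():  # skips the network and broadcast addresses
        candidate = str(host)
        if candidate not in in_use:
            return candidate
    raise RuntimeError(f"no free addresses left in {pod_cidr}")


# Example: .1 is assumed to be the bridge gateway, .2 an existing Pod.
taken = {"10.244.1.1", "10.244.1.2"}
print(allocate_pod_ip("10.244.1.0/24", taken))  # prints 10.244.1.3
```

Real plugins also persist allocations and release them when the Pod is deleted, which this sketch leaves out.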

Step B — Connect Each Container

For each container in the Pod:
      │
      ▼
Container runtime creates the container
with the Pod's existing network namespace
(NOT a new namespace — shared with all siblings)
      │
      ▼
Container sees the same eth0, same IP, same ports
as all other containers in the Pod  ✅

Key insight: All containers in a Pod share one IP address and one network namespace. They communicate with each other via localhost. This is why a Pod is the atomic unit of networking in Kubernetes, not the individual container.

kube-proxy's role

While kubelet handles Pod-level networking, kube-proxy handles Service-level networking:

kube-proxy watches Service and EndpointSlice objects via kube-apiserver
      │
      ▼
Maintains iptables / IPVS rules on every node
      │
      ▼
Traffic to Service ClusterIP → load-balanced to healthy Pod IPs  ✅
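In iptables mode, the load balancing is commonly implemented as a chain of rules tried in order, where rule i out of n matches with probability 1/(n - i) and the last rule always matches. The sketch below checks, with exact fractions, that this scheme gives every backend an equal share. It models the arithmetic only and the function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Per-rule match probability for n backends, evaluated top to bottom."""
    if n < 1:
        raise ValueError("need at least one backend")
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> list:
    """Overall fraction of traffic each backend receives under that rule chain."""
    shares = []
    remaining = Fraction(1)  # traffic that fell through all earlier rules
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(3))
print(effective_shares(3))
```

For three backends the per-rule probabilities are 1/3, 1/2 and 1, and each backend ends up with exactly one third of the traffic.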

6. Deep Dive: Compute Orchestration

Compute resources (CPU and memory) are managed via Linux cgroups, enforcing the requests and limits defined in the Pod spec.

Memory Management

resources:
  requests:
    memory: "256Mi"    # used for scheduling: the node must have this much unreserved allocatable memory
  limits:
    memory: "512Mi"    # hard cap — exceed this → OOMKilled

kubelet and the container runtime create cgroups for the Pod and its containers
      │
      ├── memory.limit_in_bytes = limits.memory (512Mi)
      │   (cgroup v1 name; memory.max on cgroup v2)
      │   Container exceeds this → Linux OOM killer terminates it
      │
      └── requests.memory (256Mi) sets no cgroup limit by default
          It drives scheduling and the container's oom_score_adj
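The cgroup memory limit is written in bytes, so a quantity such as "512Mi" has to be converted first. A minimal converter for the common integer forms is sketched below. It is not the full Kubernetes quantity grammar (no fractional values or exponent notation), and the function name is an assumption.

```python
# Binary (power-of-two) and decimal suffixes used by Kubernetes quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert an integer memory quantity such as '256Mi' or '1G' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(quantity)  # plain number of bytes


print(memory_to_bytes("512Mi"))  # prints 536870912
```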

CPU Management

resources:
  requests:
    cpu: "500m"        # 0.5 CPU — used for scheduling decisions
  limits:
    cpu: "1000m"       # 1.0 CPU — hard cap via CFS bandwidth

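kubelet and the container runtime create cgroups for the Pod and its containers
      │
      ├── cpu.shares = requests.cpu (millicores) × 1024 / 1000  (500m → 512)
      │   (relative weight when CPU is contested; cpu.weight on cgroup v2)
      │
      └── cpu.cfs_quota_us = limits.cpu (millicores) × cpu.cfs_period_us / 1000
          (1000m with the default 100000 µs period → 100000;
           container throttled if it exceeds this)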
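The conversion from CPU quantities to cgroup v1 values is simple arithmetic. The sketch below uses the constants commonly cited for kubelet (1024 shares per CPU, a minimum of 2 shares, a 100 ms CFS period, a minimum quota of 1 ms). Treat them as assumptions to verify against your Kubernetes version, and the function names as illustrative.

```python
SHARES_PER_CPU = 1024
MIN_SHARES = 2
DEFAULT_PERIOD_US = 100_000  # 100 ms CFS period
MIN_QUOTA_US = 1_000


def to_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '500m' or '1.5' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_shares(request_millicores: int) -> int:
    """Relative CPU weight derived from requests.cpu."""
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // 1000)


def cfs_quota_us(limit_millicores: int, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Microseconds of CPU time allowed per period, derived from limits.cpu."""
    return max(MIN_QUOTA_US, limit_millicores * period_us // 1000)


print(cpu_shares(to_millicores("500m")))     # prints 512
print(cfs_quota_us(to_millicores("1000m")))  # prints 100000
```

With the values from the spec above, a container under contention gets half the weight of a full-CPU request, and may use at most 100 ms of CPU time in every 100 ms period.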

QoS Classes

Kubernetes assigns a QoS class based on resource spec, which determines eviction priority under node pressure:

QoS Class	Condition	Eviction priority
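Guaranteed	Every container sets CPU and memory `requests` and `limits`, with `requests == limits`	Last to be evicted
Burstable	At least one container sets a `request` or `limit`, but the Pod does not meet the Guaranteed criteria	Middle priority
BestEffort	No container sets any `requests` or `limits`	First to be evicted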
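The classification rules can be written as a short function. This sketch covers regular containers and the cpu and memory resources only, and it compares quantities as raw strings, so "1" and "1000m" would count as different. Both simplifications are assumptions made to keep the example small.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Return 'Guaranteed', 'Burstable' or 'BestEffort' for a list of containers."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in RESOURCES:
            request, limit = requests.get(name), limits.get(name)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit; an omitted request defaults to the limit.
            if limit is None or (request is not None and request != limit):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod = [{"resources": {"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}}]
print(qos_class(pod))  # prints Burstable
```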

7. The Complete Orchestration Flow

kubelet detects a Pod newly bound to its node (via kube-apiserver)
      │
      ▼
① Admit Pod (check node-level admission, resource availability)
      │
      ▼
② Pull container images (if not cached locally)
      │
      ▼
③ Prepare STORAGE
   ├── Attach/mount persistent volumes (PVC → PV)
   └── Create ephemeral volumes (emptyDir, secrets, configMaps)
      │
      ▼
④ Prepare NETWORK (via CNI plugin)
   ├── Create Pod network namespace
   ├── Assign Pod IP from cluster CIDR
   └── Connect containers to shared network namespace
      │
      ▼
⑤ Prepare COMPUTE (via cgroups)
   ├── Set memory limits and OOM score adjustment
   └── Set CPU shares and CFS quota
      │
      ▼
⑥ Start containers via container runtime (containerd/CRI-O)
      │
      ▼
⑦ Monitor Pod health (liveness/readiness probes)
   Report status back to kube-apiserver  ✅

8. Summary

Phase	Managed by	Key mechanism	Purpose
Storage	kubelet + volume plugins	PVC/PV binding, mount	Provide data persistence and config injection
Network	kubelet + CNI plugin	Linux netns, veth pairs, iptables	Give Pod a unique IP, enable cluster-wide connectivity
Compute	kubelet + cgroups	`cpu.shares`, `memory.limit_in_bytes`	Enforce resource requests/limits, determine QoS class

Resource orchestration is the bridge between the scheduler's abstract decision ("run this Pod on Node D") and the concrete reality of a running container. It's where Kubernetes turns YAML into a live process — with its own IP address, mounted volumes, and guaranteed CPU and memory.

Next in this series: Kubernetes Data Access Flow: How Pods Read and Write Persistent Storage (Part 6)

Follow the series for more deep dives into Kubernetes internals.

📦 Source Code: github.com/muzinan123/k8s-paas