In the previous article we covered how kube-scheduler selects the optimal node for a Pod through its three-phase pipeline (Filter → Score → Preemption). Once that decision is written to etcd, the baton passes to kubelet. This article covers what happens next: resource orchestration.
1. What Is Resource Orchestration?
Resource orchestration is the process by which a worker node's kubelet, upon receiving a scheduling result, organizes and prepares all the resources a workload needs before its containers can start.
These resources fall into three categories:
| Category | What gets prepared |
|---|---|
| Storage | Persistent volumes (claimed through PVCs) and ephemeral (temporary) storage |
| Network | Shared Linux network stack for the Pod + per-container network devices |
| Compute | CPU and memory allocation and management via cgroups |
2. Where Orchestration Fits in the Control Flow
Resource orchestration is step ⑤ of the resource creation pipeline. It begins when kubelet, which watches kube-apiserver rather than etcd directly, sees that a Pod has been bound to its node:
┌─────────────────────────────────────────────────────────────────────┐
│ kubectl / REST Request │
│ │ ① │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ kube-apiserver │ │
│ └──┬───────────────────────┬──────────────────────────────┬───┘ │
│ ② │ ③ │ ④ │ │
│ ▼ ▼ ▼ │
│ etcd kube-controller-manager kube-scheduler │
│ (binding → etcd) │
│ │ ⑤ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ kubelet │ │
│ │ RESOURCE │ │
│ │ ORCHESTRATION ← │ │
│ │ │ ⑥ │ │
│ │ ┌─────▼──────────┐ │ │
│ │ │ Pod [C]...[C] │ │ │
│ │ └────────────────┘ │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
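Trigger: kubelet does not read etcd itself. It watches kube-apiserver for Pods whose spec.nodeName matches its own node, and the scheduler's binding (persisted in etcd by the API server) reaches it as a watch event. When a newly bound Pod appears for its node, orchestration begins.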
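The trigger can be pictured as a filter over a stream of Pod change events. The sketch below is a simplified simulation, not kubelet's code: `PodEvent`, `pods_to_orchestrate`, and the node names are hypothetical, and a real watch event carries the full Pod object.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PodEvent:
    """Hypothetical, stripped-down watch event."""
    event_type: str  # "ADDED", "MODIFIED" or "DELETED"
    pod_name: str
    node_name: str   # empty string while the Pod is still unscheduled


def pods_to_orchestrate(events: Iterable[PodEvent], my_node: str) -> Iterator[str]:
    """Yield each Pod name the first time it is seen bound to `my_node`."""
    seen = set()
    for ev in events:
        if ev.event_type == "DELETED":
            seen.discard(ev.pod_name)
            continue
        if ev.node_name == my_node and ev.pod_name not in seen:
            seen.add(ev.pod_name)
            yield ev.pod_name


events = [
    PodEvent("ADDED", "web-1", ""),           # created, not yet scheduled
    PodEvent("MODIFIED", "web-1", "node-d"),  # scheduler bound it to node-d
    PodEvent("ADDED", "db-1", "node-a"),      # bound to a different node
    PodEvent("MODIFIED", "web-1", "node-d"),  # later update, already handled
]
started = list(pods_to_orchestrate(events, "node-d"))
print(started)  # ['web-1']
```

Only the first event that shows `web-1` bound to `node-d` starts orchestration. Later updates to the same Pod are ignored by this filter.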
3. The Resource Orchestration Pipeline
The three resource categories are prepared in sequence, with each phase building on the previous one being ready:
Scheduling result received by kubelet
│
▼
┌─────────────────────────────────────────────────────────────┐
│ PHASE 1: STORAGE │
│ │
│ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ Persistent Storage │ │ Ephemeral Storage │ │
│ │ (PVC → PV mount) │ │ (emptyDir, configMap, │ │
│ │ │ │ secret, etc.) │ │
│ └──────────────────────┘ └──────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ PHASE 2: NETWORK │
│ │
│ Step A: Create shared Linux network stack for the Pod │
│ (network namespace) + connect to host network │
│ │ │
│ ▼ │
│ Step B: Create per-container network devices │
│ + attach to the shared Pod network stack │
└─────────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ PHASE 3: COMPUTE │
│ │
│ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ Memory Management │ │ CPU Management │ │
│ │ (cgroup limits, │ │ (cgroup CPU shares, │ │
│ │ OOM priority) │ │ CPU pinning for │ │
│ │ │ │ guaranteed QoS) │ │
│ └──────────────────────┘ └──────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
│
▼
All resources ready → containers start ✅
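The pipeline above amounts to "run each phase in order and stop at the first one that is not ready". A minimal sketch of that control flow follows. The phase functions are placeholders invented for illustration and do not correspond to kubelet internals.

```python
from typing import Callable, List, Tuple

Phase = Tuple[str, Callable[[], bool]]


def run_phases(phases: List[Phase]) -> Tuple[bool, List[str]]:
    """Run phases in order; stop at the first one that reports failure."""
    completed: List[str] = []
    for name, prepare in phases:
        if not prepare():
            return False, completed
        completed.append(name)
    return True, completed


def storage_ready() -> bool:
    return True


def network_ready() -> bool:
    return False  # pretend the network plugin is not ready yet


def compute_ready() -> bool:
    return True


ok, done = run_phases([
    ("storage", storage_ready),
    ("network", network_ready),
    ("compute", compute_ready),
])
print(ok, done)  # False ['storage']
```

Because the network phase fails, the compute phase never runs and no container would be started. A real kubelet retries the Pod sync rather than giving up.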
4. Deep Dive: Storage Orchestration
Storage is prepared first because containers may need to mount volumes before their processes start.
Persistent Storage
Persistent storage survives Pod restarts and rescheduling. The orchestration steps are:
PersistentVolumeClaim (PVC) in Pod spec
│
▼
kubelet checks if PVC is already bound to a PV
│
├── Not bound → kube-controller-manager's
│ PersistentVolume controller binds PVC → PV
│
└── Bound → the volume is attached to the node if its type requires it
(e.g. an EBS disk), then kubelet's volume plugin mounts it (e.g. an NFS export)
│
▼
Volume mounted into Pod's filesystem
at the path specified in volumeMounts ✅
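The branch in the diagram above can be written as a small decision function. This is an illustrative sketch only: `Claim` and `next_storage_action` are hypothetical names, and the real logic lives in kubelet's volume manager and the PersistentVolume controller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """Hypothetical stand-in for a PersistentVolumeClaim."""
    name: str
    bound_volume: Optional[str] = None  # PV name, or None while unbound


def next_storage_action(claim: Claim, mount_path: str) -> str:
    """Decide what has to happen before the container can use the claim."""
    if claim.bound_volume is None:
        # Binding is done by the control plane; the node can only wait.
        return f"wait: claim {claim.name} is not bound to a volume yet"
    return f"mount: volume {claim.bound_volume} at {mount_path}"


pending = Claim(name="data")
bound = Claim(name="data", bound_volume="pv-0001")
print(next_storage_action(pending, "/var/lib/app"))
print(next_storage_action(bound, "/var/lib/app"))
```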
Common persistent volume types:
| Type | Use case |
|---|---|
| PersistentVolumeClaim | Dynamic provisioning (cloud disks, NFS, Ceph) |
| hostPath | Direct mount from node filesystem (dev/testing only) |
| nfs | Shared network filesystem across Pods |
| configMap / secret | Configuration and credentials injected as files |
Ephemeral Storage
Temporary storage exists only for the lifetime of the Pod:
| Type | Description |
|---|---|
| emptyDir | Empty directory created when Pod starts, deleted when Pod ends |
| Container writable layer | Each container's own writable overlay filesystem |
| downwardAPI | Exposes Pod metadata (labels, annotations) as files |
5. Deep Dive: Network Orchestration
Network setup is a two-step process, coordinated between kubelet, the container runtime, and the CNI (Container Network Interface) plugin.
Step A — Create the Pod Network Namespace
kubelet asks the container runtime (containerd/CRI-O) to create the Pod sandbox
│
▼
Runtime creates a new Linux network namespace for the Pod
(all containers in the Pod share this namespace)
and invokes the CNI plugin (e.g. Flannel, Calico, Cilium)
│
▼
CNI creates a virtual ethernet pair (veth pair):
- One end: inside the Pod network namespace (eth0)
- One end: on the host network bridge
│
▼
Pod gets an IP address from the cluster CIDR
Routing rules updated so other Pods can reach this IP ✅
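Address assignment is the easiest part of Step A to sketch. The example below hands out one address per Pod from a single CIDR block using Python's standard `ipaddress` module. `PodIPAllocator` and the `10.244.1.0/24` range are illustrative assumptions. Real IPAM plugins differ, for example by reserving the first address for a gateway and by releasing addresses when Pods are deleted.

```python
import ipaddress
from typing import Dict


class PodIPAllocator:
    """Hands out addresses from one CIDR block, one per Pod (hypothetical)."""

    def __init__(self, cidr: str) -> None:
        # hosts() skips the network and broadcast addresses.
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self._assigned: Dict[str, str] = {}

    def allocate(self, pod_name: str) -> str:
        """Return the Pod's address, assigning a new one on first request."""
        if pod_name in self._assigned:
            return self._assigned[pod_name]
        try:
            ip = str(next(self._free))
        except StopIteration:
            raise RuntimeError("address range exhausted") from None
        self._assigned[pod_name] = ip
        return ip


alloc = PodIPAllocator("10.244.1.0/24")
first = alloc.allocate("web-1")
second = alloc.allocate("db-1")
again = alloc.allocate("web-1")
print(first, second, again)  # 10.244.1.1 10.244.1.2 10.244.1.1
```

Asking twice for the same Pod returns the same address, which mirrors the idea that the IP belongs to the Pod rather than to any one container.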
Step B — Connect Each Container
For each container in the Pod:
│
▼
Container runtime creates the container
with the Pod's existing network namespace
(NOT a new namespace — shared with all siblings)
│
▼
Container sees the same eth0, same IP, same ports
as all other containers in the Pod ✅
Key insight: All containers in a Pod share one IP address and one network namespace. They communicate with each other via localhost. This is why a Pod is the atomic unit of networking in Kubernetes, not the individual container.
kube-proxy's role
While kubelet handles Pod-level networking, kube-proxy handles Service-level networking:
kube-proxy watches Service and Endpoints/EndpointSlice objects via kube-apiserver
│
▼
Maintains iptables / IPVS rules on every node
│
▼
Traffic to Service ClusterIP → load-balanced to healthy Pod IPs ✅
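In iptables mode, the load balancing step is commonly built from a chain of rules that each match with a fixed probability (the `statistic` match in random mode). The sketch below shows why probabilities of 1/n, 1/(n-1), ..., 1 give every endpoint an equal share. The function names are illustrative and the code models only the arithmetic, not kube-proxy itself.

```python
from fractions import Fraction
from typing import List


def rule_probabilities(n: int) -> List[Fraction]:
    """Match probability for each of n rules evaluated in order.

    Rule i is reached only if rules 0..i-1 did not match, so it must
    match with probability 1/(n - i) for the overall split to be even.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n: int) -> List[Fraction]:
    """Fraction of all traffic that ends up at each endpoint."""
    shares = []
    remaining = Fraction(1)  # traffic not yet claimed by an earlier rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
print([str(s) for s in effective_share(3)])     # ['1/3', '1/3', '1/3']
```

The last rule always matches, so no packet falls through the chain. IPVS mode uses its own schedulers instead of this rule chain.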
6. Deep Dive: Compute Orchestration
Compute resources (CPU and memory) are managed via Linux cgroups, enforcing the requests and limits defined in the Pod spec.
Memory Management
resources:
  requests:
    memory: "256Mi"  # reserved at scheduling time: the node must have this much unallocated
  limits:
    memory: "512Mi"  # hard cap: exceed this → OOMKilled
kubelet creates cgroup for the Pod
│
├── memory.limit_in_bytes = limits.memory (512Mi)
│ (memory.max on cgroup v2)
│ Container exceeds this → Linux OOM killer terminates it
│
└── requests.memory (256Mi) is not written as a cgroup limit by default
It is reserved by the scheduler and feeds into oom_score_adj and eviction ranking under memory pressure
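The quantity strings in the spec have to become byte counts before they can be written to a cgroup file. A minimal sketch of that conversion for binary suffixes follows. `memory_to_bytes` is an illustrative helper, not a Kubernetes API, and it deliberately ignores decimal suffixes (such as `M` or `G`) and fractional values, which the real quantity format also allows.

```python
_BINARY_SUFFIXES = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "Ti": 1024 ** 4,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes (binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare integer is already a byte count


limit_bytes = memory_to_bytes("512Mi")
request_bytes = memory_to_bytes("256Mi")
print(limit_bytes)    # 536870912
print(request_bytes)  # 268435456
```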
CPU Management
resources:
  requests:
    cpu: "500m"   # 0.5 CPU, used for scheduling decisions
  limits:
    cpu: "1000m"  # 1.0 CPU, hard cap via CFS bandwidth
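kubelet creates cgroup for the Pod
│
├── cpu.shares = requests.cpu in millicores × 1024 / 1000 (500m → 512)
│ (relative weight when CPU is contested; cpu.weight on cgroup v2)
│
└── cpu.cfs_quota_us = limits.cpu in millicores × cpu.cfs_period_us / 1000
(1000m → 100000 µs per default 100000 µs period; container is throttled once the quota is used up; cpu.max on cgroup v2)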
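The Pod spec expresses CPU in millicores, while cgroup v1 expects a relative weight (`cpu.shares`) and a quota in microseconds (`cpu.cfs_quota_us`). The sketch below shows the arithmetic, assuming 1024 shares per core, a minimum of 2 shares, and the default CFS period of 100 ms. The function names are illustrative rather than kubelet's own.

```python
SHARES_PER_CPU = 1024        # cgroup v1 weight that represents one full core
MILLI_PER_CPU = 1000
MIN_SHARES = 2               # smallest value accepted for cpu.shares
DEFAULT_PERIOD_US = 100_000  # default CFS period: 100 ms


def parse_cpu_milli(quantity: str) -> int:
    """Convert '500m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * MILLI_PER_CPU))


def cpu_shares(request_milli: int) -> int:
    """Relative weight derived from the CPU request."""
    return max(MIN_SHARES, request_milli * SHARES_PER_CPU // MILLI_PER_CPU)


def cfs_quota_us(limit_milli: int, period_us: int = DEFAULT_PERIOD_US) -> int:
    """CPU time in microseconds the container may use per period."""
    return limit_milli * period_us // MILLI_PER_CPU


shares = cpu_shares(parse_cpu_milli("500m"))
quota = cfs_quota_us(parse_cpu_milli("1000m"))
print(shares)  # 512
print(quota)   # 100000
```

A quota equal to the period means the container can use at most one full core's worth of time in each 100 ms window.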
QoS Classes
Kubernetes assigns a QoS class based on resource spec, which determines eviction priority under node pressure:
| QoS Class | Condition | Eviction priority |
|---|---|---|
| Guaranteed | Every container sets CPU and memory limits, and requests == limits | Last to be evicted |
| Burstable | At least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria | Middle priority |
| BestEffort | No requests or limits set | First to be evicted |
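The classification rules can be expressed as a short function. This is a simplified sketch: `qos_class` is a hypothetical helper, it looks only at CPU and memory, and it compares quantity strings literally, so `"1"` and `"1000m"` would be treated as different even though Kubernetes considers them equal.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Resources]) -> str:
    """Classify a Pod from the resource specs of its containers."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


article_pod = [{
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "1000m", "memory": "512Mi"},
}]
print(qos_class(article_pod))  # Burstable
```

The container used as the running example in this section has requests below its limits, so its Pod lands in the Burstable class.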
7. The Complete Orchestration Flow
kubelet learns of the new Pod/Node binding from kube-apiserver
│
▼
① Admit Pod (check node-level admission, resource availability)
│
▼
② Pull container images (if not cached locally)
│
▼
③ Prepare STORAGE
├── Attach/mount persistent volumes (PVC → PV)
└── Create ephemeral volumes (emptyDir, secrets, configMaps)
│
▼
④ Prepare NETWORK (via CNI plugin)
├── Create Pod network namespace
├── Assign Pod IP from cluster CIDR
└── Connect containers to shared network namespace
│
▼
⑤ Prepare COMPUTE (via cgroups)
├── Set memory limits
└── Set CPU shares and CFS quota
│
▼
⑥ Start containers via container runtime (containerd/CRI-O)
│
▼
⑦ Monitor Pod health (liveness/readiness probes)
Report status back to kube-apiserver ✅
8. Summary
| Phase | Managed by | Key mechanism | Purpose |
|---|---|---|---|
| Storage | kubelet + volume plugins | PVC/PV binding, mount | Provide data persistence and config injection |
| Network | kubelet + CNI plugin | Linux netns, veth pairs, iptables | Give Pod a unique IP, enable cluster-wide connectivity |
| Compute | kubelet + cgroups | cpu.shares, memory.limit_in_bytes | Enforce resource requests/limits, determine QoS class |
Resource orchestration is the bridge between the scheduler's abstract decision ("run this Pod on Node D") and the concrete reality of a running container. It's where Kubernetes turns YAML into a live process — with its own IP address, mounted volumes, and guaranteed CPU and memory.
Next in this series: Kubernetes Data Access Flow: How Pods Read and Write Persistent Storage (Part 6)
Follow the series for more deep dives into Kubernetes internals.