<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gulcan</title>
    <description>The latest articles on DEV Community by Gulcan (@gulcan).</description>
    <link>https://dev.to/gulcan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3942767%2Fe7c8bb4a-c91c-4401-bfb0-089f2f56f754.png</url>
      <title>DEV Community: Gulcan</title>
      <link>https://dev.to/gulcan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gulcan"/>
    <language>en</language>
    <item>
      <title>Kubelet Metrics: How cAdvisor and CRI Collect Kubernetes Stats</title>
      <dc:creator>Gulcan</dc:creator>
      <pubDate>Wed, 20 May 2026 17:52:02 +0000</pubDate>
      <link>https://dev.to/gulcan/kubelet-metrics-how-cadvisor-and-cri-collect-kubernetes-stats-2k71</link>
      <guid>https://dev.to/gulcan/kubelet-metrics-how-cadvisor-and-cri-collect-kubernetes-stats-2k71</guid>
      <description>&lt;p&gt;This article was originally published on &lt;a href="https://learnkube.com/kubernetes-metrics-cadvisor-kubelet-cri" rel="noopener noreferrer"&gt;LearnKube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TL;DR: This article dissects the Kubernetes metrics pipeline through kubelet, cAdvisor, and CRI to show where your metrics actually come from and what breaks when the defaults change.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This article breaks down how Kubernetes collects container, pod, and node metrics, starting with cAdvisor and the Linux kernel, then shifting to a CRI-native model powered by gRPC.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You’ll see how kubelet exposes this data, what happens when you flip &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt;, why container metrics on &lt;code&gt;/metrics/cadvisor&lt;/code&gt; can be sourced from CRI instead of cAdvisor, and how to trace each metric back to its origin.&lt;/p&gt;

&lt;p&gt;It also explains how kubelet talks to the CRI over gRPC, and why understanding this matters if you rely on Prometheus, Grafana, or any observability stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How Kubernetes Monitoring Layers Stack Up&lt;/li&gt;
&lt;li&gt;Where Metrics Originate&lt;/li&gt;
&lt;li&gt;cgroup v1 with cgroupfs: The Legacy Baseline&lt;/li&gt;
&lt;li&gt;At the crux of how cgroup hierarchy is shaped&lt;/li&gt;
&lt;li&gt;How Kubernetes Creates and Manages the Cgroup Hierarchy&lt;/li&gt;
&lt;li&gt;Kubernetes QoS Classes and cgroup Placement&lt;/li&gt;
&lt;li&gt;Auto-Detecting cgroup Drivers via KubeletCgroupDriverFromCRI&lt;/li&gt;
&lt;li&gt;cAdvisor: Embedded Resource Monitoring in Kubelet&lt;/li&gt;
&lt;li&gt;Kubelet’s Metrics Endpoints&lt;/li&gt;
&lt;li&gt;From cAdvisor to CRI: How Kubelet Collects Metrics Today&lt;/li&gt;
&lt;li&gt;Validating CRI-Based Metrics Collection in Kubelet&lt;/li&gt;
&lt;li&gt;Summary&lt;/li&gt;
&lt;li&gt;References&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Kubernetes Monitoring Layers Stack Up
&lt;/h2&gt;

&lt;p&gt;Kubernetes metrics are the lifeblood of observability in your clusters.&lt;/p&gt;

&lt;p&gt;While tools like Prometheus and Grafana often dominate the monitoring conversation, it's worth understanding the native mechanisms that Kubernetes uses to collect, expose, and leverage metrics before they ever reach those external systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes monitoring works as a multi-layered system that provides insights spanning from bare metal to application workloads.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each layer builds upon the previous one to create a comprehensive picture of your cluster's health.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At the foundation sit node-level metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueded29uyz3ajjzltoua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueded29uyz3ajjzltoua.png" alt=" " width="640" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These reveal the utilization of physical and virtual resources like CPU, memory, and disk I/O.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/prometheus/node_exporter" rel="noopener noreferrer"&gt;Prometheus Node Exporter&lt;/a&gt; is commonly used to collect these fundamental metrics, but they originate from the operating system itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One layer up are Kubernetes component metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o67lhgw09c81dxb43x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o67lhgw09c81dxb43x.png" alt=" " width="640" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These expose the health and performance of core services such as kubelet, kube-proxy, and the API server.&lt;/p&gt;

&lt;p&gt;Metrics like pod startup latency or API request throughput can tell you whether your control plane is running efficiently and reliably.&lt;/p&gt;

&lt;p&gt;Zooming out to the object layer, &lt;strong&gt;API resource metrics, often surfaced by tools like &lt;a href="https://github.com/kubernetes/kube-state-metrics" rel="noopener noreferrer"&gt;&lt;code&gt;kube-state-metrics&lt;/code&gt;&lt;/a&gt;, offer visibility into Kubernetes objects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy9jhj3a24e97a8m9eio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy9jhj3a24e97a8m9eio.png" alt=" " width="640" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They track details such as the number of pods in a namespace, deployment status, or the number of services running across your cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finally, at the top layer are pod and container workload metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsvxft0l5d1zj7n6vbos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsvxft0l5d1zj7n6vbos.png" alt=" " width="640" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These focus on the actual performance of your applications.&lt;/p&gt;

&lt;p&gt;This is where critical signals like CPU throttling come into play.&lt;/p&gt;

&lt;p&gt;For instance, knowing how often a container is blocked from using CPU because it's hit its limit can reveal performance bottlenecks that might otherwise remain hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Metrics Originate
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes defines resource requests and limits, but the kernel does the actual enforcement.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes relies on the Linux kernel’s control groups, known as cgroups, to apply those rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64uth4zgbmw5qr44iioa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64uth4zgbmw5qr44iioa.png" alt=" " width="640" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilgju4imj9jyzzdi658.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilgju4imj9jyzzdi658.png" alt=" " width="640" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj94960z92lz2t28993f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj94960z92lz2t28993f.png" alt=" " width="640" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cgroups are directories in the &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt; virtual filesystem.&lt;/p&gt;

&lt;p&gt;They are a live view of resource allocation and enforcement at the kernel level, exposed as files you can read and write.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These directories define how much CPU time, memory, or I/O bandwidth a process is allowed to consume.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this context, a resource is anything the system can allocate, limit, and monitor: CPU cycles, memory usage, disk throughput, network bandwidth, even the number of process IDs a container can spawn.&lt;/p&gt;

&lt;p&gt;But defining resources is only half of the story.&lt;/p&gt;

&lt;p&gt;That’s where controllers make all the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/resource_management_guide/br-resource_controllers_in_linux_kernel" rel="noopener noreferrer"&gt;A controller is a kernel component&lt;/a&gt; that enforces resource policies and monitors usage for a specific type of resource.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For every resource, there’s a controller in cgroups that governs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes7uvjuq4gn58qosc4u5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes7uvjuq4gn58qosc4u5.png" alt=" " width="640" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wlmirvh0rvetqyjkt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wlmirvh0rvetqyjkt9.png" alt=" " width="640" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The kernel reads these cgroup files, applies the rules they define, and keeps every container within its resource boundaries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let's start a Minikube cluster with containerd as the container runtime, and deploy a Python pod to see this in action:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -c containerd
kubectl create deployment python \
  --image=ghcr.io/learnk8s/python-metrics \
  --port=8080 \
  -- /usr/local/bin/python3 -m http.server 8080

kubectl get po -o wide
NAME                      READY   STATUS    IP
python-66dc9f5c8b-w6x4b   1/1     Running   10.244.0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Linux cgroup API has two versions: cgroup v1 and cgroup v2.&lt;/p&gt;

&lt;p&gt;Each version structures resource management differently.&lt;/p&gt;

&lt;p&gt;To understand why cgroup v2 and the systemd driver matter, it helps to start with the older model first: cgroup v1 with the cgroupfs driver.&lt;/p&gt;

&lt;h2&gt;
  
  
  cgroup v1 with cgroupfs: The Legacy Baseline
&lt;/h2&gt;

&lt;p&gt;In this model, Kubernetes and the container runtime manage cgroups by writing directly to the cgroup filesystem.&lt;/p&gt;

&lt;p&gt;That works, but it also means the hierarchy is shaped by separate controller trees rather than one unified resource tree.&lt;/p&gt;

&lt;p&gt;In cgroup v1, kubelet and the container runtime can still be configured to use either &lt;code&gt;systemd&lt;/code&gt; or &lt;code&gt;cgroupfs&lt;/code&gt;, as long as both sides use the same driver.&lt;/p&gt;

&lt;p&gt;Now let's step into a cgroup v1 environment and see how Kubernetes builds its QoS-based hierarchies when it uses the &lt;code&gt;cgroupfs&lt;/code&gt; driver.&lt;/p&gt;

&lt;p&gt;We’ll delete our existing Minikube cluster and reboot into a system where cgroup v1 is enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;There are several ways to switch a Linux system back to cgroup v1.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You might pass kernel boot parameters like &lt;code&gt;systemd.unified_cgroup_hierarchy=0&lt;/code&gt; or disable cgroup v2 entirely, depending on the environment, whether it’s bare metal, a VM, or WSL2.&lt;/p&gt;

&lt;p&gt;Once the node boots into cgroup v1, Kubernetes automatically detects it and adjusts its resource management behavior.&lt;/p&gt;

&lt;p&gt;First, confirm the system is operating under cgroup v1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stat -fc %T /sys/fs/cgroup/
tmpfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now start a fresh Minikube cluster with the containerd runtime and deploy the Python pod again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -c containerd
kubectl create deployment python \
  --image=ghcr.io/learnk8s/python-metrics \
  --port=8080 \
  -- /usr/local/bin/python3 -m http.server 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then confirm that the pod is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get po -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP
python-66dc9f5c8b-4248r   1/1     Running   0          42s   10.244.0.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we focus on how Kubernetes structures the cgroups under cgroup v1 with the cgroupfs driver.&lt;/p&gt;

&lt;p&gt;Kubernetes enforces QoS-based resource isolation by creating separate hierarchies for each QoS class under every controller.&lt;/p&gt;

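&lt;p&gt;This behavior is controlled by kubelet's &lt;code&gt;cgroupsPerQOS&lt;/code&gt; setting, which we can verify by querying the kubelet configuration through the API server:&lt;br&gt;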
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001 &amp;amp;
curl -X GET http://127.0.0.1:8001/api/v1/nodes/minikube/proxy/configz | jq . | grep -i qos
"cgroupsPerQOS": true,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per-QoS hierarchy creation is enabled, but which driver is kubelet using to manage these hierarchies?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /var/lib/kubelet/config.yaml | grep -i cgroupDriver"
cgroupDriver: cgroupfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In cgroup v1 with &lt;code&gt;cgroupsPerQOS: true&lt;/code&gt;, kubelet’s use of the &lt;code&gt;cgroupfs&lt;/code&gt; driver results in Kubernetes creating and managing separate cgroup subtrees for QoS classes under each controller.&lt;/p&gt;

&lt;p&gt;Let's inspect the CPU controller directory structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/"
drwxr-xr-x 5 root root 0 Mar 20 12:10 besteffort
drwxr-xr-x 7 root root 0 Mar 20 12:11 burstable
drwxr-xr-x 3 root root 0 Mar 20 12:12 guaranteed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each QoS class gets its own directory under each controller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Since our Python pod was deployed without any resource requests or limits, we can locate it under the &lt;code&gt;besteffort&lt;/code&gt; QoS class:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/besteffort/"
drwxr-xr-x 4 root root 0 Mar 20 03:51 pod23e59e27-abe5-4529-bf9c-581516ae0c0b
drwxr-xr-x 4 root root 0 Mar 20 03:51 pod9f874003-a948-425d-a072-f389dc21bdff
drwxr-xr-x 4 root root 0 Mar 20 03:51 podc1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We find multiple pod directories, named by their UID.&lt;/p&gt;

&lt;p&gt;To correlate a pod directory with the actual Python pod, let's retrieve the pod's UID from the Kubernetes API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-4248r -o jsonpath='{.metadata.uid}'
c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matches the directory &lt;code&gt;podc1d8cd50-b50a-4b3c-a33d-8963242c60ef&lt;/code&gt; under the &lt;code&gt;besteffort&lt;/code&gt; class.&lt;/p&gt;
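
&lt;p&gt;The path follows a predictable pattern: controller, then QoS class, then &lt;code&gt;pod&lt;/code&gt; plus the UID. As a minimal sketch, assuming cgroup v1 with the &lt;code&gt;cgroupfs&lt;/code&gt; driver and the layout seen in the listings above, a small helper can rebuild it (the function name &lt;code&gt;pod_cgroup_dir&lt;/code&gt; is made up for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: build the cgroup v1 (cgroupfs driver) directory
# for a pod from the controller name, QoS class, and pod UID.
pod_cgroup_dir() {
  controller="$1"
  qos="$2"
  uid="$3"
  echo "/sys/fs/cgroup/${controller}/kubepods/${qos}/pod${uid}"
}

# prints /sys/fs/cgroup/cpu/kubepods/besteffort/podc1d8cd50-b50a-4b3c-a33d-8963242c60ef
pod_cgroup_dir cpu besteffort c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swapping &lt;code&gt;cpu&lt;/code&gt; for &lt;code&gt;memory&lt;/code&gt; gives the matching directory in the memory controller's tree, which is exactly the split that cgroup v1 imposes.&lt;/p&gt;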

&lt;p&gt;Inside this pod directory, each container has its own cgroup, named after the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/besteffort/podc1d8cd50-b50a-4b3c-a33d-8963242c60ef/"
-rw-r--r-- 1 root root 0 Mar 20 12:16 cpu.shares
-rw-r--r-- 1 root root 0 Mar 20 12:16 cpu.cfs_quota_us
drwxr-xr-x 2 root root 0 Mar 20 03:52 ef455b35bf7e2afa0942e25b58cd10858d40ed1d97fffe7f0b6a664d2e64aa54
-rw-r--r-- 1 root root 0 Mar 20 04:22 tasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, we can inspect the pod’s memory limit in the memory controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /sys/fs/cgroup/memory/kubepods/besteffort/\
podc1d8cd50-b50a-4b3c-a33d-8963242c60ef/\
memory.limit_in_bytes"

9223372036854771712
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This very large value is an effectively unlimited memory ceiling, which is expected for a BestEffort pod.&lt;/p&gt;
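
&lt;p&gt;The number is not arbitrary: it equals the largest signed 64-bit integer rounded down to a whole number of memory pages. The arithmetic below is only a sketch and assumes 4096-byte pages, which is an assumption about the node rather than something shown above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Largest signed 64-bit integer, rounded down to a multiple of the page size.
max_int64=9223372036854775807
page_size=4096   # assumption: 4 KiB pages

# prints 9223372036854771712
echo $(( max_int64 / page_size * page_size ))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;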

&lt;p&gt;At this point, kubelet decides where the pod belongs in the QoS hierarchy, the container runtime helps create and configure the container cgroups, and the kernel enforces the resulting cgroup settings for the processes attached to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  At the crux of how cgroup hierarchy is shaped
&lt;/h2&gt;

&lt;p&gt;In cgroup v1, each controller operates in its own separate hierarchy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7f918dfwicy8y7jp7dt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7f918dfwicy8y7jp7dt.png" alt=" " width="640" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we list the mounted cgroup controllers in cgroup v1, we see each one mounted independently as its own filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "mount | grep cgroup"

cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,relatime,pids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This indicates that each controller, whether CPU, memory, or pids, has its own mount point and hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can confirm this separation by checking &lt;code&gt;/proc/cgroups&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /proc/cgroups"

#subsys_name    hierarchy    num_cgroups    enabled
cpuset          1            34             1
cpu             2            52             1
cpuacct         3            34             1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we check the filesystem type of &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt; in cgroup v1, it reports &lt;code&gt;tmpfs&lt;/code&gt; instead of &lt;code&gt;cgroup2fs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "stat -fc %T /sys/fs/cgroup/"

tmpfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top level of the cgroup filesystem looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/"

drwxr-xr-x 15 root root   0 Feb 23 05:17 blkio
drwxr-xr-x 15 root root   0 Feb 23 05:17 cpu
drwxr-xr-x  2 root root  40 Feb 23 05:17 cpu,cpuacct
drwxr-xr-x 23 root root   0 Feb 23 05:17 cpuacct
drwxr-xr-x 23 root root   0 Feb 23 05:17 cpuset
drwxr-xr-x 18 root root   0 Feb 23 05:17 devices
drwxr-xr-x 23 root root   0 Feb 23 05:17 freezer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core limitation of cgroup v1: CPU, memory, pids, and other controllers can each have their own hierarchy, so resource management is split across multiple trees.&lt;/p&gt;

&lt;p&gt;cgroup v2 fixes that part by moving controllers into a single unified hierarchy.&lt;/p&gt;

&lt;p&gt;Now let's switch to a cgroup v2 system and examine the structure of the cgroup filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/"

-r--r--r-- 1 root root 0 Apr 28 10:51 cgroup.controllers
-r--r--r-- 1 root root 0 Apr 28 10:58 cgroup.stat
-rw-r--r-- 1 root root 0 Apr 28 10:51 memory.high
drwxr-xr-x 5 root root 0 Apr 28 10:51 kubepods.slice
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;All resource controllers are managed together in a single tree rooted at &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To confirm that cgroup v2 is active, we can inspect the mounted cgroup filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "mount | grep cgroup"

cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can list the active controllers that the kernel has attached to this unified hierarchy by reading &lt;code&gt;/proc/cgroups&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In cgroup v2, all controllers operate within a single hierarchy, and the hierarchy column reflects this by showing &lt;code&gt;0&lt;/code&gt; for each controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /proc/cgroups"

#subsys_name    hierarchy       num_cgroups     enabled
cpu     0       208     1
cpuacct 0       208     1
blkio   0       208     1
devices 0       208     1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
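
&lt;p&gt;The hierarchy column can also be checked mechanically. The sketch below feeds a few sample rows, copied from the two &lt;code&gt;/proc/cgroups&lt;/code&gt; outputs in this article, through &lt;code&gt;awk&lt;/code&gt;. The function name &lt;code&gt;hierarchy_mode&lt;/code&gt; is hypothetical, and the check is a rough heuristic rather than the way kubelet itself detects the cgroup version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: read /proc/cgroups-style text on stdin and report
# "v2" when every controller sits in hierarchy 0, otherwise "v1".
hierarchy_mode() {
  awk '/^#/ { next } $2 != 0 { v1 = 1 } END { print (v1 ? "v1" : "v2") }'
}

# Sample rows from the cgroup v1 output: prints v1
printf 'cpuset 1 34 1\ncpu 2 52 1\ncpuacct 3 34 1\n' | hierarchy_mode

# Sample rows from the cgroup v2 output: prints v2
printf 'cpu 0 208 1\ncpuacct 0 208 1\nblkio 0 208 1\n' | hierarchy_mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside the node, the same function could read the live file with &lt;code&gt;cat /proc/cgroups | hierarchy_mode&lt;/code&gt;.&lt;/p&gt;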



&lt;p&gt;To verify the filesystem type for &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;, we can run the &lt;code&gt;stat&lt;/code&gt; utility.&lt;/p&gt;

&lt;p&gt;In cgroup v2, this command reports &lt;code&gt;cgroup2fs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "stat -fc %T /sys/fs/cgroup/"

cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;If it shows &lt;code&gt;cgroup2fs&lt;/code&gt;, we know we’re running cgroup v2.&lt;/strong&gt;&lt;/p&gt;
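
&lt;p&gt;The two &lt;code&gt;stat&lt;/code&gt; checks can be folded into one small helper. This is a minimal sketch: the function name &lt;code&gt;cgroup_version&lt;/code&gt; is made up for illustration, and it only recognizes the two filesystem types shown in this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: map a filesystem type, as printed by
# "stat -fc %T /sys/fs/cgroup/", to a cgroup version label.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

cgroup_version cgroup2fs   # prints v2
cgroup_version tmpfs       # prints v1

# On a node, pass in the live value instead:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;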

&lt;p&gt;So cgroup v2 cleans up the kernel-side hierarchy, but it does not answer the ownership question by itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhkoajifbkuyg20yazw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhkoajifbkuyg20yazw1.png" alt=" " width="640" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a systemd-based node, Kubernetes still needs to decide who owns and manages the cgroup tree: systemd or direct filesystem writes through cgroupfs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;cgroup v1 is now only relevant for legacy systems, and its days are officially numbered.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Modern distributions such as &lt;a href="https://discourse.ubuntu.com/t/performance/29416" rel="noopener noreferrer"&gt;Ubuntu 22.04+&lt;/a&gt;, &lt;a href="https://fedoraproject.org/wiki/Changes/CGroupsV2" rel="noopener noreferrer"&gt;Fedora 31+&lt;/a&gt;, and &lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/9.0_release_notes/new-features" rel="noopener noreferrer"&gt;RHEL 9+&lt;/a&gt; enable cgroup v2 by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes has supported cgroup v2 as stable since v1.25, and cgroup v1 has been officially deprecated since Kubernetes v1.35 as part of &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5573-remove-cgroup-v1/README.md" rel="noopener noreferrer"&gt;KEP-5573&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Starting with Kubernetes v1.35, kubelet no longer starts on cgroup v1 nodes by default unless &lt;code&gt;failCgroupV1&lt;/code&gt; is explicitly set to &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you’re running production clusters that still use cgroup v1, you should plan a migration to cgroup v2 and define an upgrade or rollback strategy in advance.&lt;/p&gt;

&lt;p&gt;So far, we've seen how cgroup v1 and v2 shape the filesystem layout, and we've learned how to verify which mode the node is using.&lt;/p&gt;

&lt;p&gt;But to understand how Kubernetes actually turns that kernel structure into pod and container boundaries, we now need to look at the two decisions kubelet makes next: which cgroup manager it initializes, and which cgroup driver owns the tree.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Kubernetes Creates and Manages the Cgroup Hierarchy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;On a Kubernetes node, kubelet and the container runtime collaborate to build and maintain the cgroup hierarchy used for enforcing pod-level resource constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before either component can create or manage any cgroups, kubelet needs to resolve one fundamental question: &lt;em&gt;is the node running cgroup v1 or cgroup v2?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That answer comes early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At startup, kubelet queries the kernel to determine the active cgroup mode.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it detects cgroup v2, it initializes a v2-specific manager built for the unified hierarchy.&lt;/p&gt;

&lt;p&gt;If the node is using cgroup v1, it falls back to a legacy manager.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This decision locks in the way kubelet will interact with kernel-level resource controls for the lifetime of the process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But the cgroup version is only half the equation.&lt;/p&gt;

&lt;p&gt;The other part is who is responsible for actually managing the cgroup tree within &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is called the cgroup driver.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubelet supports two drivers: &lt;code&gt;systemd&lt;/code&gt; and &lt;code&gt;cgroupfs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5uw62996wkyivuib8p7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5uw62996wkyivuib8p7.png" alt=" " width="640" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It picks one or the other, never both at the same time.&lt;/p&gt;

&lt;p&gt;In cgroup v2, the unified hierarchy makes the &lt;code&gt;systemd&lt;/code&gt; cgroup driver the recommended choice on systemd-based Linux distributions.&lt;/p&gt;

&lt;p&gt;Kubelet can still be configured to use &lt;code&gt;cgroupfs&lt;/code&gt;, but Kubernetes recommends &lt;a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/" rel="noopener noreferrer"&gt;avoiding&lt;/a&gt; a setup where systemd and Kubernetes manage cgroups separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the driver is systemd, kubelet hands cgroup creation to systemd; instead of writing directories itself, it generates logical slice names like &lt;code&gt;kubepods.slice&lt;/code&gt; or &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These slices represent pod resource groups.&lt;/p&gt;

&lt;p&gt;After generating the slice names, kubelet asks systemd to instantiate and manage the cgroup structure beneath &lt;code&gt;/sys/fs/cgroup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the part cgroup v2 does not solve alone: ownership of the tree needs to be consistent.&lt;/p&gt;

&lt;p&gt;From that point on, all resource controls for pods are expressed through systemd’s unit model.&lt;/p&gt;
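
&lt;p&gt;To make the naming concrete, here is a rough sketch of how a pod-level slice name could be derived. It assumes the convention used with the systemd driver, where dashes in the pod UID are replaced by underscores because systemd reads a dash in a slice name as a level separator. The function name &lt;code&gt;pod_slice_name&lt;/code&gt; is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: derive a pod slice name from a QoS class and pod UID.
# Assumption: dashes in the UID become underscores under the systemd driver.
pod_slice_name() {
  qos="$1"
  uid="$(echo "$2" | tr '-' '_')"
  echo "kubepods-${qos}-pod${uid}.slice"
}

# prints kubepods-besteffort-podc1d8cd50_b50a_4b3c_a33d_8963242c60ef.slice
pod_slice_name besteffort c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This sketch only covers the &lt;code&gt;besteffort&lt;/code&gt; and &lt;code&gt;burstable&lt;/code&gt; classes, which each get their own parent slice under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;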

&lt;p&gt;&lt;em&gt;Why systemd?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because when you boot a modern Linux system, systemd is the first userspace process the kernel runs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It becomes PID 1.&lt;/p&gt;

&lt;p&gt;As PID 1, systemd takes ownership of process supervision and resource control for the entire system.&lt;/p&gt;

&lt;p&gt;Rather than using shell scripts, systemd defines behavior through typed units.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@sebastiancarlos/systemds-nuts-and-bolts-0ae7995e45d3" rel="noopener noreferrer"&gt;Units are structured configuration objects like &lt;code&gt;.service&lt;/code&gt;, &lt;code&gt;.scope&lt;/code&gt;, and &lt;code&gt;.slice&lt;/code&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A slice is how systemd partitions the system for resource control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Kubernetes, these slices are created automatically: kubelet asks systemd for them and organizes them by pod QoS class.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foukuf89suljmyyh2peve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foukuf89suljmyyh2peve.png" alt=" " width="640" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of slices like namespaces for CPU and memory budgets, managed for you behind the scenes.&lt;/p&gt;

&lt;p&gt;What matters is you can apply limits at the slice level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services are the more familiar systemd unit type.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4unkuizghqi6ufmq0oyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4unkuizghqi6ufmq0oyc.png" alt=" " width="640" height="627"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;.service&lt;/code&gt; represents a process that systemd starts and supervises directly.&lt;/p&gt;

&lt;p&gt;On a Kubernetes node, kubelet and containerd usually run as services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet.service&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;containerd.service&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These services live under &lt;code&gt;system.slice&lt;/code&gt;, not under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That distinction matters: kubelet and containerd are host daemons that coordinate pod placement and container startup, but the containers themselves do not become children of &lt;code&gt;containerd.service&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The actual container processes are placed into Kubernetes pod cgroups under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Scopes are different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scopes are used when systemd needs to manage processes it did not start itself: another launcher creates the process, and systemd adopts it into a unit so it can still control its resources.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqci10xkgf6fu0melzrus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqci10xkgf6fu0melzrus.png" alt=" " width="640" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, when the runtime launches a container, systemd can still take over and manage it.&lt;/p&gt;

&lt;p&gt;It does this by wrapping the container process in a &lt;code&gt;.scope&lt;/code&gt; unit.&lt;/p&gt;

&lt;p&gt;The scope is named after the runtime and the container ID (such as &lt;code&gt;cri-containerd-&amp;lt;container-id&amp;gt;.scope&lt;/code&gt;) and is placed inside the pod’s slice, whose position in the hierarchy depends on the pod’s quality of service (QoS) class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But this only works if both kubelet and the container runtime agree on the cgroup driver.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If kubelet generates systemd slice names but containerd uses cgroupfs, the contract breaks.&lt;/p&gt;

&lt;p&gt;If the cgroup driver is cgroupfs, kubelet goes back to the older model: direct filesystem ownership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubelet interacts with the kernel’s cgroup API through the filesystem to create and manage cgroup directories.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s step back into our Minikube cluster running cgroup v2 with containerd as the runtime.&lt;/p&gt;

&lt;p&gt;Containerd handles its end of the driver agreement with the &lt;code&gt;SystemdCgroup&lt;/code&gt; parameter in its &lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md" rel="noopener noreferrer"&gt;configuration file&lt;/a&gt;, &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /etc/containerd/config.toml | grep -i -C2 'SystemdCgroup'"
runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

  [plugins."io.containerd.grpc.v1.cri".cni]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the config version 2 format used by containerd 1.x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once kubelet and the runtime align on both the cgroup version and the driver, kubelet can safely take ownership of building the pod-level cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;
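
&lt;p&gt;The driver agreement can be checked mechanically. The following is a minimal sketch, not a kubelet or containerd API: it assumes you have already copied the text of &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt; and the kubelet configuration file into strings, and the function names &lt;code&gt;containerd_uses_systemd&lt;/code&gt;, &lt;code&gt;kubelet_driver&lt;/code&gt;, and &lt;code&gt;drivers_agree&lt;/code&gt; are hypothetical helpers introduced for illustration.&lt;/p&gt;

```python
import re


def containerd_uses_systemd(config_toml: str) -> bool:
    """Return True when the text contains a 'SystemdCgroup = true' line."""
    match = re.search(
        r"^\s*SystemdCgroup\s*=\s*(true|false)\s*$", config_toml, re.MULTILINE
    )
    # Assumption for this sketch: a missing key is treated as false (cgroupfs).
    return bool(match) and match.group(1) == "true"


def kubelet_driver(config_yaml: str) -> str:
    """Return the top-level cgroupDriver value from kubelet config text."""
    match = re.search(r"^cgroupDriver:\s*(\S+)\s*$", config_yaml, re.MULTILINE)
    # Assumption for this sketch: report "unset" instead of guessing a default.
    return match.group(1) if match else "unset"


def drivers_agree(config_toml: str, config_yaml: str) -> bool:
    """Compare the runtime's driver with the one kubelet is configured to use."""
    runtime = "systemd" if containerd_uses_systemd(config_toml) else "cgroupfs"
    return runtime == kubelet_driver(config_yaml)


containerd_text = """
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
"""
kubelet_text = "cgroupDriver: systemd\n"
print(drivers_agree(containerd_text, kubelet_text))
```

&lt;p&gt;A line-based regular expression is used here instead of a TOML or YAML parser only to keep the sketch free of dependencies; a real check would parse both files properly.&lt;/p&gt;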

&lt;p&gt;&lt;em&gt;But in systemd with cgroup v2, which scope unit goes into which systemd slice?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s determined by the pod’s QoS class, which kubelet calculates based on the pod’s resource requests and limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes QoS Classes and cgroup Placement
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Based on the pod’s resource requests and limits, &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/" rel="noopener noreferrer"&gt;Kubernetes assigns it to one of three Quality-of-Service (QoS) classes,&lt;/a&gt; which influences where the pod is placed in the cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pod is classified as &lt;strong&gt;Guaranteed&lt;/strong&gt; only when every container has CPU and memory requests and limits set, and each request exactly matches its corresponding limit.&lt;/li&gt;
&lt;li&gt;A pod is &lt;strong&gt;Burstable&lt;/strong&gt; when at least one of its containers defines a CPU or memory request or limit but the pod does not meet the stricter Guaranteed rules.&lt;/li&gt;
&lt;li&gt;A pod is &lt;strong&gt;BestEffort&lt;/strong&gt; when none of its containers define CPU or memory requests or limits.&lt;/li&gt;
&lt;/ul&gt;
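
&lt;p&gt;The three rules above can be sketched as a small classifier. This is a simplified illustration, not kubelet’s implementation: the input shape (a list of dictionaries with &lt;code&gt;requests&lt;/code&gt; and &lt;code&gt;limits&lt;/code&gt;) and the function name &lt;code&gt;qos_class&lt;/code&gt; are assumptions, and quantities are compared as plain strings, so equivalent values written differently (for example &lt;code&gt;100m&lt;/code&gt; and &lt;code&gt;0.1&lt;/code&gt;) would not be treated as equal.&lt;/p&gt;

```python
def qos_class(containers):
    """Classify a pod from its containers' CPU and memory settings.

    Each item is a dict such as
    {"requests": {"cpu": "100m"}, "limits": {"memory": "64Mi"}}
    (a hypothetical, simplified shape used only in this sketch).
    """
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            request = requests.get(resource)
            limit = limits.get(resource)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs both values present and identical.
            if request is None or limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


pinned = {"requests": {"cpu": "1", "memory": "1Gi"},
          "limits": {"cpu": "1", "memory": "1Gi"}}
print(qos_class([pinned]))                               # Guaranteed
print(qos_class([pinned, {"requests": {"cpu": "50m"}}]))  # Burstable
print(qos_class([{}]))                                    # BestEffort
```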

&lt;p&gt;This QoS-to-cgroup hierarchy behavior is controlled by kubelet’s &lt;code&gt;--cgroups-per-qos&lt;/code&gt; flag, which &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=%2D%2Dcgroups%2Dper%2Dqos%C2%A0%C2%A0%C2%A0%C2%A0%C2%A0Default%3A%20true" rel="noopener noreferrer"&gt;defaults&lt;/a&gt; to &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When &lt;code&gt;cgroupsPerQOS: true&lt;/code&gt; is set (the kubelet configuration file equivalent of the flag) and systemd manages cgroups on a cgroup v2 node, pods are organized under &lt;code&gt;kubepods.slice&lt;/code&gt; and further into slices based on QoS classes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's inspect the root QoS directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -d /sys/fs/cgroup/kubepods.slice/*/"
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/
/sys/fs/cgroup/kubepods.slice/kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Notice the third entry.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It is not a QoS slice like &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt; or &lt;code&gt;kubepods-burstable.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a pod-level cgroup.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pod...&lt;/code&gt; part maps back to the Kubernetes pod UID &lt;code&gt;ed2df55a-639e-4beb-aee3-5db422c35910&lt;/code&gt;, with the hyphens written as underscores.&lt;/p&gt;

&lt;p&gt;Let's verify which pod owns that UID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,UID:.metadata.uid' \
  | grep ed2df55a
kube-system   kindnet-qkqvh   ed2df55a-639e-4beb-aee3-5db422c35910
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the third cgroup entry belongs to the &lt;code&gt;kindnet-qkqvh&lt;/code&gt; pod in the &lt;code&gt;kube-system&lt;/code&gt; namespace.&lt;/p&gt;

&lt;p&gt;Now let's verify its QoS class from the Kubernetes API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod kindnet-qkqvh -n kube-system -o jsonpath='{.status.qosClass}{"\n"}'
Guaranteed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we print the QoS class and UID together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod kindnet-qkqvh -n kube-system -o jsonpath='QoS={.status.qosClass}{"\n"}UID={.metadata.uid}{"\n"}'
QoS=Guaranteed
UID=ed2df55a-639e-4beb-aee3-5db422c35910
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



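&lt;p&gt;&lt;strong&gt;The UID matches the cgroup directory name, and Kubernetes classifies the pod as &lt;code&gt;Guaranteed&lt;/code&gt;, which is why its slice sits directly under &lt;code&gt;kubepods.slice&lt;/code&gt; rather than inside a QoS-specific child slice.&lt;/strong&gt;&lt;/p&gt;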

&lt;p&gt;Now let's look inside that pod cgroup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/kubepods.slice/kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice/"
cri-containerd-7ae5ffd3996a6ac09031cbf283d6bd9727a24bc723a06e76141132a8e57f1716.scope
cri-containerd-d24246f29f54f7adced123bc6194d9e0f15fd3a15c54326cd8c96d39961760c0.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two &lt;code&gt;cri-containerd-*.scope&lt;/code&gt; entries are the container-level systemd scope units running inside the &lt;code&gt;kindnet-qkqvh&lt;/code&gt; pod.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We have traced a &lt;code&gt;Guaranteed&lt;/code&gt; pod all the way down from the Kubernetes API to its pod slice and container scopes on disk.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Simplified to the branch we just inspected, the mapping looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup/
└── kubepods.slice
    └── kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice
        ├── cri-containerd-7ae5ffd3996a6ac09031cbf283d6bd9727a24bc723a06e76141132a8e57f1716.scope
        └── cri-containerd-d24246f29f54f7adced123bc6194d9e0f15fd3a15c54326cd8c96d39961760c0.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Now let’s do the same for our Python workload, which lands in a different part of the hierarchy because it has a different QoS class.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inside the root slice, systemd further organizes pods into separate slices based on their QoS classes.&lt;/p&gt;

&lt;p&gt;Since our Python pod was deployed without any CPU or memory requests or limits, its resources are managed under &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's confirm the QoS classification of the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.status.qosClass}'
BestEffort
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's map our Python pod and its containers to their systemd-managed cgroup slices and scopes.&lt;/p&gt;

&lt;p&gt;First, we retrieve the pod UID, which appears in the slice name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.metadata.uid}'
b60baa0b-1e66-4990-8670-93c5919f09cb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Each pod gets its own slice under its QoS slice. Because systemd uses hyphens in slice names to encode nesting, the hyphens in the pod UID are replaced with underscores in the pod slice name (&lt;code&gt;kubepods-{qos class}-pod{pod UID with underscores}.slice&lt;/code&gt;).&lt;/strong&gt;&lt;/p&gt;
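
&lt;p&gt;The naming rule can be written down as a short helper. This is a sketch that assumes the layout observed on this node (systemd driver, cgroup v2, default &lt;code&gt;kubepods.slice&lt;/code&gt; root); the function name &lt;code&gt;pod_slice_path&lt;/code&gt; is hypothetical and other node configurations may use different paths.&lt;/p&gt;

```python
def pod_slice_path(uid: str, qos: str, root: str = "/sys/fs/cgroup") -> str:
    """Build the pod-level cgroup directory for a pod UID and QoS class."""
    # Hyphens mean nesting in systemd slice names, so the UID uses underscores.
    safe_uid = uid.replace("-", "_")
    if qos == "Guaranteed":
        # Guaranteed pods sit directly under kubepods.slice.
        return f"{root}/kubepods.slice/kubepods-pod{safe_uid}.slice"
    tier = qos.lower()  # "burstable" or "besteffort"
    return (
        f"{root}/kubepods.slice/kubepods-{tier}.slice/"
        f"kubepods-{tier}-pod{safe_uid}.slice"
    )


print(pod_slice_path("b60baa0b-1e66-4990-8670-93c5919f09cb", "BestEffort"))
print(pod_slice_path("ed2df55a-639e-4beb-aee3-5db422c35910", "Guaranteed"))
```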

&lt;p&gt;List the available pod slices under &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -d /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/*/"
/sys/fs/cgroup/.../kubepods-besteffort-pod740242e7_85e5_4369_a8a0_d6101719e386.slice/
/sys/fs/cgroup/.../kubepods-besteffort-pod857495d4_07b5_45a2_895b_0298f68797d8.slice/
/sys/fs/cgroup/.../kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last pod slice corresponds to our Python pod (its UID matches &lt;code&gt;b60baa0b-1e66-4990-8670-93c5919f09cb&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The other entries belong to other BestEffort pods on the node, such as kube-system pods like kube-proxy that define no CPU or memory requests or limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Within this pod slice, systemd organizes each container into separate &lt;code&gt;.scope&lt;/code&gt; units.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These scopes are named after the containerd runtime and container ID.&lt;/p&gt;

&lt;p&gt;List the contents of the specific pod slice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls /sys/fs/cgroup/kubepods.slice/\
kubepods-besteffort.slice/kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/ | grep scope"
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope
cri-containerd-b8609ccf36f85b5a4fc652317358950861a6f0a538e6c4b4c4243241189fbc11.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The long hex strings above are the container IDs, as assigned by containerd.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each ID is embedded in the name of the &lt;code&gt;.scope&lt;/code&gt; unit created for that container.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So now the question is: which one of these is your Python container?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We query containerd to match the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl ps --name python"
CONTAINER           IMAGE          NAME              POD ID            POD
b21e881ca9d62       bdbec6b439339  python-metrics    b8609ccf36f85     python-66dc9f5c8b-2kktd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container ID &lt;code&gt;b21e881ca9d62&lt;/code&gt; matches the first &lt;code&gt;.scope&lt;/code&gt; unit above.&lt;/p&gt;

&lt;p&gt;The other one (&lt;code&gt;b8609ccf36f85...&lt;/code&gt;) is the pod sandbox, the pause container we will inspect shortly. First, let's list the cgroup interface files inside the Python container's scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "\
ls -la \
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/\
kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/\
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope"
cpu.max
hugetlb.2MB.events
memory.high
memory.stat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the hierarchy for the Python pod looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup/
└── kubepods.slice
    └── kubepods-besteffort.slice
        └── kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice
            ├── cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope
            │   └── python-metrics container
            └── cri-containerd-b8609ccf36f85b5a4fc652317358950861a6f0a538e6c4b4c4243241189fbc11.scope
                └── pod sandbox / pause container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;We can now dig into its cgroup resource metrics like memory usage statistics.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /sys/fs/cgroup/kubepods.slice/\
kubepods-besteffort.slice/kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/\
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope/\
memory.stat" | head -5
anon 9601024
file 13496320
kernel 1056768
kernel_stack 16384
pagetables 94208
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
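
&lt;p&gt;The &lt;code&gt;memory.stat&lt;/code&gt; file is a flat list of key and value pairs, one per line, which makes it easy to read programmatically. Here is a minimal sketch that parses the text shown above; the function name &lt;code&gt;parse_memory_stat&lt;/code&gt; is a hypothetical helper, and the sample string is copied from the output rather than read from a live node.&lt;/p&gt;

```python
def parse_memory_stat(text: str) -> dict:
    """Turn 'key value' lines from a cgroup v2 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


sample = """anon 9601024
file 13496320
kernel 1056768
kernel_stack 16384
pagetables 94208
"""
stats = parse_memory_stat(sample)
print(stats["anon"], stats["file"])
```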



&lt;p&gt;Great!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But what about the other scope?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this setup, even a Pod with a single application container has two active container scopes under the pod slice: one for the application container, one for the pause container.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/kubernetes-network-packets"&gt;The pause container is a sandbox environment that sets up the network namespace, IP address, and IPC for the pod.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the sandbox is running and holding that shared environment, Kubernetes starts the Python container inside that namespace.&lt;/p&gt;

&lt;p&gt;Let’s inspect the pod sandbox &lt;code&gt;b8609ccf36f85&lt;/code&gt; to confirm the pause container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl inspectp b8609ccf36f85 | grep image"
"image": "registry.k8s.io/pause:3.10.1",
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The pause container maps to the other &lt;code&gt;.scope&lt;/code&gt; unit, but how can we verify it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We inspect the pod sandbox to retrieve the pause container's PID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl inspectp b8609ccf36f85 | grep -E '\"pid\"'"
"pid": "CONTAINER",
    "pid": 1647,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PID &lt;code&gt;1647&lt;/code&gt; corresponds to the pause container.&lt;/p&gt;

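&lt;p&gt;The pause process is started by a &lt;code&gt;containerd-shim-runc-v2&lt;/code&gt; parent, PID &lt;code&gt;1603&lt;/code&gt; on this node, so we filter the process list for both PIDs:&lt;br&gt;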
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo ps -e -o pid,ppid,cmd | grep -E '\\b1603\\b|\\b1647\\b'"
1603       1 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id b8609... -address /run/containerd/containerd.sock
1647    1603 /pause
1694    1603 /usr/local/bin/python3 -m http.server 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second scope is the pause container.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;PID &lt;code&gt;1647&lt;/code&gt; is the &lt;code&gt;/pause&lt;/code&gt; process, and it shares the same &lt;code&gt;containerd-shim-runc-v2&lt;/code&gt; parent, PID &lt;code&gt;1603&lt;/code&gt;, with the Python process &lt;code&gt;1694&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Auto-Detecting cgroup Drivers via KubeletCgroupDriverFromCRI
&lt;/h2&gt;

&lt;p&gt;Kubernetes addressed some of the coordination challenges with the &lt;code&gt;KubeletCgroupDriverFromCRI&lt;/code&gt; feature gate, &lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/" rel="noopener noreferrer"&gt;introduced&lt;/a&gt; as alpha in v1.28 and graduated to GA in v1.34.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At startup, kubelet asks the runtime which cgroup driver to use through the CRI &lt;a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers" rel="noopener noreferrer"&gt;&lt;code&gt;RuntimeConfig&lt;/code&gt; RPC&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Kubernetes 1.34+, the feature gate no longer needs to be set explicitly.&lt;/p&gt;

&lt;p&gt;If the runtime does not implement the &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC, kubelet falls back to the &lt;code&gt;cgroupDriver&lt;/code&gt; value in its own configuration, but only in Kubernetes versions that still support this &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#:~:text=for%20more%20details.-,KubeletCgroupDriverFromCRI,-Enable%20detection%20of" rel="noopener noreferrer"&gt;fallback&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's start a new cluster using CRI-O as the container runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -p test-driverfromcri --container-runtime=cri-o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we inspect the &lt;code&gt;/var/lib/kubelet/config.yaml&lt;/code&gt; file, the kubelet config still shows the configured fallback driver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -p test-driverfromcri -- "sudo cat /var/lib/kubelet/config.yaml | grep -A2 cgroupDriver"
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this cluster, the kubelet logs show that the runtime did not implement the &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC, so kubelet fell back to the configured &lt;code&gt;cgroupDriver&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -p test-driverfromcri -- "sudo journalctl -u kubelet | grep -E 'RuntimeConfig|CRI implementation'"
"RuntimeConfig from runtime service failed" err="rpc error: code = Unimplemented desc = unknown method RuntimeConfig"
"CRI implementation should be updated to support RuntimeConfig. Falling back to using cgroupDriver from kubelet config."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finally, once kubelet settles on a cgroup driver, it uses that driver consistently when placing pods and containers into the node’s cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The container runtime then passes the resulting cgroup placement into the OCI runtime layer, where &lt;code&gt;runc/libcontainer&lt;/code&gt; applies it by writing to the kernel’s cgroup interfaces.&lt;/p&gt;

&lt;p&gt;Whether the hierarchy is represented through systemd slices and scopes or raw cgroupfs directories, the end result is the same: the Linux kernel enforces the configured CPU, memory, and other resource limits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw169dgvkvlgcui844b9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw169dgvkvlgcui844b9u.png" alt=" " width="640" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkecqswil7k88c5gxr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkecqswil7k88c5gxr9.png" alt=" " width="640" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have seen both sides: cgroup v1 with direct filesystem-managed hierarchies, and cgroup v2 with systemd-managed slices and scopes.&lt;/p&gt;

&lt;p&gt;But enforcement is only half of the story.&lt;/p&gt;

&lt;p&gt;The kernel exposes raw counters, limits, and events through the cgroup filesystem, but Kubernetes still needs a component that can read those low-level files and turn them into useful container and pod-level metrics.&lt;/p&gt;

&lt;p&gt;That is the visibility gap cAdvisor was designed to fill.&lt;/p&gt;

&lt;h2&gt;
  
  
  cAdvisor: Embedded Resource Monitoring in Kubelet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Container Advisor, or cAdvisor, is the default kubelet-integrated path for &lt;a href="https://kubernetes.io/docs/reference/instrumentation/node-metrics" rel="noopener noreferrer"&gt;collecting&lt;/a&gt; container resource usage statistics on Kubernetes nodes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs as an embedded component inside the kubelet process and is initialized automatically when kubelet starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once initialized, cAdvisor reads low-level resource data from the cgroup filesystem and attaches labels such as &lt;code&gt;pod&lt;/code&gt;, &lt;code&gt;namespace&lt;/code&gt;, &lt;code&gt;container&lt;/code&gt;, and &lt;code&gt;image&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubelet then exposes the collected metrics through its own HTTP endpoints: the Summary API and cAdvisor metrics endpoint.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is enabled and the container runtime supports stats through CRI, kubelet fetches pod and container metrics from the runtime instead of cAdvisor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubelet’s Metrics Endpoints
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kubelet exposes several distinct metrics and stats endpoints on its HTTP server.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each serves a specific purpose and differs in data granularity, format, and source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint exposes high-resolution container metrics in Prometheus format.&lt;/strong&gt;&lt;/p&gt;

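&lt;p&gt;By default, these metrics come from cAdvisor, and kubelet passes them through as-is to the scraper; with &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; enabled, they can be sourced from the CRI runtime instead.&lt;/p&gt;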

&lt;p&gt;Prometheus typically scrapes this endpoint to collect detailed per-container metrics such as CPU time, memory usage, and I/O statistics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These metrics are useful for low-level monitoring, fine-grained alerting, and capacity planning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To query the kubelet’s &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint, we first need to establish a local proxy to the Kubernetes API server.&lt;/p&gt;

&lt;p&gt;Run the following command and leave it running in another terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the proxy running, requests to &lt;code&gt;http://localhost:8001&lt;/code&gt; reach the Kubernetes API server, which forwards paths under &lt;code&gt;/api/v1/nodes/minikube/proxy/&lt;/code&gt; to the kubelet on that node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/cadvisor

container_cpu_usage_seconds_total{container="python-metrics",cpu="total",pod="python-66dc9f5c8b-2kktd"} 0.105818
container_memory_usage_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2.5870336e+07
container_fs_reads_bytes_total{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1.49504e+07
container_processes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1
container_spec_cpu_shares{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2
container_spec_memory_limit_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
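
&lt;p&gt;Each line of this output follows the Prometheus text format: a metric name, an optional set of labels in braces, a value, and an optional millisecond timestamp. The sketch below splits one such line into its parts. It is a simplified illustration and not the official Prometheus parser: the names &lt;code&gt;parse_sample&lt;/code&gt;, &lt;code&gt;SAMPLE&lt;/code&gt;, and &lt;code&gt;LABEL&lt;/code&gt; are hypothetical, and escaped quotes inside label values are not handled.&lt;/p&gt;

```python
import re

# name, optional {labels}, value, optional millisecond timestamp
SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str):
    """Split one exposition line into (name, labels, value, timestamp_ms)."""
    match = SAMPLE.match(line.strip())
    if match is None:
        return None  # comment, blank line, or syntax this sketch ignores
    name, raw_labels, value, timestamp = match.groups()
    labels = dict(LABEL.findall(raw_labels or ""))
    return name, labels, float(value), int(timestamp) if timestamp else None


line = ('container_memory_usage_bytes{container="python-metrics",'
        'pod="python-66dc9f5c8b-2kktd"} 2.5870336e+07')
name, labels, value, timestamp = parse_sample(line)
print(name, labels["container"], int(value), timestamp)
```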



&lt;p&gt;Related node, pod, container, and volume stats are also available through kubelet’s Summary API on &lt;code&gt;/stats/summary&lt;/code&gt;, which returns structured JSON instead of Prometheus-formatted metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics Server v0.6.0 and later do not read &lt;code&gt;/stats/summary&lt;/code&gt;; they use &lt;code&gt;/metrics/resource&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server" rel="noopener noreferrer"&gt;for CPU and memory metrics instead&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, to inspect our pod’s resource consumption, we can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/stats/summary \
  | jq '.pods[] | select(.podRef.name == "python-66dc9f5c8b-2kktd")'
{
  "podRef": {
    "name": "python-66dc9f5c8b-2kktd",
    "namespace": "default",
    "uid": "b60baa0b-1e66-4990-8670-93c5919f09cb"
  },
  "containers": [
    {
      "name": "python-metrics",
      "cpu": {
        "usageNanoCores": 151695,
        "usageCoreNanoSeconds": 226134000
      },
      "memory": {
        "usageBytes": 25870336,
        "workingSetBytes": 22114304,
        "rssBytes": 9596928,
        "pageFaults": 3346,
        "majorPageFaults": 136
      },
      "rootfs": {
        "usedBytes": 122880
      },
      "logs": {
        "usedBytes": 8192
      },
      "swap": {
        "swapAvailableBytes": 0,
        "swapUsageBytes": 0
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
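
&lt;p&gt;The Summary API reports CPU in nanocores and memory in bytes, which are awkward to read at a glance. As a small sketch, the values from the response above can be converted to the millicore and MiB units used in pod specs; the helper names &lt;code&gt;to_millicores&lt;/code&gt; and &lt;code&gt;to_mib&lt;/code&gt; are hypothetical, and the JSON fragment is a trimmed copy of the output rather than a live query.&lt;/p&gt;

```python
import json

# Trimmed copy of the container entry shown in the response above.
container = json.loads("""
{"name": "python-metrics",
 "cpu": {"usageNanoCores": 151695},
 "memory": {"usageBytes": 25870336, "workingSetBytes": 22114304}}
""")


def to_millicores(nano_cores: int) -> float:
    """1 core = 1000 millicores = 1,000,000,000 nanocores."""
    return nano_cores / 1_000_000


def to_mib(num_bytes: int) -> float:
    """1 MiB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)


print(to_millicores(container["cpu"]["usageNanoCores"]))  # about 0.15 millicores
print(to_mib(container["memory"]["usageBytes"]))          # 24.671875
print(to_mib(container["memory"]["workingSetBytes"]))     # 21.08984375
```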



&lt;p&gt;&lt;strong&gt;If you only need simplified, high-level metrics, &lt;code&gt;/metrics/resource&lt;/code&gt; serves that role.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It exposes CPU and memory usage in Prometheus format, optimized for lightweight node monitoring.&lt;/p&gt;

&lt;p&gt;We can query this endpoint for aggregated container and pod metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/resource | grep python-metrics
container_cpu_usage_seconds_total{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0.298696 1777623311728
container_memory_working_set_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2.2114304e+07 1777623311728
container_start_time_seconds{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1.7776221060112867e+09
container_swap_limit_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0 1777623324188
container_swap_usage_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0 1777623324188
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



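&lt;p&gt;These metrics provide a timestamped snapshot of the pod and its containers: the memory values are point-in-time gauges, while &lt;code&gt;container_cpu_usage_seconds_total&lt;/code&gt; is a cumulative counter, so a CPU usage rate needs two samples.&lt;/p&gt;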
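
&lt;p&gt;Because &lt;code&gt;container_cpu_usage_seconds_total&lt;/code&gt; only ever grows, average CPU usage over an interval is the difference between two samples divided by the elapsed time. The sketch below shows that arithmetic. The first sample is taken from the output above; the second sample is invented for illustration, and the function name &lt;code&gt;cpu_cores_used&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def cpu_cores_used(first, second):
    """Average cores used between two (counter_seconds, timestamp_ms) samples."""
    (value_1, time_1), (value_2, time_2) = first, second
    if time_1 >= time_2 or value_1 > value_2:
        # Samples out of order, or the counter was reset (container restart).
        return None
    elapsed_seconds = (time_2 - time_1) / 1000.0
    return (value_2 - value_1) / elapsed_seconds


earlier = (0.298696, 1777623311728)  # from the /metrics/resource output above
later = (0.313696, 1777623326728)    # hypothetical sample 15 seconds later
print(cpu_cores_used(earlier, later))  # roughly 0.001 cores, i.e. 1 millicore
```

&lt;p&gt;This is the same calculation a monitoring system performs when it applies a rate function to a counter.&lt;/p&gt;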

&lt;p&gt;&lt;em&gt;What if we need to debug kubelet’s performance or runtime interactions?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubelet exposes its own internal metrics at the &lt;code&gt;/metrics&lt;/code&gt; endpoint.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These metrics include runtime operation durations, event counters, and error rates that reflect how kubelet interacts with the container runtime and manages node resources.&lt;/p&gt;

&lt;p&gt;For instance, if pods take longer to start or containers fail to stop cleanly, reviewing &lt;code&gt;kubelet_runtime_operations_duration_seconds&lt;/code&gt; can reveal latency bottlenecks between kubelet and the runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/metrics \
  | grep kubelet_runtime_operations_duration_seconds \
  | tail -n 3
kubelet_runtime_operations_duration_seconds_bucket{operation_type="version",le="+Inf"} 152
kubelet_runtime_operations_duration_seconds_sum{operation_type="version"} 0.12228928199999994
kubelet_runtime_operations_duration_seconds_count{operation_type="version"} 152
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
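
&lt;p&gt;The &lt;code&gt;_sum&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt; series of a Prometheus histogram are enough to estimate an average latency. As a rough sketch, the Python snippet below divides the two values from the output above. The helper name &lt;code&gt;mean_duration_seconds&lt;/code&gt; is hypothetical, and the result is an average over every call since the series was created, not a recent rate.&lt;/p&gt;

```python
def mean_duration_seconds(total_seconds, count):
    """Average duration of a Prometheus histogram: _sum divided by _count."""
    if count == 0:
        return 0.0
    return total_seconds / count


# Values copied from the "version" operation in the output above.
mean = mean_duration_seconds(0.12228928199999994, 152)
print(f"{mean * 1000:.3f} ms per call")
```

&lt;p&gt;To spot a recent slowdown rather than a lifetime average, compare the change in &lt;code&gt;_sum&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt; between two scrapes instead.&lt;/p&gt;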



&lt;p&gt;The four kubelet metrics endpoints fit together like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0fbvk98x9zpquz9hgjl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0fbvk98x9zpquz9hgjl.png" alt=" " width="640" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Historically, cAdvisor was Kubernetes’ primary mechanism for container resource monitoring.&lt;/p&gt;

&lt;p&gt;It exposed container metrics efficiently at a time when workloads were simpler and observability requirements were limited.&lt;/p&gt;

&lt;p&gt;But as Kubernetes matured, a question appeared.&lt;/p&gt;

&lt;p&gt;If kubelet already talks to the container runtime through CRI, why should it always ask cAdvisor to rediscover the same containers from the host filesystem?&lt;/p&gt;

&lt;p&gt;To answer that, we need to look at cAdvisor’s design first.&lt;/p&gt;

&lt;h2&gt;
  
  
  From cAdvisor to CRI: How Kubelet Collects Metrics Today
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Originally, cAdvisor collected container metrics by observing the Linux host directly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That model worked well for the classic Linux container path, where containers were visible through the host’s cgroup hierarchy.&lt;/p&gt;

&lt;p&gt;But Kubernetes later standardized kubelet-to-runtime communication through the Container Runtime Interface (CRI).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CRI is a &lt;a href="https://kubernetes.io/docs/concepts/containers/cri/" rel="noopener noreferrer"&gt;gRPC-based API&lt;/a&gt; that lets kubelet talk to different container runtimes without being tied to a specific runtime implementation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That brings us back to the earlier question.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If the runtime already created the containers and already tracks their state, why should kubelet always rely on cAdvisor to rediscover that information from the host?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is the design reason behind the CRI stats path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With this path, kubelet gets pod and container stats directly from the runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That path avoids collecting the same data twice when the runtime already has it.&lt;/p&gt;

&lt;p&gt;It also helps with runtimes where cAdvisor cannot easily see containers from the host.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But how does kubelet achieve that?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It calls a dedicated set of stats RPCs defined in the CRI API, and we can read their exact names directly from the CRI protobuf definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sSL https://raw.githubusercontent.com/kubernetes/cri-api/master/pkg/apis/runtime/v1/api.proto \
  | grep -E 'rpc (ContainerStats|ListContainerStats|PodSandboxStats|ListPodSandboxStats)'
    rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
    rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
    rpc PodSandboxStats(PodSandboxStatsRequest) returns (PodSandboxStatsResponse) {}
    rpc ListPodSandboxStats(ListPodSandboxStatsRequest) returns (ListPodSandboxStatsResponse) {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime exposes stats through &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go" rel="noopener noreferrer"&gt;CRI RPC methods&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These calls return structured &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto" rel="noopener noreferrer"&gt;Protobuf&lt;/a&gt; messages containing resource usage data such as CPU, memory, network, process, IO, and per-container stats, depending on the platform and runtime implementation.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; enabled, kubelet can use CRI stats methods such as &lt;code&gt;ListPodSandboxStats&lt;/code&gt;, &lt;code&gt;PodSandboxStats&lt;/code&gt;, and &lt;code&gt;ListContainerStats&lt;/code&gt; to collect pod and container metrics from the runtime.&lt;/p&gt;

&lt;p&gt;Kubelet sends these gRPC requests to the runtime endpoint configured on the node.&lt;/p&gt;

&lt;p&gt;For containerd, that endpoint is commonly &lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For CRI-O, it is commonly &lt;code&gt;/var/run/crio/crio.sock&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once kubelet receives stats from the runtime, it converts the CRI Protobuf responses into kubelet’s internal stats structures and then exposes the resulting stats.&lt;/p&gt;
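
&lt;p&gt;That conversion step can be pictured as a field-by-field mapping. The sketch below is a Python illustration using plain dictionaries, not kubelet's actual Go code. The helper name &lt;code&gt;cri_to_summary&lt;/code&gt; is hypothetical, the input keys are assumed to follow the field names in the CRI &lt;code&gt;api.proto&lt;/code&gt; (such as &lt;code&gt;usage_nano_cores&lt;/code&gt; and &lt;code&gt;working_set_bytes&lt;/code&gt;, each wrapped in a &lt;code&gt;value&lt;/code&gt; field), and the output keys follow the Summary API JSON. The sample numbers are illustrative.&lt;/p&gt;

```python
def cri_to_summary(cri_stats):
    """Map a CRI-style ContainerStats dict to a Summary-API-style dict."""
    cpu = cri_stats["cpu"]
    memory = cri_stats["memory"]
    return {
        "name": cri_stats["attributes"]["metadata"]["name"],
        "cpu": {
            "usageNanoCores": cpu["usage_nano_cores"]["value"],
            "usageCoreNanoSeconds": cpu["usage_core_nano_seconds"]["value"],
        },
        "memory": {"workingSetBytes": memory["working_set_bytes"]["value"]},
    }


# Illustrative input shaped like a CRI ContainerStats message.
cri_stats = {
    "attributes": {"id": "9b508d38b441b", "metadata": {"name": "python-metrics"}},
    "cpu": {
        "usage_nano_cores": {"value": 149575},
        "usage_core_nano_seconds": {"value": 1647087000},
    },
    "memory": {"working_set_bytes": {"value": 22114304}},
}
print(cri_to_summary(cri_stats))
```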

&lt;p&gt;But did we bypass cAdvisor completely?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even on the CRI stats path, kubelet can still rely on cAdvisor for node-level and filesystem-related stats that are outside the pod and container stats returned by CRI.&lt;/p&gt;

&lt;p&gt;The two stats paths look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzualz0eracbrj2xcy1ey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzualz0eracbrj2xcy1ey.png" alt=" " width="640" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnyro66bah4q6g9rxhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnyro66bah4q6g9rxhd.png" alt=" " width="640" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Validating CRI-Based Metrics Collection in Kubelet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Now that we understand why Kubernetes is moving pod and container metrics collection from cAdvisor to the CRI, let’s validate that kubelet is actually pulling metrics from the runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll configure kubelet to use CRI-based metrics, confirm it through logs, and compare kubelet’s reported data to what containerd provides directly.&lt;/p&gt;

&lt;p&gt;We start by increasing kubelet’s log verbosity: we edit its systemd drop-in file so that kubelet runs with the &lt;code&gt;--v=5&lt;/code&gt; argument.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the above file, we ensure the &lt;code&gt;ExecStart&lt;/code&gt; line includes the verbose logging flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Unit]
Wants=containerd.service

[Service]
ExecStart=
ExecStart=/var/lib/minikube/binaries/v1.34.0/kubelet \
  --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
  --config=/var/lib/kubelet/config.yaml \
  --hostname-override=minikube \
  --kubeconfig=/etc/kubernetes/kubelet.conf \
  --node-ip=192.168.49.2 \
  --v=5

[Install]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we save the configuration, we reload the systemd daemon and restart kubelet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl daemon-reload
sudo systemctl restart kubelet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, validate that the container runtime’s socket is active and listening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ss -lx | grep containerd.sock"
u_str LISTEN 0      4096   /run/containerd/containerd.sock.ttrpc 80566      * 0
u_str LISTEN 0      4096   /run/containerd/containerd.sock 79442            * 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Containerd is exposing its CRI endpoint over &lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, verify kubelet is configured to use the correct runtime endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /var/lib/kubelet/config.yaml | grep -i containerRuntimeEndpoint"
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubelet is communicating with the correct CRI runtime over the expected UNIX domain socket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's tell kubelet to use the CRI for collecting pod and container stats by enabling the &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; feature gate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we flip this switch, one thing is worth knowing.&lt;/p&gt;

&lt;p&gt;Kubelet reports the maturity of every feature gate it knows about through the &lt;code&gt;/metrics&lt;/code&gt; endpoint, under the &lt;code&gt;kubernetes_feature_enabled&lt;/code&gt; series.&lt;/p&gt;

&lt;p&gt;Querying that series for &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; on a fresh Kubernetes 1.34 cluster gives us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics \
  | grep 'kubernetes_feature_enabled.*PodAndContainer'

kubernetes_feature_enabled{name="PodAndContainerStatsFromCRI",stage="ALPHA"} 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;stage="ALPHA"&lt;/code&gt; marks the gate as Alpha, and the value &lt;code&gt;0&lt;/code&gt; means it is disabled, which is the default.&lt;/p&gt;

&lt;p&gt;We open kubelet's &lt;code&gt;/var/lib/kubelet/config.yaml&lt;/code&gt; configuration file on the minikube node and make sure the following &lt;code&gt;featureGates&lt;/code&gt; block is present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
featureGates:
  PodAndContainerStatsFromCRI: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we restart kubelet once more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl restart kubelet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;At this point, kubelet should be sourcing pod and container metrics directly from containerd over the CRI API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We then search the kubelet logs for the feature gate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo journalctl -u kubelet | grep -i containerstats

May 01 10:27:57 minikube kubelet[4205]: feature gates: {map[PodAndContainerStatsFromCRI:true]}
May 01 10:27:57 minikube kubelet[4205]: "PodAndContainerStatsFromCRI": true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Great!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We see kubelet successfully loads the &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; gate.&lt;/p&gt;

&lt;p&gt;But this output alone doesn’t confirm that metrics are actually being retrieved from the runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/stats/summary&lt;/code&gt; is kubelet's primary interface for exposing metrics that it collects, whether from cAdvisor or directly from the container runtime through the CRI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is enabled, kubelet populates this endpoint with data retrieved from the runtime.&lt;/p&gt;

&lt;p&gt;Let's query the &lt;code&gt;/stats/summary&lt;/code&gt; endpoint to observe the metrics kubelet is serving and confirm whether they match what the runtime reports.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;kubectl proxy&lt;/code&gt; isn't already running, we start it in a separate terminal first, then query the summary stats for our pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001
curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/stats/summary \
  | jq '.pods[] | select(.podRef.name == "python-66dc9f5c8b-2kktd")'
{
  "podRef": {
    "name": "python-66dc9f5c8b-2kktd",
    "namespace": "default"
  },
  "containers": [
    {
      "name": "python-metrics",
      "cpu": {
        "usageNanoCores": 149575,
        "usageCoreNanoSeconds": 1647087000
      },
      "memory": {
        "workingSetBytes": 22114304
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Summary API reports &lt;code&gt;22114304&lt;/code&gt; bytes of memory working set, about &lt;code&gt;22.11 MB&lt;/code&gt;, and &lt;code&gt;149575&lt;/code&gt; nanocores of current CPU usage for the &lt;code&gt;python-metrics&lt;/code&gt; container.&lt;/p&gt;
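
&lt;p&gt;The same lookup that the &lt;code&gt;jq&lt;/code&gt; filter performs can be sketched in Python, which is handy when the check needs to live in a script. The helper name &lt;code&gt;container_stats&lt;/code&gt; is hypothetical, and the embedded JSON is a trimmed copy of the response above rather than a live query.&lt;/p&gt;

```python
import json


def container_stats(summary, pod_name, container_name):
    """Return the stats entry for one container from a Summary API document."""
    for pod in summary.get("pods", []):
        if pod["podRef"]["name"] != pod_name:
            continue
        for container in pod.get("containers", []):
            if container["name"] == container_name:
                return container
    return None


# Trimmed copy of the response shown above; a live response has more fields.
summary = json.loads("""
{"pods": [{"podRef": {"name": "python-66dc9f5c8b-2kktd", "namespace": "default"},
           "containers": [{"name": "python-metrics",
                           "cpu": {"usageNanoCores": 149575,
                                   "usageCoreNanoSeconds": 1647087000},
                           "memory": {"workingSetBytes": 22114304}}]}]}
""")

stats = container_stats(summary, "python-66dc9f5c8b-2kktd", "python-metrics")
# Decimal megabytes, matching the unit used in the text.
working_set_mb = stats["memory"]["workingSetBytes"] / 1_000_000
print(f"{working_set_mb:.2f} MB")
```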

&lt;p&gt;&lt;em&gt;But how do we know kubelet sourced this from containerd, not cAdvisor?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We can cross-check by querying containerd directly with &lt;code&gt;crictl&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But first, we need to confirm the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.status.containerStatuses[*].containerID}'
containerd://9b508d38b441b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we SSH into the node and run &lt;code&gt;crictl stats&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- sudo crictl stats

CONTAINER           CPU %               MEM                 DISK                INODES
...
5e63e93291a32       0.21                75.7MB              36.86kB             11
62bbd4d869537       0.04                66.93MB             65.54kB             24
6cff256e868f3       0.00                37.74MB             65.54kB             24
9b508d38b441b       0.02                22.11MB             122.9kB             16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;python-metrics&lt;/code&gt; container appears as container ID &lt;code&gt;9b508d38b441b&lt;/code&gt; in &lt;code&gt;crictl stats&lt;/code&gt;, with &lt;code&gt;MEM&lt;/code&gt; reported as &lt;code&gt;22.11MB&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That matches the Summary API value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU is harder to match exactly because both values are point-in-time samples, but they are consistent: kubelet reports &lt;code&gt;149575&lt;/code&gt; nanocores, and &lt;code&gt;crictl stats&lt;/code&gt; shows &lt;code&gt;0.02%&lt;/code&gt; CPU for the same container.&lt;/strong&gt;&lt;/p&gt;
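
&lt;p&gt;To put the two CPU readings on the same scale, we can convert nanocores into a percentage of one core. This is a small sketch: the helper name &lt;code&gt;nanocores_to_percent&lt;/code&gt; is hypothetical, and it assumes that the &lt;code&gt;CPU %&lt;/code&gt; column in &lt;code&gt;crictl stats&lt;/code&gt; is relative to a single core.&lt;/p&gt;

```python
NANOCORES_PER_CORE = 1_000_000_000


def nanocores_to_percent(nanocores):
    """Express a nanocore reading as a percentage of one CPU core."""
    return nanocores / NANOCORES_PER_CORE * 100


# usageNanoCores value reported by /stats/summary above.
percent = nanocores_to_percent(149575)
print(f"{percent:.4f}% of one core")
```

&lt;p&gt;The result lands at roughly 0.015% of one core, the same order of magnitude as the &lt;code&gt;0.02&lt;/code&gt; shown by &lt;code&gt;crictl stats&lt;/code&gt;, which is as close as two samples taken at different moments can be expected to get.&lt;/p&gt;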

&lt;p&gt;Next, we query kubelet’s &lt;code&gt;/metrics/resource&lt;/code&gt; endpoint to see the Prometheus exposition format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/resource \
  | grep -i "python-66dc9f5c8b-2kktd"

pod_cpu_usage_seconds_total{namespace="default",pod="python-66dc9f5c8b-2kktd"} 1.760035 1777632057760
pod_memory_working_set_bytes{namespace="default",pod="python-66dc9f5c8b-2kktd"} 2.2421504e+07 1777632057760
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, the working set is in the same range across all three views:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/metrics/resource&lt;/code&gt; reports about &lt;code&gt;22.42 MB&lt;/code&gt; at the pod level,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/stats/summary&lt;/code&gt; and &lt;code&gt;crictl stats&lt;/code&gt; report about &lt;code&gt;22.11 MB&lt;/code&gt; for the container.&lt;/li&gt;
&lt;/ul&gt;
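
&lt;p&gt;To quantify "the same range", the gap between the two readings can be computed directly. The sketch below uses the byte values from the outputs above; the helper name &lt;code&gt;relative_difference&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def relative_difference(a, b):
    """Difference between two readings as a fraction of the smaller one."""
    low, high = sorted((a, b))
    return (high - low) / low


resource_endpoint = 22421504   # pod_memory_working_set_bytes from /metrics/resource
summary_endpoint = 22114304    # workingSetBytes from /stats/summary
delta_bytes = resource_endpoint - summary_endpoint
ratio = relative_difference(resource_endpoint, summary_endpoint)
print(delta_bytes, f"{ratio:.2%}")
```

&lt;p&gt;The two readings differ by 307200 bytes, a bit over one percent, which is plausible for samples taken at different times and at different levels (pod versus container).&lt;/p&gt;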

&lt;p&gt;&lt;strong&gt;Kubelet sources pod and container metrics directly from containerd through the CRI API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What happens when we check kubelet’s &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/cadvisor
machine_cpu_cores{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 20
machine_cpu_physical_cores{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 14
machine_cpu_sockets{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 1
machine_memory_bytes{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 3.338305536e+10
machine_swap_bytes{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 3.4088153088e+10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Huh!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before enabling the CRI stats path, &lt;code&gt;/metrics/cadvisor&lt;/code&gt; exposed detailed container metrics emitted by cAdvisor and labeled by pod, namespace, container, image, and cgroup path.&lt;/p&gt;

&lt;p&gt;Now, in this run, the endpoint only shows machine-level cAdvisor metrics such as CPU topology, installed memory, swap capacity, and machine scrape status.&lt;/p&gt;

&lt;p&gt;No pod-level or container-level series appeared in the &lt;code&gt;/metrics/cadvisor&lt;/code&gt; output.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So where did all the pod and container resource usage go?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Those pod and container metrics are now sourced from containerd's CRI stats implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kubernetes does not directly enforce Linux resource limits; the Linux kernel enforces them through cgroups. Kubelet and the container runtime translate pod resource settings into cgroup configuration, then the kernel applies the actual CPU, memory, pids, and related controls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cgroup v2 uses a single unified hierarchy where controllers coexist under &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;. cgroup v1 uses separate controller hierarchies, so controllers such as CPU, memory, and pids can be mounted as separate cgroup trees.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cgroup v1 has been officially deprecated since Kubernetes v1.35. As part of &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5573-remove-cgroup-v1/README.md" rel="noopener noreferrer"&gt;KEP-5573&lt;/a&gt;, kubelet now fails by default on cgroup v1 nodes unless &lt;code&gt;failCgroupV1&lt;/code&gt; is explicitly set to &lt;code&gt;false&lt;/code&gt;, with full code removal planned no earlier than Kubernetes v1.38.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet and the container runtime must use a compatible cgroup driver. With the &lt;code&gt;systemd&lt;/code&gt; driver, kubelet and the runtime place containers under systemd-managed slices; with &lt;code&gt;cgroupfs&lt;/code&gt;, they manage cgroup paths directly. For cgroup v2, Kubernetes strongly recommends the &lt;code&gt;systemd&lt;/code&gt; cgroup driver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;KubeletCgroupDriverFromCRI&lt;/code&gt; graduated to GA in Kubernetes v1.34. At startup, kubelet asks the runtime for the cgroup driver through the CRI &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC when the runtime supports it; otherwise kubelet falls back to its configured &lt;code&gt;cgroupDriver&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cAdvisor is embedded inside the kubelet process and starts as part of kubelet. By default, kubelet uses cAdvisor to collect node, pod, container, volume, and filesystem statistics, then exposes that data through kubelet HTTP endpoints. There is no separate cAdvisor sidecar or daemon in the normal kubelet setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet exposes several metrics and stats endpoints. &lt;code&gt;/metrics/cadvisor&lt;/code&gt; exposes cAdvisor-style container and machine metrics in Prometheus format. &lt;code&gt;/stats/summary&lt;/code&gt; returns structured JSON for node, pod, container, and volume stats. &lt;code&gt;/metrics/resource&lt;/code&gt; exposes lightweight CPU and memory resource metrics used by modern Metrics Server versions. &lt;code&gt;/metrics&lt;/code&gt; exposes kubelet’s own internal component metrics, such as operation counters and latencies. Metrics Server 0.6.x and later &lt;a href="https://kubernetes.io/docs/reference/instrumentation/node-metrics" rel="noopener noreferrer"&gt;query&lt;/a&gt; &lt;code&gt;/metrics/resource&lt;/code&gt;, not &lt;code&gt;/stats/summary&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRI is the gRPC API that standardizes kubelet-to-runtime communication. It lets kubelet manage pods and containers through the runtime, and with compatible runtimes it can also collect pod and container metrics directly from the runtime over the runtime socket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is an Alpha feature gate and is disabled by default. When enabled with a compatible runtime, kubelet collects pod and container stats through CRI instead of relying on cAdvisor for those pod and container stats.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even with CRI-based pod and container metrics collection, kubelet still depends on cAdvisor for stats that CRI does not provide, especially node-level, machine-level, volume, and filesystem-related data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/" rel="noopener noreferrer"&gt;Kubernetes 1.25: cgroup v2 graduates to GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/" rel="noopener noreferrer"&gt;Kubernetes v1.34: KubeletCgroupDriverFromCRI graduates to GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/" rel="noopener noreferrer"&gt;kube-state-metrics addon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/features/kube_features.go" rel="noopener noreferrer"&gt;pkg/features/kube_features.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/util.go" rel="noopener noreferrer"&gt;pkg/kubelet/cadvisor/util.go&lt;/a&gt; (see the &lt;code&gt;UsingLegacyCadvisorStats&lt;/code&gt; function)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://minikube.sigs.k8s.io/docs/handbook/config/" rel="noopener noreferrer"&gt;minikube Runtime configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/cri-api" rel="noopener noreferrer"&gt;cri-api&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/cri-api/blob/c75ef5b/pkg/apis/runtime/v1/api.proto" rel="noopener noreferrer"&gt;cri protocol definition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grpc.io/" rel="noopener noreferrer"&gt;gRPC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Mirantis/cri-dockerd" rel="noopener noreferrer"&gt;cri-dockerd adapter for docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go" rel="noopener noreferrer"&gt;kubelet.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/blob/master/manager/manager.go" rel="noopener noreferrer"&gt;manager.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/blob/master/container/raw/handler.go" rel="noopener noreferrer"&gt;raw handler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/architecture/cgroups/" rel="noopener noreferrer"&gt;cgroup v2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/issues/2785" rel="noopener noreferrer"&gt;cAdvisor issues #2785&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md" rel="noopener noreferrer"&gt;cAdvisor-less, CRI-full Container and Pod Stats Enhancement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/" rel="noopener noreferrer"&gt;PodAndContainerStatsFromCRI feature gate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/enhancements/issues/2371" rel="noopener noreferrer"&gt;KEP #2371 tracking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/containerd/pull/10691" rel="noopener noreferrer"&gt;implement CRI ListPodSandboxMetrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md" rel="noopener noreferrer"&gt;containerd CRI configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kata-containers/kata-containers/issues/5391" rel="noopener noreferrer"&gt;container-stats exporter to the Kata Containers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
