<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gulcan Topcu</title>
    <description>The latest articles on DEV Community by Gulcan Topcu (@gulcantopcu).</description>
    <link>https://dev.to/gulcantopcu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1234954%2F611389cf-8360-4c57-bb34-8cba2c1124fd.png</url>
      <title>DEV Community: Gulcan Topcu</title>
      <link>https://dev.to/gulcantopcu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gulcantopcu"/>
    <language>en</language>
    <item>
      <title>Kubelet Metrics: How cAdvisor and CRI Collect Kubernetes Stats</title>
      <dc:creator>Gulcan Topcu</dc:creator>
      <pubDate>Thu, 28 May 2026 11:23:15 +0000</pubDate>
      <link>https://dev.to/gulcantopcu/kubelet-metrics-how-cadvisor-and-cri-collect-kubernetes-stats-12kj</link>
      <guid>https://dev.to/gulcantopcu/kubelet-metrics-how-cadvisor-and-cri-collect-kubernetes-stats-12kj</guid>
      <description>&lt;p&gt;This article was originally published on &lt;a href="https://learnkube.com/kubernetes-metrics-cadvisor-kubelet-cri" rel="noopener noreferrer"&gt;LearnKube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TL;DR: This article dissects the Kubernetes metrics pipeline through kubelet, cAdvisor, and CRI to show where your metrics actually come from and what breaks when the defaults change.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This article breaks down how Kubernetes collects container, pod, and node metrics, starting with cAdvisor and the Linux kernel, then shifting to a CRI-native model powered by gRPC.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You’ll see how kubelet exposes this data, what happens when you flip &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt;, why container metrics on &lt;code&gt;/metrics/cadvisor&lt;/code&gt; can be sourced from CRI instead of cAdvisor, and how to trace each metric back to its origin.&lt;/p&gt;

&lt;p&gt;It also explains how kubelet talks to the CRI over gRPC, and why understanding this matters if you rely on Prometheus, Grafana, or any observability stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How Kubernetes Monitoring Layers Stack Up&lt;/li&gt;
&lt;li&gt;Where Metrics Originate&lt;/li&gt;
&lt;li&gt;cgroup v1 with cgroupfs: The Legacy Baseline&lt;/li&gt;
&lt;li&gt;At the crux of how cgroup hierarchy is shaped&lt;/li&gt;
&lt;li&gt;How Kubernetes Creates and Manages the Cgroup Hierarchy&lt;/li&gt;
&lt;li&gt;Kubernetes QoS Classes and cgroup Placement&lt;/li&gt;
&lt;li&gt;Auto-Detecting cgroup Drivers via KubeletCgroupDriverFromCRI&lt;/li&gt;
&lt;li&gt;cAdvisor: Embedded Resource Monitoring in Kubelet&lt;/li&gt;
&lt;li&gt;Kubelet’s Metrics Endpoints&lt;/li&gt;
&lt;li&gt;From cAdvisor to CRI: How Kubelet Collects Metrics Today&lt;/li&gt;
&lt;li&gt;Validating CRI-Based Metrics Collection in Kubelet&lt;/li&gt;
&lt;li&gt;Summary&lt;/li&gt;
&lt;li&gt;References&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Kubernetes Monitoring Layers Stack Up
&lt;/h2&gt;

&lt;p&gt;Kubernetes metrics are the lifeblood of observability in your clusters.&lt;/p&gt;

&lt;p&gt;While tools like Prometheus and Grafana often dominate the monitoring conversation, it's worth understanding the native mechanisms that Kubernetes uses to collect, expose, and leverage metrics before they ever reach those external systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes monitoring works as a multi-layered system that provides insights spanning from bare metal to application workloads.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each layer builds upon the previous one to create a comprehensive picture of your cluster's health.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At the foundation sit node-level metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueded29uyz3ajjzltoua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueded29uyz3ajjzltoua.png" alt=" " width="640" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These reveal the utilization of physical and virtual resources like CPU, memory, and disk I/O.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/prometheus/node_exporter" rel="noopener noreferrer"&gt;Prometheus Node Exporter&lt;/a&gt; is commonly used to collect these fundamental metrics, but they originate from the operating system itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One layer up are Kubernetes component metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o67lhgw09c81dxb43x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4o67lhgw09c81dxb43x.png" alt=" " width="640" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These expose the health and performance of core services such as kubelet, kube-proxy, and the API server.&lt;/p&gt;

&lt;p&gt;Metrics like pod startup latency or API request throughput can tell you whether your control plane is running efficiently and reliably.&lt;/p&gt;

&lt;p&gt;Zooming out to the object layer, &lt;strong&gt;API resource metrics, often surfaced by tools like &lt;a href="https://github.com/kubernetes/kube-state-metrics" rel="noopener noreferrer"&gt;&lt;code&gt;kube-state-metrics&lt;/code&gt;&lt;/a&gt;, offer visibility into Kubernetes objects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy9jhj3a24e97a8m9eio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy9jhj3a24e97a8m9eio.png" alt=" " width="640" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They track details such as the number of pods in a namespace, deployment status, or the number of services running across your cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finally, at the top layer are pod and container workload metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsvxft0l5d1zj7n6vbos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsvxft0l5d1zj7n6vbos.png" alt=" " width="640" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These focus on the actual performance of your applications.&lt;/p&gt;

&lt;p&gt;This is where critical signals like CPU throttling come into play.&lt;/p&gt;

&lt;p&gt;For instance, knowing how often a container is blocked from using CPU because it's hit its limit can reveal performance bottlenecks that might otherwise remain hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Metrics Originate
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes defines resource requests and limits, but the kernel does the actual enforcement.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes relies on the Linux kernel’s control groups, known as cgroups, to apply those rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64uth4zgbmw5qr44iioa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64uth4zgbmw5qr44iioa.png" alt=" " width="640" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilgju4imj9jyzzdi658.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilgju4imj9jyzzdi658.png" alt=" " width="640" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj94960z92lz2t28993f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj94960z92lz2t28993f.png" alt=" " width="640" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cgroups are directories in the &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt; virtual filesystem.&lt;/p&gt;

&lt;p&gt;They are a live view of resource allocation and enforcement at the kernel level, exposed as files you can read and write.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These directories define how much CPU time, memory, or I/O bandwidth a process is allowed to consume.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this context, a resource is anything the system can allocate, limit, and monitor: CPU cycles, memory usage, disk throughput, network bandwidth, even the number of process IDs a container can spawn.&lt;/p&gt;

&lt;p&gt;But defining resources is only half of the story.&lt;/p&gt;

&lt;p&gt;That’s where controllers make all the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/resource_management_guide/br-resource_controllers_in_linux_kernel" rel="noopener noreferrer"&gt;A controller is a kernel component&lt;/a&gt; that enforces resource policies and monitors usage for a specific type of resource.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For every resource, there’s a controller in cgroups that governs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes7uvjuq4gn58qosc4u5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes7uvjuq4gn58qosc4u5.png" alt=" " width="640" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wlmirvh0rvetqyjkt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wlmirvh0rvetqyjkt9.png" alt=" " width="640" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The kernel reads these cgroup files, applies the rules they define, and keeps every container within its resource boundaries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let's start a Minikube cluster with containerd as the container runtime, and deploy a Python pod to see this in action:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -c containerd
kubectl create deployment python \
  --image=ghcr.io/learnk8s/python-metrics \
  --port=8080 \
  -- /usr/local/bin/python3 -m http.server 8080

kubectl get po -o wide
NAME                      READY   STATUS    IP
python-66dc9f5c8b-w6x4b   1/1     Running   10.244.0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Linux cgroup API has two versions: cgroup v1 and cgroup v2.&lt;/p&gt;

&lt;p&gt;Each version structures resource management differently.&lt;/p&gt;

&lt;p&gt;To understand why cgroup v2 and the systemd driver matter, it helps to start with the older model first: cgroup v1 with the cgroupfs driver.&lt;/p&gt;

&lt;h2&gt;
  
  
  cgroup v1 with cgroupfs: The Legacy Baseline
&lt;/h2&gt;

&lt;p&gt;In this model, Kubernetes and the container runtime manage cgroups by writing directly to the cgroup filesystem.&lt;/p&gt;

&lt;p&gt;That works, but it also means the hierarchy is shaped by separate controller trees rather than one unified resource tree.&lt;/p&gt;

&lt;p&gt;In cgroup v1, kubelet and the container runtime can still be configured to use either &lt;code&gt;systemd&lt;/code&gt; or &lt;code&gt;cgroupfs&lt;/code&gt;, as long as both sides use the same driver.&lt;/p&gt;

&lt;p&gt;Now let's step into a cgroup v1 environment and see how Kubernetes builds its QoS-based hierarchies when it uses the &lt;code&gt;cgroupfs&lt;/code&gt; driver.&lt;/p&gt;

&lt;p&gt;We’ll delete our existing Minikube cluster and reboot into a system where cgroup v1 is enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;There are several ways to switch a Linux system back to cgroup v1.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You might pass kernel boot parameters like &lt;code&gt;systemd.unified_cgroup_hierarchy=0&lt;/code&gt; or disable cgroup v2 entirely, depending on the environment, whether it’s bare metal, a VM, or WSL2.&lt;/p&gt;

&lt;p&gt;Once the node boots into cgroup v1, Kubernetes automatically detects it and adjusts its resource management behavior.&lt;/p&gt;

&lt;p&gt;First, confirm the system is operating under cgroup v1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stat -fc %T /sys/fs/cgroup/
tmpfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now start a fresh Minikube cluster with the containerd runtime and deploy the Python pod again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -c containerd
kubectl create deployment python \
  --image=ghcr.io/learnk8s/python-metrics \
  --port=8080 \
  -- /usr/local/bin/python3 -m http.server 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then confirm that the pod is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get po -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP
python-66dc9f5c8b-4248r   1/1     Running   0          42s   10.244.0.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we focus on how Kubernetes structures the cgroups under cgroup v1 with the cgroupfs driver.&lt;/p&gt;

&lt;p&gt;Kubernetes enforces QoS-based resource isolation by creating separate hierarchies for each QoS class under every controller.&lt;/p&gt;

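&lt;p&gt;This behavior is controlled by kubelet’s &lt;code&gt;cgroupsPerQOS&lt;/code&gt; setting, which we can verify through kubelet’s &lt;code&gt;configz&lt;/code&gt; endpoint:&lt;br&gt;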
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001 &amp;amp;
curl -X GET http://127.0.0.1:8001/api/v1/nodes/minikube/proxy/configz | jq . | grep -i qos
"cgroupsPerQOS": true,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per-QoS hierarchy creation is enabled. But which driver is kubelet using to manage these hierarchies?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /var/lib/kubelet/config.yaml | grep -i cgroupDriver"
cgroupDriver: cgroupfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In cgroup v1 with &lt;code&gt;cgroupsPerQOS: true&lt;/code&gt;, kubelet’s use of the &lt;code&gt;cgroupfs&lt;/code&gt; driver results in Kubernetes creating and managing separate cgroup subtrees for QoS classes under each controller.&lt;/p&gt;

&lt;p&gt;Let's inspect the CPU controller directory structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/"
drwxr-xr-x 5 root root 0 Mar 20 12:10 besteffort
drwxr-xr-x 7 root root 0 Mar 20 12:11 burstable
drwxr-xr-x 3 root root 0 Mar 20 12:12 guaranteed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each QoS class gets its own directory under each controller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Since our Python pod was deployed without resource requests or limits, we can locate it under the &lt;code&gt;besteffort&lt;/code&gt; QoS class:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/besteffort/"
drwxr-xr-x 4 root root 0 Mar 20 03:51 pod23e59e27-abe5-4529-bf9c-581516ae0c0b
drwxr-xr-x 4 root root 0 Mar 20 03:51 pod9f874003-a948-425d-a072-f389dc21bdff
drwxr-xr-x 4 root root 0 Mar 20 03:51 podc1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We find multiple pod directories, named by their UID.&lt;/p&gt;

&lt;p&gt;To correlate a pod directory with the actual Python pod, let's retrieve the pod's UID from the Kubernetes API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-4248r -o jsonpath='{.metadata.uid}'
c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matches the directory &lt;code&gt;podc1d8cd50-b50a-4b3c-a33d-8963242c60ef&lt;/code&gt; under the &lt;code&gt;besteffort&lt;/code&gt; class.&lt;/p&gt;
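
&lt;p&gt;The lookup above follows a predictable path pattern. As a minimal sketch, assuming the cgroup v1 layout with the &lt;code&gt;cgroupfs&lt;/code&gt; driver used in this walkthrough, the hypothetical helper &lt;code&gt;pod_cgroup_v1_path&lt;/code&gt; (not part of any Kubernetes tool) rebuilds the directory from a controller name, a QoS class, and a pod UID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: build a pod's cgroup v1 directory (cgroupfs driver).
# Arguments: controller (cpu, memory, ...), QoS class directory, pod UID.
pod_cgroup_v1_path() {
  printf '/sys/fs/cgroup/%s/kubepods/%s/pod%s\n' "$1" "$2" "$3"
}

pod_cgroup_v1_path cpu besteffort c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On the node, the printed path can be passed to &lt;code&gt;ls&lt;/code&gt; or &lt;code&gt;cat&lt;/code&gt;, for example through &lt;code&gt;minikube ssh&lt;/code&gt;.&lt;/p&gt;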

&lt;p&gt;Inside this pod directory, each container has its own cgroup, named after the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/cpu/kubepods/besteffort/podc1d8cd50-b50a-4b3c-a33d-8963242c60ef/"
-rw-r--r-- 1 root root 0 Mar 20 12:16 cpu.shares
-rw-r--r-- 1 root root 0 Mar 20 12:16 cpu.cfs_quota_us
drwxr-xr-x 2 root root 0 Mar 20 03:52 ef455b35bf7e2afa0942e25b58cd10858d40ed1d97fffe7f0b6a664d2e64aa54
-rw-r--r-- 1 root root 0 Mar 20 04:22 tasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, we can inspect the pod’s memory limit in the memory controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /sys/fs/cgroup/memory/kubepods/besteffort/\
podc1d8cd50-b50a-4b3c-a33d-8963242c60ef/\
memory.limit_in_bytes"

9223372036854771712
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This very large value is an effectively unlimited memory ceiling, which is expected for a BestEffort pod.&lt;/p&gt;
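
&lt;p&gt;The number is not arbitrary: it equals the largest signed 64-bit integer rounded down to a page boundary. The sketch below reproduces that arithmetic, assuming 4096-byte pages and 64-bit shell arithmetic; &lt;code&gt;describe_v1_memory_limit&lt;/code&gt; is a hypothetical helper name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assumption: 4096-byte pages and 64-bit shell arithmetic.
# Largest signed 64-bit integer, rounded down to a page boundary.
unlimited=$(( 9223372036854775807 / 4096 * 4096 ))
echo "$unlimited"

# Hypothetical helper: label a memory.limit_in_bytes value.
describe_v1_memory_limit() {
  if [ "$1" = "$unlimited" ]; then
    echo "unlimited"
  else
    echo "limited to $1 bytes"
  fi
}

describe_v1_memory_limit 9223372036854771712
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A pod with a memory limit would show a much smaller number in the same file, and the helper would report it as limited.&lt;/p&gt;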

&lt;p&gt;At this point, kubelet decides where the pod belongs in the QoS hierarchy, the container runtime helps create and configure the container cgroups, and the kernel enforces the resulting cgroup settings for the processes attached to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  At the crux of how cgroup hierarchy is shaped
&lt;/h2&gt;

&lt;p&gt;In cgroup v1, each controller operates in its own separate hierarchy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7f918dfwicy8y7jp7dt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7f918dfwicy8y7jp7dt.png" alt=" " width="640" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we list the mounted cgroup controllers in cgroup v1, we see each one mounted independently as its own filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "mount | grep cgroup"

cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,relatime,pids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This indicates that each controller, whether CPU, memory, or pids, has its own mount point and hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can confirm this separation by checking &lt;code&gt;/proc/cgroups&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /proc/cgroups"

#subsys_name    hierarchy    num_cgroups    enabled
cpuset          1            34             1
cpu             2            52             1
cpuacct         3            34             1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we check the filesystem type of &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt; in cgroup v1, it reports &lt;code&gt;tmpfs&lt;/code&gt; instead of &lt;code&gt;cgroup2fs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "stat -fc %T /sys/fs/cgroup/"

tmpfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top level of the cgroup filesystem looks like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/"

drwxr-xr-x 15 root root   0 Feb 23 05:17 blkio
drwxr-xr-x 15 root root   0 Feb 23 05:17 cpu
drwxr-xr-x  2 root root  40 Feb 23 05:17 cpu,cpuacct
drwxr-xr-x 23 root root   0 Feb 23 05:17 cpuacct
drwxr-xr-x 23 root root   0 Feb 23 05:17 cpuset
drwxr-xr-x 18 root root   0 Feb 23 05:17 devices
drwxr-xr-x 23 root root   0 Feb 23 05:17 freezer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core limitation of cgroup v1: CPU, memory, pids, and other controllers can each have their own hierarchy, so resource management is split across multiple trees.&lt;/p&gt;

&lt;p&gt;cgroup v2 fixes that part by moving controllers into a single unified hierarchy.&lt;/p&gt;

&lt;p&gt;Now let's switch to a cgroup v2 system and examine the structure of the cgroup filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/"

-r--r--r-- 1 root root 0 Apr 28 10:51 cgroup.controllers
-r--r--r-- 1 root root 0 Apr 28 10:58 cgroup.stat
-rw-r--r-- 1 root root 0 Apr 28 10:51 memory.high
drwxr-xr-x 5 root root 0 Apr 28 10:51 kubepods.slice
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;All resource controllers are managed together in a single tree rooted at &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To confirm that cgroup v2 is active, we can inspect the mounted cgroup filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "mount | grep cgroup"

cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can list the active controllers that the kernel has attached to this unified hierarchy by reading &lt;code&gt;/proc/cgroups&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In cgroup v2, all controllers operate within a single hierarchy, and the hierarchy column reflects this by showing &lt;code&gt;0&lt;/code&gt; for each controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /proc/cgroups"

#subsys_name    hierarchy       num_cgroups     enabled
cpu     0       208     1
cpuacct 0       208     1
blkio   0       208     1
devices 0       208     1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
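
&lt;p&gt;The same check can be scripted. The sketch below assumes input in the &lt;code&gt;/proc/cgroups&lt;/code&gt; format shown above and uses a hypothetical helper, &lt;code&gt;cgroup_layout&lt;/code&gt;, that reports &lt;code&gt;unified&lt;/code&gt; when every controller has hierarchy ID 0. Treat the result as a hint rather than proof, since a controller that is not mounted on any v1 hierarchy may also report 0:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: read /proc/cgroups-style text on stdin.
# Prints "unified" if every hierarchy ID is 0, otherwise "split".
cgroup_layout() {
  awk '
    /^#/ { next }                 # skip the header line
    $2 != 0 { split_found = 1 }   # a non-zero ID means a v1 hierarchy
    END { if (split_found) print "split"; else print "unified" }
  '
}

# On a node: cat /proc/cgroups | cgroup_layout
printf 'cpu 0 208 1\ncpuacct 0 208 1\n' | cgroup_layout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feeding it the cgroup v1 listing from earlier, where the hierarchy IDs are 1, 2, and 3, makes it report &lt;code&gt;split&lt;/code&gt; instead.&lt;/p&gt;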



&lt;p&gt;To verify the filesystem type for &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;, we can run the &lt;code&gt;stat&lt;/code&gt; utility.&lt;/p&gt;

&lt;p&gt;In cgroup v2, this command reports &lt;code&gt;cgroup2fs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "stat -fc %T /sys/fs/cgroup/"

cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;If it shows &lt;code&gt;cgroup2fs&lt;/code&gt;, we know we’re running cgroup v2.&lt;/strong&gt;&lt;/p&gt;
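
&lt;p&gt;Both outcomes of the &lt;code&gt;stat&lt;/code&gt; check can be folded into one small function. This is only a sketch: &lt;code&gt;cgroup_version_from_fstype&lt;/code&gt; is a hypothetical name, and it distinguishes just the two values seen in this walkthrough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: map the filesystem type reported by
# "stat -fc %T /sys/fs/cgroup/" to a cgroup version label.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On a node: cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
cgroup_version_from_fstype cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Anything other than the two known values falls through to &lt;code&gt;unknown&lt;/code&gt;, which is safer than guessing a version.&lt;/p&gt;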

&lt;p&gt;So cgroup v2 cleans up the kernel-side hierarchy, but it does not answer the ownership question by itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhkoajifbkuyg20yazw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhkoajifbkuyg20yazw1.png" alt=" " width="640" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a systemd-based node, Kubernetes still needs to decide who owns and manages the cgroup tree: systemd or direct filesystem writes through cgroupfs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;cgroup v1 is now only relevant for legacy systems, and its days are officially numbered.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Modern distributions such as &lt;a href="https://discourse.ubuntu.com/t/performance/29416" rel="noopener noreferrer"&gt;Ubuntu 22.04+&lt;/a&gt;, &lt;a href="https://fedoraproject.org/wiki/Changes/CGroupsV2" rel="noopener noreferrer"&gt;Fedora 31+&lt;/a&gt;, and &lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/9.0_release_notes/new-features" rel="noopener noreferrer"&gt;RHEL 9+&lt;/a&gt; enable cgroup v2 by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes has supported cgroup v2 as stable since v1.25, and cgroup v1 has been officially deprecated since Kubernetes v1.35 as part of &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5573-remove-cgroup-v1/README.md" rel="noopener noreferrer"&gt;KEP-5573&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Starting with Kubernetes v1.35, kubelet no longer starts on cgroup v1 nodes by default unless &lt;code&gt;failCgroupV1&lt;/code&gt; is explicitly set to &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you’re running production clusters that still use cgroup v1, you should plan a migration to cgroup v2 and define an upgrade or rollback strategy in advance.&lt;/p&gt;

&lt;p&gt;So far, we've seen how cgroup v1 and v2 shape the filesystem layout, and we've learned how to verify which mode the node is using.&lt;/p&gt;

&lt;p&gt;But to understand how Kubernetes actually turns that kernel structure into pod and container boundaries, we now need to look at the two decisions kubelet makes next: which cgroup manager it initializes, and which cgroup driver owns the tree.&lt;/p&gt;

&lt;p&gt;And that is where the cgroup driver comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Kubernetes Creates and Manages the Cgroup Hierarchy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;On a Kubernetes node, kubelet and the container runtime collaborate to build and maintain the cgroup hierarchy used for enforcing pod-level resource constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before either component can create or manage any cgroups, kubelet needs to resolve one fundamental question: &lt;em&gt;is the node running cgroup v1 or cgroup v2?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That answer comes early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At startup, kubelet queries the kernel to determine the active cgroup mode.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it detects cgroup v2, it initializes a v2-specific manager built for the unified hierarchy.&lt;/p&gt;

&lt;p&gt;If the node is using cgroup v1, it falls back to a legacy manager.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This decision locks in the way kubelet will interact with kernel-level resource controls for the lifetime of the process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But the cgroup version is only half the equation.&lt;/p&gt;

&lt;p&gt;The other part is who is responsible for actually managing the cgroup tree within &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is called the cgroup driver.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubelet supports two drivers: &lt;code&gt;systemd&lt;/code&gt; and &lt;code&gt;cgroupfs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5uw62996wkyivuib8p7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5uw62996wkyivuib8p7.png" alt=" " width="640" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It picks one or the other, never both at the same time.&lt;/p&gt;

&lt;p&gt;In cgroup v2, the unified hierarchy makes the &lt;code&gt;systemd&lt;/code&gt; cgroup driver the recommended choice on systemd-based Linux distributions.&lt;/p&gt;

&lt;p&gt;Kubelet can still be configured to use &lt;code&gt;cgroupfs&lt;/code&gt;, but Kubernetes recommends &lt;a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/" rel="noopener noreferrer"&gt;avoiding&lt;/a&gt; a setup where systemd and Kubernetes manage cgroups separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the driver is systemd, kubelet hands cgroup creation to systemd; instead of writing directories itself, it generates logical slice names like &lt;code&gt;kubepods.slice&lt;/code&gt; or &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These slices represent pod resource groups.&lt;/p&gt;

&lt;p&gt;After generating the slice names, kubelet asks systemd to instantiate and manage the cgroup structure beneath &lt;code&gt;/sys/fs/cgroup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the part cgroup v2 does not solve alone: ownership of the tree needs to be consistent.&lt;/p&gt;

&lt;p&gt;From that point on, all resource controls for pods are expressed through systemd’s unit model.&lt;/p&gt;
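
&lt;p&gt;To make the naming concrete, here is a sketch of how a pod's slice path is typically laid out under the systemd driver. The helper &lt;code&gt;pod_slice_path&lt;/code&gt; is hypothetical, and the pod-level naming (the parent prefix repeated in the child slice, dashes in the UID replaced by underscores) is an assumption based on systemd's slice naming rules and commonly observed node layouts, not output captured in this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: print the assumed cgroup v2 path of a pod slice.
# Arguments: QoS class (burstable or besteffort) and pod UID.
pod_slice_path() {
  qos_slice="kubepods-$1"
  # systemd uses "-" to express nesting, so dashes in the UID become "_".
  uid=$(printf '%s' "$2" | sed 's/-/_/g')
  echo "/sys/fs/cgroup/kubepods.slice/${qos_slice}.slice/${qos_slice}-pod${uid}.slice"
}

pod_slice_path besteffort c1d8cd50-b50a-4b3c-a33d-8963242c60ef
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Compare the printed path with what you find under &lt;code&gt;/sys/fs/cgroup/kubepods.slice/&lt;/code&gt; on your own node before relying on it.&lt;/p&gt;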

&lt;p&gt;&lt;em&gt;Why systemd?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because when you boot a modern Linux system, systemd is the first userspace process the kernel runs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It becomes PID 1.&lt;/p&gt;

&lt;p&gt;As PID 1, systemd takes ownership of process supervision and resource control for the entire system.&lt;/p&gt;

&lt;p&gt;Rather than using shell scripts, systemd defines behavior through typed units.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@sebastiancarlos/systemds-nuts-and-bolts-0ae7995e45d3" rel="noopener noreferrer"&gt;Units are structured configuration objects like &lt;code&gt;.service&lt;/code&gt;, &lt;code&gt;.scope&lt;/code&gt;, and &lt;code&gt;.slice&lt;/code&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A slice is how systemd partitions the system for resource control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Kubernetes, slices are created automatically by systemd based on pod QoS classes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foukuf89suljmyyh2peve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foukuf89suljmyyh2peve.png" alt=" " width="640" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of slices like namespaces for CPU and memory budgets, managed for you behind the scenes.&lt;/p&gt;

&lt;p&gt;What matters is you can apply limits at the slice level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services are the more familiar systemd unit type.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4unkuizghqi6ufmq0oyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4unkuizghqi6ufmq0oyc.png" alt=" " width="640" height="627"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;.service&lt;/code&gt; represents a process that systemd starts and supervises directly.&lt;/p&gt;

&lt;p&gt;On a Kubernetes node, kubelet and containerd usually run as services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet.service&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;containerd.service&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These services live under &lt;code&gt;system.slice&lt;/code&gt;, not under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That distinction matters: kubelet and containerd are host daemons that coordinate pod placement and container startup, but the containers themselves do not become children of &lt;code&gt;containerd.service&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The actual container processes are placed into Kubernetes pod cgroups under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Scopes are different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scopes are used when systemd did not start a process itself, because another launcher created it, but still needs to manage it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqci10xkgf6fu0melzrus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqci10xkgf6fu0melzrus.png" alt=" " width="640" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, when the runtime launches a container, systemd can still take over and manage it.&lt;/p&gt;

&lt;p&gt;It does this by wrapping the container process in a &lt;code&gt;.scope&lt;/code&gt; unit, such as &lt;code&gt;cri-containerd-&amp;lt;container-id&amp;gt;.scope&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That scope is then placed inside the pod’s slice, whose position in the hierarchy is determined by the pod’s quality of service (QoS) class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But this only works if both kubelet and the container runtime agree on the cgroup driver.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If kubelet generates systemd slice names but containerd uses cgroupfs, the contract breaks.&lt;/p&gt;

&lt;p&gt;If the cgroup driver is cgroupfs, kubelet goes back to the older model: direct filesystem ownership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubelet interacts with the kernel’s cgroup API through the filesystem to create and manage cgroup directories.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s step back into our Minikube cluster running cgroup v2 with containerd as the runtime.&lt;/p&gt;

&lt;p&gt;Containerd handles its end of the driver selection agreement in its &lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md" rel="noopener noreferrer"&gt;configuration file&lt;/a&gt; at &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt;, using the &lt;code&gt;SystemdCgroup&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /etc/containerd/config.toml | grep -i -C2 'SystemdCgroup'"
runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

  [plugins."io.containerd.grpc.v1.cri".cni]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the config version 2 format used by containerd 1.x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once kubelet and the runtime align on both the cgroup version and the driver, kubelet can safely take ownership of building the pod-level cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But in systemd with cgroup v2, which scope unit goes into which systemd slice?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s determined by the pod’s QoS class, which kubelet calculates based on the pod’s resource requests and limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes QoS Classes and cgroup Placement
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Based on the pod’s resource requests and limits, &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/" rel="noopener noreferrer"&gt;Kubernetes assigns it to one of three Quality-of-Service (QoS) classes,&lt;/a&gt; which influences where the pod is placed in the cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pod is classified as &lt;strong&gt;Guaranteed&lt;/strong&gt; only when every container has CPU and memory requests and limits set, and each request exactly matches its corresponding limit.&lt;/li&gt;
&lt;li&gt;A pod is &lt;strong&gt;Burstable&lt;/strong&gt; when it defines at least one CPU or memory request or limit but does not meet the stricter Guaranteed rules.&lt;/li&gt;
&lt;li&gt;A pod is &lt;strong&gt;BestEffort&lt;/strong&gt; when none of its containers define CPU or memory requests or limits.&lt;/li&gt;
&lt;/ul&gt;
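
&lt;p&gt;The three rules above can be sketched as a small Python function. This is a simplified illustration, not kubelet’s implementation: the name &lt;code&gt;qos_class&lt;/code&gt; and the dict layout are hypothetical, quantities are compared as plain strings, and a missing request is assumed to default to the matching limit.&lt;/p&gt;

```python
def qos_class(containers):
    """Classify a pod from its containers' CPU and memory requests and limits.

    Each container is a dict with optional "requests" and "limits" dicts.
    This layout is a hypothetical stand-in for the pod spec.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Assumption: a missing request defaults to the limit.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit and an identical request everywhere.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"requests": {"cpu": "100m"}}]))
```

&lt;p&gt;A single container that breaks the Guaranteed rule is enough to make the whole pod Burstable, which is why the check runs over every container and both resources.&lt;/p&gt;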

&lt;p&gt;This QoS-to-cgroup hierarchy behavior is controlled by kubelet’s &lt;code&gt;--cgroups-per-qos&lt;/code&gt; flag, which &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=%2D%2Dcgroups%2Dper%2Dqos%C2%A0%C2%A0%C2%A0%C2%A0%C2%A0Default%3A%20true" rel="noopener noreferrer"&gt;defaults&lt;/a&gt; to &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When &lt;code&gt;cgroupsPerQOS&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt; and systemd manages cgroups on a cgroup v2 node, systemd organizes pods under &lt;code&gt;kubepods.slice&lt;/code&gt; and further into slices based on QoS classes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's inspect the root QoS directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -d /sys/fs/cgroup/kubepods.slice/*/"
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/
/sys/fs/cgroup/kubepods.slice/kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Notice the third entry.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It is not a QoS slice like &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt; or &lt;code&gt;kubepods-burstable.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a pod-level cgroup.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pod...&lt;/code&gt; part maps back to the Kubernetes pod UID &lt;code&gt;ed2df55a-639e-4beb-aee3-5db422c35910&lt;/code&gt;, with hyphens replaced by underscores.&lt;/p&gt;

&lt;p&gt;Let's verify which pod owns that UID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,UID:.metadata.uid' \
  | grep ed2df55a
kube-system   kindnet-qkqvh   ed2df55a-639e-4beb-aee3-5db422c35910
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the third cgroup entry belongs to the &lt;code&gt;kindnet-qkqvh&lt;/code&gt; pod in the &lt;code&gt;kube-system&lt;/code&gt; namespace.&lt;/p&gt;

&lt;p&gt;Now let's verify its QoS class from the Kubernetes API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod kindnet-qkqvh -n kube-system -o jsonpath='{.status.qosClass}{"\n"}'
Guaranteed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we print the QoS class and UID together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod kindnet-qkqvh -n kube-system -o jsonpath='QoS={.status.qosClass}{"\n"}UID={.metadata.uid}{"\n"}'
QoS=Guaranteed
UID=ed2df55a-639e-4beb-aee3-5db422c35910
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



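&lt;p&gt;&lt;strong&gt;So this cgroup belongs to the &lt;code&gt;kindnet-qkqvh&lt;/code&gt; pod, and Kubernetes classifies that pod as &lt;code&gt;Guaranteed&lt;/code&gt;, which is why its slice sits directly under &lt;code&gt;kubepods.slice&lt;/code&gt; instead of inside a QoS slice.&lt;/strong&gt;&lt;/p&gt;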

&lt;p&gt;Now let's look inside that pod cgroup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -la /sys/fs/cgroup/kubepods.slice/kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice/"
cri-containerd-7ae5ffd3996a6ac09031cbf283d6bd9727a24bc723a06e76141132a8e57f1716.scope
cri-containerd-d24246f29f54f7adced123bc6194d9e0f15fd3a15c54326cd8c96d39961760c0.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two &lt;code&gt;cri-containerd-*.scope&lt;/code&gt; entries are the container-level systemd scope units running inside the &lt;code&gt;kindnet-qkqvh&lt;/code&gt; pod.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We have traced a &lt;code&gt;Guaranteed&lt;/code&gt; pod all the way down from the Kubernetes API to its pod slice and container scopes on disk.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Simplified to the branch we just inspected, the mapping looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup/
└── kubepods.slice
    └── kubepods-poded2df55a_639e_4beb_aee3_5db422c35910.slice
        ├── cri-containerd-7ae5ffd3996a6ac09031cbf283d6bd9727a24bc723a06e76141132a8e57f1716.scope
        └── cri-containerd-d24246f29f54f7adced123bc6194d9e0f15fd3a15c54326cd8c96d39961760c0.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Now let’s do the same for our Python workload, which lands in a different part of the hierarchy because it has a different QoS class.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inside the root slice, systemd further organizes pods into separate slices based on their QoS classes.&lt;/p&gt;

&lt;p&gt;Since our Python pod was deployed without any CPU or memory requests or limits, its resources are managed under &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's confirm the QoS classification of the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.status.qosClass}'
BestEffort
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's map our Python pod and its containers to their systemd-managed cgroup slices and scopes.&lt;/p&gt;

&lt;p&gt;To do this, we first get the pod UID so we can map it to the slice name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.metadata.uid}'
b60baa0b-1e66-4990-8670-93c5919f09cb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Each pod gets its own slice under the QoS slices, and the hyphens in the pod UID are replaced with underscores in the pod slice directory name (&lt;code&gt;kubepods-{qos class}-pod{pod UID with underscores}.slice&lt;/code&gt;).&lt;/strong&gt;&lt;/p&gt;
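
&lt;p&gt;The naming pattern can be reproduced with a short Python sketch. The function name &lt;code&gt;pod_slice_path&lt;/code&gt; is hypothetical, and the layout is inferred only from the directory listings in this walkthrough (systemd driver, cgroup v2, per-QoS cgroups enabled), so treat it as an illustration rather than kubelet’s actual code.&lt;/p&gt;

```python
def pod_slice_path(uid, qos_class, root="/sys/fs/cgroup"):
    """Build the expected pod slice directory for a pod UID and QoS class."""
    uid_part = uid.replace("-", "_")  # hyphens become underscores
    qos = qos_class.lower()
    if qos == "guaranteed":
        # Guaranteed pods sit directly under kubepods.slice.
        return f"{root}/kubepods.slice/kubepods-pod{uid_part}.slice"
    # Burstable and BestEffort pods sit under their QoS slice.
    return (
        f"{root}/kubepods.slice/kubepods-{qos}.slice/"
        f"kubepods-{qos}-pod{uid_part}.slice"
    )


print(pod_slice_path("b60baa0b-1e66-4990-8670-93c5919f09cb", "BestEffort"))
```

&lt;p&gt;For the Python pod this prints a path under &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;, while the same call with &lt;code&gt;Guaranteed&lt;/code&gt; and the kindnet UID yields the pod slice directly under &lt;code&gt;kubepods.slice&lt;/code&gt;.&lt;/p&gt;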

&lt;p&gt;List the available pod slices under &lt;code&gt;kubepods-besteffort.slice&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls -d /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/*/"
/sys/fs/cgroup/.../kubepods-besteffort-pod740242e7_85e5_4369_a8a0_d6101719e386.slice/
/sys/fs/cgroup/.../kubepods-besteffort-pod857495d4_07b5_45a2_895b_0298f68797d8.slice/
/sys/fs/cgroup/.../kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last pod slice corresponds to our Python pod (its UID matches &lt;code&gt;b60baa0b-1e66-4990-8670-93c5919f09cb&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The other entries are other BestEffort pods on the node, such as kube-system pods like kube-proxy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Within this pod slice, systemd organizes each container into separate &lt;code&gt;.scope&lt;/code&gt; units.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These scopes are named after the containerd runtime and container ID.&lt;/p&gt;

&lt;p&gt;List the contents of the specific pod slice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ls /sys/fs/cgroup/kubepods.slice/\
kubepods-besteffort.slice/kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/ | grep scope"
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope
cri-containerd-b8609ccf36f85b5a4fc652317358950861a6f0a538e6c4b4c4243241189fbc11.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The long hex strings above are the container IDs, as assigned by containerd.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each ID is embedded in the name of the &lt;code&gt;.scope&lt;/code&gt; unit that systemd creates for that container.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So now the question is: which one of these is your Python container?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We query containerd to match the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl ps --name python"
CONTAINER           IMAGE          NAME              POD ID            POD
b21e881ca9d62       bdbec6b439339  python-metrics    b8609ccf36f85     python-66dc9f5c8b-2kktd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container ID &lt;code&gt;b21e881ca9d62&lt;/code&gt; matches the first &lt;code&gt;.scope&lt;/code&gt; unit above.&lt;/p&gt;

&lt;p&gt;The other one (&lt;code&gt;b8609ccf36f85...&lt;/code&gt;) is the pod sandbox, which is the pause container we will inspect shortly. First, let’s list the cgroup interface files inside the Python container’s scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "\
ls -la \
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/\
kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/\
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope"
cpu.max
hugetlb.2MB.events
memory.high
memory.stat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the hierarchy for the Python pod looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/sys/fs/cgroup/
└── kubepods.slice
    └── kubepods-besteffort.slice
        └── kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice
            ├── cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope
            │   └── python-metrics container
            └── cri-containerd-b8609ccf36f85b5a4fc652317358950861a6f0a538e6c4b4c4243241189fbc11.scope
                └── pod sandbox / pause container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;We can now dig into its cgroup resource metrics like memory usage statistics.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "cat /sys/fs/cgroup/kubepods.slice/\
kubepods-besteffort.slice/kubepods-besteffort-podb60baa0b_1e66_4990_8670_93c5919f09cb.slice/\
cri-containerd-b21e881ca9d6228281aa32cb1e2ebba5537f2a7b90e860a2f0cc6afec3305229.scope/\
memory.stat" | head -5
anon 9601024
file 13496320
kernel 1056768
kernel_stack 16384
pagetables 94208
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
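
&lt;p&gt;Because &lt;code&gt;memory.stat&lt;/code&gt; is a flat list of key and value pairs, it is easy to load into a dictionary. Below is a minimal Python sketch that assumes the file content has already been read into a string; the function name &lt;code&gt;parse_memory_stat&lt;/code&gt; is hypothetical and the sample reuses the five lines printed above.&lt;/p&gt;

```python
def parse_memory_stat(text):
    """Turn 'key value' lines from a cgroup v2 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


sample = """anon 9601024
file 13496320
kernel 1056768
kernel_stack 16384
pagetables 94208"""

stats = parse_memory_stat(sample)
# Anonymous memory plus page cache, in bytes.
print(stats["anon"] + stats["file"])
```

&lt;p&gt;A collector has to do this kind of parsing for every container scope on the node, which is the job cAdvisor takes on later in this article.&lt;/p&gt;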



&lt;p&gt;Great!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But what about the other scope?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this setup, even a Pod with a single application container has two active container scopes under the pod slice: one for the application container, one for the pause container.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/kubernetes-network-packets"&gt;The pause container is a sandbox environment that sets up the network namespace, IP address, and IPC for the pod.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the sandbox is running and holding that shared environment, Kubernetes starts the Python container inside that namespace.&lt;/p&gt;

&lt;p&gt;Let’s inspect the pod sandbox &lt;code&gt;b8609ccf36f85&lt;/code&gt; to confirm the pause container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl inspectp b8609ccf36f85 | grep image"
"image": "registry.k8s.io/pause:3.10.1",
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The pause container maps to the other &lt;code&gt;.scope&lt;/code&gt; unit, but how can we verify it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We inspect the pod sandbox to retrieve the pause container's PID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo crictl inspectp b8609ccf36f85 | grep -E '\"pid\"'"
"pid": "CONTAINER",
    "pid": 1647,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PID &lt;code&gt;1647&lt;/code&gt; corresponds to the pause container.&lt;/p&gt;

&lt;p&gt;We correlate the PID with the running process and its parent shim:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo ps -e -o pid,ppid,cmd | grep -E '\\b1603\\b|\\b1647\\b'"
1603       1 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id b8609... -address /run/containerd/containerd.sock
1647    1603 /pause
1694    1603 /usr/local/bin/python3 -m http.server 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second scope is the pause container.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;PID &lt;code&gt;1647&lt;/code&gt; is the &lt;code&gt;/pause&lt;/code&gt; process, and it shares the same &lt;code&gt;containerd-shim-runc-v2&lt;/code&gt; parent, PID &lt;code&gt;1603&lt;/code&gt;, with the Python process &lt;code&gt;1694&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Auto-Detecting cgroup Drivers via KubeletCgroupDriverFromCRI
&lt;/h2&gt;

&lt;p&gt;Kubernetes addressed some of the coordination challenges with the &lt;code&gt;KubeletCgroupDriverFromCRI&lt;/code&gt; feature gate, &lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/" rel="noopener noreferrer"&gt;introduced&lt;/a&gt; as alpha in v1.28 and graduated to GA in v1.34.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At startup, kubelet asks the runtime which cgroup driver to use through the CRI &lt;a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers" rel="noopener noreferrer"&gt;&lt;code&gt;RuntimeConfig&lt;/code&gt; RPC&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Kubernetes 1.34+, the feature gate no longer needs to be set explicitly.&lt;/p&gt;

&lt;p&gt;If the runtime lacks the &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC, kubelet falls back to the &lt;code&gt;cgroupDriver&lt;/code&gt; value in its own configuration, but only in Kubernetes versions that still support this &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#:~:text=for%20more%20details.-,KubeletCgroupDriverFromCRI,-Enable%20detection%20of" rel="noopener noreferrer"&gt;fallback&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's start a new cluster using CRI-O as the container runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start -p test-driverfromcri --container-runtime=cri-o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we inspect the &lt;code&gt;/var/lib/kubelet/config.yaml&lt;/code&gt; file, the kubelet config still shows the configured fallback driver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -p test-driverfromcri -- "sudo cat /var/lib/kubelet/config.yaml | grep -A2 cgroupDriver"
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the CRI runtime does not implement the &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC, kubelet logs the failure and falls back to the configured &lt;code&gt;cgroupDriver&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -p test-driverfromcri -- "sudo journalctl -u kubelet | grep -E 'RuntimeConfig|CRI implementation'"
"RuntimeConfig from runtime service failed" err="rpc error: code = Unimplemented desc = unknown method RuntimeConfig"
"CRI implementation should be updated to support RuntimeConfig. Falling back to using cgroupDriver from kubelet config."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
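
&lt;p&gt;The selection logic described above can be summarized in a few lines of Python. This is only a sketch of the decision, with hypothetical names (&lt;code&gt;select_cgroup_driver&lt;/code&gt;, &lt;code&gt;runtime_driver&lt;/code&gt;, &lt;code&gt;configured_driver&lt;/code&gt;); it does not reproduce kubelet’s code or the CRI client.&lt;/p&gt;

```python
def select_cgroup_driver(runtime_driver, configured_driver):
    """Prefer the driver reported by the runtime, else use kubelet's config.

    runtime_driver is None when the runtime does not implement the
    RuntimeConfig RPC (the fallback case shown in the log above).
    """
    if runtime_driver is not None:
        return runtime_driver
    return configured_driver


print(select_cgroup_driver(None, "systemd"))
print(select_cgroup_driver("cgroupfs", "systemd"))
```

&lt;p&gt;The second call shows why the value in &lt;code&gt;config.yaml&lt;/code&gt; can differ from the driver actually in use: when the runtime answers, its reply wins.&lt;/p&gt;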



&lt;p&gt;&lt;strong&gt;Finally, once kubelet settles on a cgroup driver, it uses that driver consistently when placing pods and containers into the node’s cgroup hierarchy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The container runtime then passes the resulting cgroup placement into the OCI runtime layer, where &lt;code&gt;runc/libcontainer&lt;/code&gt; applies it by writing to the kernel’s cgroup interfaces.&lt;/p&gt;

&lt;p&gt;Whether the hierarchy is represented through systemd slices and scopes or raw cgroupfs directories, the end result is the same: the Linux kernel enforces the configured CPU, memory, and other resource limits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw169dgvkvlgcui844b9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw169dgvkvlgcui844b9u.png" alt=" " width="640" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkecqswil7k88c5gxr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkecqswil7k88c5gxr9.png" alt=" " width="640" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have seen both sides: cgroup v1 with direct filesystem-managed hierarchies, and cgroup v2 with systemd-managed slices and scopes.&lt;/p&gt;

&lt;p&gt;But enforcement is only half of the story.&lt;/p&gt;

&lt;p&gt;The kernel exposes raw counters, limits, and events through the cgroup filesystem, but Kubernetes still needs a component that can read those low-level files and turn them into useful container and pod-level metrics.&lt;/p&gt;

&lt;p&gt;That is the visibility gap cAdvisor was designed to fill.&lt;/p&gt;

&lt;h2&gt;
  
  
  cAdvisor: Embedded Resource Monitoring in Kubelet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Container Advisor, or cAdvisor, is the default kubelet-integrated path for &lt;a href="https://kubernetes.io/docs/reference/instrumentation/node-metrics" rel="noopener noreferrer"&gt;collecting&lt;/a&gt; container resource usage statistics on Kubernetes nodes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs as an embedded component inside the kubelet process and is initialized automatically when kubelet starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once initialized, cAdvisor reads low-level resource data from the cgroup filesystem and attaches labels such as &lt;code&gt;pod&lt;/code&gt;, &lt;code&gt;namespace&lt;/code&gt;, &lt;code&gt;container&lt;/code&gt;, and &lt;code&gt;image&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubelet then exposes the collected metrics through its own HTTP endpoints: the Summary API and cAdvisor metrics endpoint.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is enabled and the container runtime supports stats through CRI, kubelet fetches pod and container metrics from the runtime instead of cAdvisor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubelet’s Metrics Endpoints
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kubelet exposes several distinct metrics and stats endpoints on its HTTP server.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each serves a specific purpose and differs in data granularity, format, and source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint exposes high-resolution container metrics in Prometheus format.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By default, these metrics come directly from cAdvisor, and kubelet passes them through as-is to the scraper.&lt;/p&gt;

&lt;p&gt;Prometheus typically scrapes this endpoint to collect detailed per-container metrics such as CPU time, memory usage, and I/O statistics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These metrics are useful for low-level monitoring, fine-grained alerting, and capacity planning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To query the kubelet’s &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint, we first need to establish a local proxy to the Kubernetes API server.&lt;/p&gt;

&lt;p&gt;Run the following command and leave it running in another terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the proxy running, requests to &lt;code&gt;http://localhost:8001&lt;/code&gt; go to the API server, which forwards node proxy paths to the kubelet’s HTTP endpoints on that node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/cadvisor

container_cpu_usage_seconds_total{container="python-metrics",cpu="total",pod="python-66dc9f5c8b-2kktd"} 0.105818
container_memory_usage_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2.5870336e+07
container_fs_reads_bytes_total{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1.49504e+07
container_processes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1
container_spec_cpu_shares{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2
container_spec_memory_limit_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
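
&lt;p&gt;Each line in this output follows the Prometheus text format: a metric name, a set of labels in braces, and a value. The Python sketch below splits one such line into its parts. The function name &lt;code&gt;parse_sample&lt;/code&gt; is hypothetical, and the parser is deliberately naive: it assumes label values contain no commas or braces, which a real exposition-format parser must handle.&lt;/p&gt;

```python
def parse_sample(line):
    """Split one Prometheus text-format sample into (name, labels, value)."""
    if "{" not in line:
        # Sample without labels: "name value [timestamp]".
        name, value = line.split()[:2]
        return name, {}, float(value)
    name, _, rest = line.partition("{")
    label_text, _, tail = rest.partition("}")
    labels = {}
    for pair in label_text.split(","):
        if pair:
            key, _, val = pair.partition("=")
            labels[key] = val.strip('"')
    # The value is the first field after the labels; a timestamp may follow.
    return name, labels, float(tail.split()[0])


line = (
    'container_memory_usage_bytes{container="python-metrics",'
    'pod="python-66dc9f5c8b-2kktd"} 2.5870336e+07'
)
name, labels, value = parse_sample(line)
print(name, labels["pod"], int(value))
```

&lt;p&gt;The labels are what let you trace a sample back to a specific pod and container, so they are worth keeping as a dictionary rather than discarding them.&lt;/p&gt;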



&lt;p&gt;Related node, pod, container, and volume stats are also available through kubelet’s Summary API on &lt;code&gt;/stats/summary&lt;/code&gt;, which returns structured JSON instead of Prometheus-formatted metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics Server v0.6.0 and later do not read &lt;code&gt;/stats/summary&lt;/code&gt;; they use &lt;code&gt;/metrics/resource&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server" rel="noopener noreferrer"&gt;for CPU and memory metrics instead&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, to inspect our pod’s resource consumption, we can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/stats/summary \
  | jq '.pods[] | select(.podRef.name == "python-66dc9f5c8b-2kktd")'
{
  "podRef": {
    "name": "python-66dc9f5c8b-2kktd",
    "namespace": "default",
    "uid": "b60baa0b-1e66-4990-8670-93c5919f09cb"
  },
  "containers": [
    {
      "name": "python-metrics",
      "cpu": {
        "usageNanoCores": 151695,
        "usageCoreNanoSeconds": 226134000
      },
      "memory": {
        "usageBytes": 25870336,
        "workingSetBytes": 22114304,
        "rssBytes": 9596928,
        "pageFaults": 3346,
        "majorPageFaults": 136
      },
      "rootfs": {
        "usedBytes": 122880
      },
      "logs": {
        "usedBytes": 8192
      },
      "swap": {
        "swapAvailableBytes": 0,
        "swapUsageBytes": 0
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
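
&lt;p&gt;The raw Summary API fields use small units: &lt;code&gt;usageNanoCores&lt;/code&gt; is in billionths of a core and memory values are in bytes. The Python sketch below converts one container entry into millicores and MiB. The function name &lt;code&gt;summarize_container&lt;/code&gt; and the output keys are hypothetical; the input reuses values from the response above.&lt;/p&gt;

```python
def summarize_container(container):
    """Convert Summary API container stats into friendlier units."""
    # 1 core = 1,000,000,000 nanocores = 1,000 millicores.
    cpu_millicores = container["cpu"]["usageNanoCores"] / 1_000_000
    working_set_mib = container["memory"]["workingSetBytes"] / (1024 * 1024)
    return {
        "name": container["name"],
        "cpu_millicores": cpu_millicores,
        "working_set_mib": working_set_mib,
    }


container = {
    "name": "python-metrics",
    "cpu": {"usageNanoCores": 151695},
    "memory": {"workingSetBytes": 22114304},
}
summary = summarize_container(container)
print(summary)
```

&lt;p&gt;For this mostly idle container, the result is a fraction of a millicore and roughly 21 MiB of working set.&lt;/p&gt;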



&lt;p&gt;&lt;strong&gt;If you only need simplified, high-level metrics, &lt;code&gt;/metrics/resource&lt;/code&gt; serves that role.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It exposes CPU and memory usage in Prometheus format, optimized for lightweight node monitoring.&lt;/p&gt;

&lt;p&gt;We can query this endpoint for aggregated container and pod metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/resource | grep python-metrics
container_cpu_usage_seconds_total{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0.298696 1777623311728
container_memory_working_set_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 2.2114304e+07 1777623311728
container_start_time_seconds{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 1.7776221060112867e+09
container_swap_limit_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0 1777623324188
container_swap_usage_bytes{container="python-metrics",pod="python-66dc9f5c8b-2kktd"} 0 1777623324188
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These metrics provide a point-in-time view of how much CPU and memory the pod and its containers are consuming.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What if we need to debug kubelet’s performance or runtime interactions?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubelet exposes its own internal metrics at the &lt;code&gt;/metrics&lt;/code&gt; endpoint.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These metrics include runtime operation durations, event counters, and error rates that reflect how kubelet interacts with the container runtime and manages node resources.&lt;/p&gt;

&lt;p&gt;For instance, if pods take longer to start or containers fail to stop cleanly, reviewing &lt;code&gt;kubelet_runtime_operations_duration_seconds&lt;/code&gt; can reveal latency bottlenecks between kubelet and the runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/metrics \
  | grep kubelet_runtime_operations_duration_seconds \
  | tail -n 3
kubelet_runtime_operations_duration_seconds_bucket{operation_type="version",le="+Inf"} 152
kubelet_runtime_operations_duration_seconds_sum{operation_type="version"} 0.12228928199999994
kubelet_runtime_operations_duration_seconds_count{operation_type="version"} 152
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
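&lt;p&gt;A histogram's &lt;code&gt;_sum&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt; series are enough to estimate the average latency of an operation. The sketch below does that arithmetic in Python for the &lt;code&gt;version&lt;/code&gt; operation, using the two values from the output above; the function name &lt;code&gt;mean_duration_ms&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
# Values copied from the kubelet output above for operation_type="version".
duration_sum_seconds = 0.12228928199999994
operation_count = 152


def mean_duration_ms(total_seconds, count):
    """Average latency in milliseconds from a histogram's _sum and _count."""
    if count == 0:
        return 0.0  # no observations yet, so there is nothing to average
    return total_seconds / count * 1000.0


mean_ms = mean_duration_ms(duration_sum_seconds, operation_count)
print(f"mean latency of the version call: {mean_ms:.3f} ms")
```

&lt;p&gt;This is a lifetime average since kubelet started, so it works out to roughly 0.8 ms here. For recent latency, compare the change in both series between two scrapes instead.&lt;/p&gt;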



&lt;p&gt;The four kubelet metrics endpoints fit together like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0fbvk98x9zpquz9hgjl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0fbvk98x9zpquz9hgjl.png" alt=" " width="640" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Historically, cAdvisor was Kubernetes’ primary mechanism for container resource monitoring.&lt;/p&gt;

&lt;p&gt;It provided an efficient mechanism for exposing container metrics when workloads were simpler and observability requirements were limited.&lt;/p&gt;

&lt;p&gt;But as Kubernetes matured, a question appeared.&lt;/p&gt;

&lt;p&gt;If kubelet already talks to the container runtime through CRI, why should it always ask cAdvisor to rediscover the same containers from the host filesystem?&lt;/p&gt;

&lt;p&gt;To answer that, we need to look at cAdvisor’s design first.&lt;/p&gt;

&lt;h2&gt;
  
  
  From cAdvisor to CRI: How Kubelet Collects Metrics Today
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Originally, cAdvisor collected container metrics by observing the Linux host directly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That model worked well for the classic Linux container path, where containers were visible through the host’s cgroup hierarchy.&lt;/p&gt;

&lt;p&gt;But Kubernetes later standardized kubelet-to-runtime communication through the Container Runtime Interface (CRI).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CRI is a &lt;a href="https://kubernetes.io/docs/concepts/containers/cri/" rel="noopener noreferrer"&gt;gRPC-based API&lt;/a&gt; that lets kubelet talk to different container runtimes without being tied to a specific runtime implementation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So a natural question appears.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If the runtime already created the containers and already tracks their state, why should kubelet always rely on cAdvisor to rediscover that information from the host?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is the design reason behind the CRI stats path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With this path, kubelet gets pod and container stats directly from the runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That path avoids collecting the same data twice when the runtime already has it.&lt;/p&gt;

&lt;p&gt;It also helps with runtimes where cAdvisor cannot easily see containers from the host.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But how does kubelet achieve that?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;CRI defines dedicated RPC methods for stats, and we can verify their exact names directly from the CRI protobuf definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sSL https://raw.githubusercontent.com/kubernetes/cri-api/master/pkg/apis/runtime/v1/api.proto \
  | grep -E 'rpc (ContainerStats|ListContainerStats|PodSandboxStats|ListPodSandboxStats)'
    rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
    rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
    rpc PodSandboxStats(PodSandboxStatsRequest) returns (PodSandboxStatsResponse) {}
    rpc ListPodSandboxStats(ListPodSandboxStatsRequest) returns (ListPodSandboxStatsResponse) {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime exposes stats through &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cri_stats_provider.go" rel="noopener noreferrer"&gt;CRI RPC methods&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These calls return structured &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto" rel="noopener noreferrer"&gt;Protobuf&lt;/a&gt; messages containing resource usage data such as CPU, memory, network, process, IO, and per-container stats, depending on the platform and runtime implementation.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; enabled, kubelet can use CRI stats methods such as &lt;code&gt;ListPodSandboxStats&lt;/code&gt;, &lt;code&gt;PodSandboxStats&lt;/code&gt;, and &lt;code&gt;ListContainerStats&lt;/code&gt; to collect pod and container metrics from the runtime.&lt;/p&gt;
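&lt;p&gt;To see how each stats method pairs a request message with a response message, a short Python sketch can parse the &lt;code&gt;rpc&lt;/code&gt; lines printed by the &lt;code&gt;grep&lt;/code&gt; above. The helper &lt;code&gt;parse_rpcs&lt;/code&gt; and its regular expression are illustrative assumptions that only cover simple unary declarations like these four.&lt;/p&gt;

```python
import re

# The four rpc lines copied from the grep output above.
PROTO_LINES = """\
    rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
    rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
    rpc PodSandboxStats(PodSandboxStatsRequest) returns (PodSandboxStatsResponse) {}
    rpc ListPodSandboxStats(ListPodSandboxStatsRequest) returns (ListPodSandboxStatsResponse) {}
"""

# Matches: rpc Method(Request) returns (Response)
RPC_RE = re.compile(r'rpc\s+(\w+)\((\w+)\)\s+returns\s+\((\w+)\)')


def parse_rpcs(text):
    """Return one dict per rpc declaration found in the text."""
    return [
        {"method": method, "request": request, "response": response}
        for method, request, response in RPC_RE.findall(text)
    ]


rpcs = parse_rpcs(PROTO_LINES)
for rpc in rpcs:
    print(f'{rpc["method"]}: takes {rpc["request"]}, returns {rpc["response"]}')
```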

&lt;p&gt;Kubelet sends these gRPC requests to the runtime endpoint configured on the node.&lt;/p&gt;

&lt;p&gt;For containerd, that endpoint is commonly &lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For CRI-O, it is commonly &lt;code&gt;/var/run/crio/crio.sock&lt;/code&gt;.&lt;/p&gt;

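&lt;p&gt;Once kubelet receives stats from the runtime, it converts the CRI Protobuf responses into its internal stats structures and then exposes the resulting stats through its own endpoints, such as &lt;code&gt;/stats/summary&lt;/code&gt; and &lt;code&gt;/metrics/resource&lt;/code&gt;.&lt;/p&gt;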

&lt;p&gt;But did we bypass cAdvisor completely?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even on the CRI stats path, kubelet can still rely on cAdvisor for node-level and filesystem-related stats that are outside the pod and container stats returned by CRI.&lt;/p&gt;

&lt;p&gt;The two stats paths look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzualz0eracbrj2xcy1ey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzualz0eracbrj2xcy1ey.png" alt=" " width="640" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnyro66bah4q6g9rxhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnyro66bah4q6g9rxhd.png" alt=" " width="640" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Validating CRI-Based Metrics Collection in Kubelet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Now that we understand why Kubernetes shifted metrics collection from cAdvisor to the CRI, let’s validate that kubelet is actually pulling metrics from the runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll configure kubelet to use CRI-based metrics, confirm it through logs, and compare kubelet’s reported data to what containerd provides directly.&lt;/p&gt;

&lt;p&gt;We start by increasing kubelet’s log verbosity: we edit its systemd drop-in file so that kubelet runs with the &lt;code&gt;--v=5&lt;/code&gt; argument.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the above file, we ensure the &lt;code&gt;ExecStart&lt;/code&gt; line includes the verbose logging flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Unit]
Wants=containerd.service

[Service]
ExecStart=
ExecStart=/var/lib/minikube/binaries/v1.34.0/kubelet \
  --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
  --config=/var/lib/kubelet/config.yaml \
  --hostname-override=minikube \
  --kubeconfig=/etc/kubernetes/kubelet.conf \
  --node-ip=192.168.49.2 \
  --v=5

[Install]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we save the configuration, we reload the systemd daemon and restart kubelet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl daemon-reload
sudo systemctl restart kubelet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With kubelet restarted, we validate that the container runtime’s socket is active and listening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "ss -lx | grep containerd.sock"
u_str LISTEN 0      4096   /run/containerd/containerd.sock.ttrpc 80566      * 0
u_str LISTEN 0      4096   /run/containerd/containerd.sock 79442            * 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Containerd is exposing its CRI endpoint over &lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, verify kubelet is configured to use the correct runtime endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- "sudo cat /var/lib/kubelet/config.yaml | grep -i containerRuntimeEndpoint"
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubelet is configured to talk to the CRI runtime over the expected UNIX domain socket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's tell kubelet to use the CRI for collecting pod and container stats by enabling the &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; feature gate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we flip this switch, one thing is worth knowing.&lt;/p&gt;

&lt;p&gt;Kubelet reports the maturity of every feature gate it knows about through the &lt;code&gt;/metrics&lt;/code&gt; endpoint, under the &lt;code&gt;kubernetes_feature_enabled&lt;/code&gt; series.&lt;/p&gt;

&lt;p&gt;Querying that series for &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; on a fresh Kubernetes 1.34 cluster gives us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics \
  | grep 'kubernetes_feature_enabled.*PodAndContainer'

kubernetes_feature_enabled{name="PodAndContainerStatsFromCRI",stage="ALPHA"} 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;stage="ALPHA"&lt;/code&gt; tells us the gate is still Alpha, and the value &lt;code&gt;0&lt;/code&gt; means it is currently disabled, which is the default.&lt;/p&gt;
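&lt;p&gt;The same check can be scripted. The Python sketch below reads the gate name, stage, and on/off value from the sample line shown above; &lt;code&gt;parse_feature_gate&lt;/code&gt; is a hypothetical helper, and it assumes the value is the last space-separated field on the line.&lt;/p&gt;

```python
import re

# Line copied from the kubelet /metrics output above.
LINE = 'kubernetes_feature_enabled{name="PodAndContainerStatsFromCRI",stage="ALPHA"} 0'


def parse_feature_gate(line):
    """Extract the gate name, maturity stage, and enabled flag from one sample."""
    labels = dict(re.findall(r'([a-z_]+)="([^"]*)"', line))
    value = line.rsplit(" ", 1)[1]  # the sample value follows the last space
    return {
        "name": labels["name"],
        "stage": labels["stage"],
        "enabled": value == "1",
    }


gate = parse_feature_gate(LINE)
print(gate)
```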

&lt;p&gt;We open kubelet's &lt;code&gt;/var/lib/kubelet/config.yaml&lt;/code&gt; configuration file on the minikube node and make sure the following feature gate block is present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
featureGates:
  PodAndContainerStatsFromCRI: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we restart kubelet once more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl restart kubelet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;At this point, kubelet should be sourcing pod and container metrics directly from containerd over the CRI API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We then inspect the kubelet logs with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo journalctl -u kubelet | grep -i containerstats

May 01 10:27:57 minikube kubelet[4205]: feature gates: {map[PodAndContainerStatsFromCRI:true]}
May 01 10:27:57 minikube kubelet[4205]: "PodAndContainerStatsFromCRI": true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Great!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We see kubelet successfully loads the &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; gate.&lt;/p&gt;

&lt;p&gt;But this output doesn’t confirm that metrics are actually being retrieved from the runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/stats/summary&lt;/code&gt; is kubelet's primary interface for exposing metrics that it collects, whether from cAdvisor or directly from the container runtime through the CRI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is enabled, kubelet populates this endpoint with data retrieved from the runtime.&lt;/p&gt;

&lt;p&gt;Let's query the &lt;code&gt;/stats/summary&lt;/code&gt; endpoint to observe the metrics kubelet is serving and confirm whether they match what the runtime reports.&lt;/p&gt;

&lt;p&gt;We start &lt;code&gt;kubectl proxy&lt;/code&gt; in a separate terminal, if it isn't already running, and then query the summary stats for our pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy --port=8001
curl -sS \
  http://localhost:8001/api/v1/nodes/minikube/proxy/stats/summary \
  | jq '.pods[] | select(.podRef.name == "python-66dc9f5c8b-2kktd")'
{
  "podRef": {
    "name": "python-66dc9f5c8b-2kktd",
    "namespace": "default"
  },
  "containers": [
    {
      "name": "python-metrics",
      "cpu": {
        "usageNanoCores": 149575,
        "usageCoreNanoSeconds": 1647087000
      },
      "memory": {
        "workingSetBytes": 22114304
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Summary API reports &lt;code&gt;22114304&lt;/code&gt; bytes of memory working set, about &lt;code&gt;22.11 MB&lt;/code&gt;, and &lt;code&gt;149575&lt;/code&gt; nanocores of current CPU usage for the &lt;code&gt;python-metrics&lt;/code&gt; container.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But how do we know kubelet sourced this from containerd, not cAdvisor?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We can cross-check by querying containerd directly with &lt;code&gt;crictl&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But first, we need to confirm the container ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod python-66dc9f5c8b-2kktd -o jsonpath='{.status.containerStatuses[*].containerID}'
containerd://9b508d38b441b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
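&lt;p&gt;The ID reported by Kubernetes carries a runtime prefix, while &lt;code&gt;crictl&lt;/code&gt; prints a shortened ID. A small Python sketch for lining the two up follows. It assumes &lt;code&gt;crictl&lt;/code&gt; shows the first 13 characters, as in the output in this walkthrough; the helper name and the zero padding in the example ID are hypothetical.&lt;/p&gt;

```python
def short_container_id(container_id, length=13):
    """Strip a runtime prefix such as containerd:// and keep the leading characters."""
    _runtime, separator, raw_id = container_id.partition("://")
    if separator == "":
        raw_id = container_id  # no prefix present, use the value as given
    return raw_id[:length]


# Hypothetical full-length ID: the known prefix from above, padded with zeros.
example_id = "containerd://9b508d38b441b" + "0" * 51
print(short_container_id(example_id))
```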



&lt;p&gt;Now we SSH into the node and run &lt;code&gt;crictl stats&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube ssh -- sudo crictl stats

CONTAINER           CPU %               MEM                 DISK                INODES
...
5e63e93291a32       0.21                75.7MB              36.86kB             11
62bbd4d869537       0.04                66.93MB             65.54kB             24
6cff256e868f3       0.00                37.74MB             65.54kB             24
9b508d38b441b       0.02                22.11MB             122.9kB             16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;python-metrics&lt;/code&gt; container appears as container ID &lt;code&gt;9b508d38b441b&lt;/code&gt; in &lt;code&gt;crictl stats&lt;/code&gt;, with &lt;code&gt;MEM&lt;/code&gt; reported as &lt;code&gt;22.11MB&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That matches the Summary API value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU is harder to match exactly because both values are point-in-time samples, but they are consistent: kubelet reports &lt;code&gt;149575&lt;/code&gt; nanocores, and &lt;code&gt;crictl stats&lt;/code&gt; shows &lt;code&gt;0.02%&lt;/code&gt; CPU for the same container.&lt;/strong&gt;&lt;/p&gt;
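&lt;p&gt;The unit conversions behind that comparison are easy to check. The Python sketch below converts the Summary API values into the units &lt;code&gt;crictl stats&lt;/code&gt; prints. It assumes the &lt;code&gt;CPU %&lt;/code&gt; column is relative to a single core and that &lt;code&gt;MB&lt;/code&gt; means 1,000,000 bytes; both function names are hypothetical.&lt;/p&gt;

```python
# Values copied from the /stats/summary output above.
usage_nano_cores = 149575
working_set_bytes = 22114304


def nanocores_to_percent_of_core(nano_cores):
    """One full core is 1,000,000,000 nanocores."""
    return nano_cores / 1_000_000_000 * 100


def bytes_to_megabytes(num_bytes):
    """Decimal megabytes (1 MB = 1,000,000 bytes), an assumption about crictl's units."""
    return num_bytes / 1_000_000


cpu_percent = nanocores_to_percent_of_core(usage_nano_cores)
memory_mb = bytes_to_megabytes(working_set_bytes)
print(f"{cpu_percent:.4f}% of one core, {memory_mb:.2f} MB")
```

&lt;p&gt;The CPU figure comes out at roughly 0.015% of one core, the same order of magnitude as the &lt;code&gt;0.02&lt;/code&gt; shown by &lt;code&gt;crictl stats&lt;/code&gt;, which was sampled at a different moment.&lt;/p&gt;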

&lt;p&gt;Next, we query kubelet’s &lt;code&gt;/metrics/resource&lt;/code&gt; endpoint to see the Prometheus exposition format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/resource \
  | grep -i "python-66dc9f5c8b-2kktd"

pod_cpu_usage_seconds_total{namespace="default",pod="python-66dc9f5c8b-2kktd"} 1.760035 1777632057760
pod_memory_working_set_bytes{namespace="default",pod="python-66dc9f5c8b-2kktd"} 2.2421504e+07 1777632057760
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, the working set is in the same range across all three views:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/metrics/resource&lt;/code&gt; reports about &lt;code&gt;22.42 MB&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/stats/summary&lt;/code&gt; and &lt;code&gt;crictl stats&lt;/code&gt; report about &lt;code&gt;22.11 MB&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
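&lt;p&gt;To quantify "the same range", the following Python sketch computes the gap between the two readings. Note that &lt;code&gt;/metrics/resource&lt;/code&gt; is showing a pod-level series here while the Summary API value is for a single container, and the two were scraped at different times, so a small difference is expected. The variable names are illustrative.&lt;/p&gt;

```python
# Container-level working set from /stats/summary (matches crictl stats).
summary_bytes = 22114304
# Pod-level working set from /metrics/resource (2.2421504e+07).
resource_bytes = 22421504

difference_bytes = resource_bytes - summary_bytes
relative_percent = difference_bytes / summary_bytes * 100

print(
    f"difference: {difference_bytes} bytes "
    f"({difference_bytes / 1024:.0f} KiB), {relative_percent:.2f}%"
)
```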

&lt;p&gt;&lt;strong&gt;These matching values are consistent with kubelet sourcing pod and container metrics directly from containerd through the CRI API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, what happens when we check kubelet’s &lt;code&gt;/metrics/cadvisor&lt;/code&gt; endpoint?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sS http://localhost:8001/api/v1/nodes/minikube/proxy/metrics/cadvisor
machine_cpu_cores{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 20
machine_cpu_physical_cores{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 14
machine_cpu_sockets{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 1
machine_memory_bytes{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 3.338305536e+10
machine_swap_bytes{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 3.4088153088e+10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Huh!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before enabling the CRI stats path, &lt;code&gt;/metrics/cadvisor&lt;/code&gt; exposed detailed container metrics emitted by cAdvisor and labeled by pod, namespace, container, image, and cgroup path.&lt;/p&gt;

&lt;p&gt;Now, in this run, the endpoint only shows machine-level cAdvisor metrics such as CPU topology, installed memory, swap capacity, and machine scrape status.&lt;/p&gt;

&lt;p&gt;No pod-level or container-level metrics appeared in the &lt;code&gt;/metrics/cadvisor&lt;/code&gt; output.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So where did all the pod and container resource usage go?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Those pod and container metrics are now sourced from containerd's CRI stats implementation.&lt;/p&gt;
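&lt;p&gt;A quick way to confirm what an endpoint is serving is to count series by name prefix. The Python sketch below does this for a few lines copied from the &lt;code&gt;/metrics/cadvisor&lt;/code&gt; output above; &lt;code&gt;count_by_prefix&lt;/code&gt; is a hypothetical helper, and against a live node you would feed it the full response body instead of this sample.&lt;/p&gt;

```python
from collections import Counter

# A few lines copied from the /metrics/cadvisor output above.
SAMPLE = """\
machine_cpu_cores{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 20
machine_cpu_sockets{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 1
machine_memory_bytes{machine_id="a5b246...",system_uuid="7bd5a1e2-ea5e-452b-a202-536452caf458"} 3.338305536e+10
"""


def count_by_prefix(text):
    """Count samples by the first word of the metric name (machine, container, ...)."""
    counts = Counter()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric_name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[metric_name.split("_", 1)[0]] += 1
    return counts


counts = count_by_prefix(SAMPLE)
print("machine series:", counts["machine"])
print("container series:", counts["container"])
```

&lt;p&gt;A &lt;code&gt;Counter&lt;/code&gt; returns zero for missing keys, so an absent &lt;code&gt;container&lt;/code&gt; prefix shows up as a count of 0 rather than an error.&lt;/p&gt;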

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kubernetes does not directly enforce Linux resource limits; the Linux kernel enforces them through cgroups. Kubelet and the container runtime translate pod resource settings into cgroup configuration, then the kernel applies the actual CPU, memory, pids, and related controls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cgroup v2 uses a single unified hierarchy where controllers coexist under &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;. cgroup v1 uses separate controller hierarchies, so controllers such as CPU, memory, and pids can be mounted as separate cgroup trees.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cgroup v1 has been officially deprecated since Kubernetes v1.35. As part of &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5573-remove-cgroup-v1/README.md" rel="noopener noreferrer"&gt;KEP-5573&lt;/a&gt;, kubelet now fails by default on cgroup v1 nodes unless &lt;code&gt;failCgroupV1&lt;/code&gt; is explicitly set to &lt;code&gt;false&lt;/code&gt;, with full code removal planned no earlier than Kubernetes v1.38.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet and the container runtime must use a compatible cgroup driver. With the &lt;code&gt;systemd&lt;/code&gt; driver, kubelet and the runtime place containers under systemd-managed slices; with &lt;code&gt;cgroupfs&lt;/code&gt;, they manage cgroup paths directly. For cgroup v2, Kubernetes strongly recommends the &lt;code&gt;systemd&lt;/code&gt; cgroup driver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;KubeletCgroupDriverFromCRI&lt;/code&gt; graduated to GA in Kubernetes v1.34. At startup, kubelet asks the runtime for the cgroup driver through the CRI &lt;code&gt;RuntimeConfig&lt;/code&gt; RPC when the runtime supports it; otherwise kubelet falls back to its configured &lt;code&gt;cgroupDriver&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cAdvisor is embedded inside the kubelet process and starts as part of kubelet. By default, kubelet uses cAdvisor to collect node, pod, container, volume, and filesystem statistics, then exposes that data through kubelet HTTP endpoints. There is no separate cAdvisor sidecar or daemon in the normal kubelet setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubelet exposes several metrics and stats endpoints. &lt;code&gt;/metrics/cadvisor&lt;/code&gt; exposes cAdvisor-style container and machine metrics in Prometheus format. &lt;code&gt;/stats/summary&lt;/code&gt; returns structured JSON for node, pod, container, and volume stats. &lt;code&gt;/metrics/resource&lt;/code&gt; exposes lightweight CPU and memory resource metrics used by modern Metrics Server versions. &lt;code&gt;/metrics&lt;/code&gt; exposes kubelet’s own internal component metrics, such as operation counters and latencies. Metrics Server 0.6.x and later &lt;a href="https://kubernetes.io/docs/reference/instrumentation/node-metrics" rel="noopener noreferrer"&gt;query&lt;/a&gt; &lt;code&gt;/metrics/resource&lt;/code&gt;, not &lt;code&gt;/stats/summary&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRI is the gRPC API that standardizes kubelet-to-runtime communication. It lets kubelet manage pods and containers through the runtime, and with compatible runtimes it can also collect pod and container metrics directly from the runtime over the runtime socket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;PodAndContainerStatsFromCRI&lt;/code&gt; is an Alpha feature gate and is disabled by default. When enabled with a compatible runtime, kubelet collects pod and container stats through CRI instead of relying on cAdvisor for those pod and container stats.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even with CRI-based pod and container metrics collection, kubelet still depends on cAdvisor for stats that CRI does not provide, especially node-level, machine-level, volume, and filesystem-related data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/" rel="noopener noreferrer"&gt;Kubernetes 1.25: cgroup v2 graduates to GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/" rel="noopener noreferrer"&gt;Kubernetes v1.34: KubeletCgroupDriverFromCRI graduates to GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/" rel="noopener noreferrer"&gt;kube-state-metrics addon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/features/kube_features.go" rel="noopener noreferrer"&gt;pkg/features/kube_features.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/util.go" rel="noopener noreferrer"&gt;pkg/kubelet/cadvisor/util.go&lt;/a&gt; We're interested in &lt;code&gt;UsingLegacyCadvisorStats&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://minikube.sigs.k8s.io/docs/handbook/config/" rel="noopener noreferrer"&gt;minikube Runtime configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/cri-api" rel="noopener noreferrer"&gt;cri-api&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/cri-api/blob/c75ef5b/pkg/apis/runtime/v1/api.proto" rel="noopener noreferrer"&gt;cri protocol definition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grpc.io/" rel="noopener noreferrer"&gt;gRPC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Mirantis/cri-dockerd" rel="noopener noreferrer"&gt;cri-dockerd adapter for docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go" rel="noopener noreferrer"&gt;kubelet.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/blob/master/manager/manager.go" rel="noopener noreferrer"&gt;manager.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/blob/master/container/raw/handler.go" rel="noopener noreferrer"&gt;raw handler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/architecture/cgroups/" rel="noopener noreferrer"&gt;cgroup v2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/cadvisor/issues/2785" rel="noopener noreferrer"&gt;cAdvisor issues #2785&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md" rel="noopener noreferrer"&gt;cAdvisor-less, CRI-full Container and Pod Stats Enhancement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/" rel="noopener noreferrer"&gt;PodAndContainerStatsFromCRI feature gate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/enhancements/issues/2371" rel="noopener noreferrer"&gt;KEP #2371 tracking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/containerd/pull/10691" rel="noopener noreferrer"&gt;implement CRI ListPodSandboxMetrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/containerd/blob/main/docs/cri/config.md" rel="noopener noreferrer"&gt;containerd CRI configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kata-containers/kata-containers/issues/5391" rel="noopener noreferrer"&gt;container-stats exporter to the Kata Containers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>architecture</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Hacking Alibaba Cloud's Kubernetes Cluster</title>
      <dc:creator>Gulcan Topcu</dc:creator>
      <pubDate>Tue, 02 Jul 2024 06:34:16 +0000</pubDate>
      <link>https://dev.to/gulcantopcu/hacking-alibaba-clouds-kubernetes-cluster-ofp</link>
      <guid>https://dev.to/gulcantopcu/hacking-alibaba-clouds-kubernetes-cluster-ofp</guid>
      <description>&lt;p&gt;Hacking Alibaba Cloud's Kubernetes Cluster with Hillai Ben-Sasson &amp;amp; Ronen Shustin, Security Researchers at Wiz, and Bart Farrell, KubeFM Host&lt;/p&gt;

&lt;p&gt;Securing Kubernetes clusters is one of the toughest challenges in cloud security, but for Ronen Shustin and Hillai Ben-Sasson at Wiz, it's just another day at work. These top-tier researchers are fearless in diving into the deep end. Their latest exploit? Cracking Alibaba Cloud's Kubernetes clusters through clever PostgreSQL vulnerabilities.&lt;/p&gt;

&lt;p&gt;Join Bart Farrell as he dives into how their innovative approach identifies vulnerabilities and enhances the overall security of cloud ecosystems.&lt;/p&gt;

&lt;p&gt;You can watch (or listen to) this interview &lt;a href="https://kube.fm/hacking-alibaba-ronen-hillai" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What are three emerging Kubernetes or other tools that you're keeping an eye on?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Ronen and I have extensive knowledge of Kubernetes, but our expertise only originates from working directly with Kubernetes. We're hackers who transitioned into Kubernetes hacking, not Kubernetes experts who started hacking. So, we need to familiarize ourselves with many Kubernetes tools. Most of the tools we know are those we've encountered and exploited during our engagements. Therefore, we might not be the best sources for the latest Kubernetes tools, but we are excited about ongoing Kubernetes research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Are there any specific tools or infrastructure that you particularly like?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Instead of specific tools, we're more interested in infrastructure elements like service meshes. From an attacker's perspective, engaging with these is quite fascinating. Currently, we need to mention standout tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; For those unfamiliar, can you tell us more about your roles and what you do at &lt;a href="https://www.wiz.io/" rel="noopener noreferrer"&gt;Wiz&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Ronen and I work at Wiz, a cloud security company, as part of the vulnerability research team. We focus on researching primary cloud services and providers like Azure, GCP, AWS, and more. We utilize their open &lt;a href="https://en.wikipedia.org/wiki/Bug_bounty_program" rel="noopener noreferrer"&gt;bug bounty programs&lt;/a&gt; to find and report vulnerabilities. By sharing our findings, we aim to enhance the security of the cloud community, not just for our clients but for everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Is hacking cloud environments your primary focus, or is this a specialized area within security research?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; It's unique. We didn't start with cloud environments. We began as general security researchers, focusing on hacking techniques. Over time, we transitioned into specializing in cloud security. Our research aims to discover innovative ways attackers might exploit cloud systems, ultimately leading to more secure cloud environments for everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; How has your hacking experience influenced your approach to Kubernetes security? Did you discover any exciting findings during this research?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Many cloud providers rely on Kubernetes and container technology to manage their services efficiently. Traditionally, setting up individual virtual or physical machines for each customer would only be scalable for some companies. Containers offer a more efficient way to manage large infrastructures. Focusing on cloud environments, we discovered Kubernetes as the go-to tool for &lt;a href="https://www.alibabacloud.com/" rel="noopener noreferrer"&gt;Alibaba Cloud&lt;/a&gt; and companies like IBM. Our journey started with cloud security research and ultimately led us to specialize in Kubernetes security within that domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our initial focus was on container security. We researched container escapes and other vulnerabilities that might impact containers. This research naturally led us to Kubernetes, as many infrastructures we encountered used it. We had to learn Kubernetes and develop specific techniques to achieve our goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; If you could go back in time and share one career tip with your younger self, what would it be?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Always follow your curiosity. Research is all about pursuing leads and hunches. We were curious about cloud security, even though we didn't start in that field. It became popular, and we wanted to explore this new area. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What resources do you use to stay updated on Kubernetes?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; I rely on technical documents the most. I also follow blogs from cloud providers, mainly the &lt;a href="https://www.cncf.io/blog/" rel="noopener noreferrer"&gt;CNCF blog&lt;/a&gt;, because they have valuable information. I use the Kubernetes community on Twitter to learn about new features and technologies; they are highly active there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Additionally, I recommend Reddit. Many communities focused on security, Kubernetes, and cloud computing offer great content. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; We came across an article about how you hacked Alibaba Cloud's Kubernetes cluster and &lt;a href="https://www.youtube.com/watch?v=d81qnGKv4EE" rel="noopener noreferrer"&gt;a talk you gave at KubeCon&lt;/a&gt;. What motivated you to do this research, and did your company support you?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Our company supports security research. At Wiz, we focus on cloud security research, often utilizing &lt;a href="https://en.wikipedia.org/wiki/Offensive_Security" rel="noopener noreferrer"&gt;offensive security&lt;/a&gt; methodologies. We act like attackers to find vulnerabilities, then report them to the cloud providers so they can be fixed before real attackers exploit them. Alibaba Cloud is just one example of this engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our research often leads us to discover new hacking techniques we need to learn about. We share these discoveries with everyone so they can protect themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; One of our previous guests talked about Kubernetes secrets management and &lt;a href="https://owasp.org/www-community/Threat_Modeling" rel="noopener noreferrer"&gt;threat modelling&lt;/a&gt;. How do you approach exploiting vulnerabilities from a hacker's perspective?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our best security insights come from working with different applications, frameworks, and cloud systems. When we engage with one, our primary goal is to find critical security mistakes in its setup. To do this, we must fully understand how the system works and where attackers might discover weaknesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; There's an interesting difference between traditional and cloud security research. In traditional research, the goal is often to achieve "Remote Code Execution" (&lt;a href="https://en.wikipedia.org/wiki/Remote_code_execution" rel="noopener noreferrer"&gt;RCE&lt;/a&gt;) on a specific application, which means taking control of a machine and running unauthorized code. However, in the cloud, things are different. Since you often have access to a virtual machine yourself, RCE becomes less attractive.&lt;/p&gt;

&lt;p&gt;The real challenge in cloud security lies in breaching the barriers between different customers. Unlike traditional environments, the cloud is a shared space with hundreds of thousands of users. Our focus is to demonstrate that attackers could move between these customers, without accessing any customer data ourselves. This highlights a risk unique to the cloud: the potential for attackers to "jump" from one user to another and compromise their information. This type of research, proving a breach of trust without actually stealing data, is a crucial aspect of cloud security and something rarely seen in traditional security research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; When starting this research, why did you choose Alibaba Cloud? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our initial study focused on &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt;. Since many cloud providers offer managed PostgreSQL instances, we were interested in how they handle the infrastructure. We discovered vulnerabilities that allowed us to execute code on these instances. We tested several providers, including Alibaba, and presented our findings in &lt;a href="https://www.blackhat.com/us-23/briefings/schedule#bingbang-hacking-bingcom-and-much-more-with-azure-active-directory-33206" rel="noopener noreferrer"&gt;a Black Hat talk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; We began with PostgreSQL and expanded to Alibaba and other cloud providers. Our &lt;a href="https://www.wiz.io/blog/the-cloud-has-an-isolation-problem-postgresql-vulnerabilities" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; provides more details about PostgreSQL and our Black Hat talk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Why did you choose to focus on PostgreSQL for your research?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; PostgreSQL is a robust database with many features, including the ability to execute code within the database. While this capability can benefit certain users, it poses a potential security risk in cloud environments.&lt;/p&gt;

&lt;p&gt;Cloud providers typically modify PostgreSQL to prevent users from executing code on their managed instances. However, our research identified vulnerabilities in these modifications, not in the core PostgreSQL code itself. We were able to exploit these vulnerabilities to bypass the restrictions and still execute code on the managed databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; How does PostgreSQL relate to Kubernetes in this context? Did you find a way to access a Kubernetes cluster by exploiting the PostgreSQL vulnerabilities?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Cloud providers often use containers and orchestration tools like Kubernetes to manage large-scale services, including PostgreSQL. This approach allows them to offer these services to many customers efficiently. While exploiting the PostgreSQL vulnerabilities, we discovered that we were actually in a Kubernetes environment. The user interface typically abstracts away the underlying infrastructure from the user, but our research methods disclosed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; We've seen various infrastructures, but Alibaba and IBM used Kubernetes for their managed services. Other providers might use different implementations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Security experts often talk about avoiding vulnerabilities caused by misconfigurations, which can be human errors. What were the biggest misconfigurations you found that created security risks?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; The biggest misconfiguration we found is treating containers as the only security barrier. Containers can be one layer within a broader security system, but they shouldn't be relied on by themselves. Containers alone aren't strong enough to fully isolate one company's data from another's, because any security flaw in the Linux kernel could be used to bypass container isolation. We were able to exploit such misconfigurations during our research.&lt;/p&gt;

&lt;p&gt;Another problem is poorly managed secrets within the Kubernetes environment. The secrets we found granted permission not only to read data across the system but also to write and change it, which meant we could overwrite software packages used by many cloud services and customer accounts within Alibaba. Essentially, these powerful secrets allowed someone to access different environments, services, and customer data, all with a single key. That's a significant security risk we wouldn't recommend taking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; The specific secret we found was the &lt;a href="https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod" rel="noopener noreferrer"&gt;image pull secret&lt;/a&gt;. In Kubernetes, when you want to pull images from a private registry, you need this secret to authenticate to the registry. If you misconfigure it, you might accidentally include a key with push permissions instead of pull-only permissions. This key should only allow downloading images, not uploading them. If an attacker gains access to a key with push permissions (like what we achieved in Alibaba), it could have devastating consequences for your entire environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: To those without a strong background in security, it may seem that security experts click a button, scan your system, and find vulnerabilities. However, security research, like many other fields, is a blend of art and science. Can you elaborate on this further?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Security research requires a lot of creativity. When you hear about a new attack vector, it boils down to creative thinking - coming up with something no one else has considered. In this research, we started by looking for patterns we already knew were risky, like overly permissive settings and shared volumes. We had to think outside the box. Returning to the Alibaba Cloud control panel, we began experimenting. This exploration led us to a breakthrough when we discovered a button enabling SSL encryption for the PostgreSQL instance. Clicking it triggered new activity in the container, which we followed to escape the container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; To help our audience understand, could you explain &lt;a href="https://en.wikipedia.org/wiki/Secure_copy_protocol" rel="noopener noreferrer"&gt;SCP&lt;/a&gt;, its role in the attack, and how you exploited it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; SCP stands for Secure Copy. It's a standard tool on Linux systems that transfers files between machines using secure SSH connections. In our case, the SSL encryption feature we triggered used a new Alibaba management container. This container ran the SCP command on our container to move the SSL certificate.&lt;/p&gt;

&lt;p&gt;By default, SCP reads its configuration from a directory that, in this case, we controlled within our container. We placed a malicious SSH configuration file there. When the SCP command loaded this configuration, it ran a command we had placed in the file. This trick let us escape our limited container and jump to the Alibaba management container, because it unknowingly executed our command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; A crucial factor in this exploit was the shared volume. This volume acted like a shared home directory for our container and the management container since the same user existed in both containers. We could exploit this shared space because SCP reads its configuration from the user's home directory by default. By replacing the default configuration with ours containing a malicious command, we tricked the management container into running it when it used SCP.&lt;/p&gt;
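
&lt;p&gt;The interview doesn't name the exact directive that was abused, so the following is only a defensive sketch under stated assumptions. OpenSSH client configuration supports options such as &lt;code&gt;ProxyCommand&lt;/code&gt;, &lt;code&gt;LocalCommand&lt;/code&gt;, &lt;code&gt;KnownHostsCommand&lt;/code&gt;, and &lt;code&gt;Match exec&lt;/code&gt; that run a local command, which is why a writable SSH config in a shared home directory is dangerous. The helper name &lt;code&gt;find_command_directives&lt;/code&gt; and the sample config are hypothetical.&lt;/p&gt;

```python
import re

# OpenSSH client options that cause a local command to be executed.
COMMAND_KEYWORDS = {"proxycommand", "localcommand", "knownhostscommand"}


def find_command_directives(config_text):
    """Return (line_number, keyword, argument) for lines that can run commands."""
    findings = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # ssh_config accepts both "Keyword value" and "Keyword=value".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = parts[0].lower()
        argument = parts[1] if len(parts) == 2 else ""
        if keyword in COMMAND_KEYWORDS:
            findings.append((number, keyword, argument))
        elif keyword == "match" and "exec" in argument.lower().split():
            findings.append((number, keyword, argument))
    return findings


# Hypothetical config an attacker might drop into a shared home directory.
sample = "Host *\n  ProxyCommand sh -c 'id'\n# note\nUser=postgres\n"
for finding in find_command_directives(sample):
    print(finding)
```

&lt;p&gt;A check like this only detects the symptom. The more robust mitigation discussed in the interview is to avoid sharing a writable home directory between trusted and untrusted containers in the first place.&lt;/p&gt;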

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What does successfully creating a &lt;a href="https://kubernetes.io/docs/concepts/policy/pod-security-policy#privileged" rel="noopener noreferrer"&gt;privileged container&lt;/a&gt; using the &lt;a href="https://docs.docker.com/engine/api" rel="noopener noreferrer"&gt;Docker API&lt;/a&gt; tell us about cloud security in general?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Many cloud environments rely on Docker to manage their containers. You can create a new container through an HTTP request if you gain access to the Docker API socket. This container could be privileged, meaning it shares resources like namespaces and possibly even volumes with the underlying host machine, the Kubernetes node. Spawning a privileged container grants you access to almost everything the node has access to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; You transition from being a guest in the container to gaining complete control of the host machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Gaining access to the node would only give you control of part of the Kubernetes cluster, wouldn't it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; With code execution on the node, we could use the &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet" rel="noopener noreferrer"&gt;Kubelet&lt;/a&gt; credentials to explore further, looking for commands, code, secrets, and other information. In our case, Alibaba had misconfigured its Kubelet credentials: they were too powerful. We could list all pods, see all the code in the cluster, potentially containing customer data, and even retrieve all the secrets using the "kubectl get secret" command. This misconfiguration was the key that unlocked broader access for us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Did you achieve the entire exploit on a single node within the cluster?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Yes, we were on a single node. Using the compromised Kubelet credentials, we could see all the other nodes and resources in the cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; While the specific node we compromised was isolated and didn't contain data from other customers, the service account associated with Kubelet had excessive permissions. Even though the node itself was secure, this service account allowed us to access sensitive information across the entire cluster, including pods, nodes, and secrets belonging to other customers.&lt;/p&gt;
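
&lt;p&gt;One way to catch this kind of over-permissioned identity is to review its RBAC rules. The sketch below is an illustration under assumptions, not the fix Alibaba applied: it takes rules shaped like Kubernetes RBAC policy rules (&lt;code&gt;apiGroups&lt;/code&gt;, &lt;code&gt;resources&lt;/code&gt;, &lt;code&gt;verbs&lt;/code&gt;) and flags any that allow reading Secrets. The function name and the sample rules are hypothetical, and &lt;code&gt;resourceNames&lt;/code&gt; restrictions are ignored for brevity.&lt;/p&gt;

```python
READ_VERBS = {"get", "list", "watch", "*"}


def grants_secret_read(rule):
    """Return True if an RBAC-style rule allows reading Secrets."""
    groups = set(rule.get("apiGroups", []))
    resources = set(rule.get("resources", []))
    verbs = set(rule.get("verbs", []))
    # Secrets live in the core API group, written as "" in RBAC rules.
    in_core_group = bool(groups.intersection({"", "*"}))
    covers_secrets = bool(resources.intersection({"secrets", "*"}))
    can_read = bool(verbs.intersection(READ_VERBS))
    return in_core_group and covers_secrets and can_read


# Hypothetical rules attached to a node-level identity.
rules = [
    {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["list"]},
    {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
]
risky = [rule for rule in rules if grants_secret_read(rule)]
print(len(risky), "of", len(rules), "rules allow reading Secrets")
```

&lt;p&gt;In this made-up rule set, the second and third rules are the kind of grant that turns a single compromised node into cluster-wide access to secrets.&lt;/p&gt;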

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What was the next step after taking over Alibaba's managed PostgreSQL offering? Did you contact Alibaba to report your findings?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Once we discovered the ability to access data belonging to other customers, our research stopped immediately. We wouldn't risk even accidentally accessing someone else's data. At that point, we documented everything we found and sent a detailed report to Alibaba Cloud, and they responded quickly and professionally. They kept us updated on the fixes they deployed throughout the research process. We immediately report any critical issues to prevent others from exploiting them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Can you tell us about any specific fixes they implemented based on your findings?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; The first issue was a misconfiguration that falsely indicated increased resource consumption. We exploited it to execute unauthorized code on the operating system. We collaborated with Alibaba Cloud to fix this problem. They also resolved the SCP vulnerability that allowed unauthorized access to their management container. Finally, they restricted the Kubelet credentials to a narrower scope, granting only the specific permissions needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Following our research, Alibaba took several steps to address the vulnerabilities we discovered. They limited image pull secret permissions to read-only access, preventing unauthorized uploads. Additionally, they implemented a secure container technology similar to Google's &lt;a href="https://gvisor.dev" rel="noopener noreferrer"&gt;gVisor&lt;/a&gt; project. This technology hardens containers and makes them more difficult to escape from, adding another layer of security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Throughout this process, what key lessons did you learn?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; There are two main lessons learned. First, containers shouldn't be relied on as the sole security barrier. While they can be a layer of security, they can be bypassed in various ways. Additional precautions are crucial to ensure proper isolation between customers. We recommend building a layered defense so that a single vulnerability doesn't allow unauthorized access to a competitor company's data.&lt;/p&gt;

&lt;p&gt;Second, strong credentials require careful management. As Ronen mentioned, Alibaba originally had a powerful secret that granted read and write access across the cluster. This secret also had push access to the central Docker image registry. Following our report, they limited the scope of these credentials. It's essential to be very cautious with such powerful secrets. Ideally, you should scope secrets to specific actions and keep their number to a minimum. A powerful secret can allow attackers to move across different environments, including production, development, testing, and even development workstations.&lt;/p&gt;

&lt;p&gt;Another lesson learned relates to the container itself. The SCP vulnerability we exploited highlights the risk of shared namespaces between containers. In the Alibaba incident, the shared namespace and home directory allowed us to exploit the SCP vulnerability. Always be very careful when sharing namespaces between trusted and untrusted containers. The lesson learned is to minimize what you share and never grant unnecessary permissions. Attackers may exploit even seemingly minor misconfigurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Can you recommend any specific tools that people might need to be aware of if they want to discuss implementing some of these mitigation tactics with their managers?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; There's one framework I highly recommend: &lt;a href="https://github.com/wiz-sec/peach" rel="noopener noreferrer"&gt;Peach&lt;/a&gt;. It's an open-source project developed by our research team with contributions from fantastic people at many companies.&lt;/p&gt;

&lt;p&gt;Peach is a framework that outlines how to build secure and isolated environments, whether in the cloud or not. Like a white paper, it's a valuable resource that guides you on properly isolating tenants or customers in a multi-tenant environment. It covers common mistakes to avoid, what to look out for, and how to implement the necessary precautions.&lt;/p&gt;

&lt;p&gt;If you manage a multi-tenant environment or need to isolate resources within your environment, Peach is worth exploring. It's completely open-source and available on &lt;a href="https://github.com/wiz-sec/peach" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. We also welcome contributions from anyone with additional tips or tricks we should know about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; I also recommend using secret scanning tools. These tools are essential in our research; we use them to identify potential secrets-related vulnerabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; Do you have any recommendations for securing multi-tenant Kubernetes clusters?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Securing multi-tenant Kubernetes clusters involves a few key areas. First, prioritize network security. By default, Kubernetes doesn't restrict communication between pods, so strong network isolation is essential.&lt;/p&gt;

&lt;p&gt;Second, separating namespaces between customers is a good practice when dealing with multi-tenancy.&lt;/p&gt;

&lt;p&gt;Additionally, consider implementing container security technologies like gVisor or &lt;a href="https://katacontainers.io/" rel="noopener noreferrer"&gt;Kata Containers&lt;/a&gt;. Don't solely rely on Docker's security features to prevent container escapes.&lt;/p&gt;
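
&lt;p&gt;As a minimal sketch of the network isolation point, a common starting pattern is a default-deny &lt;code&gt;NetworkPolicy&lt;/code&gt; in each tenant namespace, with explicit allow rules layered on top. The snippet builds that manifest as JSON, which &lt;code&gt;kubectl apply -f&lt;/code&gt; accepts alongside YAML. The namespace &lt;code&gt;tenant-a&lt;/code&gt;, the policy name, and the helper function are hypothetical, and enforcement assumes a network plugin that supports NetworkPolicy.&lt;/p&gt;

```python
import json


def default_deny_policy(namespace):
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# "tenant-a" is a placeholder namespace name.
print(json.dumps(default_deny_policy("tenant-a"), indent=2))
```

&lt;p&gt;Starting from deny-all and adding narrow allow rules per tenant keeps one customer's workloads from reaching another's by default.&lt;/p&gt;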

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What advice would you give for hardening containers to make them more secure?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our case study with Alibaba revealed they were using shared Linux namespaces between containers, such as their management container and our container. Sharing Linux namespaces can be dangerous. When designing a system that shares namespaces or resources between management and regular user containers, always assess the risks involved carefully. Container technologies like gVisor and &lt;a href="https://katacontainers.io/" rel="noopener noreferrer"&gt;Kata Containers&lt;/a&gt; can mitigate the risk of attackers exploiting Linux kernel vulnerabilities in your environment to achieve kernel-level code execution and jump to the Kubernetes node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What advice would you give to Kubernetes engineers needing more security experience?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Security is crucial. Companies of all sizes, from startups to large corporations, are constantly targeted by malicious actors, not just ethical hackers like us. Anyone managing a service on the internet must understand that they are a potential target for cyberattacks. These attacks range from data breaches to ransomware attacks that shut down your entire operation. Even small projects need to pay attention to security.&lt;/p&gt;

&lt;p&gt;The good news is that many tools can help you achieve security without being a security expert. Tools like gVisor are relatively easy to implement because you don't need to write them from scratch. By using security hardening tools, you gain significant protection benefits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Besides the tools, many online resources are available to learn about security. These resources can help you understand security risks and how to mitigate them. Kubernetes itself has built-in security features, including default security policies. Be security-conscious and take steps to secure your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; You discover a vulnerability and report it to the vendor. What prevents you from exploiting the vulnerability for malicious purposes instead? Wouldn't Alibaba eventually find the problem on its own?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; We started seeing signs that Alibaba was taking steps to address the issue while we were still in the research phase. They were transparent with us about their efforts. Cloud providers all have security teams that constantly monitor their environments. They likely knew we were there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; Cloud providers are doing a great job with security. We're ethical hackers; our goal is to improve security for the cloud community. Penetration testing, or offensive research, is a tool to achieve that goal. We want to fix the vulnerabilities, and it's rewarding to hear that our reports lead to security updates that benefit many customers. We do this to make cloud products more secure and help users learn how to secure their deployments.&lt;/p&gt;

&lt;p&gt;We publish blogs and give talks so that security professionals and developers can learn from our research and identify potential problems in their environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; What's next on the agenda for you both?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; We're always working on new research projects. &lt;a href="https://www.wiz.io/authors/sagi" rel="noopener noreferrer"&gt;Sagi&lt;/a&gt; from our team recently published a blog about a vulnerability in &lt;a href="https://www.wiz.io/blog/wiz-and-hugging-face-address-risks-to-ai-infrastructure" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;, an AI provider. We have several ongoing projects under disclosure, meaning we can only reveal them once the vulnerabilities are fixed.&lt;/p&gt;

&lt;p&gt;Follow our blog; it's the first place we announce new findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ronen:&lt;/strong&gt; Our research will benefit the Kubernetes security community as well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart:&lt;/strong&gt; How can people contact you if they have questions?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hillai:&lt;/strong&gt; We're both on Twitter. My handle is &lt;a href="https://x.com/hillai" rel="noopener noreferrer"&gt;@hillai&lt;/a&gt;, and Ronen's is &lt;a href="https://x.com/RonenSHH" rel="noopener noreferrer"&gt;@RonenSHH&lt;/a&gt;. You can also email us at &lt;a href="mailto:research@wiz.io"&gt;research@wiz.io&lt;/a&gt;, but Twitter is the best way. Make sure to spell the names correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrap up&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you enjoyed this interview and want more Kubernetes stories and opinions, visit &lt;a href="https://kube.fm/" rel="noopener noreferrer"&gt;KubeFM&lt;/a&gt; and subscribe to the podcast.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to keep up-to-date with Kubernetes, subscribe to &lt;a href="https://learnk8s.io/learn-kubernetes-weekly" rel="noopener noreferrer"&gt;Learn Kubernetes Weekly&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you want to become an expert in Kubernetes, look at courses on &lt;a href="https://learnk8s.io/training" rel="noopener noreferrer"&gt;Learnk8s&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you want to keep in touch, follow me on &lt;a href="https://www.linkedin.com/in/gulcantopcu/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>cloudcomputing</category>
      <category>cybersecurity</category>
      <category>hacking</category>
    </item>
    <item>
      <title>eBPF, sidecars, and the future of the service mesh</title>
      <dc:creator>Gulcan Topcu</dc:creator>
      <pubDate>Fri, 07 Jun 2024 07:58:53 +0000</pubDate>
      <link>https://dev.to/gulcantopcu/ebpf-sidecars-and-the-future-of-the-service-mesh-32ad</link>
      <guid>https://dev.to/gulcantopcu/ebpf-sidecars-and-the-future-of-the-service-mesh-32ad</guid>
      <description>&lt;p&gt;Kubernetes and service meshes may seem complex, but not for William Morgan, an engineer-turned-CEO who excels at simplifying the intricacies. In this enlightening podcast, he shares his journey from AI to the cloud-native world with Bart Farrell. &lt;/p&gt;

&lt;p&gt;Discover William's cost-saving strategies for service meshes, his take on the ongoing debate between sidecars, Ambient Mesh, and Cilium Cluster Mesh, his surprising connection to Twitter's early days, and his unique perspective on balancing tech expertise with the humility of being a piano student.&lt;/p&gt;

&lt;p&gt;You can watch (or listen) to this interview &lt;a href="https://kube.fm/service-mesh-william" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Imagine you've just set up a fresh Kubernetes cluster. What's your go-to trio for the first tools to install?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: My first pick would be &lt;a href="https://linkerd.io/" rel="noopener noreferrer"&gt;Linkerd&lt;/a&gt;. It's a must-have for any Kubernetes cluster. I then lean towards tools that complement Linkerd, like &lt;a href="https://argo-cd.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;Argo&lt;/a&gt; and &lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;. You're off to a solid start with these three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Cert Manager and Argo are popular choices, especially in the GitOps domain. What about &lt;a href="https://fluxcd.io/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: Flux would work just fine. I don't have a strong preference between the two. Flux and Argo are great options, especially for tasks like progressive delivery. When paired with Linkerd, they provide a robust safety net for rolling out new code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: As the CEO, who are you accountable to? Could you elaborate on your role and responsibilities?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: Being a CEO is an exciting shift from my previous role as an engineer. I work for myself, and I must say, I’m a demanding boss. As a CEO, I focus on the big picture and align everyone toward a common goal. These are the two skills I’ve had to develop rapidly since transitioning from an engineer, where my primary concern was writing and maintaining code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: From a technical perspective, how did you transition into the cloud-native space? What were you doing before it became mainstream?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: My early career was primarily focused on AI, &lt;a href="https://en.wikipedia.org/wiki/Natural_language_processing" rel="noopener noreferrer"&gt;NLP&lt;/a&gt;, and machine learning long before they became trendy. I thought I’d enter academia but realized I enjoyed coding more than research.&lt;/p&gt;

&lt;p&gt;I worked at several Bay Area startups, mainly in NLP and machine learning roles. I was part of a company called PowerSet, which was building a natural language processing engine and was acquired by Microsoft. I then joined Twitter in its early days, around 2010, when it had about 200 employees. I started on the AI side but transitioned to infrastructure because I found it more satisfying and challenging. At Twitter, we were doing what I would now describe as cloud-native, even though the terminology differed. We didn’t have Kubernetes or Docker, but we had &lt;a href="https://mesos.apache.org/" rel="noopener noreferrer"&gt;Mesos&lt;/a&gt;, the JVM for isolation, and cgroups for a basic form of containerization. We transitioned from a monolithic Ruby on Rails service to a massive microservices deployment. When I left Twitter, we tried to apply those same ideas to the emerging world of Kubernetes and Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How do you keep up with the rapid changes in the Kubernetes and cloud-native ecosystems, especially transitioning from infrastructure and AI/NLP?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: My current role primarily shapes my strategy. I learn a lot from the engineers and users of &lt;a href="https://www.reddit.com/r/linkerd/new/" rel="noopener noreferrer"&gt;Linkerd&lt;/a&gt;, who are at the forefront of these technologies. I also keep myself updated by reading discussions on Reddit platforms like &lt;a href="https://www.reddit.com/r/kubernetes/" rel="noopener noreferrer"&gt;r/kubernetes&lt;/a&gt; and &lt;a href="https://www.reddit.com/r/linkerd/new/" rel="noopener noreferrer"&gt;r/Linkerd&lt;/a&gt;. Occasionally, I contribute to or follow discussions on &lt;a href="https://news.ycombinator.com/" rel="noopener noreferrer"&gt;Hacker News&lt;/a&gt;. Overall, my primary source of knowledge comes from the experts I work with daily, giving me valuable insights into the latest developments. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: If you could return to your time at Twitter or even before that, what one tip would you give yourself?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: I'd tell myself to prioritize impact. As an engineer, I was obsessed with building and exploring new technologies, which was rewarding. However, I later understood the value of stepping back to see where I could make a real difference in the company. Transitioning my focus to high-impact areas, such as infrastructure at Twitter, was a turning point. Despite my passion for NLP, I realized that infrastructure was where I could truly shine. Always look for opportunities where you can make the most significant impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Let’s focus on "&lt;a href="https://www.techtarget.com/searchitoperations/news/365535362/Sidecarless-eBPF-service-mesh-sparks-debate" rel="noopener noreferrer"&gt;Sidecarless eBPF Service Mesh Sparks Debate&lt;/a&gt;," which follows up on your previous article “&lt;a href="https://buoyant.io/blog/ebpf-sidecars-and-the-future-of-the-service-mesh" rel="noopener noreferrer"&gt;eBPF, sidecars, and the future of the service mesh&lt;/a&gt;.” You're one of the creators of Linkerd. For those unfamiliar, what exactly is a service mesh? Why would someone need it, and what value does it add? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: There are two ways to describe service mesh: what it does and how it works. Service mesh is an additional layer for Kubernetes that enhances key areas Kubernetes doesn't fully address. &lt;/p&gt;

&lt;p&gt;The first area is security. It ensures all connections in your cluster are encrypted, authorized, and authenticated. You can set policies based on services, gRPC methods, or HTTP routes, like allowing Service A to talk to /foo but not /bar. &lt;/p&gt;

&lt;p&gt;The second area is reliability. It enables graceful failovers, transparent traffic shifting between clusters, and progressive delivery. For example, you can deploy new code and gradually increase the traffic sent to it, rather than exposing it to all production traffic immediately. It also includes mechanisms like load balancing, circuit breaking, retries, and timeouts.&lt;/p&gt;

&lt;p&gt;The last area is observability. It provides uniform metrics for all workloads across all services, such as success rates, latency distribution, and traffic volume. Importantly, it does this without requiring changes to your application code. &lt;/p&gt;

&lt;p&gt;As for how it works, the most prevalent approach today is to run many proxies. This approach has become feasible thanks to technological advancements like Kubernetes and containers, which simplify the deployment and management of many proxies as a unified fleet. A decade ago, deploying 10,000 proxies would have been absurd, but it is feasible and practical today. The specifics of deploying these proxies, their locations, programming languages, and practices are subject to debate. However, at a high level, service meshes work by running these layer 7 proxies that understand HTTP, HTTP/2, and gRPC traffic and enable various functionalities without requiring changes to your application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Can you briefly explain &lt;a href="https://blog.envoyproxy.io/service-mesh-data-plane-vs-control-plane-2774e720f7fc" rel="noopener noreferrer"&gt;how the data and control planes work in service meshes&lt;/a&gt;, especially compared to the older sidecar model with an extra container?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: A service mesh architecture consists of two main components: a control plane and a data plane. The control plane allows you to manage and configure the data plane, which directs network traffic within the service mesh. In Kubernetes, the control plane operates as a collection of standard Kubernetes services, typically running within a dedicated namespace or across the entire cluster.&lt;/p&gt;

&lt;p&gt;The data plane is the operational core of a service mesh, where proxies manage network traffic. The sidecar model, employed by service meshes like Linkerd, deploys a dedicated proxy alongside each application pod. Therefore, a service mesh with 20 pods would have 20 corresponding proxies. The overall efficiency and scalability of the service mesh rely heavily on the size and performance of these individual proxies.&lt;/p&gt;

&lt;p&gt;In the sidecar model, communication between service A and service B flows through both services' proxies. Service A sends its message to its sidecar proxy, which forwards it to service B's sidecar proxy. Finally, service B's proxy delivers the message to service B itself. This indirect communication path adds extra hops, leading to a slight increase in latency. You must carefully consider the potential performance impacts to ensure that service mesh benefits outweigh the trade-offs.&lt;/p&gt;
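
&lt;p&gt;The request path described above can be sketched in a few lines of Python. This is only an illustrative model, not anything taken from Linkerd: the function &lt;code&gt;request_path&lt;/code&gt; and the &lt;code&gt;-proxy&lt;/code&gt; naming are hypothetical, and the sketch simply counts hops rather than measuring real latency.&lt;/p&gt;

```python
def request_path(source, destination, sidecars=True):
    """Return the ordered stops a request visits (hypothetical model)."""
    if not sidecars:
        # Without a mesh, the request goes straight to the destination.
        return [source, destination]
    # With sidecars, each side's proxy sits between the two applications.
    return [source, source + "-proxy", destination + "-proxy", destination]


def hop_count(path):
    """Number of hops is one less than the number of stops."""
    return len(path) - 1


direct = request_path("service-a", "service-b", sidecars=False)
meshed = request_path("service-a", "service-b")

print(meshed)
# ['service-a', 'service-a-proxy', 'service-b-proxy', 'service-b']
print(hop_count(direct), hop_count(meshed))  # 1 3
```

&lt;p&gt;With sidecars, the single direct hop becomes three: application to local proxy, proxy to proxy, and proxy to application.&lt;/p&gt;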

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: We've been discussing the benefits of service meshes, but running an extra container for each pod sounds expensive. Does cost become a significant issue?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: Service meshes have a compute cost, just like adding any component to a system. You pay for CPU and memory, but memory tends to be the more significant concern, as it can force you to scale up instances or nodes.&lt;/p&gt;

&lt;p&gt;However, Linkerd has minimized this issue with a "micro proxy" written in Rust. Rust's strict memory management allows fast, lightweight proxies and avoids memory vulnerabilities like buffer overflows, which are common in C and C++. Studies from both &lt;a href="https://security.googleblog.com/2024/03/secure-by-design-googles-perspective-on.html" rel="noopener noreferrer"&gt;Google&lt;/a&gt; and Microsoft have shown that roughly 70% of security bugs in C and C++ code are due to memory management errors.&lt;/p&gt;

&lt;p&gt;Our choice of Rust as the programming language in 2018 was a calculated risk. Rust offers the best of both worlds: the speed and control of languages like C/C++ and the safety and ease of use of languages with runtime environments like Go. Rust and its network library ecosystem were still relatively young at that time. We invested significantly in underlying libraries like &lt;a href="https://tokio.rs/" rel="noopener noreferrer"&gt;Tokio&lt;/a&gt;, &lt;a href="https://github.com/tower-rs/tower" rel="noopener noreferrer"&gt;Tower&lt;/a&gt;, and H2 to build the necessary infrastructure.&lt;/p&gt;

&lt;p&gt;The critical role of the data plane in handling sensitive application data drove this decision. We ensured its reliability and security. Rust enables us to build small, fast, and secure proxies that scale with traffic, typically using minimal memory, which translates directly to the user experience. Instead of facing long response times (like 5-second tail latencies), users experience faster interactions (closer to 30 milliseconds). A service mesh can optimize these tail latencies, improving user experience and customer retention. Choosing Rust has proven to be instrumental in achieving these goals.&lt;/p&gt;

&lt;p&gt;While cost is a factor, the actual cost often stems from operational complexity. Do you need dedicated engineers to maintain complex proxies, or does the system primarily work independently? That human cost usually dwarfs the computational one.&lt;/p&gt;

&lt;p&gt;Our design choices have made managing Linkerd’s costs relatively straightforward. However, for other service meshes, costs can escalate if the proxies are large and resource-intensive. Even so, the more significant cost is often not the resources but the operational overhead and complexity. This complexity can demand considerable time and expertise, increasing the overall cost. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: You raise a crucial point about the human aspect. While we address technical challenges, the time spent resolving errors detracts from other tasks. The community has developed products and projects to tackle these concerns and costs. One such example is &lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt; with Ambient Mesh. Another approach is sidecarless service meshes like Cilium Service Mesh. Can you explain what Ambient Mesh is and how it enhances the classic sidecar model of service meshes?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: We've delved deep into both of these options in Linkerd. While there might come a time when adopting these projects makes sense for us, we're not there yet. &lt;/p&gt;

&lt;p&gt;Every decision in distributed systems involves trade-offs, especially in production environments within companies where the platform is a tool to support applications. At Linkerd, our priority is constantly reducing the operational workload.&lt;/p&gt;

&lt;p&gt;Ambient Mesh and eBPF aren't primarily reactions to complexity but responses to the practical annoyances of sidecars. Their key selling point is eliminating the need for sidecars. However, the real question is: What's the cost of this shift? That's where the analysis becomes crucial.&lt;/p&gt;

&lt;p&gt;In Ambient Mesh, rather than having sidecar containers, you utilize connective components, such as tunnels, within the namespace. These tunnels communicate with proxies located elsewhere in the cluster. So essentially, you have multiple proxies running outside of the pod, and the pods use these tunnels to communicate with the proxies, which then handle specific tasks.&lt;/p&gt;

&lt;p&gt;This setup is indeed intriguing. As mentioned earlier, running sidecars can be challenging due to specific implications. One such implication is the cost factor, which we discussed earlier. In Linkerd’s case, this is a minor concern. However, a more significant implication is the need to restart the pod to upgrade the proxy to the latest version, given the immutability of pods in Kubernetes.&lt;/p&gt;

&lt;p&gt;This situation necessitates managing two separate updates: one to keep the applications up-to-date and another to upgrade the service mesh. Therefore, while the setup has advantages, it also requires careful management to ensure smooth operation and optimal performance.&lt;/p&gt;

&lt;p&gt;We operate the proxy as the first container for various reasons, which can lead to friction points, such as when using &lt;code&gt;kubectl logs&lt;/code&gt;. Typically, when you request logs, you're interested in your application's logs, not the proxy's. This friction, combined with a desire for networking to operate seamlessly in the background, drives the development of solutions like Ambient and eBPF, which aim to eliminate the need for explicit sidecars.&lt;/p&gt;

&lt;p&gt;Both Ambient and eBPF solutions, which are closely related, are reactions to this sentiment of not wanting to deal with sidecars directly. The aim is to make sidecars disappear. Take &lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt; and most service meshes built on &lt;a href="https://www.envoyproxy.io/" rel="noopener noreferrer"&gt;Envoy&lt;/a&gt;, for instance. Envoy is complex and memory-intensive and requires constant attention and tuning based on traffic specifics.&lt;/p&gt;

&lt;p&gt;The supposed challenges with sidecars are less specific to Linkerd and more a product of a cloud-native tendency to market solutions, such as writing a blog post proclaiming the &lt;a href="https://thenewstack.io/ambient-mesh-no-sidecar-required/" rel="noopener noreferrer"&gt;death of sidecars&lt;/a&gt;. Such claims are sometimes an inaccurate reflection of engineering reality.&lt;/p&gt;

&lt;p&gt;In Ambient, eliminating sidecars by running the proxy elsewhere and using tunnel components allows for separate proxy maintenance without needing to reboot applications for upgrades. However, in a Kubernetes environment, the idea is that pods should be rebootable anytime. Kubernetes can reschedule pods as needed, which aligns with the principles of building applications as distributed systems. Yet, there are legacy applications or specific scenarios where rebooting is inconvenient, making the sidecar approach less appealing.&lt;/p&gt;

&lt;p&gt;Historically, running cron jobs with sidecar proxies in Kubernetes posed a significant challenge. Kubernetes lacked a built-in mechanism to signal the sidecar proxy when the main job was complete, necessitating manual intervention to prevent the proxy from running indefinitely. This manual process went against the core principle of service mesh, which aims to decouple services from their proxies for easier management and scalability.&lt;/p&gt;

&lt;p&gt;Thankfully, one significant development is the &lt;a href="https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/" rel="noopener noreferrer"&gt;Sidecar Container Kubernetes Enhancement Proposal&lt;/a&gt;. With this enhancement, you can designate your proxy as a sidecar container, leading to several benefits, like jobs terminating the proxy once finished and eliminating unnecessary resource consumption.&lt;/p&gt;
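
&lt;p&gt;As a rough sketch of what that designation looks like: in the native sidecar feature (introduced as alpha in Kubernetes 1.28), a sidecar is declared as an init container with &lt;code&gt;restartPolicy: Always&lt;/code&gt;. The Python snippet below builds such a Job manifest as a dictionary and prints it as JSON. The names and images (&lt;code&gt;example-job&lt;/code&gt;, &lt;code&gt;example.com/proxy:latest&lt;/code&gt;, and so on) are placeholders for illustration, not real Linkerd artifacts.&lt;/p&gt;

```python
import json

# Hypothetical Job manifest; names and images are placeholders.
job_manifest = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                # A native sidecar is an init container that keeps running
                # because its own restartPolicy is set to Always.
                "initContainers": [
                    {
                        "name": "proxy",
                        "image": "example.com/proxy:latest",
                        "restartPolicy": "Always",
                    }
                ],
                # The main workload; the Job completes when this exits.
                "containers": [
                    {"name": "main-job", "image": "example.com/job:latest"}
                ],
            }
        }
    },
}

print(json.dumps(job_manifest, indent=2))
```

&lt;p&gt;The printed JSON could be saved to a file and passed to &lt;code&gt;kubectl apply -f&lt;/code&gt;, which accepts JSON as well as YAML, on a cluster where the sidecar containers feature is enabled.&lt;/p&gt;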

&lt;p&gt;For Linkerd, adopting Ambient mesh architecture introduces more complexity than benefits. The additional components, like the tunnel and separate proxies, add unnecessary layers to the system. Unlike Istio, which has encountered issues due to its architecture, Linkerd's existing design hasn't faced similar challenges. Therefore, the trade-offs associated with Ambient aren't justified for Linkerd.&lt;/p&gt;

&lt;p&gt;In contrast, the sidecar model offers distinct advantages. It creates clear operational and security boundaries at the pod level. Each pod becomes a self-contained unit, making independent decisions regarding security and operations, aligning with Kubernetes principles, and simplifying management in a cloud-native environment.&lt;/p&gt;

&lt;p&gt;This sidecar approach is crucial for implementing &lt;a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/#:~:text=Zero%20Trust%20security%20is%20an,outside%20of%20the%20network%20perimeter." rel="noopener noreferrer"&gt;zero-trust&lt;/a&gt; security. The critical principle of zero trust is to enforce security policies at the most granular level possible. Traditional approaches relying on a perimeter firewall and implicitly trusting internal components are no longer sufficient. Instead, each security decision must be made independently at every system layer. This granular enforcement is achieved by deploying a sidecar proxy within each application pod, acting as a security boundary and enabling fine-grained control over network traffic, authentication, and authorization.&lt;/p&gt;

&lt;p&gt;In Linkerd, every request undergoes a rigorous security check within the pod. This check includes verifying the validity of the TLS encryption, confirming the client's identity through cryptographic algorithms, and ensuring the request comes from a trusted source. Additionally, Linkerd checks whether the request can access the specific resource or method it's trying to reach. This multi-layered scrutiny happens directly inside the pod, providing the highest possible level of security within the Kubernetes framework. Maintaining this tight security model is crucial, as any deviation, like separating the proxy and TLS certificate, weakens the model and introduces potential vulnerabilities.&lt;/p&gt;
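
&lt;p&gt;The per-request checks described above follow a default-deny pattern, which the following Python sketch illustrates. It is a simplified model rather than Linkerd's actual implementation or API: the &lt;code&gt;ALLOWED_ROUTES&lt;/code&gt; table, the &lt;code&gt;is_authorized&lt;/code&gt; function, and the &lt;code&gt;tls_verified&lt;/code&gt; flag are all hypothetical, and real identity verification relies on mutual TLS certificates rather than a boolean.&lt;/p&gt;

```python
# Hypothetical allow-list of (client identity, HTTP path) pairs.
ALLOWED_ROUTES = {
    ("service-a", "/foo"),
}


def is_authorized(client_identity, path, tls_verified):
    """Default-deny check applied to every request inside the pod."""
    # Reject anything that did not arrive over a verified TLS connection.
    if not tls_verified:
        return False
    # Allow only explicitly listed (identity, path) combinations.
    return (client_identity, path) in ALLOWED_ROUTES


print(is_authorized("service-a", "/foo", True))   # True
print(is_authorized("service-a", "/bar", True))   # False
print(is_authorized("service-a", "/foo", False))  # False
```

&lt;p&gt;Because the check runs next to the application in the same pod, a request that fails any step never reaches the application, which is the granular enforcement that zero trust calls for.&lt;/p&gt;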

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: The next point I'd like to discuss has garnered significant attention in recent years through &lt;a href="https://cilium.io/use-cases/service-mesh/" rel="noopener noreferrer"&gt;Cilium Service Mesh&lt;/a&gt; and various domains. What is eBPF?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: eBPF is a kernel technology that enables the execution of specific code within the kernel, offering significant advantages. Firstly, operations within the kernel are high-speed, eliminating the overhead of context switching between kernel and user space. Secondly, the kernel has unrestricted access to all system resources, requiring robust security measures to ensure eBPF programs are safe. This powerful technology empowers developers to create highly efficient and secure solutions for various system tasks, particularly networking, security, and observability.&lt;/p&gt;

&lt;p&gt;Traditionally, user-space programs lacked direct access to kernel resources, relying on &lt;a href="https://phoenixnap.com/kb/system-call#:~:text=A%20system%20call%20is%20an,functionalities%20from%20the%20OS's%20kernel." rel="noopener noreferrer"&gt;system calls&lt;/a&gt; to communicate with the kernel. While providing security, this syscall boundary introduced overhead, especially with frequent requests like network packet processing.&lt;/p&gt;

&lt;p&gt;eBPF revolutionized this by enabling user-defined code to run within the kernel with stringent safety measures. The number of instructions an eBPF program can execute is limited, and infinite loops are prohibited to prevent resource monopolization. The bytecode verifier meticulously ensures every possible execution path can be explored to avoid unexpected behavior or malicious activity. The bytecode is also verified for &lt;a href="https://opensource.stackexchange.com/questions/6549/does-program-that-uses-ebpf-module-needs-to-be-distributed-under-gpl" rel="noopener noreferrer"&gt;GPL compliance&lt;/a&gt; by checking for specific strings in its initial bytes.&lt;/p&gt;

&lt;p&gt;These security measures make eBPF a powerful but restrictive mechanism, enabling previously unattainable capabilities. While many promote eBPF as a groundbreaking solution that could eliminate the need for sidecars, the reality is more nuanced. It's crucial to understand what eBPF can and cannot do and not be swayed by marketing hype.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: There appears to be some confusion regarding the extent of limitations associated with eBPF. If eBPF has limitations, does that imply that these limitations constrain all service meshes using eBPF?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: The idea of an eBPF-based service mesh can be confusing. In reality, the Envoy proxy still handles the heavy lifting, even in these eBPF-powered meshes. eBPF has limitations, especially in the network space, and can't fully replace the functionality of a traditional proxy.&lt;/p&gt;

&lt;p&gt;While eBPF has many applications, including security and performance monitoring, its most interesting potential lies in instrumenting applications. Because eBPF programs reside in the kernel, they can directly measure CPU usage, function calls, and other performance metrics.&lt;/p&gt;

&lt;p&gt;However, when it comes to networking, eBPF faces significant challenges. Maintaining large amounts of state, essential for many network operations, is difficult, bordering on impossible. This challenge highlights the limitations of eBPF in entirely replacing traditional networking components like proxies.&lt;/p&gt;

&lt;p&gt;The role of eBPF in networking, particularly within service meshes, is often overstated. While it excels in certain areas, like efficient TCP packet processing and simple metrics collection, it cannot replace traditional proxies. Complex tasks like &lt;a href="https://blog.px.dev/ebpf-http2-tracing/" rel="noopener noreferrer"&gt;HTTP/2 parsing&lt;/a&gt;, TLS handshakes, or layer 7 routing are challenging, if not impossible, to implement purely with eBPF.&lt;/p&gt;

&lt;p&gt;Some projects attempt complex eBPF implementations for these tasks, but these often involve convoluted workarounds that sacrifice performance and practicality. In practice, eBPF is typically used for layer 4 (transport layer) tasks, while user-space proxies like Envoy handle more complex layer 7 (application layer) operations.&lt;/p&gt;

&lt;p&gt;Service meshes like Cilium, despite their claims of being sidecar-less, often rely on daemonset proxies to handle these complex tasks. While eliminating sidecars, this approach introduces its own set of problems. Security is compromised as TLS certificates are mixed in the proxy's memory, and operational challenges arise when the daemonset goes down, affecting seemingly random pods scheduled on that machine.&lt;/p&gt;

&lt;p&gt;Linkerd, having experienced similar issues with its &lt;a href="https://github.com/linkerd/linkerd" rel="noopener noreferrer"&gt;first version&lt;/a&gt; (Linkerd 1.x) running as a daemonset, opted for the sidecar model in subsequent versions. Sidecars provide clear operational and security boundaries, making management and troubleshooting easier.&lt;/p&gt;

&lt;p&gt;Looking ahead, eBPF can still be a valuable tool for service meshes. Linkerd, for instance, could significantly speed up raw TCP proxying by offloading tasks to the kernel. However, for complex layer seven operations, a user-space proxy remains essential.&lt;/p&gt;

&lt;p&gt;The decision to use eBPF and the choice between sidecars and daemonsets are distinct considerations, each with advantages and drawbacks. While eBPF offers powerful capabilities, it doesn't inherently dictate a specific proxy architecture. Choosing the most suitable approach requires careful evaluation of the system's requirements and trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Can you share your predictions about conflict or uncertainty concerning service meshes and sidecars for the next few years? Is there a possibility of resolving this? Should we anticipate the emergence of new groups? What are your expectations for the near and distant future?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: While innovation in this field is valuable, relying solely on marketing over technical analysis has little appeal, especially for those prioritizing tangible customer benefits.&lt;/p&gt;

&lt;p&gt;Regarding the future of service meshes, their value proposition is now well-established. The initial hype has given way to a practical understanding of their necessity, with users selecting and implementing solutions without extensive deliberation. This maturity is a positive development, shifting the focus from explaining the need for a service mesh to optimizing its usage.&lt;/p&gt;

&lt;p&gt;Functionally, service meshes converge on core features like mTLS, load balancing, and circuit breaking. However, a significant area of development and our primary focus is mesh expansion, which involves integrating non-Kubernetes components into the mesh. We have a &lt;a href="https://linkerd.io/2024/02/21/announcing-linkerd-2.15/" rel="noopener noreferrer"&gt;big announcement&lt;/a&gt; regarding this in mid-February.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That sounds intriguing. Please give us a sneak peek into what this announcement is about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: It is about Linkerd 2.15! The release of Linkerd 2.15 is a significant step forward. It introduces the ability to run the data plane outside Kubernetes, enabling seamless TLS communication for both VM and pod workloads.&lt;/p&gt;

&lt;p&gt;The industry mirrors this direction, as evidenced by developments like the Gateway API, which converges to handle both ingress and service mesh configuration within Kubernetes. This unified approach allows consistent configuration primitives for traffic entering, transiting, and exiting the cluster.&lt;/p&gt;

&lt;p&gt;The industry will likely focus on refining details like eBPF integration or the advantages of Ambient Mesh while the fundamental value proposition of service meshes remains consistent. I'm particularly excited about how these advancements can be applied across entire organizations, starting with securing and optimizing Kubernetes environments and extending these benefits to the broader infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Shifting away from the professional side, we heard you have an interesting &lt;a href="https://twitter.com/wm/status/1584940854384685056" rel="noopener noreferrer"&gt;tattoo&lt;/a&gt;. Is it of Linkerd, or what is it about?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: It’s just a temporary one. We handed them out at KubeCon last year as part of our swag. While everyone else gave out stickers, we thought we'd do something more extraordinary. So, we made temporary tattoos of Linky the Lobster, our Linkerd mascot.&lt;/p&gt;

&lt;p&gt;When Linkerd graduated within the CNCF, reaching the top tier of project maturity, we needed a mascot. Most mascots are cute and cuddly, like the &lt;a href="https://go.dev/blog/gopher" rel="noopener noreferrer"&gt;Go Gopher&lt;/a&gt;. We wanted something different, so we chose a blue lobster—an unusual and rare creature reflecting Linkerd's unique position in the CNCF universe.&lt;/p&gt;

&lt;p&gt;The tattoo featured Linky the Lobster crushing some sailboats, which is part of our logo. It was a fun little easter egg. If you were at KubeCon, you might have seen them. That event was in Amsterdam.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What's next for you? Are there any side projects or new ventures you're excited about?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: I'm devoting all my energy to Linkerd and &lt;a href="https://buoyant.io/" rel="noopener noreferrer"&gt;Buoyant&lt;/a&gt;. That takes up most of my focus. Outside of work, I'm a dad. My kids are learning the piano, so I decided to start learning, too. It's humbling to see how fast they pick it up compared to me. As an adult learner, it's a slow process. It's interesting to be in a role where I'm the student, taking lessons from a teacher who's probably a third my age and incredibly talented. It’s an excellent reminder to stay humble, especially since much of my day involves being the authority on something. It’s a nice change of pace and a bit of a reality check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That's a good balance. It's important to remind people that it's okay to do something you're not yet good at. As a kid, you're used to it: no expectations, no judgment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: Exactly. However, it can be a struggle as an adult, especially as a CEO. I've taught Linkerd to hundreds of people without any panic, but playing a piano recital in front of 20 people is terrifying. It's the complete opposite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: If people want to contact you, what's the best way?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;William&lt;/strong&gt;: You can email me at &lt;a href="mailto:william@buoyant.io"&gt;william@buoyant.io&lt;/a&gt;, find me on Linkerd Slack at slack.linkerd.io, or DM me at &lt;a class="mentioned-user" href="https://dev.to/wm"&gt;@wm&lt;/a&gt; on Twitter. I'd love to hear about your challenges and how I can help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrap up&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you enjoyed this interview and want to hear more Kubernetes stories and opinions, visit &lt;a href="https://kube.fm/" rel="noopener noreferrer"&gt;KubeFM&lt;/a&gt; and subscribe to the podcast.&lt;/li&gt;
&lt;li&gt;If you want to keep up-to-date with Kubernetes, subscribe to &lt;a href="https://learnk8s.io/learn-kubernetes-weekly" rel="noopener noreferrer"&gt;Learn Kubernetes Weekly&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you want to become an expert in Kubernetes, look at courses on &lt;a href="https://learnk8s.io/training" rel="noopener noreferrer"&gt;Learnk8s&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Finally, if you want to keep in touch, follow me on &lt;a href="https://www.linkedin.com/in/gulcantopcu/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>servicemesh</category>
      <category>ebpf</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Clusters Are Cattle Until You Deploy Ingress</title>
      <dc:creator>Gulcan Topcu</dc:creator>
      <pubDate>Thu, 30 May 2024 14:07:24 +0000</pubDate>
      <link>https://dev.to/gulcantopcu/clusters-are-cattle-until-you-deploy-ingress-4mon</link>
      <guid>https://dev.to/gulcantopcu/clusters-are-cattle-until-you-deploy-ingress-4mon</guid>
      <description>&lt;p&gt;Managing repeatable infrastructure is the bedrock of efficient Kubernetes operations. While the ideal is to have easily replaceable clusters, reality often dictates a more nuanced approach. Dan Garfield, Co-founder of Codefresh, succinctly captures this with the analogy: "A Kubernetes cluster is treated as disposable until you deploy ingress, and then it becomes a pet."&lt;/p&gt;

&lt;p&gt;Dan Garfield joined Bart Farrell to explain how he manages Kubernetes clusters as they turn from "cattle" into "pets", weaving in fascinating anecdotes about fairy tales, crypto, and snowboarding.&lt;/p&gt;

&lt;p&gt;You can watch (or listen) to this interview &lt;a href="https://kube.fm/ingress-gitops-dan" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What are your top three must-have tools starting with a fresh Kubernetes cluster?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: &lt;a href="https://argo-cd.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;Argo CD&lt;/a&gt; is the first tool I install. For AWS, I will add &lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; to manage costs. I will also use &lt;a href="https://longhorn.io/" rel="noopener noreferrer"&gt;Longhorn&lt;/a&gt; for on-prem storage solutions, though I'd need ingress. Depending on the situation, I will install Argo CD first and then one of those other two.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Many of our recent podcast guests have highlighted Argo or &lt;a href="https://fluxcd.io/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt;, emphasizing their significance in the &lt;a href="https://www.gitops.tech/" rel="noopener noreferrer"&gt;GitOps&lt;/a&gt; domain. Why do you think these tools are considered indispensable?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: The entire deployment workflow for Kubernetes revolves around Argo CD. When I set up a cluster, some might default to using &lt;code&gt;kubectl apply&lt;/code&gt;, or if they're using &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;, they might opt for the &lt;a href="https://registry.terraform.io/providers/hashicorp/helm/latest/docs" rel="noopener noreferrer"&gt;Helm provider&lt;/a&gt; to install various Helm charts. However, with Argo CD, I have precise control over deployment processes. &lt;/p&gt;

&lt;p&gt;Typically, the bootstrap pattern involves using Terraform to set up the cluster and Helm provider to install Argo CD and predefined repositories. From there, Argo CD takes care of the rest.&lt;/p&gt;

&lt;p&gt;For those who can't see, the screen behind me shows my Kubernetes cluster running Argo CD. I utilize &lt;a href="https://argocd-autopilot.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;Argo CD autopilot&lt;/a&gt;, which streamlines repository setup. Last year, when my system was compromised, Argo CD autopilot swiftly restored everything. It's incredibly convenient. Moreover, when debugging, the ability to quickly toggle sync, reset applications, and access logs through the UI is invaluable. Argo CD is, without a doubt, my go-to tool for Kubernetes. Admittedly, I'm biased as an Argo maintainer, but it's hard to argue with its effectiveness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Our numerous podcast discussions with seasoned professionals show that GitOps has been a recurring theme in about 90% of our conversations. Almost every guest we've interviewed has emphasized its importance, often mentioning it as their primary tool alongside other essentials like &lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert manager&lt;/a&gt;, &lt;a href="https://kyverno.io/" rel="noopener noreferrer"&gt;Kyverno&lt;/a&gt;, or &lt;a href="https://www.openpolicyagent.org/" rel="noopener noreferrer"&gt;OPA&lt;/a&gt;, depending on their preferences. &lt;/p&gt;

&lt;p&gt;Could you introduce yourself to those unfamiliar with you? Tell us your background, work, and where you're currently employed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: I'm Dan Garfield, the co-founder and chief open-source officer at Codefresh. As Argo maintainers, we're deeply involved in shaping the GitOps landscape. I've played a key role in creating the GitOps standard, establishing the GitOps working group, and spearheading the &lt;a href="https://opengitops.dev/" rel="noopener noreferrer"&gt;OpenGitOps&lt;/a&gt; project.&lt;/p&gt;

&lt;p&gt;Our journey began seven years ago when we launched &lt;a href="https://codefresh.io/" rel="noopener noreferrer"&gt;Codefresh&lt;/a&gt; to enhance software delivery in the cloud-native ecosystem, primarily focusing on Kubernetes. Alongside my responsibilities at Codefresh, I actively contribute to &lt;a href="https://github.com/kubernetes/sig-security" rel="noopener noreferrer"&gt;SIG security&lt;/a&gt; within the Kubernetes community and oversee community-driven events like &lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/co-located-events/argocon/" rel="noopener noreferrer"&gt;ArgoCon&lt;/a&gt;. Outside of work, I reside in Salt Lake City, where I indulge in my passion for snowboarding. Oh, and I'm a proud father of four, eagerly awaiting the arrival of our fifth child.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: It’s a fantastic journey. We'll have to catch up during &lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/" rel="noopener noreferrer"&gt;KubeCon in Salt Lake City&lt;/a&gt; later this year. Before delving into your entrepreneurial venture, could you share how you entered Cloud Native?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: My journey into the tech world began early on as a programmer. However, I found myself gravitating more towards the business side, where I discovered my knack for marketing. My pivotal experience was leading enterprise marketing at &lt;a href="https://www.atlassian.com/" rel="noopener noreferrer"&gt;Atlassian&lt;/a&gt; during the release of &lt;a href="https://www.atlassian.com/enterprise/data-center" rel="noopener noreferrer"&gt;Data Center&lt;/a&gt;, Atlassian's clustered tool version. Initially, it didn't garner much attention internally, but it soon became a game-changer, driving significant revenue for the company. Witnessing this transformation, including Atlassian's public offering, was exhilarating, although my direct contribution was modest as I spent less than two years there.&lt;/p&gt;

&lt;p&gt;I noticed a significant shift toward containerization, which sparked my interest in taking on a new challenge. Conversations with friends starting container-focused ventures captivated me. Then, &lt;a href="https://www.linkedin.com/in/razielt/" rel="noopener noreferrer"&gt;Raziel&lt;/a&gt;, the founder of Codefresh, reached out, sharing his vision for container-driven software development. His perspective resonated deeply, prompting me to join the venture.&lt;/p&gt;

&lt;p&gt;Codefresh initially prioritized building robust CI tools, recognizing that effective CD hinges on solid CI practices, which needed improvement in many organizations at the time (and possibly still do). As we expanded, we delved into CD and explored ways to leverage Kubernetes insights.&lt;/p&gt;

&lt;p&gt;Kubernetes had yet to emerge as the dominant force when we launched this journey. We evaluated competitors like &lt;a href="https://www.rancher.com/" rel="noopener noreferrer"&gt;Rancher&lt;/a&gt;, &lt;a href="https://www.redhat.com/en/technologies/cloud-computing/openshift" rel="noopener noreferrer"&gt;OpenShift&lt;/a&gt;, &lt;a href="https://kube.fm/ingress-gitops-dan#:~:text=.%20And%20maybe-,Mesosphere,-is%20going%20to" rel="noopener noreferrer"&gt;Mesosphere&lt;/a&gt;, and &lt;a href="https://docs.docker.com/engine/swarm/" rel="noopener noreferrer"&gt;Docker Swarm&lt;/a&gt;. However, after thorough analysis, Kubernetes emerged as the frontrunner, prompting us to bet on its potential.&lt;/p&gt;

&lt;p&gt;Our decision proved visionary as other platforms gradually transitioned towards Kubernetes. Amazon's launch of &lt;a href="https://aws.amazon.com/eks/" rel="noopener noreferrer"&gt;EKS&lt;/a&gt; validated our foresight. This strategic alignment with Kubernetes paved the way for our deep dive into GitOps and Argo CD, driving the project's growth within the &lt;a href="https://www.cncf.io/" rel="noopener noreferrer"&gt;CNCF&lt;/a&gt; and its eventual graduation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: It's impressive how much you've accomplished in such a short timeframe, especially while balancing family life. With the industry evolving rapidly, how do you keep up with the cloud-native scene as a maintainer and a co-founder?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Indeed, staying updated involves reading blogs, scrolling through Twitter, and tuning into podcasts. However, I've found that my most insightful learnings come from direct conversations with individuals. For instance, I've assisted the community with Argo implementations, not as a sales pitch but to genuinely help and gather insights. Interacting with Codefresh users and engaging with the broader community provides invaluable perspectives on adoption challenges and user needs.&lt;/p&gt;

&lt;p&gt;Oddly enough, sometimes, the best way to learn is by putting forth incorrect opinions or questions. Recently, while wrestling with AI project complexities, I pondered aloud whether all Docker images with AI models would inevitably be bulky due to &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt; dependencies. To my surprise, this sparked many helpful responses, offering insights into optimizing image sizes. Being willing to be wrong opens up avenues for rapid learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That vulnerability can indeed produce rich learning experiences. It's a valuable practice. Shifting gears slightly, if you could offer one piece of career advice to your younger self, what would it be?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Firstly, embrace a mindset of rapid learning and humility. Be more open to being wrong and detach ego from ideas. While standing firm on important matters is essential, recognize that failure and adaptation are part of the journey. Like a stone rolling down a mountain, each collision smooths out the sharp edges, leading to growth.&lt;/p&gt;

&lt;p&gt;Secondly, prioritize hiring decisions. The people you bring into your business shape its trajectory more than any other factor. A wrong hire can have far-reaching consequences beyond their salary. Despite some missteps, I've been fortunate to work with exceptional individuals who contribute immensely to our success. When considering a job opportunity, I always emphasize the people's quality, the mission's significance, and fair compensation. Prioritizing in this order ensures fulfillment and satisfaction in your career journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That's insightful advice, especially about hiring. Surrounding yourself with talented individuals can make all the difference in navigating business challenges. Now, shifting gears to your recent tweet about Kubernetes and Ingress, who was the intended audience for that &lt;a href="https://twitter.com/todaywasawesome/status/1701625561536454879" rel="noopener noreferrer"&gt;tweet&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Honestly, it was more of a reflection for myself, perhaps shouted into the void. I was weighing the significance of deploying Ingress within Kubernetes. In engineering, there's a saying that "the problem is always DNS," and once you configure DNS settings, your cluster becomes more tangible. Similarly, setting up Ingress signifies a shift in how you perceive and manage your cluster. Without Ingress, it might be considered disposable, like a development environment. However, once Ingress is in place, your cluster hosts services that require more attention and care.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: For those unfamiliar with the "&lt;a href="https://www.hava.io/blog/cattle-vs-pets-devops-explained" rel="noopener noreferrer"&gt;cattle versus pets&lt;/a&gt;" analogy in Kubernetes, could you elaborate on its relevance, particularly in the context of Ingress?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: While potentially controversial, the "cattle versus pets" analogy illustrates a fundamental concept in managing infrastructure. In this analogy, cattle represent interchangeable and disposable resources, much like livestock in a ranching operation. Conversely, pets are unique, loved entities requiring personalized care.&lt;/p&gt;

&lt;p&gt;In Kubernetes, deploying resources as "cattle" means treating them as replaceable, identical units. However, Ingress introduces a shift towards a "pet" model, where individual services become distinct and valuable entities. Just as you wouldn't name every cow on a farm, you typically wouldn't concern yourself with the specific details of each interchangeable resource. But once you start deploying services accessible via Ingress, each service becomes unique and worthy of individual attention, akin to caring for a pet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: It seems the "cattle versus pets" analogy is stirring some controversy among vegans, which is understandable given its context. How does this analogy relate to Kubernetes and Ingress?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: In software, the analogy helps distinguish between disposable, interchangeable components (cattle) and unique, loved entities (pets). For instance, in my Kubernetes cluster, the individual nodes are like cattle—replaceable and without specific significance. If one node malfunctions, I can easily swap it out without concern.&lt;/p&gt;

&lt;p&gt;However, once I deploy Ingress and start hosting services, the cluster takes on a different role. While the individual nodes remain disposable, the cluster becomes more akin to a pet. I care about its state, its configuration, and its uptime. Suddenly, I'm monitoring metrics and ensuring its well-being, similar to caring for a pet's health.&lt;/p&gt;

&lt;p&gt;So, the analogy underscores the shift in perception and care that occurs when transitioning from managing generic infrastructure to hosting meaningful services accessible via Ingress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That's a fascinating perspective. How do Kubernetes and Ingress relate to all of this?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: The ingress in Kubernetes is a central resource for managing incoming traffic to the cluster and routing it to different services. However, unlike other resources in Kubernetes, such as those managed by Argo CD, the ingress is often shared among multiple applications. Each application may have its own deployment rules, allowing for granular control over updates and configurations. For example, one application might only update when manually triggered, while another automatically updates when changes are detected.&lt;/p&gt;

&lt;p&gt;The challenge arises because updating Ingress impacts multiple applications simultaneously. Through this centralized routing mechanism, you're essentially juggling the needs of various applications. This complexity underscores the importance of managing the cluster effectively, as each change to Ingress affects the entire ecosystem of applications.&lt;/p&gt;

&lt;p&gt;The Argo CD community is discussing introducing delegated server-side field permissions. This feature would allow one application to modify components of another, easing the burden of managing shared resources like Ingress. However, it's still under debate, and alternative solutions may emerge. Other tools, like &lt;a href="https://projectcontour.io/" rel="noopener noreferrer"&gt;Contour&lt;/a&gt;, offer a different approach by treating each route as a separate custom resource, allowing applications to manage their routing independently.&lt;/p&gt;

&lt;p&gt;Ultimately, deploying the ingress marks a shift in the cluster's dynamics, requiring considerations such as DNS settings and centralized routing configurations. As a result, the cluster becomes more specialized and less disposable as its configuration becomes bespoke to accommodate the routing needs of various applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Any recommendations for those who aim to keep their infrastructure reproducible while needing Ingress?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: One approach is abstraction and leveraging wildcards. While you can technically deploy an Ingress without anything external pointing to it, I prefer the concept of self-updating components. Tools like &lt;a href="https://www.crossplane.io/" rel="noopener noreferrer"&gt;Crossplane&lt;/a&gt; or &lt;a href="https://cloud.google.com/config-connector/docs/overview" rel="noopener noreferrer"&gt;Google Cloud's Config Connector&lt;/a&gt; allow you to represent non-Kubernetes resources as Kubernetes objects. Incorporating such tools into your cluster bootstrap process ensures the dynamic creation of necessary components.&lt;/p&gt;

&lt;p&gt;However, there's a caveat. Even when clusters are reproducible, external components like DNS settings may not be. Updating name servers remains a manual task. It's a tricky aspect of operations that still lacks a perfect solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How do GitOps and Argo CD fit into solving this challenge?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: GitOps and Argo CD play a crucial role in managing complex infrastructure, especially with sensitive data. The key lies in representing all infrastructure resources, including secrets and certificates, as Kubernetes objects. This approach enables Argo CD to track and reconcile them, ensuring that the desired state defined in Git reflects accurately in your cluster.&lt;/p&gt;

&lt;p&gt;Tools like Crossplane, &lt;a href="https://www.vcluster.com/" rel="noopener noreferrer"&gt;vCluster&lt;/a&gt; (for running virtual clusters inside a host cluster), or &lt;a href="https://cluster-api.sigs.k8s.io/" rel="noopener noreferrer"&gt;Cluster API&lt;/a&gt; (for provisioning additional clusters) can extend this approach to handle various infrastructure resources beyond Kubernetes. Essentially, Git serves as the single source of truth for your entire infrastructure, with Argo CD functioning as the engine to enforce that truth.&lt;/p&gt;

&lt;p&gt;A common issue with Terraform is that its state can get corrupted easily because it must constantly monitor changes. Crossplane often uses Terraform under the hood. The problem is not with Terraform's primitives but with the data store and its maintenance. Crossplane ensures the data store remains uncorrupted, accurately reflecting the current state. If changes occur, they appear as out of sync in Argo CD.&lt;/p&gt;

&lt;p&gt;You can then define policies for reconciliation and updates, guiding the controller on the next steps. This approach is crucial for managing infrastructure effectively. Using etcd as your data store is an excellent pattern and likely the future of infrastructure management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What would happen if the challenges of managing Kubernetes infrastructure extended beyond handling ingress traffic to managing sensitive information like state, secrets, and certificates? This added complexity could lead to a "pet" cluster scenario. Do you think backup and recovery tools like &lt;a href="https://velero.io/" rel="noopener noreferrer"&gt;Velero&lt;/a&gt; would be easier to use without these additional challenges?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: I need to familiarize myself with Velero. Can you tell me about it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Velero is a tool focused on backing up and restoring Kubernetes resources. Since you mentioned Argo CD and custom resources earlier, I'm curious about your approach to backing up persistent volumes. How did you manage disaster recovery in your home lab when everything went haywire? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: I've used Longhorn for volume restoration, and clear protocols were in place. I'm currently exploring Velero, which looks like a promising tool for data migration. &lt;/p&gt;

&lt;p&gt;Managing data involves complexities like caring for a pet, requiring careful handling and migration. Many people need help managing stateful workloads in Kubernetes. Fortunately, most of my stateful workloads in Kubernetes can rebuild their databases if data is lost. Therefore, data loss is manageable for me. Most of the elements I work with are replicable. Any items needing persistence between sessions are stored in Git or a versioned, immutable secret repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: It's worth noting, especially considering what happened with your home lab. Should small startups prioritize treating their clusters like cattle, or is ClickOps sufficient?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: It depends on the use cases. vCluster, a project I'm fond of, is particularly well-suited for creating disposable development clusters, providing developers with isolated sandboxes for testing and experimentation. It allows deploying a virtualized cluster on an existing Kubernetes setup, which saves significantly on ingress costs, especially on platforms like AWS, where you can consolidate ingress into one.&lt;/p&gt;

&lt;p&gt;Another example is using Argo CD's application sets to create full-stack environments for each pull request in a Git repository. These environments, which include a virtual cluster, are unique to each pull request but remain completely disposable and easily recreated, much like cattle.&lt;/p&gt;

&lt;p&gt;However, managing ingress for disposable clusters can be challenging. When deployed and applied to vClusters, ingress needs custom configurations, requiring separate tracking and maintenance. Despite this, it's still beneficial to prioritize treating infrastructure as disposable. For example, while my on-site Kubernetes cluster is a "pet" that requires careful maintenance, its nodes are considered "cattle" that can be replaced or reconfigured without disrupting overall operations. This abstraction is a core principle of Kubernetes and allows for greater flexibility and resilience.&lt;/p&gt;

&lt;p&gt;By abstracting clusters away from custom configurations and focusing on reproducibility, you can treat them more like cattle, even if they have some pet-like qualities due to ingress deployment and DNS configurations. This commoditization of clusters simplifies management and enables greater scalability. The more you abstract and standardize your infrastructure, the smoother your operations will become. And to be clear, this analogy has nothing to do with dietary choices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: If you could rewind time and change anything, what scenario would you create to avoid writing that tweet?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: We've been discussing a feature in Argo CD that allows for delegated field permissions to happen server-side. It addresses a problem inherent in Kubernetes architecture, particularly regarding ingress. The current setup doesn't allow for external delegation of its components, even though many users operate it that way. If I could make changes, I might have split ingress into an additional resource, including routes as a separate definition that users could manage independently.&lt;/p&gt;

&lt;p&gt;Exploring other scenarios where delegated field permissions would be helpful is crucial. Ingress is the most obvious example, highlighting an area for potential improvement. Creating separate routes and resources could solve this issue without altering Argo CD. This approach, similar to Contour's, could be a promising solution. Contour's separate resource strategy demonstrates learning from Ingress and making improvements. We should consider adopting tools like Contour or other service mesh ingress providers, as several compelling options are available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: If you had to build a cluster from scratch today, how would you address these issues whenever possible?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Sometimes you just have to accept the challenge and not try to work around it. Setting up ingress and configuring DNS for a single cluster might not be a big deal, but it's worth considering a re-architecture if you're doing it on a large scale, like 250,000 times. For instance, with Codefresh, many users opt for our hybrid setup. They deploy our GitOps agent, based on Argo CD, on their cluster, which then connects to our control plane. &lt;/p&gt;

&lt;p&gt;One of the perks we offer is a hosted ingress. Instead of setting up ingresses for each of their 5000 Argo CD instances, users can leverage our hosted ingress, saving money and configuration headaches. Consider alternatives like a tunneling system instead of custom ingress setups, depending on your use case. A hosted ingress can be a game-changer for large-scale distributed setups like multiple Argo CD instances, saving costs and simplifying configurations. Ultimately, re-architecting is always an option tailored to what works best for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: We're nearing the end of the podcast and want to touch on a closing question, which we are looking at from a few different angles. How do you deal with the anxiety of adopting a new tool or practice, only to find out later that it might be wrong?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: I've seen this dynamic play out. Sometimes, organizations invest heavily in a tool, only to realize a few years later that there are better fits. Take the example of a company transitioning to Argo Workflows for CI/CD and deployment, only to discover that Argo CD would have been a better fit for most of their use cases. However, these transitions are not wasted effort. In their case, the journey through Argo Workflows paved the way for a smoother transition to Argo CD. Sometimes, heading in the wrong direction is necessary to reach the correct destination faster.&lt;/p&gt;

&lt;p&gt;You can't always foresee the ideal solution from where you are, and experimenting with different tools is part of the learning process. It's essential not to dwell on mistakes but to learn from them and move forward. After all, even if a tool ultimately proves to be the wrong choice, it often still brings value. The key is recognizing when a change is needed and adapting accordingly. Mistakes only become fatal if we fail to acknowledge and learn from them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: We stumbled upon your blog, &lt;a href="https://todaywasawesome.com/" rel="noopener noreferrer"&gt;Today Was Awesome&lt;/a&gt;, which hasn't seen an update in a while. You wrote a &lt;a href="https://todaywasawesome.com/why-a-bitcoin-crash-could-be-great-for-bitcoin/" rel="noopener noreferrer"&gt;post&lt;/a&gt; about Bitcoin, priced at around $450 in 2015. Are you a crypto millionaire now?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Not quite! Crypto is a fascinating topic, often sparking wild debates. While there's no shortage of scams in the crypto world, there's also genuine innovation happening. I dabbled in Bitcoin early on and even mined a bit to understand its potential use cases better. One notable experience was mentoring at &lt;a href="https://hackthenorth.com/" rel="noopener noreferrer"&gt;Hack the North&lt;/a&gt;, a massive hackathon where numerous projects leveraged Ethereum. I strategically sold my Bitcoin for Ethereum, which turned out well. However, I'm still waiting on those Lambos—I'm not quite at millionaire status yet!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Your blog covers many topics, including one post titled "&lt;a href="https://todaywasawesome.com/what-are-we-really-supposed-to-learn-from-fairy-tales/" rel="noopener noreferrer"&gt;What are we really supposed to learn from fairy tales&lt;/a&gt;." How did you decide on such diverse content?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: I can't recall the exact inspiration, but my wife and I often joke about how outdated the moral lessons in fairy tales feel. Exploring their relevance in today's world is an interesting angle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What's next for you? More fairy tales, moon-bound Lamborghinis, or snowboarding adventures? Also, let's discuss your recent tweet about making your own bacon. How did that start?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Ah, yes, making bacon! It's surprisingly simple. First, you get pork belly and cure it in the fridge for seven to ten days. Then, you smoke it for a couple of hours.&lt;/p&gt;

&lt;p&gt;My primary motivation was to avoid the nitrates found in store-bought bacon, which are linked to health issues. Homemade bacon tastes better, is of higher quality, and is cheaper. My freezer now overflows with homemade bacon, which makes for a unique and well-received gift. People love the taste; overall, it's been a rewarding and delicious effort! &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Regardless of dietary choices, considering where your food comes from and being involved in the process—whether by growing your food or making it yourself and turning it into a gift for others—creates a different, enriching experience. What's next for you?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: This year, my focus is on environment management and promotion. In the Kubernetes world, we often think about applications, clusters, and instances of Argo CD to manage everything. We're working on a paradigm shift: we think about products instead of applications. In our context, a product is an application in every environment in which it exists. Hence, if you deploy a development application, move it to stage, and finally to production, you're deploying the same application with variations three times. That's what we call a product. We’re shifting from thinking about where an application lives to considering its entire life cycle. Instead of focusing on clusters, we think about environments because an environment might have many clusters.&lt;/p&gt;

&lt;p&gt;For instance, retail companies like Starbucks, Chick-fil-A, and Pizza Hut often have Kubernetes clusters on-site. Deploying to US West might mean deploying to 1,300 different clusters and 1,300 different Argo CD instances. We abstract all that complexity by grouping them into the environments bucket. We focus on helping people scale up and build their workflow using environments and establishing these relationships. The feedback has been incredible; people are amazed by what we’re demonstrating.&lt;/p&gt;

&lt;p&gt;We're showcasing this at ArgoCon next month in Paris. After that, I plan to do some snowboarding and then make it back in time for the birth of my fifth child. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bart&lt;/strong&gt;: That's a big plan. 2024 is packed for you. If people want to contact you, what's the best way to do it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan&lt;/strong&gt;: Twitter is probably the best. You can find me at &lt;a class="mentioned-user" href="https://dev.to/todaywasawesome"&gt;@todaywasawesome&lt;/a&gt;. If you visit my blog and leave comments, I won't see them, as it's more of an archive now. I keep it around because I worked on it ten years ago and occasionally reference something I wrote. &lt;/p&gt;

&lt;p&gt;You can also reach out on LinkedIn, GitHub, or Slack. I respond slower on Slack, but I do get to it eventually.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Wrap up&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you enjoyed this interview and want to hear more Kubernetes stories and opinions, visit &lt;a href="https://kube.fm" rel="noopener noreferrer"&gt;KubeFM&lt;/a&gt; and subscribe to the podcast.&lt;/li&gt;
&lt;li&gt;If you want to keep up-to-date with Kubernetes, subscribe to &lt;a href="https://learnk8s.io/learn-kubernetes-weekly" rel="noopener noreferrer"&gt;Learn Kubernetes Weekly&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you want to become an expert in Kubernetes, look at courses on &lt;a href="https://learnk8s.io/training" rel="noopener noreferrer"&gt;Learnk8s&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Finally, if you want to keep in touch, follow me on &lt;a href="https://www.linkedin.com/in/gulcantopcu/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>automation</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Upgrading Hundreds of Kubernetes Clusters</title>
      <dc:creator>Gulcan Topcu</dc:creator>
      <pubDate>Wed, 03 Apr 2024 07:22:58 +0000</pubDate>
      <link>https://dev.to/gulcantopcu/upgrading-hundreds-of-kubernetes-clusters-8h0</link>
      <guid>https://dev.to/gulcantopcu/upgrading-hundreds-of-kubernetes-clusters-8h0</guid>

&lt;p&gt;Bart Farrell sat down with Pierre to understand how he did it without breaking the bank.&lt;/p&gt;

&lt;p&gt;You can watch (or listen) to this interview &lt;a href="https://kube.fm/upgrading-100s-clusters-pierre" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: If you installed three tools on a new Kubernetes cluster, which tools would they be and why?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: The first tool I recommend is &lt;a href="https://k9scli.io/" rel="noopener noreferrer"&gt;K9s&lt;/a&gt;. It's not just a time-saver but a productivity booster. With its intuitive interface, you can speed up all the usual kubectl commands, access logs, edit resources and configurations, and more. It's like having a personal assistant for your cluster management tasks.&lt;/p&gt;

&lt;p&gt;The second one is a combination of tools: &lt;a href="https://github.com/kubernetes-sigs/external-dns" rel="noopener noreferrer"&gt;External DNS&lt;/a&gt;, &lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;, and &lt;a href="https://github.com/kubernetes/ingress-nginx" rel="noopener noreferrer"&gt;NGINX ingress&lt;/a&gt;. Using these as a stack, you can quickly deploy an application and make it available through DNS with TLS, with little effort, via simple annotations. When I first discovered External DNS, I was amazed at its quality.&lt;/p&gt;

&lt;p&gt;The last one is mostly an observability stack with &lt;a href="https://github.com/prometheus-operator/kube-prometheus" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, &lt;a href="https://github.com/kubernetes-sigs/metrics-server" rel="noopener noreferrer"&gt;Metrics Server&lt;/a&gt;, and &lt;a href="https://github.com/kubernetes-sigs/prometheus-adapter" rel="noopener noreferrer"&gt;Prometheus adapter&lt;/a&gt; to have excellent insights into what is happening on the cluster. You can reuse the same stack for autoscaling by repurposing all the data collected for monitoring.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Tell us more about your background and how you progressed through your career.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: My journey in the tech industry has been diverse and enriching. I've had the privilege of working for renowned companies like Red Hat and Criteo, where I honed my skills in cloud deployment. Today, as the co-founder and CTO of Qovery, I bring a wealth of experience in distributed systems, particularly for NoSQL databases, and a deep understanding of Kubernetes, which I began exploring in 2016 with version 1.2.&lt;/p&gt;

&lt;p&gt;To provide some context to Qovery's services, we offer a self-service developer platform that allows code deployment on Kubernetes without requiring expertise in infrastructure. We keep our platform cloud-agnostic and place Kubernetes at the core to ensure our deployments are portable across different cloud providers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How was your journey into Kubernetes and the cloud-native world, given the changes since 2016?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Actually, learning Kubernetes was quite a journey. The landscape was less developed, with most Kubernetes components still in alpha at that time. In 2016, I was also juggling my job at Criteo and my own company. &lt;/p&gt;

&lt;p&gt;When it came to deployment, I had several options, and I chose the hard way: deploying Kubernetes on bare metal nodes using &lt;a href="https://github.com/kubernetes-sigs/kubespray" rel="noopener noreferrer"&gt;KubeSpray&lt;/a&gt;. Troubleshooting bare metal Kubernetes deployments honed my skills in pinpointing issues. This hands-on experience provided a deep understanding of how each component, like the &lt;a href="https://kubernetes.io/docs/concepts/overview/components/#control-plane-components:~:text=a%20Kubernetes%20cluster-,Control%20Plane%20Components,-The%20control%20plane%27s" rel="noopener noreferrer"&gt;Control Plane&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=kubelet-,kubelet,-Synopsis" rel="noopener noreferrer"&gt;kubelet&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/setup/production-environment/container-runtimes/#:~:text=Container%20Runtimes-,Container%20Runtimes,-Note%3A%20Dockershim" rel="noopener noreferrer"&gt;Container Runtime&lt;/a&gt;, and &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/#:~:text=Kubernetes%20Scheduler-,Kubernetes%20Scheduler,-In%20Kubernetes%2C" rel="noopener noreferrer"&gt;scheduler&lt;/a&gt;, interacts to orchestrate containers. &lt;/p&gt;

&lt;p&gt;Another resource that I found pretty helpful, despite its complexity, was "&lt;a href="https://github.com/kelseyhightower/kubernetes-the-hard-way" rel="noopener noreferrer"&gt;Kubernetes the Hard Way&lt;/a&gt;" by Kelsey Hightower.&lt;/p&gt;

&lt;p&gt;Lastly, I got help from the official &lt;a href="https://kubernetes.io/docs/home/" rel="noopener noreferrer"&gt;Kubernetes docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Looking back, is there anything you would do differently or advice you would give to your past self?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Not really. Looking back, KubeSpray was the best option at the time, and there were no significant changes I would make to the decision.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: You've worked on various projects involving bare metal and private clouds. Can you share more about your Kubernetes experience, such as the scale of clusters and nodes?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: At Criteo, I led a NoSQL team supporting several million requests per second on a massive 4,500-node bare-metal cluster. Managing this infrastructure, particularly node failures and data consistency across stateful databases like Cassandra, Couchbase, and Elasticsearch, was a constant challenge.&lt;/p&gt;

&lt;p&gt;While at Criteo, I also had a personal project where I built a smaller 10-node bare-metal cluster.&lt;br&gt;
This experience with bare metal management solidified my belief in the benefits of Kubernetes, which I later implemented at Criteo.&lt;/p&gt;

&lt;p&gt;When we adopted Kubernetes at Criteo, we encountered initial hurdles. In 2018, Kubernetes operators were still new, and there was internal competition from &lt;a href="https://mesos.apache.org/" rel="noopener noreferrer"&gt;Mesos&lt;/a&gt;. We addressed these challenges by validating Kubernetes performance for our specific needs and building custom Chef recipes, &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#:~:text=StatefulSets-,StatefulSets,-StatefulSet%20is%20the" rel="noopener noreferrer"&gt;StatefulSet&lt;/a&gt; hooks, and startup scripts.&lt;/p&gt;

&lt;p&gt;Migrating to Kubernetes took eight months of dedicated effort. It was a complex process, but it was worth it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: As you’ve mentioned, Kubernetes had competitors in 2018 and continues to do so today. Despite the tooling's immaturity, you led a team to adopt Kubernetes for stateful workloads, which was unconventional. How did you guide your team through this transition?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: We had large instances, each with 50 to 100 CPUs and between 256 and 500 gigabytes of RAM.&lt;/p&gt;

&lt;p&gt;We had multiple Cassandra clusters on a single Kubernetes cluster, and each Kubernetes node was dedicated to a single Cassandra node. We chose this bare metal setup to optimize disk access with SSD or NVMe.&lt;/p&gt;

&lt;p&gt;Running these stateful &lt;a href="https://kubernetes.io/docs/concepts/workloads/#:~:text=Workloads-,Workloads,-A%20workload%20is" rel="noopener noreferrer"&gt;workloads&lt;/a&gt; wasn't just a matter of starting them up. We had to handle them carefully because stateful applications like Elasticsearch and Cassandra must keep their data safe even if the machine they're running on fails.&lt;/p&gt;

&lt;p&gt;Kubernetes helped us detect issues with these apps using features like &lt;a href="https://kubernetes.io/docs/tasks/run-application/configure-pdb/" rel="noopener noreferrer"&gt;Pod Disruption Budgets (PDBs)&lt;/a&gt; that limit how many pods can be voluntarily disrupted at the same time, StatefulSets that provide ordered deployment and stable storage, and automated &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#:~:text=and%20Startup%20Probes-,Configure%20Liveness%2C%20Readiness%20and%20Startup%20Probes,-This%20page%20shows" rel="noopener noreferrer"&gt;probes&lt;/a&gt; that trigger actions and alerts when something goes wrong.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Your experiences helped me better understand your blog post, The Cost of Upgrading Hundreds of Kubernetes Clusters. After managing large infrastructures, you founded Qovery. What drove you to take this step as an engineer?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Kubernetes has become a standard, but managing it can be a headache for developers. Cloud providers offer a basic Kubernetes setup, but it often lacks the features developers need to get started and deploy applications quickly. Managing the cluster and its nodes and keeping them up-to-date is time-consuming, and developers must spend even more time adding extra tools and configurations on top of the basic setup and then keeping everything updated.&lt;/p&gt;

&lt;p&gt;To tackle these challenges, I founded Qovery. &lt;/p&gt;

&lt;p&gt;Qovery provides two critical solutions. First, it offers a unified, user-friendly stack across cloud providers, simplifying Kubernetes deployment and management complexity. Second, it enables developers to deploy code without hassle.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Managing clusters can have various interpretations. The term can be broad. How do you define cluster management at Qovery in the context of upgrading and recovery?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Yes, that's right. At Qovery, we understand the complexity of managing Kubernetes for customers. That's why we automate and simplify the entire process.&lt;/p&gt;

&lt;p&gt;We automatically notify you about upcoming &lt;a href="https://kubernetes.io/releases/" rel="noopener noreferrer"&gt;Kubernetes updates&lt;/a&gt; and handle the upgrade process on schedule, eliminating the need for manual intervention.&lt;/p&gt;

&lt;p&gt;We deploy and manage various essential charts for your environment, including tools for logging, metrics collection, and certificate management. You don't need to worry about these intricacies.&lt;/p&gt;

&lt;p&gt;We deploy all the necessary infrastructure elements to create a fully functional Kubernetes environment for production within 30 minutes. We provide a complete solution that's ready to go.&lt;/p&gt;

&lt;p&gt;We build your container images, push them to a registry, and deploy them based on your preferences. We also handle the lifecycle of the applications deployed.&lt;/p&gt;

&lt;p&gt;We use &lt;a href="https://github.com/kubernetes/autoscaler" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt; to automatically adjust the number of nodes (cluster size) based on your actual usage to ensure efficiency. Additionally, we deploy &lt;a href="https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler" rel="noopener noreferrer"&gt;Vertical&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/" rel="noopener noreferrer"&gt;Horizontal Pod Autoscalers&lt;/a&gt; to automatically scale your applications' resources as their needs change.&lt;/p&gt;

&lt;p&gt;By taking care of these complexities, Qovery frees your developers to focus solely on what matters most: building incredible applications.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How large is your team of engineers?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: We have ten engineers working on the project.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How do you manage hundreds of clusters with such a small team?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: We run various tests on each code change, including unit tests for individual components and end-to-end tests that simulate real-world usage. These tests cover configurations and deployment scenarios to catch potential issues early on.&lt;/p&gt;

&lt;p&gt;Before deploying a new cluster for a customer, we put it through its paces on our internal systems for weeks. Then, we deploy it to a separate non-production environment where we closely monitor its performance and address any problems before it reaches your applications. &lt;/p&gt;

&lt;p&gt;We closely monitor Kubernetes and cloud providers' updates by following official &lt;a href="https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG" rel="noopener noreferrer"&gt;changelogs&lt;/a&gt; and using RSS feeds, allowing us to anticipate potential issues and adapt our infrastructure proactively.&lt;/p&gt;

&lt;p&gt;We also leverage tools like &lt;a href="https://github.com/doitintl/kube-no-trouble" rel="noopener noreferrer"&gt;Kubent&lt;/a&gt;, &lt;a href="https://github.com/derailed/popeye" rel="noopener noreferrer"&gt;popeye&lt;/a&gt;, &lt;a href="https://github.com/wayfair-incubator/kdave" rel="noopener noreferrer"&gt;kdave&lt;/a&gt;, and &lt;a href="https://github.com/FairwindsOps/pluto" rel="noopener noreferrer"&gt;Pluto&lt;/a&gt; to help us manage &lt;a href="https://kubernetes.io/docs/reference/using-api/deprecation-guide/#:~:text=API%20Migration%20Guide-,Deprecated%20API%20Migration%20Guide,-As%20the%20Kubernetes" rel="noopener noreferrer"&gt;API deprecations&lt;/a&gt; (when Kubernetes deprecates or removes API versions in new releases) and ensure the overall health of our infrastructure.&lt;/p&gt;
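
&lt;p&gt;To illustrate what a deprecation check of this kind does, here is a minimal sketch. It is not how Kubent or Pluto are implemented: the &lt;code&gt;REMOVED_APIS&lt;/code&gt; table, the function names, and the sample manifests are assumptions made for the example, and the table lists only three well-known removals.&lt;/p&gt;

```python
# Hypothetical sketch: flag manifests that still use API versions removed upstream.
# The table holds only three well-known removals; real tools ship a much larger,
# versioned rule set.
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("policy/v1beta1", "PodDisruptionBudget"): "1.25",
}


def parse_version(version):
    """Turn '1.25' into (1, 25) so versions compare numerically, not as strings."""
    major, minor = version.split(".")
    return int(major), int(minor)


def find_removed(manifests, target_version):
    """Return (name, apiVersion, kind, removed_in) for objects unusable on target_version."""
    target = parse_version(target_version)
    findings = []
    for manifest in manifests:
        key = (manifest["apiVersion"], manifest["kind"])
        removed_in = REMOVED_APIS.get(key)
        if removed_in is not None and target >= parse_version(removed_in):
            findings.append((manifest["metadata"]["name"], key[0], key[1], removed_in))
    return findings


# Made-up manifests: one on a removed API version, one already migrated.
manifests = [
    {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget", "metadata": {"name": "cassandra-pdb"}},
    {"apiVersion": "batch/v1", "kind": "CronJob", "metadata": {"name": "backup"}},
]

for name, api_version, kind, removed_in in find_removed(manifests, "1.25"):
    print(f"{kind}/{name} uses {api_version}, removed in Kubernetes {removed_in}")
```

&lt;p&gt;Running the same scan against the version you plan to upgrade to, rather than the one you are on, is what turns it into an early warning.&lt;/p&gt;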

&lt;p&gt;Our multi-layered approach has proven successful. We haven't encountered any significant problems when deploying clusters to production environments.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Managing new releases in the Kubernetes ecosystem can be daunting, especially with the extensive changelog. How do you navigate this complexity and spot potential difficulties when a new release is on the horizon?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: While reading the official update changelogs from Kubernetes and cloud providers is our first step, it is not enough on its own to guarantee a smooth upgrade. Furthermore, understanding these detailed technical documents can be challenging, especially for newer team members who don’t have prior on-premise Kubernetes experience.&lt;/p&gt;

&lt;p&gt;Cloud providers typically offer well-defined upgrade processes and document significant changes like removed functionalities, changes in API behavior, or security updates in their changelogs. However, many elements are interconnected in a Kubernetes cluster, especially when you deploy multiple charts for components like logging, observability, and ingress. Even with automated tools, we still need extensive testing and a manual process to ensure everything functions smoothly after an update.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: So, what is your upgrade plan for Helm charts?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Upgrading &lt;a href="https://helm.sh/docs/topics/charts/#:~:text=Contribute%20to%20Docs-,Charts,-Helm%20uses%20a" rel="noopener noreferrer"&gt;Helm charts&lt;/a&gt; can be tricky because they bundle both the deployment and the software; for example, upgrading the &lt;a href="https://grafana.com/docs/loki/latest/" rel="noopener noreferrer"&gt;Loki&lt;/a&gt; chart also upgrades Loki itself. To better understand what's changing, you need to review two changelogs: one for the chart itself and another for the software it includes.&lt;/p&gt;

&lt;p&gt;We keep a close eye on all the charts we use by storing them in a central repository. This way, we have a clear history of every version we've used. We use a tool called &lt;a href="https://github.com/Qovery/helm-freeze" rel="noopener noreferrer"&gt;helm-freeze&lt;/a&gt; to lock down the specific version of each chart we want to use. We can also track changes between chart and software versions using the &lt;code&gt;git diff&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;If needed, we can also adjust specific settings within the chart using values overrides.&lt;/p&gt;

&lt;p&gt;Like any other code change, we thoroughly test the upgraded charts with unit and functional tests to ensure everything works as expected.&lt;/p&gt;

&lt;p&gt;Once testing is complete, we route the updated charts to our test cluster for a final round of real-world testing. After a few days of monitoring, if everything looks good, we confidently release the updates to our customers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How do you handle unexpected situations? Do you have a specific strategy or write more automation in the Helm charts?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: We're excited to see more community Helm charts shipping with &lt;a href="https://helm.sh/docs/topics/chart_tests/#:~:text=Contribute%20to%20Docs-,Chart%20Tests,-A%20chart%20contains" rel="noopener noreferrer"&gt;built-in tests&lt;/a&gt;! This practice will make it easier for everyone to trust and use these charts in the future.&lt;/p&gt;

&lt;p&gt;At Qovery, we enable specific Helm options by default, like &lt;a href="https://helm.sh/docs/helm/helm_install/#:~:text=Options-,%2D%2Datomic,-if%20set%2C%20the" rel="noopener noreferrer"&gt;&lt;code&gt;--atomic&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://helm.sh/docs/helm/helm_install/#:~:text=version%20is%20used-,%2D%2Dwait,-if%20set%2C%20will" rel="noopener noreferrer"&gt;&lt;code&gt;--wait&lt;/code&gt;&lt;/a&gt;, which wait for the release's resources to become ready and roll back a failed upgrade. However, there can still be issues that only show up in the logs, so we run additional tests specifically designed to catch these hidden problems.&lt;/p&gt;

&lt;p&gt;Upgrading charts that deploy &lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#:~:text=Custom%20Resources-,Custom%20Resources,-Custom%20resources%20are" rel="noopener noreferrer"&gt;Custom Resource Definitions (CRDs)&lt;/a&gt; requires special attention. We've automated this process to upgrade the CRDs first (to the required version) and then upgrade the chart itself. Additionally, for critical upgrades like cert-manager (which manages certificates), we back up and restore resources before applying the upgrade to avoid losing any critical certificates.&lt;/p&gt;
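
&lt;p&gt;The CRDs-first ordering can be sketched as a small plan builder. This is only an illustration, not Qovery's engine: the &lt;code&gt;upgrade_plan&lt;/code&gt; function, the release and chart names, the version, and the CRD directory are all hypothetical, and the flags assume a Helm 3 style CLI. The sketch only builds the commands and does not run them.&lt;/p&gt;

```python
# Hypothetical sketch: build the ordered commands for upgrading a chart that ships CRDs.
# Release, chart, version, and path names are made up; nothing is executed here.

def upgrade_plan(release, chart, version, namespace, crd_dir=None):
    """Return argv lists in execution order: CRDs first, then the chart itself."""
    steps = []
    if crd_dir is not None:
        # Apply the CRDs required by the new chart version before touching the release.
        steps.append(["kubectl", "apply", "-f", crd_dir])
    steps.append([
        "helm", "upgrade", release, chart,
        "--install",
        "--version", version,
        "--namespace", namespace,
        "--atomic",  # roll back the release if the upgrade fails
        "--wait",    # block until the release's resources are ready
    ])
    return steps


plan = upgrade_plan(
    "my-operator", "example-repo/my-operator", "1.2.3", "tools",
    crd_dir="crds/my-operator",
)
for step in plan:
    print(" ".join(step))
```

&lt;p&gt;Keeping the plan as data makes the ordering easy to test before anything touches a cluster; a backup step for critical resources could be prepended in the same way.&lt;/p&gt;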

&lt;p&gt;If you’re running an older version of a non-critical tool like a logging system, upgrading through each minor version one by one can be time-consuming. We have a better way! Our system allows you to skip to the desired newer version, bypassing all those intermediate updates.&lt;/p&gt;

&lt;p&gt;We've also built safeguards into our system to handle potential problems before they occur during cluster upgrades. For example, the system checks for issues like failed jobs, misconfigured Pod Disruption Budgets, or ongoing processes that might block the upgrade. If it detects any problems, our engine automatically attempts to fix or clean up the issue. It will also warn you if any manual intervention is needed.&lt;/p&gt;
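
&lt;p&gt;As a rough idea of what such pre-upgrade checks might look like, here is a sketch over plain dictionaries shaped like Kubernetes objects. The function names and sample data are assumptions for illustration; the only fields relied on are the PDB status field &lt;code&gt;disruptionsAllowed&lt;/code&gt; and the Job status field &lt;code&gt;failed&lt;/code&gt;. A real engine would query the cluster API and cover far more cases.&lt;/p&gt;

```python
# Hypothetical sketch of pre-upgrade checks over dicts shaped like Kubernetes objects.

def blocking_pdbs(pdbs):
    """PDBs that currently allow zero disruptions would block node drains."""
    return [
        pdb["metadata"]["name"]
        for pdb in pdbs
        if pdb.get("status", {}).get("disruptionsAllowed", 0) == 0
    ]


def jobs_with_failures(jobs):
    """Jobs whose status reports at least one failed pod."""
    return [
        job["metadata"]["name"]
        for job in jobs
        if job.get("status", {}).get("failed", 0) != 0
    ]


def preflight(pdbs, jobs):
    """Collect human-readable findings; an empty list means no blocker was found."""
    findings = [f"PDB {name} allows no disruptions" for name in blocking_pdbs(pdbs)]
    findings += [f"Job {name} has failed pods" for name in jobs_with_failures(jobs)]
    return findings


# Made-up cluster state for the example.
pdbs = [
    {"metadata": {"name": "cassandra-pdb"}, "status": {"disruptionsAllowed": 0}},
    {"metadata": {"name": "web-pdb"}, "status": {"disruptionsAllowed": 1}},
]
jobs = [
    {"metadata": {"name": "migrate-db"}, "status": {"failed": 2}},
    {"metadata": {"name": "nightly-report"}, "status": {"succeeded": 1}},
]

for finding in preflight(pdbs, jobs):
    print(finding)
```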

&lt;p&gt;Our ultimate goal is to automate the upgrade process as much as possible.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Would you say CRDs are your favorite feature in Kubernetes, or do you have another one?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: CRDs are a powerful tool for customizing Kubernetes, offering a high degree of flexibility. However, the current support and tooling around them leave room for improvement. For example, enhancing Helm with better CRD management capabilities would significantly improve the user experience.&lt;/p&gt;

&lt;p&gt;Despite these limitations, the potential of CRDs for customizing Kubernetes is undeniable, making them a genuinely standout feature.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: With your vast Kubernetes experience since 2016, how does your current process scale beyond 100 clusters? What do you need for such scalability?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: While basic application metrics can provide a general sense of health, managing hundreds of clusters requires more in-depth testing. Here at Qovery, with our experience handling nearly 300 clusters, we've found that:&lt;/p&gt;

&lt;p&gt;More than basic metrics are needed. We need comprehensive testing that leverages application-specific metrics to ensure everything functions as expected.&lt;/p&gt;

&lt;p&gt;Scaling requires more granular control over deployments, such as halting failures and providing detailed information to our users. For instance, quota issues from the cloud provider might necessitate user intervention.&lt;/p&gt;

&lt;p&gt;Drawing from my experience at Criteo, where robust tooling was essential for managing complex tasks, powerful tools are the key to effectively scaling beyond 100 clusters.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Looking ahead at Qovery's roadmap, what's next for your team?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Qovery will add &lt;a href="https://cloud.google.com/?hl=en" rel="noopener noreferrer"&gt;Google Cloud Platform (GCP)&lt;/a&gt; by year-end, joining &lt;a href="https://aws.amazon.com/" rel="noopener noreferrer"&gt;AWS&lt;/a&gt; and &lt;a href="https://www.scaleway.com/en/" rel="noopener noreferrer"&gt;Scaleway&lt;/a&gt;! This expansion gives you more choices for your cloud needs.&lt;/p&gt;

&lt;p&gt;We're extracting reusable code sections, like those related to Helm integration, and transforming them into dedicated libraries. By making these functionalities available as open-source libraries, we empower the developer community to leverage them in their projects.&lt;/p&gt;

&lt;p&gt;We strongly believe in &lt;a href="https://www.rust-lang.org/" rel="noopener noreferrer"&gt;Rust&lt;/a&gt; as a powerful language for building production-grade software, especially for systems like ours that run alongside Kubernetes.&lt;/p&gt;

&lt;p&gt;We're also developing a service catalog feature that offers a user-friendly interface and streamlines complex deployments. This feature will allow users to focus on their applications, not the intricacies of the underlying technology.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: Do you have any plans to include &lt;a href="https://azure.microsoft.com/en-us" rel="noopener noreferrer"&gt;Azure&lt;/a&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Yes, we do, but integrating a new cloud provider is challenging given our current team size. While we are a team of senior engineers, each cloud provider has its nuances; some are more mature or resource-extensive than others.&lt;/p&gt;

&lt;p&gt;Today, our focus is on AWS and GCP, as those are the providers our customers request most. However, we're also working on a more modular approach that will allow Qovery to be deployed on any Kubernetes cluster, irrespective of the cloud provider, although this is still in progress.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: We're looking forward to hearing more about that. So, with your black belt in karate, how does that experience influence how you approach challenges, breaking them down into manageable steps?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Karate has taught me the importance of discipline, focus, and breaking down complex tasks into manageable steps. Like in karate, where each move is deliberate and precise, I apply the same approach to challenges in my work, breaking them down into smaller, achievable goals. &lt;/p&gt;

&lt;p&gt;Karate has also instilled in me a sense of perseverance and resilience, which are invaluable when facing difficult situations.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: I'm a huge martial arts fan. How do you see martial arts' influence on managing stress in challenging situations?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: It varies from person to person. My experience in the banking industry has shown me that while some can handle stressful situations, others struggle. Martial arts can help manage stress somewhat, depending on the person.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: How has your 25-year journey in karate shaped your perspective?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: Karate has become a part of me, and I plan to continue as long as possible. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Bart&lt;/strong&gt;: What's the best way to reach out to you?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pierre&lt;/strong&gt;: You can reach me on LinkedIn or via email. I'm always happy to help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrap up 🌄&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you enjoyed this interview and want to listen to more Kubernetes stories and opinions, head to &lt;a href="https://kube.fm" rel="noopener noreferrer"&gt;KubeFM&lt;/a&gt; and subscribe to the podcast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you want to keep up-to-date with Kubernetes, subscribe to &lt;a href="https://learnk8s.io/learn-kubernetes-weekly" rel="noopener noreferrer"&gt;Learn Kubernetes Weekly&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you want to become an expert in Kubernetes, look at courses on &lt;a href="https://learnk8s.io/training" rel="noopener noreferrer"&gt;Learnk8s&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And finally, if you want to keep in touch with me, follow me on &lt;a href="https://www.linkedin.com/in/gulcantopcu/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>cloudnative</category>
    </item>
  </channel>
</rss>
