<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Satyaki</title>
    <description>The latest articles on DEV Community by Satyaki (@blackzu).</description>
    <link>https://dev.to/blackzu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F92977%2F8e14d52c-e9f3-4c96-9ec5-3e4fdc11018a.png</url>
      <title>DEV Community: Satyaki</title>
      <link>https://dev.to/blackzu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/blackzu"/>
    <language>en</language>
    <item>
      <title>Anatomy of a High-CPU Crisis: Why Your Code Might Not Be the Problem</title>
      <dc:creator>Satyaki</dc:creator>
      <pubDate>Fri, 29 May 2026 12:33:19 +0000</pubDate>
      <link>https://dev.to/blackzu/anatomy-of-a-high-cpu-crisis-why-your-code-might-not-be-the-problem-17f1</link>
      <guid>https://dev.to/blackzu/anatomy-of-a-high-cpu-crisis-why-your-code-might-not-be-the-problem-17f1</guid>
      <description>&lt;p&gt;Your primary application service is screaming at 100% CPU utilization. &lt;/p&gt;

&lt;p&gt;As engineering leaders and DevOps practitioners, our immediate instinct is usually a binary choice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Infrastructure Guess:&lt;/strong&gt; &lt;em&gt;“We must be getting hit with a massive surge of user traffic. Scale it out!”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Software Guess:&lt;/strong&gt; &lt;em&gt;“A developer pushed a broken &lt;code&gt;while(true)&lt;/code&gt; loop. Revert the commit!”&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But senior systems engineers know a deeper truth: &lt;strong&gt;A computer is a tightly coupled ecosystem.&lt;/strong&gt; A bottleneck in a completely passive resource—like a disk or raw memory—can masquerade as a devastating CPU crisis downstream. &lt;/p&gt;

&lt;p&gt;If you want to move past shallow dashboard watching and truly understand Linux internals during a production outage, you have to look at how applications actually use hardware, and exactly how the dots connect when a system begins to melt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Blueprint: The Office Desk Analogy
&lt;/h2&gt;

&lt;p&gt;To understand how software interacts with system hardware, let's look at a running application instance as a human accountant named &lt;strong&gt;"App"&lt;/strong&gt; sitting at an office desk.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware Component&lt;/th&gt;
&lt;th&gt;The System Reality&lt;/th&gt;
&lt;th&gt;The Office Analogy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU (Processing)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The speed at which execution cycles occur.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;App's Brain Power.&lt;/strong&gt; How fast App can read an instruction, calculate math, and execute tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM (Memory)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Volatile, high-speed space for active variables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;The Desktop Surface.&lt;/strong&gt; A fast, easily accessible space where files are laid out flat to be worked on. Space is limited.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disk (Storage)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-volatile, high-capacity, slower storage block.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;The Filing Cabinet in the Basement.&lt;/strong&gt; Holds massive amounts of historical data, but walking down to get it takes time.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In a healthy system, the CPU is the &lt;em&gt;only&lt;/em&gt; engine doing actual work. RAM and Disk are completely passive grids of silicon and magnets; they cannot move a single byte of data on their own. Every calculation, every file copy, and every memory cleanup cycle requires the CPU's brain power.&lt;/p&gt;

&lt;p&gt;Because the CPU manages all three domains, a failure in Storage or Memory can force the CPU to stop handling business logic and suffocate under infrastructure housekeeping.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. When Memory Attacks the CPU: The Panicked Janitor Loop
&lt;/h2&gt;

&lt;p&gt;High-level runtimes (like Java, Node.js, and Python) utilize an automated internal process called the &lt;strong&gt;Garbage Collector (GC)&lt;/strong&gt;. Think of the GC as a background janitor whose only job is to walk around App's desk, find papers that are no longer needed, and toss them in the trash to keep the workspace clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Meltdown Mechanics
&lt;/h3&gt;

&lt;p&gt;Imagine your code hits a slow memory leak. Variables accumulate, and the desk surface (RAM) hits 98% capacity. &lt;/p&gt;

&lt;p&gt;The background Janitor panics. He starts sprinting around the desk at breakneck speed, checking every single piece of paper over and over again, desperately hunting for something he can safely discard. He finds nothing, spins around, and instantly checks again.&lt;/p&gt;

&lt;p&gt;Because the Janitor is moving his arms and legs billions of times a second, &lt;strong&gt;he consumes 100% of the room's physical energy (CPU).&lt;/strong&gt; The application's brain is completely pinned, not because it's processing user transactions, but because it is hyperventilating over a lack of desk space. &lt;/p&gt;

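&lt;p&gt;On the JVM, this usually ends with the runtime giving up and throwing &lt;code&gt;java.lang.OutOfMemoryError&lt;/code&gt;. If the process instead keeps growing until the host (or its cgroup) runs out of memory, the Linux kernel steps in as the building manager and forcefully kills the process via the &lt;strong&gt;OOM (Out Of Memory) Killer&lt;/strong&gt;.&lt;/p&gt;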




&lt;h2&gt;
  
  
  2. When Disk Attacks the CPU: The Filing Failure Loop
&lt;/h2&gt;

&lt;p&gt;Most applications keep an audit trail of their operations via logging frameworks. Every time App completes a task, it writes an audit note on an index card and dispatches it down to the basement filing cabinet (Disk).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Meltdown Mechanics
&lt;/h3&gt;

&lt;p&gt;What happens when that filing cabinet hits 100% capacity? &lt;/p&gt;

&lt;p&gt;App tries to slide a logging card into a jammed drawer. The kernel rejects the write with &lt;code&gt;ENOSPC&lt;/code&gt;, which surfaces in the application as the error &lt;code&gt;No space left on device&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If the application’s error-handling architecture isn't flawlessly designed, a catastrophic trap springs open:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The app fails to write its standard log line due to a full disk.&lt;/li&gt;
&lt;li&gt;The code catches that exception and says: &lt;em&gt;"An error occurred! Let me immediately write an explicit emergency error report to the log file!"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;The app tries to write the emergency report to the exact same jammed cabinet. It fails again.&lt;/li&gt;
&lt;li&gt;The error-handler catches &lt;em&gt;that&lt;/em&gt; failure and loops back instantly to retry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This error loop executes millions of times a second. The CPU core is instantly pinned at 100% capacity, trapped in a frantic, hysterical loop of trying to record its own storage failures into a locked drawer.&lt;/p&gt;
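
&lt;p&gt;One way to defuse this trap is to give the logger a failure budget and a fallback channel. The following is a minimal Python sketch, not taken from any particular logging framework: &lt;code&gt;BoundedFileLogger&lt;/code&gt;, &lt;code&gt;MAX_LOG_FAILURES&lt;/code&gt;, and &lt;code&gt;full_disk_writer&lt;/code&gt; are hypothetical names, and the full disk is simulated by raising &lt;code&gt;ENOSPC&lt;/code&gt; on every write.&lt;/p&gt;

```python
import errno
import sys

MAX_LOG_FAILURES = 3  # hypothetical budget before the disk sink is switched off


class BoundedFileLogger:
    """Writes log lines to a sink, but never retries a failed write in a loop."""

    def __init__(self, write_line, fallback=sys.stderr):
        self._write_line = write_line  # callable that may raise OSError
        self._fallback = fallback      # last-resort stream, e.g. stderr
        self.failures = 0
        self.disabled = False

    def log(self, message):
        if self.disabled:
            # The disk sink is off: emit on the fallback stream, no retry.
            print(message, file=self._fallback)
            return
        try:
            self._write_line(message)
            self.failures = 0
        except OSError as exc:
            self.failures += 1
            # Report the failure on the fallback stream, NOT back into the
            # same full disk, which is what creates the infinite loop.
            print(f"log write failed ({exc.strerror}): {message}", file=self._fallback)
            if self.failures == MAX_LOG_FAILURES:
                self.disabled = True


def full_disk_writer(_line):
    """Stand-in for a file write on a full partition (always fails with ENOSPC)."""
    raise OSError(errno.ENOSPC, "No space left on device")
```

&lt;p&gt;With this shape, ten log calls against a full disk cost exactly ten attempts: three failed writes, then seven lines routed straight to the fallback stream. Whether three is the right budget, and whether stderr is the right fallback, depends on the service.&lt;/p&gt;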




&lt;h2&gt;
  
  
  The Senior DevOps Playbook: Triage and Surgical Root Cause
&lt;/h2&gt;

&lt;p&gt;When a 100% CPU alert wakes you up, work through the following triage steps in order.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Protect the Users (Stop the Bleeding)
&lt;/h3&gt;

&lt;p&gt;Do not try to debug a live server while it is dropping customer traffic. Immediately remove the failing instance from your Application Load Balancer (ALB) Target Group or isolate it from your Auto Scaling Group (ASG). Allow the ASG to spin up a fresh, healthy instance to assume the user load, and keep the failing server alive in an isolated sandbox for debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: The Traffic vs. Code Fork
&lt;/h3&gt;

&lt;p&gt;Log into the isolated instance via SSH or AWS SSM and run &lt;code&gt;top&lt;/code&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scenario A:&lt;/strong&gt; The CPU usage immediately plummets to near-zero (&lt;code&gt;99.5% id&lt;/code&gt; or Idle). 

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;The Verdict:&lt;/em&gt; &lt;strong&gt;Your code is most likely fine.&lt;/strong&gt; The instance melted down because of the request load itself, typically a surge of legitimate user traffic. The second you cut the traffic, the CPU relaxed. Your immediate solution is horizontal scaling (more instances) or vertical scaling (larger instance sizes).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Scenario B:&lt;/strong&gt; The instance has zero public user traffic hitting it, but the CPU is &lt;em&gt;still&lt;/em&gt; pinned at 100%.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;The Verdict:&lt;/em&gt; &lt;strong&gt;You have a localized environment or code failure.&lt;/strong&gt; Proceed to Step 3.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Check for Infrastructure Collateral Damage
&lt;/h3&gt;

&lt;p&gt;Before reading application logs, query the hardware status:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;dmesg -T | grep -i oom&lt;/code&gt; to inspect the Linux Kernel’s emergency logbook. If you see the OS actively slaughtering processes, your CPU spike is a downstream symptom of a critical memory starvation event.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;df -h&lt;/code&gt; to check disk utilization across your mounted partitions. If a partition is flatlined at 100%, you are likely dealing with an infinite error-logging loop. Delete or compress old log files, or expand the EBS volume and grow the filesystem, so that writes succeed again and the loop can stop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Surgical Thread Inspection
&lt;/h3&gt;

&lt;p&gt;If memory and disk are completely healthy, a rogue software loop is actively spinning out of control. Open &lt;code&gt;htop&lt;/code&gt; to identify the exact culprit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Press &lt;strong&gt;&lt;code&gt;F5&lt;/code&gt;&lt;/strong&gt; to switch to &lt;strong&gt;Tree View&lt;/strong&gt;. This maps out the exact lineage of parent and child processes.&lt;/li&gt;
&lt;li&gt;Press &lt;strong&gt;&lt;code&gt;Shift + H&lt;/code&gt;&lt;/strong&gt; to toggle &lt;strong&gt;Userland Threads&lt;/strong&gt;. &lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Internal Linux Nuance:&lt;/strong&gt; In Linux, child processes are completely independent programs with isolated memory boundaries. Threads are internal workers ("Lightweight Processes" or LWPs) sharing the exact same memory building. While tools like &lt;code&gt;htop&lt;/code&gt; display a Thread's unique identification number under the &lt;code&gt;PID&lt;/code&gt; column for convenience, it is technically a &lt;strong&gt;TID (Thread ID)&lt;/strong&gt; executing within a shared &lt;strong&gt;TGID (Thread Group ID)&lt;/strong&gt; block. &lt;/p&gt;
&lt;/blockquote&gt;
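
&lt;p&gt;The PID/TID distinction can also be observed from inside a process. The sketch below is a small Python illustration, assuming Python 3.8 or newer (for &lt;code&gt;threading.get_native_id()&lt;/code&gt;); the helper names &lt;code&gt;collect_ids&lt;/code&gt; and &lt;code&gt;ids_from_worker&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
import os
import threading


def collect_ids():
    """Return (pid, tid) as seen from the calling thread."""
    # os.getpid() is the process ID (the TGID shared by every thread);
    # threading.get_native_id() is the kernel thread ID (TID) of this thread.
    return os.getpid(), threading.get_native_id()


def ids_from_worker():
    """Run collect_ids() inside a separate thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(collect_ids()))
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    main_pid, main_tid = collect_ids()
    worker_pid, worker_tid = ids_from_worker()
    print("main thread  : pid", main_pid, "tid", main_tid)
    print("worker thread: pid", worker_pid, "tid", worker_tid)
```

&lt;p&gt;Both threads report the same process ID, while each has its own thread ID. On Linux, the main thread's TID is equal to the PID, which is why the first row of a thread group in &lt;code&gt;htop&lt;/code&gt; looks like an ordinary process.&lt;/p&gt;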

&lt;p&gt;When you expand the thread view using &lt;code&gt;Shift + H&lt;/code&gt;, threads are easily distinguished from child processes because they inherit the exact same parent command string and their row text color is automatically dimmed out by &lt;code&gt;htop&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxenerwp69wwykzkur9xt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxenerwp69wwykzkur9xt.png" alt=" " width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Sort by CPU percentage (press &lt;code&gt;F6&lt;/code&gt; and choose &lt;code&gt;PERCENT_CPU&lt;/code&gt;, or press &lt;code&gt;P&lt;/code&gt;). Identify the exact &lt;strong&gt;Thread ID (TID)&lt;/strong&gt; sitting at the top of the list.&lt;/li&gt;
&lt;/ol&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  PID  Command
  503  python3 main.py
  504  ├─ python3 main.py  &amp;lt;-- [Dimmed Text: This specific Thread is the 100% CPU culprit]
  505  └─ python3 main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>devops</category>
      <category>linux</category>
      <category>architecture</category>
      <category>troubleshooting</category>
    </item>
    <item>
      <title>Docker Builds Were Taking 10 Minutes. This One Change Brought It Down to Seconds</title>
      <dc:creator>Satyaki</dc:creator>
      <pubDate>Sat, 23 May 2026 12:37:50 +0000</pubDate>
      <link>https://dev.to/blackzu/docker-builds-were-taking-10-minutes-this-one-change-brought-it-down-to-seconds-4kd7</link>
      <guid>https://dev.to/blackzu/docker-builds-were-taking-10-minutes-this-one-change-brought-it-down-to-seconds-4kd7</guid>
      <description>&lt;p&gt;If you work with large Docker builds in production, especially with multi-module Spring Boot applications, you’ve probably suffered through this:&lt;/p&gt;

&lt;p&gt;You change one tiny &lt;code&gt;application.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then you rebuild the image.&lt;/p&gt;

&lt;p&gt;And suddenly Docker starts downloading half the internet again.&lt;/p&gt;

&lt;p&gt;I recently faced this while working on a multi-module Spring Boot application with multiple &lt;code&gt;pom.xml&lt;/code&gt; files and a huge dependency tree. Every rebuild felt painful. Sometimes the build would sit for 8–10 minutes just resolving Maven dependencies before even packaging the app.&lt;/p&gt;

&lt;p&gt;The worst part?&lt;/p&gt;

&lt;p&gt;The actual code change was tiny.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem Wasn't Maven
&lt;/h2&gt;

&lt;p&gt;It was the Docker layer strategy.&lt;/p&gt;

&lt;p&gt;A lot of Dockerfiles are written like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;mvn package
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks simple.&lt;/p&gt;

&lt;p&gt;But this completely destroys Docker layer caching.&lt;/p&gt;

&lt;p&gt;Every Docker instruction creates a separate immutable layer.&lt;/p&gt;

&lt;p&gt;Each layer gets its own content hash internally. If an instruction, or the files it copies, changes, Docker invalidates that layer and every layer that comes after it in the Dockerfile, and rebuilds them.&lt;/p&gt;

&lt;p&gt;So if your Dockerfile copies source code before resolving dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;mvn package
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then every source code change forces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maven dependency resolution&lt;/li&gt;
&lt;li&gt;Plugin downloads&lt;/li&gt;
&lt;li&gt;Packaging&lt;/li&gt;
&lt;li&gt;Recompilation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;all over again.&lt;/p&gt;

&lt;p&gt;Even though dependencies never changed.&lt;/p&gt;

&lt;p&gt;That’s where most of the build time gets wasted.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Optimization That Changed Everything
&lt;/h2&gt;

&lt;p&gt;I switched to Docker BuildKit.&lt;/p&gt;

&lt;p&gt;At the top of the Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# syntax=docker/dockerfile:1.4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



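&lt;p&gt;This directive selects the Dockerfile frontend version that BuildKit uses and unlocks advanced features like cache mounts. BuildKit itself must be the active builder: it is the default in Docker Engine 23.0 and later, and can be enabled on older versions with &lt;code&gt;DOCKER_BUILDKIT=1&lt;/code&gt;.&lt;/p&gt;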

&lt;p&gt;Then instead of doing a normal dependency resolution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;mvn dependency:go-offline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nt"&gt;--mount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cache,target&lt;span class="o"&gt;=&lt;/span&gt;/root/.m2 &lt;span class="se"&gt;\
&lt;/span&gt;    mvn dependency:go-offline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was the game changer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;--mount=type=cache&lt;/code&gt; Actually Does
&lt;/h2&gt;

&lt;p&gt;Normally Maven stores dependencies inside:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/root/.m2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without BuildKit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies are downloaded again whenever the layer is rebuilt&lt;/li&gt;
&lt;li&gt;Docker discards the previously downloaded copies along with the invalidated layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With BuildKit cache mounts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BuildKit keeps a persistent cache directory on the build host, outside the image layers&lt;/li&gt;
&lt;li&gt;Maven dependencies stay cached&lt;/li&gt;
&lt;li&gt;Future builds instantly reuse them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So after the first build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies no longer redownload&lt;/li&gt;
&lt;li&gt;Rebuilds become dramatically faster&lt;/li&gt;
&lt;li&gt;Iterative development becomes smooth again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My rebuild time dropped from several minutes to just a few seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Other Optimization Most People Miss
&lt;/h2&gt;

&lt;p&gt;Instruction ordering inside Dockerfiles matters a lot.&lt;/p&gt;

&lt;p&gt;This pattern is extremely important:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; pom.xml .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;mvn dependency:go-offline

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;pom.xml&lt;/code&gt; changes far less frequently than application source code.&lt;/p&gt;

&lt;p&gt;Docker can now cache dependency resolution separately from source changes.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Changing Java code only rebuilds packaging layers&lt;/li&gt;
&lt;li&gt;Dependency layers remain untouched&lt;/li&gt;
&lt;li&gt;Rebuilds stay fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you reverse the order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src ./src&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; pom.xml .&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then every source code change invalidates all subsequent layers.&lt;/p&gt;

&lt;p&gt;That’s catastrophic for large Java builds.&lt;/p&gt;
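
&lt;p&gt;The ordering effect can be illustrated with a toy model of the build cache. This is a simplified Python sketch, not Docker's actual implementation: &lt;code&gt;layer_keys&lt;/code&gt; and &lt;code&gt;reused_layers&lt;/code&gt; are hypothetical helpers, and file contents are represented by made-up digest strings such as &lt;code&gt;src-v1&lt;/code&gt;. The only idea it captures is that each layer's cache key depends on its parent's key, so one change invalidates everything after it.&lt;/p&gt;

```python
import hashlib


def layer_keys(instructions, file_digests):
    """Toy model of a layer cache: one chained cache key per instruction."""
    keys = []
    parent = ""
    for instruction in instructions:
        # A COPY key depends on the content it copies; a RUN key only on its text.
        content = file_digests.get(instruction, "")
        material = "|".join([parent, instruction, content])
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_layers(old_keys, new_keys):
    """Count the leading layers whose cache keys still match."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


if __name__ == "__main__":
    good = [
        "COPY pom.xml .",
        "RUN mvn dependency:go-offline",
        "COPY src ./src",
        "RUN mvn package",
    ]
    before = layer_keys(good, {"COPY pom.xml .": "pom-v1", "COPY src ./src": "src-v1"})
    after = layer_keys(good, {"COPY pom.xml .": "pom-v1", "COPY src ./src": "src-v2"})
    print(reused_layers(before, after))  # prints 2: both dependency layers survive

    bad = ["COPY . .", "RUN mvn package"]
    before = layer_keys(bad, {"COPY . .": "tree-v1"})
    after = layer_keys(bad, {"COPY . .": "tree-v2"})
    print(reused_layers(before, after))  # prints 0: everything is rebuilt
```

&lt;p&gt;In the model, a source-only change keeps the first two layers of the well-ordered Dockerfile and none of the layers of the &lt;code&gt;COPY . .&lt;/code&gt; variant, which mirrors the behavior described above.&lt;/p&gt;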

&lt;h2&gt;
  
  
  But Wait… How Does This Work in CI/CD?
&lt;/h2&gt;

&lt;p&gt;At this point I had another question myself.&lt;/p&gt;

&lt;p&gt;If CI runners like GitHub Actions use fresh ephemeral machines every run, then where is the cache actually stored?&lt;/p&gt;

&lt;p&gt;Because after every pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runner gets destroyed&lt;/li&gt;
&lt;li&gt;Filesystem disappears&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.m2&lt;/code&gt; cache disappears too&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So how does caching survive across builds?&lt;/p&gt;

&lt;p&gt;The answer is BuildKit remote cache export/import.&lt;/p&gt;

&lt;p&gt;In GitHub Actions, you can persist BuildKit cache across runners like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
      &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./Dockerfile&lt;/span&gt;
      &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myimage:latest&lt;/span&gt;

      &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha&lt;/span&gt;
      &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is insanely powerful.&lt;/p&gt;

&lt;p&gt;What happens here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache-to&lt;/code&gt; pushes the BuildKit layer cache to the GitHub Actions cache service&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache-from&lt;/code&gt; restores it in future workflow runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So even though every runner is brand new:&lt;/p&gt;

&lt;ul&gt;
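&lt;li&gt;The dependency-resolution layer is restored from cache as long as &lt;code&gt;pom.xml&lt;/code&gt; is unchanged (the contents of &lt;code&gt;--mount=type=cache&lt;/code&gt; directories are not part of the exported cache, so they start empty on a new runner)&lt;/li&gt;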
&lt;li&gt;Docker layers stay reusable&lt;/li&gt;
&lt;li&gt;Builds remain fast across pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how modern production CI/CD systems optimize container builds at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters in Production
&lt;/h2&gt;

&lt;p&gt;This isn’t just about developer convenience.&lt;/p&gt;

&lt;p&gt;In production engineering environments this directly impacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD speed&lt;/li&gt;
&lt;li&gt;Deployment frequency&lt;/li&gt;
&lt;li&gt;Compute costs&lt;/li&gt;
&lt;li&gt;Feedback loops&lt;/li&gt;
&lt;li&gt;Developer productivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams deploying dozens or hundreds of services, shaving even 5 minutes off builds compounds into massive engineering efficiency gains.&lt;/p&gt;

&lt;p&gt;Especially in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microservice architectures&lt;/li&gt;
&lt;li&gt;Monorepos&lt;/li&gt;
&lt;li&gt;Multi-module Maven projects&lt;/li&gt;
&lt;li&gt;Kubernetes delivery pipelines&lt;/li&gt;
&lt;li&gt;High-frequency deployment environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;If your Docker builds are painfully slow, don’t immediately blame:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maven&lt;/li&gt;
&lt;li&gt;Spring Boot&lt;/li&gt;
&lt;li&gt;Network latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the time the real issue is poor Docker layer design and missing cache strategy.&lt;/p&gt;

&lt;p&gt;A properly structured Dockerfile combined with BuildKit caching can reduce rebuild times from 10 minutes to a few seconds.&lt;/p&gt;

&lt;p&gt;And once you experience that speed difference, there’s no going back.&lt;/p&gt;




</description>
      <category>docker</category>
      <category>devops</category>
      <category>performance</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Image Volume Type GA in Kubernetes 1.36 — Finally Killing the Init Container Copy Pattern</title>
      <dc:creator>Satyaki</dc:creator>
      <pubDate>Thu, 21 May 2026 05:49:06 +0000</pubDate>
      <link>https://dev.to/blackzu/image-volume-type-ga-in-kubernetes-136-finally-killing-the-init-container-copy-pattern-182k</link>
      <guid>https://dev.to/blackzu/image-volume-type-ga-in-kubernetes-136-finally-killing-the-init-container-copy-pattern-182k</guid>
      <description>&lt;p&gt;For years, Kubernetes engineers have used the same awkward pattern whenever an application needed large read-only assets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Init container&lt;/li&gt;
&lt;li&gt;&lt;code&gt;emptyDir&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cp -r&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Wait for startup&lt;/li&gt;
&lt;li&gt;Duplicate storage usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It worked, but it always felt like a workaround rather than a first-class Kubernetes primitive.&lt;/p&gt;

&lt;p&gt;With Kubernetes 1.36, the &lt;code&gt;image&lt;/code&gt; volume type is now GA, and it fundamentally changes how Pods can consume immutable file bundles.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pulling files into an init container,&lt;/li&gt;
&lt;li&gt;copying them into an &lt;code&gt;emptyDir&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;and mounting them into the main container,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kubernetes can now directly mount an OCI image filesystem into a container as a read-only volume.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no init-container copy step,&lt;/li&gt;
&lt;li&gt;no duplicated bytes on disk,&lt;/li&gt;
&lt;li&gt;faster startup times,&lt;/li&gt;
&lt;li&gt;smaller artifact images,&lt;/li&gt;
&lt;li&gt;and much cleaner Pod specs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes especially powerful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML models&lt;/li&gt;
&lt;li&gt;static websites&lt;/li&gt;
&lt;li&gt;WASM modules&lt;/li&gt;
&lt;li&gt;OPA bundles&lt;/li&gt;
&lt;li&gt;language packs&lt;/li&gt;
&lt;li&gt;Grafana dashboards&lt;/li&gt;
&lt;li&gt;plugin distributions&lt;/li&gt;
&lt;li&gt;and any independently versioned read-only asset bundle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me walk through a realistic end-to-end scenario.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Scenario
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Team A owns nginx (the web server).&lt;/li&gt;
&lt;li&gt;Team B owns the website content (HTML/CSS/JS).&lt;/li&gt;
&lt;li&gt;Team B ships new content 5x/day.&lt;/li&gt;
&lt;li&gt;Team A ships nginx config changes maybe once a month.&lt;/li&gt;
&lt;li&gt;They should not be coupled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The static assets live in an OCI image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.example.com/web/site-assets:v42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scratch + files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No shell. No entrypoint. Just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/site/index.html
/site/style.css
/site/app.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  The Old Way (Pre-1.36): Init Container + emptyDir
&lt;/h1&gt;

&lt;h2&gt;
  
  
  How You Built the Assets Image
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile for site-assets&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;busybox:1.36&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./site /site&lt;/span&gt;

&lt;span class="c"&gt;# Final image needs a shell because the init container will run `cp`&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; busybox:1.36&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=source /site /site&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice something important:&lt;/p&gt;

&lt;p&gt;You cannot use &lt;code&gt;FROM scratch&lt;/code&gt; here because the init container needs tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cp&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the image is bloated with BusyBox purely to enable the copy operation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pod Manifest
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web-old-way&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regcred&lt;/span&gt;

  &lt;span class="na"&gt;initContainers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;load-assets&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.example.com/web/site-assets:v42&lt;/span&gt;

      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cp -r /site/* /shared/&lt;/span&gt;

      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/shared&lt;/span&gt;

  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.27&lt;/span&gt;

      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/usr/share/nginx/html&lt;/span&gt;
          &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared&lt;/span&gt;
      &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Actually Happening Under the Hood
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Images are pulled
&lt;/h3&gt;

&lt;p&gt;Kubelet pulls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;nginx:1.27&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;site-assets:v42&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;using the Pod's &lt;code&gt;imagePullSecrets&lt;/code&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Kubelet creates the emptyDir
&lt;/h3&gt;

&lt;p&gt;Kubernetes creates an actual directory on the node filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/lib/kubelet/pods/&amp;lt;pod-uid&amp;gt;/volumes/kubernetes.io~empty-dir/shared/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point it is completely empty.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Init container starts
&lt;/h3&gt;

&lt;p&gt;The init container's root filesystem is the &lt;code&gt;site-assets&lt;/code&gt; image.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;emptyDir&lt;/code&gt; gets bind-mounted into the init container at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/shared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4. The copy operation happens
&lt;/h3&gt;

&lt;p&gt;The init container executes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /site/&lt;span class="k"&gt;*&lt;/span&gt; /shared/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every byte gets physically copied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;image layer -&amp;gt; emptyDir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  5. Init container exits
&lt;/h3&gt;

&lt;p&gt;Kubelet records successful completion.&lt;/p&gt;




&lt;h3&gt;
  
  
  6. nginx starts
&lt;/h3&gt;

&lt;p&gt;The same &lt;code&gt;emptyDir&lt;/code&gt; is mounted into nginx at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/usr/share/nginx/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;nginx now serves the copied files.&lt;/p&gt;




&lt;h1&gt;
  
  
  Real Problems With the Old Pattern
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Disk Usage Doubles
&lt;/h2&gt;

&lt;p&gt;The files now exist in two places:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/lib/containerd/...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AND&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;emptyDir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 2 GB ML model becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2 GB image layer + 2 GB copy = 4 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;on the node for a single Pod, and every additional Pod on that node adds another 2 GB copy.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Startup Latency
&lt;/h2&gt;

&lt;p&gt;Large copy operations are expensive.&lt;/p&gt;

&lt;p&gt;A 2 GB copy operation can easily add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10–30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;before the main container even starts.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. You Need a Shell
&lt;/h2&gt;

&lt;p&gt;You cannot use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;because the image needs tooling like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cp&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;larger images&lt;/li&gt;
&lt;li&gt;more CVEs&lt;/li&gt;
&lt;li&gt;more attack surface&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Verbose YAML
&lt;/h2&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;init containers&lt;/li&gt;
&lt;li&gt;shared volumes&lt;/li&gt;
&lt;li&gt;multiple mounts&lt;/li&gt;
&lt;li&gt;copy commands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All just to move files around.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. No Sharing Across Pods
&lt;/h2&gt;

&lt;p&gt;Every Pod independently copies the same bytes into its own &lt;code&gt;emptyDir&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Ten Pods on one node means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 independent copies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  The New Way (Kubernetes 1.36): Image Volume Type
&lt;/h1&gt;

&lt;h2&gt;
  
  
  How You Build the Assets Image Now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./site /site&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;No shell.&lt;br&gt;
No BusyBox.&lt;br&gt;
No executables.&lt;/p&gt;

&lt;p&gt;Just files.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Pod Manifest
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web-new-way&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regcred&lt;/span&gt;

  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.27&lt;/span&gt;

      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;assets&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/usr/share/nginx/html&lt;/span&gt;
          &lt;span class="na"&gt;subPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;site&lt;/span&gt;

  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;assets&lt;/span&gt;

      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.example.com/web/site-assets:v42&lt;/span&gt;
        &lt;span class="na"&gt;pullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;No init container.&lt;br&gt;
No &lt;code&gt;emptyDir&lt;/code&gt;.&lt;br&gt;
No copy operation.&lt;/p&gt;


&lt;h1&gt;
  
  
  What's Actually Happening Under the Hood Now
&lt;/h1&gt;
&lt;h2&gt;
  
  
  1. Kubelet sees an image volume
&lt;/h2&gt;

&lt;p&gt;Kubelet notices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and asks the CRI runtime to mount the image.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Runtime pulls the image
&lt;/h2&gt;

&lt;p&gt;containerd or CRI-O pulls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;site-assets:v42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;using the normal image pull pipeline.&lt;/p&gt;

&lt;p&gt;Exactly the same mechanism used for container images.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Runtime unpacks image layers
&lt;/h2&gt;

&lt;p&gt;The image gets unpacked into the runtime snapshot store.&lt;/p&gt;

&lt;p&gt;For example with containerd:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;overlayfs snapshotter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No container is started.&lt;/p&gt;

&lt;p&gt;Only the filesystem is materialized.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Filesystem is bind-mounted directly
&lt;/h2&gt;

&lt;p&gt;The runtime bind-mounts the image filesystem directly into the nginx container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/usr/share/nginx/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read-only by design.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. nginx starts immediately
&lt;/h2&gt;

&lt;p&gt;No copy step.&lt;br&gt;
No waiting.&lt;/p&gt;

&lt;p&gt;The files already exist.&lt;/p&gt;


&lt;h1&gt;
  
  
  The Mental Model Shift
&lt;/h1&gt;

&lt;p&gt;The old model was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Start a helper container and copy files somewhere."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The new model is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Mount an OCI image filesystem directly as a volume."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds subtle, but architecturally it's a major shift.&lt;/p&gt;

&lt;p&gt;Kubernetes is effectively treating OCI images as generic immutable data artifacts — not just runnable containers.&lt;/p&gt;


&lt;h1&gt;
  
  
  What You Gain
&lt;/h1&gt;
&lt;h2&gt;
  
  
  No Data Duplication
&lt;/h2&gt;

&lt;p&gt;The image is bind-mounted directly.&lt;/p&gt;

&lt;p&gt;No copy operation.&lt;/p&gt;


&lt;h2&gt;
  
  
  Faster Startup
&lt;/h2&gt;

&lt;p&gt;You eliminate the init-container copy phase entirely.&lt;/p&gt;

&lt;p&gt;For large datasets or ML models, this is massive.&lt;/p&gt;


&lt;h2&gt;
  
  
  Smaller and Safer Images
&lt;/h2&gt;

&lt;p&gt;You can now use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;smaller images&lt;/li&gt;
&lt;li&gt;fewer CVEs&lt;/li&gt;
&lt;li&gt;reduced attack surface&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Always Read-Only
&lt;/h2&gt;

&lt;p&gt;Image volumes are immutable by specification.&lt;/p&gt;

&lt;p&gt;The runtime enforces it.&lt;/p&gt;

&lt;p&gt;Applications cannot modify the mounted content.&lt;/p&gt;




&lt;h2&gt;
  
  
  Shared Across Pods
&lt;/h2&gt;

&lt;p&gt;Ten Pods mounting the same image on the same node share the same underlying bytes.&lt;/p&gt;

&lt;p&gt;Huge improvement for large artifact distribution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cleaner YAML
&lt;/h2&gt;

&lt;p&gt;The Pod spec now clearly expresses intent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Mount this image's filesystem here."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;instead of implementing an entire file-copy workflow.&lt;/p&gt;




&lt;h1&gt;
  
  
  Important Caveats
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Image Volumes Are Always Read-Only
&lt;/h2&gt;

&lt;p&gt;If your application needs writable storage, use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;emptyDir&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;PVCs&lt;/li&gt;
&lt;li&gt;generic ephemeral volumes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;instead.&lt;/p&gt;
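
&lt;p&gt;The two can be combined in one Pod. A minimal sketch, assuming nginx only needs scratch space under &lt;code&gt;/var/cache/nginx&lt;/code&gt; (the Pod name and the &lt;code&gt;cache&lt;/code&gt; volume are hypothetical, not from the manifest above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod

metadata:
  name: web-with-scratch-space   # hypothetical name

spec:
  imagePullSecrets:
    - name: regcred

  containers:
    - name: nginx
      image: nginx:1.27

      volumeMounts:
        # Read-only content served straight from the image volume
        - name: assets
          mountPath: /usr/share/nginx/html
          subPath: site
        # Writable scratch space (assumed path), backed by emptyDir
        - name: cache
          mountPath: /var/cache/nginx

  volumes:
    - name: assets
      image:
        reference: registry.example.com/web/site-assets:v42
        pullPolicy: IfNotPresent
    - name: cache
      emptyDir: {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The immutable bytes stay in the image volume, and only the data the application actually writes lands in the &lt;code&gt;emptyDir&lt;/code&gt;.&lt;/p&gt;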




&lt;h2&gt;
  
  
  subPath Is Extremely Useful
&lt;/h2&gt;

&lt;p&gt;If your files live under:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/site
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;inside the image but you want them mounted directly into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/usr/share/nginx/html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;subPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;site&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;solves that cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  pullPolicy Works Exactly Like Container Images
&lt;/h2&gt;

&lt;p&gt;You can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



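&lt;p&gt;exactly as you already do with containers. &lt;code&gt;Never&lt;/code&gt; is accepted as well. If you omit the field, the default is &lt;code&gt;Always&lt;/code&gt; for a &lt;code&gt;:latest&lt;/code&gt; reference and &lt;code&gt;IfNotPresent&lt;/code&gt; otherwise.&lt;/p&gt;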




&lt;h2&gt;
  
  
  No Environment Variable Substitution
&lt;/h2&gt;

&lt;p&gt;This does NOT work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;reference&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${ASSET_VERSION}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The field is literal.&lt;/p&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helm&lt;/li&gt;
&lt;li&gt;Kustomize&lt;/li&gt;
&lt;li&gt;templating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;instead.&lt;/p&gt;
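
&lt;p&gt;With Kustomize, one way to do this is a JSON patch against the volume's &lt;code&gt;reference&lt;/code&gt; field. This is a sketch, not the only option: the file name &lt;code&gt;pod.yaml&lt;/code&gt; and the &lt;code&gt;v43&lt;/code&gt; tag are hypothetical, and the patch assumes the image volume is the first entry in &lt;code&gt;spec.volumes&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - pod.yaml   # hypothetical file holding the web-new-way Pod

patches:
  - target:
      kind: Pod
      name: web-new-way
    # JSON patch: overwrite the literal reference with the version to deploy
    patch: |-
      - op: replace
        path: /spec/volumes/0/image/reference
        value: registry.example.com/web/site-assets:v43
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The substitution happens when the manifest is rendered, so the API server only ever sees a literal reference.&lt;/p&gt;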




&lt;h2&gt;
  
  
  Running Pods Don't Automatically Update
&lt;/h2&gt;

&lt;p&gt;If somebody pushes a new image to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:v42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;existing Pods continue using the old mounted bytes.&lt;/p&gt;

&lt;p&gt;You must roll the Pod to pick up changes.&lt;/p&gt;

&lt;p&gt;Which is good for reproducibility.&lt;/p&gt;

&lt;p&gt;In production, pin image digests.&lt;/p&gt;
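
&lt;p&gt;A digest-pinned volume looks roughly like this. &lt;code&gt;&amp;lt;digest&amp;gt;&lt;/code&gt; is a placeholder, not a real value; use the sha256 digest your registry reports for the image you published:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;volumes:
  - name: assets
    image:
      # Placeholder digest: replace with the real sha256 value
      reference: registry.example.com/web/site-assets@sha256:&amp;lt;digest&amp;gt;
      pullPolicy: IfNotPresent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A digest always names the same content, so a re-pushed &lt;code&gt;:v42&lt;/code&gt; tag can no longer change what a recreated Pod mounts.&lt;/p&gt;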




&lt;h2&gt;
  
  
  Runtime Support Matters
&lt;/h2&gt;

&lt;p&gt;You need modern runtimes.&lt;/p&gt;

&lt;p&gt;Roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;containerd &amp;gt;= 2.1&lt;/li&gt;
&lt;li&gt;CRI-O &amp;gt;= 1.31&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Older runtimes will fail with a clear unsupported feature error.&lt;/p&gt;




&lt;h1&gt;
  
  
  How imagePullSecrets Work
&lt;/h1&gt;

&lt;p&gt;This is one of the nicest parts.&lt;/p&gt;

&lt;p&gt;Image volumes automatically use the same authentication flow as normal container images.&lt;/p&gt;

&lt;p&gt;That means Kubernetes automatically uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod &lt;code&gt;imagePullSecrets&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ServiceAccount &lt;code&gt;imagePullSecrets&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;kubelet credential providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No additional auth wiring required.&lt;/p&gt;

&lt;p&gt;So this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regcred&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;works for BOTH:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container images&lt;/li&gt;
&lt;li&gt;image volumes&lt;/li&gt;
&lt;/ul&gt;
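
&lt;p&gt;The ServiceAccount route can be sketched the same way. The ServiceAccount name &lt;code&gt;web&lt;/code&gt; is hypothetical; &lt;code&gt;regcred&lt;/code&gt; is the secret used above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: ServiceAccount

metadata:
  name: web   # hypothetical name

# Copied into Pods that use this ServiceAccount
# and do not set imagePullSecrets themselves
imagePullSecrets:
  - name: regcred
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A Pod that sets &lt;code&gt;serviceAccountName: web&lt;/code&gt; and declares no &lt;code&gt;imagePullSecrets&lt;/code&gt; of its own then gets &lt;code&gt;regcred&lt;/code&gt; for its container images and its image volumes alike.&lt;/p&gt;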




&lt;h2&gt;
  
  
  Multiple Private Registries
&lt;/h2&gt;

&lt;p&gt;If assets and application images live in different registries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-registry-creds&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;assets-registry-creds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kubelet reads both secrets and hands the runtime the credentials that match each image's registry hostname.&lt;/p&gt;




&lt;h1&gt;
  
  
  Quick Comparison
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Init Container + emptyDir&lt;/th&gt;
&lt;th&gt;Image Volume&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pod complexity&lt;/td&gt;
&lt;td&gt;Multiple containers and mounts&lt;/td&gt;
&lt;td&gt;Single volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assets image&lt;/td&gt;
&lt;td&gt;Needs shell/cp&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;FROM scratch&lt;/code&gt; works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk usage&lt;/td&gt;
&lt;td&gt;Image + copied bytes&lt;/td&gt;
&lt;td&gt;Image only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;Pull + copy&lt;/td&gt;
&lt;td&gt;Pull only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writable&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharing across Pods&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;imagePullSecrets&lt;/td&gt;
&lt;td&gt;Pod spec&lt;/td&gt;
&lt;td&gt;Pod spec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update without restart&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes support&lt;/td&gt;
&lt;td&gt;Always&lt;/td&gt;
&lt;td&gt;1.31 alpha → 1.33 beta → 1.36 GA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h1&gt;
  
  
  When You Should Actually Use It
&lt;/h1&gt;

&lt;p&gt;Image volumes are ideal when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large read-only assets&lt;/li&gt;
&lt;li&gt;independently versioned bundles&lt;/li&gt;
&lt;li&gt;OCI-distributed artifacts&lt;/li&gt;
&lt;li&gt;data shared across multiple Pods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ML models&lt;/li&gt;
&lt;li&gt;static websites&lt;/li&gt;
&lt;li&gt;OPA bundles&lt;/li&gt;
&lt;li&gt;plugins&lt;/li&gt;
&lt;li&gt;WASM modules&lt;/li&gt;
&lt;li&gt;Grafana dashboards&lt;/li&gt;
&lt;li&gt;language packs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're especially useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the artifacts are too large for ConfigMaps&lt;/li&gt;
&lt;li&gt;you want registry-native distribution&lt;/li&gt;
&lt;li&gt;you want image signing/scanning/RBAC&lt;/li&gt;
&lt;li&gt;the content must remain immutable&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  When NOT To Use It
&lt;/h1&gt;

&lt;p&gt;Don't use image volumes when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the application needs writable storage&lt;/li&gt;
&lt;li&gt;the content is tiny text configuration&lt;/li&gt;
&lt;li&gt;the data is stateful per Pod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ConfigMaps&lt;/li&gt;
&lt;li&gt;PVCs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;emptyDir&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;are still better fits.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;The &lt;code&gt;image&lt;/code&gt; volume type feels small on paper, but it removes one of the longest-standing operational hacks in Kubernetes.&lt;/p&gt;

&lt;p&gt;For years, platform engineers built elaborate init-container copy workflows just to move immutable files into Pods.&lt;/p&gt;

&lt;p&gt;Now Kubernetes finally has a native primitive for it.&lt;/p&gt;

&lt;p&gt;If your workloads distribute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large read-only assets&lt;/li&gt;
&lt;li&gt;ML models&lt;/li&gt;
&lt;li&gt;frontend bundles&lt;/li&gt;
&lt;li&gt;policy packs&lt;/li&gt;
&lt;li&gt;plugins&lt;/li&gt;
&lt;li&gt;shared runtime data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;this feature can significantly reduce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;startup latency&lt;/li&gt;
&lt;li&gt;storage duplication&lt;/li&gt;
&lt;li&gt;image complexity&lt;/li&gt;
&lt;li&gt;YAML noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More importantly, it aligns Kubernetes with a broader industry shift:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OCI images are no longer just executable containers.&lt;br&gt;
They're becoming the standard distribution format for software artifacts in general.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And image volumes push Kubernetes one step further in that direction.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>containers</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>CPU Humbled Me — A Kubernetes Throttling Story Hidden Between Prometheus Scrapes</title>
      <dc:creator>Satyaki</dc:creator>
      <pubDate>Fri, 15 May 2026 15:12:32 +0000</pubDate>
      <link>https://dev.to/blackzu/cpu-humbled-me-a-kubernetes-throttling-story-hidden-between-prometheus-scrapes-4ah8</link>
      <guid>https://dev.to/blackzu/cpu-humbled-me-a-kubernetes-throttling-story-hidden-between-prometheus-scrapes-4ah8</guid>
      <description>&lt;p&gt;&lt;strong&gt;Memory is easy. CPU humbled me.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With memory, the rule is brutal but clear — cross the limit, the pod gets OOMKilled. Done.&lt;/p&gt;

&lt;p&gt;CPU? CPU is sneaky. And I ignored it for the longest time… until it broke production.&lt;/p&gt;

&lt;p&gt;Here's what happened 👇&lt;/p&gt;

&lt;p&gt;We had an app running peacefully in-house. Then it went client-facing. Traffic surged, and suddenly ~15% of requests started timing out — most of them on DB calls.&lt;/p&gt;

&lt;p&gt;I opened Grafana expecting a smoking gun. Nothing. CPU usage looked "fine." No throttling alerts screaming at me. Just confused timeouts.&lt;/p&gt;

&lt;p&gt;The trap? &lt;strong&gt;Throttling happens in milliseconds. Prometheus scrapes every 15 seconds.&lt;/strong&gt; Every bit of evidence was hiding between the scrapes.&lt;/p&gt;

&lt;p&gt;Here was the setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;200m&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512Mi&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;800m&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.5Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numbers from the incident (rough, but directionally honest):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normal:&lt;/strong&gt; 300 req/min → avg CPU ~180m&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Surge:&lt;/strong&gt; 1200 req/min → avg CPU ~650m, ~15% timeouts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I sat down and actually did the math instead of guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How CPU actually works
&lt;/h2&gt;

&lt;p&gt;CPU is compressible. Memory isn't. When CPU runs out, your process doesn't die — it gets &lt;em&gt;throttled&lt;/em&gt;. The Linux CFS scheduler slices time into periods (default: &lt;strong&gt;100ms&lt;/strong&gt;). Within each period, your container gets a quota based on its limit. Cross the quota mid-period? You wait for the next one. That wait &lt;em&gt;is&lt;/em&gt; the latency you're seeing.&lt;/p&gt;
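
&lt;p&gt;As a rough sketch of how the limit from the setup above turns into that quota, assuming the default 100ms period:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;resources:
  limits:
    # 800m = 0.8 CPU. With the default 100000us (100ms) period the quota is
    # 0.8 x 100000us = 80000us of CPU time per period.
    # cgroup v2 exposes this as cpu.max = "80000 100000";
    # cgroup v1 as cpu.cfs_quota_us = 80000 and cpu.cfs_period_us = 100000.
    cpu: 800m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The quota counts CPU time summed over all threads. Four busy threads can burn the whole 80ms in about 20ms of wall-clock time and then sit throttled for the rest of the period.&lt;/p&gt;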

&lt;h2&gt;
  
  
  Walking through the numbers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Normal load:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;300 req/min = 5 req/sec = 0.5 requests per 100ms
Avg CPU 180m = 18ms of CPU work per 100ms period
→ 18ms ÷ 0.5 req = ~36ms of CPU work per request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Surge load:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1200 req/min = 20 req/sec = 2 requests per 100ms
2 × 36ms = 72ms of CPU work needed per 100ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the limit was 800m → &lt;strong&gt;80ms quota per 100ms&lt;/strong&gt;. Looks fine on paper, right?&lt;/p&gt;

&lt;p&gt;Here's the catch: avg CPU was 650m (65ms). The &lt;em&gt;average&lt;/em&gt; hides the bursts. Some periods sat well below quota; others blew past the 80ms ceiling and got throttled. Average everything out across 15s scrapes and the dashboard whispers "all good" while users get timeouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the lesson.&lt;/strong&gt; Average CPU is a liar in bursty workloads. Throttling lives in the gaps your monitoring can't see.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually look at
&lt;/h2&gt;

&lt;p&gt;Stop staring at &lt;code&gt;container_cpu_usage_seconds_total&lt;/code&gt;. Look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;container_cpu_cfs_throttled_periods_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

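&lt;p&gt;The ratio of throttled periods to total periods, &lt;code&gt;container_cpu_cfs_throttled_periods_total&lt;/code&gt; divided by &lt;code&gt;container_cpu_cfs_periods_total&lt;/code&gt;, tells you the truth. Both are cumulative counters, so throttling that happens between scrapes still shows up in the next sample.&lt;/p&gt;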
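
&lt;p&gt;One way to keep an eye on that ratio is a Prometheus alerting rule. This is a sketch under assumptions: the group and alert names are made up, and the 25% threshold, 5m rate window and 10m hold time are starting points to tune, not recommendations from the incident:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;groups:
  - name: cpu-throttling                     # hypothetical group name
    rules:
      - alert: ContainerCPUThrottlingHigh    # hypothetical alert name
        # Share of CFS periods in which the container hit its quota
        expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_throttled_periods_total[5m])
          )
          /
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_periods_total[5m])
          ) &amp;gt; 0.25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container is throttled in more than 25% of its CFS periods"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the rule works on counter rates, it catches throttling even when average CPU usage on the dashboard looks healthy.&lt;/p&gt;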

&lt;h2&gt;
  
  
  Remediations (in order of maturity, not just "increase the limit")
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Right-size first.&lt;/strong&gt; Requests and limits should reflect real workload behavior, not guesses copy-pasted from a template.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load test before going client-facing.&lt;/strong&gt; Running an app in-house ≠ serving real traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPA recommendations&lt;/strong&gt; to understand what the app actually wants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HPA&lt;/strong&gt; so bursts get distributed across replicas instead of crushing one pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then, if needed, raise the limit&lt;/strong&gt; — with intent, not panic.&lt;/li&gt;
&lt;/ol&gt;
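
&lt;p&gt;For the HPA step, a minimal sketch could look like this. The Deployment name &lt;code&gt;app&lt;/code&gt;, the replica bounds and the target value are all assumptions to adapt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler

metadata:
  name: app                # hypothetical name

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app              # hypothetical Deployment name

  minReplicas: 2           # assumed bounds
  maxReplicas: 8

  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Percent of the CPU request, not the limit:
          # 150% of a 200m request is about 300m per Pod
          averageUtilization: 150
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;HPA utilization is measured against requests. With a 200m request and ~180m of normal usage, a target below 100 would keep the Deployment scaled up all the time, which is one more reason to right-size first.&lt;/p&gt;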

&lt;p&gt;Bumping the limit is the easiest fix and the most expensive habit. Every patch carries a hidden cost — node capacity, bin-packing, cluster bills, blast radius. Understand the &lt;em&gt;why&lt;/em&gt; before you reach for the YAML.&lt;/p&gt;




&lt;p&gt;This one incident taught me more about Kubernetes resource management than months of reading docs. If you're running anything client-facing, please don't wait for a production incident to learn this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU isn't just a number on a dashboard. It's a time budget — and your users feel every millisecond you overspend.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you been burned by CFS throttling? What metric finally gave it away for you? Drop it in the comments — I'd love to compare notes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
      <category>observability</category>
    </item>
    <item>
      <title>Understanding Kube-proxy &amp; CoreDNS in Kubernetes no bluff</title>
      <dc:creator>Satyaki</dc:creator>
      <pubDate>Thu, 22 Jan 2026 15:23:21 +0000</pubDate>
      <link>https://dev.to/blackzu/understanding-kube-proxy-coredns-in-kubernetes-no-bluff-23bc</link>
      <guid>https://dev.to/blackzu/understanding-kube-proxy-coredns-in-kubernetes-no-bluff-23bc</guid>
      <description>&lt;p&gt;🛠 Setting the Stage: A Kind Cluster&lt;/p&gt;

&lt;p&gt;Kubernetes is full of magic, but one of its most fascinating components is kube-proxy. It’s the silent operator that ensures traffic hitting a Service gets distributed across the right Pods. In its default iptables mode, kube-proxy does this by programming Linux iptables rules. Let’s peel back the layers and see it in action.&lt;/p&gt;

&lt;p&gt;For this demo, I spun up a 3-node Kind cluster. On top of it, I deployed a simple nginx Deployment exposed via a ClusterIP Service.&lt;/p&gt;
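
&lt;p&gt;A Kind config along these lines gives you a comparable cluster. It is a sketch: the split into one control-plane node and two workers is my assumption about the "3-node" layout, and the file name is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# kind-config.yaml (hypothetical file name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4

nodes:
  - role: control-plane
  - role: worker
  - role: worker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Create it with &lt;code&gt;kind create cluster --config kind-config.yaml&lt;/code&gt;.&lt;/p&gt;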

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uk84g5ojzvn24hvx408.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uk84g5ojzvn24hvx408.png" alt=" " width="800" height="48"&gt;&lt;/a&gt;&lt;br&gt;
Here’s the deployment and service in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5jq98x81jx6sq0ezoli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5jq98x81jx6sq0ezoli.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;
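
&lt;p&gt;The manifests are only visible as screenshots, so here is a minimal sketch that would produce an equivalent setup. The names match the &lt;code&gt;nginx-deployment&lt;/code&gt; Service used in this walkthrough; the image tag, label and port are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2                  # two Pods, so two endpoints behind the Service
  selector:
    matchLabels:
      app: nginx               # assumed label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27    # assumed tag
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment
spec:
  type: ClusterIP
  selector:
    app: nginx                 # must match the Pod labels above
  ports:
    - port: 80
      targetPort: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;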

&lt;p&gt;📜 Peeking into iptables&lt;/p&gt;

&lt;p&gt;Now comes the fun part. I logged into one of the nodes where a Pod is running and listed the NAT rules in the KUBE-SERVICES chain:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgtit5b6cok39fcyuj7d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgtit5b6cok39fcyuj7d.png" alt=" " width="800" height="86"&gt;&lt;/a&gt;&lt;br&gt;
Notice the entry for our nginx-deployment Service. The destination IP here is the ClusterIP of the Service. This rule is the starting point for redirecting traffic: packets addressed to the ClusterIP jump to the Service’s own chain.&lt;/p&gt;
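
&lt;p&gt;To make the matching concrete, here is a minimal Python sketch that parses one rule in iptables-save syntax and checks whether a packet would jump to the Service chain. The rule text and the ClusterIP 10.96.12.34 are made up for illustration (only the chain name comes from this demo), and the parser handles just the handful of flags shown, not the full iptables grammar.&lt;/p&gt;

```python
import ipaddress
import shlex

# Hypothetical rule in iptables-save syntax; the ClusterIP 10.96.12.34 is made up.
SAMPLE_RULE = (
    '-A KUBE-SERVICES -d 10.96.12.34/32 -p tcp '
    '-m comment --comment "default/nginx-deployment cluster IP" '
    '-m tcp --dport 80 -j KUBE-SVC-WRNOD73BKRQH4VVX'
)

# Only the flags this sketch cares about; everything else is skipped.
FLAGS = {
    "-A": "chain",
    "-d": "destination",
    "-p": "protocol",
    "--dport": "port",
    "--comment": "comment",
    "-j": "target",
}


def parse_rule(line):
    """Turn one rule line into a dict of the fields listed in FLAGS."""
    rule = {}
    tokens = iter(shlex.split(line))
    for token in tokens:
        if token in FLAGS:
            rule[FLAGS[token]] = next(tokens)
    return rule


def rule_matches(rule, dst_ip, protocol, port):
    """True if a packet with this destination would take the rule's jump."""
    network = ipaddress.ip_network(rule["destination"])
    return (
        ipaddress.ip_address(dst_ip) in network
        and protocol == rule["protocol"]
        and port == int(rule["port"])
    )


if __name__ == "__main__":
    rule = parse_rule(SAMPLE_RULE)
    print(rule["comment"], "jumps to", rule["target"])
    print(rule_matches(rule, "10.96.12.34", "tcp", 80))
    print(rule_matches(rule, "10.96.0.10", "udp", 53))
```

&lt;p&gt;On a real node you would feed it lines from the NAT table instead of the sample string; the output format there may carry extra fields that this sketch simply ignores.&lt;/p&gt;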

&lt;p&gt;🔀 Diving into the Service Chain&lt;/p&gt;

&lt;p&gt;Every Service gets its own chain. For nginx, that’s KUBE-SVC-WRNOD73BKRQH4VVX. Let’s inspect it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ge4gfetafngmbrj0xmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ge4gfetafngmbrj0xmw.png" alt=" " width="800" height="66"&gt;&lt;/a&gt;&lt;br&gt;
And here’s the magic:&lt;br&gt;
When traffic hits the ClusterIP, the rules that kube-proxy installed rewrite the destination (DNAT) to one of the Pod IPs backing the Deployment.&lt;br&gt;
The rules carry a probability. With two Pods, the first rule matches with probability 0.5 and the second catches everything else, so the split is 50/50: half the connections go to one Pod, and the other half to the second Pod.&lt;br&gt;
This is how kube-proxy achieves load balancing using nothing more than iptables.&lt;br&gt;
So, what did we just see?&lt;/p&gt;

&lt;p&gt;ClusterIP → Pod IPs translation via iptables.&lt;br&gt;
Masquerading rewrites the source IP where needed, so replies return along the same path.&lt;br&gt;
Probability rules distribute connections evenly, on average, across endpoints.&lt;/p&gt;
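
&lt;p&gt;To see why per-rule probabilities produce an even split, here is a minimal Python sketch of the rule cascade. It assumes the usual layout for n endpoints, where rule i matches with probability 1/(n - i) and the last rule matches unconditionally. The function names and Pod IPs are made up for illustration; this models the arithmetic only, not the kernel.&lt;/p&gt;

```python
import random
from collections import Counter


def rule_probabilities(n):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1."""
    return [1.0 / (n - i) for i in range(n)]


def effective_shares(n):
    """Overall share of traffic each rule receives once earlier rules had their turn."""
    shares = []
    remaining = 1.0  # fraction of traffic that reaches the current rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares


def pick_endpoint(endpoints, rng):
    """Walk the rules top to bottom and return the first endpoint whose rule matches."""
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        # rng.random() is always below 1.0, so the final rule (p == 1.0) always matches.
        if p > rng.random():
            return endpoint
    return endpoints[-1]  # defensive fallback, not reached in practice


if __name__ == "__main__":
    pods = ["10.244.1.2", "10.244.2.2"]  # hypothetical Pod IPs
    print(effective_shares(len(pods)))
    rng = random.Random(42)
    hits = Counter(pick_endpoint(pods, rng) for _ in range(10_000))
    print(hits)
```

&lt;p&gt;For two Pods the computed shares are 0.5 and 0.5, matching the 50/50 ratio in the chain. With three Pods the per-rule probabilities would be 1/3, 1/2 and 1, which still works out to one third each.&lt;/p&gt;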

&lt;p&gt;🌐 How DNS Works in the Cluster&lt;/p&gt;

&lt;p&gt;So far, we’ve seen how kube-proxy handles traffic routing and load balancing. But how does your application even know where to send requests? That’s where CoreDNS comes in.&lt;br&gt;
CoreDNS acts as the nameserver inside Kubernetes, resolving Service names into their corresponding ClusterIPs. Let’s walk through it step by step.&lt;/p&gt;

&lt;p&gt;🔍 Inspecting the kube-dns Service&lt;/p&gt;

&lt;p&gt;In the kube-system namespace, you’ll find the kube-dns Service. This is essentially the front door to CoreDNS:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkbhtfrcmu171avcrp6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkbhtfrcmu171avcrp6u.png" alt=" " width="800" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📄 The resolv.conf File&lt;/p&gt;

&lt;p&gt;Inside Pods, the /etc/resolv.conf file contains the nameserver address and the DNS search domains. The nameserver typically points at the ClusterIP of the kube-dns Service, and the search domains let a short name like nginx-deployment expand to the fully qualified nginx-deployment.default.svc.cluster.local.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz598u8m5l5i21bint9vt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz598u8m5l5i21bint9vt.png" alt=" " width="738" height="323"&gt;&lt;/a&gt;&lt;/p&gt;
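
&lt;p&gt;The effect of the search list can be sketched in a few lines of Python. The resolv.conf text below is an assumption modeled on a typical Pod in the default namespace (the nameserver IP, search domains and ndots value may differ in your cluster), and the ordering logic follows the common resolver behaviour: names with fewer dots than ndots go through the search list first.&lt;/p&gt;

```python
# Hypothetical resolv.conf contents; check the real file inside your own Pod.
SAMPLE_RESOLV_CONF = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
"""


def parse_resolv_conf(text):
    """Extract nameservers, search domains, and ndots from resolv.conf text."""
    config = {"nameservers": [], "search": [], "ndots": 1}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0][0] in "#;":
            continue  # skip blank lines and comments
        if parts[0] == "nameserver":
            config["nameservers"].append(parts[1])
        elif parts[0] == "search":
            config["search"] = parts[1:]
        elif parts[0] == "options":
            for option in parts[1:]:
                if option.startswith("ndots:"):
                    config["ndots"] = int(option.split(":", 1)[1])
    return config


def candidate_names(name, config):
    """Names a resolver would try, in order, for the given query."""
    if name.endswith("."):
        return [name]  # already fully qualified, so the search list is skipped
    expanded = [f"{name}.{domain}." for domain in config["search"]]
    literal = [name + "."]
    # Fewer dots than ndots: try the search list before the literal name.
    if config["ndots"] > name.count("."):
        return expanded + literal
    return literal + expanded


if __name__ == "__main__":
    cfg = parse_resolv_conf(SAMPLE_RESOLV_CONF)
    for candidate in candidate_names("nginx-deployment", cfg):
        print(candidate)
```

&lt;p&gt;With these sample values, the first candidate for nginx-deployment is nginx-deployment.default.svc.cluster.local., which is the name CoreDNS answers for. Even the full Service name has only four dots, so under ndots:5 it would still be tried against the search list first unless you add a trailing dot.&lt;/p&gt;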

&lt;p&gt;🧪 Testing with nslookup&lt;/p&gt;

&lt;p&gt;Let’s put it to the test. Logging into a node and running an nslookup shows the DNS resolution in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnog9vvihm632zdd02kyk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnog9vvihm632zdd02kyk.png" alt=" " width="614" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it works exactly as expected — the Service name resolves to the ClusterIP, which kube-proxy then maps to the Pod IPs.&lt;/p&gt;

&lt;p&gt;🎯 Wrapping It All Up&lt;/p&gt;

&lt;p&gt;Between kube-proxy and CoreDNS, Kubernetes ensures that:&lt;/p&gt;

&lt;p&gt;Traffic hitting a Service is load balanced across Pods.&lt;br&gt;
Service names are resolved seamlessly into ClusterIPs.&lt;br&gt;
Applications don’t need to worry about IP addresses; they just use DNS names.&lt;/p&gt;

&lt;p&gt;These two components are the backbone of Kubernetes networking. Without them, Services wouldn’t be discoverable or scalable.&lt;br&gt;
🔥 And that’s the no-bluff walkthrough of kube-proxy and CoreDNS, two vital pieces of the Kubernetes puzzle. Next time you deploy an app, you’ll know exactly how the traffic finds its way to the right Pod.&lt;/p&gt;

&lt;p&gt;That’s what kube-proxy does. Isn’t it really cool?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>networking</category>
      <category>kubernetes</category>
    </item>
  </channel>
</rss>
