DEV Community: Satyaki

Anatomy of a High-CPU Crisis: Why Your Code Might Not Be the Problem

Satyaki — Fri, 29 May 2026 12:33:19 +0000

Your primary application service is screaming at 100% CPU utilization.

As engineering leaders and DevOps practitioners, our immediate instinct is usually a binary choice:

The Infrastructure Guess: “We must be getting hit with a massive surge of user traffic. Scale it out!”
The Software Guess: “A developer pushed a broken while(true) loop. Revert the commit!”

But senior systems engineers know a deeper truth: A computer is a tightly coupled ecosystem. A bottleneck in a completely passive resource—like a disk or raw memory—can masquerade as a devastating CPU crisis downstream.

If you want to move past shallow dashboard watching and truly understand Linux internals during a production outage, we have to look at how applications actually exploit hardware, and exactly how the dots connect when a system begins to melt.

The Blueprint: The Office Desk Analogy

To understand how software interacts with system hardware, let's look at a running application instance as a human accountant named "App" sitting at an office desk.

Hardware Component	The System Reality	The Office Analogy
CPU (Processing)	The speed at which execution cycles occur.	App's Brain Power. How fast App can read an instruction, calculate math, and execute tasks.
RAM (Memory)	Volatile, high-speed space for active variables.	The Desktop Surface. A fast, easily accessible space where files are laid out flat to be worked on. Space is limited.
Disk (Storage)	Non-volatile, high-capacity, slower storage block.	The Filing Cabinet in the Basement. Holds massive amounts of historical data, but walking down to get it takes time.

In a healthy system, the CPU is the only engine doing actual work. RAM and Disk are completely passive grids of silicon and magnets; they cannot move a single byte of data on their own. Every calculation, every file copy, and every memory cleanup cycle requires the CPU's brain power.

Because the CPU manages all three domains, a failure in Storage or Memory can force the CPU to stop handling business logic and suffocate under infrastructure housekeeping.

1. When Memory Attacks the CPU: The Panicked Janitor Loop

High-level runtimes (like Java, Node.js, and Python) utilize an automated internal process called the Garbage Collector (GC). Think of the GC as a background janitor whose only job is to walk around App's desk, find papers that are no longer needed, and toss them in the trash to keep the workspace clean.

The Meltdown Mechanics

Imagine your code hits a slow memory leak. Variables accumulate, and the desk surface (RAM) hits 98% capacity.

The background Janitor panics. He starts sprinting around the desk at breakneck speed, checking every single piece of paper over and over again, desperately hunting for something he can safely discard. He finds nothing, spins around, and instantly checks again.

Because the Janitor is moving his arms and legs billions of times a second, he consumes 100% of the room's physical energy (CPU). The application's brain is completely pinned, not because it's processing user transactions, but because it is hyperventilating over a lack of desk space.

Eventually, the Linux kernel loses patience with the unworkable chaos, steps in as the building manager, and forcefully shoots the process in the head via the OOM (Out Of Memory) Killer.
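To put a number on the Janitor's workload, you can measure how much wall-clock time a runtime spends collecting. Here is a minimal Python sketch, assuming CPython, whose cyclic collector exposes start/stop hooks through gc.callbacks. GcTimer and gc_share are illustrative names, not part of any library, and a JVM or Node.js service would rely on its own GC logging instead.

```python
import gc
import time


class GcTimer:
    """Accumulates wall-clock time spent inside CPython's cyclic garbage collector."""

    def __init__(self):
        self.total_seconds = 0.0
        self.collections = 0
        self._started_at = None

    def __call__(self, phase, info):
        # CPython calls every gc.callbacks entry with phase "start" or "stop".
        if phase == "start":
            self._started_at = time.perf_counter()
        elif phase == "stop" and self._started_at is not None:
            self.total_seconds += time.perf_counter() - self._started_at
            self.collections += 1
            self._started_at = None


def gc_share(timer, elapsed_seconds):
    """Fraction of elapsed wall-clock time that was spent in garbage collection."""
    if elapsed_seconds <= 0:
        return 0.0
    return timer.total_seconds / elapsed_seconds


timer = GcTimer()
gc.callbacks.append(timer)
started = time.perf_counter()

# Create reference cycles so the collector has real work to do.
for _ in range(1000):
    cycle = []
    cycle.append(cycle)
gc.collect()

elapsed = time.perf_counter() - started
gc.callbacks.remove(timer)
share = gc_share(timer, elapsed)
print(f"{timer.collections} collections, GC share of wall time: {share:.1%}")
```

A share that climbs toward 100% while memory stays near its ceiling is the Panicked Janitor pattern described above.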

2. When Disk Attacks the CPU: The Filing Failure Loop

Applications are strictly designed to keep an audit trail of their operations via logging frameworks. Every time App completes a task, it writes an audit note on an index card and dispatches it down to the basement filing cabinet (Disk).

The Meltdown Mechanics

What happens when that filing cabinet hits 100% capacity?

App tries to slide a logging card into a jammed drawer. Linux rejects the write operation and throws an error: No space left on device.

If the application’s error-handling architecture isn't flawlessly designed, a catastrophic trap springs open:

The app fails to write its standard log line due to a full disk.
The code catches that exception and says: "An error occurred! Let me immediately write an explicit emergency error report to the log file!"
The app tries to write the emergency report to the exact same jammed cabinet. It fails again.
The error-handler catches that failure and loops back instantly to retry.

This error loop executes millions of times a second. The CPU core is instantly pinned at 100% capacity, trapped in a frantic, hysterical loop of trying to record its own storage failures into a locked drawer.
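One way to defuse this trap is to make the logger refuse to report its own write failures to the same sink. The sketch below is a hypothetical Python illustration, not any particular logging framework's API: GuardedLogger and full_disk_write are made-up names, and the simulated sink simply raises the ENOSPC error that Linux returns for a full filesystem.

```python
import errno


class GuardedLogger:
    """Wraps a write function and never logs about its own full-disk failures."""

    def __init__(self, write):
        self._write = write
        self.dropped = 0

    def log(self, message):
        try:
            self._write(message)
            return True
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # unrelated I/O errors still surface to the caller
            # Full disk: count the lost line and move on. No retry, no error report
            # written back to the same jammed sink.
            self.dropped += 1
            return False


def full_disk_write(message):
    # Hypothetical sink that simulates a filesystem at 100% capacity.
    raise OSError(errno.ENOSPC, "No space left on device")


logger = GuardedLogger(full_disk_write)
results = [logger.log(f"event {i}") for i in range(3)]
print(results, logger.dropped)
```

Each call fails exactly once and returns, so the CPU cost stays proportional to real work instead of to retries. The dropped counter can be exported as a metric so the failure is still visible somewhere other than the full disk.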

The Senior DevOps Playbook: Triage and Surgical Root Cause

When a 100% CPU alert wakes you up, you can execute a definitive diagnostic triage by following these sequential steps.

Step 1: Protect the Users (Stop the Bleeding)

Do not try to debug a live server while it is dropping customer traffic. Remove the failing instance from your Application Load Balancer (ALB) Target Group or isolate it from your Auto Scaling Group (ASG). Allow the ASG to spin up a fresh, healthy instance to assume the user load, and keep the broken server alive in an isolated sandbox for debugging.

Step 2: The Traffic vs. Code Fork

Log into the isolated instance via SSH or AWS SSM and run top.

Scenario A: The CPU usage immediately plummets to near-zero (99.5% id or Idle).
- The Verdict: Your code is probably not stuck in a runaway loop. The instance most likely melted down under a surge of legitimate user traffic: the second you cut the traffic, the CPU relaxed. Your immediate solution is horizontal scaling (more instances) or vertical scaling (larger instance sizes).
Scenario B: The instance has zero public user traffic hitting it, but the CPU is still pinned at 100%.
- The Verdict: You have a localized environment or code failure. Proceed to Step 3.

Step 3: Check for Infrastructure Collateral Damage

Before reading application logs, query the hardware status:

Run dmesg -T | grep -i oom to inspect the Linux Kernel’s emergency logbook. If you see the OS actively slaughtering processes, your CPU spike is a downstream symptom of a critical memory starvation event.
Run df -h to check disk utilization across your mounted partitions. If a partition is flatlined at 100%, you are likely dealing with an infinite error-logging loop. Clear out old log buffers or expand the EBS volume to instantly free the CPU.
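If you would rather script the disk check than eyeball df -h, Python's standard library exposes the same numbers. This is a minimal sketch: the 95% threshold is an arbitrary assumption, and percent_used and is_nearly_full are illustrative helper names.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Used space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_nearly_full(path, threshold_percent=95.0):
    """True when the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) >= threshold_percent


usage = shutil.disk_usage("/")
if usage.total > 0:
    # Note: df computes Use% against space available to unprivileged users,
    # so its figure can differ slightly from this simple ratio.
    print(f"/ is {percent_used(usage.total, usage.used):.1f}% used")
```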

Step 4: Surgical Thread Inspection

If memory and disk both look healthy, a rogue software loop is the most likely culprit. Open htop to identify it:

Press F5 to switch to Tree View. This maps out the exact lineage of parent and child processes.
Press Shift + H to toggle Userland Threads.

💡 Internal Linux Nuance: In Linux, child processes are completely independent programs with isolated memory boundaries. Threads are internal workers ("Lightweight Processes" or LWPs) sharing the exact same memory building. While tools like htop display a Thread's unique identification number under the PID column for convenience, it is technically a TID (Thread ID) executing within a shared TGID (Thread Group ID) block.
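The PID versus TID distinction is easy to see from inside a program. Here is a small Python sketch (Python 3.8+ for threading.get_native_id); collect_thread_ids is an illustrative name. Every worker reports the same process ID while the kernel hands each one its own native thread ID.

```python
import os
import threading


def collect_thread_ids(worker_count):
    """Start worker_count threads and record (pid, native thread id) for each."""
    barrier = threading.Barrier(worker_count)
    lock = threading.Lock()
    records = []

    def worker():
        # First barrier: wait until every worker is running.
        barrier.wait()
        with lock:
            records.append((os.getpid(), threading.get_native_id()))
        # Second barrier: keep all workers alive until everyone has recorded,
        # so the OS cannot recycle a thread ID in the meantime.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return records


records = collect_thread_ids(3)
pids = {pid for pid, _ in records}
tids = {tid for _, tid in records}
print(f"process IDs seen: {len(pids)}, thread IDs seen: {len(tids)}")
```

On Linux, the native IDs printed here are the same TIDs that htop lists in its PID column once thread display is enabled.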

When you expand the thread view using Shift + H, threads are easily distinguished from child processes because they inherit the exact same parent command string and their row text color is automatically dimmed out by htop.

Sort by CPU percentage (F6). Identify the exact Thread ID (TID) riding at the absolute top of the processing stack.


  PID  Command
  503  python3 main.py
  504  ├─ python3 main.py  <-- [Dimmed text: this thread is the 100% CPU culprit]
  505  └─ python3 main.py

Docker Builds Were Taking 10 Minutes. This One Change Brought It Down to Seconds

Satyaki — Sat, 23 May 2026 12:37:50 +0000

If you work with large Docker builds in production, especially with multi-module Spring Boot applications, you’ve probably suffered through this:

You change one tiny application.yaml.

Then you rebuild the image.

And suddenly Docker starts downloading half the internet again.

I recently faced this while working on a multi-module Spring Boot application with multiple pom.xml files and a huge dependency tree. Every rebuild felt painful. Sometimes the build would sit for 8–10 minutes just resolving Maven dependencies before even packaging the app.

The worst part?

The actual code change was tiny.

The Real Problem Wasn't Maven

It was the Docker layer strategy.

A lot of Dockerfiles are written like this:

COPY . .
RUN mvn package

Looks simple.

But this completely destroys Docker layer caching.

Each filesystem-changing instruction (RUN, COPY, ADD) creates a separate immutable layer.

Each layer gets its own cache key internally. If any layer changes, Docker invalidates every layer that comes after it and rebuilds them.

So if your Dockerfile copies source code before resolving dependencies:

COPY src ./src
RUN mvn package

then every source code change forces:

Maven dependency resolution
Plugin downloads
Packaging
Recompilation

all over again.

Even though dependencies never changed.

That’s where most of the build time gets wasted.

The Optimization That Changed Everything

I switched to Docker BuildKit.

At the top of the Dockerfile:

# syntax=docker/dockerfile:1.4

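This directive selects the Dockerfile frontend version that BuildKit uses and unlocks advanced features like cache mounts. BuildKit itself is the default builder in Docker Engine 23.0 and later; on older versions, enable it with DOCKER_BUILDKIT=1.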

Then instead of doing a normal dependency resolution:

RUN mvn dependency:go-offline

I used:

RUN --mount=type=cache,target=/root/.m2 \
    mvn dependency:go-offline

This was the game changer.

What `--mount=type=cache` Actually Does

Normally Maven stores dependencies inside:

/root/.m2

Without BuildKit:

Dependencies download every fresh build
Docker throws them away after the layer rebuilds

With BuildKit cache mounts:

BuildKit keeps a persistent cache directory in the builder's own storage, outside the image layers
Maven dependencies stay cached
Future builds instantly reuse them

So after the first build:

Dependencies no longer redownload
Rebuilds become dramatically faster
Iterative development becomes smooth again

My rebuild time dropped from several minutes to just a few seconds.

The Other Optimization Most People Miss

Instruction ordering inside Dockerfiles matters a lot.

This pattern is extremely important:

COPY pom.xml .
RUN mvn dependency:go-offline

COPY src ./src

Why?

Because pom.xml changes far less frequently than application source code.

Docker can now cache dependency resolution separately from source changes.

So:

Changing Java code only rebuilds packaging layers
Dependency layers remain untouched
Rebuilds stay fast

If you reverse the order:

COPY src ./src
COPY pom.xml .

then every source code change invalidates all subsequent layers.

That’s catastrophic for large Java builds.
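The ordering rule is easier to trust once you see the chain. Below is a deliberately simplified Python model, not BuildKit's real cache algorithm: each step's key is a hash of its parent key, its instruction text, and a stand-in string for the copied content. The function names and the "pom-v1" / "src-v1" content labels are made up for illustration.

```python
import hashlib


def layer_keys(steps):
    """steps: list of (instruction, content_fingerprint) -> chained cache keys."""
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()
        keys.append(digest)
        parent = digest  # every key depends on everything before it
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Indexes of steps whose cache key changed between two builds."""
    pairs = zip(layer_keys(old_steps), layer_keys(new_steps))
    return [index for index, (old, new) in enumerate(pairs) if old != new]


def pom_first(src):
    return [
        ("COPY pom.xml .", "pom-v1"),
        ("RUN mvn dependency:go-offline", ""),
        ("COPY src ./src", src),
        ("RUN mvn package", ""),
    ]


def src_first(src):
    return [
        ("COPY src ./src", src),
        ("COPY pom.xml .", "pom-v1"),
        ("RUN mvn dependency:go-offline", ""),
        ("RUN mvn package", ""),
    ]


good = rebuilt_steps(pom_first("src-v1"), pom_first("src-v2"))
bad = rebuilt_steps(src_first("src-v1"), src_first("src-v2"))
print("pom first, rebuilt steps:", good)
print("src first, rebuilt steps:", bad)
```

With pom.xml copied first, a source change only touches the last two steps; with the source copied first, all four steps lose their cache, including dependency resolution.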

But Wait… How Does This Work in CI/CD?

At this point I had another question myself.

If CI runners like GitHub Actions use fresh ephemeral machines every run, then where is the cache actually stored?

Because after every pipeline:

Runner gets destroyed
Filesystem disappears
.m2 cache disappears too

So how does caching survive across builds?

The answer is BuildKit remote cache export/import.

In GitHub Actions, you can persist BuildKit cache across runners like this:

steps:
  - uses: actions/checkout@v4

  - uses: docker/setup-buildx-action@v3

  - uses: docker/build-push-action@v5
    with:
      context: .
      file: ./Dockerfile
      tags: myimage:latest

      cache-from: type=gha
      cache-to: type=gha,mode=max

This is insanely powerful.

What happens here:

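cache-to exports the BuildKit layer cache to the GitHub Actions cache service
cache-from restores it in future workflow runs

So even though every runner is brand new:

Docker layers stay reusable
Unchanged dependency layers are restored instead of rebuilt
Builds remain fast across pipelines

One caveat: this exported cache covers image layers, not the contents of --mount=type=cache directories. On a fresh runner the /root/.m2 cache mount starts empty, so a step that misses the layer cache downloads its dependencies again unless the mount is persisted separately.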

This is how modern production CI/CD systems optimize container builds at scale.

Why This Matters in Production

This isn’t just about developer convenience.

In production engineering environments this directly impacts:

CI/CD speed
Deployment frequency
Compute costs
Feedback loops
Developer productivity

For teams deploying dozens or hundreds of services, shaving even 5 minutes off builds compounds into massive engineering efficiency gains.

Especially in:

Microservice architectures
Monorepos
Multi-module Maven projects
Kubernetes delivery pipelines
High-frequency deployment environments

Final Takeaway

If your Docker builds are painfully slow, don’t immediately blame:

Maven
Spring Boot
Network latency

Most of the time the real issue is poor Docker layer design and missing cache strategy.

A properly structured Dockerfile combined with BuildKit caching can reduce rebuild times from 10 minutes to a few seconds.

And once you experience that speed difference, there’s no going back.

Image Volume Type GA in Kubernetes 1.36 — Finally Killing the Init Container Copy Pattern

Satyaki — Thu, 21 May 2026 05:49:06 +0000

For years, Kubernetes engineers have used the same awkward pattern whenever an application needed large read-only assets:

Init container
emptyDir
cp -r
Wait for startup
Duplicate storage usage

It worked, but it always felt like a workaround rather than a first-class Kubernetes primitive.

With Kubernetes 1.36, the image volume type is now GA, and it fundamentally changes how Pods can consume immutable file bundles.

Instead of:

pulling files into an init container,
copying them into an emptyDir,
and mounting them into the main container,

Kubernetes can now directly mount an OCI image filesystem into a container as a read-only volume.

That means:

no init-container copy step,
no duplicated bytes on disk,
faster startup times,
smaller artifact images,
and much cleaner Pod specs.

This becomes especially powerful for:

ML models
static websites
WASM modules
OPA bundles
language packs
Grafana dashboards
plugin distributions
and any independently versioned read-only asset bundle

Let me walk through a realistic end-to-end scenario.

The Scenario

Team A owns nginx (the web server).
Team B owns the website content (HTML/CSS/JS).
Team B ships new content 5x/day.
Team A ships nginx config changes maybe once a month.
They should not be coupled.

The static assets live in an OCI image:

registry.example.com/web/site-assets:v42

This image is:

scratch + files

No shell. No entrypoint. Just:

/site/index.html
/site/style.css
/site/app.js

The Old Way (Pre-1.36): Init Container + emptyDir

How You Built the Assets Image

# Dockerfile for site-assets

FROM busybox:1.36 AS source
COPY ./site /site

# Final image needs a shell because the init container will run `cp`
FROM busybox:1.36
COPY --from=source /site /site

Notice something important:

You cannot use FROM scratch here because the init container needs tools like:

cp
sh

So the image is bloated with BusyBox purely to enable the copy operation.

The Pod Manifest

apiVersion: v1
kind: Pod
metadata:
  name: web-old-way

spec:
  imagePullSecrets:
    - name: regcred

  initContainers:
    - name: load-assets
      image: registry.example.com/web/site-assets:v42

      command:
        - sh
        - -c
        - cp -r /site/* /shared/

      volumeMounts:
        - name: shared
          mountPath: /shared

  containers:
    - name: nginx
      image: nginx:1.27

      volumeMounts:
        - name: shared
          mountPath: /usr/share/nginx/html
          readOnly: true

  volumes:
    - name: shared
      emptyDir: {}

What's Actually Happening Under the Hood

1. Images are pulled

Kubelet pulls:

nginx:1.27
site-assets:v42

using the Pod's imagePullSecrets.

2. Kubelet creates the emptyDir

Kubernetes creates an actual directory on the node filesystem:

/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/shared/

At this point it is completely empty.

3. Init container starts

The init container's root filesystem is the site-assets image.

The emptyDir gets bind-mounted into the init container at:

/shared

4. The copy operation happens

The init container executes:

cp -r /site/* /shared/

Every byte gets physically copied:

image layer -> emptyDir

5. Init container exits

Kubelet records successful completion.

6. nginx starts

The same emptyDir is mounted into nginx at:

/usr/share/nginx/html

nginx now serves the copied files.

Real Problems With the Old Pattern

1. Disk Usage Doubles

The files now exist in two places:

/var/lib/containerd/...

AND

emptyDir

A 2 GB ML model becomes:

2 GB image layer + 2 GB copy = 4 GB

per Pod.

2. Startup Latency

Large copy operations are expensive.

A 2 GB copy operation can easily add:

10–30 seconds

before the main container even starts.

3. You Need a Shell

You cannot use:

FROM scratch

because the image needs tooling like:

cp
sh

That means:

larger images
more CVEs
more attack surface

4. Verbose YAML

You need:

init containers
shared volumes
multiple mounts
copy commands

All just to move files around.

5. No Sharing Across Pods

Every Pod independently copies the same bytes into its own emptyDir.

Ten Pods on one node means:

10 independent copies

The New Way (Kubernetes 1.36): Image Volume Type

How You Build the Assets Image Now

FROM scratch

COPY ./site /site

That's it.

No shell.
No BusyBox.
No executables.

Just files.

The Pod Manifest

apiVersion: v1
kind: Pod

metadata:
  name: web-new-way

spec:
  imagePullSecrets:
    - name: regcred

  containers:
    - name: nginx
      image: nginx:1.27

      volumeMounts:
        - name: assets
          mountPath: /usr/share/nginx/html
          subPath: site

  volumes:
    - name: assets

      image:
        reference: registry.example.com/web/site-assets:v42
        pullPolicy: IfNotPresent

No init container.
No emptyDir.
No copy operation.

What's Actually Happening Under the Hood Now

1. Kubelet sees an image volume

Kubelet notices:

volumes:
  - image:

and asks the CRI runtime to mount the image.

2. Runtime pulls the image

containerd or CRI-O pulls:

site-assets:v42

using the normal image pull pipeline.

Exactly the same mechanism used for container images.

3. Runtime unpacks image layers

The image gets unpacked into the runtime snapshot store.

For example with containerd:

overlayfs snapshotter

No container is started.

Only the filesystem is materialized.

4. Filesystem is bind-mounted directly

The runtime bind-mounts the image filesystem directly into the nginx container:

/usr/share/nginx/html

Read-only by design.

5. nginx starts immediately

No copy step.
No waiting.

The files already exist.

The Mental Model Shift

The old model was:

"Start a helper container and copy files somewhere."

The new model is:

"Mount an OCI image filesystem directly as a volume."

That sounds subtle, but architecturally it's a major shift.

Kubernetes is effectively treating OCI images as generic immutable data artifacts — not just runnable containers.

What You Gain

No Data Duplication

The image is bind-mounted directly.

No copy operation.

Faster Startup

You eliminate the init-container copy phase entirely.

For large datasets or ML models, this is massive.

Smaller and Safer Images

You can now use:

FROM scratch

which means:

smaller images
fewer CVEs
reduced attack surface

Always Read-Only

Image volumes are immutable by specification.

The runtime enforces it.

Applications cannot modify the mounted content.

Shared Across Pods

Ten Pods mounting the same image on the same node share the same underlying bytes.

Huge improvement for large artifact distribution.
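The disk math for both approaches can be put side by side. This is a back-of-the-envelope Python sketch that assumes the unpacked image is stored once per node and that each emptyDir copy is the same size as the image content; the function names are illustrative.

```python
def node_gb_init_copy(asset_gb, pod_count):
    """Init container pattern: one image layer plus one emptyDir copy per Pod."""
    return asset_gb + pod_count * asset_gb


def node_gb_image_volume(asset_gb, pod_count):
    """Image volume: Pods on the node mount the same unpacked image."""
    return asset_gb if pod_count > 0 else 0


for pods in (1, 10):
    old = node_gb_init_copy(2, pods)
    new = node_gb_image_volume(2, pods)
    print(f"{pods} Pod(s): init copy {old} GB, image volume {new} GB")
```

For the 2 GB model from earlier, one Pod costs 4 GB with the copy pattern, and ten Pods on a node cost 22 GB, against 2 GB with an image volume.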

Cleaner YAML

The Pod spec now clearly expresses intent:

"Mount this image's filesystem here."

instead of implementing an entire file-copy workflow.

Important Caveats

Image Volumes Are Always Read-Only

If your application needs writable storage, use:

emptyDir
PVCs
ephemeral storage

instead.

subPath Is Extremely Useful

If your files live under:

/site

inside the image but you want them mounted directly into:

/usr/share/nginx/html

then:

subPath: site

solves that cleanly.

pullPolicy Works Exactly Like Container Images

You can use:

pullPolicy: Always

or:

pullPolicy: IfNotPresent

exactly as you already do with containers.

No Environment Variable Substitution

This does NOT work:

reference: ${ASSET_VERSION}

The field is literal.

Use:

Helm
Kustomize
templating

instead.

Running Pods Don't Automatically Update

If somebody pushes a new image to:

:v42

existing Pods continue using the old mounted bytes.

You must roll the Pod to pick up changes.

Which is good for reproducibility.

In production, pin image digests.
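A digest-pinned reference ends in @sha256: followed by 64 hex characters, which makes it easy to lint for in CI. Here is a minimal Python sketch; is_digest_pinned is a hypothetical helper, it only recognises sha256 digests, and the digest in the example is a dummy value, not a real image.

```python
import re

# Matches a trailing sha256 digest such as "...@sha256:<64 hex characters>".
_SHA256_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(reference):
    """True when the image reference is pinned to a sha256 digest."""
    return bool(_SHA256_DIGEST.search(reference))


tagged = "registry.example.com/web/site-assets:v42"
pinned = "registry.example.com/web/site-assets@sha256:" + "a" * 64  # dummy digest

print(tagged, "->", is_digest_pinned(tagged))
print(pinned, "->", is_digest_pinned(pinned))
```

A check like this could run against rendered manifests before deploy, flagging any image volume whose reference can still move underneath a tag.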

Runtime Support Matters

You need modern runtimes.

Roughly:

containerd >= 2.1
CRI-O >= 1.31

Older runtimes will fail with a clear unsupported feature error.

How imagePullSecrets Work

This is one of the nicest parts.

Image volumes automatically use the same authentication flow as normal container images.

That means Kubernetes automatically uses:

Pod imagePullSecrets
ServiceAccount imagePullSecrets
kubelet credential providers

No additional auth wiring required.

So this:

imagePullSecrets:
  - name: regcred

works for BOTH:

container images
image volumes

Multiple Private Registries

If assets and application images live in different registries:

imagePullSecrets:
  - name: app-registry-creds
  - name: assets-registry-creds

The kubelet combines the credentials from all listed secrets and uses the entry that matches the registry hostname of each image.

Quick Comparison

Aspect	Init Container + emptyDir	Image Volume
Pod complexity	Multiple containers and mounts	Single volume
Assets image	Needs shell/cp	`FROM scratch` works
Disk usage	Image + copied bytes	Image only
Startup time	Pull + copy	Pull only
Writable	Yes	No
Sharing across Pods	No	Yes
imagePullSecrets	Pod spec	Pod spec
Update without restart	No	No
Kubernetes support	Always	1.31 alpha → 1.33 beta → 1.36 GA

When You Should Actually Use It

Image volumes are ideal when you have:

large read-only assets
independently versioned bundles
OCI-distributed artifacts
data shared across multiple Pods

Examples include:

ML models
static websites
OPA bundles
plugins
WASM modules
Grafana dashboards
language packs

They're especially useful when:

the artifacts are too large for ConfigMaps
you want registry-native distribution
you want image signing/scanning/RBAC
the content must remain immutable

When NOT To Use It

Don't use image volumes when:

the application needs writable storage
the content is tiny text configuration
the data is stateful per Pod

In those cases:

ConfigMaps
PVCs
emptyDir

are still better fits.

Final Thoughts

The image volume type feels small on paper, but it removes one of the longest-standing operational hacks in Kubernetes.

For years, platform engineers built elaborate init-container copy workflows just to move immutable files into Pods.

Now Kubernetes finally has a native primitive for it.

If your workloads distribute:

large read-only assets
ML models
frontend bundles
policy packs
plugins
shared runtime data

this feature can significantly reduce:

startup latency
storage duplication
image complexity
YAML noise

More importantly, it aligns Kubernetes with a broader industry shift:

OCI images are no longer just executable containers.
They're becoming the standard distribution format for software artifacts in general.

And image volumes push Kubernetes one step further in that direction.

CPU Humbled Me — A Kubernetes Throttling Story Hidden Between Prometheus Scrapes

Satyaki — Fri, 15 May 2026 15:12:32 +0000

Memory is easy. CPU humbled me.

With memory, the rule is brutal but clear: cross the limit and the container gets OOMKilled. Done.

CPU? CPU is sneaky. And I ignored it for the longest time… until it broke production.

Here's what happened 👇

We had an app running peacefully in-house. Then it went client-facing. Traffic surged, and suddenly ~15% of requests started timing out — most of them on DB calls.

I opened Grafana expecting a smoking gun. Nothing. CPU usage looked "fine." No throttling alerts screaming at me. Just confused timeouts.

The trap? Throttling happens in milliseconds. Prometheus scrapes every 15 seconds. Every bit of evidence was hiding between the scrapes.

Here was the setup:

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 800m
    memory: 1.5Gi

Numbers from the incident (rough, but directionally honest):

Normal: 300 req/min → avg CPU ~180m
Surge: 1200 req/min → avg CPU ~650m, ~15% timeouts

So I sat down and actually did the math instead of guessing.

How CPU actually works

CPU is compressible. Memory isn't. When CPU runs out, your process doesn't die — it gets throttled. The Linux CFS scheduler slices time into periods (default: 100ms). Within each period, your container gets a quota based on its limit. Cross the quota mid-period? You wait for the next one. That wait is the latency you're seeing.

Walking through the numbers

Normal load:

300 req/min = 5 req/sec = 0.5 requests per 100ms
Avg CPU 180m = 18ms of CPU work per 100ms period
→ 18ms ÷ 0.5 req = ~36ms of CPU work per request

Surge load:

1200 req/min = 20 req/sec = 2 requests per 100ms
2 × 36ms = 72ms of CPU work needed per 100ms

But the limit was 800m → 80ms quota per 100ms. Looks fine on paper, right?

Here's the catch: avg CPU was 650m (65ms). The average hides the bursts. Some periods sat well below quota; others blew past the 80ms ceiling and got throttled. Average everything out across 15s scrapes and the dashboard whispers "all good" while users get timeouts.

That's the lesson. Average CPU is a liar in bursty workloads. Throttling lives in the gaps your monitoring can't see.
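Here is the same math as a small Python sketch, plus a toy simulation of how an average can hide throttling. The per-request numbers come from the walkthrough above; the bursty demand pattern (100ms, then 30ms, repeating) is an invented illustration, not data from the incident, and simulate_cfs is a simplified model in which unfinished work carries over to the next period.

```python
def cpu_ms_per_period(millicores, period_ms=100):
    """CPU time available (or used) per CFS period for a given millicore value."""
    return millicores * period_ms / 1000


def requests_per_period(requests_per_minute, period_ms=100):
    """Average number of requests arriving in one CFS period."""
    return requests_per_minute * period_ms / 60000


def simulate_cfs(demand_ms, quota_ms):
    """Return (average CPU ms used per period, fraction of periods throttled)."""
    backlog = 0.0
    used = []
    throttled = 0
    for demand in demand_ms:
        runnable = backlog + demand
        ran = min(runnable, quota_ms)
        if runnable > quota_ms:
            throttled += 1  # quota exhausted: the rest waits for the next period
        backlog = runnable - ran
        used.append(ran)
    return sum(used) / len(used), throttled / len(demand_ms)


per_request_ms = cpu_ms_per_period(180) / requests_per_period(300)
surge_need_ms = requests_per_period(1200) * per_request_ms
quota_ms = cpu_ms_per_period(800)

# Hypothetical bursty demand that still averages 65ms per period.
bursty_demand = [100, 30] * 50
avg_used_ms, throttled_ratio = simulate_cfs(bursty_demand, quota_ms)

print(f"CPU per request: {per_request_ms} ms, surge need: {surge_need_ms} ms per period")
print(f"average used: {avg_used_ms} ms per period, throttled periods: {throttled_ratio:.0%}")
```

In this toy run the average sits at 65ms, comfortably under the 80ms quota, yet half of the periods are throttled. That is the gap between the dashboard and what users feel.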

What to actually look at

Stop staring only at container_cpu_usage_seconds_total. Look at:

container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_throttled_seconds_total

The ratio of throttled periods to total periods (the first two counters) tells you the truth.
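Both values are ever-growing counters, so the ratio has to be computed over a window between two samples. Here is a minimal Python sketch, assuming the period total comes from cAdvisor's container_cpu_cfs_periods_total counter; throttle_ratio is an illustrative helper and the sample values are made up.

```python
def throttle_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of CFS periods that were throttled between two counter samples."""
    elapsed_periods = periods_end - periods_start
    if elapsed_periods <= 0:
        return 0.0  # no periods elapsed, or the counter reset between samples
    return (throttled_end - throttled_start) / elapsed_periods


# Hypothetical samples: 150 periods elapsed, 60 of them throttled.
ratio = throttle_ratio(
    periods_start=1000, periods_end=1150,
    throttled_start=200, throttled_end=260,
)
print(f"throttled in {ratio:.0%} of periods")
```

Because the counters accumulate every period, this ratio catches throttling that happened between scrapes, which a sampled CPU usage graph cannot show.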

Remediations (in order of maturity, not just "increase the limit")

Right-size first. Requests and limits should reflect real workload behavior, not guesses copy-pasted from a template.
Load test before going client-facing. Running an app in-house ≠ serving real traffic.
VPA recommendations to understand what the app actually wants.
HPA so bursts get distributed across replicas instead of crushing one pod.
Then, if needed, raise the limit — with intent, not panic.

Bumping the limit is the easiest fix and the most expensive habit. Every patch carries a hidden cost — node capacity, bin-packing, cluster bills, blast radius. Understand the why before you reach for the YAML.

This one incident taught me more about Kubernetes resource management than months of reading docs. If you're running anything client-facing, please don't wait for a production incident to learn this.

CPU isn't just a number on a dashboard. It's a time budget — and your users feel every millisecond you overspend.

Have you been burned by CFS throttling? What metric finally gave it away for you? Drop it in the comments — I'd love to compare notes.

Understanding Kube-proxy & CoreDNS in Kubernetes no bluff

Satyaki — Thu, 22 Jan 2026 15:23:21 +0000

🛠 Setting the Stage: A Kind Cluster

Kubernetes is full of magic, but one of its most fascinating components is kube-proxy. It’s the silent operator that ensures traffic hitting a Service gets distributed across the right Pods. Under the hood, kube-proxy leverages Linux iptables to make this happen. Let’s peel back the layers and see it in action.

For this demo, I spun up a 3-node Kind cluster. On top of it, I deployed a simple nginx Deployment exposed via a ClusterIP Service.

Here’s the deployment and service in action:

📜 Peeking into iptables

Now comes the fun part. I logged into one of the nodes where a Pod is running and listed the NAT rules in the KUBE-SERVICES chain:

Notice the entry for our nginx-deployment Service. The destination IP here is the ClusterIP of the Service. This is kube-proxy’s starting point for redirecting traffic.

🔀 Diving into the Service Chain

Every Service gets its own chain. For nginx, that’s KUBE-SVC-WRNOD73BKRQH4VVX. Let’s inspect it:

And here’s the magic:

When traffic hits the ClusterIP, the iptables rules programmed by kube-proxy rewrite the destination (DNAT) to one of the Pod IPs backing the Deployment.
The rules show a probability ratio, in this case 50/50. That means half the traffic goes to one Pod, and the other half to the second Pod.
This is how kube-proxy achieves load balancing using nothing more than iptables.

So, what did we just see?

ClusterIP → Pod IPs translation via iptables.
Masquerading ensures the source IP is rewritten correctly.
Probability rules distribute traffic evenly across endpoints.
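The probabilities are per rule, not per Pod, because the rules are evaluated top to bottom: with n endpoints, rule i matches with probability 1/(n - i) and the last rule catches everything that is left. Here is a short Python sketch of that arithmetic, using exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(endpoint_count):
    """Per-rule match probability in a service chain, in evaluation order."""
    return [Fraction(1, endpoint_count - index) for index in range(endpoint_count)]


def effective_shares(rule_probs):
    """Share of total traffic each endpoint receives once rules are chained."""
    shares = []
    remaining = Fraction(1)
    for probability in rule_probs:
        shares.append(remaining * probability)
        remaining *= 1 - probability  # traffic that falls through to the next rule
    return shares


for count in (2, 3):
    rules = rule_probabilities(count)
    shares = effective_shares(rules)
    print(count, "endpoints:", [str(r) for r in rules], "->", [str(s) for s in shares])
```

So a three-Pod Service shows rules of roughly 0.33, 0.5 and an unconditional final jump, and each Pod still ends up with an equal third of the traffic.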

🌐 How DNS Works in the Cluster

So far, we’ve seen how kube-proxy handles traffic routing and load balancing. But how does your application even know where to send requests? That’s where CoreDNS comes in.
CoreDNS acts as the nameserver inside Kubernetes, resolving Service names into their corresponding ClusterIPs. Let’s walk through it step by step.

🔍 Inspecting the kube-dns Service

In the kube-system namespace, you’ll find the kube-dns Service. This is essentially the front door to CoreDNS:

📄 The resolv.conf File

Inside Pods, the resolv.conf file contains the nameserver details and DNS search domains. This is how Kubernetes ensures that when you query something like nginx-deployment.default.svc.cluster.local, it knows how to resolve it.
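The search domains are what let a Pod say just nginx-deployment instead of the full name. Below is a rough Python sketch of how a resolver expands a name, assuming the search list and ndots:5 option that Kubernetes typically writes for a Pod in the default namespace. Those values are assumptions, not taken from the output above, and candidate_names is an illustrative helper that follows glibc-style ordering.

```python
def candidate_names(name, search_domains, ndots):
    """Fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is skipped
    searched = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + searched  # enough dots: try the name as-is first
    return searched + [absolute]


# Assumed resolv.conf contents for a Pod in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in ("nginx-deployment", "nginx-deployment.default.svc.cluster.local."):
    print(query, "->", candidate_names(query, search, ndots=5))
```

The short name resolves on the first attempt because the first search domain completes it into the Service's full DNS name.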

🧪 Testing with nslookup

Let’s put it to the test. Logging into a node and running an nslookup shows the DNS resolution in action:

And it works exactly as expected — the Service name resolves to the ClusterIP, which kube-proxy then maps to the Pod IPs.

🎯 Wrapping It All Up

Between kube-proxy and CoreDNS, Kubernetes ensures that:

Traffic hitting a Service is load balanced across Pods.
Service names are resolved seamlessly into ClusterIPs.
Applications don’t need to worry about IP addresses; they just use DNS names.

These two components are the backbone of Kubernetes networking. Without them, Services wouldn’t be discoverable or scalable.

🔥 And that’s the no-bluff walkthrough of kube-proxy and CoreDNS, two vital pieces of the Kubernetes puzzle. Next time you deploy an app, you’ll know exactly how the traffic finds its way to the right Pod.

That's what kube-proxy does. Isn't it really cool?
