🔥 kubectl-prof: Profile Your Kubernetes Apps Without Touching Them
Have you ever needed to debug a performance issue in a production Kubernetes pod and thought: "I wish I could just attach a profiler without restarting anything"?
That's exactly what kubectl-prof solves.
It's a kubectl plugin that lets you profile running pods — generating FlameGraphs, JFR files, heap dumps, thread dumps, memory dumps, and more — without modifying your deployments, without restarting pods, and with minimal overhead.
✨ What Makes It Special?
- 🎯 Zero modifications — attach to any running pod, no sidecar needed
- 🌐 Multi-language — Java, Go, Python, Ruby, Node.js, Rust, Clang/Clang++, PHP, .NET
- 📊 Rich output formats — FlameGraphs, JFR, SpeedScope, thread dumps, heap dumps, GC dumps, memory dumps, memory flamegraphs, allocation summaries, and more
- ⚡ Low overhead — minimal impact on production workloads
- 🔄 Continuous profiling — support for both one-shot and interval-based modes
- 🐳 Multiple runtimes — `containerd` and `CRI-O` supported
🚀 Quick Start
Install via Krew:
```shell
kubectl krew index add kubectl-prof https://github.com/josepdcs/kubectl-prof
kubectl krew install kubectl-prof/prof
```
Profile a Java app for 1 minute and get a FlameGraph:
```shell
kubectl prof my-pod -t 1m -l java
```
Profile a Python app and save the output to /tmp:
```shell
kubectl prof my-pod -t 1m -l python --local-path=/tmp
```
That's it. A profiling Job is spun up on the same node, profiles the target pod, and delivers the result back to your terminal.
🔧 How It Works
When you run kubectl prof, it:
- Identifies the node where your target pod is running
- Launches a Kubernetes Job on that node with the appropriate profiling agent image
- Attaches the agent to the running container process using language-specific tools
- Streams the results back and saves them locally
No changes to your application. No restarts. No sidecars.
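The flow above can be sketched as a hand-written Job manifest. This is a hypothetical illustration of the kind of Job the plugin creates, not its actual manifest: the node name, image name, and security settings here are placeholders.

```shell
# Hypothetical sketch of the kind of Job kubectl-prof creates on the target
# pod's node. All field values are illustrative, not the plugin's actual ones.
NODE=my-node   # discoverable via: kubectl get pod my-pod -o jsonpath='{.spec.nodeName}'
cat > profiling-job.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: prof-my-pod
spec:
  template:
    spec:
      nodeName: $NODE        # pin the agent onto the same node as the target pod
      hostPID: true          # let the agent see host processes so it can attach
      restartPolicy: Never
      containers:
      - name: agent
        image: profiling-agent:latest   # placeholder for the language-specific agent image
        securityContext:
          privileged: true   # attaching a profiler to another process needs elevated rights
EOF
echo "wrote profiling-job.yaml"
```

Pinning via `nodeName` plus `hostPID` is what lets a separate Job reach a process that belongs to another pod's container.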
💻 Language Support
☕ Java (JVM)
kubectl-prof supports both async-profiler and jcmd:
```shell
# FlameGraph (default, uses async-profiler)
kubectl prof mypod -t 5m -l java -o flamegraph

# JFR recording
kubectl prof mypod -t 5m -l java -o jfr

# Thread dump
kubectl prof mypod -l java -o threaddump

# Heap dump
kubectl prof mypod -l java -o heapdump --tool jcmd

# Heap histogram
kubectl prof mypod -l java -o heaphistogram --tool jcmd
```
You can also target specific profiling events with async-profiler:
```shell
# CPU (default: ctimer), memory allocation, or lock contention
kubectl prof mypod -t 5m -l java -e alloc
kubectl prof mypod -t 5m -l java -e lock
```
And pass extra arguments directly to async-profiler:
```shell
# Wall-clock profiling in per-thread mode
kubectl prof mypod -t 5m -l java -e wall --async-profiler-args -t
```
For Alpine-based containers, add --alpine:
```shell
kubectl prof mypod -t 1m -l java -o flamegraph --alpine
```
🐍 Python
Uses py-spy under the hood:
```shell
kubectl prof mypod -t 1m -l python -o flamegraph
kubectl prof mypod -l python -o threaddump
kubectl prof mypod -t 1m -l python -o speedscope
```
🧠 Memory Profiling with Memray
For memory profiling, kubectl-prof now integrates Memray — Bloomberg's powerful Python memory profiler. While py-spy reveals where your CPU time goes, memray reveals where your memory goes: every allocation, every deallocation, tracked in real time.
How it works (zero-downtime, zero code changes):
Memray attaches to the running Python process via GDB injection, enters the target container's network namespace via nsenter, and runs memray attach --aggregate directly against the live process. The agent automatically stages a version-matched memray package into the target container's filesystem — no memray installation is required in your application image.
Requirements:
- `SYS_PTRACE` + `SYS_ADMIN` capabilities — added automatically when `--tool memray` is used
- Python 3.10, 3.11, 3.12, or 3.13 (glibc-based images only)
- ❌ Not supported: Alpine/musl targets or statically-linked Python builds
Output types:
| Output | Flag | Format | Description |
|---|---|---|---|
| Memory flamegraph | `-o flamegraph` | HTML | Interactive flamegraph of allocation call stacks & sizes |
| Allocation summary | `-o summary` | Text | Tabular list of top allocators by total bytes |
```shell
# Interactive HTML memory flamegraph — open in any browser
kubectl prof mypod -t 1m -l python --tool memray -o flamegraph --local-path=/tmp

# Text summary of the biggest allocators
kubectl prof mypod -t 1m -l python --tool memray -o summary --local-path=/tmp

# Long session with custom heartbeat (keeps the connection alive through proxies)
kubectl prof mypod -t 10m -l python --tool memray -o flamegraph --heartbeat-interval=15s

# Target a specific process in a multi-process pod
kubectl prof mypod -t 2m -l python --tool memray -o flamegraph --pid 1234
kubectl prof mypod -t 2m -l python --tool memray -o flamegraph --pgrep my-worker
```
**Note:** `--tool memray` must be set explicitly. The default Python tool remains py-spy.
🐹 Go
Uses eBPF profiling. Two options available:
```shell
# BPF (default) — requires kernel headers
kubectl prof mypod -t 1m -l go -o flamegraph

# BTF (CO-RE) — no kernel headers needed, works on modern kernels
kubectl prof mypod -t 1m -l go --tool btf
```
The BTF option is great for cloud providers like DigitalOcean where kernel headers may not be available.
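To check ahead of time whether a node can take the CO-RE path, you can look for the kernel's BTF type information, which distro kernels built with `CONFIG_DEBUG_INFO_BTF=y` typically expose at `/sys/kernel/btf/vmlinux`. A small sketch (run it on the node itself, not inside the target pod):

```shell
# CO-RE relies on kernel BTF type info instead of kernel headers; check
# whether this kernel exposes it before picking the btf tool.
if [ -r /sys/kernel/btf/vmlinux ]; then
  BTF=yes
  echo "BTF available: --tool btf should work"
else
  BTF=no
  echo "No BTF: use the default tool (requires kernel headers)"
fi
```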
📗 Node.js
```shell
# FlameGraph via eBPF
kubectl prof mypod -t 1m -l node -o flamegraph

# Heap snapshot
kubectl prof mypod -l node -o heapsnapshot
```
**Tip:** Run Node.js with `--perf-basic-prof` for better JavaScript symbol resolution.
💎 Ruby
Uses rbspy:
```shell
kubectl prof mypod -t 1m -l ruby -o flamegraph
kubectl prof mypod -t 1m -l ruby -o speedscope
kubectl prof mypod -t 1m -l ruby -o callgrind
```
🦀 Rust
Uses cargo-flamegraph for Rust-optimized profiling with great symbol resolution:
```shell
kubectl prof mypod -t 1m -l rust -o flamegraph
```
🐘 PHP
Uses phpspy — works with PHP 7+, zero modifications needed:
```shell
kubectl prof mypod -t 1m -l php -o flamegraph
kubectl prof mypod -t 1m -l php -o raw
```
🟣 .NET (Core / .NET 5+)
This is where kubectl-prof really shines. Four tools from the .NET diagnostics suite are fully supported:
| Tool | Output | Use case |
|---|---|---|
| `dotnet-trace` (default) | `.speedscope.json` or `.nettrace` | CPU traces & runtime events |
| `dotnet-gcdump` | `.gcdump` | GC heap snapshot |
| `dotnet-counters` | `.json` | Real-time performance counters |
| `dotnet-dump` | `.dmp` | Full memory dump |
```shell
# CPU trace → open in speedscope.app
kubectl prof mypod -t 30s -l dotnet -o speedscope

# GC heap dump
kubectl prof mypod -l dotnet --tool dotnet-gcdump -o gcdump

# Performance counters (CPU, GC, thread pool, exceptions…)
kubectl prof mypod -t 30s -l dotnet --tool dotnet-counters -o counters

# Full memory dump
kubectl prof mypod -l dotnet --tool dotnet-dump -o dump
```
🎯 Advanced Features
Profile Multiple Pods at Once
```shell
kubectl prof --selector app=myapp -t 5m -l java -o jfr
```
⚠️ Use with caution — this profiles ALL matching pods.
Continuous Profiling
Generate results at regular intervals:
```shell
kubectl prof mypod -l java -t 5m --interval 60s
```
Target a Specific Process
```shell
# By PID
kubectl prof mypod -l java --pid 1234

# By process name
kubectl prof mypod -l java --pgrep java-app-process
```
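The `--pgrep` option presumably matches on the process name or command line, much like the standard `pgrep` utility. A quick local demo of that kind of name-based lookup (the `sleep` process is just a stand-in for your app's process):

```shell
# Start a throwaway background process, then find it by command line the way
# a pgrep-style lookup would inside the target container.
sleep 30 &
BG=$!
MATCHED=$(pgrep -f "sleep 30" | head -n1)
echo "matched PID: $MATCHED"
kill "$BG"
```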
Custom Resource Limits
```shell
kubectl prof mypod -l java -t 5m \
  --cpu-limits=1 \
  --cpu-requests=100m \
  --mem-limits=200Mi \
  --mem-requests=100Mi
```
Cross-Namespace Profiling
```shell
kubectl prof mypod -n profiling \
  --service-account=profiler \
  --target-namespace=my-apps \
  -l go
```
Handle Large Output Files
For heap dumps, memory dumps, and other large files, split them into chunks for easier transfer:
```shell
kubectl prof mypod -l java -o heapdump --tool jcmd --output-split-size=100M
kubectl prof mypod -l dotnet --tool dotnet-dump -o dump --output-split-size=500M
```
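Once the chunks land in your local path, they can be stitched back together with plain `cat`. The `.part-*` naming below is illustrative (check the actual filenames kubectl-prof writes); the sketch simulates the split-and-reassemble round trip locally:

```shell
# Simulate a large artifact split into fixed-size chunks, then reassemble it
# and verify the result is byte-identical to the original.
head -c 1048576 /dev/urandom > heapdump.hprof           # 1 MiB stand-in for a real dump
split -b 262144 -d heapdump.hprof heapdump.hprof.part-  # 256 KiB chunks, numeric suffixes
cat heapdump.hprof.part-* > reassembled.hprof           # the glob sorts chunks in order
cmp heapdump.hprof reassembled.hprof && echo "reassembly OK"
```

Numeric suffixes matter here: the shell glob expands in lexicographic order, so zero-padded chunk names concatenate back in the right sequence.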
Node Tolerations
Profile pods on nodes with taints:
```shell
kubectl prof my-pod -t 5m -l java \
  --tolerations=node.kubernetes.io/disk-pressure=true:NoSchedule \
  --tolerations=dedicated=profiling:PreferNoSchedule
```
📦 Installation Options
Krew (Recommended)
```shell
kubectl krew index add kubectl-prof https://github.com/josepdcs/kubectl-prof
kubectl krew install kubectl-prof/prof
```
Pre-built Binaries
```shell
# Linux x86_64
wget https://github.com/josepdcs/kubectl-prof/releases/download/1.11.1/kubectl-prof_1.11.1_linux_amd64.tar.gz
tar xvfz kubectl-prof_1.11.1_linux_amd64.tar.gz
sudo install kubectl-prof /usr/local/bin/
```
Build from Source
```shell
go get -d github.com/josepdcs/kubectl-prof
cd $GOPATH/src/github.com/josepdcs/kubectl-prof
make install-deps
make build
```
🧠 When Should You Use kubectl-prof?
kubectl-prof is the tool you want when:
- 🔥 You're hitting unexpected CPU spikes in production and need a FlameGraph now
- 💾 You suspect a memory leak and want a heap dump without restarting
- 🐌 Your app is slow and you need to identify the bottleneck across any language
- 🔄 You need continuous profiling over time with interval-based snapshots
- 🚫 You can't modify the running workload (no sidecars, no redeploys)
🤝 Contributing
The project is open source (Apache 2.0) and welcomes contributions:
- 🐛 Bug reports and fixes
- 💡 Feature requests
- 📝 Documentation improvements
- 🔧 Pull requests
Check the Contributing guide and give the repo a ⭐ if you find it useful!
👉 GitHub: https://github.com/josepdcs/kubectl-prof
Have you tried kubectl-prof? What language or profiling scenario would you like to see covered next? Drop a comment below! 👇