DEV Community

Gaurav Tayade

Stop Copy-Pasting kubectl Commands to Debug Pods

Every time a pod crashes, you run the same 5 commands. There's a better way.


The pain

It's 2 AM. Your on-call phone fires. A pod in production is crashing.

You SSH in, open your terminal, and start the ritual:

kubectl get pods -n nxs-demo-bad
kubectl describe pod crash-loop-demo -n nxs-demo-bad
kubectl logs crash-loop-demo -n nxs-demo-bad --previous
kubectl get events -n nxs-demo-bad --sort-by='.lastTimestamp'
kubectl top pod crash-loop-demo -n nxs-demo-bad

You get walls of output. You scan through it manually. You try to piece together what went wrong. You copy the error into Google. You find a Stack Overflow answer from 2019.

This is the reality for most engineers debugging Kubernetes. Not because they don't know what they're doing — but because the tooling makes you do all the work yourself.

Here's what a real broken namespace looks like:

$ kubectl get pods -n nxs-demo-bad

NAME              READY   STATUS             RESTARTS         AGE
crash-loop-demo   0/1     Error              68 (5m20s ago)   22h
image-pull-demo   0/1     ImagePullBackOff   0                22h
oom-demo          0/1     CrashLoopBackOff   95 (30s ago)     22h
pending-demo      0/1     Pending            0                22h

4 pods broken. 4 different problems. Where do you even start?


What's actually happening

When a pod crashes, the information you need is spread across three places:

  • Logs — what the application printed before it died
  • Events — what Kubernetes did (pulled image, scheduled, killed)
  • Describe — the pod spec, resource limits, exit codes

You have to gather all three, read them together, and mentally correlate them to find the root cause. For experienced engineers this takes 5-10 minutes. For developers who don't live in kubectl every day, it can take much longer.
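If you want to stay in plain kubectl, you can at least capture all three sources in one pass. A minimal sketch (the `gather` helper and section markers are my own, not part of any tool; assumes kubectl is configured):

```shell
# Collect describe, previous logs, and events into one annotated dump.
section() { printf '\n===== %s =====\n' "$1"; }

gather() {
  pod=$1; ns=$2
  section "DESCRIBE";      kubectl describe pod "$pod" -n "$ns"
  section "PREVIOUS LOGS"; kubectl logs "$pod" -n "$ns" --previous
  section "EVENTS";        kubectl get events -n "$ns" --sort-by='.lastTimestamp'
}

# gather crash-loop-demo nxs-demo-bad > debug.txt
```

That gets everything into one file to grep through, but the correlation work is still entirely on you.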


The new way

I built nxs — an open-source CLI that does all of this in one command.

nxs k8s debug --pod crash-loop-demo -n nxs-demo-bad

That's it. nxs automatically:

  1. Fetches the pod logs (including --previous for crashed containers)
  2. Fetches kubectl describe output
  3. Sends both to AI for root cause analysis
  4. Returns: what broke, why, and exact fix commands

Here's the real output from running it against that broken pod:

╔══════════════════════════════════════════════════════════╗
║  ⚡ nxs                                        v2.0.0  ║
║     Kubernetes deep-dive debugger                        ║
╚══════════════════════════════════════════════════════════╝

────────────────────────────────────────────────────────────
  ☸  KUBERNETES DETECTED
────────────────────────────────────────────────────────────

📋 SUMMARY

  The pod 'crash-loop-demo' is crashing in a loop due to an error
  in the container. The container exits with a non-zero exit code
  after failing to find a configuration file.

🔍 ROOT CAUSE

  1. The container exits with code 1 — the config file
     /etc/app/config.yaml does not exist inside the container.
  2. Kubernetes repeatedly restarts the container due to its
     restart policy, leading to CrashLoopBackOff.
  3. No ConfigMap or volume is mounted to provide the config file.

💡 FIX STEPS

  1. Ensure the required config file is mounted via a ConfigMap.
  2. Verify the container command matches what exists in the image.
  3. Adjust restart policy or fix the application error handling.

💻 REMEDIATION COMMANDS

  ┌─ shell ────────────────────────────────────────────────────────┐
  │ 1. kubectl describe pod crash-loop-demo -n nxs-demo-bad        │
  │ 2. kubectl logs crash-loop-demo -n nxs-demo-bad                │
  │ 3. kubectl exec -it crash-loop-demo -n nxs-demo-bad -- /bin/sh │
  └─────────────────────────────────────────────────────────────────┘

Plain English. Root cause numbered. Commands ready to copy.
No manual log reading. No Googling. No Stack Overflow.
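For reference, fix step 1 usually means a pod-spec fragment along these lines. The names here (`app-config`, a container called `app`) are hypothetical, not taken from the tool's output:

```yaml
# Hypothetical Deployment pod-spec fragment: mount a ConfigMap so that
# /etc/app/config.yaml exists inside the container. Create the ConfigMap
# first, e.g.:
#   kubectl create configmap app-config --from-file=config.yaml -n nxs-demo-bad
spec:
  volumes:
    - name: app-config
      configMap:
        name: app-config
  containers:
    - name: app
      volumeMounts:
        - name: app-config
          mountPath: /etc/app   # key "config.yaml" appears as /etc/app/config.yaml
```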


It also works with piped logs

If you already have the logs, just pipe them:

# From a file
nxs k8s debug pod-error.log

# From kubectl directly
kubectl logs my-pod --previous | nxs k8s debug --stdin

# Full describe + logs combined
kubectl describe pod my-pod | nxs k8s debug --stdin

Debug an entire deployment at once

Got a deployment with 3 replicas all crashing? Instead of checking each pod:

nxs k8s debug --deployment my-app -n production

nxs fetches logs and describe from all pods in the deployment concurrently and analyzes them together — so you get one diagnosis instead of three.
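In plain shell, the concurrent fetch would look roughly like this sketch (the `app=my-app` label selector is an assumption about how the deployment's pods are labeled):

```shell
# Fetch previous logs from every pod in the deployment in parallel (sketch)
pod_name() { printf '%s\n' "${1##*/}"; }   # "pod/foo-abc" -> "foo-abc"

for pod in $(kubectl get pods -n production -l app=my-app -o name); do
  kubectl logs "$pod" -n production --previous \
    > "/tmp/$(pod_name "$pod").log" &
done
wait   # block until all fetches finish
```

The difference is that nxs then feeds all of those logs into a single analysis pass instead of leaving you with N files.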


Works without an AI key

No API key? No problem. nxs has a smart mock mode that pattern-matches common errors (CrashLoopBackOff, OOMKilled, ImagePullBackOff, Pending) and returns accurate responses without any AI call.
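Pattern-matching a pod's status text is a simple idea; a toy version of it (my own sketch, not nxs's actual implementation) looks like:

```shell
# Toy classifier for common pod failure signatures (illustrative only)
classify() {
  case "$1" in
    *OOMKilled*)        echo "oom: raise memory limits or fix the leak" ;;
    *ImagePullBackOff*) echo "image-pull: check image name, tag, and registry creds" ;;
    *CrashLoopBackOff*) echo "crash-loop: read --previous logs for the exit reason" ;;
    *Pending*)          echo "pending: check scheduling constraints and node capacity" ;;
    *)                  echo "unknown" ;;
  esac
}

classify "oom-demo   0/1   CrashLoopBackOff   95 (30s ago)"
# -> crash-loop: read --previous logs for the exit reason
```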

To add a free Groq key (recommended — much more accurate):

nxs config --setup
# Groq is free: console.groq.com

Install

npm install -g @nextsight/nxs-cli

Requirements: Node.js 18+, kubectl configured


What's next

This is article 1 in the "DevOps in 1 Command" series.

Next up: How I catch OOMKills before they happen in production — using nxs predict to surface at-risk pods before Kubernetes kills them.


nxs is open source. Star it on GitHub: https://github.com/gauravtayade11/nxs

Install: npm install -g @nextsight/nxs-cli
