mibii

Posted on Apr 18

Kubernetes Cheatsheet: Daily Commands Every Developer Actually Uses

#kubernetes #devops #cheatsheet #kubectl

What this is: The 20% of Kubernetes that covers 80% of daily work. Cluster state, pod management, debugging, deployments, secrets, logs, resource inspection — organized by task, not by resource type.

⚡ Cluster State — Is Everything OK?

# Overall health check
kubectl get nodes
kubectl get nodes -o wide          # with IPs and OS info

# All resources across all namespaces
kubectl get all -A

# What's failing right now
kubectl get pods -A | grep -v Running | grep -v Completed

# Cluster events (recent activity, errors)
kubectl get events --sort-by='.lastTimestamp'
kubectl get events -A --sort-by='.lastTimestamp' | tail -30

# Resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods -A
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
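
The grep filter in this section hides pods that report `Running` while some of their containers are not ready (for example `1/2`). A minimal sketch of a stricter filter follows. It assumes the default column layout of `kubectl get pods -A --no-headers`, and `unhealthy_pods` is a hypothetical helper name, not a kubectl feature.

```shell
# Hypothetical helper: list pods that are not fully healthy.
# Assumes the default columns of `kubectl get pods -A --no-headers`:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  kubectl get pods -A --no-headers | awk '
    {
      if ($4 == "Completed") next        # finished Jobs are fine
      split($3, ready, "/")              # READY looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) print $1, $2, $3, $4
    }'
}
```

Calling `unhealthy_pods` with no arguments prints namespace, name, readiness, and status for every pod that needs a look.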

🏃 Pods — Daily Inspection

# List pods
kubectl get pods                              # current namespace
kubectl get pods -n <namespace>               # specific namespace
kubectl get pods -A                           # all namespaces
kubectl get pods -o wide                      # with node placement

# Watch pods in real time
kubectl get pods -w
kubectl get pods -w -n <namespace>

# Pod details (why is it stuck?)
kubectl describe pod <pod-name>
kubectl describe pod <pod-name> -n <namespace>

# Get pod YAML
kubectl get pod <pod-name> -o yaml

# Delete a stuck pod
kubectl delete pod <pod-name>
kubectl delete pod <pod-name> --force --grace-period=0   # force kill

Pod status quick reference:

Status	What it means
`Running`	All containers up
`Pending`	Waiting for scheduling (check node resources)
`CrashLoopBackOff`	Container keeps crashing (check logs)
`OOMKilled`	Out of memory — increase limits
`ImagePullBackOff`	Can't pull image (wrong name, no auth)
`Terminating` (stuck)	Finalizer or unreachable node blocking deletion; inspect `metadata.finalizers` (`--force` does not clear finalizers)
`Evicted`	Node was under pressure

📋 Logs — What Is It Saying?

# Basic logs
kubectl logs <pod-name>
kubectl logs <pod-name> -n <namespace>

# Follow live
kubectl logs -f <pod-name>

# Last N lines
kubectl logs <pod-name> --tail=100

# Logs since time
kubectl logs <pod-name> --since=1h
kubectl logs <pod-name> --since=30m

# Multi-container pod — specify container
kubectl logs <pod-name> -c <container-name>

# Previous container (after crash)
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name> --previous

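# All pods in a deployment at once (with -l, --tail defaults to 10 lines per pod)
kubectl logs -l app=<label-value> --all-containers=true
kubectl logs -l app=<label-value> -f --tail=50   # -f follows at most 5 pods; raise with --max-log-requests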

🔍 Debugging — Getting Inside

# Shell into a running pod
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -- /bin/sh          # if bash not available
kubectl exec -it <pod-name> -c <container> -- /bin/sh

# Run a one-off command
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- cat /etc/hosts
kubectl exec <pod-name> -- ls /app

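# Debug with a temporary pod (same cluster network, separate from your pod)
kubectl run debug --image=busybox -it --rm -- /bin/sh
kubectl run debug --image=nicolaka/netshoot -it --rm -- /bin/bash

# Attach an ephemeral debug container when your pod has no shell
kubectl debug -it <pod-name> --image=busybox --target=<container>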

# Debug a specific node
kubectl debug node/<node-name> -it --image=ubuntu

# Copy files to/from pod
kubectl cp <pod-name>:/path/to/file ./local-file
kubectl cp ./local-file <pod-name>:/path/to/file

🚀 Deployments — Rolling Updates and Rollbacks

# List deployments
kubectl get deployments
kubectl get deploy -n <namespace>

# Deployment status
kubectl rollout status deployment/<name>

# Rollout history
kubectl rollout history deployment/<name>

# Update image
kubectl set image deployment/<name> <container>=<image>:<tag>
kubectl set image deployment/api api=myregistry/api:v2.1.0

# Scale
kubectl scale deployment/<name> --replicas=5

# Rollback
kubectl rollout undo deployment/<name>                        # to previous
kubectl rollout undo deployment/<name> --to-revision=2        # to specific

# Pause/resume rollout
kubectl rollout pause deployment/<name>
kubectl rollout resume deployment/<name>

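# Restart all pods (rolling restart; zero downtime only with multiple replicas and readiness probes)
kubectl rollout restart deployment/<name>

# Note: a restart re-pulls the image only when imagePullPolicy is Always
# (the default for :latest or untagged images); otherwise the node's cached image is reused.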
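
The update, status, and rollback commands in this section are often chained by hand. The sketch below combines them into one guarded step, assuming a default timeout of `120s`. `safe_deploy` is a hypothetical helper name; only `set image`, `rollout status --timeout`, and `rollout undo` are real kubectl commands.

```shell
# Hypothetical helper: update an image, wait for the rollout, undo on failure.
# Usage: safe_deploy <deployment> <container> <image> [timeout]
# The body runs in a subshell, so its variables do not leak into your session.
safe_deploy() (
  deploy="$1"; container="$2"; image="$3"; timeout="${4:-120s}"

  kubectl set image "deployment/$deploy" "$container=$image" || exit 1

  # `rollout status` exits non-zero if the rollout does not finish in time
  if kubectl rollout status "deployment/$deploy" --timeout="$timeout"; then
    echo "rollout of $deploy succeeded"
  else
    echo "rollout of $deploy failed, rolling back" >&2
    kubectl rollout undo "deployment/$deploy"
    exit 1
  fi
)
```

For example, `safe_deploy api api myregistry/api:v2.1.0 90s` returns non-zero after rolling back, which makes it usable as a CI step.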

🗂️ Namespaces — Context Switching

# List namespaces
kubectl get namespaces

# Set default namespace for session
kubectl config set-context --current --namespace=<namespace>

# One-off in specific namespace
kubectl get pods -n kube-system

# View current context
kubectl config current-context
kubectl config get-contexts

# Switch context (cluster)
kubectl config use-context <context-name>

# Useful alias to avoid typing -n constantly
alias kns='kubectl config set-context --current --namespace'
# Usage: kns production

🔑 Secrets and ConfigMaps

# List
kubectl get secrets
kubectl get configmaps

# View (careful: values are only base64-encoded, not encrypted)
kubectl describe secret <name>       # shows keys and sizes, not values
kubectl get secret <name> -o yaml    # shows base64-encoded values

# Decode a specific key
kubectl get secret <name> -o jsonpath='{.data.<key>}' | base64 -d

# Create secret from literal
kubectl create secret generic <name> \
  --from-literal=DB_PASSWORD=mysecret \
  --from-literal=API_KEY=abc123

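# Create secret from file (the whole file becomes one key named secret.env)
kubectl create secret generic <name> --from-file=./secret.env
# For a KEY=VALUE env file, create one key per line instead
kubectl create secret generic <name> --from-env-file=./secret.env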

# Create docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass

# Create configmap
kubectl create configmap <name> --from-literal=ENV=production
kubectl create configmap <name> --from-file=./config.properties

# Edit in place
kubectl edit secret <name>
kubectl edit configmap <name>

📊 Resource Inspection — What Is Using What?

# Describe a node (capacity, allocations, events)
kubectl describe node <node-name>

# Check resource requests vs limits across pods
kubectl get pods -o custom-columns=\
"NAME:.metadata.name,\
CPU_REQ:.spec.containers[*].resources.requests.cpu,\
CPU_LIM:.spec.containers[*].resources.limits.cpu,\
MEM_REQ:.spec.containers[*].resources.requests.memory,\
MEM_LIM:.spec.containers[*].resources.limits.memory"

# Find pods with at least one container missing resource limits (danger)
kubectl get pods -o json | \
  jq -r '.items[] | select(any(.spec.containers[]; .resources.limits == null)) | .metadata.name'

# PersistentVolumes and Claims
kubectl get pv
kubectl get pvc
kubectl get pvc -A
kubectl describe pvc <name>

🌐 Services and Networking

# List services
kubectl get services
kubectl get svc -A

# Describe (check ports, selectors, endpoints)
kubectl describe svc <name>

# Check endpoints (actual pod IPs behind a service)
kubectl get endpoints <service-name>

# Quick port-forward to test a service locally
kubectl port-forward svc/<service-name> 8080:80
kubectl port-forward pod/<pod-name> 9000:8080

# Expose a deployment as a service
kubectl expose deployment <name> --port=80 --target-port=8080 --type=ClusterIP

# Get Ingress
kubectl get ingress -A
kubectl describe ingress <name>

⚙️ Apply, Diff, Delete

# Apply from file
kubectl apply -f deployment.yaml
kubectl apply -f ./k8s/                    # all files in directory
kubectl apply -f ./k8s/ -R                # recursive

# Diff before applying (see what will change)
kubectl diff -f deployment.yaml

# Delete
kubectl delete -f deployment.yaml
kubectl delete pod <name>
kubectl delete deployment <name>
kubectl delete all -l app=<label>          # by label (`all` skips ConfigMaps, Secrets, PVCs, Ingresses)

# Dry run (what would happen without doing it)
kubectl apply -f deployment.yaml --dry-run=client
kubectl delete pod <name> --dry-run=client

🏷️ Labels and Selectors — Finding Things

# Filter by label
kubectl get pods -l app=api
kubectl get pods -l env=production,tier=backend
kubectl get pods -l 'env in (staging, production)'

# Show labels
kubectl get pods --show-labels

# Add label
kubectl label pod <name> env=production

# Remove label
kubectl label pod <name> env-

# Label a node (for scheduling)
kubectl label node <node-name> disktype=ssd

🛡️ RBAC — Quick Check

# What can I do?
kubectl auth can-i --list
kubectl auth can-i create pods
kubectl auth can-i delete deployments -n production

# What can a service account do?
kubectl auth can-i list pods \
  --as=system:serviceaccount:<namespace>:<sa-name>

# List roles and bindings
kubectl get roles -A
kubectl get rolebindings -A
kubectl get clusterroles | grep -v system:
kubectl get clusterrolebindings | grep -v system:
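
Checking a service account one verb at a time gets tedious. The sketch below loops `kubectl auth can-i` over a fixed list of common verbs. `sa_can` and the verb list are assumptions for illustration, and `--as` only works if your own user is allowed to impersonate service accounts.

```shell
# Hypothetical helper: print which verbs a service account may use on a resource.
# Usage: sa_can <namespace> <sa-name> <resource>
sa_can() (
  ns="$1"; sa="$2"; resource="$3"
  for verb in get list watch create update delete; do
    # `auth can-i` prints "yes" or "no"; `|| true` keeps the loop going
    # when the command exits non-zero.
    answer="$(kubectl auth can-i "$verb" "$resource" -n "$ns" \
      --as="system:serviceaccount:$ns:$sa" 2>/dev/null)" || true
    printf '%-8s %s\n' "$verb" "${answer:-error}"
  done
)
```

For example, `sa_can production ci-deployer deployments` prints one line per verb, with `error` when kubectl returned no answer at all.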

🔄 Jobs and CronJobs

# List
kubectl get jobs
kubectl get cronjobs

# Watch job progress
kubectl get jobs -w

# Trigger a CronJob manually
kubectl create job --from=cronjob/<name> <job-name>

# Check job logs
kubectl logs job/<job-name>

# Delete completed jobs
kubectl delete jobs --field-selector status.successful=1

🧹 Cleanup — Housekeeping

# Delete completed pods
kubectl delete pods --field-selector=status.phase=Succeeded -A
kubectl delete pods --field-selector=status.phase=Failed -A

# Delete evicted pods
kubectl get pods -A | grep Evicted | \
  awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod

# Delete common workload resources in a namespace (`all` skips ConfigMaps, Secrets, PVCs, Ingresses)
kubectl delete all --all -n <namespace>

# Force delete a stuck terminating namespace (last resort: can orphan resources whose finalizers never ran)
kubectl get namespace <name> -o json | \
  jq '.spec.finalizers = []' | \
  kubectl replace --raw /api/v1/namespaces/<name>/finalize -f -
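
Bulk deletes are easier to trust when you can preview them. This sketch is a variation on the evicted-pods pipeline above: it prints the delete commands instead of running them. `evicted_delete_cmds` is a hypothetical helper name, and it assumes the default columns of `kubectl get pods -A --no-headers`.

```shell
# Hypothetical helper: print (not run) one delete command per evicted pod.
# Assumes columns: NAMESPACE NAME READY STATUS RESTARTS AGE
evicted_delete_cmds() {
  kubectl get pods -A --no-headers | awk '$4 == "Evicted" {
    printf "kubectl delete pod %s -n %s\n", $2, $1
  }'
}
```

Review the output first, then run `evicted_delete_cmds | sh` to execute it.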

⚡ Essential Aliases

Add to ~/.bashrc or ~/.zshrc:

alias k='kubectl'
alias kgp='kubectl get pods'
alias kgpa='kubectl get pods -A'
alias kgpw='kubectl get pods -w'
alias kgs='kubectl get services'
alias kgd='kubectl get deployments'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
alias kdp='kubectl describe pod'
alias kl='kubectl logs'
alias klf='kubectl logs -f'
alias ke='kubectl exec -it'
alias kaf='kubectl apply -f'
alias kdf='kubectl delete -f'
alias kns='kubectl config set-context --current --namespace'
alias kctx='kubectl config use-context'
alias kctxs='kubectl config get-contexts'

After adding, reload your shell config: source ~/.bashrc (or source ~/.zshrc)

🔧 kubectl Plugins Worth Installing

Install krew (plugin manager) first:

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

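Add krew to your PATH (put export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH" in ~/.bashrc or ~/.zshrc), restart your shell, then install plugins: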

Plugin	Install	What it does
ctx	`kubectl krew install ctx`	Fast context switching
ns	`kubectl krew install ns`	Fast namespace switching
neat	`kubectl krew install neat`	Clean YAML output (removes noise)
tree	`kubectl krew install tree`	Show ownership tree of resources
stern	`kubectl krew install stern`	Multi-pod log tailing with regex
top	built-in	Resource usage (requires metrics-server)

🆘 The 5-Minute Debugging Checklist

Pod not starting or crashing? Go in order:

# 1. What state is it in?
kubectl get pod <name> -o wide

# 2. Why? (look at Events section at the bottom)
kubectl describe pod <name>

# 3. What is it saying?
kubectl logs <name>
kubectl logs <name> --previous   # if already crashed

# 4. Can it reach its dependencies? (requires curl in the image; otherwise use a debug pod)
kubectl exec -it <name> -- curl http://<service-name>:<port>/health

# 5. What are its resource limits?
kubectl describe pod <name> | grep -A3 Limits

# 6. Check node pressure
kubectl describe node $(kubectl get pod <name> -o jsonpath='{.spec.nodeName}')

If describe shows OOMKilled → increase memory limits.
If ImagePullBackOff → check image name and registry credentials.
If Pending → check node capacity with kubectl describe node.
If CrashLoopBackOff → check logs with --previous flag.
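
The rules of thumb above can be sketched as a small lookup that turns a pod status into the next thing to try. `next_step` is a hypothetical helper, and the fallback branch for unlisted statuses is an assumption based on step 2 of the checklist.

```shell
# Hypothetical helper: map a pod status to the next action,
# following the rules of thumb listed above.
next_step() {
  case "$1" in
    OOMKilled)        echo "increase memory limits (check: kubectl describe pod <name> | grep -A3 Limits)" ;;
    ImagePullBackOff) echo "check image name and registry credentials (kubectl describe pod <name>)" ;;
    Pending)          echo "check node capacity (kubectl describe node <node-name>)" ;;
    CrashLoopBackOff) echo "check logs of the crashed container (kubectl logs <name> --previous)" ;;
    *)                echo "start with events (kubectl describe pod <name>)" ;;  # assumed fallback
  esac
}
```

One way to feed it the STATUS column of a single pod: `next_step "$(kubectl get pod <name> --no-headers | awk '{print $3}')"`.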

April 2026. Tested against Kubernetes 1.32. Commands work on any managed cluster (EKS, GKE, AKS) and local setups (minikube, k3s, kind).

DEV Community