What this is: The 20% of Kubernetes that covers 80% of daily work. Cluster state, pod management, debugging, deployments, secrets, logs, resource inspection — organized by task, not by resource type.
⚡ Cluster State — Is Everything OK?
# Overall health check
kubectl get nodes
kubectl get nodes -o wide # with IPs and OS info
# All resources across all namespaces
kubectl get all -A
# What's failing right now
kubectl get pods -A | grep -v Running | grep -v Completed
# Cluster events (recent activity, errors)
kubectl get events --sort-by='.lastTimestamp'
kubectl get events -A --sort-by='.lastTimestamp' | tail -30
# Resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods -A
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
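The `grep -v Running` filter above hides every pod whose status is Running, including pods that are running but not ready (for example `READY 0/1`). A minimal sketch of a stricter filter follows. The function name `unhealthy_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and keeps
# the header plus every pod that is not Completed and is either not Running
# or Running with fewer ready containers than expected.
unhealthy_pods() {
  awk 'NR == 1 { print; next }
       {
         split($3, ready, "/")          # READY column, e.g. "0/1"
         if ($4 == "Completed") next    # finished Jobs are fine
         if ($4 != "Running" || ready[1] != ready[2]) print
       }'
}
```

Usage: `kubectl get pods -A | unhealthy_pods`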
🏃 Pods — Daily Inspection
# List pods
kubectl get pods # current namespace
kubectl get pods -n <namespace> # specific namespace
kubectl get pods -A # all namespaces
kubectl get pods -o wide # with node placement
# Watch pods in real time
kubectl get pods -w
kubectl get pods -w -n <namespace>
# Pod details (why is it stuck?)
kubectl describe pod <pod-name>
kubectl describe pod <pod-name> -n <namespace>
# Get pod YAML
kubectl get pod <pod-name> -o yaml
# Delete a stuck pod
kubectl delete pod <pod-name>
kubectl delete pod <pod-name> --force --grace-period=0 # force kill
Pod status quick reference:
| Status | What it means |
|---|---|
| Running | All containers up |
| Pending | Waiting for scheduling (check node resources) |
| CrashLoopBackOff | Container keeps crashing (check logs) |
| OOMKilled | Out of memory; increase limits |
| ImagePullBackOff | Can't pull image (wrong name, no auth) |
| Terminating (stuck) | Usually a finalizer or an unreachable node is blocking deletion; --force --grace-period=0 skips the graceful wait but does not clear finalizers |
| Evicted | Node was under pressure |
📋 Logs — What Is It Saying?
# Basic logs
kubectl logs <pod-name>
kubectl logs <pod-name> -n <namespace>
# Follow live
kubectl logs -f <pod-name>
# Last N lines
kubectl logs <pod-name> --tail=100
# Logs since time
kubectl logs <pod-name> --since=1h
kubectl logs <pod-name> --since=30m
# Multi-container pod — specify container
kubectl logs <pod-name> -c <container-name>
# Previous container (after crash)
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name> --previous
# All pods in a deployment at once
kubectl logs -l app=<label-value> --all-containers=true
kubectl logs -l app=<label-value> -f --tail=50
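When a pod is crash-looping you usually want the previous container's logs, but `--previous` fails if the container has never restarted. A small sketch that tries `--previous` first and falls back to the current logs; the function name `crash_logs` is hypothetical, and only the `kubectl logs` flags shown above are used.

```shell
# Hypothetical helper: show logs from the previous (crashed) container,
# or the current container when there is no previous one.
# Extra arguments (e.g. --tail=100, -c <container>) are passed through.
crash_logs() {
  local pod="$1"
  shift
  kubectl logs "$pod" --previous "$@" 2>/dev/null || kubectl logs "$pod" "$@"
}
```

Usage: `crash_logs <pod-name> --tail=100`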
🔍 Debugging — Getting Inside
# Shell into a running pod
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -- /bin/sh # if bash not available
kubectl exec -it <pod-name> -c <container> -- /bin/sh
# Run a one-off command
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- cat /etc/hosts
kubectl exec <pod-name> -- ls /app
# Debug with a temporary pod (when your pod has no shell)
kubectl run debug --image=busybox -it --rm -- /bin/sh
kubectl run debug --image=nicolaka/netshoot -it --rm -- /bin/bash
# Debug a specific node
kubectl debug node/<node-name> -it --image=ubuntu
# Copy files to/from pod
kubectl cp <pod-name>:/path/to/file ./local-file
kubectl cp ./local-file <pod-name>:/path/to/file
🚀 Deployments — Rolling Updates and Rollbacks
# List deployments
kubectl get deployments
kubectl get deploy -n <namespace>
# Deployment status
kubectl rollout status deployment/<name>
# Rollout history
kubectl rollout history deployment/<name>
# Update image
kubectl set image deployment/<name> <container>=<image>:<tag>
kubectl set image deployment/api api=myregistry/api:v2.1.0
# Scale
kubectl scale deployment/<name> --replicas=5
# Rollback
kubectl rollout undo deployment/<name> # to previous
kubectl rollout undo deployment/<name> --to-revision=2 # to specific
# Pause/resume rollout
kubectl rollout pause deployment/<name>
kubectl rollout resume deployment/<name>
# Restart all pods (rolling restart, zero downtime)
kubectl rollout restart deployment/<name>
# Force immediate update (deletes and recreates from the manifest; causes downtime)
kubectl replace --force -f deployment.yaml
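The update, status, and rollback commands above are often chained: set the image, wait for the rollout, and undo it if it does not finish in time. A minimal sketch under those assumptions; the function name `deploy_or_rollback` and the 120s default timeout are illustrative choices, not part of kubectl.

```shell
# Hypothetical helper: update a deployment image and roll back automatically
# if the rollout does not complete within the timeout.
# Args: <deployment> <container> <image:tag> [timeout, default 120s]
deploy_or_rollback() {
  local name="$1" container="$2" image="$3" timeout="${4:-120s}"
  kubectl set image "deployment/$name" "$container=$image" || return 1
  if kubectl rollout status "deployment/$name" --timeout="$timeout"; then
    echo "rollout of $name succeeded"
  else
    echo "rollout of $name failed, rolling back" >&2
    kubectl rollout undo "deployment/$name"
    return 1
  fi
}
```

Usage: `deploy_or_rollback api api myregistry/api:v2.1.0 180s`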
🗂️ Namespaces — Context Switching
# List namespaces
kubectl get namespaces
# Set default namespace for session
kubectl config set-context --current --namespace=<namespace>
# One-off in specific namespace
kubectl get pods -n kube-system
# View current context
kubectl config current-context
kubectl config get-contexts
# Switch context (cluster)
kubectl config use-context <context-name>
# Useful alias to avoid typing -n constantly
alias kns='kubectl config set-context --current --namespace'
# Usage: kns production
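After switching namespaces a few times it is easy to lose track of where you are. A short sketch that prints the namespace of the current context; the function name `current_ns` is hypothetical, and it assumes an unset namespace means `default`.

```shell
# Hypothetical helper: print the namespace set on the current context,
# falling back to "default" when none is configured.
current_ns() {
  local ns
  ns="$(kubectl config view --minify -o jsonpath='{..namespace}')"
  echo "${ns:-default}"
}
```

Usage: run `current_ns` before any destructive command, or embed it in your shell prompt.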
🔑 Secrets and ConfigMaps
# List
kubectl get secrets
kubectl get configmaps
# View (careful: values are only base64-encoded, not encrypted; describe shows sizes, -o yaml shows the encoded values)
kubectl describe secret <name>
kubectl get secret <name> -o yaml
# Decode a specific key
kubectl get secret <name> -o jsonpath='{.data.<key>}' | base64 -d
# Create secret from literal
kubectl create secret generic <name> \
--from-literal=DB_PASSWORD=mysecret \
--from-literal=API_KEY=abc123
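# Create secret from file (the file name becomes the key, the file content the value)
kubectl create secret generic <name> --from-file=./secret.env
# Create secret from an env file (one key per KEY=VALUE line)
kubectl create secret generic <name> --from-env-file=./secret.env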
# Create docker registry secret
kubectl create secret docker-registry regcred \
--docker-server=registry.example.com \
--docker-username=user \
--docker-password=pass
# Create configmap
kubectl create configmap <name> --from-literal=ENV=production
kubectl create configmap <name> --from-file=./config.properties
# Edit in place
kubectl edit secret <name>
kubectl edit configmap <name>
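Base64 is an encoding, not encryption: anyone who can read the Secret can recover the value. The round trip below needs no cluster and uses the `mysecret` literal from the create example above.

```shell
# Encode the value the way it appears under .data in `kubectl get secret -o yaml`
encoded="$(printf '%s' 'mysecret' | base64)"
echo "$encoded"    # bXlzZWNyZXQ=

# Decode it again, which is all the jsonpath | base64 -d one-liner does
decoded="$(printf '%s' "$encoded" | base64 -d)"
echo "$decoded"    # mysecret
```

This is why read access to Secrets should be restricted with RBAC rather than relying on the encoding.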
📊 Resource Inspection — What Is Using What?
# Describe a node (capacity, allocations, events)
kubectl describe node <node-name>
# Check resource requests vs limits across pods
kubectl get pods -o custom-columns=\
"NAME:.metadata.name,\
CPU_REQ:.spec.containers[*].resources.requests.cpu,\
CPU_LIM:.spec.containers[*].resources.limits.cpu,\
MEM_REQ:.spec.containers[*].resources.requests.memory,\
MEM_LIM:.spec.containers[*].resources.limits.memory"
# Find pods with at least one container lacking resource limits (danger)
kubectl get pods -o json | \
jq -r '.items[] | select(any(.spec.containers[]; .resources.limits == null)) | .metadata.name'
# PersistentVolumes and Claims
kubectl get pv
kubectl get pvc
kubectl get pvc -A
kubectl describe pvc <name>
🌐 Services and Networking
# List services
kubectl get services
kubectl get svc -A
# Describe (check ports, selectors, endpoints)
kubectl describe svc <name>
# Check endpoints (actual pod IPs behind a service)
kubectl get endpoints <service-name>
# Quick port-forward to test a service locally
kubectl port-forward svc/<service-name> 8080:80
kubectl port-forward pod/<pod-name> 9000:8080
# Expose a deployment as a service
kubectl expose deployment <name> --port=80 --target-port=8080 --type=ClusterIP
# Get Ingress
kubectl get ingress -A
kubectl describe ingress <name>
⚙️ Apply, Diff, Delete
# Apply from file
kubectl apply -f deployment.yaml
kubectl apply -f ./k8s/ # all files in directory
kubectl apply -f ./k8s/ -R # recursive
# Diff before applying (see what will change)
kubectl diff -f deployment.yaml
# Delete
kubectl delete -f deployment.yaml
kubectl delete pod <name>
kubectl delete deployment <name>
kubectl delete all -l app=<label> # by label
# Dry run (what would happen without doing it)
kubectl apply -f deployment.yaml --dry-run=client
kubectl delete pod <name> --dry-run=client
🏷️ Labels and Selectors — Finding Things
# Filter by label
kubectl get pods -l app=api
kubectl get pods -l env=production,tier=backend
kubectl get pods -l 'env in (staging, production)'
# Show labels
kubectl get pods --show-labels
# Add label
kubectl label pod <name> env=production
# Remove label
kubectl label pod <name> env-
# Label a node (for scheduling)
kubectl label node <node-name> disktype=ssd
🛡️ RBAC — Quick Check
# What can I do?
kubectl auth can-i --list
kubectl auth can-i create pods
kubectl auth can-i delete deployments -n production
# What can a service account do?
kubectl auth can-i list pods \
--as=system:serviceaccount:<namespace>:<sa-name>
# List roles and bindings
kubectl get roles -A
kubectl get rolebindings -A
kubectl get clusterroles | grep -v system:
kubectl get clusterrolebindings | grep -v system:
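Checking one verb at a time gets tedious when auditing a service account. A sketch that loops over the common verbs for one resource; the function name `sa_permissions` and the verb list are illustrative assumptions, and it relies only on `kubectl auth can-i` with `--as` as shown above.

```shell
# Hypothetical helper: print yes/no per verb for a service account.
# Args: <namespace> <service-account> <resource>
sa_permissions() {
  local ns="$1" sa="$2" resource="$3" verb answer
  for verb in get list watch create update delete; do
    # can-i exits non-zero on "no", so ignore the exit code and keep the text
    answer="$(kubectl auth can-i "$verb" "$resource" -n "$ns" \
      --as="system:serviceaccount:$ns:$sa" 2>/dev/null)" || true
    printf '%-8s %s\n' "$verb" "${answer:-unknown}"
  done
}
```

Usage: `sa_permissions production ci-deployer deployments`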
🔄 Jobs and CronJobs
# List
kubectl get jobs
kubectl get cronjobs
# Watch job progress
kubectl get jobs -w
# Trigger a CronJob manually
kubectl create job --from=cronjob/<name> <job-name>
# Check job logs
kubectl logs job/<job-name>
# Delete completed jobs
kubectl delete jobs --field-selector status.successful=1
🧹 Cleanup — Housekeeping
# Delete completed pods
kubectl delete pods --field-selector=status.phase=Succeeded -A
kubectl delete pods --field-selector=status.phase=Failed -A
# Delete evicted pods
kubectl get pods -A | grep Evicted | \
awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod
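# Remove workload resources in a namespace ("all" covers pods, services, deployments, jobs, etc., but not ConfigMaps, Secrets, or PVCs)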
kubectl delete all --all -n <namespace>
# Force delete a namespace stuck in Terminating (last resort: bypasses finalizers and can orphan resources)
kubectl get namespace <name> -o json | \
jq '.spec.finalizers = []' | \
kubectl replace --raw /api/v1/namespaces/<name>/finalize -f -
⚡ Essential Aliases
Add to ~/.bashrc or ~/.zshrc:
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgpa='kubectl get pods -A'
alias kgpw='kubectl get pods -w'
alias kgs='kubectl get services'
alias kgd='kubectl get deployments'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
alias kdp='kubectl describe pod'
alias kl='kubectl logs'
alias klf='kubectl logs -f'
alias ke='kubectl exec -it'
alias kaf='kubectl apply -f'
alias kdf='kubectl delete -f'
alias kns='kubectl config set-context --current --namespace'
alias kctx='kubectl config use-context'
alias kctxs='kubectl config get-contexts'
After adding: source ~/.bashrc (or source ~/.zshrc if you use zsh)
🔧 kubectl Plugins Worth Installing
Install krew (plugin manager) first:
(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)
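Then add krew to your PATH (export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH" in your shell rc file), restart the shell, and install plugins: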
| Plugin | Install | What it does |
|---|---|---|
| ctx | kubectl krew install ctx | Fast context switching |
| ns | kubectl krew install ns | Fast namespace switching |
| neat | kubectl krew install neat | Clean YAML output (removes noise) |
| tree | kubectl krew install tree | Show ownership tree of resources |
| stern | kubectl krew install stern | Multi-pod log tailing with regex |
| top | built-in | Resource usage (requires metrics-server) |
🆘 The 5-Minute Debugging Checklist
Pod not starting or crashing? Go in order:
# 1. What state is it in?
kubectl get pod <name> -o wide
# 2. Why? (look at Events section at the bottom)
kubectl describe pod <name>
# 3. What is it saying?
kubectl logs <name>
kubectl logs <name> --previous # if already crashed
# 4. Can it reach its dependencies?
kubectl exec -it <name> -- curl http://<service-name>:<port>/health
# 5. What are its resource limits?
kubectl describe pod <name> | grep -A3 Limits
# 6. Check node pressure
kubectl describe node $(kubectl get pod <name> -o jsonpath='{.spec.nodeName}')
If describe shows OOMKilled → increase memory limits.
If ImagePullBackOff → check image name and registry credentials.
If Pending → check node capacity with kubectl describe node.
If CrashLoopBackOff → check logs with --previous flag.
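The four rules above can be sketched as a lookup that prints the next command to run for a given status. The function name `next_step` is hypothetical, and the mapping simply restates the checklist; treating ErrImagePull like ImagePullBackOff is an added assumption.

```shell
# Hypothetical helper: map a pod status to the next debugging step.
# Args: <status> [pod-name]
next_step() {
  local status="$1" pod="${2:-<name>}"
  case "$status" in
    OOMKilled)
      echo "increase memory limits: kubectl describe pod $pod | grep -A3 Limits" ;;
    ImagePullBackOff|ErrImagePull)
      echo "check image name and registry credentials: kubectl describe pod $pod" ;;
    Pending)
      echo "check node capacity: kubectl describe node" ;;
    CrashLoopBackOff)
      echo "check previous logs: kubectl logs $pod --previous" ;;
    *)
      echo "no rule for '$status'; start with: kubectl describe pod $pod" ;;
  esac
}
```

Usage: `next_step CrashLoopBackOff api-7d9f` prints the logs command for that pod.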
April 2026. Tested against Kubernetes 1.32. Commands work on any managed cluster (EKS, GKE, AKS) and local setups (minikube, k3s, kind).