A complete step-by-step guide to solving common Kubernetes pod and node issues — from Pending pods to OOMKilled containers.
💥1. CrashLoopBackOff
⚠️Issue:
A pod starts → crashes → Kubernetes restarts it → crashes again. The cycle continues with backoff delays.
🔍Causes:
- Application error (bad code, runtime exception, misconfigured env vars).
- Port binding failure (app tries to bind to an already-used port).
- Dependency not available (DB, API, config file).
- Wrong command/entrypoint in container spec.
- Insufficient resources (CPU/memory OOMKilled).
🛠️Fixes:
- Check the logs (use --previous to see output from the last crashed container):
kubectl logs <pod-name> -n <namespace> --previous
- Validate command & args in the pod spec.
- Ensure required configs/secrets/env vars are mounted correctly.
- Verify resource requests/limits aren’t too low.
- Fix application bugs and redeploy.
- If OOMKilled → increase memory limit.
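Putting these fixes together, a container spec that pins the command explicitly, injects its env vars, and gives the app memory headroom might look like the sketch below (the pod name, binary path, env var, and resource values are all placeholders, not a prescription):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # hypothetical name
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      command: ["/app/server"]  # must match a real binary inside the image
      env:
        - name: DB_HOST         # example dependency the app expects
          value: "db.default.svc.cluster.local"
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"       # raise this if the container is OOMKilled
```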
⛔2. ImagePullBackOff
⚠️Issue:
Pod can’t pull the container image from the registry.
🔍Causes:
- Incorrect image name or tag.
- Image doesn’t exist in the registry.
- Private registry requires credentials.
- Network/DNS issues preventing access to registry.
🛠️Fixes:
- Double-check image name & tag:
image: myregistry/myimage:1.0.0
- If the registry is private → create an image pull secret:
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
- Reference it in your pod/deployment spec via imagePullSecrets.
- Ensure cluster nodes have outbound internet/DNS resolution.
- Verify registry is reachable.
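As a sketch, this is where the regcred secret created above would be referenced in a Deployment (the app name and labels are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
        - name: regcred         # the secret created with kubectl above
      containers:
        - name: my-app
          image: myregistry/myimage:1.0.0
```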
🚫3. NodeNotReady
⚠️Issue:
Kubernetes marks a node as NotReady, and pods can’t be scheduled there.
🔍Causes:
- Kubelet on the node is stopped or unhealthy.
- Node lost connection to control plane (network failure).
- Node out of resources (CPU, memory, disk pressure).
- Cloud VM is shut down or unreachable.
- Incorrect firewall/security group settings blocking traffic.
🛠️Fixes:
- Check node status:
kubectl describe node <node-name>
- SSH into node and check kubelet service:
systemctl status kubelet
- Verify disk space, CPU, memory.
- Restart kubelet or Docker/Containerd if needed.
- Ensure node can reach control plane (API server).
- In cloud: confirm VM is running & networking is correct.
- Drain and remove bad nodes if they can’t be recovered:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
⏳ 4. Pending
⚠️Issue: Pod stuck in Pending state.
🔍Causes:
- No nodes meet scheduling requirements (taints, tolerations, affinity, node selectors).
- Insufficient cluster resources (CPU, memory).
- PVC requested but storage not provisioned.
🛠️Fixes:
- Check scheduling:
kubectl describe pod <pod>
- Free or add resources / scale cluster.
- Ensure PVC has a valid StorageClass.
- Adjust affinity/taints/tolerations as needed.
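For example, if the pod is Pending because of a taint or an unbound PVC, a sketch like this shows where the relevant knobs live (the taint key/value and the StorageClass name are assumptions — check yours with kubectl get storageclass):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  tolerations:
    - key: "dedicated"           # hypothetical taint key on the target nodes
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-app-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # must exist in the cluster
  resources:
    requests:
      storage: 1Gi
```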
🐳 5. RunContainerError
⚠️Issue: Pod scheduled but container can’t start.
🔍Causes:
- Incorrect container command/entrypoint.
- Permission issue mounting volumes.
- SecurityContext preventing startup.
- Missing binary inside container.
🛠️Fixes:
- Inspect events:
kubectl describe pod <pod>
- Validate command and args in YAML.
- Check mounted volumes/permissions.
- Test image locally with docker run.
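A common fix is overriding the image's entrypoint explicitly in the pod spec. As a sketch (the binary path and flag are hypothetical — they must match what actually exists in your image):

```yaml
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      command: ["/bin/sh", "-c"]              # replaces the image ENTRYPOINT
      args: ["exec /app/server --port=8080"]  # hypothetical binary and flag
```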
🔑 6. SecretNotFound
⚠️Issue: Pod references a missing secret.
🔍Causes:
- Secret not created.
- Typo in secret name.
- Wrong namespace (secrets are namespace-scoped).
🛠️Fixes:
- Verify secret exists:
kubectl get secret -n <namespace>
- Correct secret name in Deployment.
- Re-create the secret if deleted.
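The reference in the pod spec has to match the secret's name and key exactly, in the same namespace. A minimal fragment, with hypothetical secret/key names:

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials   # must exist in the pod's namespace
        key: password          # must be a key inside that secret
```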
⚙️ 7. ConfigMapNotFound
⚠️Issue: Pod references a missing ConfigMap.
🔍Causes:
- ConfigMap not created yet.
- Wrong namespace.
- Typo in name.
🛠️Fixes:
- Check existence:
kubectl get configmap -n <namespace>
- Fix YAML reference.
- Apply ConfigMap before deploying pods.
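For reference, this is roughly what a ConfigMap mounted as a volume looks like in the pod spec (the ConfigMap name and mount path are placeholders; the ConfigMap must be applied before the pod starts):

```yaml
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      volumeMounts:
        - name: config
          mountPath: /etc/app    # hypothetical path the app reads from
  volumes:
    - name: config
      configMap:
        name: app-config         # must exist in the pod's namespace
```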
💥 8. OOMKilled
⚠️Issue: Pod restarted due to out-of-memory kill.
🔍Causes:
- Container exceeded memory limit.
- Application memory leak.
- Limit too strict.
🛠️Fixes:
- Check pod events:
kubectl describe pod <pod>
- Increase resources.limits.memory.
- Optimize app memory usage.
- Enable monitoring to detect leaks.
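Raising the limit is usually just a resources tweak in the container spec. The values below are placeholders — size them from observed usage, not guesswork:

```yaml
resources:
  requests:
    memory: "256Mi"   # what the scheduler reserves
  limits:
    memory: "512Mi"   # the OOM-kill threshold; raise until the app fits, then profile
```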
🌐 9. DNS Resolution Failure
⚠️Issue: Pod can’t resolve service names / external domains.
🔍 Causes:
- CoreDNS pods not running/healthy.
- Network policies blocking DNS.
- Node network misconfiguration.
- Wrong /etc/resolv.conf.
🛠️Fixes:
- Check CoreDNS:
kubectl get pods -n kube-system -l k8s-app=kube-dns
- Restart CoreDNS if needed:
kubectl rollout restart deployment coredns -n kube-system
- Test DNS from inside a pod:
kubectl exec -it <pod> -- nslookup kubernetes.default
- Review NetworkPolicies / firewall.
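If a NetworkPolicy is the culprit, pods need egress to kube-system on port 53. A sketch of such a policy (the policy name and namespace are hypothetical; the kubernetes.io/metadata.name namespace label is standard on recent Kubernetes versions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns              # hypothetical name
  namespace: my-namespace      # hypothetical namespace
spec:
  podSelector: {}              # applies to all pods in this namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```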