A complete step-by-step guide to solving common Kubernetes pod and node issues — from Pending pods to OOMKilled containers.
💥1. CrashLoopBackOff
⚠️Issue:
A pod starts → crashes → Kubernetes restarts it → crashes again. The cycle continues with backoff delays.
🔍Causes:
- Application error (bad code, runtime exception, misconfigured env vars).
- Port binding failure (app tries to bind to an already-used port).
- Dependency not available (DB, API, config file).
- Wrong command/entrypoint in container spec.
- Insufficient resources (CPU/memory OOMKilled).
🛠️Fixes:
- Check the logs (use --previous to see output from the last crashed container):
kubectl logs <pod-name> -n <namespace> --previous
- Validate command & args in the pod spec.
- Ensure required configs/secrets/env vars are mounted correctly.
- Verify resource requests/limits aren’t too low.
- Fix application bugs and redeploy.
- If OOMKilled → increase memory limit.
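Putting these fixes together, a container spec that pins the command explicitly, injects its env vars, and gives the app memory headroom might look like the sketch below (the pod name, binary path, env var, and resource values are all placeholders, not a prescription):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # hypothetical name
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      command: ["/app/server"]  # must match a real binary inside the image
      env:
        - name: DB_HOST         # example dependency the app expects
          value: "db.default.svc.cluster.local"
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"       # raise this if the container is OOMKilled
```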
⛔2. ImagePullBackOff
⚠️Issue:
Pod can’t pull the container image from the registry.
🔍Causes:
- Incorrect image name or tag.
- Image doesn’t exist in the registry.
- Private registry requires credentials.
- Network/DNS issues preventing access to registry.
🛠️Fixes:
- Double-check image name & tag:
image: myregistry/myimage:1.0.0
- If the registry is private → create an image pull secret:
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
- Reference it in your pod/deployment spec via imagePullSecrets.
- Ensure cluster nodes have outbound internet/DNS resolution.
- Verify registry is reachable.
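As a sketch, this is where the regcred secret created above would be referenced in a Deployment (the app name and labels are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
        - name: regcred         # the secret created with kubectl above
      containers:
        - name: my-app
          image: myregistry/myimage:1.0.0
```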
🚫3. NodeNotReady
⚠️Issue:
Kubernetes marks a node as NotReady, and pods can’t be scheduled there.
🔍Causes:
- Kubelet on the node is stopped or unhealthy.
- Node lost connection to control plane (network failure).
- Node out of resources (CPU, memory, disk pressure).
- Cloud VM is shut down or unreachable.
- Incorrect firewall/security group settings blocking traffic.
🛠️Fixes:
- Check node status:
kubectl describe node <node-name>
- SSH into node and check kubelet service:
systemctl status kubelet
- Verify disk space, CPU, memory.
- Restart kubelet or Docker/Containerd if needed.
- Ensure node can reach control plane (API server).
- In cloud: confirm VM is running & networking is correct.
- Drain and remove bad nodes if they can’t be recovered:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
⏳ 4. Pending
⚠️Issue: Pod stuck in Pending state.
🔍Causes:
- No nodes meet scheduling requirements (taints, tolerations, affinity, node selectors).
- Insufficient cluster resources (CPU, memory).
- PVC requested but storage not provisioned.
🛠️Fixes:
- Check scheduling:
kubectl describe pod <pod>
- Free or add resources / scale cluster.
- Ensure PVC has a valid StorageClass.
- Adjust affinity/taints/tolerations as needed.
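For example, if the pod is Pending because of a taint or an unbound PVC, a sketch like this shows where the relevant knobs live (the taint key/value and the StorageClass name are assumptions — check yours with kubectl get storageclass):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  tolerations:
    - key: "dedicated"           # hypothetical taint key on the target nodes
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-app-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # must exist in the cluster
  resources:
    requests:
      storage: 1Gi
```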
🐳 5. RunContainerError
⚠️Issue: Pod scheduled but container can’t start.
🔍Causes:
- Incorrect container command/entrypoint.
- Permission issue mounting volumes.
- SecurityContext preventing startup.
- Missing binary inside container.
🛠️Fixes:
- Inspect events:
kubectl describe pod <pod>
- Validate command and args in YAML.
- Check mounted volumes/permissions.
- Test image locally with docker run.
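A common fix is overriding the image's entrypoint explicitly in the pod spec. As a sketch (the binary path and flag are hypothetical — they must match what actually exists in your image):

```yaml
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      command: ["/bin/sh", "-c"]              # replaces the image ENTRYPOINT
      args: ["exec /app/server --port=8080"]  # hypothetical binary and flag
```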
🔑 6. SecretNotFound
⚠️Issue: Pod references a missing secret.
🔍Causes:
- Secret not created.
- Typo in secret name.
- Wrong namespace (secrets are namespace-scoped).
🛠️Fixes:
- Verify secret exists:
kubectl get secret -n <namespace>
- Correct secret name in Deployment.
- Re-create the secret if deleted.
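The reference in the pod spec has to match the secret's name and key exactly, in the same namespace. A minimal fragment, with hypothetical secret/key names:

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials   # must exist in the pod's namespace
        key: password          # must be a key inside that secret
```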
⚙️ 7. ConfigMapNotFound
⚠️Issue: Pod references a missing ConfigMap.
🔍Causes:
- ConfigMap not created yet.
- Wrong namespace.
- Typo in name.
🛠️Fixes:
- Check existence:
kubectl get configmap -n <namespace>
- Fix YAML reference.
- Apply ConfigMap before deploying pods.
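For reference, this is roughly what a ConfigMap mounted as a volume looks like in the pod spec (the ConfigMap name and mount path are placeholders; the ConfigMap must be applied before the pod starts):

```yaml
spec:
  containers:
    - name: my-app
      image: myregistry/myimage:1.0.0
      volumeMounts:
        - name: config
          mountPath: /etc/app    # hypothetical path the app reads from
  volumes:
    - name: config
      configMap:
        name: app-config         # must exist in the pod's namespace
```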
💥 8. OOMKilled
⚠️Issue: Pod restarted due to out-of-memory kill.
🔍Causes:
- Container exceeded memory limit.
- Application memory leak.
- Limit too strict.
🛠️Fixes:
- Check pod events:
kubectl describe pod <pod>
- Increase resources.limits.memory.
- Optimize app memory usage.
- Enable monitoring to detect leaks.
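Raising the limit is usually just a resources tweak in the container spec. The values below are placeholders — size them from observed usage, not guesswork:

```yaml
resources:
  requests:
    memory: "256Mi"   # what the scheduler reserves
  limits:
    memory: "512Mi"   # the OOM-kill threshold; raise until the app fits, then profile
```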
🌐 9. DNS Resolution Failure
⚠️Issue: Pod can’t resolve service names / external domains.
🔍 Causes:
- CoreDNS pods not running/healthy.
- Network policies blocking DNS.
- Node network misconfiguration.
- Wrong /etc/resolv.conf.
🛠️Fixes:
- Check CoreDNS:
kubectl get pods -n kube-system -l k8s-app=kube-dns
- Restart CoreDNS if needed:
kubectl rollout restart deployment coredns -n kube-system
- Test DNS from inside a pod:
kubectl exec -it <pod> -- nslookup kubernetes.default
- Review NetworkPolicies / firewall.
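If a NetworkPolicy is the culprit, pods need egress to kube-system on port 53. A sketch of such a policy (the policy name and namespace are hypothetical; the kubernetes.io/metadata.name namespace label is standard on recent Kubernetes versions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns              # hypothetical name
  namespace: my-namespace      # hypothetical namespace
spec:
  podSelector: {}              # applies to all pods in this namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```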