Aniket More
How to Fix Kubernetes Errors: CrashLoopBackOff, ImagePullBackOff, Pending, OOMKilled, and More

A complete step-by-step guide to solving common Kubernetes pod and node issues — from Pending pods to OOMKilled containers.

💥1. CrashLoopBackOff

⚠️Issue:

A pod starts → crashes → Kubernetes restarts it → crashes again. The cycle continues with backoff delays.

🔍Causes:

  • Application error (bad code, runtime exception, misconfigured env vars).
  • Port binding failure (app tries to bind to an already-used port).
  • Dependency not available (DB, API, config file).
  • Wrong command/entrypoint in container spec.
  • Insufficient resources (memory limit too low, so the container is OOMKilled).

🛠️Fixes:

  • Check the logs, including the previous crashed container:

```shell
kubectl logs <pod-name> -n <namespace> --previous
```

  • Validate command & args in the pod spec.
  • Ensure required configs/secrets/env vars are mounted correctly.
  • Verify resource requests/limits aren’t too low.
  • Fix application bugs and redeploy.
  • If OOMKilled → increase the memory limit.
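The spec fields worth double-checking can be sketched in a minimal deployment; the image name, command, env var, and resource values below are placeholders, not a prescription:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                         # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:1.0.0   # verify the tag exists
          command: ["/app/server"]        # must match a binary in the image
          env:
            - name: DB_HOST               # example: app may crash if a required var is missing
              value: "db.default.svc"
          resources:
            requests: { memory: "128Mi", cpu: "100m" }
            limits:   { memory: "256Mi" } # too low → OOMKilled
```

If the pod keeps crashing after these checks, the `--previous` logs usually contain the stack trace from the failed run.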

⛔2. ImagePullBackOff

⚠️Issue:
Pod can’t pull the container image from the registry.

🔍Causes:

  • Incorrect image name or tag.
  • Image doesn’t exist in the registry.
  • Private registry requires credentials.
  • Network/DNS issues preventing access to registry.

🛠️Fixes:

  • Double-check image name & tag:

```yaml
image: myregistry/myimage:1.0.0
```

  • If the registry is private, create an image pull secret:

```shell
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
```

  • Reference the secret in your deployment spec.
  • Ensure cluster nodes have outbound internet access and DNS resolution.
  • Verify the registry is reachable.
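For the private-registry case, the pod spec references the pull secret like this (the secret and image names are placeholders):

```yaml
spec:
  imagePullSecrets:
    - name: regcred                     # secret created with kubectl create secret docker-registry
  containers:
    - name: myapp
      image: myregistry/myimage:1.0.0
```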

🚫3. NodeNotReady

⚠️Issue:
Kubernetes marks a node as NotReady (or Unknown), new pods can’t be scheduled there, and existing pods may be evicted after a timeout.

🔍Causes:

  • Kubelet on the node is stopped or unhealthy.
  • Node lost connection to control plane (network failure).
  • Node out of resources (CPU, memory, disk pressure).
  • Cloud VM is shut down or unreachable.
  • Incorrect firewall/security group settings blocking traffic.

🛠️Fixes:

  • Check node status:

```shell
kubectl describe node <node-name>
```

  • SSH into the node and check the kubelet service:

```shell
systemctl status kubelet
```

  • Verify disk space, CPU, and memory.
  • Restart kubelet or the container runtime (Docker/containerd) if needed.
  • Ensure the node can reach the control plane (API server).
  • In cloud environments: confirm the VM is running and networking is correct.
  • Drain and remove bad nodes if they can’t be recovered:

```shell
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
```

⏳4. Pending

⚠️Issue: Pod stuck in Pending state.

🔍Causes:

  • No nodes meet scheduling requirements (taints, tolerations, affinity, node selectors).
  • Insufficient cluster resources (CPU, memory).
  • PVC requested but storage not provisioned.

🛠️Fixes:

  • Check scheduling events:

```shell
kubectl describe pod <pod>
```
  • Free or add resources / scale cluster.
  • Ensure PVC has a valid StorageClass.
  • Adjust affinity/taints/tolerations as needed.
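For the PVC case, a claim only binds if its StorageClass actually exists in the cluster (check with `kubectl get storageclass`); the names and size below are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                # placeholder
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard    # must match an existing StorageClass
  resources:
    requests:
      storage: 1Gi
```

A pod referencing a PVC that can’t bind stays Pending, and `kubectl describe pod` shows an unbound-claim event.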

🐳 5. RunContainerError

⚠️Issue: Pod scheduled but container can’t start.

🔍Causes:

  • Incorrect container command/entrypoint.
  • Permission issue mounting volumes.
  • SecurityContext preventing startup.
  • Missing binary inside container.

🛠️Fixes:

  • Inspect events:

```shell
kubectl describe pod <pod>
```

  • Validate command and args in the YAML.
  • Check mounted volumes and permissions.
  • Test the image locally with docker run.
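Command and securityContext mistakes are the usual suspects; this fragment marks the fields that most often break startup (all values are placeholders):

```yaml
spec:
  containers:
    - name: myapp
      image: myregistry/myimage:1.0.0
      command: ["/bin/sh", "-c"]        # the shell must exist in the image
      args: ["exec /app/server"]        # the binary must exist at this path
      securityContext:
        runAsUser: 1000                 # fails if the image requires root
        readOnlyRootFilesystem: true    # fails if the app writes to its root fs
```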

🔑 6. SecretNotFound

⚠️Issue: Pod references a missing secret.

🔍Causes:

  • Secret not created.
  • Typo in secret name.
  • Wrong namespace (secrets are namespace-scoped).

🛠️Fixes:

  • Verify the secret exists:

```shell
kubectl get secret -n <namespace>
```

  • Correct the secret name in the Deployment.
  • Re-create the secret if it was deleted.
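The name in the pod spec must match the secret exactly, in the same namespace; a typical reference looks like this (secret name and key are placeholders):

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials    # must exist in the pod's namespace
        key: password           # must be a key inside that secret
```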

⚙️ 7. ConfigMapNotFound

⚠️Issue: Pod references a missing ConfigMap.

🔍Causes:

  • ConfigMap not created yet.
  • Wrong namespace.
  • Typo in the name.

🛠️Fixes:

  • Check existence:

```shell
kubectl get configmap -n <namespace>
```

  • Fix the YAML reference.
  • Apply the ConfigMap before deploying pods.
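If ordering can’t be guaranteed, the reference can be marked optional so a missing ConfigMap doesn’t block startup (the ConfigMap name is a placeholder):

```yaml
envFrom:
  - configMapRef:
      name: app-config    # must exist in the same namespace
      optional: true      # pod starts even if the ConfigMap is missing
```

Use `optional: true` deliberately: it trades a loud startup failure for an app that may run without its config.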

💥 8. OOMKilled

⚠️Issue: Pod restarted due to out-of-memory kill.

🔍Causes:

  • Container exceeded memory limit.
  • Application memory leak.
  • Limit too strict.

🛠️Fixes:

  • Check pod events:

```shell
kubectl describe pod <pod>
```

  • Increase resources.limits.memory.
  • Optimize the app’s memory usage.
  • Enable monitoring to detect leaks.
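Raising the limit means editing the container’s resources block; the values below are illustrative and should be sized from observed usage, not copied as-is:

```yaml
resources:
  requests:
    memory: "256Mi"   # baseline the scheduler reserves for the pod
  limits:
    memory: "512Mi"   # exceeding this gets the container OOMKilled
```

If the pod is OOMKilled again at the higher limit, suspect a leak rather than a sizing problem.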

🌐 9. DNS Resolution Failure

⚠️Issue: Pod can’t resolve service names / external domains.

🔍 Causes:

  • coredns pods not running/healthy.
  • Network policies blocking DNS.
  • Node network misconfiguration.
  • Wrong /etc/resolv.conf.

🛠️Fixes:

  • Check CoreDNS:

```shell
kubectl get pods -n kube-system -l k8s-app=kube-dns
```

  • Restart CoreDNS if needed.
  • Test resolution from inside a pod:

```shell
kubectl exec <pod> -- nslookup kubernetes.default
```

  • Review NetworkPolicies and firewall rules.
