
AttractivePenguin

Kubernetes Pod Stuck in Pending? Here's How to Debug It Like a Pro

You've deployed your application to Kubernetes, but something's wrong. Your pod is just sitting there, stubbornly stuck in Pending state. No errors, no crashes—just... waiting. Sound familiar?

This is one of the most common frustrations for developers working with Kubernetes. The good news? Once you know where to look, the fix is usually straightforward. In this guide, we'll walk through exactly how to diagnose and resolve pending pods, with real commands and scenarios you can use today.


What Does "Pending" Actually Mean?

When a pod is in Pending state, it means the Kubernetes scheduler hasn't been able to assign it to a node. This isn't about your container crashing—it hasn't even started yet. The scheduler is essentially saying, "I can't find a suitable home for this pod."

The reasons usually fall into these categories:

  • Insufficient resources: Not enough CPU, memory, or storage on available nodes
  • Node selection constraints: nodeSelector, nodeAffinity, or taints/tolerations that don't match
  • Persistent volume issues: PVCs that can't bind to a PV
  • Resource quotas: Limits that prevent scheduling in a namespace

Let's debug each of these systematically.


Step 1: Check Pod Events with kubectl describe

Your first stop is always kubectl describe pod. This shows the Events section at the bottom, which tells you exactly why the scheduler rejected your pod.

kubectl describe pod <pod-name> -n <namespace>

Look for the Events section at the bottom:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12s   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

This output tells you the scheduler tried all 3 nodes and none had enough CPU. The message is your first clue—use it to guide your next steps.


Step 2: Check Node Resources

If the events mention insufficient CPU or memory, check your nodes' available resources:

kubectl describe nodes | grep -A 5 "Allocated resources"

Or get a cleaner view with:

kubectl top nodes

You'll see something like:

NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1     1800m        90%    14Gi            70%
node-2     1500m        75%    12Gi            60%
node-3     1900m        95%    15Gi            75%

If nodes are heavily utilized, you have a few options:

  1. Scale down less critical workloads
  2. Add more nodes to the cluster
  3. Reduce your pod's resource requests (if possible)

Check what your pod is requesting:

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.requests}'
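If you decide to lower the requests, the change goes in the pod (or Deployment) spec. Here's a minimal sketch with hypothetical names and values — tune them to what your app actually uses:

```yaml
# Hypothetical pod spec with modest resource requests.
# The names, image, and values are illustrative, not prescriptive.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        cpu: "250m"       # a quarter of a core — what the scheduler reserves
        memory: "256Mi"
      limits:
        cpu: "500m"       # hard ceiling enforced at runtime
        memory: "512Mi"
```

Remember that the scheduler only looks at requests, not limits — lowering requests is what makes a pod easier to place.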

Step 3: Check Node Selectors and Affinities

If your pod uses nodeSelector or nodeAffinity, ensure nodes with matching labels exist:

# Check your pod's node selector
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 nodeSelector

# List nodes with their labels
kubectl get nodes --show-labels
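For reference, a nodeSelector in a pod spec is just a label map — the pod schedules only onto nodes carrying every listed label (the label here is hypothetical):

```yaml
# Fragment of a pod spec; "zone: us-east-1a" is an example label,
# not a label your cluster necessarily has.
spec:
  nodeSelector:
    zone: us-east-1a
```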

For nodeAffinity, the check is similar:

# Your pod spec might have:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd

If no node carries the disktype=ssd label, the pod will remain Pending indefinitely.

Fix: Either add the label to a node:

kubectl label node <node-name> disktype=ssd

Or remove/modify the affinity rule in your pod spec.


Step 4: Check PVC Binding Issues

If your pod uses a PersistentVolumeClaim (PVC), ensure it's bound:

kubectl get pvc -n <namespace>

You want to see:

NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-pvc    Bound    pvc-abc123-...                            10Gi       RWO            standard       5m

If the status is Pending, the PVC can't find a matching PV. Check the storage class and access modes:

kubectl describe pvc <pvc-name> -n <namespace>

Common issues:

  • StorageClass doesn't exist: Ensure the StorageClass is created
  • No PV available: If using manual provisioning, create a PV matching the PVC's requirements
  • Access mode mismatch: PVC requests ReadWriteMany but only ReadWriteOnce PVs exist
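When checking for a mismatch, it helps to have the PVC's own spec in front of you. A sketch of what to compare (the name, class, and size are hypothetical — yours will differ):

```yaml
# Every field here must line up with an existing StorageClass
# or an available PV for the claim to bind.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce     # must be offered by the PV / provisioner
  storageClassName: standard   # must exist: kubectl get storageclass
  resources:
    requests:
      storage: 10Gi     # PV capacity must be >= this
```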

Step 5: Check Taints and Tolerations

Nodes can have taints that repel pods unless the pods have matching tolerations:

# Check node taints
kubectl describe nodes | grep -A 5 Taints

Common taints:

Taints: node.kubernetes.io/not-ready:NoSchedule
Taints: node.kubernetes.io/unschedulable:NoSchedule
Taints: dedicated=gpu:NoSchedule

If you see NoSchedule taints, your pod needs tolerations:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

Or to remove the taint:

kubectl taint nodes <node-name> dedicated:NoSchedule-

Real-World Scenarios

Scenario 1: Cluster Over-Committed

Symptoms: Multiple deployments stuck in Pending, events show "Insufficient cpu/memory"

Root Cause: Your cluster is running too many workloads for its capacity.

Solutions:

  • Remove unused deployments
  • Add nodes (horizontal scaling)
  • Reduce pod resource requests (vertical optimization)
# Find pods using most resources
kubectl top pods --all-namespaces --sort-by=memory
kubectl top pods --all-namespaces --sort-by=cpu

Scenario 2: Node Selector Mismatch

Symptoms: Pod pending with message like "0/3 nodes are available: 3 node(s) didn't match node selector"

Root Cause: Pod requires a node label that doesn't exist.

Solution: Add the label or remove the constraint.

# Add label to make it schedulable
kubectl label node node-1 zone=us-east-1a

Scenario 3: PVC Not Binding

Symptoms: Pod stuck, PVC shows Pending status

Root Cause: No PersistentVolume matches the PVC's requirements.

Solution: Create a matching PV or use dynamic provisioning:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
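For this PV to actually satisfy a claim, the PVC needs compatible capacity and access mode. A matching claim might look like this (the name is hypothetical; the empty storageClassName tells Kubernetes to bind to static PVs like the one above rather than wait for dynamic provisioning):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-pvc
spec:
  storageClassName: ""   # match statically provisioned PVs with no class
  accessModes:
    - ReadWriteOnce      # same mode the PV offers
  resources:
    requests:
      storage: 10Gi      # <= the PV's 10Gi capacity
```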

Scenario 4: Resource Quotas Blocking

Symptoms: Pod pending, events mention "exceeded quota"

Root Cause: Namespace has a ResourceQuota limiting total resources.

Solution: Check and adjust the quota:

kubectl get resourcequota -n <namespace>
kubectl describe resourcequota <quota-name> -n <namespace>

Either increase the quota or reduce resource requests in your deployment.
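For context, a ResourceQuota that produces this error looks roughly like the sketch below (the name and limits are hypothetical). Note that once a quota constrains cpu or memory, every pod in the namespace must declare requests/limits for those resources, or it will be rejected outright:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"      # total CPU requests across the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```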


FAQ

Q: Why does my pod work in dev but not prod?

Different environments often have different node counts, resource limits, and storage classes. Always check:

  • Node count and resources (kubectl top nodes)
  • StorageClasses available (kubectl get storageclass)
  • ResourceQuotas (kubectl get quota)

Q: How do I see scheduler logs?

# For kubeadm clusters
kubectl logs -n kube-system kube-scheduler-<control-plane-node-name>

# Or check scheduler logs directly on the control plane node
journalctl -u kube-scheduler

Q: Can I force a pod onto a specific node?

Yes, but only use this for debugging:

spec:
  nodeName: <node-name>

This bypasses the scheduler entirely. For production, use nodeAffinity instead.

Q: What if I don't have enough nodes?

If you're running locally (minikube, kind, Docker Desktop), you're limited to one node by default. Consider:

  • Reducing resource requests
  • Using cluster autoscaler on managed Kubernetes
  • Adding nodes to your local cluster
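With kind, for example, a multi-node cluster can be declared in a config file. A minimal sketch (the filename is arbitrary):

```yaml
# kind-config.yaml — create the cluster with:
#   kind create cluster --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
```

This gives you two schedulable workers, which is enough to experiment with node selectors, taints, and spreading workloads.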

Conclusion

A pod stuck in Pending state is frustrating but always diagnosable. The key is to follow a systematic approach:

  1. Start with kubectl describe pod — the Events section is your friend
  2. Check node resources — ensure capacity for your pod's requests
  3. Verify selectors and affinities — labels must match
  4. Confirm PVC binding — storage must be available
  5. Review taints and tolerations — pods need tolerations for tainted nodes

Once you've diagnosed the issue, the fix is usually straightforward: adjust resource requests, add missing labels, provision storage, or remove taints. Keep this guide handy, and you'll never be stuck wondering why your pod won't schedule.

Happy debugging! 🚀


What's your most confusing Kubernetes scheduling issue? Drop a comment below and I'll help you debug it.
