Muskan

Posted on Jun 15

Kubectl pod stuck in Pending state: 7 reasons and fixes

#kubernetes #devops

Quick take

A pod in Pending has been accepted by the API server but is not running yet, most often because the scheduler could not place it on a node. Nine times out of ten, the answer is in the Events section of kubectl describe pod. The other one time, it is a quota, a PVC, or an admission webhook. Here are the seven reasons and how to fix each in under a minute.

If you only remember one command, remember this one:

kubectl describe pod <name>, then scroll to the Events block and read the FailedScheduling message verbatim.
That message maps directly to one of the seven scheduler vetoes in this post.
Skip Stack Overflow until you have read the events first.

Step zero: the scheduler veto list

In practice, nearly every Pending pod traces back to one of seven causes. I call this the scheduler veto list, even though a couple of the vetoes come from admission control or the kubelet rather than the scheduler itself. Once you have seen each one with your own eyes, debugging pending pods stops being a guessing game.

Run this every time, first:

kubectl describe pod <pod-name>

Look at the Events table at the bottom. For a scheduling failure, the Reason column will say FailedScheduling and the Message column will tell you which veto fired. The rest of this post walks each veto in order of frequency.
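As a rough aid, the mapping from message text to veto can be sketched as a small shell function. The name classify_pending is hypothetical, and the substrings are taken from the sample messages quoted in this post; the PVC wording is my assumption, and real messages vary between Kubernetes versions.

```shell
# classify_pending: map an event message to one of the seven reasons.
# Hypothetical helper. A message can list several vetoes; only the
# first match is reported.
classify_pending() {
  case "$1" in
    *"Insufficient "*)          echo "Reason 1: resources" ;;
    *"node affinity/selector"*) echo "Reason 2: selector or affinity" ;;
    *"untolerated taint"*)      echo "Reason 3: taint" ;;
    # Assumed wording, not quoted in this post.
    *"PersistentVolumeClaim"*|*"persistentvolumeclaim"*)
                                echo "Reason 4: PVC" ;;
    *"Failed to pull image"*|*"ErrImagePull"*|*"ImagePullBackOff"*)
                                echo "Reason 5: image pull" ;;
    *"exceeded quota"*)         echo "Reason 6: quota" ;;
    *"admission webhook"*)      echo "Reason 7: admission webhook" ;;
    *)                          echo "unknown: read the full message" ;;
  esac
}
```

Paste the message in quotes, for example classify_pending "0/3 nodes are available: 3 Insufficient cpu." It is a memory aid, not a replacement for reading the message.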

Reason 1: not enough CPU or memory on any node

The single most common cause. The message looks like 0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory.

The scheduler is telling you that no node in the cluster has enough of the resource the pod requested.

The fix

Three options, in order of preference:

Lower the pod's resource requests if they are inflated. Most apps ask for 4x what they actually use. kubectl top pod only shows current usage, so sample it over a few days or check your metrics dashboards before cutting.
Scale up the node group if requests are honest. On EKS, GKE, and AKS this is one line in the autoscaler config.
Run Karpenter if the cluster autoscaler is slow. Karpenter typically provisions nodes in 30 to 60 seconds in 2026, versus 3 to 5 minutes for the Cluster Autoscaler.

If you see Insufficient ephemeral-storage, the pod requested more local disk than any node has free. Same fix shape: lower the request or grow the node.
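To see whether the requests are the problem, it helps to put the pod's requests next to what the nodes can offer. A minimal sketch, assuming kubectl is pointed at the right cluster; the function names pod_requests and node_allocatable are made up for illustration.

```shell
# pod_requests: print the resource requests of every container in a pod.
# Usage: pod_requests <pod-name> <namespace>
pod_requests() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# node_allocatable: print allocatable CPU and memory for each node.
# This is the node's total schedulable capacity, not what is still free.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

If a single container requests more than the largest node's allocatable value, no amount of waiting will help. For what is already committed on a node, the Allocated resources block in kubectl describe node <name> is the place to look.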

Reason 2: nodeSelector or affinity has no match

The message is 0/N nodes are available: N node(s) didn't match Pod's node affinity/selector.

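This means the pod has a nodeSelector or nodeAffinity rule that no current node satisfies. An unsatisfiable topologySpreadConstraints rule fails the same way, but with its own message: node(s) didn't match pod topology spread constraints.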

Common traps

Unquoted label values like gpu: true versus gpu: "true". K8s labels are strings, and YAML parses a bare true as a boolean.
Custom labels that were never applied to the node. Verify with kubectl get nodes --show-labels | grep <key>.
Zone constraints in single-zone clusters. A topology.kubernetes.io/zone: us-east-1c selector on a cluster with only 1a and 1b nodes will never schedule.

The fix is either relaxing the constraint, fixing the typo, or adding the label to a node with kubectl label node <name> <key>=<value>.
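A quick way to confirm the mismatch is to print the pod's selector and then ask which nodes carry that label. This is a sketch, and selector_check is a hypothetical name.

```shell
# selector_check: show a pod's nodeSelector, then list nodes with a label.
# Usage: selector_check <pod-name> <namespace> <label-key>=<label-value>
selector_check() {
  echo "nodeSelector on pod $1:"
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeSelector}{"\n"}'
  echo "nodes matching $3:"
  # Prints nothing on stdout when no node carries the label.
  kubectl get nodes -l "$3" -o name
}
```

An empty list under "nodes matching" is the whole diagnosis: either the label is missing from the nodes or the selector asks for the wrong value.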

Reason 3: a taint without a matching toleration

The message is 0/N nodes are available: N node(s) had untolerated taint {<key>: <value>}.

Taints are the scheduler's way of saying "only special pods land here." Common taints you will hit in 2026:

node.kubernetes.io/not-ready on a node that just rebooted
node.kubernetes.io/unschedulable on a cordoned node
nvidia.com/gpu on GPU nodes
karpenter.sh/disrupted during node consolidation

The fix

Either add a matching toleration to the pod spec, or untaint the node with kubectl taint nodes <name> <key>:NoSchedule-. The trailing minus is what removes the taint. I have spent embarrassing minutes hunting for that minus.
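Both halves of the fix can be sketched in shell: one function to see which taints exist, one to print a toleration you can paste into the pod spec. The function names are hypothetical, and the generated toleration assumes the taint's effect is NoSchedule.

```shell
# list_taints: print every node with the keys of its taints.
list_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# toleration_yaml: print a tolerations block for a NoSchedule taint.
# Paste it under spec (Pod) or spec.template.spec (Deployment).
# Usage: toleration_yaml <taint-key>
toleration_yaml() {
  cat <<EOF
tolerations:
- key: "$1"
  operator: "Exists"
  effect: "NoSchedule"
EOF
}
```

operator: Exists tolerates the taint whatever its value is. Use operator: Equal with a value if you want to match one specific value only.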

Reason 4: a PVC stuck in Pending

A pod that mounts a PersistentVolumeClaim cannot start until the PVC is Bound. If the PVC itself is Pending, the pod waits forever.

Check with kubectl get pvc -n <namespace>. If the STATUS column says Pending, the underlying problem is one of:

No storage class default in the cluster. Set one with kubectl patch storageclass <name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'.
The requested storage class has no provisioner. Check with kubectl get sc and look at the PROVISIONER column.
Quota on storage in the namespace (requests.storage or persistentvolumeclaims in a ResourceQuota), often missed.
A cluster without dynamic provisioning at all. Some on-prem clusters require pre-created PVs that match the PVC's selectors.

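The kubectl describe pvc output will tell you which of these fired. One caveat: with a storage class that uses volumeBindingMode: WaitForFirstConsumer, the PVC stays Pending by design until the pod is scheduled, so read the pod's events first.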
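The PVC checks above can be bundled into two small helpers. This is a sketch with hypothetical function names; both take the namespace as their only argument.

```shell
# pvc_status: print each PVC with its phase and storage class.
# Usage: pvc_status <namespace>
pvc_status() {
  kubectl get pvc -n "$1" \
    -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName'
}

# pvc_events: print events recorded against PVCs in the namespace.
# Usage: pvc_events <namespace>
pvc_events() {
  kubectl get events -n "$1" \
    --field-selector involvedObject.kind=PersistentVolumeClaim
}
```

A PVC with <none> in the CLASS column on a cluster with no default storage class is the first case in the list. The events usually name the provisioner problem for the others.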

Reason 5: image pull failure dressed up as Pending

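This one hides in plain sight. A pod's phase stays Pending until its containers have started, so a pod that is already scheduled to a node but cannot pull its image still counts as Pending. The actual failure is ErrImagePull or ImagePullBackOff, which kubectl get pods shows in the STATUS column.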

The events will say Failed to pull image. The common causes:

Private registry without imagePullSecret on the ServiceAccount or pod spec.
Wrong image tag that does not exist in the registry. Check the registry directly.
Rate limiting on Docker Hub, which kicks in around 100 anonymous pulls per six hours in 2026.

The fix

Authenticate to the registry, fix the tag, or move the image to a registry that does not rate limit. Most teams I work with now mirror Docker Hub images into ECR or Google Artifact Registry (the successor to GCR) for this exact reason.
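For the private registry case, the authentication step can be sketched as follows. The secret name regcred and the choice of the default ServiceAccount are assumptions for illustration; substitute whatever your workloads actually use.

```shell
# add_pull_secret: create a registry credential and attach it to the
# namespace's default ServiceAccount (both names are assumptions).
# Usage: add_pull_secret <namespace> <registry-server> <username> <password>
add_pull_secret() {
  kubectl create secret docker-registry regcred -n "$1" \
    --docker-server="$2" --docker-username="$3" --docker-password="$4"
  # New pods using this ServiceAccount pick up the secret automatically.
  kubectl patch serviceaccount default -n "$1" \
    -p '{"imagePullSecrets":[{"name":"regcred"}]}'
}
```

Passing a password as an argument leaves it in shell history, so read it from a variable or a secrets manager in real use. Existing pods do not pick up the change; delete them so their controller recreates them.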

Reason 6: a ResourceQuota or LimitRange in the namespace

The message will say something like exceeded quota: <quota-name>, requested: requests.cpu=2, used: requests.cpu=10, limited: requests.cpu=10.

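This is a namespace-level admission rejection, not a scheduling rejection. The pod never reaches the scheduler; the API server refuses to create it at all. What you usually see is a Deployment stuck below its desired replica count, with the quota message recorded as a FailedCreate event on the ReplicaSet rather than on a pod. That is easy to mistake for a Pending pod, which is confusing.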

Two fixes, depending on intent:

If the quota is correct, kill some old pods or lower the new pod's request.
If the quota is stale, bump it with kubectl edit resourcequota <name>.

kubectl describe namespace <name> shows quota usage at a glance.
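A sketch of the quota check as one helper. The name quota_check is hypothetical; the FailedCreate reason is what controllers such as ReplicaSets record when the API server refuses to create a pod for them.

```shell
# quota_check: show quota usage, then any failed pod creations.
# Usage: quota_check <namespace>
quota_check() {
  # Used versus hard limit for every ResourceQuota in the namespace.
  kubectl describe resourcequota -n "$1"
  # Creation failures recorded by controllers in the namespace.
  kubectl get events -n "$1" --field-selector reason=FailedCreate
}
```

If the second command prints an exceeded quota message, compare its requested and limited values against the first command's output to see which resource ran out.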

Reason 7: an admission webhook rejected the pod

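The newest member of the list. Pod Security Admission (the built-in controller that enforces the Pod Security Standards), Kyverno, OPA Gatekeeper, and similar policy engines now reject pods before scheduling in most production clusters. A rejected pod is never created, so the error shows up in the kubectl apply output or as a FailedCreate event on the owning ReplicaSet.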

The message looks like admission webhook "validate.kyverno.svc" denied the request: <reason>.

Common 2026 admission blocks

Pod Security Standards in restricted mode rejecting runAsUser: 0 or missing seccompProfile.
Kyverno or Gatekeeper policies requiring specific labels, image registries, or resource limits.
Network policy controllers that pre-validate networking before allowing the pod.

The fix depends on the policy. The general pattern is to either make the pod compliant or get an exception added to the policy. Don't disable the policy.
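You can test a manifest against the cluster's admission chain without creating anything by using a server-side dry run. A minimal sketch; admission_dry_run is a hypothetical wrapper name and the manifest path is whatever file you are about to apply.

```shell
# admission_dry_run: send a manifest through admission without persisting it.
# Usage: admission_dry_run <manifest-file>
admission_dry_run() {
  # --dry-run=server runs the request through the API server's admission
  # checks but does not store the object. Webhooks that declare side
  # effects are not called during a dry run.
  kubectl apply --dry-run=server -f "$1"
}
```

A denial prints the same message you would get from a real apply, so you can iterate on the pod spec until it passes.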

When the seven do not cover it

This is the honest part. Three less common cases break the model.

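The scheduler is itself down. If kube-scheduler is crashlooping or cut off from the API server, no pods schedule. On self-managed clusters, check kubectl get pods -n kube-system. On managed services the scheduler is hidden from you, so check the provider's control plane health instead.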

The node is full of system pods. Daemonsets and the kubelet reserve a slice of every node. On t3.small instances I have seen this leave only 200m CPU and 150Mi memory for workload pods, which fails for almost anything realistic.

Cluster autoscaler is misconfigured. It can refuse to add nodes because of cool-down periods, max-size limits, or unhealthy ASGs. The autoscaler logs are where you check, not the pod events.

Frequently asked questions

How long should a pod stay in Pending before I worry?
On a healthy cluster with autoscaling, 30 seconds is normal, 60 seconds is suspect, and 5 minutes means something is wrong. Always start with kubectl describe.

Why does my pod show Pending but the Events block is empty?
Events in K8s are kept for about an hour by default (the API server's --event-ttl setting). If the pod has been pending longer, the original FailedScheduling event may have aged out. Delete the pod, recreate it, and watch the events live with kubectl get events -w.

Does Karpenter solve Pending pods automatically?
Only the resource-shortage case (Reason 1). Karpenter will provision a node that fits the pod within a minute. It does nothing for taints, affinity, PVCs, image pulls, quotas, or admission webhooks.

Can I get scheduler logs to see exactly why?
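Yes, on managed clusters you can enable control plane logging for the scheduler. On EKS this is the scheduler log type in control plane logging, and on GKE you enable the scheduler component in control plane logs, which then land in Cloud Logging. The events on the pod are usually enough though.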

Is there a single command that diagnoses all seven?
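Not in core K8s. kubectl describe pod covers the reasons where a pod object exists (1 through 5). Quota and admission rejections are easier to find with kubectl get events or by describing the owning ReplicaSet, and for the autoscaler case you need autoscaler logs.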

What is the last Pending pod you debugged?

If you remember a Pending pod that took an hour to diagnose, drop the FailedScheduling message in the comments. I will tell you which of the seven it was and what I would have checked first. The fastest debuggers I know have just seen each veto enough times to recognize it on sight.
