sushma korati

When Kubernetes Ignores Your Pods: Understanding Scheduler Failures

You’ve just created a Kubernetes Pod.
You’re happily waiting for it to become Running…

Seconds pass.
Nothing happens.

Those seconds turn into minutes.
Eventually, a considerable amount of time passes — okay, fine, 1/10th of an hour (6 minutes 😛) — which is basically forever in the Kubernetes world.

That’s when curiosity kicks in. You check the Pod events and Kubernetes greets you with a neatly drafted message:

0/18 nodes are available: 1 node(s) exceed max volume count, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 3 Insufficient memory, 3 node(s) had untolerated taint {dedicated: myapp}, 3 node(s) had untolerated taint {dedicated: mytestapp}, 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/18 nodes are available: 14 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.
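
If you want to pull this up yourself: the Pod's events appear at the bottom of the kubectl describe pod output, or you can query them directly (my-pod below is just a placeholder for your Pod's name):

kubectl describe pod my-pod        # events are listed at the bottom of the output
kubectl get events --field-selector involvedObject.name=my-pod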

To a seasoned Kubernetes engineer, this message might look straightforward — something that can be decoded with a few familiar kubectl commands.
But for someone new to the Kubernetes world, this single line can feel overwhelming..
In just one error message, Kubernetes expects you to understand taints, preemption, node affinity, resource constraints — all at once.

In this post, we’ll break this message down piece by piece and map each part back to the actual cluster state. By the end, the next time Kubernetes says “0/X nodes are available”, you won’t panic — you might even appreciate how clearly the scheduler is telling you why your Pod doesn’t belong anywhere..

Let's Begin with the Message....

Once again, here’s the message:

0/18 nodes are available: 1 node(s) exceed max volume count, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 3 Insufficient memory, 3 node(s) had untolerated taint {dedicated: myapp}, 3 node(s) had untolerated taint {dedicated: mytestapp}, 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/18 nodes are available: 14 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.

This message can be divided into two sections: the node section and the preemption section.

Node section: The first thing we notice is 0/18, which simply means the cluster has 18 worker nodes, and none of them are currently suitable for scheduling this Pod.

Now, ignore the text for a moment and just add the numbers:

18 = 1 + 2 + 3 + 3 + 3 + 6

Perfect!!!
This tells us Kubernetes evaluated all nodes and classified each one under exactly one reason.

Preemption section:
This part answers a different question: If Kubernetes evicts some running Pods, will that help schedule this Pod?

Again, since we have 18 nodes:

18 = 14 + 4

Note: Preemption cannot fix taints, affinity rules, or volume limits.

Is this true? (Be curious, question your k8s cluster 😉)

The numbers add up, which is good.
But how do we verify these claims and fix our Pod spec if possible?

  • x node(s) exceed max volume count
  • x node(s) had untolerated taint
  • x Insufficient memory
  • x node(s) didn't match Pod's node affinity/selector.
  • x Preemption is not helpful for scheduling
  • x No preemption victims found for incoming pod.

Let’s go through the most common and actionable ones.

First, get the list of nodes in the cluster (we'll describe individual nodes as we go):

kubectl get nodes              
NAME            STATUS   ROLES           AGE   VERSION
192.168.0.51    Ready    master,worker   18h   v1.29.14
192.168.0.52    Ready    master,worker   17h   v1.29.14
192.168.1.52    Ready    master,worker   17h   v1.29.14
....

Issue: Untolerated taint

This means the node has one or more taints that your Pod does not tolerate.

Check the taints on each node.

kubectl describe node 192.168.0.51
....
Taints:             dedicated=mytestapp:NoExecute
...
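
If you'd rather not describe 18 nodes one by one, a custom-columns query can list every node's taint keys in a single shot (the column names here are just examples):

kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'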

Solution: For the scheduler to consider this node, add the appropriate toleration to the Pod spec. (A toleration only allows the Pod onto the tainted node; if you also want it to land there, pair it with a matching nodeSelector or node affinity.)

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "mytestapp"
  effect: "NoExecute"

Issue: Insufficient memory

This means some of the nodes do not have enough allocatable memory to satisfy the Pod's memory request.

Check the resources available on the node:

kubectl describe node 192.168.xxx
...
Capacity:
  cpu:                4
  ephemeral-storage:  104742892Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16139256Ki
  pods:               110
Allocatable:
  cpu:                3910m
  ephemeral-storage:  101893885258
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             13695652659200m
  pods:               110
....

Next, check the resources requested by the Pod. (Remember: the scheduler compares the Pod's requests, not its actual usage, against each node's allocatable memory minus what other Pods have already requested; the "Allocated resources" section of kubectl describe node shows the latter.)

kubectl describe pod my-pod
....
Containers:
    Limits:
      memory:  999990Gi
      cpu:  1
    Requests:
      memory:   199990Gi
      cpu:  1

Solution: There are two ways to solve the issue:

  1. Re-check whether the Pod really needs that much memory; if not, reduce the request.
  2. Assess the importance of the Pod and assign it an appropriate priority (a sketch follows below). To check the current priority:

kubectl describe pod my-pod
.....
Priority:         0
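
For option 2, the usual route is a PriorityClass referenced by priorityClassName in the Pod spec. Here's a minimal sketch, assuming a hypothetical class called high-priority and a far more realistic memory request than the 199990Gi above:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority            # hypothetical name
value: 1000000                   # higher value = scheduled (and preempts) ahead of lower-priority Pods
globalDefault: false
description: "For Pods that may preempt lower-priority workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: my-image:latest       # placeholder image
    resources:
      requests:
        memory: "2Gi"            # right-sized request instead of 199990Gi
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "1"

Keep in mind that a higher priority only helps when preemption can actually free a fitting node; as the message above already hinted, it cannot override taints, affinity rules, or volume limits.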

Issue: Didn't match Pod's node affinity/selector.

Check if the pod has any affinities or selectors assigned to it.

kubectl describe pod my-pod
....
affinity:
    nodeAffinity:
      .....
    podAffinity:
      .....
    podAntiAffinity:
      .....
Node-Selector:
....
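
To see why nothing matches, compare those rules against the labels actually present on the nodes (dedicated=myapp here is only an assumption, borrowed from the taints we saw earlier):

kubectl get nodes --show-labels
kubectl get nodes -l dedicated=myapp      # shows only the nodes carrying this label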

Best Practice: It's always good to assess whether these constraints are actually needed, and remove them if they aren't.
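
If a hard affinity rule turns out to be "nice to have" rather than mandatory, one way to relax it is to move from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution, so the rule becomes a scored preference instead of a filter. A sketch (the label key and value are hypothetical; note the preferred form uses weight + preference rather than nodeSelectorTerms):

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50                                   # 1-100, higher = stronger preference
      preference:
        matchExpressions:
        - key: topology.kubernetes.io/zone         # example label key
          operator: In
          values:
          - zone-a                                 # hypothetical value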

Closing thoughts..

A quick reality check before the actual closing: although we can verify all these messages, not all of them are realistically solvable.

For example, “exceed max volume count” is exactly what it sounds like. There are only two ways out: 1. add a new node, which usually means extra cost ($$), or 2. detach or reduce the attached volumes (a truly genius solution, I know 😄). The same goes for most preemption-related messages. So for this post, I’m intentionally skipping these cases.

close to closing this now..
I hope you now understand why Kubernetes is refusing to place your Pods..

Kubernetes has given us immense control over scheduling — taints, tolerations, affinity, priorities, topology spread, and more.

But with great power comes great responsibility 😄
Before using these features, sit with your workload and ask:
What is mandatory?
What is nice to have?
What can be relaxed?

The scheduler is not broken — it’s doing exactly what we asked it to do.

If you’ve read this far, thanks!
Feel free to share feedback or your own scheduling stories 🙂
