Sakthivel C

Why Your Kubernetes Pods Scale Slowly (And How to Fix It)

The Problem

You've set up the Horizontal Pod Autoscaler (HPA) in your cluster. Your app gets a sudden spike in traffic, and your existing pods start to throttle under the heavy load.

The HPA kicks in: "Hey, I need 3 more pods to service this traffic!"

But instead of scaling instantly, those pods sit in a Pending state for 4–5 minutes. In that window:

  • Requests are dropped.
  • Latency spikes.
  • You lose customers who hit errors and give up.

Why are the pods stuck?

The Kubernetes scheduler can't place your pods because there is no room left on your existing nodes. This triggers the Cluster Autoscaler (CA) to provision a brand new node.

That process is slow:

  1. VM Provisioning: The cloud provider has to spin up a new instance.
  2. Node Bootstrapping: Joining the node to the cluster and installing dependencies.
  3. Image Pulling: Downloading your container images to the new node.

By the time the node is ready, the damage is already done.


Why Autoscaling Feels Slow

Kubernetes autoscaling operates in two distinct layers:

  • HPA (Horizontal Pod Autoscaler): Scales pods based on metrics. This is fast (seconds).
  • CA (Cluster Autoscaler): Adds new nodes when pods can't be scheduled. This is slow (3–5 minutes).

HPA reacts in seconds, but CA reacts in minutes. That gap is where your availability suffers.
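For reference, the fast layer is just a standard HPA manifest. Here is a minimal sketch using the autoscaling/v2 API (the Deployment name web-app and the 70% CPU target are illustrative, not from this article):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app        # hypothetical app Deployment
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when avg CPU exceeds 70% of requests
```

The HPA reacts quickly because it only creates Pod objects; the slow part is the CA finding node capacity for them.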


The Fix: Placeholder Pods


The Concept: Keep "dummy" pods running on your nodes to reserve space. They do nothing but hold capacity. When a real pod needs that space, Kubernetes evicts the dummy immediately, and your real pod schedules without waiting.

The evicted dummy then has nowhere to go, which signals the Cluster Autoscaler to provision a new node. The dummy lands there—restoring the buffer for the next spike.

This ensures you always have warm capacity ready. The slow provisioning happens in the background, not in your user's critical path.


How to Set It Up

Step 1: Create a Low-Priority Class

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-pod-priority
value: -1
globalDefault: false
description: "Used for placeholder pods that can be evicted anytime"

A negative priority ensures any real pod—which defaults to priority 0—will always win. The scheduler will immediately evict the placeholder to make room for your application pod.

Step 2: Deploy the Placeholder Pods

apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: placeholder
  template:
    metadata:
      labels:
        app: placeholder
    spec:
      priorityClassName: placeholder-pod-priority
      terminationGracePeriodSeconds: 0
      containers:
        - name: placeholder
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

Key details in this manifest:

  • pause image: This is the smallest possible container; it does nothing and consumes virtually no resources.
  • resources.requests: This tells Kubernetes to reserve this specific amount of space. Size it so that evicting one placeholder frees enough room for at least one of your application pods.
  • terminationGracePeriodSeconds: 0: Ensures the eviction is instant, handing the spot to your real pod without any shutdown delay.

Step 3: Verify Your App's Priority

If you haven't explicitly set a priorityClassName on your application deployment, it defaults to 0 (assuming no PriorityClass in your cluster is marked globalDefault: true). Since 0 is higher than -1, your real pods will always preempt the placeholders automatically.
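If you'd rather make the relationship explicit than rely on the default, you can give application pods their own PriorityClass and reference it in the pod template. A sketch (the name app-priority and value 1000 are illustrative choices, not requirements):

```yaml
# Explicit PriorityClass for application pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: app-priority
value: 1000              # any value above -1 outranks the placeholders
globalDefault: false
description: "Application pods; always preempt placeholder pods"
```

Then set priorityClassName: app-priority in your application Deployment's pod spec, the same way the placeholder manifest above references placeholder-pod-priority.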


What Happens During a Real Spike

  1. Traffic increases → HPA requests 3 new pods.
  2. Scheduler looks for space → finds it (placeholder pods are holding it).
  3. Placeholder pods get evicted instantly → real pods schedule in seconds.
  4. Evicted placeholders are now in Pending state.
  5. Cluster Autoscaler sees Pending pods → provisions a new node.
  6. Placeholders land on the new node → buffer is restored for next time.

Things to Keep in Mind

  • Cost Trade-off: Placeholder pods reserve real node capacity, meaning you are essentially paying for "warm" standby nodes.
  • Namespace Scope: Deploy placeholders in the same namespace as your workloads, or tune them per-namespace based on criticality.
  • Works Best with CA: This pattern targets the node provisioning delay specifically. If your nodes already have massive amounts of spare capacity, you don't need this.
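As a sketch of the per-namespace tuning mentioned above, you can run a smaller buffer in a critical namespace by reusing the same pattern with scaled-down requests (the namespace critical and the sizes here are illustrative):

```yaml
# Smaller warm buffer scoped to a hypothetical "critical" namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-critical
  namespace: critical
spec:
  replicas: 1            # fewer replicas = smaller standby cost
  selector:
    matchLabels:
      app: placeholder-critical
  template:
    metadata:
      labels:
        app: placeholder-critical
    spec:
      priorityClassName: placeholder-pod-priority
      terminationGracePeriodSeconds: 0
      containers:
        - name: placeholder
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "250m"      # sized to one critical-app pod, not a full node
              memory: "256Mi"
```

The replica count and requests are the two knobs: together they determine how much warm capacity you pay for per namespace.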

Wrapping Up

Cluster Autoscaler is not broken—it's just slow by design because provisioning VMs takes time. Placeholder pods let you work with that constraint. Your HPA scales instantly into pre-warmed capacity, and the slow provisioning happens in the background where it belongs.
