Table Of Contents
- The Problem
- Why Autoscaling Feels Slow
- The Fix: Placeholder Pods
- How to Set It Up
- What Happens During a Real Spike
- Things to Keep in Mind
- Wrapping Up
The Problem
You've set up the Horizontal Pod Autoscaler (HPA) in your cluster. Your app gets a sudden spike in traffic, and your existing pods start to throttle under the heavy load.
The HPA kicks in: "Hey, I need 3 more pods to service this traffic!"
But instead of scaling instantly, those pods sit in a Pending state for 4–5 minutes. In that window:
- Requests are dropped.
- Latency spikes.
- You lose a huge number of customers.
Why are the pods stuck?
The Kubernetes scheduler can't place your pods because there is no room left on your existing nodes. This triggers the Cluster Autoscaler (CA) to provision a brand new node.
That process is slow:
- VM Provisioning: The cloud provider has to spin up a new instance.
- Node Bootstrapping: Joining the node to the cluster and installing dependencies.
- Image Pulling: Downloading your container images to the new node.
By the time the node is ready, the damage is already done.
Why Autoscaling Feels Slow
Kubernetes autoscaling operates in two distinct layers:
- HPA (Horizontal Pod Autoscaler): Scales pods based on metrics. This is fast (seconds).
- CA (Cluster Autoscaler): Adds new nodes when pods can't be scheduled. This is slow (3–5 minutes).
HPA reacts in seconds, but CA reacts in minutes. That gap is where your availability suffers.
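For reference, a minimal HPA driving the fast layer might look like this (the deployment name `my-app` and the 70% CPU target are illustrative, not from any specific setup):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```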
The Fix: Placeholder Pods
The fix is to deploy low-priority "dummy" (placeholder) pods that do nothing except reserve capacity on your nodes. When the HPA scales your app, the scheduler instantly evicts the dummies and hands their capacity to your real pods. The evicted dummy then has nowhere to go, which signals the Cluster Autoscaler to provision a new node. The dummy lands there, restoring the buffer for the next spike.
This ensures you always have warm capacity ready. The slow provisioning happens in the background, not in your user's critical path.
How to Set It Up
Step 1: Create a Low-Priority Class
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-pod-priority
value: -1
globalDefault: false
description: "Used for placeholder pods that can be evicted anytime"
```
A negative priority ensures any real pod—which defaults to priority 0—will always win. The scheduler will immediately evict the placeholder to make room for your application pod.
Step 2: Deploy the Placeholder Pods
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: placeholder
  template:
    metadata:
      labels:
        app: placeholder
    spec:
      priorityClassName: placeholder-pod-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: placeholder
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
```
Key details in this manifest:
- `pause` image: the smallest possible container; it does nothing and consumes virtually no resources.
- `resources.requests`: tells Kubernetes to reserve this specific amount of capacity. Match this roughly to your app's requirements.
- `terminationGracePeriodSeconds: 0`: makes the eviction instant, handing the spot to your real pod without any shutdown delay.
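With both manifests saved to files (the file names below are illustrative), deploying and checking the buffer is straightforward:

```shell
# Create the priority class first, then the placeholder deployment
kubectl apply -f placeholder-priority.yaml
kubectl apply -f placeholder-deployment.yaml

# All three placeholders should be Running, each holding 500m CPU / 512Mi
kubectl get pods -l app=placeholder -o wide
```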
Step 3: Verify Your App's Priority
If you haven't explicitly set a priorityClassName on your application deployment, it defaults to 0. Since 0 is higher than -1, your real pods will always preempt the placeholders automatically.
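The preemption rule itself is simple to reason about. Here is a minimal sketch (a toy model of the priority comparison, not the real scheduler) showing why a default-priority pod always displaces a placeholder, while another placeholder never would:

```python
# Toy model of priority-based preemption: when a node is full, the
# scheduler may evict pods with strictly lower priority to make room.

def find_victims(node_pods, incoming_priority, needed_cpu_m):
    """Pick lower-priority pods to evict until enough CPU is freed."""
    victims, freed = [], 0
    # Consider lowest-priority pods first
    for pod in sorted(node_pods, key=lambda p: p["priority"]):
        if freed >= needed_cpu_m:
            break
        if pod["priority"] < incoming_priority:
            victims.append(pod["name"])
            freed += pod["cpu_m"]
    return victims if freed >= needed_cpu_m else []

node = [
    {"name": "placeholder-1", "priority": -1, "cpu_m": 500},
    {"name": "app-old", "priority": 0, "cpu_m": 500},
]

# A real app pod (priority 0, needs 500m) evicts only the placeholder
print(find_victims(node, incoming_priority=0, needed_cpu_m=500))
# → ['placeholder-1']
```

Note that a second placeholder (priority -1) would find no victims at all, which is exactly what pushes it into Pending and wakes the Cluster Autoscaler.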
What Happens During a Real Spike
- Traffic increases → HPA requests 3 new pods.
- Scheduler looks for space → finds it (placeholder pods are holding it).
- Placeholder pods get evicted instantly → real pods schedule in seconds.
- Evicted placeholders are now in Pending state.
- Cluster Autoscaler sees Pending pods → provisions a new node.
- Placeholders land on the new node → buffer is restored for next time.
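The sequence above can be sketched as a toy timeline (the timings are illustrative, not measured): the user-visible delay is only the eviction, while the slow node provisioning happens afterwards, off the critical path.

```python
# Toy timeline of a spike absorbed by a placeholder buffer.
EVICTION_S = 1          # preemption with terminationGracePeriodSeconds: 0
NODE_PROVISION_S = 240  # VM boot + node bootstrap + image pull

def spike(buffer_pods, new_app_pods):
    events = []
    evicted = min(buffer_pods, new_app_pods)
    # Real pods land in the space the placeholders were holding
    events.append((EVICTION_S, f"{evicted} app pods scheduled"))
    # Evicted placeholders go Pending, which wakes the Cluster Autoscaler
    events.append((EVICTION_S, f"{evicted} placeholders Pending"))
    # The slow path runs in the background, not in the user's request path
    events.append((EVICTION_S + NODE_PROVISION_S, "new node ready, buffer restored"))
    return events

for t, what in spike(buffer_pods=3, new_app_pods=3):
    print(f"t+{t:>3}s: {what}")
```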
Things to Keep in Mind
- Cost Trade-off: Placeholder pods reserve real node capacity, meaning you are essentially paying for "warm" standby nodes.
- Namespace Scope: Deploy placeholders in the same namespace as your workloads, or tune them per-namespace based on criticality.
- Works Best with CA: This pattern targets the node provisioning delay specifically. If your nodes already have massive amounts of spare capacity, you don't need this.
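The cost trade-off is easy to quantify. Using the example manifest above (3 replicas at 500m CPU / 512Mi each), the standing reservation works out to:

```python
# Standing capacity reserved by the placeholder buffer
replicas = 3
cpu_millicores = 500
memory_mib = 512

total_cpu = replicas * cpu_millicores / 1000  # in vCPUs
total_mem = replicas * memory_mib / 1024      # in GiB

print(f"Reserved: {total_cpu} vCPU, {total_mem} GiB")
# → Reserved: 1.5 vCPU, 1.5 GiB
```

That 1.5 vCPU / 1.5 GiB is the price of the warm buffer; size the replica count and requests to match how big a spike you want to absorb instantly.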
Wrapping Up
Cluster Autoscaler is not broken—it's just slow by design because provisioning VMs takes time. Placeholder pods let you work with that constraint. Your HPA scales instantly into pre-warmed capacity, and the slow provisioning happens in the background where it belongs.