Most Kubernetes teams reach for HPA first. It's visible, familiar, and the CPU dashboard makes the decision feel obvious. When traffic spikes, pods scale out. Clean mental model.
The problem: HPA solves one specific failure mode — traffic-driven throughput degradation. An under-resourced pod doesn't need more replicas. It needs more CPU. More replicas of a starved pod just gives you more starved pods.
The Core Distinction
HPA and VPA are not two ways to do the same thing. They scale different dimensions:
HPA — Horizontal Pod Autoscaler
Scales replica count. Trigger: load (CPU, memory, custom metrics). Solves: traffic-driven saturation. Risk: cold-start amplification and latency spikes during scale-out.
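As a minimal sketch, a CPU-driven HPA for this pattern looks like the following (the `web-api` Deployment and the specific thresholds are placeholders, not from this post):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale out when average CPU crosses 70% of requested CPU
          averageUtilization: 70
```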
VPA — Vertical Pod Autoscaler
Scales resource requests and limits. Trigger: resource efficiency gap.
Solves: OOM kills, CPU throttling, mis-sized pods. Risk: eviction disruption, node fragmentation at scale.
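The VPA equivalent, for a workload it manages alone (workload name is a placeholder; note that Auto mode applies recommendations by evicting pods):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa           # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker             # hypothetical Deployment
  updatePolicy:
    # the updater may evict pods to apply new requests/limits
    updateMode: "Auto"
```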
HPA doesn't prevent OOM kills.
VPA doesn't absorb traffic bursts.
Applying the wrong one means you're solving for a failure that isn't happening while leaving the actual failure mode unaddressed.
The Trap Nobody Documents
Running both without coordination creates oscillation:
- VPA recommends larger CPU request → evicts pod to apply it
- HPA sees replica count drop → interprets as scale-in signal
- HPA removes a replica
- VPA recalculates on a smaller pool
- Cycle repeats
The result is instability driven entirely by the autoscalers fighting each other — not by any real workload condition. Nodes fragment. Scheduler pressure builds.
The coordination rule: VPA must not operate in Auto mode on any resource dimension HPA is also watching. In practice — VPA handles memory right-sizing, HPA handles CPU-driven replica scaling. Different axes, no interaction.
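In manifest form, that split can be expressed with VPA's `controlledResources` container policy (workload name is a placeholder): VPA right-sizes memory only, leaving the CPU axis entirely to HPA.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # never touch CPU; HPA owns that dimension
        controlledResources: ["memory"]
```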
The Decision Framework
Use HPA when:
- Stateless workloads with interchangeable replicas
- Traffic-driven, burst-shaped load patterns
- CPU is a reliable proxy for demand
- Individual pod sizing is already correct
Use VPA when:
- Steady, predictable load patterns
- Pods are consistently OOM-killed or CPU-throttled
- Resource requests were set by guesswork
- Right-sizing over time matters more than burst absorption
Use both — with constraints:
- VPA in Recommendation or Initial mode only (not Auto)
- VPA establishes correct baseline sizing
- HPA handles burst scaling above that baseline
- Never let their trigger dimensions overlap
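One way to sketch the constrained combination: VPA in Initial mode applies recommendations only when a pod is created (no evictions of running pods), establishing the baseline that a CPU-driven HPA then scales above. Names are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-baseline     # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  updatePolicy:
    # apply recommendations only at pod creation; never evict to resize
    updateMode: "Initial"
```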
Scaling Decisions Are Cost Decisions
HPA adds pods — more replicas means more node capacity and more compute spend. Conservative scale-in thresholds mean you're often paying for idle capacity during transition periods.
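That trade-off is tunable: `autoscaling/v2` exposes scale-down behavior directly on the HPA spec. The values below are illustrative, not recommendations.

```yaml
# Fragment of an HPA spec: slow scale-in to avoid flapping,
# at the cost of paying for idle replicas a little longer.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # require 5 min of sustained low load
    policies:
      - type: Pods
        value: 1                      # remove at most one pod...
        periodSeconds: 60             # ...per minute
```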
VPA's value is bin-packing efficiency — right-sized pods fit more workloads on fewer nodes. But a stale VPA recommendation window produces oversized requests that waste capacity cluster-wide.
The autoscaler is the last decision. Diagnose the failure mode first. Then pick the tool.
Full post with decision framework, failure mode breakdown, and coordination rules: https://www.rack2cloud.com/vpa-vs-hpa-kubernetes/

