Most Kubernetes teams reach for HPA first. It's visible, familiar, and the CPU dashboard makes the decision feel obvious. When traffic spikes, pods scale out. Clean mental model.
The problem: HPA solves one specific failure mode — traffic-driven throughput degradation. An under-resourced pod doesn't need more replicas. It needs more CPU. More replicas of a starved pod just gives you more starved pods.
The Core Distinction
HPA and VPA are not two ways to do the same thing. They scale different dimensions:
HPA — Horizontal Pod Autoscaler
Scales replica count. Trigger: load (CPU, memory, custom metrics). Solves: traffic-driven saturation. Risk: cold-start amplification and latency spikes during scale-out.
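As a minimal sketch, a CPU-driven HPA for this pattern looks like the following (the `web-api` Deployment and the specific thresholds are placeholders, not from this post):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale out when average CPU crosses 70% of requested CPU
          averageUtilization: 70
```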
VPA — Vertical Pod Autoscaler
Scales resource requests and limits. Trigger: resource efficiency gap.
Solves: OOM kills, CPU throttling, mis-sized pods. Risk: eviction disruption, node fragmentation at scale.
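The VPA equivalent, for a workload it manages alone (workload name is a placeholder; note that Auto mode applies recommendations by evicting pods):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa           # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker             # hypothetical Deployment
  updatePolicy:
    # the updater may evict pods to apply new requests/limits
    updateMode: "Auto"
```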
HPA doesn't prevent OOM kills.
VPA doesn't absorb traffic bursts.
Applying the wrong one means you're solving for a failure that isn't happening while leaving the actual failure mode unaddressed.
The Trap Nobody Documents
Running both without coordination creates oscillation:
- VPA recommends larger CPU request → evicts pod to apply it
- HPA sees replica count drop → interprets as scale-in signal
- HPA removes a replica
- VPA recalculates on a smaller pool
- Cycle repeats
The result is instability driven entirely by the autoscalers fighting each other — not by any real workload condition. Nodes fragment. Scheduler pressure builds.
The coordination rule: VPA must not operate in Auto mode on any resource dimension HPA is also watching. In practice — VPA handles memory right-sizing, HPA handles CPU-driven replica scaling. Different axes, no interaction.
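In manifest form, that split can be expressed with VPA's `controlledResources` container policy (workload name is a placeholder): VPA right-sizes memory only, leaving the CPU axis entirely to HPA.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # never touch CPU; HPA owns that dimension
        controlledResources: ["memory"]
```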
The Decision Framework
Use HPA when:
- Stateless workloads with interchangeable replicas
- Traffic-driven, burst-shaped load patterns
- CPU is a reliable proxy for demand
- Individual pod sizing is already correct
Use VPA when:
- Steady, predictable load patterns
- Pods are consistently OOM-killed or CPU-throttled
- Resource requests were set by guesswork
- Right-sizing over time matters more than burst absorption
Use both — with constraints:
- VPA in Recommendation or Initial mode only (not Auto)
- VPA establishes correct baseline sizing
- HPA handles burst scaling above that baseline
- Never let their trigger dimensions overlap
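One way to sketch the constrained combination: VPA in Initial mode applies recommendations only when a pod is created (no evictions of running pods), establishing the baseline that a CPU-driven HPA then scales above. Names are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-baseline     # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api            # hypothetical Deployment
  updatePolicy:
    # apply recommendations only at pod creation; never evict to resize
    updateMode: "Initial"
```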
Scaling Decisions Are Cost Decisions
HPA adds pods — more replicas means more node capacity and more compute spend. Conservative scale-in thresholds mean you're often paying for idle capacity during transition periods.
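That trade-off is tunable: `autoscaling/v2` exposes scale-down behavior directly on the HPA spec. The values below are illustrative, not recommendations.

```yaml
# Fragment of an HPA spec: slow scale-in to avoid flapping,
# at the cost of paying for idle replicas a little longer.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # require 5 min of sustained low load
    policies:
      - type: Pods
        value: 1                      # remove at most one pod...
        periodSeconds: 60             # ...per minute
```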
VPA's value is bin-packing efficiency — right-sized pods fit more workloads on fewer nodes. But a stale VPA recommendation window produces oversized requests that waste capacity cluster-wide.
The autoscaler is the last decision. Diagnose the failure mode first. Then pick the tool.
Full post with decision framework, failure mode breakdown, and coordination rules: https://www.rack2cloud.com/vpa-vs-hpa-kubernetes/

