
NTCTech

Posted on • Originally published at rack2cloud.com

VPA vs HPA in Kubernetes: Why Most Teams Choose the Wrong Autoscaler

Most Kubernetes teams reach for HPA first. It's visible, familiar, and the CPU dashboard makes the decision feel obvious. When traffic spikes, pods scale out. Clean mental model.

The problem: HPA solves one specific failure mode — traffic-driven throughput degradation. An under-resourced pod doesn't need more replicas. It needs more CPU. More replicas of a starved pod just give you more starved pods.

The Core Distinction

(Diagram: VPA vs HPA scaling dimensions — throughput vs stability tradeoff)

HPA and VPA are not two ways to do the same thing. They scale different dimensions:

HPA — Horizontal Pod Autoscaler
Scales replica count. Trigger: load (CPU, memory, custom metrics).
Solves: traffic-driven saturation. Risk: cold start amplification,
latency spikes during scale-out.
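For concreteness, a minimal HPA manifest sketch (autoscaling/v2) targeting CPU utilization — the workload name and thresholds here are illustrative, not from the post:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of the CPU *request*
```

Note that utilization is measured as a percentage of the pod's resource request — which is exactly why mis-sized requests distort HPA decisions.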

VPA — Vertical Pod Autoscaler
Scales resource requests and limits. Trigger: resource efficiency gap.
Solves: OOM kills, CPU throttling, mis-sized pods. Risk: eviction disruption, node fragmentation at scale.
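The VPA equivalent is a separate CRD (autoscaling.k8s.io/v1). A recommendation-only sketch, again with hypothetical names:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"        # recommendation-only; "Auto" evicts pods to resize them
```

In "Off" mode VPA still publishes recommendations in its status, so you can review sizing before letting it act.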

HPA doesn't prevent OOM kills.
VPA doesn't absorb traffic bursts.

Applying the wrong one means you're solving for a failure that isn't happening while leaving the actual failure mode unaddressed.

The Trap Nobody Documents

Running both without coordination creates oscillation:

  1. VPA raises the CPU request → evicts the pod to apply it
  2. The larger request lowers measured utilization (HPA reads CPU as a percentage of the request)
  3. HPA interprets the lower utilization as a scale-in signal → removes a replica
  4. Load concentrates on fewer pods → VPA recalculates and raises requests again
  5. Cycle repeats

The result is instability driven entirely by the autoscalers fighting each other — not by any real workload condition. Nodes fragment. Scheduler pressure builds.

The coordination rule: VPA must not operate in Auto mode on any resource dimension HPA is also watching. In practice — VPA handles memory right-sizing, HPA handles CPU-driven replica scaling. Different axes, no interaction.
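That split can be enforced declaratively via the VPA v1 `resourcePolicy` field — a fragment of a VerticalPodAutoscaler spec:

```yaml
  # fragment of a VerticalPodAutoscaler spec
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # VPA right-sizes memory only; CPU stays HPA's axis
```

With `controlledResources` limited to memory, VPA never touches the CPU requests that HPA's utilization math depends on.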

The Decision Framework

Use HPA when:

  • Stateless workloads with interchangeable replicas
  • Traffic-driven, burst-shaped load patterns
  • CPU is a reliable proxy for demand
  • Individual pod sizing is already correct

Use VPA when:

  • Steady, predictable load patterns
  • Pods are consistently OOM-killed or CPU-throttled
  • Resource requests were set by guesswork
  • Right-sizing over time matters more than burst absorption

Use both — with constraints:

  • VPA in recommendation-only mode (updateMode: "Off") or Initial mode (not Auto)
  • VPA establishes correct baseline sizing
  • HPA handles burst scaling above that baseline
  • Never let their trigger dimensions overlap
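A sketch of the constrained pairing — VPA in Initial mode sets the baseline only when pods are created, HPA scales replicas on CPU above it (names are hypothetical):

```yaml
# VPA sizes pods at creation time only — no runtime evictions to fight HPA
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # keep VPA off the CPU axis HPA watches
---
# HPA handles bursts above that baseline, on CPU only
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```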

(Diagram: VPA and HPA combined mode architecture, showing the feedback loop risk)

Scaling Decisions Are Cost Decisions

HPA adds pods — more replicas means more node capacity and more compute spend. Conservative scale-in thresholds mean you're often paying for idle capacity during transition periods.
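That transition-period cost is tunable via the autoscaling/v2 `behavior` field — a fragment of an HPA spec (values illustrative):

```yaml
  # fragment of a HorizontalPodAutoscaler spec
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 min of sustained low load before removing pods
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of replicas per period
          periodSeconds: 60
```

A longer stabilization window damps replica thrash at the price of holding idle capacity slightly longer.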

VPA's value is bin-packing efficiency — right-sized pods fit more workloads on fewer nodes. But a stale VPA recommendation window produces oversized requests that waste capacity cluster-wide.

The autoscaler is the last decision. Diagnose the failure mode first. Then pick the tool.


Full post with decision framework, failure mode breakdown, and coordination rules: https://www.rack2cloud.com/vpa-vs-hpa-kubernetes/
