HPA vs VPA vs KEDA: when to use which (decision tree)

#kubernetes #devops #tutorial #cloud

Quick Answer (TL;DR)

HPA (Horizontal Pod Autoscaler) adds or removes pod replicas based on CPU or memory load. VPA (Vertical Pod Autoscaler) resizes an existing pod's CPU and memory requests to fit actual usage. KEDA (Kubernetes Event-Driven Autoscaling) scales replicas based on external event sources like queue depth, database load, or cron schedules. Use HPA for stateless load response, VPA for right-sizing steady-state workloads, and KEDA for anything that scales on a non-CPU signal. They can be combined, with one important rule about not stacking HPA and VPA on the same metric.

Why they exist as separate tools

Kubernetes' original autoscaling story was horizontal only: more pods when CPU is high. That covers the web-tier case where each request is roughly the same cost. It fails on two other common shapes.

Right-sizing is a vertical problem. A pod that requests 500m CPU / 512Mi but actually uses 50m CPU / 200Mi is wasting cluster capacity; it does not need more replicas. VPA fixes this by adjusting the requests.
Event-driven scaling cannot use CPU because the trigger is external. A worker consuming from an SQS queue should scale on queue depth, not on the CPU of an idle worker waiting for a message. KEDA hooks up 60+ external metric sources natively.

Together, the three tools cover the web-tier shape plus the two shapes CPU-based HPA cannot handle. The decision tree below picks which one fits.

Fix #1: Use HPA when load is proportional to replicas

Best fit: stateless web tier, API servers, gRPC services, any workload where each replica handles independent traffic and CPU or memory scales linearly with load.

Setup:

Define resource requests on the pod (HPA needs a baseline).
Create an HPA manifest targeting the Deployment, with a Resource metric whose target is type: Utilization and averageUtilization: 70.
Test with load: kubectl run -it --rm load-generator --image=busybox -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://myservice; done"
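
As a rough sketch of the first two steps, the manifests might look like the following. The Deployment name `web`, the image, and all the numbers are placeholders chosen for illustration, not values from any real setup. The utilization target is a percentage of the container's CPU request, which is why the request has to be set first.

```yaml
# Hypothetical Deployment: name, image, and sizes are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0   # placeholder image
          resources:
            requests:
              cpu: 250m       # HPA utilization is computed against this request
              memory: 256Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # average across pods, as a % of the CPU request
```

After applying it, kubectl get hpa web --watch shows the current versus target utilization and the replica count while the load generator runs.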

Trap to avoid: HPA needs metrics-server running to read CPU and memory, and it typically takes 30 to 90 seconds to react, because metrics are scraped and evaluated on a fixed interval. Do not set the CPU target too high (85%+) or you will be behind the curve on real traffic spikes.

Fix #2: Use VPA when replicas are stable but requests are wrong

Best fit: internal batch workers, databases running as pods, ML inference pods with predictable resource needs, any single-replica or fixed-replica workload where the question is "how big should this pod be" not "how many should there be."

Setup:

Install VPA (three components: recommender, updater, admission controller).
Create a VPA object with updateMode: Auto (rewrites requests) or Off (recommend only, apply manually).
Let it observe for at least 24 hours before applying recommendations.
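
A minimal sketch of a recommend-only VPA object, assuming a Deployment named `worker` (a hypothetical name). Note the quotes around "Off": many YAML parsers read a bare Off as the boolean false, which the API then rejects.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker              # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"         # recommend only; quoted so YAML keeps it a string
```

Once the recommender has collected enough data, kubectl describe vpa worker-vpa lists the suggested requests per container in the status section.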

Trap to avoid: updateMode: Auto restarts pods when it changes requests. For long-running or stateful workloads, use Off mode and apply recommendations manually during a maintenance window.

Fix #3: Use KEDA when the trigger is external

Best fit: queue workers (SQS, RabbitMQ, Kafka), cron-based batch jobs, workloads that scale on database queue depth, custom metrics from Prometheus, or scale-to-zero patterns.

Setup:

Install KEDA via Helm: helm repo add kedacore https://kedacore.github.io/charts, then helm install keda kedacore/keda --namespace keda-system --create-namespace.
Create a ScaledObject that references your Deployment and points at the external metric source.
KEDA creates and manages the underlying HPA for you.
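
As an illustration of the second step, here is a sketch of a ScaledObject for an SQS-backed worker. The Deployment name, queue URL, account ID, region, and the `aws-credentials` TriggerAuthentication are all placeholders and would need to exist in your cluster; the numbers are arbitrary.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: queue-worker          # hypothetical Deployment in the same namespace
  minReplicaCount: 0            # the default; set to 1 to avoid cold starts
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-credentials   # hypothetical TriggerAuthentication, not shown here
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs  # placeholder
        queueLength: "5"        # target number of messages per replica
        awsRegion: us-east-1
```

After applying it, kubectl get hpa in the same namespace should show the HPA that KEDA generated for the ScaledObject.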

Trap to avoid: KEDA scales to zero by default when the source is idle. Cold-start time for the first pod on a new event matters. If cold-start is above 30 seconds, keep a minReplicaCount: 1 floor.

How to prevent conflicts between the three

The single most common mistake is running HPA and VPA on the same metric. If HPA scales replicas by CPU and VPA changes CPU requests, they fight each other and the workload thrashes.

Safe combinations:

HPA + VPA on different metrics. HPA on CPU, VPA on memory only (or vice versa) works. Configure VPA to only manage memory with resourcePolicy.containerPolicies[].controlledResources: ["memory"].
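CPU + event triggers in one KEDA ScaledObject. KEDA creates and owns an HPA for the Deployment, so do not add a second, hand-written HPA on the same workload; the two would compete with each other. To combine CPU with an event source, add a cpu trigger next to the event trigger in the ScaledObject, so that a single HPA evaluates both metrics.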
VPA in recommend-only mode. updateMode: Off combined with HPA is safe because VPA just publishes numbers, and you apply them manually.
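
The first combination could be sketched like this, assuming a Deployment named `web` (hypothetical) that already has an HPA targeting CPU. The VPA is restricted to memory so the two controllers never adjust the same resource.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                 # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # same Deployment the CPU-based HPA targets
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"              # apply to every container in the pod
        controlledResources: ["memory"] # leave CPU requests alone for the HPA
```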

Never combine:

HPA + VPA on the same resource (both on CPU or both on memory). This is the anti-pattern that thrashes workloads.

FAQ

Which one should I use if I only pick one?
HPA. It covers the majority of stateless workloads and is the lowest-effort. Add VPA once you have visible waste in pod requests, and KEDA when a real event-driven workload arrives.

Does KEDA replace HPA?
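No. KEDA wraps HPA and adds event sources. Under the hood there is still an HPA managing the pod count from one replica upward; KEDA itself handles only activation, that is, scaling between zero and one.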

Can VPA and HPA both target CPU safely?
Only if VPA is set to recommend-only mode (updateMode: Off). Otherwise they conflict.

What about the in-place resize KEP (Kubernetes 1.27+)?
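In-place pod resize lets a pod's requests change without restarting the pod. The underlying Kubernetes feature (InPlacePodVerticalScaling) was alpha in 1.27 and became beta, enabled by default, in 1.33; on older clusters you have to enable the feature gate yourself. VPA uses it through the InPlaceOrRecreate update mode in recent VPA releases, which falls back to recreating the pod when an in-place resize is not possible. Check the release notes for your Kubernetes and VPA versions for current maturity, and set resizePolicy on the container to control whether a resize restarts it.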
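
For reference, a container-level resizePolicy might look like the sketch below. The pod name, image, and sizes are placeholders; the choice to restart on memory changes but not on CPU changes is only one reasonable assumption, since some runtimes cannot pick up a new memory limit without a restart.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo             # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired        # apply CPU changes in place
        - resourceName: memory
          restartPolicy: RestartContainer   # restart the container on memory changes
```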

How does this interact with Karpenter?
HPA and KEDA add pod replicas; Karpenter provisions nodes for the new replicas. They work together cleanly. VPA changes pod requests; Karpenter sees the new requests and adjusts node bin-packing accordingly.

Related guides

The Kubernetes autoscaling docs cover the HPA API in detail.
KEDA's own docs at keda.sh have the current list of 60+ scaler types.
The VPA project lives at github.com/kubernetes/autoscaler with the recommender component's algorithm explained.