Kubernetes Autoscaling: HPA vs VPA vs KEDA — Which One When?
Your pod is spiking to 8GB memory. HPA won't help. Neither will VPA alone. You need a decision framework, not three tools fighting each other.
HPA: Reactive scaling by metrics
Horizontal Pod Autoscaler watches CPU/memory and adds more replicas. It's the default, it works, and it's best for stateless workloads with unpredictable traffic.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization    # required in autoscaling/v2
        averageUtilization: 70
```
When to use: Web APIs, microservices, event processors scaling with user load.
Gotcha: Doesn't solve resource inefficiency. If your pod only needs 256Mi but requests 512Mi, HPA scales at the wrong threshold.
VPA: Right-sizing without replica chaos
Vertical Pod Autoscaler recommends CPU/memory requests, then evicts and restarts pods with corrected specs. It's about efficiency and cost, not handling spikes.
When to use: Batch jobs, ML inference, long-running services where you've been guessing at requests.
Gotcha: Causes brief downtime on updates. Pair with PodDisruptionBudgets. Not for latency-sensitive workloads.
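A minimal VPA manifest for the same `api-server` Deployment might look like the sketch below. Setting `updateMode: "Off"` keeps VPA in recommendation-only mode, so it reports right-sized requests without evicting anything; the min/max bounds shown are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    # "Off" = recommendations only; switch to "Auto" to let VPA
    # evict and restart pods with corrected requests.
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
```

Read the recommendations with `kubectl describe vpa api-server-vpa` before flipping to `Auto`.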
KEDA: Events and custom metrics at scale
KEDA (Kubernetes Event-Driven Autoscaling) scales on practically anything: Kafka consumer lag, AWS SQS queue depth, HTTP requests per second, custom Prometheus metrics.
When to use: Event-driven architecture. Queue-backed workers. Application-specific scaling logic. "Scale to zero" scenarios.
Gotcha: Most powerful, most complex. It adds its own operator and metrics server to the cluster, and every scaler needs trigger configuration and, usually, credentials for the external system.
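As a sketch, a KEDA ScaledObject that scales a hypothetical `order-worker` Deployment on Kafka consumer lag could look like this (broker address, consumer group, topic, and thresholds are all placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker-scaler
spec:
  scaleTargetRef:
    name: order-worker        # Deployment to scale
  minReplicaCount: 0          # scale to zero when the topic is drained
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: order-workers
      topic: orders
      lagThreshold: "100"     # target lag per replica
```

Under the hood KEDA drives a regular HPA from this spec, which is why it can coexist with plain HPA elsewhere in the cluster.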
The decision matrix
| Scenario | Tool |
|---|---|
| REST API with traffic spikes | HPA |
| Over-resourced batch jobs | VPA |
| Kafka consumer backlog | KEDA |
| Bursty ML inference | HPA + VPA together |
| Scheduled workloads | KEDA + cron |
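For the "KEDA + cron" row, the cron scaler handles predictable schedules without any external metric. A trigger sketch (times, timezone, and replica count are illustrative) for a ScaledObject:

```yaml
triggers:
- type: cron
  metadata:
    timezone: Europe/Berlin
    start: 0 8 * * *          # scale up at 08:00
    end: 0 18 * * *           # scale back down at 18:00
    desiredReplicas: "10"
```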
The real answer
Use HPA as your baseline. Run VPA in recommendation mode to find the right requests. Deploy KEDA when HPA metrics don't match your business, like scaling workers off queue depth instead of CPU.
HashInfra handles the orchestration layer for you. Instead of juggling three autoscalers across environments, our managed platform auto-configures scaling policies based on workload patterns and cost targets. Worth exploring if you're scaling beyond one or two clusters: hashinfra.com.
TL;DR
- HPA = more replicas when load spikes (stateless services)
- VPA = right-size requests to cut waste (batch/background jobs)
- KEDA = scale on events/custom metrics (queue workers, event-driven apps)
Originally published on the ClockHash Engineering Blog.