Kubernetes Autoscaling: HPA vs VPA vs KEDA — Which One When?
Your pod is spiking to 8GB memory. HPA won't help. Neither will VPA alone. You need a decision framework, not three tools fighting each other.
HPA: Reactive scaling by metrics
Horizontal Pod Autoscaler watches CPU/memory and adds more replicas. It's the default, it works, and it's best for stateless workloads with unpredictable traffic.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization    # required in autoscaling/v2
        averageUtilization: 70
```
When to use: Web APIs, microservices, event processors scaling with user load.
Gotcha: Doesn't solve resource inefficiency. If your pod only needs 256Mi but requests 512Mi, HPA scales at the wrong threshold.
VPA: Right-sizing without replica chaos
Vertical Pod Autoscaler recommends CPU/memory requests, then evicts and restarts pods with corrected specs. It's about efficiency and cost, not handling spikes.
When to use: Batch jobs, ML inference, long-running services where you've been guessing at requests.
Gotcha: Causes brief downtime on updates. Pair with PodDisruptionBudgets. Not for latency-sensitive workloads.
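A minimal VPA manifest for the same `api-server` Deployment might look like the sketch below. Setting `updateMode: "Off"` keeps VPA in recommendation-only mode, so it reports right-sized requests without evicting anything; the min/max bounds shown are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    # "Off" = recommendations only; switch to "Auto" to let VPA
    # evict and restart pods with corrected requests.
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
```

Read the recommendations with `kubectl describe vpa api-server-vpa` before flipping to `Auto`.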
KEDA: Events and custom metrics at scale
KEDA (Kubernetes Event-Driven Autoscaling) scales on practically anything: Kafka consumer lag, AWS SQS queue depth, HTTP requests per second, custom Prometheus metrics.
When to use: Event-driven architecture. Queue-backed workers. Application-specific scaling logic. "Scale to zero" scenarios.
Gotcha: Most powerful, most complex. It adds its own operator and metrics server to the cluster, and every scaler needs trigger configuration and, usually, credentials for the external system.
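As a sketch, a KEDA ScaledObject that scales a hypothetical `order-worker` Deployment on Kafka consumer lag could look like this (broker address, consumer group, topic, and thresholds are all placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker-scaler
spec:
  scaleTargetRef:
    name: order-worker        # Deployment to scale
  minReplicaCount: 0          # scale to zero when the topic is drained
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: order-workers
      topic: orders
      lagThreshold: "100"     # target lag per replica
```

Under the hood KEDA drives a regular HPA from this spec, which is why it can coexist with plain HPA elsewhere in the cluster.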
The decision matrix
| Scenario | Tool |
|---|---|
| REST API with traffic spikes | HPA |
| Over-resourced batch jobs | VPA |
| Kafka consumer backlog | KEDA |
| Bursty ML inference | HPA + VPA together |
| Scheduled workloads | KEDA + cron |
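For the "KEDA + cron" row, the cron scaler handles predictable schedules without any external metric. A trigger sketch (times, timezone, and replica count are illustrative) for a ScaledObject:

```yaml
triggers:
- type: cron
  metadata:
    timezone: Europe/Berlin
    start: 0 8 * * *          # scale up at 08:00
    end: 0 18 * * *           # scale back down at 18:00
    desiredReplicas: "10"
```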
The real answer
Use HPA as your baseline. Run VPA in recommendation mode to find the right requests. Deploy KEDA when HPA metrics don't match your business, like scaling workers off queue depth instead of CPU.
HashInfra handles the orchestration layer for you. Instead of juggling three autoscalers across environments, our managed platform auto-configures scaling policies based on workload patterns and cost targets. Worth exploring if you're scaling beyond one or two clusters: hashinfra.com.
TL;DR
- HPA = more replicas when load spikes (stateless services)
- VPA = right-size requests to cut waste (batch/background jobs)
- KEDA = scale on events/custom metrics (queue workers, event-driven apps)
Originally published on the ClockHash Engineering Blog.