Justin Joseph

Originally published at clockhash.com

Kubernetes Autoscaling: HPA vs VPA vs KEDA — Which One When?

Your pod is spiking to 8GB memory. HPA won't help. Neither will VPA alone. You need a decision framework, not three tools fighting each other.

HPA: Reactive scaling by metrics

Horizontal Pod Autoscaler watches CPU/memory and adds more replicas. It's the default, it works, and it's best for stateless workloads with unpredictable traffic.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

When to use: Web APIs, microservices, event processors scaling with user load.

Gotcha: Doesn't solve resource inefficiency. If your pod only needs 256Mi but requests 512Mi, HPA scales at the wrong threshold.

VPA: Right-sizing without replica chaos

Vertical Pod Autoscaler recommends CPU/memory requests, then evicts and restarts pods with corrected specs. It's about efficiency and cost, not handling spikes.

When to use: Batch jobs, ML inference, long-running services where you've been guessing at requests.

Gotcha: Causes brief downtime on updates. Pair with PodDisruptionBudgets. Not for latency-sensitive workloads.
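A minimal VPA manifest looks like this — the Deployment name, bounds, and label are placeholders, so adapt them to your workload:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: batch-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: batch-worker
  updatePolicy:
    updateMode: "Auto"       # evict and recreate pods with updated requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply bounds to all containers
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 2Gi
```

The `minAllowed`/`maxAllowed` bounds keep the recommender from drifting to extremes while it learns the workload's real usage.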

KEDA: Events and custom metrics at scale

KEDA (Kubernetes Event-Driven Autoscaling) scales on practically anything: Kafka consumer lag, AWS SQS queue depth, HTTP requests per second, Prometheus custom metrics.

When to use: Event-driven architecture. Queue-backed workers. Application-specific scaling logic. "Scale to zero" scenarios.

Gotcha: Most powerful, most complex. KEDA runs as an operator plus a metrics adapter in your cluster, and every event source needs its own scaler configuration and credentials.
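Here's a sketch of a KEDA `ScaledObject` that scales a worker Deployment on Kafka consumer lag — the broker address, topic, and Deployment name are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker-scaler
spec:
  scaleTargetRef:
    name: order-worker        # Deployment to scale
  minReplicaCount: 0          # scale to zero when the queue drains
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: order-workers
      topic: orders
      lagThreshold: "50"      # target lag per replica
```

`minReplicaCount: 0` is the scale-to-zero case HPA can't express — HPA's floor is one replica.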

The decision matrix

| Scenario | Tool |
| --- | --- |
| REST API with traffic spikes | HPA |
| Over-resourced batch jobs | VPA |
| Kafka consumer backlog | KEDA |
| Bursty ML inference | HPA + VPA together |
| Scheduled workloads | KEDA + cron |
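For the scheduled-workloads row, KEDA's built-in `cron` scaler holds a replica count inside a time window — names and times below are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-worker-scaler
spec:
  scaleTargetRef:
    name: report-worker
  triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: 0 8 * * *        # scale up at 08:00 UTC
      end: 0 18 * * *         # scale back down at 18:00 UTC
      desiredReplicas: "5"
```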

The real answer

Use HPA as your baseline. Add VPA in dev to find right-sizing. Deploy KEDA when HPA metrics don't match your business—like scaling workers off queue depth instead of CPU.
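One safe way to pair them, since HPA and VPA shouldn't both act on the same CPU/memory metrics: run VPA in recommendation-only mode alongside HPA, read its suggestions, and apply them manually. A sketch, with a hypothetical target name:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # recommendations only; no evictions, no HPA conflict
```

`kubectl describe vpa api-server-vpa` then shows the recommended requests without touching running pods.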

HashInfra handles the orchestration layer for you. Instead of juggling three autoscalers across environments, our managed platform auto-configures scaling policies based on workload patterns and cost targets. Worth exploring if you're scaling beyond one or two clusters: hashinfra.com.


TL;DR

  • HPA = more replicas when load spikes (stateless services)
  • VPA = right-size requests to cut waste (batch/background jobs)
  • KEDA = scale on events/custom metrics (queue workers, event-driven apps)



ClockHash Technologies — DevOps · AI · Cloud · Built for Engineers

Products:
HashInfra · HashSecured · HashNodes · AlphaInterface

Free Tools:
AutoCI/CD · CloudAsh · DockHash

Services:
DevOps Consulting · AI/ML Development · App Development · Remote Tech Teams
