Adeoye Malumi

Day 17/40 — Kubernetes Autoscaling: HPA vs VPA Explained With Hands-On Practice

If you've ever wondered how Kubernetes knows when to spin up more pods or give a pod more memory, that's autoscaling — and it's one of those concepts that sounds intimidating until you actually do it yourself. Day 17 of the #40DaysOfKubernetes challenge is where it clicked for me.

What is Autoscaling in Kubernetes?

At its core, autoscaling means Kubernetes adjusts resources automatically based on demand. You don't manually intervene every time traffic spikes. There are two main types:

  • HPA (Horizontal Pod Autoscaler): adds or removes pod replicas based on observed metrics such as CPU or memory utilization
  • VPA (Vertical Pod Autoscaler): adjusts the CPU/memory requests of a workload's pods instead of changing how many pods there are

Think of HPA as hiring more staff when the shop gets busy. VPA is more like giving one staff member more tools to handle the workload alone.

What I Did — Setting Up HPA

First I deployed the sample php-apache app with defined CPU requests and limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m

---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

After I applied the YAML file, the pod was up and running.

The key part is setting resources.requests.cpu — HPA needs this to calculate utilization. Without it, the autoscaler has nothing to measure against.
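To make that dependency concrete, here is a minimal sketch of the arithmetic, assuming utilization is simply measured CPU usage divided by the requested CPU. The helper name `utilization_percent` is hypothetical and only for illustration; it is not part of kubectl or Kubernetes.

```shell
#!/bin/sh
# Hypothetical helper: CPU utilization as a percentage of the CPU request.
# Both arguments are in millicores (200 means "200m").
# Integer division, so the result is truncated to a whole percent.
utilization_percent() {
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

# With the 200m request from the Deployment above:
utilization_percent 100 200   # prints 50  (exactly at a 50% target)
utilization_percent 300 200   # prints 150 (well above a 50% target)
```

With no request set, the divisor is missing, so a utilization percentage cannot be computed at all.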

Then I created the HPA object targeting 50% average CPU utilization, with a minimum of 1 pod and maximum of 10. This is the declarative method:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

The imperative equivalent is a single command. Use one method or the other, since both create an HPA named php-apache:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

With the autoscaler applied, the HPA setup was complete.

Generating Load to Watch It Scale

This is the fun part. I ran a load generator in a separate pod — basically a loop hammering the apache service with requests:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Then watched the HPA respond in real time:

kubectl get hpa php-apache --watch

Watching the replica count climb from 1 to several pods as CPU utilization crossed 50% made the whole concept land in a way that reading documentation never does.
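The replica counts the HPA settles on follow a formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), bounded by minReplicas and maxReplicas. The sketch below reproduces only that arithmetic. The function name `desired_replicas` is hypothetical, and the sketch leaves out the controller's tolerance band and scale-down stabilization, so real behavior can differ from what it prints.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented HPA formula:
#   desired = ceil(current_replicas * current_util / target_util)
# then clamped to the [min, max] replica range.
# Arguments: current_replicas current_util% target_util% min max
desired_replicas() {
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# One pod pinned at its 500m limit against a 200m request is at 250%.
desired_replicas 1 250 50 1 10   # prints 5
# Load drops: five pods averaging 40% against the 50% target.
desired_replicas 5 40 50 1 10    # prints 4
# A large spike is capped by maxReplicas.
desired_replicas 4 200 50 1 10   # prints 10
```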

HPA vs VPA — When Do You Use Which?

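|            | HPA                                  | VPA                                          |
|------------|--------------------------------------|----------------------------------------------|
| Scales     | Number of pods                       | Pod resource requests (CPU/memory)           |
| Best for   | Stateless apps with variable traffic | Apps where sizing is hard to predict upfront |
| Works with | CPU, memory, custom metrics          | CPU and memory                               |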

In practice, HPA is the more common choice for production workloads with variable traffic. VPA is most useful early in a deployment's life, when you're still working out the right resource requests for an app.
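I didn't set up VPA in this session, so the following is only a sketch of what a recommendation-only VPA for the same Deployment could look like. It assumes the Vertical Pod Autoscaler components and CRDs are already installed in the cluster (VPA is an add-on from the kubernetes/autoscaler project, not part of core Kubernetes). The object name `php-apache-vpa` and the file name are placeholders I chose for illustration.

```shell
#!/bin/sh
# Write a VPA manifest that only produces recommendations.
cat > php-apache-vpa.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"   # recommend only; do not evict or resize pods
EOF

# Assumes the VPA CRDs exist in the cluster; otherwise these commands fail.
# kubectl apply -f php-apache-vpa.yaml
# kubectl describe vpa php-apache-vpa   # recommendations appear in the status
```

Keeping updateMode at "Off" means the VPA only reports suggested requests, which avoids it competing with an HPA that is already scaling the same Deployment on CPU.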

Key Takeaway

Don't skip setting resources.requests in your deployment spec. HPA is blind without it. That one line is what connects your workload to the autoscaler.
