Day 17/40 — Kubernetes Autoscaling: HPA vs VPA Explained With Hands-On Practice

#kubernetes #devops #cka

If you've ever wondered how Kubernetes knows when to spin up more pods or give a pod more memory, that's autoscaling — and it's one of those concepts that sounds intimidating until you actually do it yourself. Day 17 of the #40DaysOfKubernetes challenge is where it clicked for me.

What is Autoscaling in Kubernetes?

At its core, autoscaling means Kubernetes adjusts resources automatically based on demand. You don't manually intervene every time traffic spikes. There are two main types:

HPA (Horizontal Pod Autoscaler): adds or removes pods based on observed metrics such as CPU or memory utilization
VPA (Vertical Pod Autoscaler): adjusts the CPU/memory requests of the pods themselves, typically by recreating them with the new values

Think of HPA as hiring more staff when the shop gets busy. VPA is more like giving one staff member more tools to handle the workload alone.

What I Did — Setting Up HPA

First I deployed the sample php-apache app with defined CPU requests and limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m

---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

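The key part is setting resources.requests.cpu. HPA computes utilization as a percentage of the requested CPU, so without a request the autoscaler has nothing to measure against. It also needs a metrics source in the cluster, typically metrics-server, to report actual usage.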
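To make the "measure against" part concrete, here is a small shell sketch of the arithmetic. The helper name cpu_utilization_pct and the usage figures are made up for illustration; only the 200m request and 500m limit come from the manifest above.

```shell
# Utilization as the HPA sees it: actual usage as a percentage of the request.
# Both arguments are millicores with the "m" suffix stripped.
# cpu_utilization_pct is a hypothetical helper, not a kubectl feature.
cpu_utilization_pct() {
  usage_m="$1"
  request_m="$2"
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_pct 150 200   # prints 75
cpu_utilization_pct 500 200   # prints 250 (usage at the 500m limit)
```

Because the percentage is relative to the request and not the limit, it can go well above 100. The HPA averages this value across all pods of the target before comparing it with the 50% goal.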

Then I created the HPA object targeting 50% average CPU utilization, with a minimum of 1 pod and maximum of 10. This is the declarative method:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

The imperative equivalent is a single command:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
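Either way, the controller ends up applying the same basic rule: desired replicas = ceil(current replicas × current utilization / target utilization), clamped between the minimum and maximum. The sketch below reproduces only that rule in shell. The function name desired_replicas is hypothetical, and the real controller also applies a tolerance (10% by default) and stabilization behavior that this does not model.

```shell
# Core HPA scaling rule, simplified:
#   desired = ceil(current * current_util / target_util), clamped to [min, max]
# desired_replicas is a hypothetical helper for illustration only.
desired_replicas() {
  current="$1"; current_util="$2"; target_util="$3"; min="$4"; max="$5"

  # Integer ceiling division: (a + b - 1) / b
  want=$(( (current * current_util + target_util - 1) / target_util ))

  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

desired_replicas 1 250 50 1 10   # prints 5
desired_replicas 4 90 50 1 10    # prints 8  (ceil of 7.2)
desired_replicas 5 400 50 1 10   # prints 10 (capped at max)
```

With the 50% target from this post, one pod running at 250% of its request asks for five pods, which lines up with the replica count jumping by several pods at once under load.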

Generating Load to Watch It Scale

This is the fun part. I ran a load generator in a separate pod — basically a loop hammering the apache service with requests:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Then watched the HPA respond in real time:

kubectl get hpa php-apache --watch

Watching the replica count climb from 1 to several pods as CPU utilization crossed 50% made the whole concept land in a way that reading documentation never does.
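If the watch output alone is not enough, a few read-only commands give a fuller picture. The wrapper below is only a convenience sketch: hpa_snapshot is a hypothetical name, and the run=<name> label selector assumes the labels used in the manifest from this post. The kubectl subcommands themselves are standard, though kubectl top needs metrics-server or another metrics API provider.

```shell
# hpa_snapshot is a hypothetical helper that groups three standard kubectl calls.
# Assumes the HPA, the Deployment, and the "run" label all share one name,
# as they do in this post's manifests.
hpa_snapshot() {
  name="${1:-php-apache}"
  kubectl get hpa "$name"            # current vs target utilization, replicas
  kubectl get deployment "$name"     # how many replicas are ready
  kubectl top pods -l "run=$name"    # per-pod CPU and memory usage
}

# Usage, against a cluster that has the objects above:
#   hpa_snapshot php-apache
```

For the reasoning behind a scaling decision, kubectl describe hpa php-apache lists the recent events and conditions.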

HPA vs VPA — When Do You Use Which?

Aspect	HPA	VPA
Scales	Number of pods	CPU/memory requests of each pod
Best for	Stateless apps with variable traffic	Apps where sizing is hard to predict upfront
Works with	CPU, memory, custom metrics	CPU and memory

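In practice, HPA is the more common choice for production workloads. VPA is useful during early deployment, when you're still figuring out the right resource requests for an app. Avoid pointing both at the same CPU or memory metric for one workload, since they would work against each other.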
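I did not set up VPA in this exercise, so treat the following as a sketch of what the "figuring out the right requests" workflow could look like. It assumes the VPA components from the kubernetes/autoscaler project are installed, since they are not part of a default cluster. The file name and the object name php-apache-vpa are arbitrary.

```shell
# Write a VPA manifest in recommendation-only mode.
# Names and file path are illustrative; the target is this post's Deployment.
cat > php-apache-vpa.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"   # compute recommendations only, never evict pods
EOF

# With the VPA components installed, the next steps would be:
#   kubectl apply -f php-apache-vpa.yaml
#   kubectl describe vpa php-apache-vpa   # shows the recommended requests
```

Running VPA with updateMode "Off" keeps it from touching running pods, so the recommendations can be copied into the Deployment by hand while HPA continues to manage the replica count.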

Key Takeaway

Don't skip setting resources.requests in your deployment spec. HPA is blind without it. That one line is what connects your workload to the autoscaler.
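A quick way to confirm the request is really there before creating an HPA is to read it back from the live object. The helper name check_cpu_requests is hypothetical; the jsonpath query is plain kubectl.

```shell
# check_cpu_requests is a hypothetical helper: it prints the CPU requests
# found on a Deployment's containers and fails if none are set.
# Caveat: with several containers, a non-empty result does not prove that
# every container has a request, so inspect the output.
check_cpu_requests() {
  deploy="${1:-php-apache}"
  req="$(kubectl get deployment "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[*].resources.requests.cpu}')"
  if [ -n "$req" ]; then
    echo "cpu requests: $req"
  else
    echo "no cpu requests set on $deploy"
    return 1
  fi
}

# Usage, against a cluster that has the Deployment:
#   check_cpu_requests php-apache
```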