Every business needs a way to minimize cash leaks in its infrastructure, and a DevOps engineer who understands autoscaling can help with this significantly. Writing lightweight applications is still crucial, but some business use cases expect inconsistent spikes in traffic: think of a ticketing platform the moment tickets go live for a popular music tour. That is where you need a reliable system that can handle the sudden spike gracefully.
> “Scaling is easy — until it isn’t. Then it’s your cloud bill that reminds you.”
1️⃣ Executive Summary
This post is about smart elasticity — how Kubernetes scales workloads and clusters while keeping cost in check.
I’ll walk through the difference between workload-level autoscaling (HPA/KEDA) and cluster-level autoscaling (CA vs Karpenter), with a hands-on demo and a practical comparison.
By the end, you’ll know:
- How `requests` + `limits` shape node packing and spend.
- When to use HPA vs KEDA.
- Why Karpenter often replaces the traditional Cluster Autoscaler.
- How to avoid the most common scaling pitfalls — thrash, cold starts, and QoS chaos.
2️⃣ Prereqs
- A basic Kubernetes cluster (minikube/kind/EKS/GKE).
- `kubectl`, `helm`, and `kubectl top`.
- A sample deployment (e.g., an Nginx or Go web app).
- Metrics Server installed (required for `kubectl top` and resource-based HPA); KEDA is optional, for the event-driven examples.
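If either is missing, a typical install looks like this. The metrics-server manifest URL and the kedacore Helm repo below are the upstream defaults; adjust for your environment:

```sh
# Metrics Server (powers kubectl top and resource-based HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```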
3️⃣ Concepts
Requests / Limits & Rightsizing
Each container declares:
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
```
- Requests → scheduler guarantee.
- Limits → runtime cap.
- Under-requesting leads to contention, evictions, and OOM kills under node pressure; over-requesting wastes capacity (and money).
Decision cue: profile your pods with `kubectl top pod` and rightsize before enabling autoscaling; otherwise, autoscalers just magnify inefficiency.
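For example, a quick way to line up live usage against declared requests (the `app=web` label is just an illustrative selector):

```sh
# Live usage (requires Metrics Server)
kubectl top pod -l app=web

# Declared requests, for comparison
kubectl get pod -l app=web \
  -o custom-columns=NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory
```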
HPA (Horizontal Pod Autoscaler)
HPA scales replicas based on metrics — typically CPU or memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
✅ Before: fixed 3 replicas → wasted CPU at night.
✅ After: HPA with 2–10 replicas → 40% cost reduction, stable latency.
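If you see replica counts bouncing, the v2 API also exposes a `behavior` block for tuning scale velocity; a minimal sketch with illustrative values, added under the HPA's `spec`:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react to spikes immediately
  scaleDown:
    stabilizationWindowSeconds: 300    # wait 5 minutes before honoring a lower recommendation
    policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas per period
        periodSeconds: 60
```

This is the same knob referenced under the thrash pitfall later in the post.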
🔁 KEDA (Kubernetes Event-Driven Autoscaler)
KEDA extends HPA with external triggers (Queue, Kafka, Prometheus, etc.):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-scaledobject
spec:
  scaleTargetRef:
    name: worker
  triggers:
    - type: azure-queue
      metadata:
        queueName: orders
        connection: AzureWebJobsStorage
        queueLength: '5'
```
Decision cue: use KEDA when metrics are outside Kubernetes — SQS depth, Kafka lag, HTTP requests/sec, etc.
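As a sketch of a non-queue trigger, a Prometheus-backed ScaledObject could look roughly like this; the target Deployment, server address, query, and threshold are all placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-rps-scaledobject
spec:
  scaleTargetRef:
    name: web                  # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  cooldownPeriod: 120          # seconds of inactivity before scaling back down
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="web"}[2m]))
        threshold: '100'       # target requests/sec per replica
```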
Cluster Autoscaler (CA) Overview
CA watches unschedulable pods and adds/removes nodes via your cloud provider’s ASG or NodePool.
It’s conservative — scales in minutes, not seconds — and tied to fixed instance groups.
Decision cue: CA is fine for homogeneous workloads with predictable demand.
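Its conservatism is mostly governed by flags on the cluster-autoscaler Deployment; a rough sketch of commonly tuned ones (values are illustrative, not recommendations):

```yaml
# Excerpt from the cluster-autoscaler container's args
args:
  - --expander=least-waste                # choose the node group that wastes the least capacity
  - --balance-similar-node-groups=true    # spread scale-ups across similar node groups
  - --scale-down-unneeded-time=10m        # how long a node must be underutilized before removal
  - --scale-down-delay-after-add=10m      # cooldown after a scale-up before considering scale-down
  - --scan-interval=10s                   # how often unschedulable pods are re-evaluated
```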
Karpenter
Karpenter is the next-gen autoscaler for AWS (and now CNCF incubating):
- Launches any instance type on demand, not limited by node groups.
- Supports Spot + On-Demand mix.
- Consolidates underutilized nodes automatically.
Example Provisioner (this is the older v1alpha5 API; newer releases replace it with a NodePool, sketched a bit further below):
```yaml
apiVersion: karpenter.sh/v1alpha5   # the API version that pairs with kind: Provisioner
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: 1000
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m6i.large", "m6a.large"]
  consolidation:
    enabled: true
```
Decision cue: choose Karpenter when you need rapid scale-out (seconds), mixed instance types, or Spot savings.
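In newer Karpenter releases (v1beta1 onward) the same intent is written as a NodePool; here is a rough equivalent that also mixes Spot and On-Demand capacity (it assumes an `EC2NodeClass` named `default` already exists):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                       # assumed to exist
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]     # prefer Spot, fall back to On-Demand
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.large", "m6a.large"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized  # replaces consolidation.enabled from v1alpha5
```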
4️⃣ Mini-Lab — HPA + CA vs Karpenter
- Deploy a sample app
```sh
kubectl create deploy web --image=nginx --replicas=2
kubectl expose deploy web --port=80
```
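One catch before applying the HPA: a CPU Utilization target only works if the pods declare CPU requests, and `kubectl create deploy` doesn't set any. A quick fix (values are illustrative):

```sh
kubectl set resources deploy web --requests=cpu=100m --limits=cpu=200m
```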
- Apply HPA
```sh
kubectl autoscale deploy web --cpu-percent=50 --min=2 --max=10
```
- Generate load
```sh
kubectl run load --image=busybox -- /bin/sh -c \
  "while true; do wget -q -O- http://web; done"
```
- Observe scale-up
```sh
kubectl get hpa
kubectl get pods -w
```
- Compare cluster-level response
  - With the Cluster Autoscaler, new nodes join slowly (minutes), via the ASG.
  - With Karpenter, a new instance typically spins up in under 60 seconds.
If you can’t run Karpenter (e.g., a non-EKS cluster), compare provisioning latency and node variety conceptually.
5️⃣ Cheatsheet
| Task | Command |
|---|---|
| View metrics | `kubectl top pods` |
| Deploy HPA | `kubectl autoscale deploy <app> --cpu-percent=X --min=A --max=B` |
| Install KEDA | `helm install keda kedacore/keda` |
| Inspect ScaledObjects | `kubectl get scaledobject` |
| Check node scale | `kubectl get nodes -w` |
| View CA logs | `kubectl -n kube-system logs deploy/cluster-autoscaler` |
| List Karpenter provisioners | `kubectl get provisioner` |
6️⃣ Pitfalls
- Thrash: aggressive min/max bounds or short stabilization windows cause scale-in/scale-out loops; the HPA `behavior` block sketched earlier is the main defense.
- Cold starts: node boot latency + image pull → avoid per-request scaling.
- QoS classes: BestEffort pods may be evicted first under node pressure; use Guaranteed for critical services (see the sketch after this list).
- Over-requests: rightsizing before autoscaling saves more than any tuning flag.
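For the QoS pitfall, a pod only gets the Guaranteed class when every container's requests equal its limits for both CPU and memory; a minimal sketch:

```yaml
# requests == limits for every resource → QoS class: Guaranteed
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```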
7️⃣ Wrap-Up
Kubernetes scaling is now less about *can it scale* and more about *how smartly it scales*.
HPA/KEDA handle micro elasticity.
Karpenter redefines macro elasticity and cloud efficiency.
In the next post — Modern Traffic Shaping (Post 5) — we’ll explore how to handle that scale with smart routing: Ingress, Service mesh, and load-balancing patterns.