Every business needs a way to minimize cash leaks in its infrastructure, and a DevOps engineer who understands autoscaling can help with this significantly. Writing lightweight applications is still crucial, but some business use cases expect inconsistent spikes in traffic: think of a ticketing platform the moment tickets go live for a popular music tour. That is where you need a reliable system that can handle the sudden spike gracefully.
> “Scaling is easy — until it isn’t. Then it’s your cloud bill that reminds you.”
1️⃣ Executive Summary
This post is about smart elasticity — how Kubernetes scales workloads and clusters while keeping cost in check.
I’ll walk through the difference between workload-level autoscaling (HPA/KEDA) and cluster-level autoscaling (CA vs Karpenter), with a hands-on demo and a practical comparison.
By the end, you’ll know:
- How `requests` + `limits` shape node packing and spend.
- When to use HPA vs KEDA.
- Why Karpenter often replaces the traditional Cluster Autoscaler.
- How to avoid the most common scaling pitfalls — thrash, cold starts, and QoS chaos.
2️⃣ Prereqs
- A basic Kubernetes cluster (minikube/kind/EKS/GKE).
- `kubectl`, `helm`, and `kubectl top`.
- A sample deployment (e.g., an Nginx or Go web app).
- Metrics Server installed (required for `kubectl top` and resource-based HPA); KEDA is optional, for the event-driven examples.
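If either is missing, a typical install looks like this. The metrics-server manifest URL and the kedacore Helm repo below are the upstream defaults; adjust for your environment:

```sh
# Metrics Server (powers kubectl top and resource-based HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```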
3️⃣ Concepts
Requests / Limits & Rightsizing
Each container declares:
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
```
- Requests → scheduler guarantee.
- Limits → runtime cap.
- Under-requesting leads to contention, evictions, and OOM kills under node pressure; over-requesting wastes capacity (and money).
Decision cue: profile your pods with `kubectl top pod` and rightsize before enabling autoscaling; otherwise, autoscalers just magnify inefficiency.
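For example, a quick way to line up live usage against declared requests (the `app=web` label is just an illustrative selector):

```sh
# Live usage (requires Metrics Server)
kubectl top pod -l app=web

# Declared requests, for comparison
kubectl get pod -l app=web \
  -o custom-columns=NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory
```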
HPA (Horizontal Pod Autoscaler)
HPA scales replicas based on metrics — typically CPU or memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
✅ Before: fixed 3 replicas → wasted CPU at night.
✅ After: HPA with 2–10 replicas → 40% cost reduction, stable latency.
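If you see replica counts bouncing, the v2 API also exposes a `behavior` block for tuning scale velocity; a minimal sketch with illustrative values, added under the HPA's `spec`:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react to spikes immediately
  scaleDown:
    stabilizationWindowSeconds: 300    # wait 5 minutes before honoring a lower recommendation
    policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas per period
        periodSeconds: 60
```

This is the same knob referenced under the thrash pitfall later in the post.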
🔁 KEDA (Kubernetes Event-Driven Autoscaler)
KEDA extends HPA with external triggers (Queue, Kafka, Prometheus, etc.):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-scaledobject
spec:
  scaleTargetRef:
    name: worker
  triggers:
    - type: azure-queue
      metadata:
        queueName: orders
        connection: AzureWebJobsStorage
        queueLength: '5'
```
Decision cue: use KEDA when metrics are outside Kubernetes — SQS depth, Kafka lag, HTTP requests/sec, etc.
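As a sketch of a non-queue trigger, a Prometheus-backed ScaledObject could look roughly like this; the target Deployment, server address, query, and threshold are all placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-rps-scaledobject
spec:
  scaleTargetRef:
    name: web                  # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  cooldownPeriod: 120          # seconds of inactivity before scaling back down
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="web"}[2m]))
        threshold: '100'       # target requests/sec per replica
```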
Cluster Autoscaler (CA) Overview
CA watches unschedulable pods and adds/removes nodes via your cloud provider’s ASG or NodePool.
It’s conservative — scales in minutes, not seconds — and tied to fixed instance groups.
Decision cue: CA is fine for homogeneous workloads with predictable demand.
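Its conservatism is mostly governed by flags on the cluster-autoscaler Deployment; a rough sketch of commonly tuned ones (values are illustrative, not recommendations):

```yaml
# Excerpt from the cluster-autoscaler container's args
args:
  - --expander=least-waste                # choose the node group that wastes the least capacity
  - --balance-similar-node-groups=true    # spread scale-ups across similar node groups
  - --scale-down-unneeded-time=10m        # how long a node must be underutilized before removal
  - --scale-down-delay-after-add=10m      # cooldown after a scale-up before considering scale-down
  - --scan-interval=10s                   # how often unschedulable pods are re-evaluated
```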
Karpenter
Karpenter is the next-gen autoscaler for AWS (and now CNCF incubating):
- Launches any instance type on demand, not limited by node groups.
- Supports Spot + On-Demand mix.
- Consolidates underutilized nodes automatically.
Example Provisioner (this is the older v1alpha5 API; newer releases replace it with a NodePool, sketched a bit further below):
```yaml
apiVersion: karpenter.sh/v1alpha5   # the API version that pairs with kind: Provisioner
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: 1000
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m6i.large", "m6a.large"]
  consolidation:
    enabled: true
```
Decision cue: choose Karpenter when you need rapid scale-out (seconds), mixed instance types, or Spot savings.
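In newer Karpenter releases (v1beta1 onward) the same intent is written as a NodePool; here is a rough equivalent that also mixes Spot and On-Demand capacity (it assumes an `EC2NodeClass` named `default` already exists):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                       # assumed to exist
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]     # prefer Spot, fall back to On-Demand
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.large", "m6a.large"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized  # replaces consolidation.enabled from v1alpha5
```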
4️⃣ Mini-Lab — HPA + CA vs Karpenter
- Deploy a sample app
```sh
kubectl create deploy web --image=nginx --replicas=2
kubectl expose deploy web --port=80
```
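One catch before applying the HPA: a CPU Utilization target only works if the pods declare CPU requests, and `kubectl create deploy` doesn't set any. A quick fix (values are illustrative):

```sh
kubectl set resources deploy web --requests=cpu=100m --limits=cpu=200m
```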
- Apply HPA
```sh
kubectl autoscale deploy web --cpu-percent=50 --min=2 --max=10
```
- Generate load
```sh
kubectl run load --image=busybox -- /bin/sh -c \
  "while true; do wget -q -O- http://web; done"
```
- Observe scale-up
```sh
kubectl get hpa
kubectl get pods -w
```
- Compare cluster-level response
  - With the Cluster Autoscaler, new nodes join slowly (minutes), via the ASG.
  - With Karpenter, a new instance typically spins up in under 60 seconds.
If you can’t run Karpenter (e.g., a non-EKS cluster), compare provisioning latency and node variety conceptually.
5️⃣ Cheatsheet
| Task | Command |
|---|---|
| View metrics | `kubectl top pods` |
| Deploy HPA | `kubectl autoscale deploy <app> --cpu-percent=X --min=A --max=B` |
| Install KEDA | `helm install keda kedacore/keda` |
| Inspect ScaledObjects | `kubectl get scaledobject` |
| Check node scale | `kubectl get nodes -w` |
| View CA logs | `kubectl -n kube-system logs deploy/cluster-autoscaler` |
| List Karpenter provisioners | `kubectl get provisioner` |
6️⃣ Pitfalls
- Thrash: aggressive min/max bounds or short stabilization windows cause scale-in/scale-out loops; the HPA `behavior` block sketched earlier is the main defense.
- Cold starts: node boot latency + image pull → avoid per-request scaling.
- QoS classes: BestEffort pods may be evicted first under node pressure; use Guaranteed for critical services (see the sketch after this list).
- Over-requests: rightsizing before autoscaling saves more than any tuning flag.
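For the QoS pitfall, a pod only gets the Guaranteed class when every container's requests equal its limits for both CPU and memory; a minimal sketch:

```yaml
# requests == limits for every resource → QoS class: Guaranteed
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```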
7️⃣ Wrap-Up
Kubernetes scaling is now less about *can it scale* and more about *how smartly it scales*.
HPA/KEDA handle micro elasticity.
Karpenter redefines macro elasticity and cloud efficiency.
In the next post — Modern Traffic Shaping (Post 5) — we’ll explore how to handle that scale with smart routing: Ingress, Service mesh, and load-balancing patterns.