
Hrishikesh Dalal

EP 5: Autoscaling - Building Systems that Breathe

The Nightmare: The 3 AM Crash

Imagine your app gets featured by a major tech influencer at 3 AM. Traffic spikes from 100 users to 100,000 in minutes. If you are running on a fixed set of virtual machines, your CPU usage hits 100%, requests start queuing, and eventually, the database connections saturate. Your app crashes while you’re asleep.

In modern system design, you don't just need a horizontally scalable system; you need one that scales itself.

Scaling 101: Vertical vs. Horizontal

Before we automate, we must understand the direction of growth:

  • Vertical Scaling (Scaling Up): Adding more power (CPU, RAM) to an existing machine. It’s simple, but there is a hard "ceiling" on how big one machine can get, and resizing usually requires downtime.
  • Horizontal Scaling (Scaling Out): Adding more machines to the pool. This is the foundation of cloud-native design.

Autoscaling is the automation of Horizontal Scaling.

How it Works: The "Control Loop"

Autoscaling isn't magic; it’s a feedback loop, much like a thermostat in a house.

  1. Metric Collection: The system monitors "Signals." While CPU and RAM are standard, sophisticated systems use Custom Metrics like "Requests Per Second" or "Messages in SQS Queue."
  2. The Aggregator: The Autoscaler doesn't react to a single 1-second spike. It calculates averages over a "window" (e.g., a 3-minute moving average) to avoid jitter.
  3. The Threshold: You define a target state (e.g., "Maintain average CPU at 60%").
  4. The Action: The cloud provider or Kubernetes (via the Horizontal Pod Autoscaler) scales out the deployment. The Load Balancer is notified of the new IP addresses and begins shifting traffic to the new "pods" or instances.
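The four steps above can be sketched as a toy control loop. The class and parameter names here are illustrative, not a real cloud API, but the replica formula mirrors the one the Kubernetes HPA documents: desired = ceil(current × currentMetric / targetMetric).

```python
import math
from collections import deque

class Autoscaler:
    """Toy reactive autoscaler; names and defaults are illustrative."""

    def __init__(self, target_cpu=60.0, window=3, min_replicas=1, max_replicas=50):
        self.target_cpu = target_cpu         # 3. The Threshold: target average CPU %
        self.samples = deque(maxlen=window)  # 2. The Aggregator: moving window
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas

    def observe(self, cpu_percent):
        """1. Metric Collection: record one CPU sample (e.g., once per minute)."""
        self.samples.append(cpu_percent)

    def desired_replicas(self, current_replicas):
        """4. The Action: desired = ceil(current * avgMetric / targetMetric)."""
        if not self.samples:
            return current_replicas
        avg = sum(self.samples) / len(self.samples)
        desired = math.ceil(current_replicas * avg / self.target_cpu)
        return max(self.min_replicas, min(self.max_replicas, desired))
```

Feed it three samples averaging 90% CPU while running 4 replicas and it asks for 6 (ceil(4 × 90 / 60)); when the average falls below target, it scales in, clamped to the min/max bounds.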

The Three Flavors of Autoscaling

1. Reactive Scaling (The "On-Demand" Approach)

The most common type. It reacts to live events.

  • Use case: Unexpected viral growth.
  • Downside: There is a "warm-up" delay. It takes time to pull a Docker image and start a container.

2. Scheduled Scaling (The "Black Friday" Approach)

If you know traffic spikes every Friday at 6 PM, why wait for the CPU to spike? You can tell the system: "At 5:55 PM, spin up 50 extra instances."

  • Use case: Known events, TV ad spots, or daily peak hours.
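A scheduled policy can be as simple as a lookup table mapping time windows to a minimum fleet size. Everything below (the schedule entries, the 5:55 PM pre-warm, the baseline of 5) is a hypothetical sketch, not a real cloud API:

```python
from datetime import datetime, time

BASELINE = 5  # hypothetical normal fleet size

# (weekday, start, end, min_instances); weekday 4 = Friday.
# Pre-warm at 5:55 PM, five minutes before the known 6 PM spike.
SCHEDULE = [
    (4, time(17, 55), time(23, 0), 50),
]

def min_instances(now: datetime) -> int:
    """Return the instance floor the scheduler should enforce right now."""
    for weekday, start, end, count in SCHEDULE:
        if now.weekday() == weekday and start <= now.time() <= end:
            return count
    return BASELINE
```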

3. Predictive Scaling (The "AI" Approach)

Advanced platforms use Machine Learning to analyze historical data. If the system "sees" a pattern that traffic always rises on rainy Tuesdays, it begins scaling up before the traffic arrives.
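Real predictive scalers (e.g., AWS Predictive Scaling) train ML models on weeks of history; a naive seasonal version of the same idea just averages past traffic for the same hour of day and provisions ahead of it. `rps_per_replica` is an assumed capacity figure:

```python
import math

def forecast_replicas(history, hour, rps_per_replica=100):
    """Naive 'predictive' sketch: expect this hour's traffic to match the
    average of past days, and provision capacity before it arrives.
    `history` maps hour-of-day -> list of past requests/sec observations."""
    past = history.get(hour, [])
    if not past:
        return 1  # no history yet: fall back to a single replica
    expected_rps = sum(past) / len(past)
    return max(1, math.ceil(expected_rps / rps_per_replica))
```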


The Kubernetes Layer: HPA vs. VPA vs. CA

If you’re designing for containers, you need to know the "Three Pillars" of scaling:

  • Horizontal Pod Autoscaler (HPA): Adds more pods (clones of your app). This is the bread and butter of K8s.
  • Vertical Pod Autoscaler (VPA): Keeps the pod count the same but gives the existing pods more "juice" (memory/CPU).
  • Cluster Autoscaler (CA): What happens when your physical nodes are full? CA talks to the cloud provider (AWS/GCP) to physically add a new Virtual Machine to the cluster so more pods can fit.

The "Gotchas": Why Autoscaling Can Fail

Autoscaling is powerful, but if misconfigured, it can be a "Nightmare 2.0."

  1. The "Thrashing" Effect (Flapping): If your scale-in policy is too aggressive, the system might add a server, see the CPU drop, and immediately remove it, only for the CPU to spike again.
     • Solution: Use Cooldown Periods (e.g., "Wait 5 minutes after a scale-up before scaling down").

  2. The Database Bottleneck: You can scale your Web Layer to 1,000 instances, but if they all hit a single, non-scaling Postgres database, the database will melt. Autoscaling the frontend often requires Connection Pooling or Read Replicas.

  3. The "Death Spiral": If one server fails and traffic shifts to the remaining ones, their CPU spikes, causing them to fail too. If your autoscaler isn't fast enough, your entire fleet can collapse like dominoes.
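The cooldown fix for flapping is straightforward to express in code: record when you last scaled up, and refuse to scale in until the window has passed. A minimal sketch, with illustrative names (real autoscalers build this into their policies):

```python
import time

class CooldownGate:
    """Anti-flapping guard: block scale-in for `cooldown` seconds
    after the most recent scale-out."""

    def __init__(self, cooldown=300):  # the 5-minute wait from the text
        self.cooldown = cooldown
        self.last_scale_up = float("-inf")  # never scaled up yet

    def record_scale_up(self, now=None):
        self.last_scale_up = time.monotonic() if now is None else now

    def may_scale_down(self, now=None):
        now = time.monotonic() if now is None else now
        return (now - self.last_scale_up) >= self.cooldown
```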

Why this is a "System Design" Win

  • Cost Efficiency: You only pay for what you use. When traffic drops at 4 AM, the system "scales in," saving you money.
  • Reliability: It creates a "Self-Healing" system. If an instance becomes unhealthy, the autoscaler terminates it and replaces it.
  • Operational Freedom: You can handle a "thundering herd" of traffic without a single engineer touching a keyboard.

Takeaway

In the 0-to-1 journey, moving from a static server to an autoscaled environment is the "Coming of Age" moment for an application. A modern system isn't just a static collection of boxes; it’s a living entity that breathes, expanding and contracting with the pulse of your users.
