charan koppuravuri
🚀 "Load Shedding": How to Be the Fire Marshal of Your Infrastructure 🚒🚫

Welcome back! Over the last two days, we've built quite the defense system. We've stopped a Thundering Herd from trampling our database, and we've figured out how to keep Celebrities from hogging all our cache space.

But what happens when the problem isn't a specific "hot key" or a retry storm? What if the volume of traffic is just... too much?

Imagine a stadium with 50,000 seats. If 100,000 people try to shove through the gates at once, the gates might break, people get hurt, and nobody gets to watch the game. In software, this is a System Collapse.

Today, we're talking about the final line of defense: Load Shedding.

The "Burning Building" Paradox

When a system is overloaded, it enters a dangerous cycle. Because the CPU is pegged at 100%, every request takes longer to process. Long requests mean more requests are "in-flight," which consumes more memory, which makes things even slower.

Soon, your latency is so high that the user's browser times out. But here's the kicker: your server is still working on that request! You are burning expensive CPU cycles to generate a response that the user will never see.

Load Shedding is the art of saying: "I would rather fail 20% of my users instantly so that the other 80% have a perfect experience."

Wait, isn't this just Rate Limiting?

Not exactly. They are cousins, but they have different "personalities":

Rate Limiting is about Who. (e.g., "User A has sent too many requests; block them.")

Load Shedding is about Me. (e.g., "I am overwhelmed; I need to drop some traffic to survive, regardless of who sent it.")

Rate limiting is a bouncer at a club checking IDs. Load shedding is the fire marshal shutting the doors because the building is literally at capacity.

How to Shed Load Like a Pro

If you just drop requests randomly, you might drop a "Complete Purchase" request while successfully processing a "View About Us Page" request. That's bad for business. To do it right, you need a Criticality Scale.

1. Prioritize by Request Type

At companies like Stripe or Netflix, requests are often bucketed:

Critical: Charging a credit card, starting a video stream.

Important: Viewing a billing history, searching for a movie.

Background/Best-Effort: Analytics pings, pre-fetching images for the next page.

When the CPU hits 90%, you "shed" the Best-Effort traffic first. If it hits 95%, you drop the Important stuff. You protect the "Critical" bucket with your life.
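As a minimal sketch, this tiering could look like the following. The bucket names come from the list above, but the exact thresholds (90%, 95%) and the function itself are illustrative, not any real Stripe or Netflix implementation:

```python
from enum import IntEnum

class Criticality(IntEnum):
    CRITICAL = 0     # charging a credit card, starting a stream
    IMPORTANT = 1    # billing history, movie search
    BEST_EFFORT = 2  # analytics pings, image pre-fetching

def should_shed(criticality: Criticality, cpu: float) -> bool:
    """Drop Best-Effort traffic at 90% CPU, Important at 95%.

    Critical requests are never shed.
    """
    if cpu >= 0.95:
        return criticality >= Criticality.IMPORTANT
    if cpu >= 0.90:
        return criticality >= Criticality.BEST_EFFORT
    return False
```

At 92% CPU, `should_shed(Criticality.BEST_EFFORT, 0.92)` returns `True`, while Critical and Important traffic still passes.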

2. Watch the "In-Flight" Requests

Instead of just looking at CPU (which can be a lagging indicator), look at Concurrent Requests. If you know your server starts to choke when it handles more than 500 requests at once, you set a hard limit. Request #501 gets a fast HTTP 503 (Service Unavailable) immediately.

Why a "fast" 503? Because rejecting a request in 2ms costs almost nothing. Trying to process it and failing after 10 seconds is what kills your server.
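One simple way to enforce that cap is a non-blocking semaphore. The 500-request limit below is the example number from above, and `handle_request` is a stand-in for your real handler:

```python
import threading

class InFlightLimiter:
    """Reject new work instantly once too many requests are in flight."""

    def __init__(self, max_in_flight: int = 500):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: request #501 is rejected in microseconds
        # instead of queueing and timing out after seconds.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

def handle_request(limiter: InFlightLimiter) -> int:
    """Return an HTTP status: 200 if processed, 503 if shed."""
    if not limiter.try_acquire():
        return 503  # fast rejection: almost no CPU spent
    try:
        return 200  # ...real work would happen here...
    finally:
        limiter.release()
```

The key design choice is `blocking=False`: a full queue answers immediately instead of making the overloaded server hold yet another connection open.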

3. The "Retry-After" Header

When you shed a request, don't just slam the door. Give the client a hint. Using the Retry-After: 30 header tells the mobile app or browser: "Hey, I'm busy. Don't even try again for 30 seconds." This helps calm the storm instead of encouraging users to hit "Refresh" repeatedly.
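A sketch of what that polite rejection might look like; the `shed_response` helper and its (status, headers, body) return shape are invented for illustration:

```python
def shed_response(retry_after_seconds: int = 30) -> tuple[int, dict[str, str], str]:
    """Build a 503 that tells well-behaved clients when to come back."""
    headers = {
        "Retry-After": str(retry_after_seconds),  # delay in seconds (RFC 9110)
        "Content-Type": "text/plain",
    }
    return 503, headers, "Server is overloaded. Please retry later.\n"
```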

Real World Example: The "Limp Mode"

Have you ever noticed that during a massive sale or a site crash, some parts of an app still work?

Amazon might show you the product page but hide the "Recommended for You" section to save database cycles.

Netflix might let you hit "Play" but lower the bitrate or hide the "Trending Now" row.

This is Graceful Degradation, the ultimate form of load shedding. You aren't just dropping requests; you are strategically turning off features to keep the "Goodput" (the throughput that actually serves users) high.
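As a sketch, this "limp mode" can be a set of load-aware feature flags. The section names and thresholds here are made up for illustration, not Amazon's or Netflix's actual ones:

```python
def build_product_page(cpu: float) -> dict[str, bool]:
    """Turn off optional page sections as load climbs; the core stays on."""
    return {
        "product_details": True,        # core feature: never shed
        "recommendations": cpu < 0.85,  # first section to go dark
        "customer_reviews": cpu < 0.95, # sacrificed only near collapse
    }
```

Under normal load every section renders; at 90% CPU the page still loads, just without recommendations.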

The Takeaway

In a perfect world, we just "auto-scale" our way out of trouble. But scaling takes minutes, and a traffic spike takes milliseconds.

Load Shedding is your emergency brake. It's not a failure; it's a controlled survival tactic that separates top-tier systems from the ones that go offline for hours.

Whatโ€™s Next?

We've covered how to protect our internal systems. But what happens when the problem is outside our walls? How do we handle External Dependencies that go slow and threaten to pull our whole system down with them?

Join me for the next part of this series tomorrow, where we wrap up "Resiliency Week" with Circuit Breakers: Stopping the Poison from Spreading.

Let's Connect! 🤝
If you're enjoying this series, please follow me here on Dev.to! I'm a Project Technical Lead sharing everything I've learned about building systems that don't break.

Question for you: Have you ever had to "shed load" in a production crisis? I'd love to hear your "war stories" or questions in the comments below! 👇
