1. Rate Limiting Isn’t Optional
Scaling without rate limiting is like leaving your front door open during a zombie apocalypse. Yeah, you could do it, but don’t be surprised when chaos spills everywhere.
- Without rate limiting, one overly enthusiastic or malicious user can ruin the party for everyone else.
Real World: Twitter rate limits its API to stop bots from flooding its servers with requests every millisecond.
2. Fixed Window: The Training Wheels
Think of fixed window as the beginner's bike. Easy to set up, but your knees will scrape when things get messy.
- Process requests in fixed time slots (e.g., 60 requests per minute). Simple but prone to “edge attacks.”
- If a user fires off 60 requests in the last second of one window, they can fire another 60 in the first second of the next: 120 requests in two seconds, and your system never blinked.
Real World: Small-time hobby apps or PoCs can survive on this—you’re not Netflix. Yet.
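Here's a minimal in-memory sketch of a fixed window counter (class and parameter names are my own invention, not any library's API). The last check shows exactly the edge attack described above:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per fixed `window_seconds` slot."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (user, window index) -> request count

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        # All requests in the same slot share one counter
        slot = (user, int(now // self.window))
        if self.counts[slot] >= self.limit:
            return False
        self.counts[slot] += 1
        return True
```

Note that the counter resets the instant a new window starts, which is precisely why the edge attack works.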
3. Sliding Window: When You Want Smooth, Not Chunky
A smoother operator. Instead of hard slots, it uses a rolling time window to calculate limits.
- Feels “fair.” Rate-checks requests based on the last N seconds rather than fixed intervals.
- Slightly more complex to implement than fixed windows, but let's face it, you'll need this sooner rather than later.
Real World: Rolling counters work wonderfully for systems where a smooth user experience matters more than implementation simplicity, like social media feeds or real-time dashboards.
4. Token Bucket: Be Generous, But Set Limits
It’s like handing out “you can annoy me later” tokens to your users.
- Users get a bucket filled with tokens they can use for requests. Once they’re out of tokens, they chill until the bucket refills (at a set rate).
- Great for bursty traffic because you define how many tokens they can burn through before the brakes slam down.
Real World: Payment gateways love token buckets because they mitigate spikes in transaction requests.
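The refill-on-demand trick below is the usual way to implement this without a background timer (again, a toy sketch; names and parameters are assumptions):

```python
import time

class TokenBucket:
    """Bucket holds up to `capacity` tokens, refilled at `rate` tokens/sec.
    Each request spends one token; bursts up to `capacity` are allowed."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full: bursts allowed immediately
        self.last = None

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.last is not None:
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

`capacity` controls how nasty a burst you'll tolerate; `rate` controls the sustained average. Tune them separately.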
5. Leaky Bucket: Drip, Don’t Flood
Imagine a bucket with a tiny hole. Requests are constantly “dripping” out at a fixed rate, no matter how fervently users try to fill the bucket.
- It smooths bursty traffic into a steady outflow, but that same smoothing can bottleneck even legitimate high-speed requests.
- Less fairness: one fast sender can fill the bucket, leaving slower, well-behaved users stuck behind the flood.
Real World: Web servers often use leaky buckets to avoid backend meltdowns during traffic tsunamis.
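A leaky bucket as a meter looks almost like a token bucket turned inside out: arrivals raise the water level, and the level drains at a fixed rate (sketch only; the names are mine):

```python
import time

class LeakyBucket:
    """Requests fill a bucket of size `capacity`; it drains at `leak_rate`
    requests/sec. A full bucket rejects new arrivals."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = None

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.last is not None:
            # Drain whatever has leaked out since the last arrival
            self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

The variant that queues requests instead of rejecting them gives you a perfectly steady backend rate, at the cost of added latency for everyone in the queue.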
6. Distributed Rate Limiting: The Big Guns
When one server can’t hold the line, enter distributed systems. But fair warning: it’s as complex as it sounds.
- Think of it as fencing off the playground at planetary scale with consistent hashing, shared state, etc.
- Easy to screw up, so make sure you’ve got observability in place—or enjoy debugging distributed counters at midnight.
Real World: Global API platforms like Stripe or AWS implement distributed rate limiting for obvious reasons—you try managing millions of users.
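One common recipe is a fixed-window counter backed by a shared store such as Redis (atomic INCR plus EXPIRE, often wrapped in a Lua script for atomicity). Here's a toy sketch with an in-memory stand-in for the store; `FakeStore` and `incr_with_ttl` are illustrative stand-ins, not a real Redis client:

```python
import time

class FakeStore:
    """Stand-in for a shared store like Redis. In production, every app
    server would hit the same store, so they all see one counter."""

    def __init__(self):
        self.data = {}  # key -> (count, expiry timestamp)

    def incr_with_ttl(self, key, ttl, now):
        count, expires = self.data.get(key, (0, now + ttl))
        if now >= expires:           # key expired: start a fresh window
            count, expires = 0, now + ttl
        count += 1
        self.data[key] = (count, expires)
        return count

def allow(store, user, limit, window, now=None):
    """Fixed-window limit enforced via a shared counter keyed by user+window."""
    now = time.time() if now is None else now
    key = f"rl:{user}:{int(now // window)}"
    return store.incr_with_ttl(key, window, now) <= limit
```

The hard parts this sketch hides are exactly the ones that page you at midnight: network round-trips on the hot path, what to do when the store is down (fail open or fail closed?), and clock skew between servers.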
7. Which One Should You Use? Be Pragmatic
Choose fixed windows first, then upgrade. No shame in crawling before you run.
- When in doubt? Sliding windows are the most balanced for general use cases.
- Building Netflix-scale services? Start with token/leaky buckets + distributed systems, and don’t forget protection against abuse.
Real World: If your app is still running on a $10 VPS, maybe just solve the scale problem after you’ve hit scale.
Final Takeaway: Build with the pessimism of someone who’s been paged at 3 AM.
Cheers🥂