Rate limiting is one of those things you don't think about until your API gets hammered at 3 AM.
Whether it's a DDoS attack, a buggy client sending 10,000 requests per second, or a partner integration gone wrong — rate limiting is your first line of defense.
Why Rate Limiting Matters
Without rate limiting:
- A single client can starve others of resources
- Your database gets overwhelmed during traffic spikes
- You can't enforce fair usage across API consumers
- Cost of serving requests spirals out of control
The 4 Core Algorithms
1. Token Bucket
The most widely used algorithm. Think of it as a bucket that fills with tokens at a steady rate, up to a fixed capacity. Each request removes a token. No tokens? Request denied.
Why it works: Allows short bursts (up to the bucket's capacity) while enforcing a long-term average rate. Used by AWS, Stripe, and many other major APIs.
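To make the refill-and-spend idea concrete, here is a minimal single-process sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions chosen for illustration; this is not how any particular provider implements it.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `TokenBucket(rate=1, capacity=3)`, a client can fire three requests back to back, then is held to roughly one request per second. The refill is computed lazily on each call, so no background timer is needed.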
2. Leaky Bucket
Requests enter a queue (the bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped.
Why it works: Provides a perfectly smooth output rate. Great for downstream services that can't handle bursts.
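A rough sketch of the queue-based version follows. `LeakyBucket`, `offer`, and `drain` are hypothetical names, and the sketch assumes a worker calls `drain()` periodically to pick up the requests that are due.

```python
import time
from collections import deque


class LeakyBucket:
    """Bounded queue drained at a fixed rate of `leak_rate` requests per second."""

    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.queue = deque()
        self.last_leak = clock()

    def offer(self, request):
        """Enqueue a request; return False (drop it) if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self):
        """Return the queued requests that are due for processing by now."""
        now = self.clock()
        due = int((now - self.last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance by whole leak intervals only, so fractional time carries over.
        self.last_leak += due / self.leak_rate
        count = min(due, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]
```

However bursty the arrivals are, `drain()` never hands out more than `leak_rate` requests per second of elapsed time, which is what protects the downstream service.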
3. Fixed Window Counter
Divide time into fixed windows (e.g., 1 minute). Count requests per window. Reset at the boundary.
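Why it works: Dead simple to implement. The catch is the "boundary problem": a client can spend its full quota at the end of one window and again at the start of the next, so up to 2x the allowed rate can pass in a short span around the boundary.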
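The counter and its reset fit in a few lines. This is a minimal in-memory sketch; `FixedWindowCounter` and its parameters are illustrative names, not a reference implementation.

```python
import time


class FixedWindowCounter:
    """Allow at most `limit` requests per fixed window of `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock            # injectable so the limiter can be tested
        self.current_window = None
        self.count = 0

    def allow(self):
        # Windows are aligned to multiples of `window_seconds`.
        window_id = int(self.clock() // self.window)
        if window_id != self.current_window:
            # A new window started: reset the counter.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 5 per 60 seconds, five requests at second 59 and five more at second 60 are all accepted, because the counter resets at the boundary. That is ten requests in about one second.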
4. Sliding Window Log/Counter
Tracks the exact timestamp of each request (log) or uses a weighted combination of current and previous windows (counter).
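Why it works: The log variant is exact and has no boundary problem, but it stores one timestamp per request, so it uses more memory. The counter variant keeps only two counters per client and approximates the sliding window, which largely smooths out the boundary spike.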
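Both variants can be sketched side by side. The class names and the injectable `clock` are assumptions for illustration, and the counter variant assumes requests in the previous window were evenly spread, which is the usual approximation.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Exact variant: keeps one timestamp per accepted request."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()

    def allow(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


class SlidingWindowCounter:
    """Approximate variant: weights the previous fixed window's count."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_id = None
        self.current = 0
        self.previous = 0

    def allow(self):
        now = self.clock()
        window_id = int(now // self.window)
        if window_id != self.window_id:
            # Carry the old count over only if it belongs to the adjacent window.
            self.previous = self.current if self.window_id == window_id - 1 else 0
            self.current = 0
            self.window_id = window_id
        # Fraction of the previous window still covered by the trailing window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous * overlap + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False
```

Replaying the boundary scenario (limit 5 per 60 seconds, five requests at second 59), both variants reject every request at second 60, where a fixed window would have accepted five more.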
Quick Comparison
| Algorithm | Burst Handling | Memory | Accuracy |
|---|---|---|---|
| Token Bucket | ✅ Allows bursts | Low | Good |
| Leaky Bucket | ❌ Smooths all | Low | Good |
| Fixed Window | ⚠️ Boundary issue | Very Low | Fair |
| Sliding Window | ✅ Accurate | Higher | Best |
Where to Rate Limit
- API Gateway — First line of defense (per-client limits)
- Application layer — Business logic limits (e.g., 5 password attempts)
- Database layer — Connection pooling and query limits
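As an illustration of an application-layer limit such as the password-attempt example above, here is a small per-key sketch. `AttemptLimiter`, the 15-minute window, and the in-memory dictionary are all assumptions; a real service would likely keep this state in a shared store.

```python
import time
from collections import defaultdict, deque


class AttemptLimiter:
    """Allow at most `max_attempts` per key within a trailing window."""

    def __init__(self, max_attempts=5, window_seconds=900, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window_seconds        # 900 s is an assumed default
        self.clock = clock                  # injectable so the limiter can be tested
        self.attempts = defaultdict(deque)  # key -> timestamps of recent attempts

    def allow(self, key):
        now = self.clock()
        recent = self.attempts[key]
        # Forget attempts older than the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

Keying by username (or username plus IP address) keeps one account's failed logins from affecting anyone else, which a single gateway-wide limit cannot express.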
What Happens When Limits Are Hit?
Return 429 Too Many Requests with these headers:
- X-RateLimit-Limit: Max requests allowed
- X-RateLimit-Remaining: Requests left in window
- Retry-After: Seconds until the client can retry
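A throttled response might be assembled roughly like this. The helper `rate_limit_response` and its `(status, headers, body)` return shape are hypothetical and framework-agnostic; adapt them to whatever response object your web framework expects.

```python
import math
from http import HTTPStatus


def rate_limit_response(limit, remaining, retry_after_seconds):
    """Build a (status, headers, body) triple for a throttled request."""
    status = HTTPStatus.TOO_MANY_REQUESTS
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Round up so the client never retries a fraction of a second early.
        "Retry-After": str(math.ceil(retry_after_seconds)),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = f"{status.value} {status.phrase}\n"
    return status.value, headers, body
```

For example, `rate_limit_response(100, 0, 12.3)` yields status 429 with `Retry-After: 13`. The `X-RateLimit-*` names are a widely used convention rather than a formal standard, so clients should treat them as advisory.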
Deep Dive
I wrote a comprehensive guide covering distributed rate limiting, Redis implementations, and real-world patterns used by companies like Stripe and Cloudflare:
👉 Rate Limiting: The Complete Guide
This is part of my system design series at SWE Helper — free tools, guides, and interview prep for software engineers.