How Systems Protect Themselves from Too Many Requests
Modern systems often fail not because they are badly written, but because they receive more requests than they can safely handle.
Whether it’s an API, a backend service, or a public-facing platform, uncontrolled traffic can lead to:
- High latency
- Resource exhaustion
- Cascading failures
- Complete downtime
This is why rate limiting exists.
At a high level, rate limiting answers one simple question:
How many requests should we allow, and when should we say “no”?
In this article, we’ll take a simple and intuitive approach to understand the most common rate limiting strategies used in real systems. For each strategy, we’ll look at:
- How it works conceptually
- Its strengths and weaknesses
- When it should be used
- A small JavaScript example to see it in action
Rate Limiting 101
What Is Rate Limiting, Really?
Rate limiting is not about blocking users.
It is about protecting systems.
Instead of letting traffic grow uncontrollably, rate limiting enforces rules like:
- “Only 100 requests per minute”
- “Only 10 concurrent requests at a time”
- “Allow short bursts, but limit long-term usage”
Different systems need different rules, which is why multiple strategies exist.
1. Fixed Window Rate Limiting
How It Works
You allow N requests per fixed time window.
Example:
- Window size: 1 minute
- Limit: 100 requests
12:00 – 12:01 → max 100 requests
12:01 – 12:02 → counter resets
The counter resets at the start of each new window.
JavaScript Example
```javascript
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // Max requests per window
    this.windowMs = windowMs; // Window size in milliseconds
    this.count = 0;           // Current request count
    this.windowStart = Date.now();
  }

  allowRequest() {
    const now = Date.now();

    // Reset window if time has passed
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }

    // Allow request if under limit
    if (this.count < this.limit) {
      this.count++;
      return true;
    }

    // Otherwise reject request
    return false;
  }
}
```
Pros
- Very easy to implement
- Low memory usage
Cons
- Burst problem at window edges
Example (limit: 100 per minute):
- 99 requests at 12:00:59
- 99 requests at 12:01:01

The system absorbs 198 requests in about two seconds, even though neither window exceeded its limit.
Good For
- Internal tools
- Simple APIs
- Low-risk systems
2. Sliding Window (Log-Based)
How It Works
Instead of fixed windows, we track timestamps of every request.
A request is allowed only if the number of requests in the last X seconds is below the limit.
JavaScript Example
```javascript
class SlidingWindowLogLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // Max requests allowed
    this.windowMs = windowMs; // Time window in milliseconds
    this.requests = [];       // Store timestamps of requests
  }

  allowRequest() {
    const now = Date.now();

    // Remove expired timestamps
    this.requests = this.requests.filter(
      timestamp => now - timestamp < this.windowMs
    );

    // Check if under the limit
    if (this.requests.length < this.limit) {
      this.requests.push(now);
      return true;
    }

    // Otherwise reject
    return false;
  }
}
```
Pros
- Very accurate
- No burst edge problem
Cons
- High memory usage
- Slower for high traffic
Good For
- Low-traffic but precision-critical systems
3. Sliding Window (Counter-Based)
How It Works
This approach blends two adjacent fixed windows.
Instead of tracking every request, we:
- Count requests in the current window
- Use a weighted average from the previous window
This smooths out burst behavior.
JavaScript Example
```javascript
class SlidingWindowCounterLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // Max requests allowed
    this.windowMs = windowMs; // Window size in milliseconds
    this.current = 0;         // Requests in current window
    this.previous = 0;        // Requests in previous window
    this.windowStart = Date.now();
  }

  allowRequest() {
    const now = Date.now();
    let elapsed = now - this.windowStart;

    // If current window expired, shift counters
    if (elapsed >= this.windowMs) {
      // If more than one full window passed, the old count is stale
      this.previous = elapsed >= 2 * this.windowMs ? 0 : this.current;
      this.current = 0;
      this.windowStart = now;
      elapsed = 0; // Recompute: we are at the start of a fresh window
    }

    // Weight the previous window by how much of it still overlaps
    const weight = 1 - elapsed / this.windowMs;
    const estimatedCount = this.previous * weight + this.current;

    // Allow request if under limit
    if (estimatedCount < this.limit) {
      this.current++;
      return true;
    }
    return false;
  }
}
```
Pros
- Smoother than fixed window
- Cheaper than log-based sliding window
Cons
- Slightly approximate
Good For
- APIs needing smooth limits without heavy overhead
4. Token Bucket ⭐ (Most Popular)
How It Works
- Tokens are added to a bucket at a fixed rate
- Each request consumes one token
- Bucket has a maximum capacity (burst allowance)
If no tokens are available, requests are rejected.
JavaScript Example
```javascript
class TokenBucket {
  constructor(rate, capacity) {
    this.rate = rate;             // Tokens added per second
    this.capacity = capacity;     // Maximum bucket size
    this.tokens = capacity;       // Current tokens available
    this.lastRefill = Date.now(); // Last refill timestamp
  }

  allowRequest() {
    const now = Date.now();

    // Calculate time elapsed (in seconds)
    const elapsed = (now - this.lastRefill) / 1000;

    // Refill tokens based on elapsed time
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.rate
    );
    this.lastRefill = now;

    // Allow request if at least 1 token is available
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```
Pros
- Allows controlled bursts
- Smooth and flexible
- Widely used in industry
Cons
- Slightly more complex
Good For
- Public APIs
- User-facing services
5. Leaky Bucket
How It Works
Requests enter a queue and are processed at a constant rate.
Think of water leaking from a bucket at a steady speed.
JavaScript Example
```javascript
class LeakyBucket {
  constructor(rate) {
    this.rate = rate; // Requests processed per second
    this.queue = [];  // Queue of incoming requests

    // Process queue at fixed interval
    this.interval = setInterval(
      () => this.process(),
      1000 / this.rate
    );
  }

  allowRequest(request) {
    // Add request to queue
    this.queue.push(request);
  }

  process() {
    if (this.queue.length > 0) {
      const request = this.queue.shift();
      // Here you would actually handle the request,
      // e.g. request.resolve() or call a handler
    }
  }

  stop() {
    clearInterval(this.interval);
  }
}
```
Pros
- Very smooth traffic
- Predictable system load
Cons
- Bursts get delayed or dropped
Good For
- Traffic shaping
- Network-level systems
6. Concurrency Limiting
How It Works
Instead of limiting request rate, we limit the number of in-flight requests.
JavaScript Example
```javascript
class ConcurrencyLimiter {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent; // Maximum allowed concurrent executions
    this.current = 0;                   // Currently running tasks
  }

  async handleRequest(fn) {
    if (this.current >= this.maxConcurrent) {
      throw new Error("Too many concurrent requests");
    }
    this.current++;
    try {
      return await fn();
    } finally {
      this.current--;
    }
  }
}
```
Pros
- Protects backend resources
- Great for slow endpoints
Cons
- Doesn’t limit request rate directly
Good For
- Databases
- ML inference
- Long-running requests
7. Distributed Rate Limiting
How It Works
Rate limit data is stored in a shared system like Redis.
All servers check and update the same counters.
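A minimal runnable sketch of the idea, keeping the style of the earlier examples. In a real deployment the shared store would be Redis, using an atomic `INCR` plus `EXPIRE` (often combined in a Lua script); here a hypothetical in-memory `CounterStore` stands in so the logic can run locally.

```javascript
// Stand-in for a shared store like Redis. The increment() method
// mimics an atomic INCR with expiry: it creates the counter on
// first use and resets it once the window has passed.
class CounterStore {
  constructor() {
    this.counters = new Map();
  }

  increment(key, windowMs, now) {
    const entry = this.counters.get(key);
    if (!entry || now >= entry.expiresAt) {
      this.counters.set(key, { count: 1, expiresAt: now + windowMs });
      return 1;
    }
    entry.count++;
    return entry.count;
  }
}

class DistributedLimiter {
  constructor(store, limit, windowMs) {
    this.store = store;       // Shared store (Redis in real systems)
    this.limit = limit;       // Max requests per window, across ALL servers
    this.windowMs = windowMs; // Window size in milliseconds
  }

  allowRequest(clientId) {
    const now = Date.now();
    // One key per client per window, so counters expire naturally
    const windowKey = `${clientId}:${Math.floor(now / this.windowMs)}`;
    const count = this.store.increment(windowKey, this.windowMs, now);
    return count <= this.limit;
  }
}

// Two "servers" share one store, so the limit applies globally
const store = new CounterStore();
const serverA = new DistributedLimiter(store, 3, 60000);
const serverB = new DistributedLimiter(store, 3, 60000);

serverA.allowRequest("user-1"); // 1st request
serverB.allowRequest("user-1"); // 2nd request
serverA.allowRequest("user-1"); // 3rd request
console.log(serverB.allowRequest("user-1")); // false — limit is shared
```

The key point is that neither server keeps local state: both consult the same counter, so a client cannot multiply its allowance by spreading requests across instances.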
Pros
- Works across multiple instances
- Horizontally scalable
Cons
- Extra latency
- Requires atomic operations
Good For
- Microservices
- Cloud-native systems
8. Adaptive / Dynamic Rate Limiting
How It Works
Limits change dynamically based on:
- CPU usage
- Latency
- Error rate
The system tightens or relaxes limits automatically.
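One possible sketch of that feedback loop, borrowing the AIMD idea (additive increase, multiplicative decrease) that TCP uses for congestion control. The thresholds here (10% error rate, halving on stress, +10 on recovery) are illustrative assumptions, not standard values.

```javascript
class AdaptiveLimiter {
  constructor(initialLimit, minLimit, maxLimit) {
    this.limit = initialLimit; // Current requests-per-window limit
    this.minLimit = minLimit;  // Never throttle below this floor
    this.maxLimit = maxLimit;  // Never open up beyond this ceiling
  }

  // Called periodically with the last interval's error rate (0..1),
  // e.g. from a metrics pipeline watching 5xx responses or latency.
  adjust(errorRate) {
    if (errorRate > 0.1) {
      // System is struggling: back off multiplicatively
      this.limit = Math.max(this.minLimit, Math.floor(this.limit / 2));
    } else {
      // System is healthy: recover additively
      this.limit = Math.min(this.maxLimit, this.limit + 10);
    }
    return this.limit;
  }
}

const limiter = new AdaptiveLimiter(100, 10, 200);
limiter.adjust(0.25); // errors spiking → limit halves to 50
limiter.adjust(0.30); // still unhealthy → 25
limiter.adjust(0.01); // recovered → climbs back to 35
console.log(limiter.limit); // 35
```

The asymmetry is deliberate: cutting the limit sharply sheds load fast when the system is hurting, while growing it slowly avoids re-triggering the overload the moment things look healthy. The adjusted `limit` would then feed into one of the strategies above, such as a token bucket.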
Pros
- Very resilient
- Maximizes throughput safely
Cons
- Hard to tune
- Complex to reason about
Good For
- Large-scale platforms
- High-traffic systems
Final Thought
Rate limiting is not about blocking users — it is about maintaining system stability.
There is no “best” rate limiting strategy.
Only the right strategy for the right problem.
Understanding these trade-offs is what separates simple APIs from truly reliable systems.