DEV Community

Akshat Jain

Posted on • Originally published at akshatjme.Medium

Rate Limiting 101: How to Protect Your APIs at Scale

How Systems Protect Themselves from Too Many Requests

Modern systems don’t fail because they are badly written.

They fail because they receive more requests than they can safely handle.

Whether it’s an API, a backend service, or a public-facing platform, uncontrolled traffic can lead to:

  • High latency
  • Resource exhaustion
  • Cascading failures
  • Complete downtime

This is why rate limiting exists.

At a high level, rate limiting answers one simple question:

How many requests should we allow, and when should we say “no”?

In this article, we’ll take a simple and intuitive approach to understand the most common rate limiting strategies used in real systems. For each strategy, we’ll look at:

  • How it works conceptually
  • Its strengths and weaknesses
  • When it should be used
  • A small JavaScript example to see it in action

Rate Limiting 101

What Is Rate Limiting, Really?

Rate limiting is not about blocking users.

It is about protecting systems.

Instead of letting traffic grow uncontrollably, rate limiting enforces rules like:

  • “Only 100 requests per minute”
  • “Only 10 concurrent requests at a time”
  • “Allow short bursts, but limit long-term usage”

Different systems need different rules, which is why multiple strategies exist.

1. Fixed Window Rate Limiting

How It Works

You allow N requests per fixed time window.

Example:

  • Window size: 1 minute
  • Limit: 100 requests
12:00 – 12:01 → max 100 requests  
12:01 – 12:02 → counter resets

The counter resets at the start of each new window.

JavaScript Example

class FixedWindowLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;          // Max requests per window  
    this.windowMs = windowMs;    // Window size in milliseconds  
    this.count = 0;              // Current request count  
    this.windowStart = Date.now();  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Reset window if time has passed  
    if (now - this.windowStart >= this.windowMs) {  
      this.windowStart = now;  
      this.count = 0;  
    }  

    // Allow request if under limit  
    if (this.count < this.limit) {  
      this.count++;  
      return true;  
    }  

    // Otherwise reject request  
    return false;  
  }  
}

Pros

  • Very easy to implement
  • Low memory usage

Cons

  • Burst problem at window edges

Example:

  • 99 requests at 12:00:59
  • 99 requests at 12:01:01

System gets hit with 198 requests almost instantly.
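To see the edge burst in action, here is a small self-contained sketch. The clock is injected as a function so the window boundary can be simulated deterministically; otherwise the limiter mirrors the FixedWindowLimiter above.

```javascript
// Minimal fixed-window limiter with an injectable clock,
// so the window boundary can be simulated deterministically.
class FixedWindow {
  constructor(limit, windowMs, clock = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.clock = clock;
    this.count = 0;
    this.windowStart = clock();
  }

  allowRequest() {
    const now = this.clock();
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// Simulate the boundary: 99 requests just before the window resets,
// then 99 more just after. All 198 are allowed.
let fakeTime = 0;
const limiter = new FixedWindow(100, 60_000, () => fakeTime);

fakeTime = 59_000; // 12:00:59
let allowed = 0;
for (let i = 0; i < 99; i++) if (limiter.allowRequest()) allowed++;

fakeTime = 61_000; // 12:01:01 — new window, counter resets
for (let i = 0; i < 99; i++) if (limiter.allowRequest()) allowed++;

console.log(allowed); // 198
```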

Good For

  • Internal tools
  • Simple APIs
  • Low-risk systems

2. Sliding Window (Log-Based)

How It Works

Instead of using fixed windows, we track the timestamp of every request.

A request is allowed only if the number of requests in the last X seconds is below the limit.

JavaScript Example

class SlidingWindowLogLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;        // Max requests allowed  
    this.windowMs = windowMs;  // Time window in milliseconds  
    this.requests = [];        // Store timestamps of requests  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Remove expired timestamps  
    this.requests = this.requests.filter(  
      timestamp => now - timestamp < this.windowMs  
    );  

    // Check if under the limit  
    if (this.requests.length < this.limit) {  
      this.requests.push(now);  
      return true;  
    }  

    // Otherwise reject  
    return false;  
  }  
}

Pros

  • Very accurate
  • No burst edge problem

Cons

  • High memory usage
  • Slower for high traffic

Good For

  • Low-traffic but precision-critical systems

3. Sliding Window (Counter-Based)

How It Works

This approach blends two adjacent fixed windows.

Instead of tracking every request, we:

  • Count requests in the current window
  • Use a weighted average from the previous window

This smooths out burst behavior. For example, with a limit of 100, 100 requests in the previous window, and 60 requests so far at 30% through the current window, the estimate is (100 * 0.7) + 60 = 130, so new requests are rejected until the weighted count falls back under the limit.

JavaScript Example

class SlidingWindowCounterLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;         // Max requests allowed  
    this.windowMs = windowMs;   // Window size in milliseconds  
    this.current = 0;           // Requests in current window  
    this.previous = 0;          // Requests in previous window  
    this.windowStart = Date.now();  
  }  

  allowRequest() {  
    const now = Date.now();  
    let elapsed = now - this.windowStart;  

    // If the current window expired, shift counters  
    if (elapsed >= this.windowMs) {  
      // If more than one full window passed, the previous window saw no requests  
      this.previous = elapsed >= 2 * this.windowMs ? 0 : this.current;  
      this.current = 0;  
      this.windowStart = now;  
      elapsed = 0;  
    }  

    // Weight the previous window by how much of it still overlaps  
    const weight = 1 - elapsed / this.windowMs;  
    const estimatedCount =  
      this.previous * weight + this.current;  

    // Allow request if under limit  
    if (estimatedCount < this.limit) {  
      this.current++;  
      return true;  
    }  

    return false;  
  }  
}

Pros

  • Smoother than fixed window
  • Cheaper than log-based sliding window

Cons

  • Slightly approximate

Good For

  • APIs needing smooth limits without heavy overhead

4. Token Bucket ⭐ (Most Popular)

How It Works

  • Tokens are added to a bucket at a fixed rate
  • Each request consumes one token
  • Bucket has a maximum capacity (burst allowance)

If no tokens are available, requests are rejected.

JavaScript Example

class TokenBucket {  
  constructor(rate, capacity) {  
    this.rate = rate;             // Tokens added per second  
    this.capacity = capacity;     // Maximum bucket size  
    this.tokens = capacity;       // Current tokens available  
    this.lastRefill = Date.now(); // Last refill timestamp  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Calculate time elapsed (in seconds)  
    const elapsed = (now - this.lastRefill) / 1000;  

    // Refill tokens based on elapsed time  
    this.tokens = Math.min(  
      this.capacity,  
      this.tokens + elapsed * this.rate  
    );  

    this.lastRefill = now;  

    // Allow request if at least 1 token is available  
    if (this.tokens >= 1) {  
      this.tokens -= 1;  
      return true;  
    }  

    return false;  
  }  
}
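A quick usage sketch of the burst allowance (the class here is a condensed copy of the TokenBucket above, so the snippet runs on its own): with a capacity of 3 and a refill rate of 1 token per second, three back-to-back requests succeed and the fourth is rejected until tokens refill.

```javascript
// Condensed copy of the TokenBucket above, so this snippet is self-contained.
class TokenBucket {
  constructor(rate, capacity) {
    this.rate = rate;             // Tokens added per second
    this.capacity = capacity;     // Maximum bucket size
    this.tokens = capacity;       // Bucket starts full
    this.lastRefill = Date.now();
  }

  allowRequest() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.rate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 1 token per second, burst capacity of 3
const bucket = new TokenBucket(1, 3);

// Four back-to-back requests: the burst capacity absorbs the first three
const results = [1, 2, 3, 4].map(() => bucket.allowRequest());
console.log(results); // [true, true, true, false]
```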

Pros

  • Allows controlled bursts
  • Smooth and flexible
  • Widely used in industry

Cons

  • Slightly more complex

Good For

  • Public APIs
  • User-facing services

5. Leaky Bucket

How It Works

Requests enter a queue and are processed at a constant rate.

Think of water leaking from a bucket at a steady speed.

JavaScript Example

class LeakyBucket {  
  constructor(rate, capacity) {  
    this.rate = rate;         // Requests processed per second  
    this.capacity = capacity; // Maximum queue (bucket) size  
    this.queue = [];          // Queue of incoming requests  

    // Process queue at a fixed interval  
    this.interval = setInterval(  
      () => this.process(),  
      1000 / this.rate  
    );  
  }  

  allowRequest(request) {  
    // Drop the request if the bucket is already full  
    if (this.queue.length >= this.capacity) {  
      return false;  
    }  

    // Otherwise queue it for processing at the steady leak rate  
    this.queue.push(request);  
    return true;  
  }  

  process() {  
    if (this.queue.length > 0) {  
      const request = this.queue.shift();  
      // Here you would actually handle the request  
      // e.g., request.resolve() or call a handler  
    }  
  }  

  stop() {  
    clearInterval(this.interval);  
  }  
}

Pros

  • Very smooth traffic
  • Predictable system load

Cons

  • Bursts get delayed or dropped

Good For

  • Traffic shaping
  • Network-level systems

6. Concurrency Limiting

How It Works

Instead of limiting request rate, we limit the number of in-flight requests.

JavaScript Example

class ConcurrencyLimiter {  
  constructor(maxConcurrent) {  
    this.maxConcurrent = maxConcurrent; // Maximum allowed concurrent executions  
    this.current = 0;                   // Currently running tasks  
  }  

  async handleRequest(fn) {  
    if (this.current >= this.maxConcurrent) {  
      throw new Error("Too many concurrent requests");  
    }  

    this.current++;  

    try {  
      return await fn();  
    } finally {  
      this.current--;  
    }  
  }  
}

Pros

  • Protects backend resources
  • Great for slow endpoints

Cons

  • Doesn’t limit request rate directly

Good For

  • Databases
  • ML inference
  • Long-running requests

7. Distributed Rate Limiting

How It Works

Rate limit data is stored in a shared system like Redis.

All servers check and update the same counters.
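A minimal sketch of the idea, with an in-memory Map standing in for the shared store. In production the store would be Redis, using an atomic INCR plus an expiry (or a Lua script) so the check-and-update cannot race between servers.

```javascript
// In-memory stand-in for a shared store such as Redis.
// In production, incr() would be an atomic Redis INCR + EXPIRE.
class InMemoryStore {
  constructor() {
    this.counters = new Map();
  }

  // Increment the counter for a key and return the new value
  async incr(key) {
    const value = (this.counters.get(key) || 0) + 1;
    this.counters.set(key, value);
    return value;
  }
}

// Fixed-window limiter whose state lives in the shared store,
// so every server instance enforces the same limit.
class DistributedRateLimiter {
  constructor(store, limit, windowMs) {
    this.store = store;
    this.limit = limit;
    this.windowMs = windowMs;
  }

  async allowRequest(clientId) {
    // Key by client and current window so old counters expire naturally
    const window = Math.floor(Date.now() / this.windowMs);
    const key = `rate:${clientId}:${window}`;
    const count = await this.store.incr(key);
    return count <= this.limit;
  }
}

// Two "servers" sharing one store see the same counters
const store = new InMemoryStore();
const serverA = new DistributedRateLimiter(store, 3, 60_000);
const serverB = new DistributedRateLimiter(store, 3, 60_000);

(async () => {
  const results = [];
  results.push(await serverA.allowRequest("user-1"));
  results.push(await serverB.allowRequest("user-1"));
  results.push(await serverA.allowRequest("user-1"));
  results.push(await serverB.allowRequest("user-1")); // 4th request is rejected
  console.log(results);
})();
```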

Pros

  • Works across multiple instances
  • Horizontally scalable

Cons

  • Extra latency
  • Requires atomic operations

Good For

  • Microservices
  • Cloud-native systems

8. Adaptive / Dynamic Rate Limiting

How It Works

Limits change dynamically based on:

  • CPU usage
  • Latency
  • Error rate

The system tightens or relaxes limits automatically.
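One common way to sketch this is AIMD (additive increase, multiplicative decrease), the same idea TCP congestion control uses: grow the limit slowly while the backend looks healthy, and cut it sharply when errors or slow responses appear. The 500 ms latency threshold and the halving factor below are illustrative assumptions, not tuned values.

```javascript
// Adaptive limiter: a fixed-window count whose limit itself moves
// with the health signals the caller reports (AIMD-style).
class AdaptiveRateLimiter {
  constructor({ initialLimit, minLimit, maxLimit, windowMs }) {
    this.limit = initialLimit;
    this.minLimit = minLimit;
    this.maxLimit = maxLimit;
    this.windowMs = windowMs;
    this.count = 0;
    this.windowStart = Date.now();
  }

  allowRequest() {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }

  // Callers report each request's outcome; the limit adapts.
  reportOutcome({ ok, latencyMs }) {
    if (!ok || latencyMs > 500) {
      // Multiplicative decrease: back off quickly under stress
      this.limit = Math.max(this.minLimit, Math.floor(this.limit / 2));
    } else {
      // Additive increase: recover slowly while healthy
      this.limit = Math.min(this.maxLimit, this.limit + 1);
    }
  }
}

const limiter = new AdaptiveRateLimiter({
  initialLimit: 100, minLimit: 10, maxLimit: 1000, windowMs: 60_000,
});

limiter.reportOutcome({ ok: false, latencyMs: 900 }); // limit: 100 -> 50
limiter.reportOutcome({ ok: true, latencyMs: 40 });   // limit: 50 -> 51
console.log(limiter.limit); // 51
```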

Pros

  • Very resilient
  • Maximizes throughput safely

Cons

  • Hard to tune
  • Complex to reason about

Good For

  • Large-scale platforms
  • High-traffic systems

Final Thought

Rate limiting is not about blocking users — it is about maintaining system stability.

There is no “best” rate limiting strategy.

Only the right strategy for the right problem.

Understanding these trade-offs is what separates simple APIs from truly reliable systems.
