DEV Community

Akshat Jain

Posted on • Originally published at akshatjme.Medium

Rate Limiting 101: How to Protect Your APIs at Scale

How Systems Protect Themselves from Too Many Requests

Modern systems don’t fail because they are badly written.

They fail because they receive more requests than they can safely handle.

Whether it’s an API, a backend service, or a public-facing platform, uncontrolled traffic can lead to:

  • High latency
  • Resource exhaustion
  • Cascading failures
  • Complete downtime

This is why rate limiting exists.

At a high level, rate limiting answers one simple question:

How many requests should we allow, and when should we say “no”?

In this article, we’ll take a simple and intuitive approach to understand the most common rate limiting strategies used in real systems. For each strategy, we’ll look at:

  • How it works conceptually
  • Its strengths and weaknesses
  • When it should be used
  • A small JavaScript example to see it in action

Rate Limiting 101

What Is Rate Limiting, Really?

Rate limiting is not about blocking users.

It is about protecting systems.

Instead of letting traffic grow uncontrollably, rate limiting enforces rules like:

  • “Only 100 requests per minute”
  • “Only 10 concurrent requests at a time”
  • “Allow short bursts, but limit long-term usage”

Different systems need different rules, which is why multiple strategies exist.

1. Fixed Window Rate Limiting

How It Works

You allow N requests per fixed time window.

Example:

  • Window size: 1 minute
  • Limit: 100 requests
12:00 – 12:01 → max 100 requests  
12:01 – 12:02 → counter resets

The counter resets at the start of each new window.

JavaScript Example

class FixedWindowLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;          // Max requests per window  
    this.windowMs = windowMs;    // Window size in milliseconds  
    this.count = 0;              // Current request count  
    this.windowStart = Date.now();  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Reset window if time has passed  
    if (now - this.windowStart >= this.windowMs) {  
      this.windowStart = now;  
      this.count = 0;  
    }  

    // Allow request if under limit  
    if (this.count < this.limit) {  
      this.count++;  
      return true;  
    }  

    // Otherwise reject request  
    return false;  
  }  
}

Pros

  • Very easy to implement
  • Low memory usage

Cons

  • Burst problem at window edges

Example:

  • 99 requests at 12:00:59
  • 99 requests at 12:01:01

System gets hit with 198 requests almost instantly.
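To see the edge burst in action, here is a small self-contained sketch. The clock is injected as a function so the window boundary can be simulated deterministically; otherwise the limiter mirrors the FixedWindowLimiter above.

```javascript
// Minimal fixed-window limiter with an injectable clock,
// so the window boundary can be simulated deterministically.
class FixedWindow {
  constructor(limit, windowMs, clock = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.clock = clock;
    this.count = 0;
    this.windowStart = clock();
  }

  allowRequest() {
    const now = this.clock();
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// Simulate the boundary: 99 requests just before the window resets,
// then 99 more just after. All 198 are allowed.
let fakeTime = 0;
const limiter = new FixedWindow(100, 60_000, () => fakeTime);

fakeTime = 59_000; // 12:00:59
let allowed = 0;
for (let i = 0; i < 99; i++) if (limiter.allowRequest()) allowed++;

fakeTime = 61_000; // 12:01:01 — new window, counter resets
for (let i = 0; i < 99; i++) if (limiter.allowRequest()) allowed++;

console.log(allowed); // 198
```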

Good For

  • Internal tools
  • Simple APIs
  • Low-risk systems

2. Sliding Window (Log-Based)

How It Works

Instead of using fixed windows, we track the timestamp of every request.

A request is allowed only if the number of requests in the last X seconds is below the limit.

JavaScript Example

class SlidingWindowLogLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;        // Max requests allowed  
    this.windowMs = windowMs;  // Time window in milliseconds  
    this.requests = [];        // Store timestamps of requests  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Remove expired timestamps  
    this.requests = this.requests.filter(  
      timestamp => now - timestamp < this.windowMs  
    );  

    // Check if under the limit  
    if (this.requests.length < this.limit) {  
      this.requests.push(now);  
      return true;  
    }  

    // Otherwise reject  
    return false;  
  }  
}

Pros

  • Very accurate
  • No burst edge problem

Cons

  • High memory usage
  • Slower for high traffic

Good For

  • Low-traffic but precision-critical systems

3. Sliding Window (Counter-Based)

How It Works

This approach blends two adjacent fixed windows.

Instead of tracking every request, we:

  • Count requests in the current window
  • Use a weighted average from the previous window

This smooths out burst behavior. For example, with a limit of 100, 100 requests in the previous window, and 60 requests so far at 30% through the current window, the estimate is (100 * 0.7) + 60 = 130, so new requests are rejected until the weighted count falls back under the limit.

JavaScript Example

class SlidingWindowCounterLimiter {  
  constructor(limit, windowMs) {  
    this.limit = limit;         // Max requests allowed  
    this.windowMs = windowMs;   // Window size in milliseconds  
    this.current = 0;           // Requests in current window  
    this.previous = 0;          // Requests in previous window  
    this.windowStart = Date.now();  
  }  

  allowRequest() {  
    const now = Date.now();  
    let elapsed = now - this.windowStart;  

    // If the current window expired, shift counters  
    if (elapsed >= this.windowMs) {  
      // If more than one full window passed, the previous window saw no requests  
      this.previous = elapsed >= 2 * this.windowMs ? 0 : this.current;  
      this.current = 0;  
      this.windowStart = now;  
      elapsed = 0;  
    }  

    // Weight the previous window by how much of it still overlaps  
    const weight = 1 - elapsed / this.windowMs;  
    const estimatedCount =  
      this.previous * weight + this.current;  

    // Allow request if under limit  
    if (estimatedCount < this.limit) {  
      this.current++;  
      return true;  
    }  

    return false;  
  }  
}

Pros

  • Smoother than fixed window
  • Cheaper than log-based sliding window

Cons

  • Slightly approximate

Good For

  • APIs needing smooth limits without heavy overhead

4. Token Bucket ⭐ (Most Popular)

How It Works

  • Tokens are added to a bucket at a fixed rate
  • Each request consumes one token
  • Bucket has a maximum capacity (burst allowance)

If no tokens are available, requests are rejected.

JavaScript Example

class TokenBucket {  
  constructor(rate, capacity) {  
    this.rate = rate;             // Tokens added per second  
    this.capacity = capacity;     // Maximum bucket size  
    this.tokens = capacity;       // Current tokens available  
    this.lastRefill = Date.now(); // Last refill timestamp  
  }  

  allowRequest() {  
    const now = Date.now();  

    // Calculate time elapsed (in seconds)  
    const elapsed = (now - this.lastRefill) / 1000;  

    // Refill tokens based on elapsed time  
    this.tokens = Math.min(  
      this.capacity,  
      this.tokens + elapsed * this.rate  
    );  

    this.lastRefill = now;  

    // Allow request if at least 1 token is available  
    if (this.tokens >= 1) {  
      this.tokens -= 1;  
      return true;  
    }  

    return false;  
  }  
}
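A quick usage sketch of the burst allowance (the class here is a condensed copy of the TokenBucket above, so the snippet runs on its own): with a capacity of 3 and a refill rate of 1 token per second, three back-to-back requests succeed and the fourth is rejected until tokens refill.

```javascript
// Condensed copy of the TokenBucket above, so this snippet is self-contained.
class TokenBucket {
  constructor(rate, capacity) {
    this.rate = rate;             // Tokens added per second
    this.capacity = capacity;     // Maximum bucket size
    this.tokens = capacity;       // Bucket starts full
    this.lastRefill = Date.now();
  }

  allowRequest() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.rate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 1 token per second, burst capacity of 3
const bucket = new TokenBucket(1, 3);

// Four back-to-back requests: the burst capacity absorbs the first three
const results = [1, 2, 3, 4].map(() => bucket.allowRequest());
console.log(results); // [true, true, true, false]
```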

Pros

  • Allows controlled bursts
  • Smooth and flexible
  • Widely used in industry

Cons

  • Slightly more complex

Good For

  • Public APIs
  • User-facing services

5. Leaky Bucket

How It Works

Requests enter a queue and are processed at a constant rate.

Think of water leaking from a bucket at a steady speed.

JavaScript Example

class LeakyBucket {  
  constructor(rate, capacity) {  
    this.rate = rate;         // Requests processed per second  
    this.capacity = capacity; // Maximum queue (bucket) size  
    this.queue = [];          // Queue of incoming requests  

    // Process queue at a fixed interval  
    this.interval = setInterval(  
      () => this.process(),  
      1000 / this.rate  
    );  
  }  

  allowRequest(request) {  
    // Drop the request if the bucket is already full  
    if (this.queue.length >= this.capacity) {  
      return false;  
    }  

    // Otherwise queue it for processing at the steady leak rate  
    this.queue.push(request);  
    return true;  
  }  

  process() {  
    if (this.queue.length > 0) {  
      const request = this.queue.shift();  
      // Here you would actually handle the request  
      // e.g., request.resolve() or call a handler  
    }  
  }  

  stop() {  
    clearInterval(this.interval);  
  }  
}

Pros

  • Very smooth traffic
  • Predictable system load

Cons

  • Bursts get delayed or dropped

Good For

  • Traffic shaping
  • Network-level systems

6. Concurrency Limiting

How It Works

Instead of limiting request rate, we limit the number of in-flight requests.

JavaScript Example

class ConcurrencyLimiter {  
  constructor(maxConcurrent) {  
    this.maxConcurrent = maxConcurrent; // Maximum allowed concurrent executions  
    this.current = 0;                   // Currently running tasks  
  }  

  async handleRequest(fn) {  
    if (this.current >= this.maxConcurrent) {  
      throw new Error("Too many concurrent requests");  
    }  

    this.current++;  

    try {  
      return await fn();  
    } finally {  
      this.current--;  
    }  
  }  
}

Pros

  • Protects backend resources
  • Great for slow endpoints

Cons

  • Doesn’t limit request rate directly

Good For

  • Databases
  • ML inference
  • Long-running requests

7. Distributed Rate Limiting

How It Works

Rate limit data is stored in a shared system like Redis.

All servers check and update the same counters.
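A minimal sketch of the idea, with an in-memory Map standing in for the shared store. In production the store would be Redis, using an atomic INCR plus an expiry (or a Lua script) so the check-and-update cannot race between servers.

```javascript
// In-memory stand-in for a shared store such as Redis.
// In production, incr() would be an atomic Redis INCR + EXPIRE.
class InMemoryStore {
  constructor() {
    this.counters = new Map();
  }

  // Increment the counter for a key and return the new value
  async incr(key) {
    const value = (this.counters.get(key) || 0) + 1;
    this.counters.set(key, value);
    return value;
  }
}

// Fixed-window limiter whose state lives in the shared store,
// so every server instance enforces the same limit.
class DistributedRateLimiter {
  constructor(store, limit, windowMs) {
    this.store = store;
    this.limit = limit;
    this.windowMs = windowMs;
  }

  async allowRequest(clientId) {
    // Key by client and current window so old counters expire naturally
    const window = Math.floor(Date.now() / this.windowMs);
    const key = `rate:${clientId}:${window}`;
    const count = await this.store.incr(key);
    return count <= this.limit;
  }
}

// Two "servers" sharing one store see the same counters
const store = new InMemoryStore();
const serverA = new DistributedRateLimiter(store, 3, 60_000);
const serverB = new DistributedRateLimiter(store, 3, 60_000);

(async () => {
  const results = [];
  results.push(await serverA.allowRequest("user-1"));
  results.push(await serverB.allowRequest("user-1"));
  results.push(await serverA.allowRequest("user-1"));
  results.push(await serverB.allowRequest("user-1")); // 4th request is rejected
  console.log(results);
})();
```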

Pros

  • Works across multiple instances
  • Horizontally scalable

Cons

  • Extra latency
  • Requires atomic operations

Good For

  • Microservices
  • Cloud-native systems

8. Adaptive / Dynamic Rate Limiting

How It Works

Limits change dynamically based on:

  • CPU usage
  • Latency
  • Error rate

The system tightens or relaxes limits automatically.
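One common way to sketch this is AIMD (additive increase, multiplicative decrease), the same idea TCP congestion control uses: grow the limit slowly while the backend looks healthy, and cut it sharply when errors or slow responses appear. The 500 ms latency threshold and the halving factor below are illustrative assumptions, not tuned values.

```javascript
// Adaptive limiter: a fixed-window count whose limit itself moves
// with the health signals the caller reports (AIMD-style).
class AdaptiveRateLimiter {
  constructor({ initialLimit, minLimit, maxLimit, windowMs }) {
    this.limit = initialLimit;
    this.minLimit = minLimit;
    this.maxLimit = maxLimit;
    this.windowMs = windowMs;
    this.count = 0;
    this.windowStart = Date.now();
  }

  allowRequest() {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }

  // Callers report each request's outcome; the limit adapts.
  reportOutcome({ ok, latencyMs }) {
    if (!ok || latencyMs > 500) {
      // Multiplicative decrease: back off quickly under stress
      this.limit = Math.max(this.minLimit, Math.floor(this.limit / 2));
    } else {
      // Additive increase: recover slowly while healthy
      this.limit = Math.min(this.maxLimit, this.limit + 1);
    }
  }
}

const limiter = new AdaptiveRateLimiter({
  initialLimit: 100, minLimit: 10, maxLimit: 1000, windowMs: 60_000,
});

limiter.reportOutcome({ ok: false, latencyMs: 900 }); // limit: 100 -> 50
limiter.reportOutcome({ ok: true, latencyMs: 40 });   // limit: 50 -> 51
console.log(limiter.limit); // 51
```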

Pros

  • Very resilient
  • Maximizes throughput safely

Cons

  • Hard to tune
  • Complex to reason about

Good For

  • Large-scale platforms
  • High-traffic systems

Final Thought

Rate limiting is not about blocking users — it is about maintaining system stability.

There is no “best” rate limiting strategy.

Only the right strategy for the right problem.

Understanding these trade-offs is what separates simple APIs from truly reliable systems.
