Abdullahi Yusuf

Building a Scalable Rate Limiting System: Token Bucket vs Leaky Bucket

Introduction

In modern distributed systems, rate limiting is not just a nice-to-have feature—it's a critical component for ensuring system stability, preventing abuse, and maintaining quality of service. Whether you're building a REST API, a microservices architecture, or a high-traffic web application, implementing effective rate limiting can mean the difference between a resilient system and one that crumbles under load.

This comprehensive guide explores two of the most popular rate limiting algorithms: Token Bucket and Leaky Bucket. We'll dive deep into how they work, when to use each one, and provide production-ready implementation examples.

What You'll Learn:

  • The fundamental principles behind rate limiting algorithms
  • Detailed Token Bucket and Leaky Bucket implementations
  • Performance characteristics and trade-offs
  • Real-world architecture patterns with Redis
  • Distributed rate limiting strategies

Why Rate Limiting Matters

Before diving into specific algorithms, let's understand why rate limiting is essential:

  • Prevent Resource Exhaustion: Protect your servers from being overwhelmed by too many requests, whether from legitimate traffic spikes or malicious attacks.

  • Ensure Fair Usage: Prevent a single user or client from consuming disproportionate resources and degrading service for others.

  • Cost Control: Limit expensive operations like database queries, external API calls, or compute-intensive tasks.

  • Security: Mitigate brute-force attacks, credential stuffing, and DDoS attempts.

  • Compliance: Meet SLA requirements and enforce tier-based access in monetized APIs.


Token Bucket Algorithm

The Token Bucket algorithm is one of the most widely used rate limiting techniques, favored by companies like AWS, Stripe, and GitHub. It provides a simple yet powerful way to allow burst traffic while maintaining an average rate limit.

How It Works

Think of a bucket that can hold a certain number of tokens. Tokens are added to the bucket at a constant rate (the refill rate). When a request arrives:

  • If the bucket has at least 1 token, the request is allowed, and 1 token is removed
  • If the bucket is empty, the request is rejected
  • The bucket has a maximum capacity—excess tokens are discarded

[Figure: Token Bucket algorithm]

Key Characteristics

  • Allows Burst Traffic: If tokens have accumulated during quiet periods, a sudden burst of requests can be processed immediately
  • Memory Efficient: Only needs to store token count and last refill timestamp
  • Simple Implementation: Easy to understand and implement with minimal overhead
  • Configurable Burst Size: The bucket capacity controls how large bursts can be

Implementation in Node.js

Here's a clean, single-process implementation of the Token Bucket algorithm (we'll cover a distributed, Redis-backed version later):

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate;
    this.lastRefill = Date.now();
  }

  refill() {
    // Add tokens proportional to the time elapsed since the last refill,
    // capped at the bucket's capacity
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    const tokensToAdd = elapsed * this.refillRate;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + tokensToAdd
    );
    this.lastRefill = now;
  }

  consume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }
}

// Usage Example
const rateLimiter = new TokenBucket(100, 10); // 100 capacity, 10 tokens/sec

app.use((req, res, next) => {
  if (rateLimiter.consume()) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});
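
The middleware above shares a single bucket across every client, which makes it a global limit rather than a per-user one. In practice you usually want one bucket per user or per IP. Here's a minimal sketch of that idea, assuming the TokenBucket class above and a plain in-memory Map (fine for a single process; the distributed version comes later):

// One bucket per client, created lazily (a real deployment would also evict idle entries)
const buckets = new Map();

function getBucket(clientId) {
  if (!buckets.has(clientId)) {
    buckets.set(clientId, new TokenBucket(100, 10));
  }
  return buckets.get(clientId);
}

app.use((req, res, next) => {
  // Key by authenticated user when available, otherwise fall back to the IP address
  const clientId = req.user ? req.user.id : req.ip;
  if (getBucket(clientId).consume()) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});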

Leaky Bucket Algorithm

The Leaky Bucket algorithm takes a different approach: it enforces a constant output rate, smoothing out burst traffic. It's particularly useful when you need predictable, steady processing rates.

How It Works

Imagine a bucket with a hole at the bottom that leaks at a constant rate. Incoming requests are added to the bucket (queue), and they leak out (get processed) at a fixed rate:

  • Requests arrive and enter the queue (bucket)
  • The bucket processes requests at a constant rate (the leak rate)
  • If the bucket is full, new requests are rejected or dropped
  • Requests wait in the queue until they can be processed

[Figure: Leaky Bucket algorithm]

Key Characteristics

  • Constant Output Rate: Guarantees requests are processed at a predictable pace
  • Traffic Smoothing: Converts bursty traffic into a smooth, constant stream
  • Queue-Based: Requests wait in FIFO order, ensuring fairness
  • Memory Overhead: Requires storage for queued requests (higher than Token Bucket)

Implementation in Node.js

class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity; // max number of queued requests
    this.leakRate = leakRate; // requests processed per second
    this.queue = [];
    this.startLeaking();
  }

  startLeaking() {
    // Release one queued request every 1000 / leakRate milliseconds
    setInterval(() => {
      if (this.queue.length > 0) {
        const request = this.queue.shift();
        request.resolve(true);
      }
    }, 1000 / this.leakRate);
  }

  async addRequest() {
    if (this.queue.length >= this.capacity) {
      return false; // Queue full, reject request
    }

    // Resolves once this request "leaks" out of the bucket
    return new Promise((resolve) => {
      this.queue.push({ resolve });
    });
  }
}

// Usage Example
const leakyBucket = new LeakyBucket(100, 10); // 100 capacity, 10 req/sec

app.use(async (req, res, next) => {
  const allowed = await leakyBucket.addRequest();
  if (allowed) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});

Token Bucket vs Leaky Bucket: Visual Comparison

Understanding the behavioral differences between these algorithms is crucial for choosing the right one for your use case. Let's compare how they handle the same traffic pattern:

[Figure: Token Bucket vs Leaky Bucket handling the same traffic pattern]
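
If you'd rather see the difference than read about it, here's a rough simulation sketch using the two classes defined above. Both limiters are configured for 10 requests/second with room for 10 requests, and a burst of 20 arrives at once. The Token Bucket answers each request immediately (roughly the first 10 pass, the rest are rejected), while the Leaky Bucket queues up to 10 of them and releases one every 100 ms:

// Burst simulation: 20 requests arrive at the same instant
const tb = new TokenBucket(10, 10); // capacity 10, refills 10 tokens/sec
const lb = new LeakyBucket(10, 10); // queue of 10, leaks 10 requests/sec

for (let i = 1; i <= 20; i++) {
  // Token Bucket decides instantly
  console.log(`request ${i}: token bucket ${tb.consume() ? 'accepted' : 'rejected'}`);

  // Leaky Bucket resolves with false right away if the queue is full,
  // otherwise it resolves with true once the request leaks out
  lb.addRequest().then((allowed) => {
    console.log(`request ${i}: leaky bucket ${allowed ? 'processed' : 'rejected (queue full)'}`);
  });
}
// Note: LeakyBucket's setInterval keeps the process alive; stop it manually when experimenting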

When to Use Token Bucket

  • API Rate Limiting: When you want to allow legitimate burst traffic (e.g., mobile app sync)
  • User-Facing Services: Provides better user experience by allowing occasional bursts
  • Memory-Constrained Systems: Lower memory footprint than queue-based approaches
  • Real-World Examples: AWS API Gateway, Stripe API, GitHub API

When to Use Leaky Bucket

  • Traffic Shaping: When you need absolutely constant output rate
  • Network Protocols: QoS implementations in routers and network devices
  • Resource Protection: Protecting downstream services with strict rate requirements
  • Real-World Examples: Video streaming rate control, network packet scheduling

Production Architecture: Distributed Rate Limiting

In production environments, rate limiting must work across multiple servers. Here's a scalable architecture using Redis:

[Figure: Distributed rate limiting architecture with Redis]

Redis-Based Token Bucket Implementation

// Using Redis for distributed rate limiting
const Redis = require('ioredis');
const redis = new Redis();

async function checkRateLimit(userId, capacity, refillRate) {
  const key = `rate_limit:${userId}`;
  const now = Date.now();

  // Lua script for atomic token bucket check
  const script = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refillRate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local lastRefill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = (now - lastRefill) / 1000
    local tokensToAdd = elapsed * refillRate
    tokens = math.min(capacity, tokens + tokensToAdd)

    -- Check and consume
    if tokens >= 1 then
      tokens = tokens - 1
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, 60)
      return 1
    else
      return 0
    end
  `;

  const result = await redis.eval(
    script, 1, key, capacity, refillRate, now
  );

  return result === 1;
}

// Express middleware
app.use(async (req, res, next) => {
  const userId = req.user.id;
  const allowed = await checkRateLimit(userId, 100, 10);

  if (allowed) {
    res.setHeader('X-RateLimit-Remaining', '...');
    next();
  } else {
    res.set('Retry-After', '60');
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

Best Practices and Production Tips

1. Choose the Right Granularity

  • Per-User: Most common, prevents individual abuse
  • Per-IP: Useful for anonymous endpoints, but watch for NAT/proxy scenarios
  • Per-API-Key: Essential for third-party integrations
  • Global: Protect overall system capacity (see the key-construction sketch after this list)
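
In practice, the granularity mostly determines which identifier you hand to something like checkRateLimit above when building the Redis key. A hypothetical sketch (the header name and key prefixes are illustrative, not a standard):

// Hypothetical helper: pick the most specific identity available for rate limiting
function rateLimitIdentity(req) {
  if (req.user && req.user.id) {
    return `user:${req.user.id}`;             // per-user
  }
  if (req.headers['x-api-key']) {
    return `key:${req.headers['x-api-key']}`; // per-API-key
  }
  return `ip:${req.ip}`;                      // per-IP fallback
}

// Usage with the Redis-based checker from earlier:
// const allowed = await checkRateLimit(rateLimitIdentity(req), 100, 10);
// A separate check against a constant identity (e.g. 'global') can protect overall capacity.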

2. Implement Proper Headers

Always return rate limit information in response headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1640995200
Retry-After: 60
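
Wiring this into Express is straightforward once your limiter reports more than a yes/no answer. The sketch below assumes a hypothetical checkRateLimitWithInfo helper that returns the limit, the remaining tokens, and the reset time (the Lua script shown earlier would need to return those values instead of just 0 or 1):

app.use(async (req, res, next) => {
  // Hypothetical helper returning { allowed, limit, remaining, resetAt }
  const { allowed, limit, remaining, resetAt } = await checkRateLimitWithInfo(req.user.id);

  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetAt)); // Unix timestamp (seconds) when the window resets

  if (allowed) {
    next();
  } else {
    res.set('Retry-After', String(Math.max(1, resetAt - Math.floor(Date.now() / 1000))));
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});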

3. Use Exponential Backoff

When clients hit rate limits, implement exponential backoff with jitter to prevent thundering herd problems.

async function retryWithBackoff(fn, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.min(1000 * Math.pow(2, i) + Math.random() * 1000, 30000);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}
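
For example, inside an async function or an ES module you could wrap an outgoing call like this (callApi is a placeholder for whatever function makes the rate-limited request and throws an error carrying a status property on HTTP failures):

// callApi is a placeholder; retries only happen on errors with status 429
const data = await retryWithBackoff(() => callApi('/api/resource'));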

4. Monitor and Alert

  • Track rate limit hit percentage (see the counter sketch after this list)
  • Monitor P95/P99 latencies
  • Alert on unusual patterns (potential attacks)
  • Log rejected requests for analysis
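
A lightweight way to start tracking the hit percentage mentioned above is a pair of counters exposed to your metrics system. A sketch using the prom-client library (assumed to be installed; swap in whatever metrics client you already use):

const client = require('prom-client');

// Counters for allowed vs. rejected requests, labeled by route
const rateLimitAllowed = new client.Counter({
  name: 'rate_limit_allowed_total',
  help: 'Requests that passed the rate limiter',
  labelNames: ['route'],
});
const rateLimitRejected = new client.Counter({
  name: 'rate_limit_rejected_total',
  help: 'Requests rejected with HTTP 429',
  labelNames: ['route'],
});

// Inside the rate limiting middleware:
// allowed ? rateLimitAllowed.inc({ route: req.path }) : rateLimitRejected.inc({ route: req.path });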

5. Consider Cost-Based Rate Limiting

Not all requests are equal. Assign different costs to expensive operations:

// Different operations consume different numbers of tokens
app.post('/api/complex-query', (req, res, next) => {
  if (rateLimiter.consume(10)) { // Expensive operation
    next();
  } else {
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

app.get('/api/simple-read', (req, res, next) => {
  if (rateLimiter.consume(1)) { // Simple operation
    next();
  } else {
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

Performance Comparison

Here's how these algorithms compare on key metrics:

Metric             | Token Bucket        | Leaky Bucket
-------------------|---------------------|--------------------------
Latency            | O(1) constant time  | O(1) with queue overhead
Memory             | ~50 bytes per user  | ~1KB per user (queue)
Throughput         | >100K requests/sec  | Depends on leak rate
Burst Handling     | Excellent           | Limited by queue size
Traffic Smoothing  | No                  | Excellent
Implementation     | Simple              | Moderate complexity

Conclusion

Both Token Bucket and Leaky Bucket algorithms have their place in modern system design. Token Bucket's flexibility and burst allowance make it the default choice for most API rate limiting scenarios, while Leaky Bucket's constant output rate is perfect for traffic shaping and network applications.

Key Takeaways:

  • Token Bucket is ideal for user-facing APIs with variable traffic
  • Leaky Bucket excels at smoothing traffic and providing constant output
  • Use Redis for distributed rate limiting in production
  • Always implement proper monitoring and headers
  • Consider hybrid approaches for complex requirements

With these implementations and patterns, you're well-equipped to build robust rate limiting systems that protect your infrastructure while providing excellent user experience.


Found this helpful? Follow for more deep dives into system design and backend architecture!
