Abdullahi Yusuf

Building a Scalable Rate Limiting System: Token Bucket vs Leaky Bucket

Introduction

In modern distributed systems, rate limiting is not just a nice-to-have feature—it's a critical component for ensuring system stability, preventing abuse, and maintaining quality of service. Whether you're building a REST API, a microservices architecture, or a high-traffic web application, implementing effective rate limiting can mean the difference between a resilient system and one that crumbles under load.

This comprehensive guide explores two of the most popular rate limiting algorithms: Token Bucket and Leaky Bucket. We'll dive deep into how they work, when to use each one, and provide production-ready implementation examples.

What You'll Learn:

  • The fundamental principles behind rate limiting algorithms
  • Detailed Token Bucket and Leaky Bucket implementations
  • Performance characteristics and trade-offs
  • Real-world architecture patterns with Redis
  • Distributed rate limiting strategies

Why Rate Limiting Matters

Before diving into specific algorithms, let's understand why rate limiting is essential:

  • Prevent Resource Exhaustion: Protect your servers from being overwhelmed by too many requests, whether from legitimate traffic spikes or malicious attacks.

  • Ensure Fair Usage: Prevent a single user or client from consuming disproportionate resources and degrading service for others.

  • Cost Control: Limit expensive operations like database queries, external API calls, or compute-intensive tasks.

  • Security: Mitigate brute-force attacks, credential stuffing, and DDoS attempts.

  • Compliance: Meet SLA requirements and enforce tier-based access in monetized APIs.


Token Bucket Algorithm

The Token Bucket algorithm is one of the most widely used rate limiting techniques, favored by companies like AWS, Stripe, and GitHub. It provides a simple yet powerful way to allow burst traffic while maintaining an average rate limit.

How It Works

Think of a bucket that can hold a certain number of tokens. Tokens are added to the bucket at a constant rate (the refill rate). When a request arrives:

  • If the bucket has at least 1 token, the request is allowed, and 1 token is removed
  • If the bucket is empty, the request is rejected
  • The bucket has a maximum capacity—excess tokens are discarded

[Figure: Token Bucket algorithm]

Key Characteristics

  • Allows Burst Traffic: If tokens have accumulated during quiet periods, a sudden burst of requests can be processed immediately
  • Memory Efficient: Only needs to store token count and last refill timestamp
  • Simple Implementation: Easy to understand and implement with minimal overhead
  • Configurable Burst Size: The bucket capacity controls how large bursts can be

Implementation in Node.js

Here's a clean, single-process implementation of the Token Bucket algorithm (we'll cover a distributed, Redis-backed version later):

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate;
    this.lastRefill = Date.now();
  }

  refill() {
    // Add tokens proportional to the time elapsed since the last refill,
    // capped at the bucket's capacity
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    const tokensToAdd = elapsed * this.refillRate;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + tokensToAdd
    );
    this.lastRefill = now;
  }

  consume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }
}

// Usage Example
const rateLimiter = new TokenBucket(100, 10); // 100 capacity, 10 tokens/sec

app.use((req, res, next) => {
  if (rateLimiter.consume()) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});
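
The middleware above shares a single bucket across every client, which makes it a global limit rather than a per-user one. In practice you usually want one bucket per user or per IP. Here's a minimal sketch of that idea, assuming the TokenBucket class above and a plain in-memory Map (fine for a single process; the distributed version comes later):

// One bucket per client, created lazily (a real deployment would also evict idle entries)
const buckets = new Map();

function getBucket(clientId) {
  if (!buckets.has(clientId)) {
    buckets.set(clientId, new TokenBucket(100, 10));
  }
  return buckets.get(clientId);
}

app.use((req, res, next) => {
  // Key by authenticated user when available, otherwise fall back to the IP address
  const clientId = req.user ? req.user.id : req.ip;
  if (getBucket(clientId).consume()) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});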

Leaky Bucket Algorithm

The Leaky Bucket algorithm takes a different approach: it enforces a constant output rate, smoothing out burst traffic. It's particularly useful when you need predictable, steady processing rates.

How It Works

Imagine a bucket with a hole at the bottom that leaks at a constant rate. Incoming requests are added to the bucket (queue), and they leak out (get processed) at a fixed rate:

  • Requests arrive and enter the queue (bucket)
  • The bucket processes requests at a constant rate (the leak rate)
  • If the bucket is full, new requests are rejected or dropped
  • Requests wait in the queue until they can be processed

[Figure: Leaky Bucket algorithm]

Key Characteristics

  • Constant Output Rate: Guarantees requests are processed at a predictable pace
  • Traffic Smoothing: Converts bursty traffic into a smooth, constant stream
  • Queue-Based: Requests wait in FIFO order, ensuring fairness
  • Memory Overhead: Requires storage for queued requests (higher than Token Bucket)

Implementation in Node.js

class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity; // max number of queued requests
    this.leakRate = leakRate; // requests processed per second
    this.queue = [];
    this.startLeaking();
  }

  startLeaking() {
    // Release one queued request every 1000 / leakRate milliseconds
    setInterval(() => {
      if (this.queue.length > 0) {
        const request = this.queue.shift();
        request.resolve(true);
      }
    }, 1000 / this.leakRate);
  }

  async addRequest() {
    if (this.queue.length >= this.capacity) {
      return false; // Queue full, reject request
    }

    // Resolves once this request "leaks" out of the bucket
    return new Promise((resolve) => {
      this.queue.push({ resolve });
    });
  }
}

// Usage Example
const leakyBucket = new LeakyBucket(100, 10); // 100 capacity, 10 req/sec

app.use(async (req, res, next) => {
  const allowed = await leakyBucket.addRequest();
  if (allowed) {
    next();
  } else {
    res.status(429).json({ error: 'Too many requests' });
  }
});

Token Bucket vs Leaky Bucket: Visual Comparison

Understanding the behavioral differences between these algorithms is crucial for choosing the right one for your use case. Let's compare how they handle the same traffic pattern:

[Figure: Token Bucket vs Leaky Bucket handling the same traffic pattern]
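
If you'd rather see the difference than read about it, here's a rough simulation sketch using the two classes defined above. Both limiters are configured for 10 requests/second with room for 10 requests, and a burst of 20 arrives at once. The Token Bucket answers each request immediately (roughly the first 10 pass, the rest are rejected), while the Leaky Bucket queues up to 10 of them and releases one every 100 ms:

// Burst simulation: 20 requests arrive at the same instant
const tb = new TokenBucket(10, 10); // capacity 10, refills 10 tokens/sec
const lb = new LeakyBucket(10, 10); // queue of 10, leaks 10 requests/sec

for (let i = 1; i <= 20; i++) {
  // Token Bucket decides instantly
  console.log(`request ${i}: token bucket ${tb.consume() ? 'accepted' : 'rejected'}`);

  // Leaky Bucket resolves with false right away if the queue is full,
  // otherwise it resolves with true once the request leaks out
  lb.addRequest().then((allowed) => {
    console.log(`request ${i}: leaky bucket ${allowed ? 'processed' : 'rejected (queue full)'}`);
  });
}
// Note: LeakyBucket's setInterval keeps the process alive; stop it manually when experimenting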

When to Use Token Bucket

  • API Rate Limiting: When you want to allow legitimate burst traffic (e.g., mobile app sync)
  • User-Facing Services: Provides better user experience by allowing occasional bursts
  • Memory-Constrained Systems: Lower memory footprint than queue-based approaches
  • Real-World Examples: AWS API Gateway, Stripe API, GitHub API

When to Use Leaky Bucket

  • Traffic Shaping: When you need absolutely constant output rate
  • Network Protocols: QoS implementations in routers and network devices
  • Resource Protection: Protecting downstream services with strict rate requirements
  • Real-World Examples: Video streaming rate control, network packet scheduling

Production Architecture: Distributed Rate Limiting

In production environments, rate limiting must work across multiple servers. Here's a scalable architecture using Redis:

[Figure: Distributed rate limiting architecture with Redis]

Redis-Based Token Bucket Implementation

// Using Redis for distributed rate limiting
const Redis = require('ioredis');
const redis = new Redis();

async function checkRateLimit(userId, capacity, refillRate) {
  const key = `rate_limit:${userId}`;
  const now = Date.now();

  // Lua script for atomic token bucket check
  const script = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refillRate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local lastRefill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = (now - lastRefill) / 1000
    local tokensToAdd = elapsed * refillRate
    tokens = math.min(capacity, tokens + tokensToAdd)

    -- Check and consume
    if tokens >= 1 then
      tokens = tokens - 1
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, 60)
      return 1
    else
      return 0
    end
  `;

  const result = await redis.eval(
    script, 1, key, capacity, refillRate, now
  );

  return result === 1;
}

// Express middleware
app.use(async (req, res, next) => {
  const userId = req.user.id;
  const allowed = await checkRateLimit(userId, 100, 10);

  if (allowed) {
    res.setHeader('X-RateLimit-Remaining', '...');
    next();
  } else {
    res.set('Retry-After', '60');
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

Best Practices and Production Tips

1. Choose the Right Granularity

  • Per-User: Most common, prevents individual abuse
  • Per-IP: Useful for anonymous endpoints, but watch for NAT/proxy scenarios
  • Per-API-Key: Essential for third-party integrations
  • Global: Protect overall system capacity (see the key-construction sketch after this list)
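
In practice, the granularity mostly determines which identifier you hand to something like checkRateLimit above when building the Redis key. A hypothetical sketch (the header name and key prefixes are illustrative, not a standard):

// Hypothetical helper: pick the most specific identity available for rate limiting
function rateLimitIdentity(req) {
  if (req.user && req.user.id) {
    return `user:${req.user.id}`;             // per-user
  }
  if (req.headers['x-api-key']) {
    return `key:${req.headers['x-api-key']}`; // per-API-key
  }
  return `ip:${req.ip}`;                      // per-IP fallback
}

// Usage with the Redis-based checker from earlier:
// const allowed = await checkRateLimit(rateLimitIdentity(req), 100, 10);
// A separate check against a constant identity (e.g. 'global') can protect overall capacity.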

2. Implement Proper Headers

Always return rate limit information in response headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1640995200
Retry-After: 60
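
Wiring this into Express is straightforward once your limiter reports more than a yes/no answer. The sketch below assumes a hypothetical checkRateLimitWithInfo helper that returns the limit, the remaining tokens, and the reset time (the Lua script shown earlier would need to return those values instead of just 0 or 1):

app.use(async (req, res, next) => {
  // Hypothetical helper returning { allowed, limit, remaining, resetAt }
  const { allowed, limit, remaining, resetAt } = await checkRateLimitWithInfo(req.user.id);

  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetAt)); // Unix timestamp (seconds) when the window resets

  if (allowed) {
    next();
  } else {
    res.set('Retry-After', String(Math.max(1, resetAt - Math.floor(Date.now() / 1000))));
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});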

3. Use Exponential Backoff

When clients hit rate limits, implement exponential backoff with jitter to prevent thundering herd problems.

async function retryWithBackoff(fn, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.min(1000 * Math.pow(2, i) + Math.random() * 1000, 30000);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}
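
For example, inside an async function or an ES module you could wrap an outgoing call like this (callApi is a placeholder for whatever function makes the rate-limited request and throws an error carrying a status property on HTTP failures):

// callApi is a placeholder; retries only happen on errors with status 429
const data = await retryWithBackoff(() => callApi('/api/resource'));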

4. Monitor and Alert

  • Track rate limit hit percentage (see the counter sketch after this list)
  • Monitor P95/P99 latencies
  • Alert on unusual patterns (potential attacks)
  • Log rejected requests for analysis
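
A lightweight way to start tracking the hit percentage mentioned above is a pair of counters exposed to your metrics system. A sketch using the prom-client library (assumed to be installed; swap in whatever metrics client you already use):

const client = require('prom-client');

// Counters for allowed vs. rejected requests, labeled by route
const rateLimitAllowed = new client.Counter({
  name: 'rate_limit_allowed_total',
  help: 'Requests that passed the rate limiter',
  labelNames: ['route'],
});
const rateLimitRejected = new client.Counter({
  name: 'rate_limit_rejected_total',
  help: 'Requests rejected with HTTP 429',
  labelNames: ['route'],
});

// Inside the rate limiting middleware:
// allowed ? rateLimitAllowed.inc({ route: req.path }) : rateLimitRejected.inc({ route: req.path });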

5. Consider Cost-Based Rate Limiting

Not all requests are equal. Assign different costs to expensive operations:

// Different operations consume different numbers of tokens
app.post('/api/complex-query', (req, res, next) => {
  if (rateLimiter.consume(10)) { // Expensive operation
    next();
  } else {
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

app.get('/api/simple-read', (req, res, next) => {
  if (rateLimiter.consume(1)) { // Simple operation
    next();
  } else {
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

Performance Comparison

Here's how these algorithms compare on key metrics:

Metric             | Token Bucket        | Leaky Bucket
-------------------|---------------------|--------------------------
Latency            | O(1) constant time  | O(1) with queue overhead
Memory             | ~50 bytes per user  | ~1KB per user (queue)
Throughput         | >100K requests/sec  | Depends on leak rate
Burst Handling     | Excellent           | Limited by queue size
Traffic Smoothing  | No                  | Excellent
Implementation     | Simple              | Moderate complexity

Conclusion

Both Token Bucket and Leaky Bucket algorithms have their place in modern system design. Token Bucket's flexibility and burst allowance make it the default choice for most API rate limiting scenarios, while Leaky Bucket's constant output rate is perfect for traffic shaping and network applications.

Key Takeaways:

  • Token Bucket is ideal for user-facing APIs with variable traffic
  • Leaky Bucket excels at smoothing traffic and providing constant output
  • Use Redis for distributed rate limiting in production
  • Always implement proper monitoring and headers
  • Consider hybrid approaches for complex requirements

With these implementations and patterns, you're well-equipped to build robust rate limiting systems that protect your infrastructure while providing excellent user experience.


Found this helpful? Follow for more deep dives into system design and backend architecture!
