
Fahim Ahammed Firoz

Protect Your API with Token Bucket Rate Limiting

When building APIs, chat services, or real-time systems, one of the biggest challenges is preventing clients from overwhelming your server with too many requests. Without protection, a flood of traffic can slow down performance or even crash the system.

This is where rate limiting comes in. Among the many techniques available, the Token Bucket Algorithm is widely used because it is simple, efficient, and allows for bursts of traffic without losing overall control.


What is Rate Limiting?

Rate limiting is the process of controlling how many requests a client can send to a server in a given period of time.

Example:

  • A client may be allowed 10 requests per second.
  • If they exceed that, their additional requests are rejected until the next second begins.

Rate limiting ensures:

  • Fair resource usage across users
  • Protection against abuse, brute-force attacks, or spam
  • Improved server stability and reliability

The Token Bucket Algorithm

The Token Bucket Algorithm works like this:

  • Each client is assigned a bucket.
  • The bucket has a capacity (for example, 10 tokens).
  • Tokens are refilled at a fixed rate (for example, 1 token per second).
  • Each request consumes one token.
  • If the bucket is empty, the request is rejected.

This approach allows short bursts of requests when tokens are available, while still enforcing a long-term average request rate.
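The rules above fit in a few lines of code. Here is a minimal, framework-free sketch (the `TokenBucket` class name and the injectable clock parameter are illustrative choices, used so the behavior is easy to test deterministically):

```javascript
// Minimal token bucket: holds up to `capacity` tokens,
// refilled at `ratePerSec` tokens per second.
class TokenBucket {
  constructor(capacity, ratePerSec, now = Date.now) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity; // start full
    this.now = now;         // injectable clock, handy for testing
    this.lastRefill = now();
  }

  tryConsume() {
    const t = this.now();
    const elapsed = (t - this.lastRefill) / 1000; // seconds
    const refill = Math.floor(elapsed * this.ratePerSec);
    if (refill > 0) {
      this.tokens = Math.min(this.capacity, this.tokens + refill);
      this.lastRefill = t;
    }
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true; // request accepted
    }
    return false;  // bucket empty: reject
  }
}

// A burst of 3 drains a 3-token bucket; the 4th call is rejected.
let fakeTime = 0;
const bucket = new TokenBucket(3, 1, () => fakeTime);
console.log(bucket.tryConsume(), bucket.tryConsume(), bucket.tryConsume()); // true true true
console.log(bucket.tryConsume()); // false
fakeTime += 2000; // 2 seconds pass -> 2 tokens refilled
console.log(bucket.tryConsume()); // true
```

Note how the burst of three succeeds immediately, while sustained traffic is held to the refill rate. The rest of this post applies the same logic per client inside an HTTP server.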


Token Bucket Flow

(Sequence diagram: Token Bucket Rate Limiter)

The steps are as follows:

  1. Check if the client is new
    If yes, create a bucket for them with full capacity.

  2. Refill tokens
    Based on the time elapsed since the last refill.

  3. Check if tokens are available
    • If yes, consume one token and accept the request.
    • If no, reject the request.

This flow lets clients make quick bursts of requests while keeping them under the average allowed rate.


Implementing Token Bucket in Node.js

Let’s build a simple HTTP server with Token Bucket Rate Limiting.

Step 1: Setup

const http = require('http');

// Configuration
const bucketCapacity = 10;   // maximum tokens per user
const refillRate = 1;        // tokens per second
const ipBuckets = new Map(); // store buckets for each IP

We define the bucket size, refill rate, and a map to store each user’s token bucket.


Step 2: Refill Function

function refillTokens(bucket) {
    const now = Date.now();
    const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
    const refill = Math.floor(elapsed * refillRate);

    if (refill > 0) {
        bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
        bucket.lastRefillTime = now;
    }
}

This function calculates how many tokens should be added based on the elapsed time since the last refill, and updates the bucket without exceeding its capacity.
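The arithmetic is easy to check by hand. The sketch below reproduces it with fixed timestamps instead of the live clock (`refillAt` is a test-only variant of the function above, not part of the server code):

```javascript
const bucketCapacity = 10; // same configuration as the server
const refillRate = 1;      // tokens per second

// Same refill math as refillTokens, but with an explicit timestamp.
function refillAt(bucket, now) {
  const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
  const refill = Math.floor(elapsed * refillRate);
  if (refill > 0) {
    bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
    bucket.lastRefillTime = now;
  }
}

const bucket = { tokens: 3, lastRefillTime: 0 };
refillAt(bucket, 2700);  // 2.7 s elapsed -> floor(2.7) = 2 tokens added
console.log(bucket.tokens); // 5
refillAt(bucket, 60000); // long idle period
console.log(bucket.tokens); // capped at capacity: 10
```

One subtlety worth knowing: because `lastRefillTime` jumps to `now` whenever any whole token is added, the fractional remainder (the 0.7 s above) is discarded. At a refill rate of 1 token per second this is negligible, but at very low rates you may prefer to track fractional tokens instead.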


Step 3: Rate Limiting Middleware

function rateLimitMiddleware(req, res) {
    const ip = req.socket.remoteAddress;

    // If user is new, create a bucket
    if (!ipBuckets.has(ip)) {
        ipBuckets.set(ip, { tokens: bucketCapacity, lastRefillTime: Date.now() });
    }

    const bucket = ipBuckets.get(ip);
    refillTokens(bucket);

    if (bucket.tokens > 0) {
        bucket.tokens -= 1; // consume one token
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request accepted\n');
    } else {
        res.writeHead(429, { 'Content-Type': 'text/plain' });
        res.end('Too Many Requests\n');
    }
}

This function looks up (or creates) the bucket for each client IP, consumes a token when one is available, and answers with HTTP 429 when the bucket is empty. Note that `ipBuckets` grows with every new IP it sees; a production version would also evict buckets that have been idle for a while.
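One practical caveat: behind a proxy or load balancer, `req.socket.remoteAddress` is the proxy's address, so every client would share one bucket. A common approach, assuming a trusted proxy that sets the header, is to prefer the first `X-Forwarded-For` entry (the `clientKey` helper below is an illustrative addition, not part of the article's server):

```javascript
// Resolve the key used to pick a bucket, preferring X-Forwarded-For
// when present. Only trust this header behind a proxy you control:
// it is client-supplied and trivially spoofed otherwise.
function clientKey(req) {
  const forwarded = req.headers['x-forwarded-for'];
  if (forwarded) {
    return forwarded.split(',')[0].trim(); // first hop = original client
  }
  return req.socket.remoteAddress;
}

// Example: a request that passed through a proxy at 10.0.0.2
console.log(clientKey({
  headers: { 'x-forwarded-for': '203.0.113.7, 10.0.0.2' },
  socket: { remoteAddress: '10.0.0.2' },
})); // 203.0.113.7
```

To use it, swap `const ip = req.socket.remoteAddress;` for `const ip = clientKey(req);` in the middleware.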


Step 4: Start the Server

const server = http.createServer(rateLimitMiddleware);

server.listen(3000, () => {
    console.log('Server running at http://localhost:3000/');
});

This starts the server on port 3000, applying the rate limiting logic to every request.


Full Code

const http = require('http');

// Configuration
const bucketCapacity = 10;   // maximum tokens per user
const refillRate = 1;        // tokens per second
const ipBuckets = new Map(); // store buckets for each IP

// Refill function
function refillTokens(bucket) {
    const now = Date.now();
    const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
    const refill = Math.floor(elapsed * refillRate);

    if (refill > 0) {
        bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
        bucket.lastRefillTime = now;
    }
}

// Middleware
function rateLimitMiddleware(req, res) {
    const ip = req.socket.remoteAddress;

    // If user is new, create a bucket
    if (!ipBuckets.has(ip)) {
        ipBuckets.set(ip, { tokens: bucketCapacity, lastRefillTime: Date.now() });
    }

    const bucket = ipBuckets.get(ip);
    refillTokens(bucket);

    if (bucket.tokens > 0) {
        bucket.tokens -= 1; // consume one token
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request accepted\n');
    } else {
        res.writeHead(429, { 'Content-Type': 'text/plain' });
        res.end('Too Many Requests\n');
    }
}

// Start server
const server = http.createServer(rateLimitMiddleware);

server.listen(3000, () => {
    console.log('Server running at http://localhost:3000/');
});

Use Cases

  • API Gateways: prevent abuse by limiting requests per client
  • Chat Applications: stop spamming by controlling message frequency
  • Authentication Systems: slow down brute-force login attempts
  • IoT Devices: manage bursts of data from sensors and devices

Benefits of Token Bucket

  • Allows bursts of requests up to the bucket capacity
  • Maintains a steady long-term request rate
  • Simple and efficient implementation
  • Predictable refill behavior

Conclusion

The Token Bucket Algorithm is a practical and effective way to implement rate limiting. It combines flexibility and control by allowing temporary bursts while maintaining a predictable average request rate.

If you are building APIs, chat systems, or real-time applications, Token Bucket rate limiting can help you protect your server, ensure fairness, and improve system reliability.
