
Fahim Ahammed Firoz

Protect Your API with Token Bucket Rate Limiting

When building APIs, chat services, or real-time systems, one of the biggest challenges is preventing clients from overwhelming your server with too many requests. Without protection, a flood of traffic can slow down performance or even crash the system.

This is where rate limiting comes in. Among the many techniques available, the Token Bucket Algorithm is widely used because it is simple, efficient, and allows for bursts of traffic without losing overall control.


What is Rate Limiting?

Rate limiting is the process of controlling how many requests a client can send to a server in a given period of time.

Example:

  • A client may be allowed 10 requests per second.
  • If they exceed that, their additional requests are rejected until the next second begins.

Rate limiting ensures:

  • Fair resource usage across users
  • Protection against abuse, brute-force attacks, or spam
  • Improved server stability and reliability

The Token Bucket Algorithm

The Token Bucket Algorithm works like this:

  • Each client is assigned a bucket.
  • The bucket has a capacity (for example, 10 tokens).
  • Tokens are refilled at a fixed rate (for example, 1 token per second).
  • Each request consumes one token.
  • If the bucket is empty, the request is rejected.

This approach allows short bursts of requests when tokens are available, while still enforcing a long-term average request rate.
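The rules above fit in a few lines of code. Here is a minimal, framework-free sketch (the `TokenBucket` class name and the injectable clock parameter are illustrative choices, used so the behavior is easy to test deterministically):

```javascript
// Minimal token bucket: holds up to `capacity` tokens,
// refilled at `ratePerSec` tokens per second.
class TokenBucket {
  constructor(capacity, ratePerSec, now = Date.now) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity; // start full
    this.now = now;         // injectable clock, handy for testing
    this.lastRefill = now();
  }

  tryConsume() {
    const t = this.now();
    const elapsed = (t - this.lastRefill) / 1000; // seconds
    const refill = Math.floor(elapsed * this.ratePerSec);
    if (refill > 0) {
      this.tokens = Math.min(this.capacity, this.tokens + refill);
      this.lastRefill = t;
    }
    if (this.tokens > 0) {
      this.tokens -= 1;
      return true; // request accepted
    }
    return false;  // bucket empty: reject
  }
}

// A burst of 3 drains a 3-token bucket; the 4th call is rejected.
let fakeTime = 0;
const bucket = new TokenBucket(3, 1, () => fakeTime);
console.log(bucket.tryConsume(), bucket.tryConsume(), bucket.tryConsume()); // true true true
console.log(bucket.tryConsume()); // false
fakeTime += 2000; // 2 seconds pass -> 2 tokens refilled
console.log(bucket.tryConsume()); // true
```

Note how the burst of three succeeds immediately, while sustained traffic is held to the refill rate. The rest of this post applies the same logic per client inside an HTTP server.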


Token Bucket Flow

(Sequence diagram: Token Bucket Rate Limiter)

The steps are as follows:

  1. Check if the client is new
    If yes, create a bucket for them with full capacity.

  2. Refill tokens
    Based on the time elapsed since the last refill.

  3. Check if tokens are available
    • If yes, consume one token and accept the request.
    • If no, reject the request.

This flow lets clients make quick bursts of requests while keeping them under the average allowed rate.


Implementing Token Bucket in Node.js

Let’s build a simple HTTP server with Token Bucket Rate Limiting.

Step 1: Setup

const http = require('http');

// Configuration
const bucketCapacity = 10;   // maximum tokens per user
const refillRate = 1;        // tokens per second
const ipBuckets = new Map(); // store buckets for each IP

We define the bucket size, refill rate, and a map to store each user’s token bucket.


Step 2: Refill Function

function refillTokens(bucket) {
    const now = Date.now();
    const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
    const refill = Math.floor(elapsed * refillRate);

    if (refill > 0) {
        bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
        bucket.lastRefillTime = now;
    }
}

This function calculates how many tokens should be added based on the elapsed time since the last refill, and updates the bucket without exceeding its capacity.
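The arithmetic is easy to check by hand. The sketch below reproduces it with fixed timestamps instead of the live clock (`refillAt` is a test-only variant of the function above, not part of the server code):

```javascript
const bucketCapacity = 10; // same configuration as the server
const refillRate = 1;      // tokens per second

// Same refill math as refillTokens, but with an explicit timestamp.
function refillAt(bucket, now) {
  const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
  const refill = Math.floor(elapsed * refillRate);
  if (refill > 0) {
    bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
    bucket.lastRefillTime = now;
  }
}

const bucket = { tokens: 3, lastRefillTime: 0 };
refillAt(bucket, 2700);  // 2.7 s elapsed -> floor(2.7) = 2 tokens added
console.log(bucket.tokens); // 5
refillAt(bucket, 60000); // long idle period
console.log(bucket.tokens); // capped at capacity: 10
```

One subtlety worth knowing: because `lastRefillTime` jumps to `now` whenever any whole token is added, the fractional remainder (the 0.7 s above) is discarded. At a refill rate of 1 token per second this is negligible, but at very low rates you may prefer to track fractional tokens instead.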


Step 3: Rate Limiting Middleware

function rateLimitMiddleware(req, res) {
    const ip = req.socket.remoteAddress;

    // If user is new, create a bucket
    if (!ipBuckets.has(ip)) {
        ipBuckets.set(ip, { tokens: bucketCapacity, lastRefillTime: Date.now() });
    }

    const bucket = ipBuckets.get(ip);
    refillTokens(bucket);

    if (bucket.tokens > 0) {
        bucket.tokens -= 1; // consume one token
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request accepted\n');
    } else {
        res.writeHead(429, { 'Content-Type': 'text/plain' });
        res.end('Too Many Requests\n');
    }
}

This function looks up (or creates) the bucket for each client IP, consumes a token when one is available, and answers with HTTP 429 when the bucket is empty. Note that `ipBuckets` grows with every new IP it sees; a production version would also evict buckets that have been idle for a while.
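One practical caveat: behind a proxy or load balancer, `req.socket.remoteAddress` is the proxy's address, so every client would share one bucket. A common approach, assuming a trusted proxy that sets the header, is to prefer the first `X-Forwarded-For` entry (the `clientKey` helper below is an illustrative addition, not part of the article's server):

```javascript
// Resolve the key used to pick a bucket, preferring X-Forwarded-For
// when present. Only trust this header behind a proxy you control:
// it is client-supplied and trivially spoofed otherwise.
function clientKey(req) {
  const forwarded = req.headers['x-forwarded-for'];
  if (forwarded) {
    return forwarded.split(',')[0].trim(); // first hop = original client
  }
  return req.socket.remoteAddress;
}

// Example: a request that passed through a proxy at 10.0.0.2
console.log(clientKey({
  headers: { 'x-forwarded-for': '203.0.113.7, 10.0.0.2' },
  socket: { remoteAddress: '10.0.0.2' },
})); // 203.0.113.7
```

To use it, swap `const ip = req.socket.remoteAddress;` for `const ip = clientKey(req);` in the middleware.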


Step 4: Start the Server

const server = http.createServer(rateLimitMiddleware);

server.listen(3000, () => {
    console.log('Server running at http://localhost:3000/');
});

This starts the server on port 3000, applying the rate limiting logic to every request.


Full Code

const http = require('http');

// Configuration
const bucketCapacity = 10;   // maximum tokens per user
const refillRate = 1;        // tokens per second
const ipBuckets = new Map(); // store buckets for each IP

// Refill function
function refillTokens(bucket) {
    const now = Date.now();
    const elapsed = (now - bucket.lastRefillTime) / 1000; // seconds
    const refill = Math.floor(elapsed * refillRate);

    if (refill > 0) {
        bucket.tokens = Math.min(bucketCapacity, bucket.tokens + refill);
        bucket.lastRefillTime = now;
    }
}

// Middleware
function rateLimitMiddleware(req, res) {
    const ip = req.socket.remoteAddress;

    // If user is new, create a bucket
    if (!ipBuckets.has(ip)) {
        ipBuckets.set(ip, { tokens: bucketCapacity, lastRefillTime: Date.now() });
    }

    const bucket = ipBuckets.get(ip);
    refillTokens(bucket);

    if (bucket.tokens > 0) {
        bucket.tokens -= 1; // consume one token
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request accepted\n');
    } else {
        res.writeHead(429, { 'Content-Type': 'text/plain' });
        res.end('Too Many Requests\n');
    }
}

// Start server
const server = http.createServer(rateLimitMiddleware);

server.listen(3000, () => {
    console.log('Server running at http://localhost:3000/');
});

Use Cases

  • API Gateways: prevent abuse by limiting requests per client
  • Chat Applications: stop spamming by controlling message frequency
  • Authentication Systems: slow down brute-force login attempts
  • IoT Devices: manage bursts of data from sensors and devices

Benefits of Token Bucket

  • Allows bursts of requests up to the bucket capacity
  • Maintains a steady long-term request rate
  • Simple and efficient implementation
  • Predictable refill behavior

Conclusion

The Token Bucket Algorithm is a practical and effective way to implement rate limiting. It combines flexibility and control by allowing temporary bursts while maintaining a predictable average request rate.

If you are building APIs, chat systems, or real-time applications, Token Bucket rate limiting can help you protect your server, ensure fairness, and improve system reliability.
