Building a Production-Ready Rate Limiter in Node.js
Rate limiting is one of those things developers ignore until they get hit by a botnet, a runaway script, or a competitor scraping their API. By then, it's too late — your server is melting, your database is overwhelmed, and legitimate users are getting errors.
In this guide, we'll build a production-ready rate limiter from scratch in Node.js. We'll implement the token bucket algorithm, integrate Redis for distributed rate limiting across multiple servers, add a sliding window counter for precision, and package everything as a reusable Express middleware.
This isn't a "just install express-rate-limit" tutorial. We're going deep — understanding the algorithms, their tradeoffs, and how to make rate limiting work reliably at scale.
Why Rate Limiting Matters
Before we write code, let's be clear about what we're protecting against:
- DDoS mitigation — Limit how fast any single IP can send requests
- API abuse — Prevent one customer from consuming all your capacity
- Brute force protection — Slow down password-guessing attacks
- Cost control — Cap expensive operations (AI calls, SMS, emails) per user
- Fair usage — Ensure all users get their fair share of resources
Different threats need different strategies. A login endpoint needs aggressive limits (10 attempts/minute). A read API can be more generous (1000 requests/minute). We'll build a flexible system that handles both.
Algorithm Overview: Four Approaches
1. Fixed Window Counter
Simplest approach: count requests in fixed time windows (e.g., 0:00-0:01, 0:01-0:02).
Problem: A user can send 1000 requests at 0:00:59 and another 1000 at 0:01:01 — 2000 requests in 2 seconds, bypassing a "1000/minute" limit.
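To make the boundary problem concrete, here's a minimal in-memory fixed-window sketch (illustrative only — we won't use it later in the article):

```typescript
// Minimal fixed-window counter: requests are counted per discrete window.
class FixedWindowCounter {
  private counts = new Map<string, { window: number; count: number }>();

  constructor(private maxRequests: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const window = Math.floor(now / this.windowMs);
    const entry = this.counts.get(key);
    if (!entry || entry.window !== window) {
      // New window: the count resets entirely
      this.counts.set(key, { window, count: 1 });
      return true;
    }
    if (entry.count < this.maxRequests) {
      entry.count++;
      return true;
    }
    return false;
  }
}

// The boundary problem: 1000 requests at t=59.9s and 1000 more at t=60.1s
// all succeed, because they land in different windows.
const limiter = new FixedWindowCounter(1000, 60_000);
let allowed = 0;
for (let i = 0; i < 1000; i++) if (limiter.allow('u', 59_900)) allowed++;
for (let i = 0; i < 1000; i++) if (limiter.allow('u', 60_100)) allowed++;
console.log(allowed); // 2000 — double the "per minute" limit, in 200ms
```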
2. Sliding Window Log
Store timestamps of every request. Count how many fall within the last N seconds.
Problem: Memory-intensive. Storing timestamps for millions of users is expensive.
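A minimal sketch shows exactly where the memory goes — every key holds an array of raw timestamps:

```typescript
// Sliding window log: store every request's timestamp, count those in the window.
class SlidingWindowLog {
  private logs = new Map<string, number[]>();

  constructor(private maxRequests: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window — this array is the memory cost
    const log = (this.logs.get(key) ?? []).filter(ts => ts > cutoff);
    if (log.length >= this.maxRequests) {
      this.logs.set(key, log);
      return false;
    }
    log.push(now);
    this.logs.set(key, log);
    return true;
  }
}
```

With a 1000/minute limit and a million active keys, that's up to a billion stored timestamps — which is exactly the problem the hybrid approach below avoids.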
3. Sliding Window Counter (Hybrid)
Best of both worlds. Use two fixed windows and interpolate based on position within the current window. Very accurate, memory-efficient.
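A quick worked example: with a 100/minute limit, 80 requests in the previous window, 20 so far in the current one, and 30% of the window elapsed, the previous window still carries 70% weight:

```typescript
// weighted = previous * (remaining fraction of the previous window) + current
function weightedCount(previous: number, current: number, elapsedFraction: number): number {
  return Math.floor(previous * (1 - elapsedFraction) + current);
}

console.log(weightedCount(80, 20, 0.3)); // 76 — still under the 100/min limit
```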
4. Token Bucket
Each user has a "bucket" of tokens. Each request consumes one token. Tokens refill at a fixed rate. Users can burst up to the bucket size.
Advantage: Allows natural bursting while enforcing average rate limits. Great for APIs.
We'll implement token bucket as our primary algorithm and sliding window as an alternative.
Setting Up the Project
mkdir rate-limiter-demo && cd rate-limiter-demo
npm init -y
npm install express ioredis
npm install -D typescript @types/express @types/node ts-node
Part 1: In-Memory Token Bucket
Let's start with a pure in-memory implementation to understand the algorithm:
// src/algorithms/tokenBucket.ts
interface BucketState {
tokens: number;
lastRefill: number;
}
interface TokenBucketOptions {
capacity: number; // Max tokens (burst limit)
refillRate: number; // Tokens added per second
refillInterval?: number; // How often to refill (ms), default 1000
}
export class TokenBucket {
private buckets = new Map<string, BucketState>();
private options: Required<TokenBucketOptions>;
constructor(options: TokenBucketOptions) {
this.options = {
refillInterval: 1000,
...options
};
// Clean up stale buckets periodically; unref() lets the process
// exit even while this timer is still scheduled
setInterval(() => this.cleanup(), 60_000).unref();
}
/**
* Attempt to consume tokens for a given key.
* Returns { allowed: true } if within limit, or
* { allowed: false, retryAfter: ms } if limited.
*/
consume(key: string, tokens = 1): { allowed: boolean; remaining: number; retryAfter?: number } {
const now = Date.now();
const bucket = this.buckets.get(key) ?? { tokens: this.options.capacity, lastRefill: now };
// Calculate tokens to add since last refill
const elapsed = now - bucket.lastRefill;
const tokensToAdd = (elapsed / 1000) * this.options.refillRate;
// Refill the bucket (don't exceed capacity)
const currentTokens = Math.min(
this.options.capacity,
bucket.tokens + tokensToAdd
);
if (currentTokens >= tokens) {
// Allow the request
this.buckets.set(key, {
tokens: currentTokens - tokens,
lastRefill: now
});
return {
allowed: true,
remaining: Math.floor(currentTokens - tokens)
};
} else {
// Deny the request
// Calculate how long until they have enough tokens
const deficit = tokens - currentTokens;
const waitMs = Math.ceil((deficit / this.options.refillRate) * 1000);
// Update lastRefill even when denied (to track time accurately)
this.buckets.set(key, {
tokens: currentTokens,
lastRefill: now
});
return {
allowed: false,
remaining: Math.floor(currentTokens),
retryAfter: waitMs
};
}
}
// Remove stale entries to prevent memory leaks
private cleanup(): void {
const now = Date.now();
const staleThreshold = 5 * 60 * 1000; // 5 minutes
for (const [key, bucket] of this.buckets) {
if (now - bucket.lastRefill > staleThreshold) {
this.buckets.delete(key);
}
}
}
getStats(key: string): BucketState | null {
return this.buckets.get(key) ?? null;
}
}
Let's verify that it works correctly:
// Quick test
const bucket = new TokenBucket({ capacity: 10, refillRate: 2 }); // 2 tokens/sec, burst of 10
// Burst: first 10 requests succeed
for (let i = 0; i < 10; i++) {
const result = bucket.consume('user-123');
console.log(`Request ${i + 1}: ${result.allowed}, remaining: ${result.remaining}`);
}
// 11th request fails
const limited = bucket.consume('user-123');
console.log('11th:', limited); // { allowed: false, remaining: 0, retryAfter: 500 }
// After 1 second, 2 new tokens available
setTimeout(() => {
const result = bucket.consume('user-123');
console.log('After 1s:', result); // { allowed: true, remaining: 1 }
}, 1000);
Part 2: Redis-Backed Token Bucket for Production
In-memory rate limiting breaks the moment you run multiple server instances. If you have 3 servers, a user can send 3x the limit. Redis solves this — all servers share the same rate limit state.
The key is making the token consumption atomic using Redis Lua scripts:
// src/algorithms/redisTokenBucket.ts
import Redis from 'ioredis';
const TOKEN_BUCKET_SCRIPT = `
-- KEYS[1]: the bucket key (e.g., "ratelimit:user:123")
-- ARGV[1]: bucket capacity
-- ARGV[2]: refill rate (tokens per second)
-- ARGV[3]: tokens to consume
-- ARGV[4]: current timestamp (milliseconds)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
-- Get current bucket state
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])
-- Initialize if first request
if current_tokens == nil then
current_tokens = capacity
last_refill = now
end
-- Calculate token refill
local elapsed = (now - last_refill) / 1000 -- convert to seconds
local tokens_to_add = elapsed * refill_rate
current_tokens = math.min(capacity, current_tokens + tokens_to_add)
local allowed = 0
local retry_after = 0
if current_tokens >= requested then
-- Allow the request
current_tokens = current_tokens - requested
allowed = 1
else
-- Deny: calculate wait time
local deficit = requested - current_tokens
retry_after = math.ceil((deficit / refill_rate) * 1000)
end
-- Save updated state with TTL (auto-cleanup)
redis.call('HMSET', key,
'tokens', current_tokens,
'last_refill', now
)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000) + 5000)
return {allowed, math.floor(current_tokens), retry_after}
`;
interface RateLimitResult {
allowed: boolean;
remaining: number;
retryAfter?: number;
resetAt?: Date;
}
interface RedisTokenBucketOptions {
capacity: number;
refillRate: number;
keyPrefix?: string;
}
export class RedisTokenBucket {
private redis: Redis;
private options: Required<RedisTokenBucketOptions>;
private scriptSha?: string;
constructor(redis: Redis, options: RedisTokenBucketOptions) {
this.redis = redis;
this.options = {
keyPrefix: 'ratelimit',
...options
};
}
async initialize(): Promise<void> {
// Load the Lua script and cache its SHA for efficiency
this.scriptSha = await this.redis.script('LOAD', TOKEN_BUCKET_SCRIPT) as string;
console.log('Rate limiter script loaded, SHA:', this.scriptSha);
}
async consume(identifier: string, tokens = 1): Promise<RateLimitResult> {
const key = `${this.options.keyPrefix}:${identifier}`;
const now = Date.now();
try {
let result: [number, number, number];
if (this.scriptSha) {
try {
result = await this.redis.evalsha(
this.scriptSha,
1,
key,
this.options.capacity,
this.options.refillRate,
tokens,
now
) as [number, number, number];
} catch (err: any) {
// Script may have been flushed (e.g. SCRIPT FLUSH or a Redis restart):
// reload it and retry once
if (err.message.includes('NOSCRIPT')) {
await this.initialize();
return this.consume(identifier, tokens);
}
throw err;
}
} else {
// Fallback: load inline (slower but safe)
result = await this.redis.eval(
TOKEN_BUCKET_SCRIPT,
1,
key,
this.options.capacity,
this.options.refillRate,
tokens,
now
) as [number, number, number];
}
const [allowed, remaining, retryAfter] = result;
return {
allowed: allowed === 1,
remaining,
retryAfter: retryAfter > 0 ? retryAfter : undefined,
resetAt: retryAfter > 0 ? new Date(now + retryAfter) : undefined
};
} catch (err) {
// On Redis failure, fail open (allow the request) rather than
// block all traffic. Log the error for monitoring.
console.error('Rate limiter Redis error:', err);
return { allowed: true, remaining: -1 };
}
}
}
Part 3: Sliding Window Counter
For endpoints where precise rate limiting matters (login, payment, sensitive operations), use the sliding window algorithm:
// src/algorithms/slidingWindow.ts
import Redis from 'ioredis';
const SLIDING_WINDOW_SCRIPT = `
-- Sliding window rate limiter using two fixed windows
-- KEYS[1]: current window key
-- KEYS[2]: previous window key
-- ARGV[1]: max requests per window
-- ARGV[2]: window size in seconds
-- ARGV[3]: current timestamp (seconds)
-- ARGV[4]: window index (current_time // window_size)
local current_key = KEYS[1]
local previous_key = KEYS[2]
local max_requests = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_window = tonumber(ARGV[4])
-- Get counts from both windows
local current_count = tonumber(redis.call('GET', current_key) or '0')
local previous_count = tonumber(redis.call('GET', previous_key) or '0')
-- Calculate weight of previous window
-- If we're 30% into the current window, previous window contributes 70%
local window_start = current_window * window_size
local elapsed_in_window = now - window_start
local previous_weight = 1 - (elapsed_in_window / window_size)
-- Weighted request count
local weighted_count = math.floor(
previous_count * previous_weight + current_count
)
if weighted_count >= max_requests then
-- Calculate when the window resets enough to allow a request
local next_window_start = (current_window + 1) * window_size
local retry_after = next_window_start - now
return {0, max_requests - weighted_count, retry_after}
end
-- Increment current window
local new_count = redis.call('INCR', current_key)
if new_count == 1 then
-- Set TTL on first increment (2x window to keep previous window)
redis.call('EXPIRE', current_key, window_size * 2)
end
return {1, max_requests - weighted_count - 1, 0}
`;
interface SlidingWindowOptions {
windowSize: number; // seconds
maxRequests: number;
keyPrefix?: string;
}
export class SlidingWindowRateLimiter {
private redis: Redis;
private options: Required<SlidingWindowOptions>;
constructor(redis: Redis, options: SlidingWindowOptions) {
this.redis = redis;
this.options = { keyPrefix: 'sw', ...options };
}
async consume(identifier: string): Promise<{
allowed: boolean;
remaining: number;
retryAfter?: number;
}> {
const { windowSize, maxRequests, keyPrefix } = this.options;
const now = Math.floor(Date.now() / 1000);
const currentWindow = Math.floor(now / windowSize);
const currentKey = `${keyPrefix}:${identifier}:${currentWindow}`;
const previousKey = `${keyPrefix}:${identifier}:${currentWindow - 1}`;
const result = await this.redis.eval(
SLIDING_WINDOW_SCRIPT,
2,
currentKey,
previousKey,
maxRequests,
windowSize,
now,
currentWindow
) as [number, number, number];
const [allowed, remaining, retryAfter] = result;
return {
allowed: allowed === 1,
remaining: Math.max(0, remaining),
// The script reports retry_after in seconds; convert to milliseconds
// so callers see the same unit as the token bucket limiter
retryAfter: retryAfter > 0 ? retryAfter * 1000 : undefined
};
}
}
Part 4: Express Middleware
Now let's wrap everything in a clean, configurable Express middleware:
// src/middleware/rateLimiter.ts
import { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';
import { RedisTokenBucket } from '../algorithms/redisTokenBucket';
import { SlidingWindowRateLimiter } from '../algorithms/slidingWindow';
type Algorithm = 'token-bucket' | 'sliding-window';
interface RateLimiterMiddlewareOptions {
// Algorithm selection
algorithm?: Algorithm;
// Token bucket options
capacity?: number;
refillRate?: number;
// Sliding window options
windowSize?: number;
maxRequests?: number;
// Key extraction: what identifies a "user"?
keyExtractor?: (req: Request) => string;
// Skip rate limiting for certain requests
skip?: (req: Request) => boolean;
// Custom response when limited
onLimited?: (req: Request, res: Response) => void;
// Redis connection
redis: Redis;
// Namespace for this limiter (allows multiple limiters)
name?: string;
}
// Default key extractor: use IP address.
// Only trust forwarding headers when requests arrive via your own proxy/LB
const defaultKeyExtractor = (req: Request): string => {
const ip =
req.headers['x-forwarded-for']?.toString().split(',')[0] ||
req.headers['x-real-ip']?.toString() ||
req.socket.remoteAddress ||
'unknown';
return `ip:${ip}`;
};
export function createRateLimiter(options: RateLimiterMiddlewareOptions) {
const {
algorithm = 'token-bucket',
capacity = 100,
refillRate = 10,
windowSize = 60,
maxRequests = 100,
keyExtractor = defaultKeyExtractor,
skip,
onLimited,
redis,
name = 'default'
} = options;
// Initialize the rate limiter based on algorithm
let limiter: RedisTokenBucket | SlidingWindowRateLimiter;
if (algorithm === 'token-bucket') {
const bucket = new RedisTokenBucket(redis, {
capacity,
refillRate,
keyPrefix: `ratelimit:${name}`
});
// Initialize async (load Lua script)
bucket.initialize().catch(console.error);
limiter = bucket;
} else {
limiter = new SlidingWindowRateLimiter(redis, {
windowSize,
maxRequests,
keyPrefix: `ratelimit:${name}`
});
}
// Return the Express middleware
return async (req: Request, res: Response, next: NextFunction): Promise<void> => {
// Skip if configured
if (skip?.(req)) {
return next();
}
const key = keyExtractor(req);
try {
const result = await limiter.consume(key);
// Set rate limit headers (RFC 6585 defines the 429 status; the
// X-RateLimit-* headers are a widely used convention, not part of the RFC)
res.setHeader('X-RateLimit-Limit', algorithm === 'token-bucket' ? capacity : maxRequests);
res.setHeader('X-RateLimit-Remaining', result.remaining);
if (result.retryAfter) {
res.setHeader('Retry-After', Math.ceil(result.retryAfter / 1000));
// Unix timestamp (in seconds) at which the limit resets
res.setHeader('X-RateLimit-Reset', Math.ceil((Date.now() + result.retryAfter) / 1000));
}
if (!result.allowed) {
if (onLimited) {
onLimited(req, res);
} else {
res.status(429).json({
error: 'Too Many Requests',
message: 'You have exceeded the rate limit. Please slow down.',
retryAfter: result.retryAfter
? Math.ceil(result.retryAfter / 1000)
: undefined
});
}
return;
}
next();
} catch (err) {
// Log but don't block traffic on limiter failure
console.error(`Rate limiter error for key ${key}:`, err);
next();
}
};
}
Part 5: Real-World Usage
Here's how to use this in a real Express application:
// src/app.ts
import express from 'express';
import Redis from 'ioredis';
import { createRateLimiter } from './middleware/rateLimiter';
const app = express();
app.use(express.json()); // parse JSON bodies so limiters can read req.body.username
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
// Global rate limiter: 1000 requests/minute per IP
const globalLimiter = createRateLimiter({
redis,
name: 'global',
algorithm: 'token-bucket',
capacity: 1000,
refillRate: 16.67, // ~1000/minute
});
// Auth limiter: strict sliding window for login
const authLimiter = createRateLimiter({
redis,
name: 'auth',
algorithm: 'sliding-window',
windowSize: 900, // 15 minutes
maxRequests: 10, // 10 attempts per 15 min
keyExtractor: (req) => {
// Rate limit by IP AND username together
const ip = req.socket.remoteAddress || 'unknown';
const username = req.body?.username || 'unknown';
return `${ip}:${username}`;
},
onLimited: (req, res) => {
res.status(429).json({
error: 'Account temporarily locked',
message: 'Too many failed login attempts. Please wait 15 minutes.',
unlockAt: new Date(Date.now() + 900_000).toISOString()
});
}
});
// API limiter: per-user token bucket
const apiLimiter = createRateLimiter({
redis,
name: 'api',
algorithm: 'token-bucket',
capacity: 500,
refillRate: 8.33, // 500/minute
keyExtractor: (req) => {
// Use authenticated user ID if available, fall back to IP
const userId = (req as any).user?.id;
return userId ? `user:${userId}` : `ip:${req.socket.remoteAddress}`;
},
skip: (req) => {
// Don't rate limit health checks
return req.path === '/health';
}
});
// AI/expensive operation limiter: very strict
const aiLimiter = createRateLimiter({
redis,
name: 'ai',
algorithm: 'sliding-window',
windowSize: 3600, // 1 hour
maxRequests: 20, // 20 AI calls per hour per user
keyExtractor: (req) => `user:${(req as any).user?.id || 'anon'}`
});
// Apply limiters
app.use(globalLimiter);
app.post('/auth/login', authLimiter, async (req, res) => {
// Login logic here
});
app.use('/api', apiLimiter);
app.post('/api/ai/generate', aiLimiter, async (req, res) => {
// Expensive AI operation
});
app.listen(3000, () => console.log('Server running on port 3000'));
Part 6: Tiered Rate Limits
Real APIs have different limits for different plan tiers:
// src/middleware/tieredRateLimiter.ts
import { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';
import { RedisTokenBucket } from '../algorithms/redisTokenBucket';
interface PlanLimits {
capacity: number;
refillRate: number;
}
const PLAN_LIMITS: Record<string, PlanLimits> = {
free: { capacity: 100, refillRate: 1.67 }, // 100/min
starter: { capacity: 1000, refillRate: 16.67 }, // 1000/min
pro: { capacity: 10000, refillRate: 166.7 }, // 10k/min
enterprise: { capacity: 100000, refillRate: 1667 } // 100k/min
};
export function createTieredRateLimiter(redis: Redis) {
// Create a bucket per plan
const limiters = new Map<string, RedisTokenBucket>();
for (const [plan, limits] of Object.entries(PLAN_LIMITS)) {
const bucket = new RedisTokenBucket(redis, {
...limits,
keyPrefix: `ratelimit:${plan}`
});
bucket.initialize().catch(console.error);
limiters.set(plan, bucket);
}
return async (req: Request, res: Response, next: NextFunction) => {
const user = (req as any).user;
const plan = user?.plan || 'free';
const limiter = limiters.get(plan) || limiters.get('free')!;
const result = await limiter.consume(`user:${user?.id || req.ip}`);
res.setHeader('X-RateLimit-Plan', plan);
res.setHeader('X-RateLimit-Limit', PLAN_LIMITS[plan].capacity);
res.setHeader('X-RateLimit-Remaining', result.remaining);
if (!result.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
plan,
upgradeUrl: 'https://myapp.com/pricing'
});
}
next();
};
}
Part 7: Monitoring Rate Limits
Rate limiting generates valuable signals. Track them:
// src/middleware/rateLimitMonitor.ts
import { Request, Response, NextFunction } from 'express';
type RateLimiterMiddleware = (req: Request, res: Response, next: NextFunction) => Promise<void>;
// Module-scoped so the stats endpoint can import and read it
export const metrics = {
totalRequests: 0,
blockedRequests: 0,
blocksByKey: new Map<string, number>()
};
export function wrapWithMonitoring(limiter: RateLimiterMiddleware): RateLimiterMiddleware {
return async (req: Request, res: Response, next: NextFunction) => {
metrics.totalRequests++;
const originalJson = res.json.bind(res);
res.json = (body: any) => {
if (res.statusCode === 429) {
metrics.blockedRequests++;
const key = req.ip || 'unknown';
metrics.blocksByKey.set(key, (metrics.blocksByKey.get(key) || 0) + 1);
// Alert on suspicious patterns (same IP blocked 100+ times)
const blockCount = metrics.blocksByKey.get(key)!;
if (blockCount > 100 && blockCount % 100 === 0) {
console.warn(`Possible attack: ${key} has been blocked ${blockCount} times`);
// Could trigger IP ban, alert Slack, etc.
}
}
return originalJson(body);
};
return limiter(req, res, next);
};
}
// Expose a metrics endpoint (in app.ts, importing { metrics } from the module above)
app.get('/internal/rate-limit-stats', (req, res) => {
const topBlockedIPs = Array.from(metrics.blocksByKey.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, 10);
res.json({
totalRequests: metrics.totalRequests,
blockedRequests: metrics.blockedRequests,
blockRate: metrics.totalRequests > 0 ? metrics.blockedRequests / metrics.totalRequests : 0,
topBlockedIPs
});
});
Testing the Rate Limiter
// tests/rateLimiter.test.ts
import { TokenBucket } from '../src/algorithms/tokenBucket';
describe('TokenBucket', () => {
let bucket: TokenBucket;
beforeEach(() => {
bucket = new TokenBucket({ capacity: 10, refillRate: 2 });
});
it('should allow requests up to capacity', () => {
for (let i = 0; i < 10; i++) {
const result = bucket.consume('test-user');
expect(result.allowed).toBe(true);
}
});
it('should block requests over capacity', () => {
for (let i = 0; i < 10; i++) bucket.consume('test-user');
const result = bucket.consume('test-user');
expect(result.allowed).toBe(false);
expect(result.retryAfter).toBeGreaterThan(0);
});
it('should refill tokens over time', async () => {
// Exhaust tokens
for (let i = 0; i < 10; i++) bucket.consume('test-user');
// Wait for refill (mock or real time)
await new Promise(resolve => setTimeout(resolve, 1000));
const result = bucket.consume('test-user');
expect(result.allowed).toBe(true);
// After 1s at 2 tokens/sec, should have ~2 tokens
expect(result.remaining).toBeCloseTo(1, 0);
});
it('should track different users independently', () => {
for (let i = 0; i < 10; i++) bucket.consume('user-a');
// user-b should not be affected
const result = bucket.consume('user-b');
expect(result.allowed).toBe(true);
});
});
Common Pitfalls
1. Not accounting for clock skew in distributed systems
In the implementation above, the timestamp actually comes from the application server — we pass Date.now() into the Lua script as an argument. That means clock drift between your app servers (not Redis) can skew refill calculations. With NTP-synced servers the drift is usually negligible, but if precision matters, have the script call redis.call('TIME') so Redis becomes the single clock source.
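If you'd rather keep passing the timestamp in from Node but want one clock for all servers, ioredis exposes the Redis TIME command, which returns seconds and microseconds as strings. A small helper (my naming) converts the reply to milliseconds:

```typescript
// Convert a Redis TIME reply ([seconds, microseconds] as strings) to epoch ms.
function redisTimeToMs([seconds, microseconds]: [string, string]): number {
  return Number(seconds) * 1000 + Math.floor(Number(microseconds) / 1000);
}

// Usage with an ioredis client: const now = redisTimeToMs(await redis.time());
console.log(redisTimeToMs(['1700000000', '500000'])); // 1700000000500
```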
2. Forgetting to fail open
If Redis is down, should you block all requests? Almost certainly not. Fail open (allow requests) and log the error. The alternative — blocking all traffic when Redis restarts — is much worse.
3. Rate limiting by IP behind a load balancer
req.socket.remoteAddress gives you the load balancer's IP, not the user's. Always use X-Forwarded-For (and trust only your own load balancer's value).
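A sketch of the parsing involved — though note that Express can do this for you: app.set('trust proxy', 1) makes req.ip resolve the client address from X-Forwarded-For, and is the better option when available:

```typescript
// Take the left-most address in X-Forwarded-For: that's the original client
// (entries further right are proxies the request passed through).
// Only honor this header when the request came from your own load balancer.
function clientIpFromForwarded(header: string | undefined, socketAddr: string): string {
  if (!header) return socketAddr;
  const first = header.split(',')[0].trim();
  return first || socketAddr;
}

console.log(clientIpFromForwarded('203.0.113.7, 10.0.0.2', '10.0.0.2')); // 203.0.113.7
```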
4. Not setting proper TTLs
Without TTLs on Redis keys, your rate limit data grows forever. Every Lua script above sets appropriate TTLs.
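For reference, the TTL the token bucket script sets is simply "time for a full refill plus a safety buffer" — as a standalone helper (my naming):

```typescript
// After this long with no requests, the bucket is full again anyway,
// so the key carries no information and can safely expire.
function bucketTtlMs(capacity: number, refillRate: number, bufferMs = 5000): number {
  return Math.ceil((capacity / refillRate) * 1000) + bufferMs;
}

console.log(bucketTtlMs(100, 10)); // 15000 — 10s to refill fully, plus 5s buffer
```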
Conclusion
A production rate limiter is more than a simple counter. The token bucket algorithm handles bursting gracefully; Redis atomicity ensures correctness under concurrent load; the sliding window gives precision for sensitive endpoints.
The middleware pattern we built is:
- Algorithm-agnostic — swap implementations without changing your routes
- Fail-safe — Redis failures don't bring down your API
- Observable — rate limit events generate actionable metrics
- Flexible — per-plan, per-endpoint, per-user limits with custom key extraction
Start with the in-memory version for local development, add Redis for staging and production. Layer in monitoring early — the block patterns you see will tell you a lot about how your API is being used (and abused).
Wilson Xu is a backend engineer who builds distributed systems and developer tools. He writes about Node.js, Redis, and API design.