Every production API hits the same inflection point: traffic grows, abuse appears, and suddenly you need to answer the question "how many requests should I allow, and for whom?" Rate limiting sounds simple until you run multiple servers, need sub-second accuracy, and have endpoints with wildly different costs.
This is the third installment in the Production Backend Patterns series. We will walk through five rate limiting algorithms, implement four of them in TypeScript with Redis, and then tackle the hard parts: distributed coordination, burst handling, cost-based limits, and the headers your clients actually need.
The Five Algorithms, Visualized
Before writing any code, let's build intuition for how each algorithm behaves. Imagine a limit of 10 requests per minute.
Fixed Window
Minute 1 Minute 2 Minute 3
[|||||||| ] [||||||||||] [||| ]
8 allowed 10 (full) 3 so far
^ boundary: counter resets
The window is aligned to clock boundaries (e.g., 12:00:00 - 12:00:59). A counter increments per request and resets at the boundary. The flaw is obvious: a user can send 10 requests at 12:00:59 and another 10 at 12:01:00 -- 20 requests in two seconds while the limit is "10 per minute."
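The boundary flaw falls directly out of the window-alignment arithmetic. A minimal sketch (the helper name here is ours, not part of the implementations below):

```typescript
// Returns the start of the clock-aligned window containing nowSec.
function windowStart(nowSec: number, windowSec: number): number {
  return Math.floor(nowSec / windowSec) * windowSec;
}

// Requests at t=59s and t=61s land in different windows, so a fresh
// counter greets the second burst: 20 requests in 2 seconds.
windowStart(59, 60); // -> 0
windowStart(61, 60); // -> 60
```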
Sliding Window Log
Timeline: --[--r---r--r-----r--r---r--]-->
^
window slides with each request
only requests within the trailing 60s count
Every request timestamp is stored. On each new request, you count how many timestamps fall within the last 60 seconds. Accurate, but storing every timestamp is memory-expensive at scale.
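For intuition, here is an in-memory sketch of the log approach (single-process only; the class name and shape are ours, not the Redis implementations below):

```typescript
// Sliding window log: store every timestamp, prune, then count.
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowSec: number) {}

  consume(nowSec: number): boolean {
    const cutoff = nowSec - this.windowSec;
    // Drop timestamps that have slid out of the trailing window
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowSec);
    return true;
  }
}

const log = new SlidingWindowLog(2, 60);
log.consume(0);  // true
log.consume(1);  // true
log.consume(30); // false -- two timestamps already in the window
log.consume(61); // true  -- the earlier requests have slid out
```

The memory cost is visible here: one stored number per allowed request, per key.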
Sliding Window Counter (Hybrid)
Previous window weight: 40% Current window weight: 60%
[ 7 reqs ] [ 4 reqs so far ]
^-- 36s into 60s window
Estimated count = 7 * 0.40 + 4 = 6.8 --> under limit of 10
This blends the previous and current fixed window counts using a weighted average based on how far into the current window you are. Near-zero memory overhead and surprisingly accurate.
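The estimate from the diagram is a two-line computation. A hypothetical helper (not part of the limiter class below) makes it concrete:

```typescript
// Weighted blend of the previous and current fixed-window counts.
function estimateSlidingCount(
  prevCount: number,
  currCount: number,
  elapsedSec: number, // seconds elapsed in the current window
  windowSec: number
): number {
  const weight = (windowSec - elapsedSec) / windowSec;
  return prevCount * weight + currCount;
}

// 36s into a 60s window: previous window weighted at 40%
estimateSlidingCount(7, 4, 36, 60); // -> 6.8, under the limit of 10
```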
Token Bucket
Bucket capacity: 10 tokens
Refill rate: 10 tokens / minute
[@@@@@@@@@@] --> full bucket (10 tokens)
[@@@@@@ ] --> 4 requests consumed 4 tokens
[@@@@@@@@@ ] --> tokens refilled over time
[ ] --> burst of 10 exhausts bucket
--> must wait for refill
Tokens accumulate at a steady rate up to a maximum capacity. Each request consumes one (or more) tokens. This naturally allows bursts up to the bucket size while enforcing a long-term average rate.
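Before the Redis version, here is the refill math as an in-memory sketch (single-process only; the class name is ours):

```typescript
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
    nowSec = 0
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = nowSec;
  }

  consume(nowSec: number, cost = 1): boolean {
    // Refill lazily based on elapsed time, capped at capacity
    const elapsed = Math.max(0, nowSec - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = nowSec;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Capacity 10, refilling 10 tokens per minute
const bucket = new InMemoryTokenBucket(10, 10 / 60);
for (let i = 0; i < 10; i++) bucket.consume(0); // burst of 10 all pass
bucket.consume(0);  // false -- bucket empty
bucket.consume(12); // true  -- ~2 tokens refilled after 12s
```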
Leaky Bucket
Requests pour in at variable rate:
||| | |||||| | ||
v v v v v v v v v v v v v
[ queue / bucket ]
| | | | | |
v v v v v v
Processed at fixed rate
Requests enter a queue that drains at a constant rate. If the queue is full, new requests are rejected. This produces the smoothest output rate but adds latency because requests wait in the queue.
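And the drain math, again as a single-process sketch (the class name is ours; the Redis/Lua version appears later):

```typescript
class InMemoryLeakyBucket {
  private level = 0; // current queue depth
  private lastDrain: number;

  constructor(
    private capacity: number,
    private drainRate: number, // requests drained per second
    nowSec = 0
  ) {
    this.lastDrain = nowSec;
  }

  consume(nowSec: number, cost = 1): boolean {
    // Drain lazily based on elapsed time, never below empty
    const elapsed = Math.max(0, nowSec - this.lastDrain);
    this.level = Math.max(0, this.level - elapsed * this.drainRate);
    this.lastDrain = nowSec;
    if (this.level + cost > this.capacity) return false;
    this.level += cost;
    return true;
  }
}

// Capacity 2, draining 1 request per second
const bucket = new InMemoryLeakyBucket(2, 1);
bucket.consume(0); // true
bucket.consume(0); // true  -- bucket now full
bucket.consume(0); // false -- overflow rejected
bucket.consume(1); // true  -- one request drained after 1s
```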
Implementing the Algorithms in TypeScript with Redis
All implementations share a common interface:
interface RateLimitResult {
allowed: boolean;
limit: number;
remaining: number;
retryAfter?: number; // seconds until next request is allowed
resetAt?: number; // unix timestamp when the window resets
}
interface RateLimiter {
consume(key: string, cost?: number): Promise<RateLimitResult>;
}
We use ioredis throughout:
import Redis from "ioredis";
const redis = new Redis({ host: "127.0.0.1", port: 6379 });
Fixed Window
class FixedWindowLimiter implements RateLimiter {
constructor(
private redis: Redis,
private limit: number,
private windowSec: number
) {}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
const now = Math.floor(Date.now() / 1000);
const window = Math.floor(now / this.windowSec) * this.windowSec;
const redisKey = `rl:fw:${key}:${window}`;
const count = await this.redis
.multi()
.incrby(redisKey, cost)
.expire(redisKey, this.windowSec)
.exec();
const current = count![0][1] as number;
const resetAt = window + this.windowSec;
return {
allowed: current <= this.limit,
limit: this.limit,
remaining: Math.max(0, this.limit - current),
resetAt,
retryAfter: current > this.limit ? resetAt - now : undefined,
};
}
}
Sliding Window Counter
This is the production workhorse. It approximates a sliding window using two fixed windows and a weighted average, requiring only two Redis keys and no stored timestamps.
class SlidingWindowLimiter implements RateLimiter {
constructor(
private redis: Redis,
private limit: number,
private windowSec: number
) {}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
const now = Math.floor(Date.now() / 1000);
const currentWindow = Math.floor(now / this.windowSec) * this.windowSec;
const previousWindow = currentWindow - this.windowSec;
const elapsed = now - currentWindow;
const weight = (this.windowSec - elapsed) / this.windowSec;
const prevKey = `rl:sw:${key}:${previousWindow}`;
const currKey = `rl:sw:${key}:${currentWindow}`;
const [prevCount, currCount] = await this.redis
.mget(prevKey, currKey)
.then((r) => r.map((v) => parseInt(v ?? "0", 10)));
const estimated = Math.floor(prevCount * weight) + currCount;
if (estimated + cost > this.limit) {
return {
allowed: false,
limit: this.limit,
remaining: 0,
retryAfter: this.windowSec - elapsed,
resetAt: currentWindow + this.windowSec,
};
}
await this.redis
.multi()
.incrby(currKey, cost)
.expire(currKey, this.windowSec * 2)
.exec();
return {
allowed: true,
limit: this.limit,
remaining: Math.max(0, this.limit - estimated - cost),
resetAt: currentWindow + this.windowSec,
};
}
}
Token Bucket
The token bucket is ideal when you want to allow bursts. We store two values in a Redis hash: the token count and the last refill timestamp. A Lua script makes the check-and-update atomic.
class TokenBucketLimiter implements RateLimiter {
private script: string;
constructor(
private redis: Redis,
private capacity: number,
private refillRate: number, // tokens per second
private windowSec: number
) {
this.script = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1])
local last_refill = tonumber(data[2])
if tokens == nil then
tokens = capacity
last_refill = now
end
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)
last_refill = now
local allowed = 0
local retry_after = 0
if tokens >= cost then
tokens = tokens - cost
allowed = 1
else
retry_after = math.ceil((cost - tokens) / refill_rate)
end
redis.call('HSET', key, 'tokens', tokens, 'last_refill', last_refill)
redis.call('EXPIRE', key, ttl)
return {allowed, math.floor(tokens), retry_after}
`;
}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
const now = Date.now() / 1000;
const redisKey = `rl:tb:${key}`;
const result = (await this.redis.eval(
this.script, 1, redisKey,
this.capacity, this.refillRate, now, cost, this.windowSec * 2
)) as number[];
return {
allowed: result[0] === 1,
limit: this.capacity,
remaining: result[1],
retryAfter: result[2] > 0 ? result[2] : undefined,
};
}
}
Leaky Bucket
The leaky bucket can be modeled as a counter that drains at a fixed rate. We use the same Lua-script-in-Redis approach.
class LeakyBucketLimiter implements RateLimiter {
private script: string;
constructor(
private redis: Redis,
private capacity: number,
private drainRate: number, // requests drained per second
private windowSec: number
) {
this.script = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local drain_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
local data = redis.call('HMGET', key, 'level', 'last_drain')
local level = tonumber(data[1]) or 0
local last_drain = tonumber(data[2]) or now
local elapsed = math.max(0, now - last_drain)
level = math.max(0, level - elapsed * drain_rate)
last_drain = now
local allowed = 0
local retry_after = 0
if level + cost <= capacity then
level = level + cost
allowed = 1
else
retry_after = math.ceil((level + cost - capacity) / drain_rate)
end
redis.call('HSET', key, 'level', level, 'last_drain', last_drain)
redis.call('EXPIRE', key, ttl)
return {allowed, math.floor(capacity - level), retry_after}
`;
}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
const now = Date.now() / 1000;
const redisKey = `rl:lb:${key}`;
const result = (await this.redis.eval(
this.script, 1, redisKey,
this.capacity, this.drainRate, now, cost, this.windowSec * 2
)) as number[];
return {
allowed: result[0] === 1,
limit: this.capacity,
remaining: Math.max(0, result[1]),
retryAfter: result[2] > 0 ? result[2] : undefined,
};
}
}
Choosing Your Rate Limit Key
The "key" in rate limiting determines who is being limited. Different strategies serve different purposes:
type KeyExtractor = (req: Request) => string;
const keyStrategies: Record<string, KeyExtractor> = {
// For public endpoints: limit by IP
ip: (req) => {
return req.headers.get("x-forwarded-for")?.split(",")[0].trim()
?? req.headers.get("cf-connecting-ip")
?? "unknown";
},
// For authenticated endpoints: limit by user ID
user: (req) => {
const userId = (req as any).auth?.userId;
if (!userId) throw new Error("No user ID — use IP strategy instead");
return `user:${userId}`;
},
// For third-party integrations: limit by API key
apiKey: (req) => {
const key = req.headers.get("x-api-key")
?? req.headers.get("authorization")?.replace("Bearer ", "");
if (!key) throw new Error("No API key provided");
return `key:${key.slice(-12)}`; // use suffix to avoid storing full key
},
// Composite: user + endpoint for fine-grained control
userEndpoint: (req) => {
const userId = (req as any).auth?.userId ?? "anon";
const path = new URL(req.url).pathname;
return `${userId}:${path}`;
},
};
In practice, you often layer multiple strategies. Public endpoints get IP-based limits. Authenticated endpoints get per-user limits that are more generous. Expensive endpoints (search, export, AI inference) get their own tighter limits stacked on top.
Rate Limit Headers
Communicating limits to clients is not optional -- it is an API contract. Retry-After is standardized in RFC 9110; the RateLimit-* fields come from the IETF draft "RateLimit header fields for HTTP" (draft-ietf-httpapi-ratelimit-headers):
function setRateLimitHeaders(
res: Response,
result: RateLimitResult,
windowSec: number = 60
): void {
const headers = res.headers;
// Standard headers (draft-ietf-httpapi-ratelimit-headers)
headers.set("RateLimit-Limit", String(result.limit));
headers.set("RateLimit-Remaining", String(result.remaining));
if (result.resetAt) {
headers.set("RateLimit-Reset", String(result.resetAt));
}
// Policy format is "<limit>;w=<window in seconds>"
headers.set("RateLimit-Policy", `${result.limit};w=${windowSec}`);
// Legacy X-prefixed headers (still widely expected)
headers.set("X-RateLimit-Limit", String(result.limit));
headers.set("X-RateLimit-Remaining", String(result.remaining));
if (result.resetAt) {
headers.set("X-RateLimit-Reset", String(result.resetAt));
}
// Retry-After is standard (RFC 9110)
if (!result.allowed && result.retryAfter) {
headers.set("Retry-After", String(result.retryAfter));
}
}
A well-behaved 429 response looks like:
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 1711036800
Retry-After: 23
Content-Type: application/json
{"error": "rate_limit_exceeded", "retryAfter": 23}
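On the client side, these headers are what drive retry logic. A sketch of a helper that decides how long to wait (the exponential fallback is our choice, not part of any standard):

```typescript
// Given a response's status and headers, compute how long to wait before retrying.
function backoffMs(
  status: number,
  getHeader: (name: string) => string | null,
  attempt: number
): number {
  if (status !== 429) return 0;
  const retryAfter = getHeader("Retry-After");
  if (retryAfter !== null) {
    // Retry-After in delta-seconds form (RFC 9110)
    return parseInt(retryAfter, 10) * 1000;
  }
  // No Retry-After header: exponential backoff, capped at 30s
  return Math.min(30_000, 1000 * 2 ** attempt);
}

// For the 429 above (Retry-After: 23):
backoffMs(429, (n) => (n === "Retry-After" ? "23" : null), 0); // -> 23000
```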
Handling Bursts
Token bucket inherently handles bursts (the bucket capacity is the burst size). But what if you want to allow occasional bursts on top of a sliding window limiter? Add a separate burst allowance:
class BurstAwareLimiter implements RateLimiter {
private sustained: SlidingWindowLimiter;
private burst: TokenBucketLimiter;
constructor(redis: Redis, config: {
sustainedLimit: number; // e.g., 100 per minute
sustainedWindow: number; // e.g., 60
burstCapacity: number; // e.g., 20 extra
burstRefillRate: number; // e.g., 0.33 tokens/sec (20 per minute)
}) {
this.sustained = new SlidingWindowLimiter(
redis, config.sustainedLimit, config.sustainedWindow
);
this.burst = new TokenBucketLimiter(
redis, config.burstCapacity, config.burstRefillRate, config.sustainedWindow
);
}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
const sustainedResult = await this.sustained.consume(key, cost);
if (sustainedResult.allowed) return sustainedResult;
// Sustained limit hit — try burst allowance
const burstResult = await this.burst.consume(`burst:${key}`, cost);
if (burstResult.allowed) {
// Surface the burst bucket's limit/remaining so clients can see
// they are drawing on the extra allowance
return burstResult;
}
return sustainedResult; // both exhausted
}
}
Distributed Rate Limiting Across Multiple Servers
Using Redis as the backing store already gives you distributed rate limiting for free -- all servers share the same state. But there are edge cases to handle.
Redis Unavailability
If Redis is down, you have two choices: fail open (allow all requests) or fail closed (reject all). Most production systems fail open with a local fallback:
class ResilientLimiter implements RateLimiter {
private localCounts = new Map<string, { count: number; resetAt: number }>();
constructor(
private primary: RateLimiter,
private limit: number,
private windowSec: number
) {}
async consume(key: string, cost = 1): Promise<RateLimitResult> {
try {
return await this.primary.consume(key, cost);
} catch (err) {
// Redis is down — fall back to local in-memory counter
return this.localConsume(key, cost);
}
}
private localConsume(key: string, cost: number): RateLimitResult {
const now = Math.floor(Date.now() / 1000);
let entry = this.localCounts.get(key);
if (!entry || now >= entry.resetAt) {
entry = {
count: 0,
resetAt: now + this.windowSec,
};
this.localCounts.set(key, entry);
}
entry.count += cost;
// Use a more conservative limit per-node
const localLimit = Math.max(1, Math.floor(this.limit / 4));
return {
allowed: entry.count <= localLimit,
limit: localLimit,
remaining: Math.max(0, localLimit - entry.count),
resetAt: entry.resetAt,
};
}
}
The local fallback uses limit / 4 (assuming roughly four servers) so that the per-node allowances sum to approximately the global limit while Redis is down, instead of every node independently granting the full amount.
Near-Simultaneous Requests
Lua scripts in Redis execute atomically, so two requests arriving at the exact same millisecond on different servers will be serialized by Redis. This is why the Lua script approach matters: MULTI/EXEC makes its queued commands atomic, but a read in application code followed by a conditional write is not. The sliding window counter above does exactly that (a read in TypeScript, then a write), so in production wrap the entire operation in a Lua script:
const SLIDING_WINDOW_LUA = `
local curr_key = KEYS[1]
local prev_key = KEYS[2]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local weight = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local prev = tonumber(redis.call('GET', prev_key) or "0")
local curr = tonumber(redis.call('GET', curr_key) or "0")
local estimated = math.floor(prev * weight) + curr
if estimated + cost > limit then
return {0, 0, math.floor(estimated)}
end
local new_count = redis.call('INCRBY', curr_key, cost)
redis.call('EXPIRE', curr_key, window * 2)
return {1, limit - math.floor(prev * weight) - new_count, math.floor(prev * weight) + new_count}
`;
Rate Limiting in API Gateways
In a gateway architecture, rate limiting should happen at the edge before requests reach your services. Here is a middleware pattern that composes multiple limiters:
type LimiterRule = {
name: string;
limiter: RateLimiter;
keyExtract: KeyExtractor;
matchPath?: RegExp;
};
function rateLimitMiddleware(rules: LimiterRule[]) {
return async (req: Request): Promise<Response | null> => {
const path = new URL(req.url).pathname;
for (const rule of rules) {
if (rule.matchPath && !rule.matchPath.test(path)) continue;
const key = rule.keyExtract(req);
const result = await rule.limiter.consume(key);
if (!result.allowed) {
const res = new Response(
JSON.stringify({
error: "rate_limit_exceeded",
limiter: rule.name,
retryAfter: result.retryAfter,
}),
{ status: 429, headers: { "Content-Type": "application/json" } }
);
setRateLimitHeaders(res, result);
return res;
}
}
return null; // all checks passed, proceed to handler
};
}
// Usage: compose multiple layers
const limiter = rateLimitMiddleware([
{
name: "global-ip",
limiter: new SlidingWindowLimiter(redis, 1000, 60),
keyExtract: keyStrategies.ip,
},
{
name: "auth-user",
limiter: new TokenBucketLimiter(redis, 200, 3.33, 60),
keyExtract: keyStrategies.user,
},
{
name: "expensive-endpoints",
limiter: new TokenBucketLimiter(redis, 10, 0.167, 60),
keyExtract: keyStrategies.userEndpoint,
matchPath: /^\/(search|export|ai)\//,
},
]);
Cost-Based Rate Limiting
Not all requests are equal. A GET /users/me is cheap. A POST /ai/generate that runs a large language model inference is expensive. Cost-based limiting assigns a weight to each request:
const endpointCosts: Record<string, number> = {
"GET:/api/users": 1,
"GET:/api/search": 5,
"POST:/api/export": 20,
"POST:/api/ai/generate": 50,
"POST:/api/bulk-import": 100,
};
function getRequestCost(req: Request): number {
const method = req.method;
const path = new URL(req.url).pathname;
// Check exact match first, then prefix match
const exactKey = `${method}:${path}`;
if (endpointCosts[exactKey]) return endpointCosts[exactKey];
for (const [pattern, cost] of Object.entries(endpointCosts)) {
const [m, p] = pattern.split(":");
if (method === m && path.startsWith(p)) return cost;
}
return 1; // default cost
}
// Apply in middleware
async function costAwareRateLimit(req: Request): Promise<RateLimitResult> {
const key = keyStrategies.user(req);
const cost = getRequestCost(req);
// User has 1000 "credits" per minute (a refill rate of 16.67 tokens/sec);
// in a real app, construct the limiter once and reuse it across requests
const limiter = new TokenBucketLimiter(redis, 1000, 16.67, 60);
return limiter.consume(key, cost);
}
This means a user with 1000 credits per minute can make 1000 cheap reads, or 20 AI generation calls, or a mix. The token bucket is the natural choice here because its cost parameter maps directly to variable request weights.
You can take this further by returning the cost in the response headers so clients can plan their usage:
headers.set("X-RateLimit-Cost", String(cost));
headers.set("X-RateLimit-Remaining-Credits", String(result.remaining));
Putting It All Together
Here is a complete Express-style integration showing how the pieces compose in a real application:
import express from "express";
import Redis from "ioredis";
const app = express();
const redis = new Redis();
// Define tiered limits
const tiers: Record<string, { limit: number; burstCapacity: number }> = {
free: { limit: 100, burstCapacity: 10 },
pro: { limit: 1000, burstCapacity: 50 },
enterprise: { limit: 10000, burstCapacity: 200 },
};
app.use(async (req, res, next) => {
const tier = (req as any).auth?.tier ?? "free";
const config = tiers[tier];
// Constructed inline for clarity; in production, build one limiter per tier at startup
const limiter = new TokenBucketLimiter(
redis,
config.limit,
config.limit / 60, // spread evenly across 60 seconds
120 // 2-minute TTL
);
const key = (req as any).auth?.userId
? `user:${(req as any).auth.userId}`
: `ip:${req.ip}`;
const cost = getRequestCost(req);
const result = await limiter.consume(key, cost);
// Always set headers, even when allowed
res.set("RateLimit-Limit", String(config.limit));
res.set("RateLimit-Remaining", String(result.remaining));
res.set("X-RateLimit-Cost", String(cost));
if (!result.allowed) {
res.set("Retry-After", String(result.retryAfter));
return res.status(429).json({
error: "rate_limit_exceeded",
tier,
retryAfter: result.retryAfter,
upgradeUrl: tier === "free" ? "/pricing" : undefined,
});
}
next();
});
Quick Reference: Which Algorithm to Use
| Scenario | Algorithm | Why |
|---|---|---|
| Simple API with low traffic | Fixed window | Easy to implement, good enough |
| General-purpose API rate limiting | Sliding window counter | Best accuracy-to-memory ratio |
| APIs that need burst tolerance | Token bucket | Burst capacity is a first-class parameter |
| Smoothing traffic to downstream services | Leaky bucket | Guarantees constant output rate |
| Cost-based / variable-weight limits | Token bucket | Natural cost parameter support |
| Strict compliance requirements | Sliding window log | Exact counts, no approximation |
Operational Checklist
Before shipping rate limiting to production, verify:
- Fail-open behavior -- your API still works when Redis is unreachable.
- Headers on every response -- not just 429s. Clients need to see remaining quota on successful requests.
- Monitoring -- track rate_limit_hit as a metric, broken down by key strategy and tier. A spike in 429s may indicate an attack or a misconfigured limit.
- Differentiated limits -- at minimum, separate authenticated from unauthenticated traffic. Paying customers should never share a pool with anonymous scrapers.
- Documentation -- publish your limits. Undocumented rate limits cause frustration and support tickets.
- Gradual rollout -- start in logging-only mode (emit metrics but allow all requests), then enable enforcement after you understand your traffic patterns.
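The logging-only mode from the last item can be a thin wrapper around any limiter. A sketch (the ShadowLimiter name is ours; the interfaces are repeated from earlier so the snippet is self-contained):

```typescript
interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  retryAfter?: number;
  resetAt?: number;
}

interface RateLimiter {
  consume(key: string, cost?: number): Promise<RateLimitResult>;
}

// Runs the real limiter, records would-be denials, but always allows.
class ShadowLimiter implements RateLimiter {
  constructor(
    private inner: RateLimiter,
    private onWouldDeny: (key: string) => void // hook up to your metrics
  ) {}

  async consume(key: string, cost = 1): Promise<RateLimitResult> {
    const result = await this.inner.consume(key, cost);
    if (!result.allowed) this.onWouldDeny(key);
    // Enforce nothing yet: record what the real decision would have been,
    // but let the request through
    return { ...result, allowed: true, retryAfter: undefined };
  }
}
```

Once the would-be denial rate looks sane, swap the ShadowLimiter out for the inner limiter and enforcement begins with no other code changes.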
Rate limiting is one of those systems that appears trivial but touches authentication, infrastructure resilience, billing, and developer experience. Get the algorithm right, put it behind Redis with proper Lua atomicity, set the headers, and you have a system that protects your services and respects your users.
Next in the series: we will cover circuit breakers, bulkheads, and graceful degradation patterns for building backend services that stay up when their dependencies go down.