## Why Rate Limiting Matters
Without rate limiting, a single misbehaving client can:
- Exhaust your database connection pool
- Burn through your OpenAI credits in minutes
- Make your service unavailable for everyone else
Rate limiting is infrastructure, not an afterthought.
## The Algorithms

### 1. Fixed Window

Count requests in fixed time buckets (e.g., 100 requests per minute).
```typescript
const requests = new Map<string, { count: number; resetAt: number }>();

function isRateLimited(clientId: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  const window = requests.get(clientId);

  if (!window || now > window.resetAt) {
    requests.set(clientId, { count: 1, resetAt: now + windowMs });
    return false;
  }

  if (window.count >= limit) return true;
  window.count++;
  return false;
}
```
**Problem:** a client can make 100 requests at 11:59 and 100 more at 12:00: 200 requests in 2 seconds.
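To make the boundary burst concrete, here's a sketch using a variant of the fixed-window function above with an injectable clock (the extra `now` parameter is an illustration-only change for deterministic replay):

```typescript
// Fixed-window limiter with an injectable clock (illustration only),
// so the window-boundary burst can be replayed deterministically.
const fixedWindows = new Map<string, { count: number; resetAt: number }>();

function isRateLimitedAt(
  clientId: string,
  limit: number,
  windowMs: number,
  now: number, // injected instead of Date.now()
): boolean {
  const w = fixedWindows.get(clientId);
  if (!w || now > w.resetAt) {
    fixedWindows.set(clientId, { count: 1, resetAt: now + windowMs });
    return false;
  }
  if (w.count >= limit) return true;
  w.count++;
  return false;
}

// One request opens the window at t=0s; 99 more land at t=59s, filling it.
// The moment the window resets (t=60.001s), 100 more pass immediately:
// 199 requests accepted within roughly one second of each other.
```

Clock-aligned fixed windows have the same flaw: the counter forgets everything the instant a new window starts.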
### 2. Sliding Window

Count requests in a rolling window, not a fixed bucket.
```typescript
const timestamps = new Map<string, number[]>();

function isRateLimited(clientId: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  const cutoff = now - windowMs;

  const clientTimestamps = timestamps.get(clientId) ?? [];
  const recent = clientTimestamps.filter(t => t > cutoff);

  if (recent.length >= limit) return true;

  recent.push(now);
  timestamps.set(clientId, recent);
  return false;
}
```
**Better:** no burst at window boundaries. **Worse:** memory grows with request volume, since every timestamp inside the window is stored per client.
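Running the same boundary scenario against a sliding window shows the difference. Again a clock-parameterized variant for illustration (the extra `now` parameter is not in the original function):

```typescript
// Sliding-window limiter with an injectable clock (illustration only).
// The boundary burst that defeats the fixed window is rejected here,
// because the 100 requests from a minute ago still count.
const slidingLog = new Map<string, number[]>();

function isRateLimitedSliding(
  clientId: string,
  limit: number,
  windowMs: number,
  now: number, // injected instead of Date.now()
): boolean {
  const cutoff = now - windowMs;
  const recent = (slidingLog.get(clientId) ?? []).filter(t => t > cutoff);

  if (recent.length >= limit) {
    slidingLog.set(clientId, recent); // still prune expired entries
    return true;
  }

  recent.push(now);
  slidingLog.set(clientId, recent);
  return false;
}
```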
### 3. Token Bucket

Clients accumulate tokens over time. Each request consumes one token.
```typescript
interface Bucket {
  tokens: number;
  lastRefill: number;
}

const buckets = new Map<string, Bucket>();

function isRateLimited(
  clientId: string,
  capacity: number, // max tokens
  refillRate: number, // tokens per second
): boolean {
  const now = Date.now() / 1000;

  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: now };
    buckets.set(clientId, bucket); // store immediately so state survives early returns
  }

  // Refill based on elapsed time
  const elapsed = now - bucket.lastRefill;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) return true; // rate limited
  bucket.tokens--;
  return false;
}
```
**Best for:** APIs with bursty legitimate traffic. Allows short bursts up to `capacity`, sustains `refillRate` long-term.
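A quick walkthrough of that burst-then-sustain behavior, using a clock-parameterized variant of the function above (the `nowSec` parameter is an illustration-only change):

```typescript
// Token bucket with an injectable clock in seconds (illustration only),
// mirroring the function above for a deterministic walkthrough.
const demoBuckets = new Map<string, { tokens: number; lastRefill: number }>();

function isRateLimitedBucket(
  clientId: string,
  capacity: number,
  refillRate: number, // tokens per second
  nowSec: number, // injected instead of Date.now() / 1000
): boolean {
  let bucket = demoBuckets.get(clientId);
  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: nowSec };
    demoBuckets.set(clientId, bucket);
  }

  const elapsed = nowSec - bucket.lastRefill;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = nowSec;

  if (bucket.tokens < 1) return true;
  bucket.tokens--;
  return false;
}

// With capacity 10 and refillRate 1: a client can burst 10 requests at once,
// then gets one more request per second; unused time tops the bucket back up.
```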
## Production: Redis-Backed Rate Limiting

In-memory state doesn't work across multiple server instances: each instance keeps its own counters. Use Redis:
```typescript
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  analytics: true,
  prefix: '@myapp/ratelimit',
});
```
```typescript
// In your API handler
export async function POST(request: Request) {
  // x-forwarded-for may hold a comma-separated chain; take the client's IP
  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1';
  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'X-RateLimit-Limit': limit.toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': new Date(reset).toISOString(),
        'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
      },
    });
  }

  return handleRequest(request);
}
```
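On the other side of the wire, a well-behaved client should honor those headers rather than hammering the endpoint. A minimal sketch, assuming Node 18+ globals (`Response`, `fetch`) and a hypothetical `fetchWithRetry` helper:

```typescript
// Hypothetical client-side helper (not from any library): when the server
// answers 429, wait for the advertised Retry-After delay and try again.
async function fetchWithRetry(
  doFetch: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Retry-After is in seconds; default to 1s if the header is missing
    const retryAfterSec = Number(res.headers.get('Retry-After') ?? '1');
    await new Promise(resolve => setTimeout(resolve, retryAfterSec * 1000));
  }
}

// Usage: const res = await fetchWithRetry(() => fetch('/api/ai/generate', { method: 'POST' }));
```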
## Express Middleware
```typescript
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,
  standardHeaders: true, // Return rate limit info in headers
  legacyHeaders: false,
  store: new RedisStore({
    client: redisClient, // an already-connected Redis client
  }),
  keyGenerator: (req) => {
    // Rate limit by API key if present, otherwise by IP
    return req.headers['x-api-key']?.toString()
      ?? req.ip
      ?? 'unknown';
  },
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too many requests',
      retryAfter: res.getHeader('Retry-After'),
    });
  },
});

app.use('/api/', limiter);
```
## Tiered Rate Limits

Different users deserve different limits:
```typescript
function getRateLimit(user: User): { requests: number; windowMs: number } {
  switch (user.plan) {
    case 'free': return { requests: 100, windowMs: 60_000 };
    case 'pro': return { requests: 1000, windowMs: 60_000 };
    case 'enterprise': return { requests: 10_000, windowMs: 60_000 };
    default: return { requests: 50, windowMs: 60_000 };
  }
}
```

Per-endpoint limits:

```typescript
const aiLimiter = rateLimit({
  max: (req) => (req.user?.plan === 'enterprise' ? 1000 : 10),
  windowMs: 60_000,
  message: 'AI endpoint rate limit exceeded. Upgrade for higher limits.',
});

app.post('/api/ai/generate', authenticate, aiLimiter, generateHandler);
```
## What to Rate Limit
| Endpoint | Limit | Window |
|---|---|---|
| Public API | 100/IP | 15 min |
| Auth (login) | 5/IP | 15 min |
| Password reset | 3/email | 1 hour |
| AI generation | 10/user | 1 min |
| File upload | 20/user | 1 hour |
Login endpoints especially: brute-force protection is non-negotiable.
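Beyond a plain request counter, login throttling often adds escalating lockouts after repeated failures. A minimal sketch of a hypothetical failed-attempt tracker with exponential backoff (names, keys, and thresholds are illustrative):

```typescript
// Hypothetical failed-login tracker: allow up to `maxFailures` failed
// attempts, then lock the key for an exponentially growing period.
interface Attempts { failures: number; lockedUntil: number }
const attempts = new Map<string, Attempts>();

function canAttemptLogin(key: string, now: number): boolean {
  const a = attempts.get(key);
  return !a || now >= a.lockedUntil;
}

function recordFailure(key: string, now: number, maxFailures = 5, baseLockMs = 60_000): void {
  const a = attempts.get(key) ?? { failures: 0, lockedUntil: 0 };
  a.failures++;
  if (a.failures >= maxFailures) {
    // Lock doubles with each failure past the threshold: 1 min, 2 min, 4 min...
    a.lockedUntil = now + baseLockMs * 2 ** (a.failures - maxFailures);
  }
  attempts.set(key, a);
}

function recordSuccess(key: string): void {
  attempts.delete(key); // a successful login clears the counter
}
```

Key by IP plus account identifier so an attacker can't lock victims out from a single address, and so a distributed attack against one account is still throttled.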
Rate limiting is one of those things that feels optional until the moment it isn't.
Rate limiting and auth built in from day one: Whoff Agents AI SaaS Starter Kit includes Redis-backed rate limiting pre-configured.