Atlas Whoff

Rate Limiting Your API: Algorithms, Tradeoffs, and Implementation

Why Rate Limiting Matters

Without rate limiting, a single misbehaving client can:

  • Exhaust your database connection pool
  • Burn through your OpenAI credits in minutes
  • Make your service unavailable for everyone else

Rate limiting is infrastructure, not an afterthought.

The Algorithms

1. Fixed Window

Count requests in fixed time buckets (e.g., 100 requests per minute).

const requests = new Map<string, { count: number; resetAt: number }>();

function isRateLimited(clientId: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  const window = requests.get(clientId);

  if (!window || now > window.resetAt) {
    requests.set(clientId, { count: 1, resetAt: now + windowMs });
    return false;
  }

  if (window.count >= limit) return true;

  window.count++;
  return false;
}

Problem: bursts at window boundaries. A client can make 100 requests just before the window resets and 100 more just after it resets: roughly 200 requests in a couple of seconds, even though the limit says 100 per minute.
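The boundary burst is easy to reproduce. Here's a sketch with the clock injected as a parameter so the demo is deterministic (in real code you'd use Date.now()):

```typescript
type Window = { count: number; resetAt: number };
const windows = new Map<string, Window>();

function isRateLimited(clientId: string, limit: number, windowMs: number, nowMs: number): boolean {
  const w = windows.get(clientId);
  if (!w || nowMs > w.resetAt) {
    windows.set(clientId, { count: 1, resetAt: nowMs + windowMs });
    return false;
  }
  if (w.count >= limit) return true;
  w.count++;
  return false;
}

let allowed = 0;
// One request at t=0 opens a window that resets at t=60s...
if (!isRateLimited('c', 100, 60_000, 0)) allowed++;
// ...99 more at t=59s exactly fill the window...
for (let i = 0; i < 99; i++) if (!isRateLimited('c', 100, 60_000, 59_000)) allowed++;
// ...and 100 more at t=61s open a fresh window and all pass:
for (let i = 0; i < 100; i++) if (!isRateLimited('c', 100, 60_000, 61_000)) allowed++;
// allowed is now 200, with 199 of those requests inside a ~2-second span
```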

2. Sliding Window

Count requests in a rolling window, not a fixed bucket.

const timestamps = new Map<string, number[]>();

function isRateLimited(clientId: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  const cutoff = now - windowMs;

  const clientTimestamps = timestamps.get(clientId) ?? [];
  const recent = clientTimestamps.filter(t => t > cutoff);

  if (recent.length >= limit) return true;

  recent.push(now);
  timestamps.set(clientId, recent);
  return false;
}

Better: no burst at window boundaries. Worse: this is really a sliding window log, storing one timestamp per request, so memory grows with request volume.
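A common middle ground (not shown above) is the sliding window counter: keep one counter for the current fixed window and one for the previous, and weight the previous counter by how much of it still overlaps the rolling window. Two numbers per client instead of a timestamp per request. A sketch, again with an injected clock:

```typescript
interface Counters { windowStart: number; current: number; previous: number }
const counters = new Map<string, Counters>();

function isRateLimited(clientId: string, limit: number, windowMs: number, nowMs: number): boolean {
  const start = Math.floor(nowMs / windowMs) * windowMs; // current fixed-window start
  let c = counters.get(clientId);
  if (!c || c.windowStart !== start) {
    // Roll over: the old "current" becomes "previous" if it was the adjacent window
    const prev = c && c.windowStart === start - windowMs ? c.current : 0;
    c = { windowStart: start, current: 0, previous: prev };
    counters.set(clientId, c);
  }
  // Fraction of the previous window still inside the rolling window
  const overlap = 1 - (nowMs - start) / windowMs;
  const estimate = c.current + c.previous * overlap;
  if (estimate >= limit) return true;
  c.current++;
  return false;
}

// 100/min limit: fill the first window at t=30s...
let a = 0;
for (let i = 0; i < 101; i++) if (!isRateLimited('c', 100, 60_000, 30_000)) a++;
// ...then at t=90s the previous window still counts for half, so only 50 pass
let b = 0;
for (let i = 0; i < 101; i++) if (!isRateLimited('c', 100, 60_000, 90_000)) b++;
```

The estimate assumes requests in the previous window were evenly distributed, so it's approximate, but it blocks boundary bursts at constant memory.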

3. Token Bucket

Clients accumulate tokens over time. Each request consumes one token.

interface Bucket {
  tokens: number;
  lastRefill: number;
}

const buckets = new Map<string, Bucket>();

function isRateLimited(
  clientId: string,
  capacity: number,      // max tokens
  refillRate: number,    // tokens per second
): boolean {
  const now = Date.now() / 1000;
  let bucket = buckets.get(clientId);

  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: now };
    buckets.set(clientId, bucket); // store immediately so refill state persists even if this request is limited
  }

  // Refill based on elapsed time
  const elapsed = now - bucket.lastRefill;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) return true; // rate limited

  bucket.tokens--;
  buckets.set(clientId, bucket);
  return false;
}

Best for: APIs with bursty legitimate traffic. Allows short bursts up to capacity, sustains refillRate long-term.
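The burst-then-sustain behavior is worth seeing concretely. A deterministic sketch (time injected as a parameter, otherwise the same logic as above):

```typescript
interface Bucket { tokens: number; lastRefill: number }
const buckets = new Map<string, Bucket>();

function isRateLimited(clientId: string, capacity: number, refillRate: number, nowSec: number): boolean {
  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: nowSec };
    buckets.set(clientId, bucket);
  }
  // Refill based on elapsed time, capped at capacity
  bucket.tokens = Math.min(capacity, bucket.tokens + (nowSec - bucket.lastRefill) * refillRate);
  bucket.lastRefill = nowSec;
  if (bucket.tokens < 1) return true;
  bucket.tokens--;
  return false;
}

// capacity 10, refill 1 token/sec
let burst = 0;
for (let i = 0; i < 12; i++) if (!isRateLimited('c', 10, 1, 0)) burst++;
// burst === 10: a full bucket absorbs a burst of 10, then throttles
let later = 0;
for (let i = 0; i < 12; i++) if (!isRateLimited('c', 10, 1, 5)) later++;
// later === 5: five seconds of refill buys exactly five more requests
```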

Production: Redis-Backed Rate Limiting

In-memory counters don't work across multiple server instances: each instance sees only its own slice of the traffic. Use a shared store such as Redis:

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  analytics: true,
  prefix: '@myapp/ratelimit',
});

// In your API handler
export async function POST(request: Request) {
  // x-forwarded-for may hold a comma-separated chain; take the first entry.
  // Note it's client-controlled unless your proxy overwrites it.
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? '127.0.0.1';
  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'X-RateLimit-Limit': limit.toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': new Date(reset).toISOString(),
        'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
      },
    });
  }

  return handleRequest(request);
}

Express Middleware

import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,
  standardHeaders: true,   // Return rate limit info in headers
  legacyHeaders: false,
  store: new RedisStore({
    client: redisClient, // an already-connected Redis client; the exact option name varies by rate-limit-redis version
  }),
  keyGenerator: (req) => {
    // Rate limit by API key if present, otherwise by IP
    return req.headers['x-api-key']?.toString() 
      ?? req.ip 
      ?? 'unknown';
  },
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too many requests',
      retryAfter: res.getHeader('Retry-After'),
    });
  },
});

app.use('/api/', limiter);

Tiered Rate Limits

Different users deserve different limits:

function getRateLimit(user: User): { requests: number; windowMs: number } {
  switch (user.plan) {
    case 'free':       return { requests: 100,    windowMs: 60_000 };
    case 'pro':        return { requests: 1_000,  windowMs: 60_000 };
    case 'enterprise': return { requests: 10_000, windowMs: 60_000 };
    default:           return { requests: 50,     windowMs: 60_000 };
  }
}

// Per-endpoint limits
const aiLimiter = rateLimit({
  max: (req) => req.user?.plan === 'enterprise' ? 1000 : 10,
  windowMs: 60_000,
  message: 'AI endpoint rate limit exceeded. Upgrade for higher limits.',
});

app.post('/api/ai/generate', authenticate, aiLimiter, generateHandler);

What to Rate Limit

Endpoint       | Limit   | Window
---------------|---------|-------
Public API     | 100/IP  | 15 min
Auth (login)   | 5/IP    | 15 min
Password reset | 3/email | 1 hour
AI generation  | 10/user | 1 min
File upload    | 20/user | 1 hour

Login endpoints especially: brute-force protection is non-negotiable.
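These numbers map naturally onto a small policy object you can feed into whichever limiter you use (a sketch; the key names and the per field are illustrative, not from any library):

```typescript
// Per-endpoint rate-limit policy table. `per` records what the
// limit is scoped to: an IP address, a user ID, or an email.
const policies = {
  publicApi:     { limit: 100, windowMs: 15 * 60_000, per: 'ip' },
  login:         { limit: 5,   windowMs: 15 * 60_000, per: 'ip' },
  passwordReset: { limit: 3,   windowMs: 60 * 60_000, per: 'email' },
  aiGenerate:    { limit: 10,  windowMs: 60_000,      per: 'user' },
  fileUpload:    { limit: 20,  windowMs: 60 * 60_000, per: 'user' },
} as const;
```

Centralizing the table keeps limits auditable in one place instead of scattered across route definitions.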

Rate limiting is one of those things that feels optional until the moment it isn't.


Rate limiting and auth built in from day one: Whoff Agents AI SaaS Starter Kit includes Redis-backed rate limiting pre-configured.
