Dhawal Kumar Singh

Rate Limiting in Distributed Systems: Algorithms Every Backend Developer Should Know

Rate limiting is one of those things you don't think about until your API gets hammered at 3 AM.

Whether it's a DDoS attack, a buggy client sending 10,000 requests per second, or a partner integration gone wrong — rate limiting is your first line of defense.

Why Rate Limiting Matters

Without rate limiting:

  • A single client can starve others of resources
  • Your database gets overwhelmed during traffic spikes
  • You can't enforce fair usage across API consumers
  • Cost of serving requests spirals out of control

The 4 Core Algorithms

1. Token Bucket

The most widely used algorithm. Think of it as a bucket that fills with tokens at a steady rate. Each request removes a token. No tokens? Request denied.

Why it works: Allows short bursts while enforcing long-term rate. Used by AWS, Stripe, and most major APIs.
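The refill-on-read trick makes this cheap to implement: instead of a background timer, you top up the bucket lazily whenever a request arrives. Here's a minimal single-process sketch (class and parameter names are my own, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so bursts are allowed immediately
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request spends one token
            return True
        return False
```

A bucket with `capacity=5` lets a client fire 5 requests back to back, then throttles them to the steady refill rate.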

2. Leaky Bucket

Requests enter a queue (the bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped.

Why it works: Provides a perfectly smooth output rate. Great for downstream services that can't handle bursts.
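A common way to implement this without an actual worker thread is the "meter" variant: track how full the queue would be and drain it on paper at the leak rate. A sketch under that assumption (names are illustrative):

```python
import time

class LeakyBucket:
    """Leaky-bucket meter: the level drains at `leak_rate`/sec; a request is
    accepted only if it still fits in the bucket (the queue), else dropped."""

    def __init__(self, leak_rate: float, capacity: float):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1             # request joins the (virtual) queue
            return True
        return False
```

Note the contrast with the token bucket: here the bucket starts empty and fills with requests, so a full-capacity burst is absorbed once and then everything is smoothed to the leak rate.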

3. Fixed Window Counter

Divide time into fixed windows (e.g., 1 minute). Count requests per window. Reset at the boundary.

Why it works: Dead simple to implement. But it has the "boundary problem" — a burst at the very end of one window followed by a burst at the start of the next lets through up to 2x the allowed rate in a short span.
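The counter-and-reset logic fits in a few lines, which is exactly why this algorithm is so popular despite the boundary problem. A minimal sketch:

```python
import time

class FixedWindowCounter:
    """Fixed-window counter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Reset the counter once we cross a window boundary.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```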

4. Sliding Window Log/Counter

Tracks the exact timestamp of each request (log) or uses a weighted combination of current and previous windows (counter).

Why it works: Most accurate. No boundary problem. But uses more memory.
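The counter variant is the memory-friendly compromise: keep only two counters, and weight the previous window by the fraction of it that still overlaps the sliding window. A sketch of that estimate (my own naming, not a specific library's API):

```python
import time

class SlidingWindowCounter:
    """Sliding-window counter: estimates the rate as
    prev_count * overlap_fraction + curr_count."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.curr_start = time.monotonic()
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # Roll windows forward; if more than one full window passed,
            # the previous window is empty.
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_start += (elapsed // self.window) * self.window
            self.curr_count = 0
            elapsed = now - self.curr_start
        # Weight the previous window by how much of it still overlaps.
        weight = 1.0 - elapsed / self.window
        estimated = self.prev_count * weight + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

The log variant replaces the two counters with a timestamp per request — exact, but memory grows with the limit, which is why the counter approximation is the one you usually see in production.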

Quick Comparison

| Algorithm | Burst Handling | Memory | Accuracy |
| --- | --- | --- | --- |
| Token Bucket | ✅ Allows bursts | Low | Good |
| Leaky Bucket | ❌ Smooths all | Low | Good |
| Fixed Window | ⚠️ Boundary issue | Very low | Fair |
| Sliding Window | ✅ Accurate | Higher | Best |

Where to Rate Limit

  1. API Gateway — First line of defense (per-client limits)
  2. Application layer — Business logic limits (e.g., 5 password attempts)
  3. Database layer — Connection pooling and query limits

What Happens When Limits Are Hit?

Return 429 Too Many Requests with these headers:

  • X-RateLimit-Limit — Max requests allowed
  • X-RateLimit-Remaining — Requests left in window
  • Retry-After — Seconds until the client can retry
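One detail worth getting right: round Retry-After up, never down, so clients don't retry a moment too early and get rejected again. A hypothetical helper showing the idea (the `X-RateLimit-*` names are a common convention, not a standard — exact header names vary across APIs):

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_after: float) -> dict:
    """Build rate-limit response headers; add Retry-After when exhausted."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        # Ceil so the client never retries before the window actually resets.
        headers["Retry-After"] = str(math.ceil(reset_after))
    return headers
```

Attach these to the 429 response in your framework of choice; well-behaved clients will back off based on Retry-After.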

Deep Dive

I wrote a comprehensive guide covering distributed rate limiting, Redis implementations, and real-world patterns used by companies like Stripe and Cloudflare:

👉 Rate Limiting: The Complete Guide


This is part of my system design series at SWE Helper — free tools, guides, and interview prep for software engineers.
