Dhawal Kumar Singh

Rate Limiting in Distributed Systems: Algorithms Every Backend Developer Should Know

Rate limiting is one of those things you don't think about until your API gets hammered at 3 AM.

Whether it's a DDoS attack, a buggy client sending 10,000 requests per second, or a partner integration gone wrong — rate limiting is your first line of defense.

Why Rate Limiting Matters

Without rate limiting:

  • A single client can starve others of resources
  • Your database gets overwhelmed during traffic spikes
  • You can't enforce fair usage across API consumers
  • Cost of serving requests spirals out of control

The 4 Core Algorithms

1. Token Bucket

The most widely used algorithm. Think of it as a bucket that fills with tokens at a steady rate. Each request removes a token. No tokens? Request denied.

Why it works: Allows short bursts while enforcing long-term rate. Used by AWS, Stripe, and most major APIs.
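The refill-on-read trick makes this cheap to implement: instead of a background timer, you top up the bucket lazily whenever a request arrives. Here's a minimal single-process sketch (class and parameter names are my own, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so bursts are allowed immediately
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request spends one token
            return True
        return False
```

A bucket with `capacity=5` lets a client fire 5 requests back to back, then throttles them to the steady refill rate.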

2. Leaky Bucket

Requests enter a queue (the bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped.

Why it works: Provides a perfectly smooth output rate. Great for downstream services that can't handle bursts.
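A common way to implement this without an actual worker thread is the "meter" variant: track how full the queue would be and drain it on paper at the leak rate. A sketch under that assumption (names are illustrative):

```python
import time

class LeakyBucket:
    """Leaky-bucket meter: the level drains at `leak_rate`/sec; a request is
    accepted only if it still fits in the bucket (the queue), else dropped."""

    def __init__(self, leak_rate: float, capacity: float):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1             # request joins the (virtual) queue
            return True
        return False
```

Note the contrast with the token bucket: here the bucket starts empty and fills with requests, so a full-capacity burst is absorbed once and then everything is smoothed to the leak rate.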

3. Fixed Window Counter

Divide time into fixed windows (e.g., 1 minute). Count requests per window. Reset at the boundary.

Why it works: Dead simple to implement. But it has the "boundary problem" — a burst at the very end of one window followed by a burst at the start of the next lets through up to 2x the allowed rate in a short span.
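The counter-and-reset logic fits in a few lines, which is exactly why this algorithm is so popular despite the boundary problem. A minimal sketch:

```python
import time

class FixedWindowCounter:
    """Fixed-window counter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Reset the counter once we cross a window boundary.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```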

4. Sliding Window Log/Counter

Tracks the exact timestamp of each request (log) or uses a weighted combination of current and previous windows (counter).

Why it works: Most accurate. No boundary problem. But uses more memory.
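The counter variant is the memory-friendly compromise: keep only two counters, and weight the previous window by the fraction of it that still overlaps the sliding window. A sketch of that estimate (my own naming, not a specific library's API):

```python
import time

class SlidingWindowCounter:
    """Sliding-window counter: estimates the rate as
    prev_count * overlap_fraction + curr_count."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.curr_start = time.monotonic()
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # Roll windows forward; if more than one full window passed,
            # the previous window is empty.
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_start += (elapsed // self.window) * self.window
            self.curr_count = 0
            elapsed = now - self.curr_start
        # Weight the previous window by how much of it still overlaps.
        weight = 1.0 - elapsed / self.window
        estimated = self.prev_count * weight + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

The log variant replaces the two counters with a timestamp per request — exact, but memory grows with the limit, which is why the counter approximation is the one you usually see in production.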

Quick Comparison

| Algorithm | Burst Handling | Memory | Accuracy |
| --- | --- | --- | --- |
| Token Bucket | ✅ Allows bursts | Low | Good |
| Leaky Bucket | ❌ Smooths all | Low | Good |
| Fixed Window | ⚠️ Boundary issue | Very low | Fair |
| Sliding Window | ✅ Accurate | Higher | Best |

Where to Rate Limit

  1. API Gateway — First line of defense (per-client limits)
  2. Application layer — Business logic limits (e.g., 5 password attempts)
  3. Database layer — Connection pooling and query limits

What Happens When Limits Are Hit?

Return 429 Too Many Requests with these headers:

  • X-RateLimit-Limit — Max requests allowed
  • X-RateLimit-Remaining — Requests left in window
  • Retry-After — Seconds until the client can retry
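One detail worth getting right: round Retry-After up, never down, so clients don't retry a moment too early and get rejected again. A hypothetical helper showing the idea (the `X-RateLimit-*` names are a common convention, not a standard — exact header names vary across APIs):

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_after: float) -> dict:
    """Build rate-limit response headers; add Retry-After when exhausted."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        # Ceil so the client never retries before the window actually resets.
        headers["Retry-After"] = str(math.ceil(reset_after))
    return headers
```

Attach these to the 429 response in your framework of choice; well-behaved clients will back off based on Retry-After.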

Deep Dive

I wrote a comprehensive guide covering distributed rate limiting, Redis implementations, and real-world patterns used by companies like Stripe and Cloudflare:

👉 Rate Limiting: The Complete Guide


This is part of my system design series at SWE Helper — free tools, guides, and interview prep for software engineers.
