Rate Limiting — System Design Deep Dive
A rate limiter is a piece of software that regulates how much traffic a client can send to a server within a given period of time.
At scale, rate limiting becomes a fundamental building block of reliable and cost-efficient systems.
Why Do We Need Rate Limiting?
Rate limiters provide several important benefits:
- ✅ Prevent denial-of-service (DoS) attacks
- ✅ Promote fair usage of shared resources
- ✅ Protect backend services from overload
- ✅ Reduce infrastructure and operational costs
Real-World Examples
Rate limiting appears everywhere in modern applications:
- Users can share up to 150 posts per day
- Users can post 300 tweets within a 3-hour window
- Users can make two withdrawal transactions within 15 seconds
In reality, rate limits depend entirely on your application's access patterns and business rules.
Where Can Rate Limiters Live?
Rate limiters can be deployed in different parts of the system:
1️⃣ Client-Side Rate Limiting
Client-side rate limiting happens within the application itself.
Pros
- Reduces unnecessary requests early
- Improves perceived responsiveness
Cons
- Less secure
- Clients can tamper with requests and bypass restrictions
2️⃣ Server-Side Rate Limiting
Server-side rate limiting enforces rules centrally.
Pros
- Strong enforcement
- Cannot be bypassed easily
- Reliable tracking of usage
3️⃣ API Gateway / Middleware Layer
A very common approach is placing the rate limiter at the API gateway.
This allows all incoming traffic to be evaluated before reaching backend services.
Core Rate Limiting Algorithms
Most industry rate limiters are based on a few well-known algorithms.
- Fixed Window Counter
- Sliding Window Counter
- Token Bucket
- Leaky Bucket
Let’s walk through each one.
Fixed Window Counter
In a fixed window counter, clients can make a specific number of requests within a fixed time interval.
Example:
- 100 requests per minute
Problem: Burstiness
A client could send:
- 100 requests at 00:59
- Another 100 requests at 01:00
Result: 200 requests within seconds, even though the limit is 100 per minute.
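A minimal single-process sketch of the fixed window counter (the class name and the use of `time.monotonic` are illustrative, not a specific library's API):

```python
import time

class FixedWindowLimiter:
    """Fixed window counter: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Reset the counter when a new window begins.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note how the counter resets abruptly at the window boundary: that reset is exactly what allows the double burst described above.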
Sliding Window Counter
Instead of resetting counters at fixed intervals, the sliding window evaluates requests relative to the current time.
When a request arrives:
- The system counts how many requests occurred in the window of time ending at the current moment.
- If the limit is exceeded, the request is rejected.
This significantly reduces burst traffic compared to fixed windows.
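One common way to implement this is a sliding window log: keep the timestamps of recent requests and count only those still inside the window. This is a hedged sketch of that variant, not the only way to slide the window:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding window log: track request timestamps and count only
    those that fall inside the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Because the window moves with the clock instead of resetting at fixed boundaries, the 00:59/01:00 double burst from the previous section is no longer possible.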
Token Bucket
The token bucket is one of the most widely used rate limiting algorithms.
How it works
- A bucket contains tokens.
- Each token allows one request.
- Tokens refill at a constant rate.
- Requests consume tokens.
- If no tokens remain → request is rejected.
Burst traffic is allowed as long as tokens are available.
More expensive operations can consume multiple tokens.
This makes token buckets ideal for high-traffic APIs.
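The steps above can be sketched in a few lines; the refill-on-demand trick (computing tokens lazily from elapsed time instead of running a background timer) is a common implementation choice, assumed here for simplicity:

```python
import time

class TokenBucket:
    """Token bucket: at most `capacity` tokens, refilled at `rate` tokens/second.
    Each request consumes `cost` tokens, so expensive calls can cost more."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so bursts are allowed immediately
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket that starts full permits a burst up to `capacity`, after which the sustained throughput converges to `rate` requests per second.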
Leaky Bucket
The leaky bucket processes requests at a constant rate.
Think of it as a FIFO queue:
- Requests enter the queue
- Requests are processed steadily
- When the queue is full, new requests are dropped
This smooths traffic spikes and ensures consistent processing.
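The FIFO-queue analogy maps almost directly to code. In this sketch, `leak` stands in for a worker that a scheduler would call at a constant rate; the method names are illustrative:

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket as a bounded FIFO queue: requests join the queue,
    a worker drains them at a fixed rate, and overflow is dropped."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request) -> bool:
        # Drop the request when the bucket is full.
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        # Called at a constant rate (e.g. by a scheduler) to process one request.
        return self.queue.popleft() if self.queue else None
```

Unlike the token bucket, the leaky bucket never lets bursts through to the backend: the outflow rate is fixed regardless of how fast requests arrive.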
High-Level Architecture
A rate limiter typically acts as middleware between clients and servers.
Every incoming request is evaluated before reaching the API.
If a request exceeds limits, the server responds with:
HTTP 429 — Too Many Requests
Helpful Rate Limit Headers
Servers often return headers to help clients behave correctly:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum allowed requests |
| X-RateLimit-Remaining | Remaining allowed requests |
| X-RateLimit-Retry-After | Seconds before retrying |
Rule Configuration
Rate limiting rules define what is allowed.
Example:
- Maximum 5 marketing messages per day
- Maximum 5 login attempts per minute
Rules are typically:
- Stored on disk or configuration services
- Loaded into cache by workers
- Evaluated in middleware during requests
Request Flow
1. Client sends a request
2. The request reaches the rate limiter middleware
3. Rules are loaded from cache
4. Counters and timestamps are checked
5. The request is either forwarded or throttled
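The flow above can be tied together in a toy in-memory middleware. The rule keys, the `limiters` cache, and the `handle` function are all hypothetical names; a production system would load rules from a shared store and reset counters per window, which is omitted here for brevity:

```python
# Illustrative rule set, mirroring the examples in the previous section.
RULES = {"login": {"limit": 5, "window": 60}}  # 5 login attempts per minute

class Counter:
    def __init__(self):
        self.count = 0

# (client_id, route) -> per-client counter; window reset omitted for brevity.
limiters: dict[tuple[str, str], Counter] = {}

def handle(client_id: str, route: str):
    rule = RULES.get(route)
    if rule is None:
        return 200, {}  # no rule configured: forward the request
    counter = limiters.setdefault((client_id, route), Counter())
    counter.count += 1
    remaining = rule["limit"] - counter.count
    if remaining < 0:
        # Throttled: tell the client when it may retry.
        return 429, {"X-RateLimit-Retry-After": str(rule["window"])}
    return 200, {
        "X-RateLimit-Limit": str(rule["limit"]),
        "X-RateLimit-Remaining": str(remaining),
    }
```

The same headers from the table above tell well-behaved clients how much budget they have left and when to back off after a 429.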
Rate Limiting in Distributed Systems
Scaling introduces new challenges.
Race Conditions
Multiple concurrent requests may update counters simultaneously.
Example:
- Limit = 3 requests/sec
- Two threads read counter value = 2
- Both allow requests → limit exceeded
Solutions:
- Atomic operations
- Redis sorted sets
- Distributed locks (with performance tradeoffs)
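Within a single process, the race disappears if the check and the increment happen atomically. A sketch using a mutex (in a distributed deployment, the same effect would come from an atomic operation such as Redis `INCR` or a Lua script executed server-side):

```python
import threading

class AtomicCounter:
    """Check-and-increment under a lock so two concurrent requests
    cannot both pass the limit check using a stale counter value."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True
```

Without the lock, two threads could both observe `count == 2` under a limit of 3 and both proceed, which is exactly the scenario described above.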
Synchronization Problems
In distributed systems:
- Requests may hit different servers
- Replication lag causes stale counters
- Limits become inconsistent
Sticky sessions can help but are usually avoided due to operational complexity.
Centralized Rate Limiting (Global Cache)
A common solution is using a centralized datastore like Redis.
All nodes read and update shared counters.
Tradeoffs:
- Potential single point of failure
- Increased latency for global users
Performance Optimization
A better large-scale solution is a multi–data center architecture.
- Deploy rate limiter nodes close to users
- Maintain regional counters
- Synchronize data using eventual consistency
Benefits:
- Reduced latency
- Improved user experience
- Better global scalability
Monitoring and Observability
After deployment, monitoring is critical.
Track:
- Rate limit hit frequency
- False positives
- Traffic patterns
- Algorithm effectiveness
- User impact
Rate limiting is not a “set and forget” system — it requires continuous tuning.
Final Thoughts
Rate limiting is more than just protecting APIs from abuse. It is a core reliability mechanism that:
- stabilizes systems under load,
- ensures fairness,
- and controls operational costs.
Choosing the right algorithm and architecture depends heavily on your traffic patterns, scale, and consistency requirements.
Design it carefully — because at scale, rate limiting becomes part of your system’s resilience strategy.