Imagine you are building a social website or any large-scale system. Suddenly, a million requests flood your system from a single IP address. Your servers slow down or even crash.
How do we prevent this?
The answer is: Rate Limiting.
In simple terms, a rate limiter restricts the number of requests a user (or IP address) can send within a given time window.
Let’s design one.
Functional Requirements
- Limit the number of requests per user ID or IP address
- Return an error (e.g., HTTP 429) when the limit is exceeded
Non-Functional Requirements
- Low latency while checking the limit (e.g., <10ms)
- High availability (Availability > Consistency)
- Scalable for millions of users
System Function / Endpoint
boolean isAvailable(userId, request)
- If true → forward request to backend
- If false → return 429 (Too Many Requests)
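The contract above can be sketched as a thin gate in front of the backend. The `AllowFirstN` limiter and the `handle` function here are illustrative stand-ins, not a real API:

```python
from http import HTTPStatus

class AllowFirstN:
    """Toy limiter: allows the first n requests, rejects the rest."""
    def __init__(self, n: int):
        self.n = n

    def is_available(self, user_id: str) -> bool:
        if self.n > 0:
            self.n -= 1
            return True
        return False

def handle(user_id: str, limiter) -> int:
    """Gate a request through the rate limiter before the backend sees it."""
    if limiter.is_available(user_id):
        # ... forward request to the backend ...
        return HTTPStatus.OK
    return HTTPStatus.TOO_MANY_REQUESTS

limiter = AllowFirstN(2)
assert handle("u1", limiter) == 200  # first two requests pass
assert handle("u1", limiter) == 200
assert handle("u1", limiter) == 429  # third is rejected
```

Any of the algorithms below can slot in behind `is_available` without changing this outer contract.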
Choosing the Right Algorithm
When we think about limiting requests over time, a natural idea is:
Limit the number of requests in a fixed time window.
But there’s a problem.
Suppose we allow 100 requests per second.
If a user sends:
- 100 requests at the end of second 1
- 100 requests at the beginning of second 2
that's 200 requests within roughly one second, straddling the window boundary.
This is known as the Fixed Window problem.
We don’t want that.
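The boundary problem is easy to reproduce with a minimal fixed-window counter (a simple in-memory sketch, not a production implementation):

```python
import math
from collections import defaultdict

LIMIT = 100  # max requests per 1-second window

counts = defaultdict(int)  # window index -> requests counted in it

def allow(ts: float) -> bool:
    """Fixed window: bucket requests by whole-second window index."""
    window = math.floor(ts)
    if counts[window] < LIMIT:
        counts[window] += 1
        return True
    return False

# 100 requests at t=0.99 land in window 0; 100 more at t=1.01 land in
# window 1. All 200 are allowed, even though they arrive ~0.02s apart.
allowed = sum(allow(0.99) for _ in range(100)) + sum(allow(1.01) for _ in range(100))
print(allowed)  # 200
```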
Sliding Window
To solve this, we can use a sliding window approach.
The idea:
Within any rolling window of the chosen length, the number of requests must not exceed the limit.
This is more accurate but requires storing timestamps of requests.
Implementation might use:
- Sorted sets
- Heaps / priority queues
However, memory usage grows with traffic, since a timestamp must be kept for every request inside the window.
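A sliding window log can be sketched with a plain sorted list per user (in Redis this would typically be a sorted set, pruned with ZREMRANGEBYSCORE):

```python
import bisect
from collections import defaultdict

LIMIT = 100
WINDOW = 1.0  # seconds

logs = defaultdict(list)  # user -> sorted request timestamps

def allow(user: str, now: float) -> bool:
    ts = logs[user]
    # Drop timestamps that fell out of the rolling window.
    cutoff = bisect.bisect_left(ts, now - WINDOW)
    del ts[:cutoff]
    if len(ts) < LIMIT:
        bisect.insort(ts, now)
        return True
    return False

# The burst that fooled the fixed window is now caught:
first = sum(allow("u1", 0.99) for _ in range(100))   # all allowed
second = sum(allow("u1", 1.01) for _ in range(100))  # all rejected
print(first, second)  # 100 0
```

Note the memory cost: each of the 100 allowed requests keeps a timestamp alive for a full window, which is exactly the growth problem mentioned above.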
Token Bucket (Preferred Approach)
- Think of tokens as balls in a bucket.
- Each request consumes one token.
- Tokens are refilled at a fixed rate.
- If no tokens are available → reject the request.
Example:
- Bucket size = 100
- Refill rate = 100 per minute
- If a user sends 100 requests instantly, they must wait until tokens are refilled.
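The example above maps directly onto a small single-process sketch using lazy refill (tokens are topped up on each check rather than by a background timer); a distributed version would keep this state in Redis:

```python
import time

class TokenBucket:
    """Minimal single-process token bucket with lazy refill."""
    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def is_available(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Bucket size 100, refilled at 100 tokens per minute:
bucket = TokenBucket(capacity=100, refill_per_sec=100 / 60, now=0.0)
burst = sum(bucket.is_available(now=0.0) for _ in range(150))
# Only the first 100 of the 150 instant requests pass; the rest
# must wait for tokens to trickle back in.
print(burst)  # 100
```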
Benefits:
- Allows burst traffic (up to bucket capacity)
- Smooths traffic over time
- Flexible and production-friendly

Token Bucket is widely used in real systems.
High-Level Architecture
- The Rate Limiter logic is placed before the backend API.
- Load balancer distributes traffic across multiple app servers.
- A shared Redis store keeps token bucket state.
This ensures:
- Distributed rate limiting
- No single point of failure
- Low latency checks
Bottlenecks
1. Redis Bottleneck
If millions of users hit the system simultaneously, Redis may become the bottleneck.
To scale:
- Use Redis clustering
- Shard keys across multiple Redis nodes
- Use consistent hashing for distribution

If each Redis instance stores the state for 100k users and we need to support 1 million users, we need around 10 Redis nodes.
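A minimal sketch of routing user keys to shards with a stable hash (plain modulo hashing for brevity; real consistent hashing, or Redis Cluster's hash slots, additionally minimizes key movement when nodes are added or removed — node names here are made up):

```python
import hashlib

NODES = [f"redis-{i}" for i in range(10)]  # ~10 nodes for 1M users at 100k each

def shard_for(user_id: str) -> str:
    """A stable hash of the key picks the node holding that user's bucket."""
    h = int(hashlib.sha1(user_id.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

print(shard_for("user:42"))  # same user always lands on the same node
```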
2. Concurrency Problem
What if:
- A user has only 1 token left
- Two requests hit Redis at the same time from different servers?
Redis solves this using atomic operations.
Using:
- Lua scripts
- Atomic commands like INCR
- Or transactions
This prevents race conditions.
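The race disappears as long as the check and the decrement happen as one indivisible step. Here is a pure-Python simulation of that guarantee, using a lock to stand in for what a single Redis Lua EVAL provides (Redis itself needs no lock, since it executes a Lua script atomically):

```python
import threading

class AtomicBucket:
    """Simulates the atomicity Redis gives via a Lua script:
    check-and-decrement is a single indivisible step."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.lock = threading.Lock()

    def try_consume(self) -> bool:
        with self.lock:  # in Redis: one Lua EVAL call
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# Two "servers" race for the last remaining token:
bucket = AtomicBucket(tokens=1)
results = []
threads = [threading.Thread(target=lambda: results.append(bucket.try_consume()))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [False, True] -- exactly one request wins the token
```

Without the atomic step, both requests could read "1 token left" and both succeed; with it, exactly one gets through.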
3. Latency Considerations
To reduce latency:
- Keep Redis close to application servers (same region)
- Use cluster topology
- Avoid cross-region calls for rate limit checks
Geographical distance directly impacts response time.
Final Thought
- Functional requirements define what the system does.
- Non-functional requirements define how well it performs at scale.
Rate limiting may look simple, but designing it correctly in distributed systems requires careful thought.
See you in the next post 🚀
