
Ganesh Parella


Designing a Rate Limiter to prevent spamming

Imagine you are building a social website or any other large-scale system. Suddenly, a million requests flood your system from a single IP address. Your servers slow down or even crash.

How do we prevent this?

The answer is: Rate Limiting.

In simple terms, a rate limiter restricts the number of requests a user (or IP address) can send within a given time window.

Let’s design one.

Functional Requirements

  • Limit the number of requests per user ID or IP address
  • Return an error (e.g., HTTP 429) when the limit is exceeded

Non-Functional Requirements

  • Low latency while checking the limit (e.g., <10ms)
  • High availability (Availability > Consistency)
  • Scalable for millions of users

System Function / Endpoint
boolean isAvailable(userId, request)

  • If true → forward request to backend
  • If false → return 429 (Too Many Requests)
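The gate above can be sketched in a few lines of Python. The `RateLimiter` class and `handle_request` helper are hypothetical placeholders (and this toy version has no time window yet); the point is just how the boolean check routes a request to the backend or to a 429:

```python
# Minimal sketch of the isAvailable gate in front of the backend.
class RateLimiter:
    """Toy limiter: each user gets at most `limit` requests (no time window yet)."""
    def __init__(self, limit):
        self.limit = limit
        self.counts = {}          # user_id -> requests seen so far

    def is_available(self, user_id):
        n = self.counts.get(user_id, 0)
        if n >= self.limit:
            return False          # over the limit: reject
        self.counts[user_id] = n + 1
        return True

def handle_request(limiter, user_id):
    # True -> forward to backend (200); False -> 429 Too Many Requests
    return 200 if limiter.is_available(user_id) else 429

limiter = RateLimiter(limit=2)
statuses = [handle_request(limiter, "u1") for _ in range(3)]
print(statuses)   # [200, 200, 429]
```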

Choosing the Right Algorithm

When we think about limiting requests over time, a natural idea is:

Limit the number of requests in a fixed time window.

But there's a problem. Suppose we allow 100 requests per second, and a user sends:

  • 100 requests at the end of second 1
  • 100 requests at the beginning of second 2

That's 200 requests within roughly one second, double the intended limit. This is known as the Fixed Window boundary problem. We don't want that.
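A tiny fixed-window counter makes the boundary problem concrete. The class name and timestamps are illustrative, not from any real library:

```python
class FixedWindowLimiter:
    """Allow `limit` requests per `window`-second window, keyed on the window index."""
    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self.counts = {}   # (user_id, window_index) -> count

    def allow(self, user_id, now):
        key = (user_id, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

lim = FixedWindowLimiter(limit=100)
# 100 requests at t=0.99 (end of window 0) all pass...
end_of_w0 = sum(lim.allow("u1", 0.99) for _ in range(100))
# ...and 100 more at t=1.01 (start of window 1) also pass:
start_of_w1 = sum(lim.allow("u1", 1.01) for _ in range(100))
print(end_of_w0 + start_of_w1)   # 200 accepted within ~20 ms of wall time
```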

Sliding Window
To solve this, we can use a sliding window approach. The idea: within any window of the given length ending at the current moment, the number of requests must not exceed the limit.

This is more accurate, but it requires storing the timestamps of recent requests. An implementation might use:

  • Sorted sets
  • Heaps / priority queues

However, memory usage increases with traffic.
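A minimal sketch of the sliding-window log, keeping one timestamp per request in a deque (timestamps arrive in order, so the deque stays sorted). Replaying the boundary scenario from above shows the fix:

```python
from collections import deque

class SlidingWindowLog:
    """Keep a log of request timestamps; allow a request only if fewer than
    `limit` requests fall inside the last `window` seconds."""
    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self.logs = {}   # user_id -> deque of timestamps

    def allow(self, user_id, now):
        log = self.logs.setdefault(user_id, deque())
        while log and log[0] <= now - self.window:
            log.popleft()              # evict timestamps outside the window
        if len(log) < self.limit:
            log.append(now)
            return True
        return False

log = SlidingWindowLog(limit=100, window=1.0)
passed = sum(log.allow("u1", 0.99) for _ in range(100))   # all 100 accepted
passed += sum(log.allow("u1", 1.01) for _ in range(100))  # window still full: rejected
print(passed)   # 100: the burst at the boundary no longer doubles the limit
```

Note that every accepted request keeps a timestamp in memory for the full window, which is exactly the memory cost mentioned above.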

Token Bucket (Preferred Approach)

  • Think of tokens as balls in a bucket.
  • Each request consumes one token.
  • Tokens are refilled at a fixed rate.
  • If no tokens are available → reject the request.

Example:

  • Bucket size = 100
  • Refill rate = 100 per minute
  • If a user sends 100 requests instantly, they must wait until tokens are refilled.

Benefits:

  • Allows burst traffic (up to bucket capacity)
  • Smooths traffic over time
  • Flexible and production-friendly

Token Bucket is widely used in real systems.
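The refill logic above can be sketched with "lazy" refilling: instead of a background timer, each check credits the tokens earned since the previous check. The class and parameter names are my own, but the numbers match the example (bucket of 100, refilled at 100 per minute):

```python
class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one token."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate               # tokens added per second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0                # time of the last refill

    def allow(self, now):
        # Lazy refill: credit tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=100, rate=100 / 60)   # 100 tokens, refill 100/min
burst = sum(bucket.allow(0.0) for _ in range(150))  # burst of 150 requests at t=0
refilled = bucket.allow(30.0)                       # one more request 30 s later
print(burst, refilled)   # 100 True (burst capped at capacity; ~50 tokens back after 30 s)
```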

High-Level Architecture

In this design:

  • The Rate Limiter logic is placed before the backend API.
  • A load balancer distributes traffic across multiple app servers.
  • A shared Redis store keeps token bucket state.

This ensures:

  • Distributed rate limiting
  • No single point of failure
  • Low latency checks

Bottlenecks

1. Redis Bottleneck
If millions of users hit the system simultaneously, Redis may become the bottleneck.
To scale:

  • Use Redis clustering
  • Shard keys across multiple Redis nodes
  • Use consistent hashing for distribution

If each Redis instance stores state for 100k users and we need to support 1 million users, we need around 10 Redis nodes.
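A small sketch of how consistent hashing maps rate-limit keys to Redis nodes. The node names ("redis-0" through "redis-9") are hypothetical, and virtual nodes are used so keys spread evenly; the key property is that adding or removing a node only remaps a small fraction of keys:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Place each node at many points on a hash ring; a key belongs to the
    first node point clockwise from the key's own hash."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        idx = bisect(self._hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing([f"redis-{i}" for i in range(10)])
owner = ring.node_for("rate:user:42")
print(owner)   # the same node every time for this key
```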

2. Concurrency Problem

What if:

  • A user has only 1 token left
  • Two requests hit Redis at the same time from different servers?

Both could read "1 token available" and both be allowed, spending a token that exists only once. Redis solves this with atomic operations.

Using:

  • Lua scripts
  • Atomic commands like INCR
  • Or transactions

This prevents race conditions.
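In a real deployment the check-and-decrement runs server-side as a single Lua script, which Redis executes atomically. The same guarantee can be sketched locally with a lock (this is an illustration of the property, not the Redis API):

```python
import threading

class AtomicBucket:
    """Check-and-decrement under a lock, mimicking what an atomic Redis Lua
    script guarantees: two concurrent requests can never both spend the
    last token."""
    def __init__(self, tokens):
        self.tokens = tokens
        self._lock = threading.Lock()

    def try_consume(self):
        with self._lock:          # read-check-write as one atomic step
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = AtomicBucket(tokens=1)
results = []
threads = [
    threading.Thread(target=lambda: results.append(bucket.try_consume()))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))   # [False, True]: exactly one request gets the last token
```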

3. Latency Considerations

To reduce latency:

  • Keep Redis close to the application servers (same region)
  • Use a cluster topology
  • Avoid cross-region calls for rate-limit checks

Geographical distance directly impacts response time, so every network hop on the hot path matters.

Final Thought

  • Functional requirements define what the system does.
  • Non-functional requirements define how well it performs at scale.

Rate limiting may look simple, but designing it correctly in distributed systems requires careful thought.

See you in the next post 🚀
