Building a Distributed Rate Limiter for FastAPI with Redis (Sliding Window Algorithm)

Ridwan Olanrewaju Azeez

Every API eventually runs into the same problem.

A bot, scraper, or even a buggy client suddenly starts sending thousands of requests per second. When that happens, your server slows down, your database struggles, and real users start seeing errors.

This is exactly the kind of situation rate limiting is meant to prevent.

Recently I built RateGuard, a small Python library that adds distributed rate limiting to FastAPI using Redis. In this post I want to walk through how it works and the design decisions behind it.


Why I Built RateGuard

While working with FastAPI, I looked at a few rate limiting libraries. Most of them had at least one issue.

Some only support in-memory limits, which means they break once your API runs on multiple servers.

Others work in distributed setups but require more infrastructure than I wanted.

So I decided to build something simple with a few goals in mind:

  • easy to plug into FastAPI
  • works across multiple servers
  • accurate under heavy traffic
  • simple enough to understand and maintain

That became RateGuard.


What Is Rate Limiting?

Rate limiting controls how many requests a user can send to an API within a certain time period.

For example:

  • a user can send 10 requests per minute
  • after the limit is reached they receive a 429 Too Many Requests response
  • after the time window passes the limit resets

A simple analogy is a coffee shop rule:

One free coffee per customer per hour.

The barista keeps track of who got a coffee and when. If you come back too soon, you have to wait.


Why Not Just Use a Counter?

A very common approach is to use a simple counter that resets every minute.

The problem is that this method can be abused.

Imagine your limit is 10 requests per minute and the counter resets at exactly 12:00:00.

A user could do this:

  • send 10 requests at 11:59:55
  • the counter resets at 12:00:00
  • send another 10 requests at 12:00:05

That ends up being 20 requests in about 10 seconds, even though the limit is supposed to be 10 per minute.

This issue is called the fixed window problem.
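To make the burst concrete, here is a toy fixed-window counter in plain Python (an illustration only, not RateGuard code) that lets exactly this double burst through:

```python
# Toy fixed-window counter: requests are bucketed by which window they
# fall into, so a burst straddling a window boundary slips through.

def fixed_window_allowed(counts: dict, ts: float, limit: int, window: int) -> bool:
    bucket = int(ts // window)  # every request in the same window shares a bucket
    if counts.get(bucket, 0) >= limit:
        return False
    counts[bucket] = counts.get(bucket, 0) + 1
    return True

counts = {}
# 10 requests at 11:59:55 (43195 seconds into the day) ...
before = [fixed_window_allowed(counts, 43195.0, limit=10, window=60) for _ in range(10)]
# ... and 10 more at 12:00:05, just after the counter resets
after = [fixed_window_allowed(counts, 43205.0, limit=10, window=60) for _ in range(10)]
print(sum(before) + sum(after))  # 20 -- all allowed within ~10 seconds
```

The two bursts land in different buckets, so each gets its own fresh quota of 10.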


The Sliding Window Approach

To avoid this problem, RateGuard uses the sliding window algorithm.

Instead of resetting at fixed times, it always looks back a certain number of seconds from the current request.

The logic looks like this:

  1. a request arrives at time T
  2. check all requests between T - window and T
  3. remove anything older than the window
  4. count the remaining requests
  5. if the count is below the limit, allow the request
  6. otherwise return a 429 response

Going back to the coffee shop example, instead of resetting every hour on the clock, the barista asks:

Did this person get a coffee in the last 60 minutes?

The time window moves forward with every request.
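The steps above can be sketched as a single-process Python class. This is an illustration only: RateGuard keeps the same state in Redis so it works across servers.

```python
import time

class SlidingWindowLimiter:
    """Toy single-process sliding window: keeps raw timestamps per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> list of request timestamps

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        # steps 2-3: keep only requests inside (now - window, now]
        recent = [t for t in self.hits.get(key, []) if t > now - self.window]
        allowed = len(recent) < self.limit  # steps 4-5: count and compare
        if allowed:
            recent.append(now)  # record only requests that were let through
        self.hits[key] = recent
        return allowed

limiter = SlidingWindowLimiter(limit=10, window=60)
for _ in range(10):
    limiter.allow("1.2.3.4", now=55.0)      # fill the window at t=55s
print(limiter.allow("1.2.3.4", now=65.0))   # False: 10 hits in the last 60s
print(limiter.allow("1.2.3.4", now=116.0))  # True: the old hits have aged out
```

Note that the burst from the fixed-window example no longer works: at t=65s the ten requests from t=55s are still inside the 60-second lookback.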


Basic Architecture

RateGuard sits between incoming requests and your FastAPI application.

```
Client Request
      |
      v
FastAPI Server
      |
      v
RateGuard Middleware
      |
      v
Redis Sorted Set
      |
      v
Allow or Block Request
```

Redis stores request timestamps so every server in the system can see them.


Why Redis?

Redis is a good fit for rate limiting for two main reasons.

Speed

Rate limiting runs on every request, so it has to be fast. Redis is an in-memory data store and can handle a huge number of operations per second.

Shared state

If your API runs on several servers, each one needs to know how many requests have already been made. Redis works as a shared store that all servers can read from and write to.


Using Redis Sorted Sets

RateGuard stores request data inside a Redis Sorted Set.

A sorted set stores values with a score. The score determines the order.

In this case:

  • the score is the request timestamp
  • the value is a unique request ID

Example data:

```
Key: ratelimit:192.168.1.1

1709856060000 -> req_abc123
1709856080000 -> req_def456
1709856100000 -> req_xyz789
```

For each request, RateGuard:

  1. removes entries older than the time window
  2. counts the remaining entries
  3. decides whether to allow or block the request
  4. records the new request

This approach works well even when multiple servers are handling traffic.
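To show how the four steps map onto sorted-set commands, here is a sketch that mirrors them on a plain dict standing in for the Redis key (the function name and dict model are mine, not RateGuard's API; each comment names the corresponding Redis command):

```python
import uuid

def sliding_window_check(zset: dict, limit: int, window_ms: int, now_ms: int) -> bool:
    """Mirror the four sorted-set steps on a plain dict (member -> score)."""
    cutoff = now_ms - window_ms
    # 1. ZREMRANGEBYSCORE key 0 cutoff: drop entries older than the window
    for member in [m for m, score in zset.items() if score <= cutoff]:
        del zset[member]
    # 2. ZCARD key: count the remaining entries
    count = len(zset)
    # 3. decide before recording, so this request doesn't count against itself
    allowed = count < limit
    # 4. ZADD key {request_id: timestamp}: record the new request
    zset[str(uuid.uuid4())] = now_ms
    return allowed

zset = {}
results = [sliding_window_check(zset, limit=3, window_ms=60_000, now_ms=t)
           for t in (0, 10_000, 20_000, 30_000)]
print(results)  # [True, True, True, False]
```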


Installing RateGuard

```bash
pip install rate-guardian
```

You will also need a Redis instance. I used Upstash Redis, which has a generous free tier.


Quick Example

```python
import os
from fastapi import FastAPI, Request
from rateguard import RateGuard, RateLimitMiddleware, rate_limit

app = FastAPI()

limiter = RateGuard(
    redis_url=os.environ["UPSTASH_REDIS_REST_URL"],
    redis_token=os.environ["UPSTASH_REDIS_REST_TOKEN"],
)

app.add_middleware(
    RateLimitMiddleware,
    limiter=limiter,
    limit=10,
    window=60
)

@app.get("/")
async def home():
    return {"message": "API is protected by RateGuard"}
```

After adding the middleware, every endpoint is automatically protected.


Per Route Limits

Sometimes you want stricter limits for certain endpoints.

For example, a search endpoint that queries a database.

```python
@app.get("/search")
@rate_limit(limiter, limit=5, window=60)
async def search(request: Request, q: str = ""):
    return {"query": q, "results": []}
```

Now /search only allows 5 requests per minute.


Response Headers

RateGuard includes useful headers in responses.

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Seconds until the window resets |
| Retry-After | Only present on 429 responses |

These help clients know when to slow down.
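As a sketch of how such headers can be derived from the stored timestamps (the header names match the table above, but the helper itself is hypothetical, not RateGuard's implementation):

```python
def rate_limit_headers(timestamps_ms, limit: int, window_ms: int, now_ms: int) -> dict:
    recent = [t for t in timestamps_ms if t > now_ms - window_ms]
    remaining = max(limit - len(recent), 0)
    # the window "resets" when the oldest recent request falls out of it
    reset_s = 0 if not recent else max((min(recent) + window_ms - now_ms) // 1000, 0)
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_s),
    }

# 2 requests stored, 10 allowed per 60s window, checked at t=40s
print(rate_limit_headers([1_000, 30_000], limit=10, window_ms=60_000, now_ms=40_000))
# {'X-RateLimit-Limit': '10', 'X-RateLimit-Remaining': '8', 'X-RateLimit-Reset': '21'}
```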


What Happens If Redis Fails?

One design decision I made was to fail open.

If Redis is temporarily unavailable, requests are allowed instead of blocked.

For most APIs, blocking all traffic because Redis is down would be worse than briefly running without rate limiting.
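A minimal sketch of the fail-open pattern (the function names here are illustrative; RateGuard's internals may differ):

```python
def check_with_fail_open(check, key: str, limit: int, window: int) -> bool:
    """Fail open: if the backing store errors out, allow the request.
    `check` stands in for the real Redis-backed rate limit call."""
    try:
        return check(key, limit, window)
    except ConnectionError:
        # Briefly running without rate limiting beats a total outage
        return True

def broken_redis(key, limit, window):
    raise ConnectionError("Redis unreachable")

print(check_with_fail_open(broken_redis, "1.2.3.4", 10, 60))  # True even though Redis is down
```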


Core Logic Example

Here is a simplified version of the main logic:

```python
import time
import uuid

def is_allowed(self, key: str, limit: int, window: int) -> bool:
    # a method on the limiter; self.redis is the Redis client
    now = int(time.time() * 1000)   # current time in milliseconds
    oldest = now - (window * 1000)  # start of the sliding window

    pipe = self.redis.pipeline()
    pipe.zremrangebyscore(key, 0, oldest)     # drop entries older than the window
    pipe.zcard(key)                           # count what's left
    pipe.zadd(key, {str(uuid.uuid4()): now})  # record this request
    pipe.expire(key, window)                  # let idle keys expire on their own

    results = pipe.exec()

    count = results[1]  # result of ZCARD: requests already in the window
    return count < limit
```

Using a pipeline sends all four Redis commands in a single round trip, which keeps the check fast. One caveat: a pipeline batches commands but does not make them atomic, so under very heavy concurrent traffic two servers can interleave their checks. For strict guarantees, the same steps can be wrapped in a Lua script or a MULTI/EXEC transaction.


What's Next

A few improvements I plan to add:

  • support for standard Redis deployments
  • rate limiting by user ID
  • an optional token bucket algorithm
  • better metrics and monitoring

Try It Out

```bash
pip install rate-guardian
```

GitHub: https://github.com/Jpeg-create/rate-guard

PyPI: https://pypi.org/project/rate-guardian/

If you find it useful, a ⭐ on GitHub means a lot.
