Building a Distributed Rate Limiter for FastAPI with Redis (Sliding Window Algorithm)

Ridwan Olanrewaju Azeez

Every API eventually runs into the same problem.

A bot, scraper, or even a buggy client suddenly starts sending thousands of requests per second. When that happens, your server slows down, your database struggles, and real users start seeing errors.

This is exactly the kind of situation rate limiting is meant to prevent.

Recently I built RateGuard, a small Python library that adds distributed rate limiting to FastAPI using Redis. In this post I want to walk through how it works and the design decisions behind it.


Why I Built RateGuard

While working with FastAPI, I looked at a few rate limiting libraries. Most of them had at least one issue.

Some only support in-memory limits, which means they break once your API runs on multiple servers.

Others work in distributed setups but require more infrastructure than I wanted.

So I decided to build something simple with a few goals in mind:

  • easy to plug into FastAPI
  • works across multiple servers
  • accurate under heavy traffic
  • simple enough to understand and maintain

That became RateGuard.


What Is Rate Limiting?

Rate limiting controls how many requests a user can send to an API within a certain time period.

For example:

  • a user can send 10 requests per minute
  • after the limit is reached they receive a 429 Too Many Requests response
  • after the time window passes the limit resets

A simple analogy is a coffee shop rule:

One free coffee per customer per hour.

The barista keeps track of who got a coffee and when. If you come back too soon, you have to wait.


Why Not Just Use a Counter?

A very common approach is to use a simple counter that resets every minute.

The problem is that this method can be abused.

Imagine your limit is 10 requests per minute and the counter resets at exactly 12:00:00.

A user could do this:

  • send 10 requests at 11:59:55
  • the counter resets at 12:00:00
  • send another 10 requests at 12:00:05

That ends up being 20 requests in about 10 seconds, even though the limit is supposed to be 10 per minute.

This issue is called the fixed window problem.
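To make the burst concrete, here is a toy fixed-window counter in plain Python (an illustration only, not RateGuard code) that lets exactly this double burst through:

```python
# Toy fixed-window counter: requests are bucketed by which window they
# fall into, so a burst straddling a window boundary slips through.

def fixed_window_allowed(counts: dict, ts: float, limit: int, window: int) -> bool:
    bucket = int(ts // window)  # every request in the same window shares a bucket
    if counts.get(bucket, 0) >= limit:
        return False
    counts[bucket] = counts.get(bucket, 0) + 1
    return True

counts = {}
# 10 requests at 11:59:55 (43195 seconds into the day) ...
before = [fixed_window_allowed(counts, 43195.0, limit=10, window=60) for _ in range(10)]
# ... and 10 more at 12:00:05, just after the counter resets
after = [fixed_window_allowed(counts, 43205.0, limit=10, window=60) for _ in range(10)]
print(sum(before) + sum(after))  # 20 -- all allowed within ~10 seconds
```

The two bursts land in different buckets, so each gets its own fresh quota of 10.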


The Sliding Window Approach

To avoid this problem, RateGuard uses the sliding window algorithm.

Instead of resetting at fixed times, it always looks back a certain number of seconds from the current request.

The logic looks like this:

  1. a request arrives at time T
  2. check all requests between T - window and T
  3. remove anything older than the window
  4. count the remaining requests
  5. if the count is below the limit, allow the request
  6. otherwise return a 429 response

Going back to the coffee shop example, instead of resetting every hour on the clock, the barista asks:

Did this person get a coffee in the last 60 minutes?

The time window moves forward with every request.
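The steps above can be sketched as a single-process Python class. This is an illustration only: RateGuard keeps the same state in Redis so it works across servers.

```python
import time

class SlidingWindowLimiter:
    """Toy single-process sliding window: keeps raw timestamps per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> list of request timestamps

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        # steps 2-3: keep only requests inside (now - window, now]
        recent = [t for t in self.hits.get(key, []) if t > now - self.window]
        allowed = len(recent) < self.limit  # steps 4-5: count and compare
        if allowed:
            recent.append(now)  # record only requests that were let through
        self.hits[key] = recent
        return allowed

limiter = SlidingWindowLimiter(limit=10, window=60)
for _ in range(10):
    limiter.allow("1.2.3.4", now=55.0)      # fill the window at t=55s
print(limiter.allow("1.2.3.4", now=65.0))   # False: 10 hits in the last 60s
print(limiter.allow("1.2.3.4", now=116.0))  # True: the old hits have aged out
```

Note that the burst from the fixed-window example no longer works: at t=65s the ten requests from t=55s are still inside the 60-second lookback.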


Basic Architecture

RateGuard sits between incoming requests and your FastAPI application.

```
Client Request
      |
      v
FastAPI Server
      |
      v
RateGuard Middleware
      |
      v
Redis Sorted Set
      |
      v
Allow or Block Request
```

Redis stores request timestamps so every server in the system can see them.


Why Redis?

Redis is a good fit for rate limiting for two main reasons.

Speed

Rate limiting runs on every request, so it has to be fast. Redis is an in-memory data store and can handle a huge number of operations per second.

Shared state

If your API runs on several servers, each one needs to know how many requests have already been made. Redis works as a shared store that all servers can read from and write to.


Using Redis Sorted Sets

RateGuard stores request data inside a Redis Sorted Set.

A sorted set stores values with a score. The score determines the order.

In this case:

  • the score is the request timestamp
  • the value is a unique request ID

Example data:

```
Key: ratelimit:192.168.1.1

1709856060000 -> req_abc123
1709856080000 -> req_def456
1709856100000 -> req_xyz789
```

For each request, RateGuard:

  1. removes entries older than the time window
  2. counts the remaining entries
  3. decides whether to allow or block the request
  4. records the new request

This approach works well even when multiple servers are handling traffic.
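To show how the four steps map onto sorted-set commands, here is a sketch that mirrors them on a plain dict standing in for the Redis key (the function name and dict model are mine, not RateGuard's API; each comment names the corresponding Redis command):

```python
import uuid

def sliding_window_check(zset: dict, limit: int, window_ms: int, now_ms: int) -> bool:
    """Mirror the four sorted-set steps on a plain dict (member -> score)."""
    cutoff = now_ms - window_ms
    # 1. ZREMRANGEBYSCORE key 0 cutoff: drop entries older than the window
    for member in [m for m, score in zset.items() if score <= cutoff]:
        del zset[member]
    # 2. ZCARD key: count the remaining entries
    count = len(zset)
    # 3. decide before recording, so this request doesn't count against itself
    allowed = count < limit
    # 4. ZADD key {request_id: timestamp}: record the new request
    zset[str(uuid.uuid4())] = now_ms
    return allowed

zset = {}
results = [sliding_window_check(zset, limit=3, window_ms=60_000, now_ms=t)
           for t in (0, 10_000, 20_000, 30_000)]
print(results)  # [True, True, True, False]
```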


Installing RateGuard

```bash
pip install rate-guardian
```

You will also need a Redis instance. I used Upstash Redis, which has a generous free tier.


Quick Example

```python
import os
from fastapi import FastAPI, Request
from rateguard import RateGuard, RateLimitMiddleware, rate_limit

app = FastAPI()

limiter = RateGuard(
    redis_url=os.environ["UPSTASH_REDIS_REST_URL"],
    redis_token=os.environ["UPSTASH_REDIS_REST_TOKEN"],
)

app.add_middleware(
    RateLimitMiddleware,
    limiter=limiter,
    limit=10,
    window=60
)

@app.get("/")
async def home():
    return {"message": "API is protected by RateGuard"}
```

After adding the middleware, every endpoint is automatically protected.


Per Route Limits

Sometimes you want stricter limits for certain endpoints.

For example, a search endpoint that queries a database.

```python
@app.get("/search")
@rate_limit(limiter, limit=5, window=60)
async def search(request: Request, q: str = ""):
    return {"query": q, "results": []}
```

Now /search only allows 5 requests per minute.


Response Headers

RateGuard includes useful headers in responses.

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Seconds until the window resets |
| Retry-After | Only present on 429 responses |

These help clients know when to slow down.
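As a sketch of how such headers can be derived from the stored timestamps (the header names match the table above, but the helper itself is hypothetical, not RateGuard's implementation):

```python
def rate_limit_headers(timestamps_ms, limit: int, window_ms: int, now_ms: int) -> dict:
    recent = [t for t in timestamps_ms if t > now_ms - window_ms]
    remaining = max(limit - len(recent), 0)
    # the window "resets" when the oldest recent request falls out of it
    reset_s = 0 if not recent else max((min(recent) + window_ms - now_ms) // 1000, 0)
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_s),
    }

# 2 requests stored, 10 allowed per 60s window, checked at t=40s
print(rate_limit_headers([1_000, 30_000], limit=10, window_ms=60_000, now_ms=40_000))
# {'X-RateLimit-Limit': '10', 'X-RateLimit-Remaining': '8', 'X-RateLimit-Reset': '21'}
```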


What Happens If Redis Fails?

One design decision I made was to fail open.

If Redis is temporarily unavailable, requests are allowed instead of blocked.

For most APIs, blocking all traffic because Redis is down would be worse than briefly running without rate limiting.
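A minimal sketch of the fail-open pattern (the function names here are illustrative; RateGuard's internals may differ):

```python
def check_with_fail_open(check, key: str, limit: int, window: int) -> bool:
    """Fail open: if the backing store errors out, allow the request.
    `check` stands in for the real Redis-backed rate limit call."""
    try:
        return check(key, limit, window)
    except ConnectionError:
        # Briefly running without rate limiting beats a total outage
        return True

def broken_redis(key, limit, window):
    raise ConnectionError("Redis unreachable")

print(check_with_fail_open(broken_redis, "1.2.3.4", 10, 60))  # True even though Redis is down
```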


Core Logic Example

Here is a simplified version of the main logic:

```python
import time
import uuid

def is_allowed(self, key: str, limit: int, window: int) -> bool:
    # a method on the limiter; self.redis is the Redis client
    now = int(time.time() * 1000)   # current time in milliseconds
    oldest = now - (window * 1000)  # start of the sliding window

    pipe = self.redis.pipeline()
    pipe.zremrangebyscore(key, 0, oldest)     # drop entries older than the window
    pipe.zcard(key)                           # count what's left
    pipe.zadd(key, {str(uuid.uuid4()): now})  # record this request
    pipe.expire(key, window)                  # let idle keys expire on their own

    results = pipe.exec()

    count = results[1]  # result of ZCARD: requests already in the window
    return count < limit
```

Using a pipeline sends all four Redis commands in a single round trip, which keeps the check fast. One caveat: a pipeline batches commands but does not make them atomic, so under very heavy concurrent traffic two servers can interleave their checks. For strict guarantees, the same steps can be wrapped in a Lua script or a MULTI/EXEC transaction.


What's Next

A few improvements I plan to add:

  • support for standard Redis deployments
  • rate limiting by user ID
  • an optional token bucket algorithm
  • better metrics and monitoring

Try It Out

```bash
pip install rate-guardian
```

GitHub: https://github.com/Jpeg-create/rate-guard

PyPI: https://pypi.org/project/rate-guardian/

If you find it useful, a ⭐ on GitHub means a lot.
