himanshu patel

Python Rate Limiter That Tunes Itself — Here's Why That Matters

Rate limiting is one of those things every API needs, but most developers only think about it after something goes wrong.

A bot hammers your endpoint. A user accidentally fires 1000 requests in a loop. Your server slows to a crawl. Sound familiar?

I've been there. So I built smart-ratelimiter — a Python library with six rate limiting algorithms, including one that automatically adapts to your traffic. Let me walk you through what that means and why it matters.


First, What Even Is Rate Limiting?

Rate limiting means saying: "You're only allowed X requests in Y seconds."

Once the limit is hit, the server responds with HTTP 429 — "Too Many Requests" — and tells the client to slow down.

Simple concept. But the how matters a lot.


The Problem With Simple Rate Limiters

Most libraries give you a Fixed Window — divide time into buckets, count requests per bucket, reject when full.

|--- 60s window ---|--- 60s window ---|
|   100 requests   |   100 requests   |

It works. But it has a nasty flaw called the boundary burst problem.

A clever client can send 100 requests just before the window ends, then another 100 immediately after it resets — effectively making 200 requests in just a few seconds while technically "following the rules."

...98, 99, 100 | 1, 2, 3... 100
               ^
           window reset — attacker fires 200 req in seconds

Not great.
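To make the flaw concrete, here's a toy fixed-window counter (illustrative only — not the library's implementation):

```python
import math

class FixedWindowLimiter:
    """Toy fixed-window counter: at most `limit` requests per `window`-second bucket."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = {}  # window index -> request count

    def is_allowed(self, now: float) -> bool:
        bucket = math.floor(now / self.window)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit

# Boundary burst: 100 requests at t=59.9s land in window 0,
# 100 more at t=60.1s land in window 1 -- all 200 are allowed,
# even though they arrive within 0.2 seconds of each other.
limiter = FixedWindowLimiter(limit=100, window=60.0)
before = sum(limiter.is_allowed(59.9) for _ in range(100))
after = sum(limiter.is_allowed(60.1) for _ in range(100))
print(before + after)  # 200
```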


Six Algorithms, Each With a Purpose

Different situations call for different approaches. Here's how to think about them:

| Algorithm | Best For |
| --- | --- |
| Fixed Window | Simple use cases, low-traffic endpoints |
| Sliding Window Log | When you need exact counts, no boundary burst |
| Sliding Window Counter | High traffic, need accuracy but low memory |
| Token Bucket | APIs that need to allow short bursts |
| Leaky Bucket | Protecting downstream services from any spike |
| Adaptive Hybrid | Multi-tenant APIs with unpredictable traffic |
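For contrast with the fixed window above, here's a sliding window log in miniature — it tracks a timestamp per request, so the boundary burst simply can't happen (a sketch, not the library's code):

```python
from collections import deque

class SlidingWindowLog:
    """Exact-count limiter: at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log: deque = deque()  # timestamps of accepted requests

    def is_allowed(self, now: float) -> bool:
        # evict timestamps that have slid out of the window
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=100, window=60.0)
burst1 = sum(limiter.is_allowed(59.9) for _ in range(100))  # fills the window
burst2 = sum(limiter.is_allowed(60.1) for _ in range(100))  # still inside it
print(burst1, burst2)  # 100 0
```

The trade-off is memory: one stored timestamp per accepted request, which is exactly why the Sliding Window Counter row exists.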

All of them work the same way in code:

result = limiter.is_allowed("user:42")
if result.allowed:
    # handle request
else:
    # return 429, retry after result.retry_after seconds

The Interesting One: Adaptive Hybrid

This is the one I'm most proud of.

Most rate limiters are static — you set a limit and it never changes. But real traffic isn't static. A public API might be quiet at 3am and slammed at noon. Why apply the same strictness at both times?

The Adaptive Hybrid combines three layers:

Layer 1 — Sliding Window: prevents the boundary burst exploit we talked about earlier.

Layer 2 — Token Bucket: allows short legitimate bursts (like a user opening your app and firing a few quick requests).

Layer 3 — Load Sensor: watches your overall traffic rate. When traffic is high, it automatically tightens the burst allowance. When traffic is low, it relaxes it. No manual tuning required.
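Layer 2's idea in isolation: tokens refill at a steady rate, and a quiet client can spend saved-up tokens in a burst. A minimal sketch under my own naming, not the library's internals:

```python
class TokenBucket:
    """Refill `rate` tokens per second, up to `capacity`; each request costs one."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = 0.0

    def is_allowed(self, now: float) -> bool:
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=30.0)  # ~10 req/s sustained, bursts to 30
burst = sum(bucket.is_allowed(0.0) for _ in range(40))
print(burst)  # 30 -- the burst drains the bucket; further requests wait on refill
```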

from ratelimiter import AdaptiveRateLimiter, MemoryBackend

limiter = AdaptiveRateLimiter(
    backend=MemoryBackend(),
    limit=100,            # hard ceiling: 100 req per 60s
    window=60.0,
    burst_multiplier=3,   # allow up to 300 burst when quiet
    high_load_threshold=0.8,  # tighten above 80% of base rate
    penalty=0.5,          # cut burst by 50% under high load
)

result = limiter.is_allowed("user:42")
print(result.metadata)
# {'tokens': 299.0, 'effective_burst': 300, 'layer': 'token_bucket'}

Under low traffic → user gets 300 burst tokens. Under high traffic → automatically drops to 150. Your API stays protected without you touching a config file.
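The load-sensing step itself is simple arithmetic. Conceptually — my paraphrase of the parameters above, not the library's actual source:

```python
def effective_burst(limit: int, burst_multiplier: float,
                    observed_rate: float, base_rate: float,
                    high_load_threshold: float = 0.8,
                    penalty: float = 0.5) -> int:
    """Shrink the burst allowance when observed traffic crosses the threshold."""
    burst = limit * burst_multiplier
    if observed_rate > high_load_threshold * base_rate:
        burst *= penalty  # e.g. 300 -> 150 under high load
    return int(burst)

print(effective_burst(100, 3, observed_rate=20, base_rate=100))  # quiet: 300
print(effective_burst(100, 3, observed_rate=90, base_rate=100))  # slammed: 150
```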


Using It in Your App

Install it — zero required dependencies:

pip install smart-ratelimiter

As a decorator:

from ratelimiter import TokenBucketRateLimiter, MemoryBackend, rate_limit

limiter = TokenBucketRateLimiter(MemoryBackend(), limit=10, window=1)

@rate_limit(limiter, key_func=lambda user_id, **_: f"user:{user_id}")
def send_email(user_id: int) -> None:
    ...  # max 10 emails/sec per user

As Flask middleware:

from ratelimiter.middleware import RateLimitMiddleware

app.wsgi_app = RateLimitMiddleware(app.wsgi_app, limiter=limiter)

As FastAPI middleware:

from ratelimiter.middleware import AsyncRateLimitMiddleware

app.add_middleware(AsyncRateLimitMiddleware, limiter=limiter)

Every is_allowed() call returns rate limit headers ready to attach to your HTTP response:

response.headers.update(result.headers)
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 42
# Retry-After: 12.50

Swap Backends Without Changing Your Logic

Need to share rate limit state across multiple servers? Just swap the backend:

# Development
from ratelimiter import MemoryBackend

# Production (distributed)
import redis
from ratelimiter.backends.redis_backend import RedisBackend
client = redis.Redis(host="localhost", decode_responses=True)
backend = RedisBackend(client=client)

# Same limiter code, different backend
limiter = AdaptiveRateLimiter(backend, limit=100, window=60)

Wrapping Up

Rate limiting doesn't have to be a fixed, dumb counter. With the right algorithm your API can:

  • Allow legitimate bursts without getting abused
  • Automatically tighten up under attack or heavy load
  • Relax when things are quiet so users have a good experience

GitHub: https://github.com/himanshu9209/ratelimiter
PyPI: pip install smart-ratelimiter

Would love feedback — especially if you've hit edge cases with rate limiting in your own projects. Drop a comment below!
