himanshu patel

Python Rate Limiter That Tunes Itself — Here's Why That Matters

Rate limiting is one of those things every API needs, but most developers only think about it after something goes wrong.

A bot hammers your endpoint. A user accidentally fires 1000 requests in a loop. Your server slows to a crawl. Sound familiar?

I've been there. So I built smart-ratelimiter — a Python library with six rate limiting algorithms, including one that automatically adapts to your traffic. Let me walk you through what that means and why it matters.


First, What Even Is Rate Limiting?

Rate limiting means saying: "You're only allowed X requests in Y seconds."

Once the limit is hit, the server responds with HTTP 429 — "Too Many Requests" — and tells the client to slow down.

Simple concept. But the how matters a lot.


The Problem With Simple Rate Limiters

Most libraries give you a Fixed Window — divide time into buckets, count requests per bucket, reject when full.

|--- 60s window ---|--- 60s window ---|
|   100 requests   |   100 requests   |

It works. But it has a nasty flaw called the boundary burst problem.

A clever client can send 100 requests just before the window ends, then another 100 immediately after it resets — effectively making 200 requests in just a few seconds while technically "following the rules."

...98, 99, 100 | 1, 2, 3... 100
               ^
           window reset — attacker fires 200 req in seconds

Not great.
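To make the flaw concrete, here's a toy fixed-window counter (illustrative only — not the library's implementation):

```python
import math

class FixedWindowLimiter:
    """Toy fixed-window counter: at most `limit` requests per `window`-second bucket."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = {}  # window index -> request count

    def is_allowed(self, now: float) -> bool:
        bucket = math.floor(now / self.window)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit

# Boundary burst: 100 requests at t=59.9s land in window 0,
# 100 more at t=60.1s land in window 1 -- all 200 are allowed,
# even though they arrive within 0.2 seconds of each other.
limiter = FixedWindowLimiter(limit=100, window=60.0)
before = sum(limiter.is_allowed(59.9) for _ in range(100))
after = sum(limiter.is_allowed(60.1) for _ in range(100))
print(before + after)  # 200
```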


Six Algorithms, Each With a Purpose

Different situations call for different approaches. Here's how to think about them:

| Algorithm | Best For |
| --- | --- |
| Fixed Window | Simple use cases, low-traffic endpoints |
| Sliding Window Log | When you need exact counts, no boundary burst |
| Sliding Window Counter | High traffic, need accuracy but low memory |
| Token Bucket | APIs that need to allow short bursts |
| Leaky Bucket | Protecting downstream services from any spike |
| Adaptive Hybrid | Multi-tenant APIs with unpredictable traffic |
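For contrast with the fixed window above, here's a sliding window log in miniature — it tracks a timestamp per request, so the boundary burst simply can't happen (a sketch, not the library's code):

```python
from collections import deque

class SlidingWindowLog:
    """Exact-count limiter: at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log: deque = deque()  # timestamps of accepted requests

    def is_allowed(self, now: float) -> bool:
        # evict timestamps that have slid out of the window
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=100, window=60.0)
burst1 = sum(limiter.is_allowed(59.9) for _ in range(100))  # fills the window
burst2 = sum(limiter.is_allowed(60.1) for _ in range(100))  # still inside it
print(burst1, burst2)  # 100 0
```

The trade-off is memory: one stored timestamp per accepted request, which is exactly why the Sliding Window Counter row exists.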

All of them work the same way in code:

result = limiter.is_allowed("user:42")
if result.allowed:
    # handle request
else:
    # return 429, retry after result.retry_after seconds

The Interesting One: Adaptive Hybrid

This is the one I'm most proud of.

Most rate limiters are static — you set a limit and it never changes. But real traffic isn't static. A public API might be quiet at 3am and slammed at noon. Why apply the same strictness at both times?

The Adaptive Hybrid combines three layers:

Layer 1 — Sliding Window: prevents the boundary burst exploit we talked about earlier.

Layer 2 — Token Bucket: allows short legitimate bursts (like a user opening your app and firing a few quick requests).

Layer 3 — Load Sensor: watches your overall traffic rate. When traffic is high, it automatically tightens the burst allowance. When traffic is low, it relaxes it. No manual tuning required.
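Layer 2's idea in isolation: tokens refill at a steady rate, and a quiet client can spend saved-up tokens in a burst. A minimal sketch under my own naming, not the library's internals:

```python
class TokenBucket:
    """Refill `rate` tokens per second, up to `capacity`; each request costs one."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = 0.0

    def is_allowed(self, now: float) -> bool:
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=30.0)  # ~10 req/s sustained, bursts to 30
burst = sum(bucket.is_allowed(0.0) for _ in range(40))
print(burst)  # 30 -- the burst drains the bucket; further requests wait on refill
```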

from ratelimiter import AdaptiveRateLimiter, MemoryBackend

limiter = AdaptiveRateLimiter(
    backend=MemoryBackend(),
    limit=100,            # hard ceiling: 100 req per 60s
    window=60.0,
    burst_multiplier=3,   # allow up to 300 burst when quiet
    high_load_threshold=0.8,  # tighten above 80% of base rate
    penalty=0.5,          # cut burst by 50% under high load
)

result = limiter.is_allowed("user:42")
print(result.metadata)
# {'tokens': 299.0, 'effective_burst': 300, 'layer': 'token_bucket'}

Under low traffic → user gets 300 burst tokens. Under high traffic → automatically drops to 150. Your API stays protected without you touching a config file.
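The load-sensing step itself is simple arithmetic. Conceptually — my paraphrase of the parameters above, not the library's actual source:

```python
def effective_burst(limit: int, burst_multiplier: float,
                    observed_rate: float, base_rate: float,
                    high_load_threshold: float = 0.8,
                    penalty: float = 0.5) -> int:
    """Shrink the burst allowance when observed traffic crosses the threshold."""
    burst = limit * burst_multiplier
    if observed_rate > high_load_threshold * base_rate:
        burst *= penalty  # e.g. 300 -> 150 under high load
    return int(burst)

print(effective_burst(100, 3, observed_rate=20, base_rate=100))  # quiet: 300
print(effective_burst(100, 3, observed_rate=90, base_rate=100))  # slammed: 150
```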


Using It in Your App

Install it — zero required dependencies:

pip install smart-ratelimiter

As a decorator:

from ratelimiter import TokenBucketRateLimiter, MemoryBackend, rate_limit

limiter = TokenBucketRateLimiter(MemoryBackend(), limit=10, window=1)

@rate_limit(limiter, key_func=lambda user_id, **_: f"user:{user_id}")
def send_email(user_id: int) -> None:
    ...  # max 10 emails/sec per user

As Flask middleware:

from ratelimiter.middleware import RateLimitMiddleware

app.wsgi_app = RateLimitMiddleware(app.wsgi_app, limiter=limiter)

As FastAPI middleware:

from ratelimiter.middleware import AsyncRateLimitMiddleware

app.add_middleware(AsyncRateLimitMiddleware, limiter=limiter)

Every is_allowed() call returns rate limit headers ready to attach to your HTTP response:

response.headers.update(result.headers)
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 42
# Retry-After: 12.50

Swap Backends Without Changing Your Logic

Need to share rate limit state across multiple servers? Just swap the backend:

# Development
from ratelimiter import MemoryBackend

# Production (distributed)
import redis
from ratelimiter.backends.redis_backend import RedisBackend
client = redis.Redis(host="localhost", decode_responses=True)
backend = RedisBackend(client=client)

# Same limiter code, different backend
limiter = AdaptiveRateLimiter(backend, limit=100, window=60)

Wrapping Up

Rate limiting doesn't have to be a fixed, dumb counter. With the right algorithm your API can:

  • Allow legitimate bursts without getting abused
  • Automatically tighten up under attack or heavy load
  • Relax when things are quiet so users have a good experience

GitHub: https://github.com/himanshu9209/ratelimiter
PyPI: pip install smart-ratelimiter

Would love feedback — especially if you've hit edge cases with rate limiting in your own projects. Drop a comment below!
