Rate limiting is one of those things every API needs, but most developers only think about it after something goes wrong.
A bot hammers your endpoint. A user accidentally fires 1000 requests in a loop. Your server slows to a crawl. Sound familiar?
I've been there. So I built smart-ratelimiter — a Python library with six rate limiting algorithms, including one that automatically adapts to your traffic. Let me walk you through what that means and why it matters.
First, What Even Is Rate Limiting?
Rate limiting means saying: "You're only allowed X requests in Y seconds."
Once the limit is hit, the server responds with HTTP 429 — "Too Many Requests" — and tells the client to slow down.
Simple concept. But the how matters a lot.
The Problem With Simple Rate Limiters
Most libraries give you a Fixed Window — divide time into buckets, count requests per bucket, reject when full.
|--- 60s window ---|--- 60s window ---|
|   100 requests   |   100 requests   |
It works. But it has a nasty flaw called the boundary burst problem.
A clever client can send 100 requests just before the window ends, then another 100 immediately after it resets — effectively making 200 requests in just a few seconds while technically "following the rules."
...98, 99, 100 | 1, 2, 3, ... 100
               ^
               window reset: attacker fires 200 requests in a few seconds
Not great.
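To make the flaw concrete, here's a toy fixed-window limiter (illustrative only, not smart-ratelimiter's implementation) that accepts 200 requests within a fraction of a second around the window boundary:

```python
class FixedWindowLimiter:
    """Toy fixed-window limiter, just to demonstrate the boundary burst flaw."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_start = 0.0
        self.count = 0

    def is_allowed(self, now: float) -> bool:
        # Reset the counter whenever a new window begins.
        if now - self.window_start >= self.window:
            self.window_start = now - (now % self.window)
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=100, window=60.0)

# 100 requests at t=59.9 (end of window 1), 100 more at t=60.1 (start of window 2):
allowed = sum(limiter.is_allowed(59.9) for _ in range(100))
allowed += sum(limiter.is_allowed(60.1) for _ in range(100))
print(allowed)  # 200 requests accepted within 0.2 seconds
```

Both bursts are "within the rules" per window, yet the server absorbed double the intended rate.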
Six Algorithms, Each With a Purpose
Different situations call for different approaches. Here's how to think about them:
| Algorithm | Best For |
|---|---|
| Fixed Window | Simple use cases, low-traffic endpoints |
| Sliding Window Log | When you need exact counts, no boundary burst |
| Sliding Window Counter | High traffic, need accuracy but low memory |
| Token Bucket | APIs that need to allow short bursts |
| Leaky Bucket | Protecting downstream services from any spike |
| Adaptive Hybrid | Multi-tenant APIs with unpredictable traffic |
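To give a feel for one of these, here's a toy token bucket (a sketch of the general technique, not the library's code): tokens refill at a steady rate, and a full bucket lets a short burst through.

```python
class TokenBucket:
    """Toy token bucket, illustrating the algorithm, not smart-ratelimiter's internals."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = 0.0

    def is_allowed(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)  # 5-request burst, 1 req/s sustained
results = [bucket.is_allowed(0.0) for _ in range(6)]
print(results)  # first 5 allowed (the burst), 6th rejected
```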
All of them work the same way in code:
result = limiter.is_allowed("user:42")
if result.allowed:
    ...  # handle the request
else:
    ...  # return 429; retry after result.retry_after seconds
The Interesting One: Adaptive Hybrid
This is the one I'm most proud of.
Most rate limiters are static — you set a limit and it never changes. But real traffic isn't static. A public API might be quiet at 3am and slammed at noon. Why apply the same strictness at both times?
The Adaptive Hybrid combines three layers:
Layer 1 — Sliding Window: prevents the boundary burst exploit we talked about earlier.
Layer 2 — Token Bucket: allows short legitimate bursts (like a user opening your app and firing a few quick requests).
Layer 3 — Load Sensor: watches your overall traffic rate. When traffic is high, it automatically tightens the burst allowance. When traffic is low, it relaxes it. No manual tuning required.
from ratelimiter import AdaptiveRateLimiter, MemoryBackend
limiter = AdaptiveRateLimiter(
    backend=MemoryBackend(),
    limit=100,                # hard ceiling: 100 req per 60s
    window=60.0,
    burst_multiplier=3,       # allow up to 300 burst when quiet
    high_load_threshold=0.8,  # tighten above 80% of base rate
    penalty=0.5,              # cut burst by 50% under high load
)
result = limiter.is_allowed("user:42")
print(result.metadata)
# {'tokens': 299.0, 'effective_burst': 300, 'layer': 'token_bucket'}
Under low traffic → user gets 300 burst tokens. Under high traffic → automatically drops to 150. Your API stays protected without you touching a config file.
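The arithmetic behind those numbers is simple; here's a hypothetical sketch of how an adaptive limiter might derive the burst ceiling from the parameters above (this is my illustration, not the library's actual internals):

```python
def effective_burst(limit: int, burst_multiplier: int, load_ratio: float,
                    high_load_threshold: float, penalty: float) -> int:
    """Hypothetical: compute the burst ceiling from current load."""
    burst = limit * burst_multiplier          # base burst allowance
    if load_ratio > high_load_threshold:      # traffic above the threshold?
        burst = int(burst * penalty)          # tighten the allowance
    return burst


# Quiet period (30% of base rate): full burst of 300.
print(effective_burst(100, 3, load_ratio=0.3, high_load_threshold=0.8, penalty=0.5))   # 300

# Heavy load (95% of base rate): burst cut in half to 150.
print(effective_burst(100, 3, load_ratio=0.95, high_load_threshold=0.8, penalty=0.5))  # 150
```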
Using It in Your App
Install it — zero required dependencies:
pip install smart-ratelimiter
As a decorator:
from ratelimiter import TokenBucketRateLimiter, MemoryBackend, rate_limit
limiter = TokenBucketRateLimiter(MemoryBackend(), limit=10, window=1)
@rate_limit(limiter, key_func=lambda user_id, **_: f"user:{user_id}")
def send_email(user_id: int) -> None:
    ...  # max 10 emails/sec per user
As Flask middleware:
from ratelimiter.middleware import RateLimitMiddleware
app.wsgi_app = RateLimitMiddleware(app.wsgi_app, limiter=limiter)
As FastAPI middleware:
from ratelimiter.middleware import AsyncRateLimitMiddleware
app.add_middleware(AsyncRateLimitMiddleware, limiter=limiter)
Every is_allowed() call returns rate limit headers ready to attach to your HTTP response:
response.headers.update(result.headers)
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 42
# Retry-After: 12.50
Swap Backends Without Changing Your Logic
Need to share rate limit state across multiple servers? Just swap the backend:
# Development
from ratelimiter import MemoryBackend
# Production (distributed)
import redis
from ratelimiter.backends.redis_backend import RedisBackend
client = redis.Redis(host="localhost", decode_responses=True)
backend = RedisBackend(client=client)
# Same limiter code, different backend
limiter = AdaptiveRateLimiter(backend, limit=100, window=60)
Wrapping Up
Rate limiting doesn't have to be a fixed, dumb counter. With the right algorithm your API can:
- Allow legitimate bursts without getting abused
- Automatically tighten up under attack or heavy load
- Relax when things are quiet so users have a good experience
GitHub: https://github.com/himanshu9209/ratelimiter
PyPI: pip install smart-ratelimiter
Would love feedback — especially if you've hit edge cases with rate limiting in your own projects. Drop a comment below!