Real rate limiting, Bloom filters, credential stuffing detection, and the bugs that almost broke everything. Live demo included.
GitHub: macaulaypraise/api-gateway-with-abuse-detection
Live demo: api-gateway-with-abuse-detection.onrender.com/docs
As someone transitioning into backend engineering, I wanted to build something that went beyond tutorials. I didn't want a CRUD app. I wanted something that would teach me how real systems defend themselves — something I could point to in an interview and say: "I built this from scratch and I know exactly why every line exists."
That project became an API Gateway with Abuse Detection — a FastAPI service that sits in front of upstream backends and actively detects credential stuffing, scraping bots, and known-bad actors. Here's a technical breakdown of how it works, the decisions behind it, and the real bugs that nearly cost me my sanity.
What the System Does
Every request passes through a six-step middleware chain in this exact order:
1. RequestID → UUID trace ID attached to every request
2. Auth → JWT validation, client_id + role extracted
3. BloomFilter → O(1) bad IP + bad user-agent check
4. RateLimit → sliding window per authenticated client
5. AbuseDetector → graduated response (throttle/block)
6. ShadowMode → log would-be blocks before enforcement
Each middleware depends on the one before it. If the Bloom filter flags you, the rate limiter never runs. Fail fast, fail cheap.
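The fail-fast hand-off can be sketched without any framework. Everything here — the stage names, the simplified request dict, the hard-coded values — is illustrative, not the project's actual API:

```python
# Framework-free sketch of the fail-fast chain. Each stage returns
# None to pass the request along, or a string naming the rejection;
# once a stage rejects, later stages never run.
def bloom_check(req):
    # Stage 3 stand-in: known-bad IP screen (a set plays the Bloom filter)
    return "bloom_blocked" if req.get("ip") in {"203.0.113.9"} else None

def rate_limit(req):
    # Stage 4 stand-in: never reached if the Bloom filter already rejected
    return "rate_limited" if req.get("count", 0) > 100 else None

CHAIN = [bloom_check, rate_limit]   # abbreviated to two of the six stages

def handle(req):
    for stage in CHAIN:
        verdict = stage(req)
        if verdict is not None:
            return verdict           # fail fast, fail cheap
    return "allowed"
```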
The Core Components (And Why Each One Exists)
1. Sliding Window Rate Limiter
Fixed-window rate limiting has a well-known flaw: a client can send N requests at the end of window 1 and N more at the start of window 2 — 2N requests arriving nearly back-to-back while technically never violating the per-window rule.
The sliding window eliminates this. Every request gets timestamped and stored in a Redis sorted set. On each new request:
- Delete all entries older than the window
- Count what remains
- Allow or deny
The key word is atomic. If steps 1–3 aren't wrapped in a Lua script, a concurrent request can slip between the remove and the count, creating a race condition that lets clients exceed their limit.
-- Executed atomically on the Redis server
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
redis.call('ZADD', key, now, now)
return 1 -- allowed
end
return 0 -- blocked
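For readers without a Redis instance handy, here is a single-threaded, pure-Python reference of the same remove → count → add sequence — the whole point of the Lua script is that Redis runs this sequence atomically server-side, which a client-side version cannot guarantee under concurrency:

```python
import time
from bisect import bisect_right, insort

class SlidingWindowLimiter:
    """Pure-Python reference for the sorted-set algorithm above.
    A sorted list plays the role of the Redis ZSET."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.hits = []  # sorted timestamps

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # ZREMRANGEBYSCORE key 0 (now - window): drop expired entries
        del self.hits[:bisect_right(self.hits, now - self.window)]
        if len(self.hits) < self.limit:     # ZCARD vs limit
            insort(self.hits, now)          # ZADD key now now
            return True
        return False
```

Note how a request at `now = 61.5` with a 60-second window evicts everything stamped before `1.5` — the window slides continuously instead of resetting at fixed boundaries.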
Production verification: 150 parallel requests against the live Render deployment confirmed the enforcer is exact:
100 × 200 OK ← exactly the rate limit
50 × 429 ← every request over the limit rejected
Prometheus confirmed rate_limit_rejections_total{client_id="demo"} 200.0 after two parallel test runs. The client_id label proves the JWT identity is tracked, not the IP address — a crucial distinction for shared NATs and corporate networks.
2. Two-Dimensional Auth Failure Tracking
Credential stuffing is tracked on two axes simultaneously:
- By IP: `failed_auth:{ip}` — one IP failing across many accounts
- By username: `failed_auth:{username}` — many IPs targeting the same account
These are separate Redis keys with independent TTLs, configurable via environment variables:
AUTH_FAILURE_IP_THRESHOLD=10 # failures before IP soft-block
AUTH_FAILURE_USER_THRESHOLD=20 # failures before username soft-block
AUTH_FAILURE_WINDOW_SECONDS=300 # counter TTL
Keeping these counters independent means you can block a specific IP without penalizing every other IP targeting that same user, and flag a username as under attack without affecting unrelated clients.
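A dict-backed sketch of the two-axis counters, with a local stand-in for Redis `INCR` + `EXPIRE`. The class name and clock injection are illustrative; the thresholds mirror the env vars above:

```python
import time

IP_THRESHOLD, USER_THRESHOLD, WINDOW = 10, 20, 300

class FailureTracker:
    def __init__(self, clock=time.monotonic):
        self.counters = {}   # key -> (count, expires_at)
        self.clock = clock

    def _bump(self, key):
        now = self.clock()
        count, expires = self.counters.get(key, (0, 0))
        if now >= expires:                     # TTL elapsed: counter resets
            count, expires = 0, now + WINDOW   # fresh window (EXPIRE)
        self.counters[key] = (count + 1, expires)
        return count + 1

    def record_failure(self, ip, username):
        # Independent keys: blocking one axis never penalizes the other
        ip_hits = self._bump(f"failed_auth:{ip}")
        user_hits = self._bump(f"failed_auth:{username}")
        return ip_hits >= IP_THRESHOLD, user_hits >= USER_THRESHOLD
```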
3. Scraping Detection via Request Timing Entropy
Humans generate requests with high temporal variance. Bots generate requests with suspiciously regular inter-request timing.
For each client, I maintain a sliding window of the last N timestamps in a Redis sorted set and compute the standard deviation of the inter-arrival gaps. A standard deviation below SCRAPING_ENTROPY_THRESHOLD (default 0.5) triggers a bot flag.
The elegant part: this doesn't care about request volume. A sophisticated bot that rate-limits itself to human speeds will still be caught if it's too regular. This pairs with user-agent fingerprinting (the second Bloom filter) to create a multi-signal detection approach.
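The core of the check fits in a few lines — standard deviation of the inter-arrival gaps, flagged when it falls below the threshold. The function name and sample data are illustrative:

```python
from statistics import pstdev

SCRAPING_ENTROPY_THRESHOLD = 0.5  # seconds, mirroring the default above

def looks_like_bot(timestamps):
    """timestamps: the last N request times for one client, ascending."""
    if len(timestamps) < 3:
        return False                          # not enough signal yet
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) < SCRAPING_ENTROPY_THRESHOLD

# A metronomic client is flagged even at a polite request rate:
bot = [i * 5.0 for i in range(10)]            # exactly every 5 seconds
human = [0, 4.1, 9.8, 11.2, 19.5, 21.0]       # irregular spacing
```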
4. Dual Bloom Filters
Two in-memory Bloom filters, both synced from Redis every 60 seconds by a background worker:
- `known_bad_ips` — screens every incoming IP at O(1) with no Redis round-trip
- `abusive_agents` — user-agent fingerprinting for known scraper signatures
Configuration:
BLOOM_FILTER_CAPACITY=1000000 # expected entries
BLOOM_FILTER_ERROR_RATE=0.001 # 0.1% false positive rate
At a 0.1% false positive rate across 1 million IPs, the filter needs roughly 14.4 bits per entry — about 1.8 MB of memory. The worst case is a legitimate IP being flagged — which shadow mode surfaces before enforcement is ever enabled.
Critical implementation detail: the filter must live on app.state.bloom and be shared across all requests. Per-request instantiation gives you a fresh empty filter on every call — zero enforcement, zero errors, 100% invisible failure. More on this in the bugs section.
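A minimal Bloom filter sketch — the real project uses pybloom-live; this version just makes the sizing math concrete using the standard formulas m = −n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions:

```python
import hashlib
from math import ceil, log

class BloomFilter:
    def __init__(self, capacity=1_000_000, error_rate=0.001):
        # Standard sizing: m bits and k hashes from capacity + error rate
        self.m = ceil(-capacity * log(error_rate) / log(2) ** 2)
        self.k = max(1, round(self.m / capacity * log(2)))
        self.bits = bytearray(ceil(self.m / 8))

    def _positions(self, item):
        # k independent hashes, derived by salting SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # No false negatives; false positives at roughly error_rate
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(item))
```

One instance of something like this on `app.state` is what the middleware checks; a per-request instance starts empty and matches nothing.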
5. Graduated Response System
Three states instead of a binary allow/block:
| State | Behavior |
|---|---|
| `ALLOWED` | Request passes through normally |
| `THROTTLED` | Response delayed via `asyncio.sleep`, served with `Retry-After` |
| `SOFT_BLOCK` | Immediate 429 — Redis TTL, temporary, self-expiring |
This matters because going straight to hard block means a legitimate client that briefly triggered a rule is permanently punished. The graduated approach lets real users recover automatically while truly malicious clients face escalating consequences.
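The three states and the escalation can be sketched as an enum plus a dispatch; the `classify` thresholds and delay values here are illustrative, not the project's actual tuning:

```python
import asyncio
from enum import Enum

class Verdict(Enum):
    ALLOWED = "allowed"
    THROTTLED = "throttled"
    SOFT_BLOCK = "soft_block"

def classify(strikes):
    # Escalate with repeated violations inside the window
    if strikes == 0:
        return Verdict.ALLOWED
    if strikes < 3:
        return Verdict.THROTTLED
    return Verdict.SOFT_BLOCK

async def apply_verdict(verdict, delay=1.0):
    if verdict is Verdict.THROTTLED:
        await asyncio.sleep(delay)          # slow down, but serve anyway
        return 200, {"Retry-After": str(int(delay))}
    if verdict is Verdict.SOFT_BLOCK:
        return 429, {"Retry-After": "60"}   # temporary, TTL-backed
    return 200, {}
```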
6. Shadow Mode — The Safety Net
Shadow mode is how you deploy new detection rules without blocking real users. When a request would trigger a rule, shadow mode logs the event to Redis with a 24-hour TTL instead of blocking. The request passes through normally.
What makes this interesting is the implementation: shadow mode is a runtime toggle, not a deploy-time config. It's controlled via a Redis key:
# Enable — observe but don't block
curl -X POST "$BASE/admin/shadow-mode?enabled=true" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
# Disable — start enforcing
curl -X POST "$BASE/admin/shadow-mode?enabled=false" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
The middleware reads config:shadow_mode_enabled from Redis on every request, falling back to the SHADOW_MODE_ENABLED environment variable if the key is absent. Toggle takes effect on the next request — no redeployment, no restart.
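The fallback logic is a few lines. Here `get` stands in for a Redis `GET` so the precedence is visible without a live server:

```python
import os

def shadow_mode_enabled(get, env=os.environ):
    """Redis key wins; the env var is only the deploy-time default."""
    value = get("config:shadow_mode_enabled")   # runtime toggle
    if value is not None:
        # Redis returns bytes by default unless decode_responses is set
        return value in (b"true", "true", b"1", "1")
    return env.get("SHADOW_MODE_ENABLED", "false").lower() == "true"
```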
Database-Backed RBAC
The admin role system started as a simple ADMIN_USERNAMES environment variable. That approach has an obvious flaw: any user who registers with that exact username bypasses all admin checks.
The replacement: a UserRole enum (USER, ADMIN) stored in the users table, embedded in the JWT at login time.
# JWT payload at login
{"sub": username, "role": user.role}
The require_admin dependency reads the JWT role claim directly — no database query per request. To promote a user:
UPDATE users SET role = 'admin' WHERE username = 'target';
The user logs in again, receives a JWT with "role": "admin", and admin endpoints immediately become accessible. Their previous token expires in 30 minutes. No server restart required.
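The dependency itself is a pure claim check — here with JWT decoding elided (the project uses python-jose for that) and a stand-in exception class instead of FastAPI's `HTTPException`:

```python
class Forbidden(Exception):
    """Stand-in for HTTPException(status_code=403)."""

def require_admin(payload):
    """payload: the decoded JWT claims, e.g. {'sub': ..., 'role': ...}.
    No DB query per request: the role was embedded at login time."""
    if payload.get("role") != "admin":
        raise Forbidden("admin role required")
    return payload["sub"]
```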
The Bugs That Actually Hurt
Bug 1: The Async Password Verification Trap
This one was subtle and genuinely dangerous. I had refactored verify_password to be an async function wrapping bcrypt's blocking checkpw in asyncio.to_thread() — which was correct. But I forgot to await it at the call site:
# 🚨 WRONG — coroutine object is always truthy
if verify_password(plain, hashed):
# This branch ALWAYS executes
...
# ✅ CORRECT
if await verify_password(plain, hashed):
...
A coroutine object that's never awaited evaluates as truthy. Every password check passed, regardless of input. All authentication was silently bypassed. The auth endpoint returned a valid JWT for any password entered against any account.
There were no exceptions and no test failures unless a test specifically asserted wrong-password rejection. The only hint CPython gives is a "coroutine ... was never awaited" RuntimeWarning emitted when the object is garbage-collected — easy to miss in server logs. The fix is trivial once you find it — finding it is the hard part.
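The trap reproduces in a few lines (the always-False `verify_password` here is a stand-in for the real bcrypt comparison):

```python
import asyncio

async def verify_password(plain, hashed):
    return False          # stands in for the real bcrypt check

coro = verify_password("wrong-password", "some-hash")
assert bool(coro) is True          # truthy without ever running
coro.close()                       # avoid the never-awaited warning

# Awaited properly, the same call rejects the password:
assert asyncio.run(verify_password("wrong-password", "some-hash")) is False
```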
Bug 2: Bloom Filter Instantiated Per-Request
The block-ip admin route was creating a new BloomFilterService() inside the route handler, adding the IP to that instance, and returning. Meanwhile, the middleware's shared in-memory filter (on app.state.bloom) was never updated — until the 60-second background sync ran.
The result: a hard-blocked IP could make 60 more requests before the block took effect. The fix was making admin routes update request.app.state.bloom directly:
# 🚨 WRONG — local instance, never seen by middleware
bloom = BloomFilterService()
bloom.add(ip)
# ✅ CORRECT — updates the shared middleware instance immediately
request.app.state.bloom.add(ip)
Bug 3: Static Admin Username Bypassed by Registration
The original ADMIN_USERNAMES config approach had a security hole: if the env var was set to "admin", anyone could register with username admin and gain admin access. Replaced entirely with the database-backed UserRole enum. The setting and its associated property were deleted from config.py.
Bug 4: Duplicate Alembic Migration Head
Running make makemigration twice without migrating in between creates two heads in the Alembic migration graph. The fix:
alembic merge heads -m "merge heads"
alembic stamp head
alembic upgrade head
Not a show-stopper, but something that will confuse you the first time you hit it.
Bug 5: Sequential curl Doesn't Test Rate Limiting
This one isn't a code bug — it's a test methodology bug that looks exactly like a code bug.
A rate limit of 100 requests per 60-second window means requests must arrive within the same 60-second window to count against each other. Over a network connection (Render free tier adds ~500ms per request), 300 sequential calls take roughly 5 minutes. At any point only ~60 requests sit inside the window — well under the limit. The limiter appears broken when it's working correctly.
# This will NOT trigger rate limiting against a remote host
for i in $(seq 1 300); do curl $BASE/gateway/proxy; done
# This will — all requests fire within the same window
for i in $(seq 1 150); do
curl -s -o /dev/null -w "%{http_code}\n" \
$BASE/gateway/proxy \
-H "Authorization: Bearer $TOKEN" &
done | sort | uniq -c
# Output: 100 × 200, 50 × 429
Always use parallel requests when testing rate limiting against any remote deployment.
Performance Numbers
From a 60-second Locust load test, 20 concurrent users (legitimate users, credential stuffers, and scrapers running simultaneously):
| Metric | Result |
|---|---|
| Throughput | 59 req/s sustained |
| Legitimate user failure rate | 0% |
| Credential stuffing detection | Blocked within 10 attempts |
| P50 gateway latency | 10ms |
| P99 gateway latency | 440ms (includes throttle delay) |
| Shadow events logged in 60s | 740 |
The P99 spike is intentional — throttled clients hit asyncio.sleep, which is where the latency comes from. Legitimate users sit at the P50 line throughout.
Test Coverage
67 tests, 93% coverage. The most important tests to get right:
- `test_sliding_window_blocks_boundary_spike` — send N requests at the end of window 1 and N at the start of window 2, assert the total allowed is N, not 2N
- `test_concurrent_duplicate_requests` — `asyncio.gather` firing the same endpoint 5 times simultaneously, assert no race condition in the counter
- `test_shadow_mode_does_not_block` — enable shadow mode, send a would-be-blocked request, assert 200 is returned and the shadow log has an entry
- `test_credential_stuffing_detected` — fail auth 10 times from the same IP, assert the 11th attempt is blocked
- `test_require_admin_valid_admin` and `test_non_admin_cannot_access_admin_routes` — RBAC enforcement
Integration tests run against real Redis and PostgreSQL via a separate docker-compose.test.yml. Test isolation uses TRUNCATE TABLE ... RESTART IDENTITY CASCADE per test, not drop_all/create_all — same isolation, far lower overhead.
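The reset is one statement per test; the table names below are illustrative, not the project's actual schema:

```python
# Builds the per-test reset. One TRUNCATE clears the rows, resets the
# identity sequences, and cascades through foreign keys — far cheaper
# than drop_all/create_all because the schema is never rebuilt.
TABLES = ["users", "shadow_events"]   # illustrative table names

def truncate_stmt(tables):
    return (f"TRUNCATE TABLE {', '.join(tables)} "
            "RESTART IDENTITY CASCADE")
```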
Production Stack
| Component | Technology |
|---|---|
| Web framework | FastAPI + Uvicorn |
| Rate limit state | Redis 7 (sorted sets + Lua scripts) |
| IP/agent filtering | Bloom filter (pybloom-live) |
| Auth | JWT (python-jose) + bcrypt (asyncio.to_thread) |
| Database | PostgreSQL 15 + SQLAlchemy async |
| Migrations | Alembic |
| Metrics | Prometheus |
| Logging | structlog (JSON output with request_id on every line) |
| Testing | pytest + pytest-asyncio + Locust |
| CI | GitHub Actions |
| Hosting | Render (app) + Upstash (Redis) + Supabase (PostgreSQL) |
Interview Talking Points Worth Owning
"Why Lua scripts in Redis?" — MULTI/EXEC is optimistic; other clients can interleave between commands. Lua runs atomically on the Redis server. The read-increment-expire cycle cannot be observed in an intermediate state under concurrent load.
"How do you handle a Redis outage?" — Fail open vs. fail closed is a business decision. A bank fails closed — block everything if rate limit state is unavailable. A media site fails open — serve traffic and accept the abuse risk. Expose it as a config flag.
"What about shared IPs and NATs?" — IP alone is a weak identifier. The system layers it with JWT client_id. IP rate limiting catches unauthenticated abuse; user-level limiting catches authenticated abuse. Both are needed, neither is sufficient alone.
"How does the Bloom filter help performance?" — Without it, every request does a Redis SISMEMBER call — a network round-trip. The Bloom filter checks the same list from process memory in microseconds. At 0.1% false positive rate, 1 in 1000 legitimate IPs might be flagged — which shadow mode surfaces before enforcement is enabled.
"What would you change at 10x scale?" — Move to Redis Cluster to eliminate the single point of failure. Load detection rules from Redis at runtime instead of config at deploy time. Add ML anomaly detection as a second signal layer. Per-datacenter rate limiting with global sync.
What I'd Do Differently
The most valuable lesson wasn't any individual component — it was build order. The pattern that worked: environment → infrastructure → config → database models → core clients → services → API layer → workers. Never jumping a stage. A broken Redis client makes every rate limiter test confusing. A broken DB session makes every auth test unreliable.
The second lesson: cross-check against your spec after you think you're done. The graduated response system, user-agent fingerprinting, and several Prometheus metrics were all missing from my "complete" implementation until I ran a systematic audit.
Try It
The live demo is running at api-gateway-with-abuse-detection.onrender.com/docs. Register a user, grab a JWT, hit the gateway endpoint 110 times in parallel, and watch the 429s start. Shadow stats accumulate at /admin/shadow-stats if you have an admin token.
Source, DESIGN.md, and load test scenarios: github.com/macaulaypraise/api-gateway-with-abuse-detection
Tags: python fastapi redis security webdev