<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Macaulay Praise</title>
    <description>The latest articles on DEV Community by Macaulay Praise (@wolfraider).</description>
    <link>https://dev.to/wolfraider</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829266%2F3b75624f-53a7-4734-9bfb-5e06c51be579.jpg</url>
      <title>DEV Community: Macaulay Praise</title>
      <link>https://dev.to/wolfraider</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wolfraider"/>
    <language>en</language>
    <item>
      <title>Rate Limiting Wasn't Enough — So I Built an API Gateway with Behavioral Abuse Detection</title>
      <dc:creator>Macaulay Praise</dc:creator>
      <pubDate>Thu, 09 Apr 2026 16:21:33 +0000</pubDate>
      <link>https://dev.to/wolfraider/rate-limiting-wasnt-enough-so-i-built-an-api-gateway-with-behavioral-abuse-detection-24j4</link>
      <guid>https://dev.to/wolfraider/rate-limiting-wasnt-enough-so-i-built-an-api-gateway-with-behavioral-abuse-detection-24j4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Real rate limiting, Bloom filters, credential stuffing detection, and the bugs that almost broke everything. Live demo included.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/macaulaypraise/api-gateway-with-abuse-detection" rel="noopener noreferrer"&gt;macaulaypraise/api-gateway-with-abuse-detection&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://api-gateway-with-abuse-detection.onrender.com/docs" rel="noopener noreferrer"&gt;api-gateway-with-abuse-detection.onrender.com/docs&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;As someone transitioning into backend engineering, I wanted to build something that went beyond tutorials. I didn't want a CRUD app. I wanted something that would teach me how real systems defend themselves — something I could point to in an interview and say: &lt;em&gt;"I built this from scratch and I know exactly why every line exists."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That project became an &lt;strong&gt;API Gateway with Abuse Detection&lt;/strong&gt; — a FastAPI service that sits in front of upstream backends and actively detects credential stuffing, scraping bots, and known-bad actors. Here's a technical breakdown of how it works, the decisions behind it, and the real bugs that nearly cost me my sanity.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the System Does
&lt;/h2&gt;

&lt;p&gt;Every request passes through a &lt;strong&gt;six-step middleware chain&lt;/strong&gt; in this exact order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. RequestID      → UUID trace ID attached to every request
2. Auth           → JWT validation, client_id + role extracted
3. BloomFilter    → O(1) bad IP + bad user-agent check
4. RateLimit      → sliding window per authenticated client
5. AbuseDetector  → graduated response (throttle/block)
6. ShadowMode     → log would-be blocks before enforcement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each middleware depends on the one before it. If the Bloom filter flags you, the rate limiter never runs. Fail fast, fail cheap.&lt;/p&gt;
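&lt;p&gt;The fail-fast chain can be sketched as a list of check functions, where the first non-&lt;code&gt;None&lt;/code&gt; result short-circuits everything after it. This is an illustrative model, not the project's actual middleware classes:&lt;/p&gt;

```python
import uuid

# Each check returns None to pass the request along, or a response dict to
# short-circuit the chain — later (more expensive) checks never run.
def request_id(ctx):
    ctx["request_id"] = str(uuid.uuid4())

def bloom_check(ctx):
    if ctx["ip"] in ctx["bad_ips"]:          # O(1) membership test
        return {"status": 403, "reason": "known bad IP"}

def rate_limit(ctx):
    ctx["counted"] = True                    # would hit Redis in the real chain
    if ctx["request_count"] > ctx["limit"]:
        return {"status": 429, "reason": "rate limit exceeded"}

CHAIN = [request_id, bloom_check, rate_limit]

def handle(ctx):
    for check in CHAIN:
        response = check(ctx)
        if response is not None:
            return response                  # fail fast, fail cheap
    return {"status": 200}

blocked = handle({"ip": "10.0.0.1", "bad_ips": {"10.0.0.1"},
                  "request_count": 1, "limit": 100})
# the rate limiter never ran for the blocked request
```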




&lt;h2&gt;
  
  
  The Core Components (And Why Each One Exists)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sliding Window Rate Limiter
&lt;/h3&gt;

&lt;p&gt;Fixed-window rate limiting has a well-known flaw: a client can send &lt;code&gt;N&lt;/code&gt; requests at the end of window 1 and &lt;code&gt;N&lt;/code&gt; more at the start of window 2 — that's &lt;code&gt;2N&lt;/code&gt; requests in 2 seconds while technically never violating the per-window rule.&lt;/p&gt;
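&lt;p&gt;A toy in-memory counter makes the boundary burst concrete (an illustrative sketch, not code from the gateway):&lt;/p&gt;

```python
# Fixed-window counter: the count resets at every window boundary.
def make_fixed_window(limit, window):
    state = {"bucket": None, "count": 0}
    def allow(now):
        bucket = now // window               # which window this request falls in
        if state["bucket"] != bucket:
            state["bucket"], state["count"] = bucket, 0
        state["count"] += 1
        return state["count"] <= limit
    return allow

allow = make_fixed_window(limit=100, window=60)
# 100 requests at t=59.9 (end of window 1), 100 more at t=60.1 (start of window 2)
burst = [allow(59.9) for _ in range(100)] + [allow(60.1) for _ in range(100)]
print(sum(burst))  # 200 — every request allowed, 200 requests in 0.2 seconds
```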

&lt;p&gt;The &lt;strong&gt;sliding window&lt;/strong&gt; eliminates this. Every request gets timestamped and stored in a Redis sorted set. On each new request:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Delete all entries older than the window&lt;/li&gt;
&lt;li&gt;Count what remains&lt;/li&gt;
&lt;li&gt;Allow or deny&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key word is &lt;em&gt;atomic&lt;/em&gt;. If steps 1–3 aren't wrapped in a Lua script, a concurrent request can slip between the remove and the count, creating a race condition that lets clients exceed their limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Executed atomically on the Redis server&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZREMRANGEBYSCORE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZCARD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ZADD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;-- allowed&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;-- blocked&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production verification:&lt;/strong&gt; 150 parallel requests against the live Render deployment confirmed the enforcer is exact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100 × 200 OK  ← exactly the rate limit
 50 × 429     ← every request over the limit rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prometheus confirmed &lt;code&gt;rate_limit_rejections_total{client_id="demo"} 200.0&lt;/code&gt; after two parallel test runs. The &lt;code&gt;client_id&lt;/code&gt; label proves the &lt;strong&gt;JWT identity is tracked, not the IP address&lt;/strong&gt; — a crucial distinction for shared NATs and corporate networks.&lt;/p&gt;
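&lt;p&gt;The prune-count-add algorithm can be modeled in-process to see the exactly-at-the-limit behavior; this is a sketch of the algorithm, not the Redis-backed implementation:&lt;/p&gt;

```python
import bisect

class SlidingWindow:
    """In-memory model of the sorted-set algorithm: prune, count, maybe add."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.stamps = []                      # sorted request timestamps

    def allow(self, now):
        # 1. drop entries older than the window (ZREMRANGEBYSCORE)
        cutoff = now - self.window
        self.stamps = self.stamps[bisect.bisect_right(self.stamps, cutoff):]
        # 2. count what remains (ZCARD), 3. allow or deny (ZADD on allow)
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False

rl = SlidingWindow(limit=100, window=60)
results = [rl.allow(t * 0.001) for t in range(150)]   # 150-request burst
print(results.count(True), results.count(False))      # 100 50
```

&lt;p&gt;In a single process this is trivially race-free; the Lua script above exists to get the same guarantee across many worker processes sharing one Redis.&lt;/p&gt;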

&lt;h3&gt;
  
  
  2. Two-Dimensional Auth Failure Tracking
&lt;/h3&gt;

&lt;p&gt;Credential stuffing is tracked on two axes simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;By IP&lt;/strong&gt;: &lt;code&gt;failed_auth:{ip}&lt;/code&gt; — one IP failing across many accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;By username&lt;/strong&gt;: &lt;code&gt;failed_auth:{username}&lt;/code&gt; — many IPs targeting the same account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are separate Redis keys with independent TTLs, configurable via environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;AUTH_FAILURE_IP_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10       &lt;span class="c"&gt;# failures before IP soft-block&lt;/span&gt;
&lt;span class="nv"&gt;AUTH_FAILURE_USER_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20     &lt;span class="c"&gt;# failures before username soft-block&lt;/span&gt;
&lt;span class="nv"&gt;AUTH_FAILURE_WINDOW_SECONDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;300    &lt;span class="c"&gt;# counter TTL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keeping these counters independent means you can block a specific IP without penalizing every other IP targeting that same user, and flag a username as under attack without affecting unrelated clients.&lt;/p&gt;
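&lt;p&gt;The dual-key pattern boils down to two independent counters with their own TTLs. A minimal in-memory model (Redis holds the real state; class and method names here are illustrative):&lt;/p&gt;

```python
import time

class FailureTracker:
    """Two independent counters with TTLs, keyed like failed_auth:{ip} / {username}."""
    def __init__(self, ip_threshold=10, user_threshold=20, window=300):
        self.thresholds = {"ip": ip_threshold, "user": user_threshold}
        self.window = window
        self.counters = {}                    # key -> (count, expiry)

    def _bump(self, key, now):
        count, expiry = self.counters.get(key, (0, now + self.window))
        if now >= expiry:                     # TTL elapsed: counter resets
            count, expiry = 0, now + self.window
        self.counters[key] = (count + 1, expiry)
        return count + 1

    def record_failure(self, ip, username, now=None):
        now = time.monotonic() if now is None else now
        return {
            "ip_blocked": self._bump(f"failed_auth:{ip}", now) >= self.thresholds["ip"],
            "user_blocked": self._bump(f"failed_auth:{username}", now) >= self.thresholds["user"],
        }

t = FailureTracker()
for _ in range(9):
    t.record_failure("203.0.113.9", "alice", now=0.0)
print(t.record_failure("203.0.113.9", "alice", now=1.0))
# {'ip_blocked': True, 'user_blocked': False} — IP tripped, username still fine
```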

&lt;h3&gt;
  
  
  3. Scraping Detection via Request Timing Entropy
&lt;/h3&gt;

&lt;p&gt;Humans generate requests with high temporal variance. Bots generate requests with suspiciously regular inter-request timing.&lt;/p&gt;

&lt;p&gt;For each client, I maintain a sliding window of the last N timestamps in a Redis sorted set and compute the &lt;strong&gt;standard deviation of the inter-arrival gaps&lt;/strong&gt;. A standard deviation below &lt;code&gt;SCRAPING_ENTROPY_THRESHOLD&lt;/code&gt; (default &lt;code&gt;0.5&lt;/code&gt;) triggers a bot flag.&lt;/p&gt;

&lt;p&gt;The elegant part: this doesn't care about request volume. A sophisticated bot that rate-limits itself to human speeds will still be caught if it's too &lt;em&gt;regular&lt;/em&gt;. This pairs with user-agent fingerprinting (the second Bloom filter) to create a multi-signal detection approach.&lt;/p&gt;
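&lt;p&gt;The regularity check itself is tiny: compute the gaps between consecutive timestamps and compare their standard deviation to the threshold. A sketch (the window bookkeeping lives in Redis in the real system):&lt;/p&gt;

```python
import statistics

SCRAPING_ENTROPY_THRESHOLD = 0.5  # seconds; below this, timing is "too regular"

def looks_like_bot(timestamps):
    """True when inter-arrival gaps are suspiciously uniform."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return False                          # not enough signal yet
    return statistics.stdev(gaps) < SCRAPING_ENTROPY_THRESHOLD

bot   = [float(i) for i in range(10)]         # one request per second, like clockwork
human = [0.0, 1.2, 1.4, 5.0, 5.1, 9.8, 14.0, 14.2, 20.5, 21.0]
print(looks_like_bot(bot), looks_like_bot(human))  # True False
```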

&lt;h3&gt;
  
  
  4. Dual Bloom Filters
&lt;/h3&gt;

&lt;p&gt;Two in-memory Bloom filters, both synced from Redis every 60 seconds by a background worker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;known_bad_ips&lt;/code&gt; — screens every incoming IP at O(1) with no Redis round-trip&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;abusive_agents&lt;/code&gt; — user-agent fingerprinting for known scraper signatures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;BLOOM_FILTER_CAPACITY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000000  &lt;span class="c"&gt;# expected entries&lt;/span&gt;
&lt;span class="nv"&gt;BLOOM_FILTER_ERROR_RATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.001  &lt;span class="c"&gt;# 0.1% false positive rate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At a 0.1% false positive rate across 1 million IPs, the optimally sized filter needs about 14.4 bits per entry, roughly 1.8 MB of memory. The worst case is a legitimate IP being flagged — which shadow mode surfaces before enforcement is ever enabled.&lt;/p&gt;
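&lt;p&gt;The memory figure falls out of the standard Bloom filter sizing formula &lt;code&gt;m = -n·ln(p) / (ln 2)²&lt;/code&gt;, which works out to about 14.4 bits per entry at a 0.1% error rate:&lt;/p&gt;

```python
import math

def bloom_size_bytes(n, p):
    """Optimal bit-array size for n expected entries at false-positive rate p."""
    bits = -n * math.log(p) / (math.log(2) ** 2)
    return bits / 8

size = bloom_size_bytes(1_000_000, 0.001)
print(f"{size / 1e6:.2f} MB")  # 1.80 MB
```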

&lt;p&gt;&lt;strong&gt;Critical implementation detail&lt;/strong&gt;: the filter must live on &lt;code&gt;app.state.bloom&lt;/code&gt; and be shared across all requests. Per-request instantiation gives you a fresh empty filter on every call — zero enforcement, zero errors, 100% invisible failure. More on this in the bugs section.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Graduated Response System
&lt;/h3&gt;

&lt;p&gt;Three states instead of a binary allow/block:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ALLOWED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Request passes through normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;THROTTLED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Response delayed via &lt;code&gt;asyncio.sleep&lt;/code&gt;, served with &lt;code&gt;Retry-After&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SOFT_BLOCK&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediate 429 — Redis TTL, temporary, self-expiring&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This matters because going straight to hard block means a legitimate client that briefly triggered a rule is permanently punished. The graduated approach lets real users recover automatically while truly malicious clients face escalating consequences.&lt;/p&gt;
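&lt;p&gt;The three states reduce to a small decision function plus an async response path. A sketch with illustrative thresholds (the real signals come from the detectors above):&lt;/p&gt;

```python
import asyncio
from enum import Enum

class Verdict(Enum):
    ALLOWED = "allowed"
    THROTTLED = "throttled"
    SOFT_BLOCK = "soft_block"

def classify(abuse_score):
    # Illustrative thresholds — the real system scores from multiple detectors.
    if abuse_score >= 10:
        return Verdict.SOFT_BLOCK
    if abuse_score >= 5:
        return Verdict.THROTTLED
    return Verdict.ALLOWED

async def respond(verdict):
    if verdict is Verdict.SOFT_BLOCK:
        return 429, {"Retry-After": "60"}     # immediate reject; Redis TTL expires it
    if verdict is Verdict.THROTTLED:
        await asyncio.sleep(0.1)              # delay, then still serve the request
        return 200, {"Retry-After": "1"}
    return 200, {}

status, headers = asyncio.run(respond(classify(7)))
print(status, headers)  # 200 {'Retry-After': '1'}
```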

&lt;h3&gt;
  
  
  6. Shadow Mode — The Safety Net
&lt;/h3&gt;

&lt;p&gt;Shadow mode is how you deploy new detection rules without blocking real users. When a request would trigger a rule, shadow mode &lt;strong&gt;logs the event to Redis with a 24-hour TTL instead of blocking&lt;/strong&gt;. The request passes through normally.&lt;/p&gt;

&lt;p&gt;What makes this interesting is the implementation: shadow mode is a &lt;strong&gt;runtime toggle&lt;/strong&gt;, not a deploy-time config. It's controlled via a Redis key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable — observe but don't block&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nv"&gt;$BASE&lt;/span&gt;/admin/shadow-mode?enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$ADMIN_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Disable — start enforcing&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nv"&gt;$BASE&lt;/span&gt;/admin/shadow-mode?enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$ADMIN_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The middleware reads &lt;code&gt;config:shadow_mode_enabled&lt;/code&gt; from Redis on every request, falling back to the &lt;code&gt;SHADOW_MODE_ENABLED&lt;/code&gt; environment variable if the key is absent. Toggle takes effect on the next request — no redeployment, no restart.&lt;/p&gt;
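&lt;p&gt;The read-with-fallback logic is small enough to sketch against a plain dict standing in for the Redis client (key and env-var names as above):&lt;/p&gt;

```python
import os

def shadow_mode_enabled(redis_like):
    """Runtime toggle: the Redis key wins; the env var is the fallback default."""
    value = redis_like.get("config:shadow_mode_enabled")
    if value is not None:
        return value == "true"
    return os.environ.get("SHADOW_MODE_ENABLED", "false").lower() == "true"

fake_redis = {}                               # stand-in for a real Redis client
os.environ["SHADOW_MODE_ENABLED"] = "true"
print(shadow_mode_enabled(fake_redis))        # True  (env-var fallback)
fake_redis["config:shadow_mode_enabled"] = "false"
print(shadow_mode_enabled(fake_redis))        # False (Redis key overrides)
```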




&lt;h2&gt;
  
  
  Database-Backed RBAC
&lt;/h2&gt;

&lt;p&gt;The admin role system started as a simple &lt;code&gt;ADMIN_USERNAMES&lt;/code&gt; environment variable. That approach has an obvious flaw: any user who registers with that exact username bypasses all admin checks.&lt;/p&gt;

&lt;p&gt;The replacement: a &lt;code&gt;UserRole&lt;/code&gt; enum (&lt;code&gt;USER&lt;/code&gt;, &lt;code&gt;ADMIN&lt;/code&gt;) stored in the &lt;code&gt;users&lt;/code&gt; table, embedded in the JWT at login time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# JWT payload at login
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;require_admin&lt;/code&gt; dependency reads the JWT &lt;code&gt;role&lt;/code&gt; claim directly — no database query per request. To promote a user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'admin'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'target'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user logs in again, receives a JWT with &lt;code&gt;"role": "admin"&lt;/code&gt;, and admin endpoints immediately become accessible. Their previous token expires within the 30-minute access-token lifetime. No server restart required.&lt;/p&gt;
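&lt;p&gt;Stripped of the FastAPI plumbing, the dependency is a claim check on the already-verified payload. A sketch (the real version raises an &lt;code&gt;HTTPException&lt;/code&gt; after JWT signature verification):&lt;/p&gt;

```python
def require_admin(claims):
    """Gate on the role claim embedded at login — no DB query per request."""
    if claims.get("role") != "admin":
        raise PermissionError("admin role required")
    return claims["sub"]

admin_claims = {"sub": "target", "role": "admin"}
user_claims  = {"sub": "alice", "role": "user"}

print(require_admin(admin_claims))  # target
try:
    require_admin(user_claims)
except PermissionError as e:
    print(e)                        # admin role required
```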




&lt;h2&gt;
  
  
  The Bugs That Actually Hurt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bug 1: The Async Password Verification Trap
&lt;/h3&gt;

&lt;p&gt;This one was subtle and genuinely dangerous. I had refactored &lt;code&gt;verify_password&lt;/code&gt; to be an &lt;code&gt;async&lt;/code&gt; function wrapping bcrypt's blocking &lt;code&gt;checkpw&lt;/code&gt; in &lt;code&gt;asyncio.to_thread()&lt;/code&gt; — which was correct. But I forgot to &lt;code&gt;await&lt;/code&gt; it at the call site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 🚨 WRONG — coroutine object is always truthy
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;verify_password&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashed&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This branch ALWAYS executes
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ CORRECT
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;verify_password&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashed&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A coroutine object that's never awaited evaluates as truthy. Every password check passed, regardless of input. All authentication was silently bypassed. The auth endpoint returned a valid JWT for any password entered against any account.&lt;/p&gt;

&lt;p&gt;There were no exceptions, no warnings, no test failures if your tests weren't checking wrong-password rejection specifically. The fix is trivial once you find it — finding it is the hard part.&lt;/p&gt;
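&lt;p&gt;The failure mode is easy to reproduce in isolation:&lt;/p&gt;

```python
import asyncio

async def verify_password(plain, hashed):
    return False                     # always reject — in theory

coro = verify_password("wrong", "hash")
print(bool(coro))                    # True — the coroutine object itself is truthy
coro.close()                         # suppress the "never awaited" RuntimeWarning

print(asyncio.run(verify_password("wrong", "hash")))  # False — the awaited result
```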

&lt;h3&gt;
  
  
  Bug 2: Bloom Filter Instantiated Per-Request
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;block-ip&lt;/code&gt; admin route was creating a new &lt;code&gt;BloomFilterService()&lt;/code&gt; inside the route handler, adding the IP to that instance, and returning. Meanwhile, the middleware's shared in-memory filter (on &lt;code&gt;app.state.bloom&lt;/code&gt;) was never updated — until the 60-second background sync ran.&lt;/p&gt;

&lt;p&gt;The result: a hard-blocked IP could keep sending requests for up to 60 more seconds before the block took effect. The fix was making admin routes update &lt;code&gt;request.app.state.bloom&lt;/code&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 🚨 WRONG — local instance, never seen by middleware
&lt;/span&gt;&lt;span class="n"&gt;bloom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BloomFilterService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;bloom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ CORRECT — updates the shared middleware instance immediately
&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bloom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bug 3: Static Admin Username Bypassed by Registration
&lt;/h3&gt;

&lt;p&gt;The original &lt;code&gt;ADMIN_USERNAMES&lt;/code&gt; config approach had a security hole: if the env var was set to &lt;code&gt;"admin"&lt;/code&gt;, anyone could register with username &lt;code&gt;admin&lt;/code&gt; and gain admin access. Replaced entirely with the database-backed &lt;code&gt;UserRole&lt;/code&gt; enum. The setting and its associated property were deleted from &lt;code&gt;config.py&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 4: Duplicate Alembic Migration Head
&lt;/h3&gt;

&lt;p&gt;Running &lt;code&gt;make makemigration&lt;/code&gt; twice without migrating in between creates two heads in the Alembic migration graph. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;alembic merge heads &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"merge heads"&lt;/span&gt;
alembic stamp &lt;span class="nb"&gt;head
&lt;/span&gt;alembic upgrade &lt;span class="nb"&gt;head&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not a show-stopper, but something that will confuse you the first time you hit it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 5: Sequential curl Doesn't Test Rate Limiting
&lt;/h3&gt;

&lt;p&gt;This one isn't a code bug — it's a test methodology bug that looks exactly like a code bug.&lt;/p&gt;

&lt;p&gt;A rate limit of 100 requests per 60-second window means requests must arrive within the same 60-second window to count against each other. Over a network connection (each round-trip to the Render free tier took on the order of a second), 300 sequential calls take roughly 5 minutes. At any point only ~60 requests sit inside the window — well under the limit. The limiter appears broken when it's working correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This will NOT trigger rate limiting against a remote host&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 300&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="nv"&gt;$BASE&lt;/span&gt;/gateway/proxy&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# This will — all requests fire within the same window&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 150&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;$BASE&lt;/span&gt;/gateway/proxy &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &amp;amp;
&lt;span class="k"&gt;done&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;uniq&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt;
&lt;span class="c"&gt;# Output: 100 × 200, 50 × 429&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always use parallel requests when testing rate limiting against any remote deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance Numbers
&lt;/h2&gt;

&lt;p&gt;From a 60-second Locust load test, 20 concurrent users (legitimate users, credential stuffers, and scrapers running simultaneously):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;59 req/s sustained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legitimate user failure rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential stuffing detection&lt;/td&gt;
&lt;td&gt;Blocked within 10 attempts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P50 gateway latency&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 gateway latency&lt;/td&gt;
&lt;td&gt;440ms (includes throttle delay)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shadow events logged in 60s&lt;/td&gt;
&lt;td&gt;740&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The P99 spike is intentional — throttled clients hit &lt;code&gt;asyncio.sleep&lt;/code&gt;, which is where the latency comes from. Legitimate users sit at the P50 line throughout.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test Coverage
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;67 tests, 93% coverage.&lt;/strong&gt; The most important tests to get right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;test_sliding_window_blocks_boundary_spike&lt;/code&gt;&lt;/strong&gt; — send N requests at end of window 1, N at start of window 2, assert total allowed is N not 2N&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;test_concurrent_duplicate_requests&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;asyncio.gather&lt;/code&gt; firing same endpoint 5 times simultaneously, assert no race condition in the counter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;test_shadow_mode_does_not_block&lt;/code&gt;&lt;/strong&gt; — enable shadow mode, send a would-be-blocked request, assert 200 returned and shadow log has an entry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;test_credential_stuffing_detected&lt;/code&gt;&lt;/strong&gt; — fail auth 10 times from same IP, assert 11th is blocked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;test_require_admin_valid_admin&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;test_non_admin_cannot_access_admin_routes&lt;/code&gt;&lt;/strong&gt; — RBAC enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Integration tests run against real Redis and PostgreSQL via a separate &lt;code&gt;docker-compose.test.yml&lt;/code&gt;. Test isolation uses &lt;code&gt;TRUNCATE TABLE ... RESTART IDENTITY CASCADE&lt;/code&gt; per test, not &lt;code&gt;drop_all/create_all&lt;/code&gt; — same isolation, far lower overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web framework&lt;/td&gt;
&lt;td&gt;FastAPI + Uvicorn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit state&lt;/td&gt;
&lt;td&gt;Redis 7 (sorted sets + Lua scripts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP/agent filtering&lt;/td&gt;
&lt;td&gt;Bloom filter (pybloom-live)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;JWT (python-jose) + bcrypt (asyncio.to_thread)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 15 + SQLAlchemy async&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrations&lt;/td&gt;
&lt;td&gt;Alembic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;structlog (JSON output with request_id on every line)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;td&gt;pytest + pytest-asyncio + Locust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI&lt;/td&gt;
&lt;td&gt;GitHub Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Render (app) + Upstash (Redis) + Supabase (PostgreSQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Interview Talking Points Worth Owning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Why Lua scripts in Redis?"&lt;/strong&gt; — &lt;code&gt;MULTI/EXEC&lt;/code&gt; is optimistic; other clients can interleave between commands. Lua runs atomically on the Redis server. The read-increment-expire cycle cannot be observed in an intermediate state under concurrent load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"How do you handle a Redis outage?"&lt;/strong&gt; — Fail open vs. fail closed is a business decision. A bank fails closed — block everything if rate limit state is unavailable. A media site fails open — serve traffic and accept the abuse risk. Expose it as a config flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What about shared IPs and NATs?"&lt;/strong&gt; — IP alone is a weak identifier. The system layers it with JWT &lt;code&gt;client_id&lt;/code&gt;. IP rate limiting catches unauthenticated abuse; user-level limiting catches authenticated abuse. Both are needed, neither is sufficient alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"How does the Bloom filter help performance?"&lt;/strong&gt; — Without it, every request does a Redis &lt;code&gt;SISMEMBER&lt;/code&gt; call — a network round-trip. The Bloom filter checks the same list from process memory in microseconds. At 0.1% false positive rate, 1 in 1000 legitimate IPs might be flagged — which shadow mode surfaces before enforcement is enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What would you change at 10x scale?"&lt;/strong&gt; — Move to Redis Cluster to eliminate the single point of failure. Load detection rules from Redis at runtime instead of config at deploy time. Add ML anomaly detection as a second signal layer. Per-datacenter rate limiting with global sync.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;The most valuable lesson wasn't any individual component — it was &lt;strong&gt;build order&lt;/strong&gt;. The pattern that worked: environment → infrastructure → config → database models → core clients → services → API layer → workers. Never jumping a stage. A broken Redis client makes every rate limiter test confusing. A broken DB session makes every auth test unreliable.&lt;/p&gt;

&lt;p&gt;The second lesson: cross-check against your spec after you think you're done. The graduated response system, user-agent fingerprinting, and several Prometheus metrics were all missing from my "complete" implementation until I ran a systematic audit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The live demo is running at &lt;a href="https://api-gateway-with-abuse-detection.onrender.com/docs" rel="noopener noreferrer"&gt;api-gateway-with-abuse-detection.onrender.com/docs&lt;/a&gt;. Register a user, grab a JWT, hit the gateway endpoint 110 times in parallel, and watch the 429s start. Shadow stats accumulate at &lt;code&gt;/admin/shadow-stats&lt;/code&gt; if you have an admin token.&lt;/p&gt;

&lt;p&gt;Source, DESIGN.md, and load test scenarios: &lt;a href="https://github.com/macaulaypraise/api-gateway-with-abuse-detection" rel="noopener noreferrer"&gt;github.com/macaulaypraise/api-gateway-with-abuse-detection&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;python&lt;/code&gt; &lt;code&gt;fastapi&lt;/code&gt; &lt;code&gt;redis&lt;/code&gt; &lt;code&gt;security&lt;/code&gt; &lt;code&gt;webdev&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>The Dual-Write Problem: Why Your Payment API Is One Crash Away From Silent Data Loss</title>
      <dc:creator>Macaulay Praise</dc:creator>
      <pubDate>Tue, 17 Mar 2026 11:37:44 +0000</pubDate>
      <link>https://dev.to/wolfraider/the-dual-write-problem-why-your-payment-api-is-one-crash-away-from-silent-data-loss-mk7</link>
      <guid>https://dev.to/wolfraider/the-dual-write-problem-why-your-payment-api-is-one-crash-away-from-silent-data-loss-mk7</guid>
      <description>&lt;p&gt;You commit a payment to your database. Then you publish an event to Kafka so downstream services can settle it. Both succeed — until one day the process crashes in the 3 milliseconds between those two operations.&lt;/p&gt;

&lt;p&gt;The database says the payment happened. Kafka never heard about it. The settlement worker never ran. The customer was charged and nothing moved.&lt;/p&gt;

&lt;p&gt;That's the dual-write problem. This post explains why it's unsolvable with the obvious approaches, and how the Outbox pattern fixes it properly — using an implementation I built and load-tested to 1,000 concurrent users with zero duplicate charges.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Obvious Solutions Don't Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Just publish to Kafka first, then write to the DB."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same problem, reversed. The event fires but the payment row never gets written. Your downstream consumers process a payment that your database has no record of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use a transaction that wraps both."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't. A database transaction and a Kafka publish are two entirely separate systems. PostgreSQL has no knowledge of Kafka. There is no &lt;code&gt;COMMIT&lt;/code&gt; that covers both. The moment you step outside your DB transaction to call &lt;code&gt;producer.send()&lt;/code&gt;, you're in crash territory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use Two-Phase Commit (2PC)."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kafka doesn't support it. And even in systems where both sides support 2PC, you're introducing a coordinator as a single point of failure with significantly higher latency. This is why 2PC has largely been abandoned in modern distributed systems in favour of patterns like the Outbox.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Crash Window Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here's the exact sequence that fails silently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;BEGIN&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;payments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'PENDING'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt; &lt;span class="k"&gt;write&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;COMMIT&lt;/span&gt;                                       &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;                                              &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="err"&gt;💥&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="n"&gt;crashes&lt;/span&gt; &lt;span class="n"&gt;here&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'payment.initiated'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;      &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="n"&gt;never&lt;/span&gt; &lt;span class="n"&gt;reached&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4 is real. Network blips, OOM kills, deploys — any of these can fire between steps 3 and 5. The window is tiny, but at scale, something eventually lands in it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Outbox Pattern
&lt;/h2&gt;

&lt;p&gt;The fix is to stop writing to two systems. Write to one.&lt;/p&gt;

&lt;p&gt;Instead of publishing directly to Kafka, you write the event as a row in an &lt;code&gt;outbox_events&lt;/code&gt; table — &lt;strong&gt;inside the same database transaction as the payment row&lt;/strong&gt;. A separate background poller reads from that table and publishes to Kafka.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;BEGIN&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;payments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'PENDING'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'payment.initiated'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;COMMIT&lt;/span&gt;                                    &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="k"&gt;both&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt; &lt;span class="n"&gt;land&lt;/span&gt; &lt;span class="n"&gt;atomically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the Kafka publish is handled by the poller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OUTBOX&lt;/span&gt; &lt;span class="n"&gt;POLLER&lt;/span&gt;  &lt;span class="err"&gt;→&lt;/span&gt;  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox_events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
               &lt;span class="err"&gt;→&lt;/span&gt;  &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="err"&gt;→&lt;/span&gt;  &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox_events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the poller crashes after publishing but before marking the row, it simply replays on restart — Kafka receives a duplicate, which you handle with a deterministic event ID (more on this below). The payment row is never orphaned, because the event row was committed in the same database transaction as the payment.&lt;/p&gt;
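&lt;p&gt;One poller iteration is short enough to sketch in full, following the three steps above. The psycopg-style DB connection and kafka-python-style producer here are assumed interfaces, not the post's exact code:&lt;/p&gt;

```python
# Sketch of one outbox poller iteration. The connection (psycopg-style
# cursor) and producer (kafka-python-style send) are assumed interfaces.
POLL_SQL = (
    "SELECT id, topic, payload FROM outbox_events "
    "WHERE published_at IS NULL ORDER BY id LIMIT 100"
)

def poll_once(conn, producer) -> int:
    published = 0
    with conn.cursor() as cur:
        cur.execute(POLL_SQL)
        for event_id, topic, payload in cur.fetchall():
            producer.send(topic, payload)  # at-least-once: may replay on crash
            cur.execute(
                "UPDATE outbox_events SET published_at = NOW() WHERE id = %s",
                (event_id,),
            )
            published += 1
    conn.commit()
    return published
```

&lt;p&gt;A crash between &lt;code&gt;send()&lt;/code&gt; and the &lt;code&gt;UPDATE&lt;/code&gt; means the row is re-read on the next poll — a duplicate, never a loss.&lt;/p&gt;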

&lt;p&gt;The full flow in my implementation looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLIENT  →  POST /payments  +  Idempotency-Key: &amp;lt;uuid&amp;gt;
                │
                ▼
        ┌─ Redis cache check ──── HIT → return stored response (no DB touch)
        ├─ Distributed lock ───── prevents concurrent duplicate requests
        ├─ DB transaction ──────── Payment row + OutboxEvent row (atomic)
        └─ Cache response, release lock → 202 Accepted

OUTBOX POLLER  →  polls outbox_events WHERE published_at IS NULL  →  Kafka

KAFKA  →  SETTLEMENT WORKER
           ├─ PENDING → PROCESSING → SETTLED / FAILED
           ├─ Exponential backoff, max 5 retries
           └─ Dead Letter Queue on exhaustion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Handling the At-Least-Once Delivery Problem
&lt;/h2&gt;

&lt;p&gt;The outbox poller delivers at-least-once to Kafka — meaning duplicate events are possible on replay. The settlement worker handles this with deterministic UUID5 event IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NAMESPACE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same &lt;code&gt;topic:partition:offset&lt;/code&gt; always produces the same UUID. On replay, the deduplication check is a no-op — it sees the event ID already in &lt;code&gt;processed_events&lt;/code&gt; and skips it. No double processing, no complex coordination.&lt;/p&gt;
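&lt;p&gt;The consumer-side check is then trivial. A sketch, using an in-memory set where the real system uses the &lt;code&gt;processed_events&lt;/code&gt; table:&lt;/p&gt;

```python
# Sketch of the consumer-side dedup check. An in-memory set stands in
# for the processed_events table; `settle` is the actual side effect.
import uuid

processed_events = set()

def handle(topic: str, partition: int, offset: int, settle) -> bool:
    """Process a Kafka record at most once; returns False for replayed duplicates."""
    eid = uuid.uuid5(uuid.NAMESPACE_URL, f"{topic}:{partition}:{offset}")
    if eid in processed_events:
        return False  # same record replayed: deterministic ID, so a no-op
    settle()  # the state transition / settlement work
    processed_events.add(eid)
    return True
```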




&lt;h2&gt;
  
  
  Does It Actually Work?
&lt;/h2&gt;

&lt;p&gt;I ran two load test scenarios with Locust against a single Docker container:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Concurrent Users&lt;/th&gt;
&lt;th&gt;Total Requests&lt;/th&gt;
&lt;th&gt;Duplicate Charges&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normal load&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;1,378&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stress test&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;12,746&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Correctness held at 0% duplicate charges through both. The 0.4% error rate at 1,000 users was connection pool exhaustion — not an idempotency failure. Every retry with the same idempotency key returned the identical &lt;code&gt;payment_id&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Outbox Pattern Trades Off
&lt;/h2&gt;

&lt;p&gt;Nothing is free. The outbox poller introduces a small delay — typically 1–5 seconds — between a payment being committed and its event reaching Kafka. For most use cases this is acceptable. For real-time fraud scoring that needs to act on the event immediately, it isn't, and you'd need a different approach.&lt;/p&gt;

&lt;p&gt;The poller also needs to be a reliable background process. If it stops running silently, your outbox table grows and events stall. Monitoring queue depth is not optional.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;The Outbox pattern solves the dual-write problem by making the event a database record first and delegating the Kafka publish to a separate, restartable poller — you never try to make two systems commit atomically; you write to one.&lt;/p&gt;




&lt;p&gt;Full source code, DESIGN.md, and load test results: &lt;strong&gt;&lt;a href="https://github.com/macaulaypraise/idempotent-payment-processing-system.git" rel="noopener noreferrer"&gt;https://github.com/macaulaypraise/idempotent-payment-processing-system.git&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stack: Python 3.12 · FastAPI · PostgreSQL 15 · Redis 7 · Kafka · SQLAlchemy (async) · Docker Compose&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>python</category>
      <category>backend</category>
      <category>kafka</category>
    </item>
  </channel>
</rss>
