<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ikuko OTANI</title>
    <description>The latest articles on DEV Community by Ikuko OTANI (@ikuko-otani).</description>
    <link>https://dev.to/ikuko-otani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3997840%2F25394a73-20c1-4d08-b9a5-e52afacd31e7.png</url>
      <title>DEV Community: Ikuko OTANI</title>
      <link>https://dev.to/ikuko-otani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ikuko-otani"/>
    <language>en</language>
    <item>
      <title>Double-Entry Bookkeeping in Python: How I Built a Fintech-Grade Ledger API with FastAPI and async SQLAlchemy</title>
      <dc:creator>Ikuko OTANI</dc:creator>
      <pubDate>Sat, 27 Jun 2026 07:44:17 +0000</pubDate>
      <link>https://dev.to/ikuko-otani/double-entry-bookkeeping-in-python-how-i-built-a-fintech-grade-ledger-api-with-fastapi-and-async-2pj</link>
      <guid>https://dev.to/ikuko-otani/double-entry-bookkeeping-in-python-how-i-built-a-fintech-grade-ledger-api-with-fastapi-and-async-2pj</guid>
      <description>&lt;p&gt;Every payment system has the same class of bug waiting to happen: a network timeout triggers a retry, and the retry creates a duplicate charge. The customer pays twice and the ledger is wrong.&lt;/p&gt;

&lt;p&gt;I spent over a decade building enterprise systems — mostly in PHP and Oracle PL/SQL. Financial accounting, order management, and manufacturing systems were a recurring theme across my work.&lt;/p&gt;

&lt;p&gt;When I decided to transition into modern backend engineering, I chose a domain I already understood deeply: &lt;strong&gt;double-entry bookkeeping&lt;/strong&gt;. This let me demonstrate production-level engineering with a modern async Python stack — without getting distracted by unfamiliar business rules.&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://github.com/ikuko-otani/payment-ledger-api" rel="noopener noreferrer"&gt;&lt;strong&gt;payment-ledger-api&lt;/strong&gt;&lt;/a&gt;: a REST API that enforces the same ledger invariants and conventions that payment processors like Stripe and Mollie expose in their public APIs.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://payment-ledger-api.fly.dev/docs" rel="noopener noreferrer"&gt;Live Swagger UI demo&lt;/a&gt;&lt;/strong&gt; (hosted on Fly.io — first request may take a few seconds while the machine wakes up)&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Double-Entry Bookkeeping?
&lt;/h2&gt;

&lt;p&gt;Every financial event is recorded as &lt;strong&gt;two equal and opposite entries&lt;/strong&gt; — a debit on one account and a credit on another. The ledger invariant is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Total debits = Total credits&lt;/strong&gt; — always, for every transaction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, when a customer pays a €100 invoice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Account&lt;/th&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cash (Asset)&lt;/td&gt;
&lt;td&gt;Debit&lt;/td&gt;
&lt;td&gt;€100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revenue&lt;/td&gt;
&lt;td&gt;Credit&lt;/td&gt;
&lt;td&gt;€100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If the entries don't balance, the transaction is rejected. This invariant is the foundation of every accounting system — and the core constraint this API enforces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture at a Glance
&lt;/h2&gt;

&lt;p&gt;This API doesn't serve production traffic — but every decision in it is one I'd be ready to defend in a production design review. That framing drove every trade-off below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Python 3.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;FastAPI (async)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORM&lt;/td&gt;
&lt;td&gt;SQLAlchemy 2.0 + asyncpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;JWT (PyJWT) + bcrypt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;OpenTelemetry + Jaeger, structlog (JSON)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Prometheus + Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI&lt;/td&gt;
&lt;td&gt;GitHub Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;Fly.io&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Data Model
&lt;/h3&gt;

&lt;p&gt;Three entities model the accounting domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;accounts 1 ──── N entries N ──── 1 transactions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;accounts&lt;/strong&gt; — the chart of accounts (Asset, Liability, Equity, Revenue, Expense)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;transactions&lt;/strong&gt; — immutable headers representing a single financial event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;entries&lt;/strong&gt; — debit/credit lines; each transaction has ≥ 2 entries that must balance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Double-entry transactions with balance validation enforced at two layers — application and database&lt;/li&gt;
&lt;li&gt;Multi-currency support — hub-and-spoke USD conversion at write time with point-in-time exchange rates&lt;/li&gt;
&lt;li&gt;Idempotency-key support via Redis — safely retry &lt;code&gt;POST&lt;/code&gt; requests without creating duplicates&lt;/li&gt;
&lt;li&gt;Immutable audit log — append-only &lt;code&gt;audit_logs&lt;/code&gt; with JSONB before/after snapshots, written atomically with every mutation&lt;/li&gt;
&lt;li&gt;JWT authentication with role-based access control (&lt;code&gt;admin&lt;/code&gt; / &lt;code&gt;auditor&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;96% test coverage using &lt;a href="https://testcontainers-python.readthedocs.io/" rel="noopener noreferrer"&gt;testcontainers&lt;/a&gt; — every test runs against a real PostgreSQL instance, no mocks&lt;/li&gt;
&lt;li&gt;Distributed tracing with OpenTelemetry + Jaeger (local dev via &lt;code&gt;docker compose&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Prometheus metrics via &lt;code&gt;/metrics&lt;/code&gt; with a pre-built Grafana dashboard — tracks request counts, latency histograms, and in-progress requests per route&lt;/li&gt;
&lt;li&gt;Structured JSON logging via structlog — every request log includes &lt;code&gt;request_id&lt;/code&gt;, &lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;method&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;, &lt;code&gt;status_code&lt;/code&gt;, and &lt;code&gt;latency_ms&lt;/code&gt;; &lt;code&gt;trace_id&lt;/code&gt; ties each log line to the corresponding Jaeger span&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what a traced &lt;code&gt;POST /transactions&lt;/code&gt; request looks like in Jaeger — each child span corresponds to an individual SQL query generated by SQLAlchemy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjrzzpmm9oqt0n2y7plx5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjrzzpmm9oqt0n2y7plx5.png" alt="Jaeger trace waterfall showing a POST /transactions request" width="799" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI pipeline: lint (ruff + mypy --strict + pip-audit) and test (pytest + testcontainers) run in parallel → Docker build&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Defense in Depth: Two Layers of Balance Enforcement
&lt;/h2&gt;

&lt;p&gt;The double-entry invariant (&lt;code&gt;SUM(debits) == SUM(credits)&lt;/code&gt;) is enforced at two independent layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application layer&lt;/strong&gt; — the service validates before persisting, returning a clear 422 error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/services/transaction_service.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;debit_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Direction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEBIT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;credit_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Direction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CREDIT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;debit_sum&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;credit_sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Entries are not balanced: debit=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;debit_sum&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; credit=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;credit_sum&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Database layer&lt;/strong&gt; — a PostgreSQL constraint trigger acts as a safety net, catching any write that bypasses the service layer (direct SQL, migration scripts, admin tools):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;check_entries_balance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
    &lt;span class="n"&gt;debit_sum&lt;/span&gt;  &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;credit_sum&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'DEBIT'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'CREDIT'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;debit_sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credit_sum&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;debit_sum&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;credit_sum&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;
        &lt;span class="n"&gt;RAISE&lt;/span&gt; &lt;span class="n"&gt;EXCEPTION&lt;/span&gt; &lt;span class="s1"&gt;'entries are not balanced: debit=% credit=%'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;debit_sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credit_sum&lt;/span&gt;
            &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ERRCODE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'check_violation'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;trg_check_entries_balance&lt;/span&gt;
&lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;
&lt;span class="k"&gt;DEFERRABLE&lt;/span&gt; &lt;span class="k"&gt;INITIALLY&lt;/span&gt; &lt;span class="k"&gt;DEFERRED&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;EACH&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;check_entries_balance&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why &lt;code&gt;DEFERRABLE INITIALLY DEFERRED&lt;/code&gt;? Entries are inserted row by row within a single transaction. A normal trigger would fire after the first &lt;code&gt;INSERT&lt;/code&gt; — when only the debit side exists — and immediately reject it. Deferring to &lt;code&gt;COMMIT&lt;/code&gt; time ensures all entries are present before validating the balance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Design Decisions Worth Discussing
&lt;/h2&gt;

&lt;p&gt;The full list of Architecture Decision Records lives in the &lt;a href="https://github.com/ikuko-otani/payment-ledger-api/tree/main/docs/adr" rel="noopener noreferrer"&gt;docs/adr/&lt;/a&gt; directory. Here are the four I found most interesting to implement.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Money as Integers, Not Floats
&lt;/h3&gt;

&lt;p&gt;All monetary amounts are stored as &lt;code&gt;BIGINT&lt;/code&gt; in the currency's smallest unit — &lt;code&gt;1000&lt;/code&gt; means €10.00, not ten thousand euros. A separate &lt;code&gt;currency VARCHAR(3)&lt;/code&gt; column carries the ISO 4217 code (foreign-keyed to a currencies table).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; IEEE 754 floating-point cannot represent &lt;code&gt;0.1&lt;/code&gt; exactly. &lt;code&gt;0.1 + 0.2 = 0.30000000000000004&lt;/code&gt; is a famous example, and in a ledger that sums thousands of entries, those rounding errors accumulate into real balance discrepancies. Integer arithmetic eliminates this class of bug entirely.&lt;/p&gt;

&lt;p&gt;This is the same convention Stripe, Mollie, and Adyen use in their public APIs. If you've ever seen &lt;code&gt;"amount": 1099&lt;/code&gt; in a Stripe response, that's €10.99 represented as an integer in cents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Cryptocurrencies with 8+ decimal places need a different strategy (e.g., &lt;code&gt;NUMERIC(30,8)&lt;/code&gt;). For fiat currencies, &lt;code&gt;BIGINT&lt;/code&gt; is more than sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Redis-Backed Idempotency with Response Replay
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;POST /transactions&lt;/code&gt; accepts an &lt;code&gt;Idempotency-Key&lt;/code&gt; header (value must be a valid UUID). The key is stored in Redis with a 24-hour TTL, following a two-phase state machine inspired by Stripe's implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;New request&lt;/strong&gt; — &lt;code&gt;SET NX&lt;/code&gt; stores a SHA-256 fingerprint of the request body with &lt;code&gt;"status": "pending"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On success&lt;/strong&gt; — the pending marker is overwritten with the serialized response, so duplicate requests replay the cached response body with a &lt;code&gt;200&lt;/code&gt; status instead of returning an error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-flight duplicate&lt;/strong&gt; — if the same key and body arrive while the first request is still processing (status is still &lt;code&gt;"pending"&lt;/code&gt;), the API returns &lt;code&gt;409&lt;/code&gt; to prevent concurrent execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fingerprint mismatch&lt;/strong&gt; — if the same key is reused with a different request body, the API returns &lt;code&gt;422&lt;/code&gt; to prevent silent request substitution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On failure&lt;/strong&gt; — the key is deleted, allowing the client to retry with the same key&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the core of the state machine — a FastAPI dependency that wraps the entire request lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/dependencies/idempotency.py (simplified)
&lt;/span&gt;
&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;redis_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;pending_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fingerprint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;was_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pending_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;86_400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;was_set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fingerprint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;422&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Idempotency-Key reused with different request body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;IdempotencyContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;replay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# → 200 replay
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;409&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duplicate request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;          &lt;span class="c1"&gt;# route handler runs here
&lt;/span&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# failed → allow retry
&lt;/span&gt;    &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;yield&lt;/code&gt; is the key design choice — it turns this dependency into an async context manager. The route handler executes between &lt;code&gt;yield&lt;/code&gt; and the &lt;code&gt;except&lt;/code&gt; block, so failures automatically clean up the Redis key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One caveat on the fingerprint:&lt;/strong&gt; it hashes the raw request bytes, not a canonicalized form — so a client that retries with semantically identical but byte-different JSON (reordered keys, extra whitespace) would receive a &lt;code&gt;422&lt;/code&gt;. That's an acceptable assumption here because a well-behaved client resends the exact same serialized payload on retry; canonicalizing first (e.g. the JSON Canonicalization Scheme, RFC 8785) would relax it, at the cost of parsing and re-serializing every request body before hashing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; In payment systems, network failures trigger retries. Without idempotency, a retry could create a duplicate transaction — charging a customer twice. The response-replay mechanism goes further: instead of just blocking duplicates, it returns the exact same response the client would have received, making retries truly transparent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Redis is a hard dependency on the write path. If Redis is unavailable, &lt;code&gt;POST /transactions&lt;/code&gt; returns 500 rather than silently skipping the idempotency check — I chose correctness over availability because a skipped check could create duplicate transactions. The response body is also cached in Redis alongside the fingerprint, increasing per-key storage. This gives &lt;em&gt;effectively-once&lt;/em&gt; semantics, not true exactly-once: a crash between the DB commit and the Redis cache-write leaves the key in &lt;code&gt;"pending"&lt;/code&gt; state, causing retries to receive &lt;code&gt;409&lt;/code&gt; until the 24-hour TTL expires — closing that window would require an atomic commit across both stores.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First request → 201 Created&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://payment-ledger-api.fly.dev/api/v1/transactions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"description":"Invoice #42", ...}'&lt;/span&gt;
&lt;span class="c"&gt;# → 201 {"id": "...", "status": "POSTED", ...}&lt;/span&gt;

&lt;span class="c"&gt;# Same key, same body → 200 (replayed from cache)&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST ... &lt;span class="o"&gt;(&lt;/span&gt;same &lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# → 200 {"id": "...", "status": "POSTED", ...}  ← same response body&lt;/span&gt;

&lt;span class="c"&gt;# Same key, different body → 422 (fingerprint mismatch)&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST ... &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"description":"Invoice #99", ...}'&lt;/span&gt;
&lt;span class="c"&gt;# → 422 {"detail": "Idempotency-Key reused with different request body"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three requests, three different outcomes — all from the same key.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. JWT Claims That Eliminate Per-Request DB Lookups
&lt;/h3&gt;

&lt;p&gt;At login, the user's &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;is_active&lt;/code&gt; status are embedded directly in the JWT payload. Authenticated requests are resolved entirely from the token — no database query required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; The previous implementation fetched the full &lt;code&gt;User&lt;/code&gt; object from PostgreSQL on every authenticated request. That single &lt;code&gt;SELECT&lt;/code&gt; dominated the per-request latency budget. Since the only fields needed downstream are &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;role&lt;/code&gt;, and &lt;code&gt;is_active&lt;/code&gt; — all known at login time — embedding them in the token turns authentication into a pure in-memory decode (microseconds, not milliseconds).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Role changes and deactivations are &lt;strong&gt;not reflected in existing tokens&lt;/strong&gt; until they expire (configured to 30 minutes in this deployment). In a single-instance deployment with no concurrent admin operations, this window is acceptable. In a multi-tenant production system, I'd tighten this to 5-minute tokens with a silent refresh endpoint — or add a Redis-based token blocklist (one extra RTT per request, but exact revocation).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Hub-and-Spoke Currency Conversion
&lt;/h3&gt;

&lt;p&gt;A ledger that only handles one currency is a toy. But supporting N currencies naively — storing exchange rates for every possible pair — means maintaining N×(N−1)/2 rates. At 10 currencies, that's 45 rows to keep current.&lt;/p&gt;

&lt;p&gt;This API uses a &lt;strong&gt;hub-and-spoke model&lt;/strong&gt;: every entry stores the original amount in its native currency &lt;em&gt;and&lt;/em&gt; a &lt;code&gt;converted_amount_usd&lt;/code&gt; computed at write time. Only N exchange rates (each currency → USD) are needed, and cross-currency reporting is a single &lt;code&gt;SUM(converted_amount_usd)&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/services/transaction_service.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;BASE_CURRENCY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_convert_amount_usd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;converted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;quantize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;rounding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ROUND_HALF_UP&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;converted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details matter here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Point-in-time rates.&lt;/strong&gt; The exchange rate is looked up by transaction date, not today's date. A €100 entry posted on June 1st is always valued at the June 1st rate — even if you query it in December. Revaluing past transactions at today's rate would violate accounting immutability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ROUND_HALF_UP&lt;/code&gt;, not banker's rounding.&lt;/strong&gt; Python's built-in round() uses ROUND_HALF_EVEN (2.5 → 2, 3.5 → 4). That minimises cumulative error in statistical aggregations, but in a ledger each row is audited independently. ROUND_HALF_UP matches ISO 20022 conventions and what customers expect — ¥100 at a rate of 0.005 should be $0.01, not $0.00.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Two-hop conversions (JPY → USD → EUR for EUR reporting) accumulate two rounding operations. At MVP scale this is acceptable; a production system with heavy cross-currency reporting would add a dedicated reporting-currency column. Changing BASE_CURRENCY requires a full data migration of every converted_amount_usd value — the constant is treated as immutable once production data exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test Coverage: 96%
&lt;/h3&gt;

&lt;p&gt;Every test runs against a real PostgreSQL instance spun up by &lt;a href="https://testcontainers-python.readthedocs.io/" rel="noopener noreferrer"&gt;testcontainers&lt;/a&gt;. No mocked database, no SQLite substitution — the same engine that runs in production runs in CI. The test suite covers double-entry balance validation, idempotency-key behavior, JWT authentication, audit logging, and balance caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load Testing: 0% Error Rate
&lt;/h3&gt;

&lt;p&gt;I used &lt;a href="https://locust.io/" rel="noopener noreferrer"&gt;Locust&lt;/a&gt; to simulate authenticated clients mixing transaction writes and balance reads. The API handled &lt;strong&gt;100, 300, and 500 concurrent users with 0% error rate&lt;/strong&gt; on a single-process dev server.&lt;/p&gt;

&lt;p&gt;More interesting than the absolute numbers was the &lt;strong&gt;bottleneck-hunting process&lt;/strong&gt;. Each fix revealed a deeper issue hidden behind the previous one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Connection pool exhaustion.&lt;/strong&gt; The first 100-user test hit immediate errors (raw CSV not saved — the fix is tracked in &lt;a href="https://github.com/ikuko-otani/payment-ledger-api/blob/main/docs/tech-debt.md" rel="noopener noreferrer"&gt;TD-026&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SQLAlchemy's default &lt;code&gt;pool_size=5&lt;/code&gt; was too small for concurrent requests. After tuning the pool, the errors disappeared entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — Synchronous bcrypt on the event loop.&lt;/strong&gt; With the pool fixed, the API ran error-free but felt sluggish. Profiling revealed that &lt;code&gt;bcrypt.checkpw()&lt;/code&gt; — a CPU-intensive operation — was blocking the async event loop for the full hash duration, starving all other concurrent requests. The fix was a single line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_password&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plain_password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashed_password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_verify_password_sync&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plain_password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashed_password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;asyncio.to_thread&lt;/code&gt; offloads the blocking call to a thread pool, freeing the event loop immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — Single-process ceiling.&lt;/strong&gt; Even after both fixes, throughput at 100 concurrent users plateaued at ~2.4 req/s. The dev server runs a single Python process — one CPU core, one GIL. Scaling to 4 Uvicorn workers (with right-sized connection pools) broke through the ceiling:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix applied&lt;/th&gt;
&lt;th&gt;Req/s&lt;/th&gt;
&lt;th&gt;Error rate&lt;/th&gt;
&lt;th&gt;Revealed next bottleneck&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline (pool_size=5)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&amp;gt;0% (CSV not saved)&lt;/td&gt;
&lt;td&gt;Connection pool exhaustion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pool size tuned&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;bcrypt blocking event loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;bcrypt → asyncio.to_thread&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;Single-process ceiling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4 Uvicorn workers&lt;/td&gt;
&lt;td&gt;14.4&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;From 2.4 to 14.4 req/s — a 6× improvement.&lt;/strong&gt; But the aggregate improvement only became visible once the real bottleneck was addressed. It's a good reminder that profiling before optimizing is not optional — and that fixing one bottleneck often just reveals the next.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A note on what these numbers mean:&lt;/strong&gt; The headline p99 (~49 s at 100 users on a single worker) isn't per-request compute cost — it's queueing delay at a saturation point I deliberately pushed past. This is Little's Law in action: with one worker draining ~2.4 req/s, a backlog of 100 concurrent users inevitably stacks up. The exercise was about &lt;em&gt;finding&lt;/em&gt; each saturation point layer by layer, not about the latency number itself.&lt;/p&gt;

&lt;p&gt;Extrapolating per-worker throughput, a modest production deployment — say three 4-worker instances behind a connection pooler like PgBouncer — would reach the low-hundreds of req/s before PostgreSQL write contention on the &lt;code&gt;entries&lt;/code&gt; table becomes the next ceiling. That's the layer I'd profile next.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  CI Pipeline
&lt;/h3&gt;

&lt;p&gt;Every push triggers two parallel jobs — &lt;strong&gt;ruff&lt;/strong&gt; (lint + format) → &lt;strong&gt;mypy --strict&lt;/strong&gt; (type check) → &lt;strong&gt;pip-audit&lt;/strong&gt; (dependency vulnerability scan), and &lt;strong&gt;pytest + testcontainers&lt;/strong&gt; (integration tests with coverage). Once both pass, a final job verifies the &lt;strong&gt;Docker build&lt;/strong&gt;. The full pipeline runs in GitHub Actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned and What's Next
&lt;/h2&gt;

&lt;p&gt;The most transferable lesson was that &lt;strong&gt;correctness and performance need separate verification&lt;/strong&gt;. Every bottleneck fix in this project was individually correct — pool tuning eliminated errors, &lt;code&gt;asyncio.to_thread&lt;/code&gt; freed the event loop — but the aggregate throughput improvement only materialized once the process-count ceiling was also removed. Fixing a bug and fixing the performance of the fix are two different activities.&lt;/p&gt;

&lt;p&gt;Looking back, here's what I'd change if I were starting over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Adopt event sourcing for the ledger.&lt;/strong&gt; The current design models a &lt;code&gt;PENDING → POSTED → VOIDED&lt;/code&gt; lifecycle with a reversal endpoint for voiding. An append-only event log would go further — capturing the full history of every state transition for compliance, debugging, and replaying ledger state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include load tests in CI.&lt;/strong&gt; Locust results currently live as static snapshots. Running a baseline load test on every PR would catch performance regressions before they reach production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use short-lived tokens with silent refresh.&lt;/strong&gt; A 5-minute token lifetime with a refresh endpoint would tighten the revocation window without reintroducing DB lookups on every request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add explicit locking for concurrent balance updates — &lt;em&gt;if&lt;/em&gt; I introduced a stored balance.&lt;/strong&gt; The current design computes balances on read (&lt;code&gt;SUM&lt;/code&gt; over &lt;code&gt;entries&lt;/code&gt;), so transaction writes are INSERT-only and never contend on the same row (&lt;a href="https://github.com/ikuko-otani/payment-ledger-api/blob/main/docs/adr/002-concurrency-strategy.md" rel="noopener noreferrer"&gt;ADR-002&lt;/a&gt;). If read performance later demanded a materialized &lt;code&gt;balance&lt;/code&gt; column, &lt;em&gt;that&lt;/em&gt; design would require &lt;code&gt;SELECT FOR UPDATE&lt;/code&gt; or advisory locks to prevent lost updates — a trade-off I consciously deferred because the computed model eliminates the problem entirely at this scale.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The full source is on GitHub: &lt;strong&gt;&lt;a href="https://github.com/ikuko-otani/payment-ledger-api" rel="noopener noreferrer"&gt;ikuko-otani/payment-ledger-api&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One design decision I'd especially like to hear opinions on: &lt;strong&gt;&lt;a href="https://github.com/ikuko-otani/payment-ledger-api/blob/main/docs/adr/002-concurrency-strategy.md" rel="noopener noreferrer"&gt;ADR-002&lt;/a&gt;&lt;/strong&gt; covers why I chose &lt;em&gt;computed balance&lt;/em&gt; (no stored balance column, no row locks) over pessimistic or optimistic locking. If you've dealt with high-contention ledgers in production and made a different call — I'd genuinely like to know what drove that decision.&lt;/p&gt;

&lt;p&gt;If you work on payment systems, accounting software, or async Python — feel free to open an issue or drop a comment below.&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>postgres</category>
      <category>fintech</category>
    </item>
  </channel>
</rss>
