Ikuko OTANI

Posted on Jun 27 • Edited on Jun 30

Double-Entry Bookkeeping in Python: How I Built a Fintech-Grade Ledger API with FastAPI and async SQLAlchemy

#python #fastapi #postgres #fintech

Every payment system has the same class of bug waiting to happen: a network timeout triggers a retry, and the retry creates a duplicate charge. The customer pays twice and the ledger is wrong.

I spent over a decade building enterprise systems — mostly in PHP and Oracle PL/SQL. Financial accounting, order management, and manufacturing systems were a recurring theme across my work.

When I decided to transition into modern backend engineering, I chose a domain I already understood deeply: double-entry bookkeeping. This let me demonstrate production-level engineering with a modern async Python stack — without getting distracted by unfamiliar business rules.

The result is payment-ledger-api: a REST API that enforces the same ledger invariants and conventions that payment processors like Stripe and Mollie expose in their public APIs.

👉 Live Swagger UI demo (hosted on Fly.io — first request may take a few seconds while the machine wakes up)

What Is Double-Entry Bookkeeping?

Every financial event is recorded as two equal and opposite entries — a debit on one account and a credit on another. The ledger invariant is simple:

Total debits = Total credits — always, for every transaction.

For example, when a customer pays a €100 invoice:

Account	Direction	Amount
Cash (Asset)	Debit	€100
Revenue	Credit	€100

If the entries don't balance, the transaction is rejected. This invariant is the foundation of every accounting system — and the core constraint this API enforces.

Architecture at a Glance

This API doesn't serve production traffic — but every decision in it is one I'd be ready to defend in a production design review. That framing drove every trade-off below.

Tech Stack

Layer	Technology
Language	Python 3.12
Framework	FastAPI (async)
ORM	SQLAlchemy 2.0 + asyncpg
Database	PostgreSQL 16
Cache	Redis 7
Auth	JWT (PyJWT) + bcrypt
Observability	OpenTelemetry + Jaeger, structlog (JSON)
Metrics	Prometheus + Grafana
CI	GitHub Actions
Deploy	Fly.io

Data Model

Three entities model the accounting domain:

accounts 1 ──── N entries N ──── 1 transactions

accounts — the chart of accounts (Asset, Liability, Equity, Revenue, Expense)
transactions — immutable headers representing a single financial event
entries — debit/credit lines; each transaction has ≥ 2 entries that must balance

Key Features

Double-entry transactions with balance validation enforced at two layers — application and database
Multi-currency support — hub-and-spoke USD conversion at write time with point-in-time exchange rates
Idempotency-key support via Redis — safely retry POST requests without creating duplicates
Immutable audit log — append-only audit_logs with JSONB before/after snapshots, written atomically with every mutation
JWT authentication with role-based access control (admin / auditor)
96% test coverage using testcontainers — every test runs against a real PostgreSQL instance, no mocks
Distributed tracing with OpenTelemetry + Jaeger (local dev via docker compose)
Prometheus metrics via /metrics with a pre-built Grafana dashboard — tracks request counts, latency histograms, and in-progress requests per route
Structured JSON logging via structlog — every request log includes request_id, trace_id, method, path, status_code, and latency_ms; trace_id ties each log line to the corresponding Jaeger span

Here's what a traced POST /transactions request looks like in Jaeger — each child span corresponds to an individual SQL query generated by SQLAlchemy:

CI pipeline: lint (ruff + mypy --strict + pip-audit) and test (pytest + testcontainers) run in parallel → Docker build

Defense in Depth: Two Layers of Balance Enforcement

The double-entry invariant (SUM(debits) == SUM(credits)) is enforced at two independent layers:

Application layer — the service validates before persisting, returning a clear 422 error:

# app/services/transaction_service.py (simplified)
debit_sum = sum(e.amount for e in payload.entries if e.direction == Direction.DEBIT)
credit_sum = sum(e.amount for e in payload.entries if e.direction == Direction.CREDIT)

if debit_sum != credit_sum:
    raise ValidationError(
        detail=f"Entries are not balanced: debit={debit_sum} credit={credit_sum}"
    )

Database layer — a PostgreSQL constraint trigger acts as a safety net, catching any write that bypasses the service layer (direct SQL, migration scripts, admin tools):

CREATE OR REPLACE FUNCTION check_entries_balance()
RETURNS TRIGGER AS $$
DECLARE
    debit_sum  BIGINT;
    credit_sum BIGINT;
BEGIN
    SELECT
        COALESCE(SUM(amount) FILTER (WHERE direction = 'DEBIT'),  0),
        COALESCE(SUM(amount) FILTER (WHERE direction = 'CREDIT'), 0)
    INTO debit_sum, credit_sum
    FROM entries
    WHERE transaction_id = NEW.transaction_id;

    IF debit_sum <> credit_sum THEN
        RAISE EXCEPTION 'entries are not balanced: debit=% credit=%',
            debit_sum, credit_sum
            USING ERRCODE = 'check_violation';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER trg_check_entries_balance
AFTER INSERT ON entries
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION check_entries_balance();

Why DEFERRABLE INITIALLY DEFERRED? Entries are inserted row by row within a single transaction. A normal trigger would fire after the first INSERT — when only the debit side exists — and immediately reject it. Deferring to COMMIT time ensures all entries are present before validating the balance.

Four Design Decisions Worth Discussing

The full list of Architecture Decision Records lives in the docs/adr/ directory. Here are the four I found most interesting to implement.

1. Money as Integers, Not Floats

All monetary amounts are stored as BIGINT in the currency's smallest unit — 1000 means €10.00, not ten thousand euros. A separate currency VARCHAR(3) column carries the ISO 4217 code (foreign-keyed to a currencies table).

Why? IEEE 754 floating-point cannot represent 0.1 exactly. 0.1 + 0.2 = 0.30000000000000004 is a famous example, and in a ledger that sums thousands of entries, those rounding errors accumulate into real balance discrepancies. Integer arithmetic eliminates this class of bug entirely.

This is the same convention Stripe, Mollie, and Adyen use in their public APIs. If you've ever seen "amount": 1099 in a Stripe response, that's €10.99 represented as an integer in cents.

Trade-off: Cryptocurrencies with 8+ decimal places need a different strategy (e.g., NUMERIC(30,8)). For fiat currencies, BIGINT is more than sufficient.

2. Redis-Backed Idempotency with Response Replay

POST /transactions accepts an Idempotency-Key header (value must be a valid UUID). The key is stored in Redis with a 24-hour TTL, following a two-phase state machine inspired by Stripe's implementation:

New request — SET NX stores a SHA-256 fingerprint of the request body with "status": "pending"
On success — the pending marker is overwritten with the serialized response, so duplicate requests replay the cached response body with a 200 status instead of returning an error
In-flight duplicate — if the same key and body arrive while the first request is still processing (status is still "pending"), the API returns 409 to prevent concurrent execution
Fingerprint mismatch — if the same key is reused with a different request body, the API returns 422 to prevent silent request substitution
On failure — the key is deleted, allowing the client to retry with the same key

Here's the core of the state machine — a FastAPI dependency that wraps the entire request lifecycle:

# app/dependencies/idempotency.py (simplified)

body = await request.body()
fingerprint = hashlib.sha256(body).hexdigest()
redis_key = f"idempotency:{idempotency_key}"

pending_data = json.dumps({"fingerprint": fingerprint, "status": "pending"})
was_set = await redis.set(redis_key, pending_data, nx=True, ex=86_400)

if not was_set:
    stored = json.loads(await redis.get(redis_key))
    if stored["fingerprint"] != fingerprint:
        raise HTTPException(422, "Idempotency-Key reused with different request body")
    if "response" in stored:
        yield IdempotencyContext(replay=stored["response"])  # → 200 replay
        return
    raise HTTPException(409, "Duplicate request")

try:
    yield ctx          # route handler runs here
except Exception:
    await redis.delete(redis_key)  # failed → allow retry
    raise

The yield is the key design choice — it turns this dependency into an async context manager. The route handler executes between yield and the except block, so failures automatically clean up the Redis key.

One caveat on the fingerprint: it hashes the raw request bytes, not a canonicalized form — so a client that retries with semantically identical but byte-different JSON (reordered keys, extra whitespace) would receive a 422. That's an acceptable assumption here because a well-behaved client resends the exact same serialized payload on retry; canonicalizing first (e.g. the JSON Canonicalization Scheme, RFC 8785) would relax it, at the cost of parsing and re-serializing every request body before hashing.

Why? In payment systems, network failures trigger retries. Without idempotency, a retry could create a duplicate transaction — charging a customer twice. The response-replay mechanism goes further: instead of just blocking duplicates, it returns the exact same response the client would have received, making retries truly transparent.

Trade-off: Redis is a hard dependency on the write path. If Redis is unavailable, POST /transactions returns 500 rather than silently skipping the idempotency check — I chose correctness over availability because a skipped check could create duplicate transactions. The response body is also cached in Redis alongside the fingerprint, increasing per-key storage. This gives effectively-once semantics, not true exactly-once: a crash between the DB commit and the Redis cache-write leaves the key in "pending" state, causing retries to receive 409 until the 24-hour TTL expires — closing that window would require an atomic commit across both stores.

Here's what it looks like in practice:

# First request → 201 Created
curl -X POST https://payment-ledger-api.fly.dev/api/v1/transactions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000" \
  -H "Content-Type: application/json" \
  -d '{"description":"Invoice #42", ...}'
# → 201 {"id": "...", "status": "POSTED", ...}

# Same key, same body → 200 (replayed from cache)
curl -X POST ... (same command)
# → 200 {"id": "...", "status": "POSTED", ...}  ← same response body

# Same key, different body → 422 (fingerprint mismatch)
curl -X POST ... -d '{"description":"Invoice #99", ...}'
# → 422 {"detail": "Idempotency-Key reused with different request body"}

Three requests, three different outcomes — all from the same key.

3. JWT Claims That Eliminate Per-Request DB Lookups

At login, the user's role and is_active status are embedded directly in the JWT payload. Authenticated requests are resolved entirely from the token — no database query required.

Why? The previous implementation fetched the full User object from PostgreSQL on every authenticated request. That single SELECT dominated the per-request latency budget. Since the only fields needed downstream are id, role, and is_active — all known at login time — embedding them in the token turns authentication into a pure in-memory decode (microseconds, not milliseconds).

Trade-off: Role changes and deactivations are not reflected in existing tokens until they expire (configured to 30 minutes in this deployment). In a single-instance deployment with no concurrent admin operations, this window is acceptable. In a multi-tenant production system, I'd tighten this to 5-minute tokens with a silent refresh endpoint — or add a Redis-based token blocklist (one extra RTT per request, but exact revocation).

4. Hub-and-Spoke Currency Conversion

A ledger that only handles one currency is a toy. But supporting N currencies naively — storing exchange rates for every possible pair — means maintaining N×(N−1)/2 rates. At 10 currencies, that's 45 rows to keep current.

This API uses a hub-and-spoke model: every entry stores the original amount in its native currency and a converted_amount_usd computed at write time. Only N exchange rates (each currency → USD) are needed, and cross-currency reporting is a single SUM(converted_amount_usd).

# app/services/transaction_service.py (simplified)
BASE_CURRENCY = "USD"

def _convert_amount_usd(amount: int, rate: Decimal) -> int:
    converted = (Decimal(amount) * rate).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return int(converted)

Two details matter here:

Point-in-time rates. The exchange rate is looked up by transaction date, not today's date. A €100 entry posted on June 1st is always valued at the June 1st rate — even if you query it in December. Revaluing past transactions at today's rate would violate accounting immutability.

ROUND_HALF_UP, not banker's rounding. Python's built-in round() uses ROUND_HALF_EVEN (2.5 → 2, 3.5 → 4). That minimises cumulative error in statistical aggregations, but in a ledger each row is audited independently. ROUND_HALF_UP matches ISO 20022 conventions and what customers expect — ¥100 at a rate of 0.005 should be $0.01, not $0.00.

Trade-off: Two-hop conversions (JPY → USD → EUR for EUR reporting) accumulate two rounding operations. At MVP scale this is acceptable; a production system with heavy cross-currency reporting would add a dedicated reporting-currency column. Changing BASE_CURRENCY requires a full data migration of every converted_amount_usd value — the constant is treated as immutable once production data exists.

The Numbers

Test Coverage: 96%

Every test runs against a real PostgreSQL instance spun up by testcontainers. No mocked database, no SQLite substitution — the same engine that runs in production runs in CI. The test suite covers double-entry balance validation, idempotency-key behavior, JWT authentication, audit logging, and balance caching.

Load Testing: 0% Error Rate

I used Locust to simulate authenticated clients mixing transaction writes and balance reads. The API handled 100, 300, and 500 concurrent users with 0% error rate on a single-process dev server.

More interesting than the absolute numbers was the bottleneck-hunting process. Each fix revealed a deeper issue hidden behind the previous one:

Layer 1 — Connection pool exhaustion. The first 100-user test hit immediate errors (raw CSV not saved — the fix is tracked in TD-026):

sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached

SQLAlchemy's default pool_size=5 was too small for concurrent requests. After tuning the pool, the errors disappeared entirely.

Layer 2 — Synchronous bcrypt on the event loop. With the pool fixed, the API ran error-free but felt sluggish. Profiling revealed that bcrypt.checkpw() — a CPU-intensive operation — was blocking the async event loop for the full hash duration, starving all other concurrent requests. The fix was a single line:

async def verify_password(plain_password: str, hashed_password: str) -> bool:
    return await asyncio.to_thread(_verify_password_sync, plain_password, hashed_password)

asyncio.to_thread offloads the blocking call to a thread pool, freeing the event loop immediately.

Layer 3 — Single-process ceiling. Even after both fixes, throughput at 100 concurrent users plateaued at ~2.4 req/s. The dev server runs a single Python process — one CPU core, one GIL. Scaling to 4 Uvicorn workers (with right-sized connection pools) broke through the ceiling:

Fix applied	Req/s	Error rate	Revealed next bottleneck
Baseline (pool_size=5)	—	>0% (CSV not saved)	Connection pool exhaustion
Pool size tuned	2.4	0%	bcrypt blocking event loop
bcrypt → asyncio.to_thread	2.4	0%	Single-process ceiling
4 Uvicorn workers	14.4	0%	—

From 2.4 to 14.4 req/s — a 6× improvement. But the aggregate improvement only became visible once the real bottleneck was addressed. It's a good reminder that profiling before optimizing is not optional — and that fixing one bottleneck often just reveals the next.

A note on what these numbers mean: The headline p99 (~49 s at 100 users on a single worker) isn't per-request compute cost — it's queueing delay at a saturation point I deliberately pushed past. This is Little's Law in action: with one worker draining ~2.4 req/s, a backlog of 100 concurrent users inevitably stacks up. The exercise was about finding each saturation point layer by layer, not about the latency number itself.

Extrapolating per-worker throughput, a modest production deployment — say three 4-worker instances behind a connection pooler like PgBouncer — would reach the low-hundreds of req/s before PostgreSQL write contention on the entries table becomes the next ceiling. That's the layer I'd profile next.

CI Pipeline

Every push triggers two parallel jobs — ruff (lint + format) → mypy --strict (type check) → pip-audit (dependency vulnerability scan), and pytest + testcontainers (integration tests with coverage). Once both pass, a final job verifies the Docker build. The full pipeline runs in GitHub Actions.

What I Learned and What's Next

The most transferable lesson was that correctness and performance need separate verification. Every bottleneck fix in this project was individually correct — pool tuning eliminated errors, asyncio.to_thread freed the event loop — but the aggregate throughput improvement only materialized once the process-count ceiling was also removed. Fixing a bug and fixing the performance of the fix are two different activities.

Looking back, here's what I'd change if I were starting over:

Adopt event sourcing for the ledger. The current design models a PENDING → POSTED → VOIDED lifecycle with a reversal endpoint for voiding. An append-only event log would go further — capturing the full history of every state transition for compliance, debugging, and replaying ledger state.
Include load tests in CI. Locust results currently live as static snapshots. Running a baseline load test on every PR would catch performance regressions before they reach production.
Use short-lived tokens with silent refresh. A 5-minute token lifetime with a refresh endpoint would tighten the revocation window without reintroducing DB lookups on every request.
Add explicit locking for concurrent balance updates — if I introduced a stored balance. The current design computes balances on read (SUM over entries), so transaction writes are INSERT-only and never contend on the same row (ADR-002). If read performance later demanded a materialized balance column, that design would require SELECT FOR UPDATE or advisory locks to prevent lost updates — a trade-off I consciously deferred because the computed model eliminates the problem entirely at this scale.

The full source is on GitHub: ikuko-otani/payment-ledger-api

One design decision I'd especially like to hear opinions on: ADR-002 covers why I chose computed balance (no stored balance column, no row locks) over pessimistic or optimistic locking. If you've dealt with high-contention ledgers in production and made a different call — I'd genuinely like to know what drove that decision.

If you work on payment systems, accounting software, or async Python — feel free to open an issue or drop a comment below.

DEV Community