Every payment system has the same class of bug waiting to happen: a network timeout triggers a retry, and the retry creates a duplicate charge. The customer pays twice and the ledger is wrong.
I spent over a decade building enterprise systems ā mostly in PHP and Oracle PL/SQL. Financial accounting, order management, and manufacturing systems were a recurring theme across my work.
When I decided to transition into modern backend engineering, I chose a domain I already understood deeply: double-entry bookkeeping. This let me demonstrate production-level engineering with a modern async Python stack ā without getting distracted by unfamiliar business rules.
The result is payment-ledger-api: a REST API that enforces the same ledger invariants and conventions that payment processors like Stripe and Mollie expose in their public APIs.
š Live Swagger UI demo (hosted on Fly.io ā first request may take a few seconds while the machine wakes up)
What Is Double-Entry Bookkeeping?
Every financial event is recorded as two equal and opposite entries ā a debit on one account and a credit on another. The ledger invariant is simple:
Total debits = Total credits ā always, for every transaction.
For example, when a customer pays a ā¬100 invoice:
| Account | Direction | Amount |
|---|---|---|
| Cash (Asset) | Debit | ā¬100 |
| Revenue | Credit | ā¬100 |
If the entries don't balance, the transaction is rejected. This invariant is the foundation of every accounting system ā and the core constraint this API enforces.
Architecture at a Glance
This API doesn't serve production traffic ā but every decision in it is one I'd be ready to defend in a production design review. That framing drove every trade-off below.
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.12 |
| Framework | FastAPI (async) |
| ORM | SQLAlchemy 2.0 + asyncpg |
| Database | PostgreSQL 16 |
| Cache | Redis 7 |
| Auth | JWT (PyJWT) + bcrypt |
| Observability | OpenTelemetry + Jaeger, structlog (JSON) |
| Metrics | Prometheus + Grafana |
| CI | GitHub Actions |
| Deploy | Fly.io |
Data Model
Three entities model the accounting domain:
accounts 1 āāāā N entries N āāāā 1 transactions
- accounts ā the chart of accounts (Asset, Liability, Equity, Revenue, Expense)
- transactions ā immutable headers representing a single financial event
- entries ā debit/credit lines; each transaction has ā„ 2 entries that must balance
Key Features
- Double-entry transactions with balance validation enforced at two layers ā application and database
- Multi-currency support ā hub-and-spoke USD conversion at write time with point-in-time exchange rates
- Idempotency-key support via Redis ā safely retry
POSTrequests without creating duplicates - Immutable audit log ā append-only
audit_logswith JSONB before/after snapshots, written atomically with every mutation - JWT authentication with role-based access control (
admin/auditor) - 96% test coverage using testcontainers ā every test runs against a real PostgreSQL instance, no mocks
- Distributed tracing with OpenTelemetry + Jaeger (local dev via
docker compose) - Prometheus metrics via
/metricswith a pre-built Grafana dashboard ā tracks request counts, latency histograms, and in-progress requests per route - Structured JSON logging via structlog ā every request log includes
request_id,trace_id,method,path,status_code, andlatency_ms;trace_idties each log line to the corresponding Jaeger span
Here's what a traced POST /transactions request looks like in Jaeger ā each child span corresponds to an individual SQL query generated by SQLAlchemy:
- CI pipeline: lint (ruff + mypy --strict + pip-audit) and test (pytest + testcontainers) run in parallel ā Docker build
Defense in Depth: Two Layers of Balance Enforcement
The double-entry invariant (SUM(debits) == SUM(credits)) is enforced at two independent layers:
Application layer ā the service validates before persisting, returning a clear 422 error:
# app/services/transaction_service.py (simplified)
debit_sum = sum(e.amount for e in payload.entries if e.direction == Direction.DEBIT)
credit_sum = sum(e.amount for e in payload.entries if e.direction == Direction.CREDIT)
if debit_sum != credit_sum:
raise ValidationError(
detail=f"Entries are not balanced: debit={debit_sum} credit={credit_sum}"
)
Database layer ā a PostgreSQL constraint trigger acts as a safety net, catching any write that bypasses the service layer (direct SQL, migration scripts, admin tools):
CREATE OR REPLACE FUNCTION check_entries_balance()
RETURNS TRIGGER AS $$
DECLARE
debit_sum BIGINT;
credit_sum BIGINT;
BEGIN
SELECT
COALESCE(SUM(amount) FILTER (WHERE direction = 'DEBIT'), 0),
COALESCE(SUM(amount) FILTER (WHERE direction = 'CREDIT'), 0)
INTO debit_sum, credit_sum
FROM entries
WHERE transaction_id = NEW.transaction_id;
IF debit_sum <> credit_sum THEN
RAISE EXCEPTION 'entries are not balanced: debit=% credit=%',
debit_sum, credit_sum
USING ERRCODE = 'check_violation';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER trg_check_entries_balance
AFTER INSERT ON entries
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION check_entries_balance();
Why DEFERRABLE INITIALLY DEFERRED? Entries are inserted row by row within a single transaction. A normal trigger would fire after the first INSERT ā when only the debit side exists ā and immediately reject it. Deferring to COMMIT time ensures all entries are present before validating the balance.
Four Design Decisions Worth Discussing
The full list of Architecture Decision Records lives in the docs/adr/ directory. Here are the four I found most interesting to implement.
1. Money as Integers, Not Floats
All monetary amounts are stored as BIGINT in the currency's smallest unit ā 1000 means ā¬10.00, not ten thousand euros. A separate currency VARCHAR(3) column carries the ISO 4217 code (foreign-keyed to a currencies table).
Why? IEEE 754 floating-point cannot represent 0.1 exactly. 0.1 + 0.2 = 0.30000000000000004 is a famous example, and in a ledger that sums thousands of entries, those rounding errors accumulate into real balance discrepancies. Integer arithmetic eliminates this class of bug entirely.
This is the same convention Stripe, Mollie, and Adyen use in their public APIs. If you've ever seen "amount": 1099 in a Stripe response, that's ā¬10.99 represented as an integer in cents.
Trade-off: Cryptocurrencies with 8+ decimal places need a different strategy (e.g., NUMERIC(30,8)). For fiat currencies, BIGINT is more than sufficient.
2. Redis-Backed Idempotency with Response Replay
POST /transactions accepts an Idempotency-Key header (value must be a valid UUID). The key is stored in Redis with a 24-hour TTL, following a two-phase state machine inspired by Stripe's implementation:
-
New request ā
SET NXstores a SHA-256 fingerprint of the request body with"status": "pending" -
On success ā the pending marker is overwritten with the serialized response, so duplicate requests replay the cached response body with a
200status instead of returning an error -
In-flight duplicate ā if the same key and body arrive while the first request is still processing (status is still
"pending"), the API returns409to prevent concurrent execution -
Fingerprint mismatch ā if the same key is reused with a different request body, the API returns
422to prevent silent request substitution - On failure ā the key is deleted, allowing the client to retry with the same key
Here's the core of the state machine ā a FastAPI dependency that wraps the entire request lifecycle:
# app/dependencies/idempotency.py (simplified)
body = await request.body()
fingerprint = hashlib.sha256(body).hexdigest()
redis_key = f"idempotency:{idempotency_key}"
pending_data = json.dumps({"fingerprint": fingerprint, "status": "pending"})
was_set = await redis.set(redis_key, pending_data, nx=True, ex=86_400)
if not was_set:
stored = json.loads(await redis.get(redis_key))
if stored["fingerprint"] != fingerprint:
raise HTTPException(422, "Idempotency-Key reused with different request body")
if "response" in stored:
yield IdempotencyContext(replay=stored["response"]) # ā 200 replay
return
raise HTTPException(409, "Duplicate request")
try:
yield ctx # route handler runs here
except Exception:
await redis.delete(redis_key) # failed ā allow retry
raise
The yield is the key design choice ā it turns this dependency into an async context manager. The route handler executes between yield and the except block, so failures automatically clean up the Redis key.
One caveat on the fingerprint: it hashes the raw request bytes, not a canonicalized form ā so a client that retries with semantically identical but byte-different JSON (reordered keys, extra whitespace) would receive a 422. That's an acceptable assumption here because a well-behaved client resends the exact same serialized payload on retry; canonicalizing first (e.g. the JSON Canonicalization Scheme, RFC 8785) would relax it, at the cost of parsing and re-serializing every request body before hashing.
Why? In payment systems, network failures trigger retries. Without idempotency, a retry could create a duplicate transaction ā charging a customer twice. The response-replay mechanism goes further: instead of just blocking duplicates, it returns the exact same response the client would have received, making retries truly transparent.
Trade-off: Redis is a hard dependency on the write path. If Redis is unavailable, POST /transactions returns 500 rather than silently skipping the idempotency check ā I chose correctness over availability because a skipped check could create duplicate transactions. The response body is also cached in Redis alongside the fingerprint, increasing per-key storage. This gives effectively-once semantics, not true exactly-once: a crash between the DB commit and the Redis cache-write leaves the key in "pending" state, causing retries to receive 409 until the 24-hour TTL expires ā closing that window would require an atomic commit across both stores.
Here's what it looks like in practice:
# First request ā 201 Created
curl -X POST https://payment-ledger-api.fly.dev/api/v1/transactions \
-H "Authorization: Bearer $TOKEN" \
-H "Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000" \
-H "Content-Type: application/json" \
-d '{"description":"Invoice #42", ...}'
# ā 201 {"id": "...", "status": "POSTED", ...}
# Same key, same body ā 200 (replayed from cache)
curl -X POST ... (same command)
# ā 200 {"id": "...", "status": "POSTED", ...} ā same response body
# Same key, different body ā 422 (fingerprint mismatch)
curl -X POST ... -d '{"description":"Invoice #99", ...}'
# ā 422 {"detail": "Idempotency-Key reused with different request body"}
Three requests, three different outcomes ā all from the same key.
3. JWT Claims That Eliminate Per-Request DB Lookups
At login, the user's role and is_active status are embedded directly in the JWT payload. Authenticated requests are resolved entirely from the token ā no database query required.
Why? The previous implementation fetched the full User object from PostgreSQL on every authenticated request. That single SELECT dominated the per-request latency budget. Since the only fields needed downstream are id, role, and is_active ā all known at login time ā embedding them in the token turns authentication into a pure in-memory decode (microseconds, not milliseconds).
Trade-off: Role changes and deactivations are not reflected in existing tokens until they expire (configured to 30 minutes in this deployment). In a single-instance deployment with no concurrent admin operations, this window is acceptable. In a multi-tenant production system, I'd tighten this to 5-minute tokens with a silent refresh endpoint ā or add a Redis-based token blocklist (one extra RTT per request, but exact revocation).
4. Hub-and-Spoke Currency Conversion
A ledger that only handles one currency is a toy. But supporting N currencies naively ā storing exchange rates for every possible pair ā means maintaining NĆ(Nā1)/2 rates. At 10 currencies, that's 45 rows to keep current.
This API uses a hub-and-spoke model: every entry stores the original amount in its native currency and a converted_amount_usd computed at write time. Only N exchange rates (each currency ā USD) are needed, and cross-currency reporting is a single SUM(converted_amount_usd).
# app/services/transaction_service.py (simplified)
BASE_CURRENCY = "USD"
def _convert_amount_usd(amount: int, rate: Decimal) -> int:
converted = (Decimal(amount) * rate).quantize(
Decimal("1"), rounding=ROUND_HALF_UP
)
return int(converted)
Two details matter here:
Point-in-time rates. The exchange rate is looked up by transaction date, not today's date. A ā¬100 entry posted on June 1st is always valued at the June 1st rate ā even if you query it in December. Revaluing past transactions at today's rate would violate accounting immutability.
ROUND_HALF_UP, not banker's rounding. Python's built-in round() uses ROUND_HALF_EVEN (2.5 ā 2, 3.5 ā 4). That minimises cumulative error in statistical aggregations, but in a ledger each row is audited independently. ROUND_HALF_UP matches ISO 20022 conventions and what customers expect ā Ā„100 at a rate of 0.005 should be $0.01, not $0.00.
Trade-off: Two-hop conversions (JPY ā USD ā EUR for EUR reporting) accumulate two rounding operations. At MVP scale this is acceptable; a production system with heavy cross-currency reporting would add a dedicated reporting-currency column. Changing BASE_CURRENCY requires a full data migration of every converted_amount_usd value ā the constant is treated as immutable once production data exists.
The Numbers
Test Coverage: 96%
Every test runs against a real PostgreSQL instance spun up by testcontainers. No mocked database, no SQLite substitution ā the same engine that runs in production runs in CI. The test suite covers double-entry balance validation, idempotency-key behavior, JWT authentication, audit logging, and balance caching.
Load Testing: 0% Error Rate
I used Locust to simulate authenticated clients mixing transaction writes and balance reads. The API handled 100, 300, and 500 concurrent users with 0% error rate on a single-process dev server.
More interesting than the absolute numbers was the bottleneck-hunting process. Each fix revealed a deeper issue hidden behind the previous one:
Layer 1 ā Connection pool exhaustion. The first 100-user test hit immediate errors (raw CSV not saved ā the fix is tracked in TD-026):
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached
SQLAlchemy's default pool_size=5 was too small for concurrent requests. After tuning the pool, the errors disappeared entirely.
Layer 2 ā Synchronous bcrypt on the event loop. With the pool fixed, the API ran error-free but felt sluggish. Profiling revealed that bcrypt.checkpw() ā a CPU-intensive operation ā was blocking the async event loop for the full hash duration, starving all other concurrent requests. The fix was a single line:
async def verify_password(plain_password: str, hashed_password: str) -> bool:
return await asyncio.to_thread(_verify_password_sync, plain_password, hashed_password)
asyncio.to_thread offloads the blocking call to a thread pool, freeing the event loop immediately.
Layer 3 ā Single-process ceiling. Even after both fixes, throughput at 100 concurrent users plateaued at ~2.4 req/s. The dev server runs a single Python process ā one CPU core, one GIL. Scaling to 4 Uvicorn workers (with right-sized connection pools) broke through the ceiling:
| Fix applied | Req/s | Error rate | Revealed next bottleneck |
|---|---|---|---|
| Baseline (pool_size=5) | ā | >0% (CSV not saved) | Connection pool exhaustion |
| Pool size tuned | 2.4 | 0% | bcrypt blocking event loop |
| bcrypt ā asyncio.to_thread | 2.4 | 0% | Single-process ceiling |
| 4 Uvicorn workers | 14.4 | 0% | ā |
From 2.4 to 14.4 req/s ā a 6Ć improvement. But the aggregate improvement only became visible once the real bottleneck was addressed. It's a good reminder that profiling before optimizing is not optional ā and that fixing one bottleneck often just reveals the next.
A note on what these numbers mean: The headline p99 (~49 s at 100 users on a single worker) isn't per-request compute cost ā it's queueing delay at a saturation point I deliberately pushed past. This is Little's Law in action: with one worker draining ~2.4 req/s, a backlog of 100 concurrent users inevitably stacks up. The exercise was about finding each saturation point layer by layer, not about the latency number itself.
Extrapolating per-worker throughput, a modest production deployment ā say three 4-worker instances behind a connection pooler like PgBouncer ā would reach the low-hundreds of req/s before PostgreSQL write contention on the
entriestable becomes the next ceiling. That's the layer I'd profile next.
CI Pipeline
Every push triggers two parallel jobs ā ruff (lint + format) ā mypy --strict (type check) ā pip-audit (dependency vulnerability scan), and pytest + testcontainers (integration tests with coverage). Once both pass, a final job verifies the Docker build. The full pipeline runs in GitHub Actions.
What I Learned and What's Next
The most transferable lesson was that correctness and performance need separate verification. Every bottleneck fix in this project was individually correct ā pool tuning eliminated errors, asyncio.to_thread freed the event loop ā but the aggregate throughput improvement only materialized once the process-count ceiling was also removed. Fixing a bug and fixing the performance of the fix are two different activities.
Looking back, here's what I'd change if I were starting over:
-
Adopt event sourcing for the ledger. The current design models a
PENDING ā POSTED ā VOIDEDlifecycle with a reversal endpoint for voiding. An append-only event log would go further ā capturing the full history of every state transition for compliance, debugging, and replaying ledger state. - Include load tests in CI. Locust results currently live as static snapshots. Running a baseline load test on every PR would catch performance regressions before they reach production.
- Use short-lived tokens with silent refresh. A 5-minute token lifetime with a refresh endpoint would tighten the revocation window without reintroducing DB lookups on every request.
-
Add explicit locking for concurrent balance updates ā if I introduced a stored balance. The current design computes balances on read (
SUMoverentries), so transaction writes are INSERT-only and never contend on the same row (ADR-002). If read performance later demanded a materializedbalancecolumn, that design would requireSELECT FOR UPDATEor advisory locks to prevent lost updates ā a trade-off I consciously deferred because the computed model eliminates the problem entirely at this scale.
The full source is on GitHub: ikuko-otani/payment-ledger-api
One design decision I'd especially like to hear opinions on: ADR-002 covers why I chose computed balance (no stored balance column, no row locks) over pessimistic or optimistic locking. If you've dealt with high-contention ledgers in production and made a different call ā I'd genuinely like to know what drove that decision.
If you work on payment systems, accounting software, or async Python ā feel free to open an issue or drop a comment below.

Top comments (0)