The Bug That Doesn't Show Up in Testing
When I set out to build a production-grade wallet system, I expected the hard parts to be the database schema, the transaction logic, or the authentication layer.
What I didn't expect was that the most dangerous bug in the system would have nothing to do with wrong code - it would be about timing.
This is the story of a race condition that lets users spend money they don't have, why it's invisible until it hits production, and how Redis turned it into a non-issue.
What a Wallet System Actually Does
Before we get to the bug, a quick picture of the system. A wallet holds a balance. Users can:
Credit - add funds
Debit - spend funds
Transfer - move funds to another user
Reverse - undo a transaction
Every operation reads the current balance, validates it, and writes a new transaction record. Simple enough. The problem hides in that gap between reading and writing.
The Race Condition
Here's the scenario. A user has ₦5,000 in their wallet. They're on a slow connection, so they tap "Pay" twice. Two HTTP requests hit the server within milliseconds of each other. Both are trying to debit ₦3,000.
Without any concurrency protection, this is what happens:
Request A → reads balance: ₦5,000
Request B → reads balance: ₦5,000
Request A → ₦5,000 - ₦3,000 = ₦2,000 ✓ writes transaction
Request B → ₦5,000 - ₦3,000 = ₦2,000 ✓ writes transaction
Both requests read ₦5,000. Both pass the balance check. Both write a successful debit. The user just spent ₦6,000 from a ₦5,000 wallet - and the system called both transactions successful.
This isn't a theoretical edge case. Under real concurrent load, it will eventually happen. And the worst part: it almost never shows up in testing, because a typical test suite sends requests one at a time rather than firing two real HTTP requests in the same millisecond.
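To make the interleaving concrete, here is a minimal sketch that reproduces it without a server or a database. The in-memory `Wallet` type and the `unsafeDebit` and `runDoubleDebit` helpers are hypothetical names used only for illustration, and the `await Promise.resolve()` line stands in for a real database round trip.

```typescript
// Hypothetical in-memory wallet; it stands in for the database row.
type Wallet = { balance: number };

// Unsafe debit: read, validate, yield (as real I/O would), then write.
const unsafeDebit = async (wallet: Wallet, amount: number): Promise<boolean> => {
  const balance = wallet.balance; // read
  if (balance < amount) return false; // validate against the value just read
  await Promise.resolve(); // yield point, standing in for a database round trip
  wallet.balance = balance - amount; // write based on a possibly stale read
  return true;
};

// Fire two debits concurrently, the way a double tap on "Pay" would.
const runDoubleDebit = async (): Promise<{ results: boolean[]; finalBalance: number }> => {
  const wallet: Wallet = { balance: 5000 };
  const results = await Promise.all([
    unsafeDebit(wallet, 3000),
    unsafeDebit(wallet, 3000),
  ]);
  return { results, finalBalance: wallet.balance };
};
```

In this sketch both debits report success and the wallet ends at 2000, which is exactly the outcome in the timeline above: ₦6,000 spent from a ₦5,000 balance.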
Why a Database Transaction Doesn't Save You
The instinct here is to reach for a database transaction. Wrap the whole thing in BEGIN / COMMIT and let Postgres sort it out.
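The problem is that, in this design, the race happens before the database transaction begins. Both requests read the balance from the database (or worse, from a cache), both pass the validation check in application code, and then both open their database transactions. The damage is already done. Moving the read inside the transaction doesn't help on its own either: at Postgres's default READ COMMITTED isolation level, a plain SELECT takes no row lock, so two transactions can still read the same ₦5,000.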
You need a lock at the application layer, before the read even happens.
The Fix: Distributed Locking with Redis
The solution is to ensure that no two operations on the same wallet can run at the same time. Before any balance operation begins, it must claim exclusive access. If another operation already holds the claim, the newcomer retries until the lock is free, or gives up after a timeout.
This is a distributed lock, and Redis is the right tool for it. Here's the implementation:
// src/utils/lock.utils.ts
import { randomUUID } from "crypto";
import redisClient from "../services/redis.service";

const LOCK_TTL_MS = 10_000;
const LOCK_RETRY_COUNT = 50;
const LOCK_RETRY_DELAY_MS = 100;

// Delete the key only if it still holds our token (compare-and-delete).
const RELEASE_SCRIPT = `
  if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
  else
    return 0
  end
`;

// Try once to take the lock; returns the owner token on success, null if it is held.
const acquireLock = async (key: string): Promise<string | null> => {
  const token = randomUUID();
  const result = await redisClient.set(key, token, "PX", LOCK_TTL_MS, "NX");
  return result === "OK" ? token : null;
};

export const withLock = async <T>(userId: string, fn: () => Promise<T>): Promise<T> => {
  const key = `wallet:lock:${userId}`;
  let token: string | null = null;
  // One initial attempt plus up to LOCK_RETRY_COUNT retries.
  for (let attempt = 0; attempt <= LOCK_RETRY_COUNT; attempt++) {
    token = await acquireLock(key);
    if (token) break;
    await new Promise((resolve) => setTimeout(resolve, LOCK_RETRY_DELAY_MS));
  }
  if (!token) throw new Error("Could not acquire lock: resource is busy");
  try {
    return await fn();
  } finally {
    // Release only if we still own the lock.
    await redisClient.eval(RELEASE_SCRIPT, 1, key, token);
  }
};
Let's break down each decision.
SET key token PX 10000 NX
This is a single atomic Redis command that does three things at once:
NX — only set the key if it does not already exist
PX 10000 — expire the key after 10 seconds
The value is a unique token (UUID), not just 1 or "locked"
The atomicity of SET NX is the core guarantee. There is no gap between "check if key exists" and "set the key" — it's one operation. Two requests racing to acquire the same lock cannot both succeed.
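If it helps to see those semantics outside Redis, here is a small in-memory sketch of what SET with NX and PX does for a single key. `FakeRedis` and `setNxPx` are hypothetical names, not part of ioredis, and the clock is passed in as a plain number so the behavior is easy to follow.

```typescript
// Minimal in-memory stand-in for the one Redis command used here.
// FakeRedis and setNxPx are hypothetical names, not part of ioredis.
type StoredValue = { value: string; expiresAt: number };

class FakeRedis {
  private store = new Map<string, StoredValue>();

  // Mirrors SET key value PX ttl NX: "OK" only if the key is absent or expired.
  setNxPx(key: string, value: string, ttlMs: number, now: number): "OK" | null {
    const existing = this.store.get(key);
    if (existing && existing.expiresAt > now) return null; // key is held, NX refuses
    this.store.set(key, { value, expiresAt: now + ttlMs });
    return "OK";
  }
}

const fakeRedis = new FakeRedis();
// Two requests race for the same wallet lock one millisecond apart.
const firstAttempt = fakeRedis.setNxPx("wallet:lock:user-1", "token-a", 10_000, 0);
const secondAttempt = fakeRedis.setNxPx("wallet:lock:user-1", "token-b", 10_000, 1);
// After the 10-second TTL has passed, the key is free again.
const afterExpiry = fakeRedis.setNxPx("wallet:lock:user-1", "token-c", 10_000, 10_001);
```

Here `firstAttempt` is "OK", `secondAttempt` is null, and `afterExpiry` is "OK" again. The sketch gets its atomicity for free because the method body runs synchronously. In Redis, the same guarantee comes from SET being a single command.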
Retry logic
Rather than immediately rejecting a request when the lock is held, the implementation makes one initial attempt and then retries up to 50 times, sleeping 100ms after each failed attempt. That gives a request roughly 5 seconds to acquire the lock before giving up. This handles brief overlaps gracefully without dropping legitimate requests.
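The retry budget is easier to reason about when the loop is pulled out on its own. The sketch below is a hypothetical helper (`acquireWithRetry` is not in the original code) that takes the sleep function as a parameter, so the total wait can be measured without real delays. It also makes one small change to the loop above, which I'm assuming is harmless: it skips the sleep after the final failed attempt.

```typescript
// Hypothetical helper: retries tryAcquire, sleeping between attempts.
// The sleep function is injected so timing can be observed without real delays.
const acquireWithRetry = async (
  tryAcquire: () => Promise<string | null>,
  retryCount: number,
  retryDelayMs: number,
  sleep: (ms: number) => Promise<void>,
): Promise<string | null> => {
  for (let attempt = 0; attempt <= retryCount; attempt++) {
    const token = await tryAcquire();
    if (token) return token;
    // No point sleeping after the last attempt has already failed.
    if (attempt < retryCount) await sleep(retryDelayMs);
  }
  return null;
};

// Simulate a lock that frees up on the fourth attempt and record the total wait.
const simulateContention = async (): Promise<{ token: string | null; waitedMs: number }> => {
  let calls = 0;
  let waitedMs = 0;
  const token = await acquireWithRetry(
    async () => (++calls >= 4 ? "token-a" : null),
    50,
    100,
    async (ms) => {
      waitedMs += ms;
    },
  );
  return { token, waitedMs };
};
```

With these numbers the simulated caller waits 300ms before getting the lock. A caller that never gets it sleeps 50 times, for a worst case of 5,000ms plus the Redis round trips.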
The TTL
The 10-second expiry is a safety net. If the application crashes mid-operation and fails to release the lock, it will expire automatically. Without this, one crash would lock that user's wallet forever.
The Lua script on release
This is where most implementations get it wrong. A naive release looks like this:
await redisClient.del(key); // ❌ dangerous
Here's why that's broken: imagine a slow operation acquires the lock, but takes longer than 10 seconds. The TTL expires. A second operation acquires the lock. Now the first operation finishes and calls del - it just released the second operation's lock, not its own. The guarantee is broken.
The Lua script fixes this by checking the token before deleting:
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end
Because Lua scripts execute atomically in Redis, the check-and-delete is a single uninterruptible operation. You can only release a lock you actually own.
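The expired-lock scenario can be replayed in memory to see the difference between the two release strategies. Everything in this sketch (`heldLocks`, `acquire`, `naiveRelease`, `safeRelease`) is a hypothetical stand-in for the Redis key, with the clock passed in as a number.

```typescript
// In-memory stand-in for a single Redis key with a TTL; all names are hypothetical.
type LockEntry = { token: string; expiresAt: number };
const heldLocks = new Map<string, LockEntry>();

const acquire = (key: string, token: string, ttlMs: number, now: number): boolean => {
  const held = heldLocks.get(key);
  if (held && held.expiresAt > now) return false; // someone else still owns it
  heldLocks.set(key, { token, expiresAt: now + ttlMs });
  return true;
};

// Naive release: deletes whatever is stored, regardless of owner.
const naiveRelease = (key: string): void => {
  heldLocks.delete(key);
};

// Token-checked release: the same compare-then-delete as the Lua script.
const safeRelease = (key: string, token: string): number => {
  const held = heldLocks.get(key);
  if (held && held.token === token) {
    heldLocks.delete(key);
    return 1;
  }
  return 0;
};

const lockKey = "wallet:lock:user-1";
acquire(lockKey, "slow-op", 10_000, 0); // first operation takes the lock at t=0
acquire(lockKey, "second-op", 10_000, 12_000); // TTL has lapsed, second operation takes over
const staleResult = safeRelease(lockKey, "slow-op"); // slow operation finishes late
const ownerAfterSafe = heldLocks.get(lockKey)?.token; // second operation still owns the lock
naiveRelease(lockKey); // the naive version removes it anyway
const heldAfterNaive = heldLocks.has(lockKey);
```

The token-checked release returns 0 and leaves "second-op" in place, while the naive release wipes out a lock that belongs to someone else.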
The withLock wrapper
Rather than exposing raw acquireLock / releaseLock functions and trusting every caller to use them correctly, the implementation wraps the entire pattern in a higher-order function:
// src/services/wallet.service.ts
export const debitWallet = async (userId: string, amount: number): Promise<Transaction> => {
  return await withLock(userId, async () => {
    const wallet = await prisma.wallet.findUnique({ where: { userId } });
    if (!wallet || wallet.balance.lessThan(amount)) {
      throw new Error("Insufficient balance");
    }
    // create transaction, update balance...
    return transaction;
  });
};
The finally block inside withLock ensures the lock is always released, even if the operation throws. A lock that leaks on error is as bad as no lock at all.
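Here is a small sketch of that guarantee, using an in-memory set in place of Redis. `heldKeys`, `withLocalLock`, and `failingDebit` are hypothetical names for illustration. The point is only that the `finally` block runs on the error path too.

```typescript
// Hypothetical in-memory lock set, standing in for Redis keys.
const heldKeys = new Set<string>();

const withLocalLock = async <T>(key: string, fn: () => Promise<T>): Promise<T> => {
  if (heldKeys.has(key)) throw new Error("resource is busy");
  heldKeys.add(key);
  try {
    return await fn();
  } finally {
    heldKeys.delete(key); // runs on success and on throw alike
  }
};

// Run an operation that fails inside the lock, then check whether the lock leaked.
const failingDebit = async (): Promise<{ rejected: boolean; lockLeaked: boolean }> => {
  let rejected = false;
  try {
    await withLocalLock("wallet:lock:user-1", async () => {
      throw new Error("Insufficient balance");
    });
  } catch {
    rejected = true;
  }
  return { rejected, lockLeaked: heldKeys.has("wallet:lock:user-1") };
};
```

The caller still sees the "Insufficient balance" error, and the key is gone by the time the error reaches it, so the next request for that wallet is not blocked.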
Preventing Deadlocks on Transfers
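A transfer touches two wallets - sender and receiver. If two transfers run simultaneously in opposite directions (A→B and B→A) and each locks its own sender first, they can deadlock: each holds one lock and waits for the other. With the retry limit above, neither wait lasts forever, but both transfers stall for about 5 seconds and then fail.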
The fix is to always acquire locks in the same order, regardless of who is sending to whom. We sort the user IDs before acquiring:
export const withMultiLock = async <T>(userIds: string[], fn: () => Promise<T>): Promise<T> => {
  // Dedupe and sort so every caller acquires locks in the same order.
  const sortedIds = [...new Set(userIds)].sort();
  const keys = sortedIds.map((id) => `wallet:lock:${id}`);
  const tokens: string[] = [];
  try {
    for (const key of keys) {
      let token: string | null = null;
      for (let attempt = 0; attempt <= LOCK_RETRY_COUNT; attempt++) {
        token = await acquireLock(key);
        if (token) break;
        await new Promise((resolve) => setTimeout(resolve, LOCK_RETRY_DELAY_MS));
      }
      if (!token) throw new Error("Could not acquire lock: resource is busy");
      tokens.push(token);
    }
    return await fn();
  } finally {
    // Release only the locks actually acquired; tokens[i] pairs with keys[i].
    await Promise.all(
      tokens.map((token, i) => redisClient.eval(RELEASE_SCRIPT, 1, keys[i], token))
    );
  }
};
// In transfer service:
const result = await withMultiLock([senderId, receiver.id], async () => {
  // both wallets locked, safe to proceed
});
If A→B and B→A both arrive at the same time, both will sort to [A, B] and try to acquire A's lock first. One wins, the other waits. No deadlock.
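The ordering rule is small enough to check in isolation. This sketch pulls the key derivation into a hypothetical `lockOrder` helper (using `Array.from` in place of the spread, which should be equivalent here) and runs it for both directions of a transfer.

```typescript
// Same key derivation as withMultiLock: dedupe, sort, then prefix.
// lockOrder is a hypothetical helper extracted for illustration.
const lockOrder = (userIds: string[]): string[] =>
  Array.from(new Set(userIds))
    .sort()
    .map((id) => `wallet:lock:${id}`);

const transferAtoB = lockOrder(["user-a", "user-b"]);
const transferBtoA = lockOrder(["user-b", "user-a"]);
// A transfer to yourself collapses to a single key, so it never waits on itself.
const selfTransfer = lockOrder(["user-a", "user-a"]);
```

Both directions produce the same list, with `wallet:lock:user-a` first, and the self-transfer case yields a single key. The deduplication matters as much as the sort: without it, a self-transfer would try to take the same lock twice and time out.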
The Full System
The distributed lock is one layer in a larger stack. The complete wallet engine includes:
Idempotency - Idempotency-Key header on credit and debit prevents duplicate transactions on network retries, backed by a 24-hour Redis TTL
Webhook delivery with retry - BullMQ queue, 4 attempts, exponential backoff (2s base), per-attempt audit trail
Balance caching - 60-second Redis TTL, invalidated on every write
Token blacklisting - SHA-256 hashed JWTs stored in Redis with matching expiry, making logout actually secure
Cursor-based pagination - transaction history that scales
Full test suite - 103 tests, all mocked, no database required to run.
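Of the layers listed above, idempotency is the one closest to the double-tap problem, so here is a rough in-memory sketch of the idea. It is not the project's implementation: `IdempotencyStore`, `run`, and `replayDemo` are hypothetical names, a `Map` stands in for Redis, and the clock is passed in as a number. It also only covers sequential retries. Two requests arriving at the same instant with the same key would still need the lock.

```typescript
// Hypothetical in-memory idempotency store; the real one is described as Redis-backed.
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as in the list above

type StoredResponse<T> = { response: T; expiresAt: number };

class IdempotencyStore<T> {
  private seen = new Map<string, StoredResponse<T>>();

  // Runs handler once per key; repeats within the TTL get the stored response back.
  async run(key: string, now: number, handler: () => Promise<T>): Promise<T> {
    const hit = this.seen.get(key);
    if (hit && hit.expiresAt > now) return hit.response;
    const response = await handler();
    this.seen.set(key, { response, expiresAt: now + IDEMPOTENCY_TTL_MS });
    return response;
  }
}

// A client sends the same Idempotency-Key twice, five seconds apart.
const replayDemo = async (): Promise<{ debits: number; sameResponse: boolean }> => {
  const store = new IdempotencyStore<string>();
  let debits = 0;
  const handler = async (): Promise<string> => {
    debits += 1;
    return `txn-${debits}`;
  };
  const firstResponse = await store.run("key-123", 0, handler);
  const retryResponse = await store.run("key-123", 5_000, handler);
  return { debits, sameResponse: firstResponse === retryResponse };
};
```

In the demo the handler runs once and the retry receives the original response, so a network retry cannot create a second debit.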
Built with TypeScript, Express, Prisma, PostgreSQL, and Redis (ioredis). Deployed on Render with a GitHub Actions CI/CD pipeline.
Why This Matters
Race conditions in financial systems are not exotic. They're the default outcome when you don't explicitly prevent them. Many wallet tutorials never mention them, and it's easy for a production implementation to ship with one.
The fix isn't complicated - but you have to know to look for it.
If you're building anything that touches money, check your balance reads. If there's no lock between the read and the write, the race condition is already there.
Open for Collaboration
This project is open source and actively welcoming contributors. Whether you want to add a feature, improve test coverage, or spot something that could be done better - PRs are open.
Check out the CONTRIBUTING.md for how to get set up and where to send your changes.
GitHub: https://github.com/tochi27/Wallet_System.git
Live API + Swagger: https://wallet-system-api-16cv.onrender.com/api-docs