Some time in 2023, on a Saturday morning, my phone rang. It was Timi. Why is he calling me on a Saturday morning?
"We have a problem," he said. "Money is missing. People are withdrawing amounts that don't match their wallet balances."
Wait a minute. We haven't even launched yet. The system is still in development. How did this happen? How did they even know about it?
A Little Context
I had built the backend and infrastructure for a fintech app with dedicated bank accounts (DBA), a wallet system, and FX capabilities. We were still in the validation phase, testing internally with a small group. But there was real money in the system: not much, but enough to matter.
While still on that call, I grabbed my laptop and went straight to the back office. First action: pause all debits. Then I started asking the critical questions: Who did this? Is it one person or multiple? Are they using multiple accounts or devices?
I decided to follow the money.
The Trail Goes Cold
I started tracing transactions from wallet funding to peer-to-peer debits and transfers. Something was very wrong. The wallet balances and transaction logs didn't add up.
Here's what I found:
- User A funds ₦1,000
- User A sends ₦1,000 to User B
- User B magically receives ₦2,000
- User B repeats the process
- That ₦1,000 funding balloons to ₦40,000+ across both wallets
What the hell?
Finding the Clue
I dove into the database transaction logs to understand how everything was being recorded. We saved both prevBalance and newBalance for each transaction. Then I spotted it: the prevBalance hadn't changed between two transaction records.
This was the clue. Two different transactions had started from the same balance, as if the wallet had only checked it once.
And then it hit me: Race condition!
For Non-Technical Readers
Imagine you have ₦10,000 in your account. You and your friend both try to withdraw ₦10,000 at exactly the same time, one from your phone, one from an ATM.
Here's what happens inside the system:
- Both requests check your balance simultaneously
- They each see ₦10,000 available
- Both decide, "Okay, there's enough money"
- Both proceed to withdraw ₦10,000
- Result: ₦20,000 withdrawn from an account that only had ₦10,000
The system allowed this because it didn't realize both withdrawals were happening at the same time. The two operations raced each other, and both slipped through before the system could catch them.
That's a race condition: the result depends on timing, and sometimes both operations sneak through the gap.
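For readers who do want to see it in code, here is a minimal sketch of the kind of check-then-update flow that allows this, written in the Prisma style used later in this post. The function and field names are hypothetical, not the actual production code:

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Vulnerable pattern: read, check, then write, with nothing stopping two
// concurrent calls from reading the same balance.
// (balance is modeled as a plain number here to keep the sketch short)
async function withdraw(walletId: string, amount: number) {
  // 1. Read the balance
  const wallet = await prisma.wallet.findUniqueOrThrow({ where: { id: walletId } });

  // 2. Check it
  if (wallet.balance < amount) {
    throw new Error('Insufficient funds');
  }

  // A second concurrent call can reach this point having read the SAME
  // balance, so both checks pass.

  // 3. Write
  await prisma.wallet.update({
    where: { id: walletId },
    data: { balance: { decrement: amount } },
  });

  // ...payout happens here...
}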
The Scammers
At this point, the exploit was clear. Based on device logs and transaction patterns, I identified that they were likely logging into the same account on multiple devices (I found at least three) and pressing the "confirm debit" button simultaneously to exploit the vulnerability.
Clever. But now I knew how they did it.
I couldn't shut down the entire system because of a few bad actors; we had legitimate users with real money at stake. I needed a surgical solution.
The Honeypot
Instead of simply banning their accounts (they'd just create new ones), I decided to trap them.
We saved device fingerprints for every user. So I implemented a multi-layered flagging system:
Layer 1: Device-level bans
Any new account created on a flagged device was automatically banned.
Layer 2: The P2P trap
Here's the brilliant part: If an account was flagged as fraudulent, any P2P transfer they made would automatically flag the recipient wallet too. They could send money between accounts, but none of them could withdraw.
Layer 3: Let them expose themselves
Since they didn't know their wallets were flagged, they'd try to move money around to find a "clean" wallet to withdraw from, inadvertently revealing their entire network of accounts.
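To make the trap concrete, here is roughly what that flag propagation can look like. This is a sketch with hypothetical field and function names (flaggedAsFraud, propagateFraudFlag), not the actual implementation:

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Runs inside the P2P transfer flow. If the sender is flagged, silently flag
// the recipient too, and let the transfer go through so the fraudsters don't
// realise they've been caught.
async function propagateFraudFlag(senderWalletId: string, recipientWalletId: string) {
  const sender = await prisma.wallet.findUniqueOrThrow({
    where: { id: senderWalletId },
    select: { flaggedAsFraud: true },
  });

  if (sender.flaggedAsFraud) {
    await prisma.wallet.update({
      where: { id: recipientWalletId },
      data: { flaggedAsFraud: true },
    });
  }
}

// Withdrawals check the flag; P2P transfers deliberately don't.
async function assertCanWithdraw(walletId: string) {
  const wallet = await prisma.wallet.findUniqueOrThrow({
    where: { id: walletId },
    select: { flaggedAsFraud: true },
  });
  if (wallet.flaggedAsFraud) {
    // Keep the error vague so the attacker doesn't learn why it failed
    throw new Error('Withdrawal temporarily unavailable');
  }
}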
I deployed and waited.
A few hours later, it worked perfectly. They took the bait. Unable to withdraw, they started sending money to their other wallets to attempt withdrawals from there. Each transfer exposed another account. In total, they flagged 9 accounts themselves, plus several new ones they frantically created.
Eventually, they gave up.
But Did You Fix the Race Condition?
Yes. I wasn't just waiting around for the next person to try their luck.
The core problem was that credit operations could run concurrently on the same wallet. I needed to ensure they ran sequentially per wallet while still maintaining system performance for different wallets.
The solution needed to work like this:
Wallet 1: Transaction 1 → Transaction 2 (sequential)
      ||  (in parallel)
Wallet 2: Transaction 1 → Transaction 2 (sequential)
Wallet 1's transactions happen one by one, but they can still run in parallel with Wallet 2's sequential transactions. This gives us both safety and performance.
The Three-Layer Defense
I implemented three layers of protection, each catching what the previous layer might miss:
1. Sequential Credit Operations Per Wallet (BullMQ Limiter)
BullModule.registerQueue({
  name: 'wallet-credit-queue',
  limiter: {
    max: 1,               // at most one job...
    duration: 3000,       // ...every 3 seconds...
    groupKey: 'walletId', // ...per walletId group
  },
})
This ensures only one job per walletId can be processed at a time. Each wallet has its own isolated queue (via groupKey), running sequentially, but multiple wallets can process jobs in parallel, maximizing throughput without sacrificing safety.
To prevent accidental duplicate jobs (from network retries, etc.), I added queue deduplication:
opts: {
  jobId: `wallet-${walletId}-${uniqueTxnId}`,
}
This ensures only one active job exists per transaction per wallet.
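On the producer side, enqueueing a credit then looks roughly like this. It's a sketch: the job data carries the walletId that the limiter groups on, and the jobId is the dedup key. In the NestJS setup above you would inject the queue rather than construct it directly:

import { Queue } from 'bullmq';

const creditQueue = new Queue('wallet-credit-queue');

async function enqueueCredit(walletId: string, amount: number, uniqueTxnId: string) {
  await creditQueue.add(
    'credit',
    { walletId, amount, uniqueTxnId },              // walletId is what the limiter groups on
    { jobId: `wallet-${walletId}-${uniqueTxnId}` }, // same transaction enqueued twice = one job
  );
}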
2. Database-Level Safety (Postgres FOR UPDATE)
For every debit/credit operation, the wallet row is locked within a database transaction using Postgres FOR UPDATE:
await this.prisma.$transaction(async (prisma) => {
  // Lock the wallet row until this transaction commits or rolls back
  await prisma.$executeRaw`
    SELECT id FROM "wallets"
    WHERE "id" = ${trx.walletId}
    FOR UPDATE
  `;

  // Safe to update: no other transaction can touch this row until we finish
  await prisma.wallet.update({
    where: { id: trx.walletId },
    data: {
      balance: {
        increment: trx.amount.toNumber() + trx.fee.toNumber(),
      },
    },
  });
});
Why this works:
- Postgres locks that specific wallet row until the transaction finishes (commit or rollback)
- Any other concurrent transaction trying to access the same wallet row must wait
- This guarantees sequential execution per wallet without blocking other wallets
Even if two concurrent jobs somehow sneak through the queue layer (due to network retries or edge cases), the database won't allow them to update the same wallet row simultaneously. The second transaction waits until the first finishes.
This prevents:
- Double-spending
- Race conditions
- Data corruption
And it works even under high concurrency or job retries.
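Putting layers 1 and 2 together, the consumer is just a worker that pulls credit jobs off the queue and runs them through the locked database transaction above. A minimal sketch with a plain BullMQ Worker (walletService.credit here is a hypothetical wrapper around the Prisma block shown earlier):

import { Worker, Job } from 'bullmq';

// Stand-in for the service wrapping the FOR UPDATE + increment transaction above
declare const walletService: {
  credit: (walletId: string, amount: number, uniqueTxnId: string) => Promise<void>;
};

const worker = new Worker('wallet-credit-queue', async (job: Job) => {
  const { walletId, amount, uniqueTxnId } = job.data;
  await walletService.credit(walletId, amount, uniqueTxnId);
});

worker.on('failed', (job, err) => {
  // Retries (per the job's attempts setting) are safe because of the row lock
  console.error(`Credit job ${job?.id} failed:`, err.message);
});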
3. Atomic Wallet Updates
The $transaction wrapper guarantees that the lock, update, and commit happen as a single atomic operation. If anything fails, there are no partial changes and no corrupted balances.
This makes the operation:
- Transaction-safe: All or nothing
- Durable: Protected against errors
- Consistent: No intermediate states visible to other transactions
The Recovery Process
With the exploit patched and the scammers neutralized, I still had a mess to clean up.
Reconciliation:
I had to reconcile every single transaction from the moment the exploit was discovered. Using the transaction logs and the prevBalance clues, I traced back exactly how much money was legitimately theirs versus how much was "manufactured" through the race condition.
The fraudulent accounts had generated approximately ₦180,000 in fake balances across 9 accounts. Since they couldn't withdraw (thanks to the honeypot), the money never actually left our system, but our internal accounting was a disaster.
Legitimate Users:
Here's the part that kept me up at night: Were there legitimate users caught in the crossfire? I audited every transaction during the vulnerability window. Thankfully, the scammers had been the only ones who discovered and exploited it. Our small internal testing group had been using the system normally.
Communication:
I documented everything for Timi and the team. We had to decide: Do we tell the internal testers what happened? We chose transparency. We sent out a message explaining there had been a "security issue that was identified and resolved," without going into the technical details that might give others ideas.
The response was surprisingly positive. People appreciated the honesty and felt more confident knowing we caught it before launch.
The Aftermath and Lessons Learned
What started as a Saturday morning panic call turned into one of the most valuable learning experiences of my career. Here's what this incident taught me:
1. Observability Saved Us
If we hadn't been logging prevBalance and newBalance, I would have been flying blind. That seemingly redundant data point was the smoking gun that led me to the race condition.
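If you're building something similar, the idea is simply to snapshot the balance before and after, inside the same database transaction as the update itself. A hypothetical sketch (walletTransaction and the field names are made up; your schema will differ):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function creditWithLedger(walletId: string, amount: number) {
  await prisma.$transaction(async (tx) => {
    const before = await tx.wallet.findUniqueOrThrow({ where: { id: walletId } });

    const after = await tx.wallet.update({
      where: { id: walletId },
      data: { balance: { increment: amount } },
    });

    // The prevBalance/newBalance pair is what later let me spot two
    // transactions that had started from the same balance.
    await tx.walletTransaction.create({
      data: {
        walletId,
        amount,
        prevBalance: before.balance,
        newBalance: after.balance,
      },
    });
  });
}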
After this incident, I became obsessive about observability:
- Comprehensive transaction logging
- Device fingerprinting (which enabled the honeypot)
- Real-time balance monitoring
- Anomaly detection alerts
Lesson: Log everything that might help you debug a crisis at 2 AM. Storage is cheap; losing money isn't.
2. Think Like an Attacker
The honeypot strategy came from putting myself in the scammers' shoes. "If I couldn't withdraw from this account, what would I do next?" The answer was obvious: try to move the money to another account.
By anticipating their next move, I turned their escape route into a trap that revealed their entire operation.
Lesson: Security isn't just about prevention; it's about understanding how attackers think and setting up tripwires that expose them when they try to adapt.
3. Testing ≠ Battle-Testing
We thought we were being careful by testing internally before launch. We had test cases. We had a QA process. We thought we were ready.
But we never tested concurrent operations on the same wallet because it didn't occur to us that users would (or could) coordinate simultaneous actions. Real-world usage patterns are creative in ways test cases rarely capture.
Lesson: Stress testing, chaos engineering, and adversarial thinking should be part of your testing strategy, especially for financial systems.
4. Saturday Morning Readiness
I'm grateful this happened before launch, but I'm more grateful it happened on a Saturday morning when I could respond immediately. No meetings, no distractions, just me, my laptop, and a problem to solve.
But what if I hadn't been available? What if I'd been traveling or unreachable? This incident made me realize that for critical systems, you need:
- Comprehensive monitoring and alerting
- Clear runbooks for common issues
- Multiple people who understand the system architecture
- The ability to pause operations remotely (which I did)
Lesson: Build systems assuming you won't always be there to save them. Automate what you can, document everything else.
5. The Human Element
Timi's early morning call was crucial. He could have waited until Monday. He could have tried to handle it himself. Instead, he called immediately, even though it was the weekend.
That cultural norm of treating issues as urgent, communicating openly, and not being afraid to escalate made all the difference.
Lesson: Technical solutions matter, but so does team culture. Foster an environment where people feel empowered to raise alarms early and often.
What I'd Do Differently
Looking back, here's what I would change:
1. Concurrent Testing from Day One
I would have tested concurrent operations explicitly, using tools like k6 or Artillery to simulate multiple users hammering the same wallet simultaneously.
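Even a tiny k6 script like this could have surfaced the bug: point several virtual users at the same wallet at once and check the final balance afterwards. The endpoint and payload below are made up for illustration:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    same_wallet_burst: {
      executor: 'per-vu-iterations',
      vus: 10,        // 10 "users" hitting the SAME wallet
      iterations: 1,  // one transfer each, all at roughly the same time
    },
  },
};

export default function () {
  const res = http.post(
    'https://staging.example.com/wallets/wallet-123/transfer',
    JSON.stringify({ recipientId: 'wallet-456', amount: 1000 }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(res, { 'not a server error': (r) => r.status < 500 });
}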
2. Security Review Before Internal Testing
Even for "just internal testing," I should have had another engineer review the critical paths, especially anything involving money movement.
3. Rate Limiting Earlier
Beyond the race condition fix, I should have implemented rate limiting on sensitive operations. Even if someone found another exploit, rate limiting would slow them down and give us time to respond.
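Even something as simple as this buys you response time. A toy in-memory sketch; in production you would back it with Redis or a proper library so it works across instances:

// Allow at most LIMIT sensitive operations per wallet per window
const WINDOW_MS = 60_000;
const LIMIT = 5;
const hits = new Map<string, number[]>();

function allowSensitiveOp(walletId: string): boolean {
  const now = Date.now();
  const recent = (hits.get(walletId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    return false; // over the limit: reject, or queue for manual review
  }
  recent.push(now);
  hits.set(walletId, recent);
  return true;
}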
4. Chaos Engineering
I now regularly run chaos experiments: What happens if two requests hit at the exact same millisecond? What if the database is under heavy load? What if network latency causes delays?
The Launch
Two weeks after the incident, after extensive testing, auditing, and monitoring setup, we launched publicly.
The system handled real users beautifully. The three-layer defense held strong. And every time I see a wallet transaction complete successfully, I think about that Saturday morning and the lesson it taught me:
Building secure systems isn't about preventing every possible attack. It's about detecting problems early, responding quickly, and learning from each incident to build something stronger.
The scammers thought they found a goldmine. Instead, they helped us build a fortress.
This was a real incident from a fintech system I built in 2023. All technical details are accurate; some specifics have been generalized to protect privacy. If you're building financial systems, I hope this story helps you avoid the same mistakes I made and gives you ideas for creative solutions when things inevitably go wrong.
Top comments (2)
I like the idea of using a queue to handle the transactions, but in the case of a debit where we expect a post-debit action, how do you then trigger this action, considering the actual debit is handled asynchronously?
Good catch! The queue layer is only on the credit side, since you need a response from the debit function to process a withdrawal. So the DB transaction and lock mechanism is sufficient there.
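For readers wondering what that synchronous debit path looks like: roughly the mirror image of the credit code above, with the balance check happening inside the lock. A sketch with hypothetical names, not the actual implementation:

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Runs synchronously, so the caller gets the result immediately and can
// trigger the post-debit action (e.g. the payout) right after.
async function debit(walletId: string, amount: number) {
  return prisma.$transaction(async (tx) => {
    // Lock the row first, so the balance we read can't change under us
    await tx.$executeRaw`
      SELECT id FROM "wallets"
      WHERE "id" = ${walletId}
      FOR UPDATE
    `;

    const wallet = await tx.wallet.findUniqueOrThrow({ where: { id: walletId } });
    if (wallet.balance.toNumber() < amount) {
      throw new Error('Insufficient funds'); // rolls the whole transaction back
    }

    await tx.wallet.update({
      where: { id: walletId },
      data: { balance: { decrement: amount } },
    });
  });
}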