How We Handle Concurrency Control in Financial Systems
A Story About Building Bulletproof Data Integrity
The Problem: When Data Integrity Breaks Down
It's the end of a busy financial period. Two team members are working on the same critical financial record—one is finalizing it, the other just discovered an error and is making corrections.
Both click "Save" at nearly the same time. The system accepts both changes.
Later, during a review, someone notices the data doesn't look right. It's neither what the first person entered nor what the second person corrected—it's a corrupted mix of both. Worse, the audit trail is incomplete. No one can tell what happened or when.
This is the nightmare scenario that keeps financial system architects awake at night.
Why Financial Systems Are Different
In a social media app, if two users accidentally overwrite each other's comments, it's annoying. In a financial system, data integrity isn't just important—it's legally mandated.
When you're dealing with money, regulatory compliance, and financial reporting that could affect shareholder decisions or SEC filings, you can't have:
- Lost Updates: One person's approved transaction being silently overwritten by another's edit
- Inconsistent State: A transaction being approved for financial reporting while someone else is still modifying it
- Audit Trail Gaps: Missing records of who changed what and when—a regulatory compliance nightmare
- Compliance Violations: Inaccurate financial reports that could trigger investigations, fines, or worse
- Cascading Errors: Wrong figures feeding into quarterly reports, tax calculations, and investor statements
In financial systems, every cent must be accounted for, every change must be tracked, and data integrity is non-negotiable.
Our Mission: Protecting Financial Data Integrity
After witnessing the chaos that uncontrolled concurrent access can cause, we set out to build a system with one core principle: First come, first served—and everyone else gets told exactly what's happening.
Our philosophy is simple:
- First Come, First Served: The first user to start an operation gets to complete it
- No Silent Overwrites: If a second user tries to update based on outdated data, we reject the operation with a clear error message—forcing them to refresh, review the latest changes, and then make their update based on current data
- Clarity for Others: Anyone who tries to modify the same data gets a clear, actionable error message
- Zero Tolerance for Data Loss: We'd rather block an operation than risk corrupting financial records
Three War Stories: When Concurrency Goes Wrong
Story #1: The Race Condition
Two team members receive an alert about an error in a financial record. They both open it simultaneously and start making corrections.
User A saves their changes. A few seconds later, User B saves.
What should happen?
User A's save goes through. User B gets a clear message: "This record was modified by another user while you were editing. Please refresh and try again."
User B refreshes, sees the fix is already done, and continues their work.
What could go wrong without protection?
Without concurrency control, both saves might succeed. The final data could be a mix of both changes, or worse—one person's entire update could be silently overwritten, causing data loss in financial records.
Story #2: The Moving Target
A supervisor is reviewing a financial record for approval. The data looks good, so they click "Approve."
But there's a problem: while the supervisor had the approval screen open, another user discovered an error and was actively updating that same record.
What should happen?
The system blocks the approval attempt with a message: "This record is currently being modified by another user. Please wait and try again."
Why this matters in financial systems:
The supervisor was about to approve data that was actively being changed. In financial systems, approving a record locks it for regulatory reporting. If they approved incomplete or incorrect data, it could cascade into financial statements, tax calculations, and compliance reports—creating serious regulatory risks.
Story #3: The Time Traveler's Mistake
An approver opens a financial record to review it. They get interrupted by a meeting, leaving their browser tab open for 30 minutes.
While they're away, another user discovers an error and updates the record with corrected values.
The approver returns and, without refreshing, clicks "Approve"—still looking at the old data on their screen.
What should happen?
The system detects they're trying to approve an outdated version. They get a message: "This record has been modified since you opened it. Please refresh to see the latest version before approving."
The financial compliance angle:
The approver made a decision based on stale data. In financial systems, approvers must see current, accurate data before making decisions. Approving outdated data isn't just a technical bug—it's a control failure that auditors flag during compliance reviews.
The Solution: Two Locks for Two Problems
Looking at our three stories, we noticed something interesting: they represent two fundamentally different concurrency problems.
Stories #1 and #2 are about concurrent operations—multiple people trying to modify or approve the same record at the same time. We need to prevent them from stepping on each other's toes.
Story #3 is about version conflicts—someone making decisions based on outdated data. We need to detect when data has changed since they last looked at it.
Different problems require different solutions:
| Problem Type | Solution | Which Stories |
|---|---|---|
| Concurrent Operations | Pessimistic Locking (Redis) | #1, #2 |
| Version Conflicts | Optimistic Locking | #3 |
Solution #1: Pessimistic Locking (For Concurrent Operations)
The Challenge
When two users both try to edit Transaction #A2547, or when a supervisor tries to approve a record that someone else is actively editing, we need to physically prevent them from accessing the same record at the same time. One person gets the lock; everyone else waits.
Think of it like a bathroom door lock—only one person at a time, and everyone else can see it's occupied.
Two Ways to Lock: Database vs Redis
We considered two approaches:
Option 1: Redis Distributed Locks
Before any user touches a record, we check Redis: "Is anyone else working on this record?" If yes, they wait. If no, we create a lock entry in Redis indicating someone is editing it.
Advantages:
- Works across multiple servers
- Supports batch approval jobs that run for 15+ minutes
- Locks automatically expire if something crashes
- Doesn't tie up database connections
Downsides:
- We need to run Redis (one more thing to maintain)
- We have to handle lock logic carefully in code
Option 2: Database Row Locks (SELECT FOR UPDATE)
Use the database's built-in locking with SELECT FOR UPDATE. When a user queries a record for editing, the database locks that row until they're done.
Advantages:
- No extra infrastructure needed
- Automatic cleanup when transaction commits
- Database handles deadlocks automatically
Downsides:
- Keeps database connections busy during long operations
- Doesn't work for async batch jobs (can't hold a lock across job queues)
- Under heavy load, we could run out of database connections
Why We Chose Redis
We went with Redis for one critical reason: batch operations.
Financial systems often need to process hundreds or thousands of records at once (like batch approvals). These operations run as background jobs that might take 15-30 minutes. Database locks can't survive across job queue boundaries—the HTTP request ends, the database transaction commits, and the lock is gone before the background job even starts.
With Redis, we can:
- Acquire the lock when the user initiates a batch operation
- Store the lock token in the database
- Pass it to the background job via message queue
- Have the job release the lock when done
Plus, for financial systems, we'd rather sacrifice a bit of infrastructure complexity than risk exhausting our database connection pool during critical processing periods.
Solution #2: Optimistic Locking (For Version Conflicts)
The Problem with Stale Data
Remember Story #3? An approver opened a record, got interrupted, and came back 30 minutes later to approve it—not knowing another user had updated it in the meantime.
We can't lock the record for 30 minutes while someone is away. That would block everyone else from working on it. Instead, we use "optimistic locking"—we assume conflicts are rare, but we verify the data hasn't changed before committing.
How We Detect Version Changes
We track versions two different ways, depending on how the database table works:
Strategy 1: ID-Based Versioning (For Audit Tables)
Some financial tables never delete or overwrite data—for audit compliance. Every edit creates a new record with a new ID, and we mark the old one as deleted.
When someone tries to approve:
- Their browser sends: "I want to approve record ID abc123"
- Backend checks: "What's the current active record?"
- If the current record has a different ID (someone created a new version), we reject the approval
- They get told: "This has been modified. Please review the latest version."
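A minimal sketch of the ID-based check in Go (the function and error names here are illustrative, not the actual implementation):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleVersion is returned when the client references a superseded record.
var ErrStaleVersion = errors.New("record has been modified; please review the latest version")

// checkIDVersion compares the record ID the client is looking at against the
// ID of the currently active (non-deleted) record for the same entity.
// In audit tables every edit creates a new row, so an ID mismatch means a
// newer version exists.
func checkIDVersion(clientRecordID, currentActiveID string) error {
	if clientRecordID != currentActiveID {
		return ErrStaleVersion
	}
	return nil
}

func main() {
	// Client tries to approve abc123, but a correction created def456.
	if err := checkIDVersion("abc123", "def456"); err != nil {
		fmt.Println("approval rejected:", err)
	}
}
```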
Strategy 2: Timestamp-Based Versioning (For Regular Tables)
For tables that update in place, we use the updated_at timestamp as a version number.
When someone tries to approve:
- Their browser sends: "I want to approve, and I'm looking at the version from [timestamp]"
- Backend checks the current updated_at timestamp
- If timestamps don't match → reject the approval
- They refresh and see the latest data
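The timestamp comparison can be sketched the same way (names are illustrative; note that equality, not ordering, is checked, so any intervening write is detected):

```go
package main

import (
	"fmt"
	"time"
)

// Sample timestamps for the demo below.
var (
	sampleOpenedAt  = time.Date(2024, 3, 1, 9, 0, 0, 0, time.UTC)
	sampleUpdatedAt = sampleOpenedAt.Add(12 * time.Minute) // edited while the approver was away
)

// checkTimestampVersion treats updated_at as a version number: the approval
// only proceeds if the timestamp the client saw matches the current one.
func checkTimestampVersion(clientSaw, current time.Time) bool {
	return clientSaw.Equal(current)
}

func main() {
	fmt.Println(checkTimestampVersion(sampleOpenedAt, sampleOpenedAt))  // true: no change
	fmt.Println(checkTimestampVersion(sampleOpenedAt, sampleUpdatedAt)) // false: stale view, reject
}
```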
How Both Locks Work Together
The two mechanisms form a complete defense system. Every operation goes through both checks:
┌─────────────────────────────────────────────────────────────┐
│ User Request │
│ (Edit/Approve Record) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌──────────────────────┐
│ Pessimistic Lock │
│ (Redis Lock Check) │
└──────────┬───────────┘
│
Lock Acquired?
│ │
Yes No
│ │
│ └──► Return Error:
│ "Record is being modified"
│
▼
┌──────────────────────┐
│ Optimistic Lock │
│ (Version Check) │
└──────────┬───────────┘
│
Version Match?
│ │
Yes No
│ │
│ └──► Release Lock
│ Return Error:
│ "Version conflict detected"
│
▼
┌──────────────────────┐
│ Perform Operation │
│ (Update/Approve) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Release Lock │
└──────────────────────┘
Step 1: Pessimistic Lock catches concurrent operations happening right now.
Step 2: Optimistic Lock catches changes that happened earlier while the user was away.
Together, they ensure financial data integrity from every angle.
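The two-step flow above can be sketched in Go with an in-memory map standing in for Redis (the real system uses a Redis client; the map and function names here are illustrative only):

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrLocked          = errors.New("record is being modified by another user")
	ErrVersionConflict = errors.New("version conflict detected: please refresh")
)

// memLocks stands in for the Redis keyspace: lock key -> owner token.
var memLocks = map[string]string{}

func acquire(key, token string) bool {
	if _, held := memLocks[key]; held {
		return false
	}
	memLocks[key] = token
	return true
}

func release(key, token string) {
	if memLocks[key] == token { // only the owner may release
		delete(memLocks, key)
	}
}

// approveRecord runs both checks in order: pessimistic lock first,
// then the optimistic version check, then the operation itself.
func approveRecord(key, token, clientVersion, currentVersion string) error {
	if !acquire(key, token) {
		return ErrLocked // step 1: someone is working on it right now
	}
	if clientVersion != currentVersion {
		release(key, token)
		return ErrVersionConflict // step 2: data changed while the user was away
	}
	// ... perform the approval here ...
	release(key, token)
	return nil
}

func main() {
	fmt.Println(approveRecord("lock_event_transaction_A2547", "tok1", "v1", "v1")) // nil: approved
	fmt.Println(approveRecord("lock_event_transaction_A2547", "tok2", "v1", "v2")) // version conflict
}
```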
Implementation Details: How We Built It
This section walks through the actual implementation of our Redis-based locking system.
Choosing the Right Redis Library
We had two options for Redis locking in Go:
- bsm/redislock - Simple, works great with a single Redis master
- go-redsync/redsync - Implements the Redlock algorithm for multi-master Redis clusters
We chose bsm/redislock because our Redis deployment is single-master. For multi-master setups, you'd want go-redsync to handle the distributed consensus problem.
How the Lock System Works
Every lock in Redis follows a simple pattern:
Lock Key Format: lock_event_{resource}_{entity_id}
For example: lock_event_transaction_A2547 when a user is editing Transaction #A2547.
Lock Lifetime (TTL):
- Quick edits: 10 seconds
- Data imports: 30 seconds
- Batch approvals: 15 minutes
Retry Strategy: If the lock is busy, we retry 3 times with exponential backoff (50ms, 100ms, 200ms). After that, we tell the user someone else is working on it.
Lock Metadata: We store what operation is holding the lock (create/update/delete/approve). This lets us give users helpful error messages like "This record is being approved" instead of generic "Resource locked" errors.
How Locks Work in Practice
When a user tries to edit a financial record, here's what happens:
- System generates a lock key based on the record identifier
- Check Redis: Is this locked? If yes, what operation is holding it?
- If available, create the lock with a unique token and store what operation is happening
- Set TTL so it auto-expires (prevents orphaned locks if something crashes)
The lock stored in Redis contains:
- A unique key identifying the specific record
- A random token proving ownership
- Metadata about the operation type (edit/approve/delete)
The token ensures only the lock owner can release it. The operation metadata helps show helpful error messages ("Record is being edited" instead of generic "Resource locked").
Two Ways to Release Locks
Pattern 1: Auto-Release (For Quick Operations)
For normal edits that finish in a few seconds:
- Acquire the lock
- Do the update
- Automatically release when done (even if something crashes)
- TTL: 10-30 seconds
Examples: Editing a field, updating an amount, creating a new record
Pattern 2: Manual Release (For Background Jobs)
For batch operations that take 15+ minutes:
The Problem: When a user initiates a large batch operation, the web request returns immediately, but the actual processing happens in a background job. If we auto-release the lock when the web request finishes, the lock is gone before the job even starts.
The Solution:
- Web request acquires the lock
- Store the lock token in the database
- Pass the token to the background job via message queue
- Background job releases the lock when it finishes
This way, the lock survives across the process boundary. If the job crashes, the lock expires after 15 minutes (TTL).
Safe Lock Release with Lua Script:
The manual release uses a Lua script to safely release locks. According to Redis distributed locks documentation, this is the correct way to avoid accidentally releasing another client's lock:
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
This script ensures we only delete the lock if the token matches—preventing us from accidentally releasing a lock that belongs to another process.
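The script's compare-and-delete semantics can be mirrored in plain Go as an in-memory stand-in (not the Redis call itself — atomicity in the real system comes from Redis executing the Lua script as a single unit):

```go
package main

import "fmt"

// locks stands in for the Redis keyspace: lock key -> owner token.
var locks = map[string]string{}

// safeRelease deletes the lock only if the caller's token matches the stored
// one, mirroring the Lua script above. Like the script, it returns 1 on
// delete and 0 otherwise. In real Redis the get and del must run atomically
// via the script; this map version is only for illustration.
func safeRelease(key, token string) int {
	if locks[key] == token {
		delete(locks, key)
		return 1
	}
	return 0
}

func main() {
	locks["lock_event_transaction_A2547"] = "token-A"

	fmt.Println(safeRelease("lock_event_transaction_A2547", "token-B")) // 0: wrong owner, lock kept
	fmt.Println(safeRelease("lock_event_transaction_A2547", "token-A")) // 1: owner releases
}
```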
When locks get released:
- Job completes successfully → Released immediately
- Job fails after max retries → Released (can retry later with fresh lock)
- System crashes → Redis auto-expires after TTL
Lessons Learned: Building Concurrency Control for Financial Systems
When to Use Which Lock
Use Pessimistic Locking (Redis) when:
- Multiple users are actively editing the same records right now
- You need to block concurrent operations completely
- Operations might take a while or run in background jobs
- You need locks to survive across different servers/processes
Use Optimistic Locking (Version Check) when:
- You want to detect if data changed while user was away
- Conflicts are rare and you don't want to block everyone
- Operations are quick and you just need to verify data freshness at commit time
- You want defense-in-depth alongside pessimistic locks
What We Got Right
- TTL on everything - No orphaned locks if something crashes
- Exponential backoff retries - Give legitimate operations a chance to finish
- Operation metadata in locks - Users get helpful error messages
- Two-phase approach - Pessimistic + Optimistic catches all scenarios
- Lock monitoring - Track acquisition times, contention rates, timeouts
- Graceful Redis failures - Circuit breakers prevent cascading failures
The Trade-offs We Made
Performance vs Safety: Yes, locking adds latency. But in financial systems, correctness matters more than speed. We'd rather users wait a fraction of a second than risk data corruption.
Complexity vs Reliability: Redis adds infrastructure to maintain. But it's worth it to avoid database connection exhaustion and support async workflows.
Fine-grained locks: We lock individual records, not entire tables. This reduces contention but requires careful key design.
Final Thoughts
In financial systems, data integrity isn't optional. Every record must be accurate, every change must be tracked, and every concurrent access must be controlled.
The two-lock approach—pessimistic for real-time conflicts, optimistic for stale data—gives us defense in depth. And by choosing Redis over database locks, we can support the long-running batch operations that financial workflows require.
Is it more complex than no locking? Absolutely. Is it worth it? When dealing with financial data, regulatory compliance, and audit trails, the answer is always yes.