<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bhagirath</title>
    <description>The latest articles on DEV Community by Bhagirath (@solvian97).</description>
    <link>https://dev.to/solvian97</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3779550%2F2e32bda6-f850-47ec-96d7-b707aef97f4e.jpeg</url>
      <title>DEV Community: Bhagirath</title>
      <link>https://dev.to/solvian97</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/solvian97"/>
    <language>en</language>
    <item>
      <title>Exactly Once Is a Lie: Managing Financial Invariants Under Concurrency</title>
      <dc:creator>Bhagirath</dc:creator>
      <pubDate>Wed, 18 Feb 2026 13:51:26 +0000</pubDate>
      <link>https://dev.to/solvian97/exactly-once-is-a-lie-managing-financial-invariants-under-concurrency-5bdl</link>
      <guid>https://dev.to/solvian97/exactly-once-is-a-lie-managing-financial-invariants-under-concurrency-5bdl</guid>
      <description>&lt;p&gt;Most engineers believe in “exactly once” execution — until they build a money movement system.&lt;/p&gt;

&lt;p&gt;Then reality teaches them otherwise.&lt;/p&gt;

&lt;p&gt;In distributed financial systems, requests time out. Providers retry. Webhooks arrive late. Networks drop responses. And somewhere in between, money is in motion.&lt;/p&gt;

&lt;p&gt;The real danger isn’t failure.&lt;br&gt;
The real danger is misclassifying the unknown as failure.&lt;/p&gt;

&lt;p&gt;This article breaks down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why exactly-once is a lie&lt;/li&gt;
&lt;li&gt;Why UNKNOWN is more dangerous than FAILURE&lt;/li&gt;
&lt;li&gt;How to model financial invariants correctly&lt;/li&gt;
&lt;li&gt;How to diagnose and scale when settlement delays increase&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Part 1: The Problem — Fast UX, Inconsistent Ledger&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine this scenario:&lt;/p&gt;

&lt;p&gt;Aman has ₹1000. He transfers ₹1000 to Naman.&lt;br&gt;
Your system sends the transfer request to the payment provider.&lt;br&gt;
The provider doesn’t respond within the timeout.&lt;br&gt;
Now you have three possible realities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider processed the transfer successfully.&lt;/li&gt;
&lt;li&gt;The provider dropped the request.&lt;/li&gt;
&lt;li&gt;The provider is still processing it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t know which one.&lt;br&gt;
That’s the UNKNOWN state.&lt;/p&gt;

&lt;p&gt;Most systems simplify this into:&lt;br&gt;
-&amp;gt;SUCCESS&lt;br&gt;
-&amp;gt;FAILURE&lt;br&gt;
-&amp;gt;UNKNOWN&lt;/p&gt;

&lt;p&gt;And then they treat UNKNOWN as FAILURE.&lt;br&gt;
So the system refunds ₹1000 to Aman.&lt;br&gt;
Aman sees ₹1000 available again and sends another transfer.&lt;/p&gt;

&lt;p&gt;Later, the original provider request succeeds.&lt;br&gt;
Total transferred: ₹2000. Balance Aman actually had: ₹1000.&lt;/p&gt;

&lt;p&gt;Now someone must absorb the ₹1000 loss.&lt;/p&gt;

&lt;p&gt;This isn’t a UX glitch.&lt;br&gt;
This is a violation of a financial invariant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Invariant That Must Never Break&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In financial systems:&lt;/p&gt;

&lt;p&gt;Total debits must never exceed the available balance.&lt;br&gt;
Or more formally:&lt;br&gt;
Available Balance + Locked Balance = Ledger Balance must always hold, and none of the three may go negative under concurrency.&lt;/p&gt;

&lt;p&gt;If your state model allows temporary misclassification of UNKNOWN as FAILURE, you’re silently enabling double-spend.&lt;br&gt;
That’s how real financial losses happen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why “Exactly Once” Is a Lie&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Exactly-once semantics don’t exist across network boundaries. What you actually get is at-least-once delivery with delayed or duplicated signals.&lt;/p&gt;

&lt;p&gt;Safety doesn’t come from transport guarantees. It comes from system design.&lt;/p&gt;

&lt;p&gt;You compensate using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotency keys&lt;/li&gt;
&lt;li&gt;Deduplication logic&lt;/li&gt;
&lt;li&gt;Reconciliation jobs&lt;/li&gt;
&lt;li&gt;Strict ledger invariants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Exactly-once is a transport fantasy. Financial safety is an accounting discipline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymk0b55pbrdj11flmp36.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymk0b55pbrdj11flmp36.jpg" alt=" " width="800" height="1869"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Part 2: Modeling the UNKNOWN Correctly
&lt;/h2&gt;


&lt;p&gt;UNKNOWN isn’t failure.&lt;br&gt;
UNKNOWN is unsettled liability.&lt;br&gt;
That means funds must not be considered available until final confirmation.&lt;br&gt;
Instead of collapsing states into three buckets, model them as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;INITIATED&lt;/li&gt;
&lt;li&gt;PENDING_EXTERNAL_CONFIRMATION&lt;/li&gt;
&lt;li&gt;SETTLED_SUCCESS&lt;/li&gt;
&lt;li&gt;SETTLED_FAILURE&lt;/li&gt;
&lt;/ul&gt;
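&lt;p&gt;A minimal sketch of these four states as a state machine. The &lt;code&gt;ALLOWED&lt;/code&gt; table and the helper names &lt;code&gt;transition&lt;/code&gt; and &lt;code&gt;on_timeout&lt;/code&gt; are illustrative assumptions, not part of any specific framework. The point it tries to show is that a timeout never moves a transfer into a terminal state.&lt;/p&gt;

```python
from enum import Enum


class TransferState(Enum):
    INITIATED = "INITIATED"
    PENDING_EXTERNAL_CONFIRMATION = "PENDING_EXTERNAL_CONFIRMATION"
    SETTLED_SUCCESS = "SETTLED_SUCCESS"
    SETTLED_FAILURE = "SETTLED_FAILURE"


# Assumed transition table. Terminal states have no outgoing edges, so a late
# or duplicate signal can never move a transfer that has already settled.
ALLOWED = {
    TransferState.INITIATED: {TransferState.PENDING_EXTERNAL_CONFIRMATION},
    TransferState.PENDING_EXTERNAL_CONFIRMATION: {
        TransferState.SETTLED_SUCCESS,
        TransferState.SETTLED_FAILURE,
    },
    TransferState.SETTLED_SUCCESS: set(),
    TransferState.SETTLED_FAILURE: set(),
}


def transition(current: TransferState, target: TransferState) -> TransferState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def on_timeout(current: TransferState) -> TransferState:
    # A timeout says nothing about the outcome, so the state stays where it is
    # instead of being forced to SETTLED_FAILURE.
    return current
```

&lt;p&gt;Whether a provider can reject a request synchronously (INITIATED straight to SETTLED_FAILURE) depends on the provider, so that edge is left out of this sketch.&lt;/p&gt;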

&lt;p&gt;The key rule:&lt;/p&gt;

&lt;p&gt;Until success or confirmed failure, funds must remain locked.&lt;br&gt;
So instead of refunding immediately, the system:&lt;br&gt;
Moves ₹1000 from Available → Locked&lt;br&gt;
Displays:&lt;br&gt;
Available: ₹0&lt;br&gt;
Locked: ₹1000 (Pending)&lt;br&gt;
Total: ₹1000&lt;/p&gt;

&lt;p&gt;Now Aman understands reality: his money is in motion.&lt;/p&gt;

&lt;p&gt;If the provider later confirms failure → release locked funds.&lt;br&gt;
If the provider confirms success → settle permanently.&lt;/p&gt;

&lt;p&gt;No invariant is broken.&lt;/p&gt;
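&lt;p&gt;The lock, release, and settle flow could look roughly like this. It is a single-threaded, in-memory sketch: the &lt;code&gt;Account&lt;/code&gt; class and its method names are hypothetical, amounts are whole rupees stored as integers, and a real ledger would run each method inside a database transaction.&lt;/p&gt;

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    pass


@dataclass
class Account:
    available: int  # whole rupees as integers, to avoid float rounding
    locked: int = 0

    @property
    def total(self) -> int:
        return self.available + self.locked

    def lock(self, amount: int) -> None:
        """Available -> Locked, done before the provider is called."""
        if amount > self.available:
            raise InsufficientFunds("amount exceeds available balance")
        self.available -= amount
        self.locked += amount

    def settle_success(self, amount: int) -> None:
        """Provider confirmed success: locked funds leave the account."""
        if amount > self.locked:
            raise ValueError("cannot settle more than is locked")
        self.locked -= amount

    def settle_failure(self, amount: int) -> None:
        """Provider confirmed failure: locked funds become available again."""
        if amount > self.locked:
            raise ValueError("cannot release more than is locked")
        self.locked -= amount
        self.available += amount


aman = Account(available=1000)
aman.lock(1000)  # available 0, locked 1000, total still 1000
```

&lt;p&gt;While the first transfer is pending, a second &lt;code&gt;aman.lock(1000)&lt;/code&gt; raises &lt;code&gt;InsufficientFunds&lt;/code&gt;, which is exactly the double-spend from Part 1 being refused.&lt;/p&gt;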

&lt;p&gt;&lt;strong&gt;Why This Is Also a Product Decision&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many teams chase “fast UX.”&lt;/p&gt;

&lt;p&gt;They show immediate success or immediate rollback, because users “don’t like waiting.”&lt;br&gt;
But showing fake certainty creates real risk.&lt;/p&gt;

&lt;p&gt;Financial UX must reflect system reality.&lt;br&gt;
It’s better to show “Waiting for external confirmation” than “Success” → “Oops, failed” — or worse, silent financial exposure.&lt;br&gt;
This is where engineering and product must align.&lt;br&gt;
Accuracy over illusion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a Minimal Safe Architecture Looks Like&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ledger Service (Source of Truth)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Maintains Available and Locked balances&lt;/li&gt;
&lt;li&gt;Enforces balance invariants&lt;/li&gt;
&lt;li&gt;Owns all state transitions that affect money&lt;/li&gt;
&lt;li&gt;Ensures: Available + Locked = Ledger Balance&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Transfer Service&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Generates a unique transfer ID before any external call&lt;/li&gt;
&lt;li&gt;Persists the transaction in INITIATED state&lt;/li&gt;
&lt;li&gt;Moves funds from Available → Locked&lt;/li&gt;
&lt;li&gt;Calls the provider only after the state is safely stored&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Provider Adapter Layer&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Attaches idempotency key to every outbound request&lt;/li&gt;
&lt;li&gt;Handles retries safely&lt;/li&gt;
&lt;li&gt;Never assumes timeout equals failure&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Webhook Handler&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Processes provider callbacks idempotently&lt;/li&gt;
&lt;li&gt;Validates transfer ID before any state change&lt;/li&gt;
&lt;li&gt;Transitions PENDING_EXTERNAL_CONFIRMATION → SETTLED_SUCCESS or SETTLED_FAILURE&lt;/li&gt;
&lt;li&gt;Never updates balances without going through ledger rules&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Reconciliation Worker&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Periodically scans stale PENDING transactions&lt;/li&gt;
&lt;li&gt;Queries provider for final status&lt;/li&gt;
&lt;li&gt;Resolves drift between internal state and external settlement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical rule is simple:&lt;br&gt;
Only the ledger is allowed to change balances.&lt;br&gt;
External systems influence state — they do not define truth.&lt;br&gt;
When this boundary is clear, invariants remain enforceable even under retries, delays, or duplicate callbacks.&lt;/p&gt;
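&lt;p&gt;As one possible illustration of the webhook handler rules, the sketch below applies a provider callback at most once per transfer. The in-memory &lt;code&gt;transfers&lt;/code&gt; dict and the return labels are assumptions made for the example. The balance update itself is left out, since it would go through the ledger.&lt;/p&gt;

```python
PENDING = "PENDING_EXTERNAL_CONFIRMATION"
TERMINAL = {"SETTLED_SUCCESS", "SETTLED_FAILURE"}


def handle_webhook(transfers: dict, transfer_id: str, outcome: str) -> str:
    """Apply a provider callback at most once per transfer.

    `transfers` maps transfer ID to current state.
    Returns "applied", "duplicate", or "rejected".
    """
    if outcome not in TERMINAL:
        return "rejected"  # not a final status, nothing to settle
    state = transfers.get(transfer_id)
    if state is None:
        return "rejected"  # unknown transfer ID: change nothing
    if state in TERMINAL:
        return "duplicate"  # already settled: ignore the replay
    if state != PENDING:
        return "rejected"  # callback arrived before the transfer was pending
    transfers[transfer_id] = outcome
    return "applied"
```

&lt;p&gt;A callback that contradicts an already settled state is also reported as a duplicate here. In practice it would probably be flagged for reconciliation rather than dropped silently.&lt;/p&gt;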


&lt;h2&gt;
  
  
  Part 3: When UNKNOWN Ratio Starts Increasing
&lt;/h2&gt;


&lt;p&gt;Now let’s move from correctness to operations.&lt;br&gt;
Suppose your metrics show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error rate: flat&lt;/li&gt;
&lt;li&gt;CPU: normal&lt;/li&gt;
&lt;li&gt;DB load: normal&lt;/li&gt;
&lt;li&gt;UNKNOWN transactions: increasing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where most engineers start guessing.&lt;br&gt;
Instead, reason through signals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Provider Delayed Finality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If webhooks arrive in bursts after a delay while system load is normal, the likely causes are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provider queue backlog&lt;/li&gt;
&lt;li&gt;Network jitter&lt;/li&gt;
&lt;li&gt;Provider-side throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is delayed finality, not failure.&lt;br&gt;
The risk here is exposure accumulation.&lt;/p&gt;

&lt;p&gt;If you process 10,000 transfers/hour and even 0.1% remain UNKNOWN, that’s 10 pending transfers per hour.&lt;br&gt;
At ₹40,000 average ticket size → ₹4 lakh exposure accumulating per hour.&lt;/p&gt;

&lt;p&gt;Delayed finality becomes financial exposure.&lt;/p&gt;
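&lt;p&gt;The arithmetic above, written out as a small helper. The function name is made up for this example, and the UNKNOWN ratio is passed in basis points (0.1% = 10 bps) so the calculation stays in integers.&lt;/p&gt;

```python
def hourly_exposure(transfers_per_hour: int, unknown_bps: int, avg_ticket: int) -> int:
    """New locked exposure per hour, in rupees.

    unknown_bps is the UNKNOWN ratio in basis points (1 bps = 0.01%).
    """
    pending_transfers = transfers_per_hour * unknown_bps // 10_000
    return pending_transfers * avg_ticket


# 10,000 transfers/hour, 0.1% UNKNOWN, average ticket of 40,000 rupees
print(hourly_exposure(10_000, 10, 40_000))  # prints 400000, i.e. 4 lakh
```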

&lt;p&gt;&lt;strong&gt;A Real Pattern Seen in Production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In one production environment, the UNKNOWN ratio increased from 0.2% to nearly 3% within 30 minutes.&lt;/p&gt;

&lt;p&gt;Error rate remained flat.&lt;br&gt;
CPU usage was stable.&lt;br&gt;
Database load looked normal.&lt;/p&gt;

&lt;p&gt;At first glance, nothing appeared broken.&lt;/p&gt;

&lt;p&gt;The root cause was provider-side queue congestion during peak traffic. Webhooks were delayed by 8–12 minutes due to backlog.&lt;br&gt;
System throughput was approximately 12,000 transfers per hour.&lt;/p&gt;

&lt;p&gt;Average ticket size was around ₹35,000.&lt;br&gt;
Within 45 minutes, locked exposure crossed ₹1 crore.&lt;/p&gt;

&lt;p&gt;No invariant had broken. No double-spend occurred.&lt;br&gt;
But financial exposure was accumulating silently.&lt;/p&gt;

&lt;p&gt;Traffic was throttled before exposure crossed internal safety thresholds.&lt;br&gt;
The lesson wasn’t about error handling.&lt;/p&gt;

&lt;p&gt;It was about understanding that UNKNOWN is time-sensitive risk.&lt;br&gt;
When settlement latency stretches, exposure grows — even if error metrics stay green.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Internal Bottleneck&lt;/strong&gt;&lt;br&gt;
If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Webhook arrival steady&lt;/li&gt;
&lt;li&gt;UNKNOWN increasing&lt;/li&gt;
&lt;li&gt;DB locks increasing&lt;/li&gt;
&lt;li&gt;Queue depth rising&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the problem is internal.&lt;/p&gt;

&lt;p&gt;Possible causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lock contention&lt;/li&gt;
&lt;li&gt;Ledger write bottleneck&lt;/li&gt;
&lt;li&gt;Idempotency table hotspot&lt;/li&gt;
&lt;li&gt;Serialization conflict&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t provider delay. This is internal conflict.&lt;br&gt;
And if you misdiagnose it as provider delay, you scale the wrong layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Mature Systems Do&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They don’t just monitor “error rate.”&lt;/p&gt;

&lt;p&gt;They monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UNKNOWN ratio&lt;/li&gt;
&lt;li&gt;Settlement P95 time&lt;/li&gt;
&lt;li&gt;Exposure amount (₹ locked)&lt;/li&gt;
&lt;li&gt;Reconciliation lag&lt;/li&gt;
&lt;li&gt;Webhook processing latency&lt;/li&gt;
&lt;li&gt;DB lock wait time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because financial correctness isn’t binary. It’s time-sensitive.&lt;/p&gt;
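&lt;p&gt;Three of these signals are simple to compute once the raw numbers are available. The helpers below are a rough sketch with hypothetical names. The percentile uses the nearest-rank method, which is only one of several common definitions.&lt;/p&gt;

```python
def unknown_ratio(pending_count: int, total_count: int) -> float:
    """Share of transfers still awaiting external confirmation."""
    return pending_count / total_count if total_count else 0.0


def p95(values: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def locked_exposure(pending_amounts: list) -> int:
    """Total amount currently locked in pending transfers."""
    return sum(pending_amounts)
```

&lt;p&gt;Alerting on the trend of these values, not only on their absolute level, would have caught the production pattern described earlier while error rate, CPU, and database load all stayed flat.&lt;/p&gt;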


&lt;h2&gt;
  
  
  Part 4: Scaling Without Breaking Invariants
&lt;/h2&gt;


&lt;p&gt;Now comes the hard part: How do you scale without relaxing safety?&lt;br&gt;
You can’t remove UNKNOWN state. You must contain it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Exposure Caps&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Limit total locked funds per provider.&lt;/li&gt;
&lt;li&gt;If exposure crosses the threshold:
&lt;ul&gt;
&lt;li&gt;Slow down new transfers&lt;/li&gt;
&lt;li&gt;Or route via a secondary provider&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is risk-based throttling.&lt;/p&gt;
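&lt;p&gt;One way to sketch an exposure cap with fallback routing. The provider names, the cap values, and the &lt;code&gt;route_transfer&lt;/code&gt; function are all hypothetical. Returning &lt;code&gt;None&lt;/code&gt; stands in for "slow down or reject the new transfer".&lt;/p&gt;

```python
def route_transfer(locked_by_provider: dict, caps: dict, amount: int, preferred: list):
    """Pick the first provider whose locked exposure stays within its cap.

    Returns the provider name, or None when every provider would exceed
    its cap, in which case the caller should throttle instead of sending.
    """
    for provider in preferred:
        projected = locked_by_provider.get(provider, 0) + amount
        if caps[provider] >= projected:
            return provider
    return None


locked = {"primary": 900, "secondary": 100}
caps = {"primary": 1000, "secondary": 1000}
print(route_transfer(locked, caps, 200, ["primary", "secondary"]))  # prints secondary
```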

&lt;ol start="2"&gt;
&lt;li&gt;Circuit Breakers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If settlement latency crosses threshold:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop initiating new transfers&lt;/li&gt;
&lt;li&gt;Notify operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Better to be temporarily unavailable than financially insolvent.&lt;/p&gt;
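&lt;p&gt;A deliberately simple breaker keyed on settlement latency might look like this. The class name, the threshold, and the &lt;code&gt;notify&lt;/code&gt; callback are assumptions for the sketch. Production breakers usually add a half-open state and some hysteresis, which are omitted here.&lt;/p&gt;

```python
class SettlementBreaker:
    """Blocks new transfers while settlement P95 is above a threshold."""

    def __init__(self, max_p95_seconds: float, notify=print):
        self.max_p95_seconds = max_p95_seconds
        self.notify = notify  # hypothetical hook for paging operations
        self.is_open = False

    def record_p95(self, p95_seconds: float) -> None:
        should_open = p95_seconds > self.max_p95_seconds
        if should_open and not self.is_open:
            # Alert once, on the transition from closed to open.
            self.notify(f"settlement P95 {p95_seconds}s above {self.max_p95_seconds}s")
        self.is_open = should_open

    def allow_new_transfer(self) -> bool:
        return not self.is_open
```

&lt;p&gt;The transfer service would call &lt;code&gt;allow_new_transfer()&lt;/code&gt; before locking funds, so a tripped breaker stops exposure from growing instead of failing transfers midway.&lt;/p&gt;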

&lt;ol start="3"&gt;
&lt;li&gt;Automated Reconciliation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scheduled job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Re-check every transfer that has been PENDING for more than X minutes&lt;/li&gt;
&lt;li&gt;Query provider status&lt;/li&gt;
&lt;li&gt;Auto-settle where possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never rely solely on webhook arrival.&lt;/p&gt;
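&lt;p&gt;A reconciliation pass could be sketched as below. &lt;code&gt;query_provider&lt;/code&gt; is a stand-in for whatever status lookup your provider offers, and the in-memory &lt;code&gt;pending&lt;/code&gt; dict replaces a real database query. Both are assumptions.&lt;/p&gt;

```python
FINAL_STATUSES = ("SETTLED_SUCCESS", "SETTLED_FAILURE")


def reconcile(pending: dict, now: float, max_age_seconds: float, query_provider) -> dict:
    """Resolve stale pending transfers by asking the provider directly.

    `pending` maps transfer ID to creation time (epoch seconds).
    Returns {transfer_id: final_status} for transfers that can be settled.
    """
    resolved = {}
    for transfer_id, created_at in pending.items():
        if now - created_at > max_age_seconds:
            status = query_provider(transfer_id)
            if status in FINAL_STATUSES:
                resolved[transfer_id] = status
            # Any other answer leaves the transfer pending and its funds locked.
    return resolved
```

&lt;p&gt;The returned statuses would then be applied through the same ledger path as webhook results, so a webhook and a reconciliation result for the same transfer cannot settle it twice.&lt;/p&gt;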

&lt;ol start="4"&gt;
&lt;li&gt;Idempotency Everywhere&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every outbound request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unique transfer ID&lt;/li&gt;
&lt;li&gt;Stored before calling provider&lt;/li&gt;
&lt;li&gt;Used to reconcile duplicates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But remember: idempotency prevents duplicate execution. It doesn’t guarantee exactly-once, and it doesn’t resolve an UNKNOWN outcome.&lt;/p&gt;

&lt;p&gt;Keeping the money safe is the job of your ledger invariant.&lt;/p&gt;
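&lt;p&gt;To make the idempotency steps concrete, here is a sketch against a fake provider. &lt;code&gt;FakeProvider&lt;/code&gt; only simulates a provider that honours idempotency keys, and every name in it is hypothetical. The retry loop resends blindly to imitate a client that keeps timing out.&lt;/p&gt;

```python
import uuid


class FakeProvider:
    """Stand-in for a provider that honours idempotency keys."""

    def __init__(self):
        self.executed = {}  # idempotency key -> amount actually moved

    def transfer(self, idempotency_key: str, amount: int) -> str:
        # A repeated key returns the first result instead of moving money again.
        if idempotency_key not in self.executed:
            self.executed[idempotency_key] = amount
        return "ACCEPTED"


def start_transfer(store: dict, amount: int) -> str:
    """Persist the transfer under a fresh ID before any external call."""
    transfer_id = str(uuid.uuid4())
    store[transfer_id] = {"amount": amount, "state": "INITIATED"}
    return transfer_id


def send_with_retries(provider, store: dict, transfer_id: str, attempts: int = 3) -> None:
    for _ in range(attempts):
        # Every attempt reuses the stored ID as the idempotency key.
        provider.transfer(transfer_id, store[transfer_id]["amount"])
    store[transfer_id]["state"] = "PENDING_EXTERNAL_CONFIRMATION"
```

&lt;p&gt;Three sends of the same ₹1000 transfer leave exactly one entry in &lt;code&gt;provider.executed&lt;/code&gt;. The transfer still ends in PENDING_EXTERNAL_CONFIRMATION, because deduplication alone says nothing about the final outcome.&lt;/p&gt;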

&lt;ol start="5"&gt;
&lt;li&gt;UX That Reflects Truth&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of:&lt;br&gt;
Balance: ₹1000&lt;br&gt;
After transfer: ₹0&lt;br&gt;
Show:&lt;br&gt;
Available: ₹0&lt;br&gt;
Locked: ₹1000 (Pending confirmation)&lt;br&gt;
Total: ₹1000&lt;br&gt;
The user sees reality. No fake certainty. No silent exposure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real Lesson&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;UNKNOWN isn’t a temporary inconvenience.&lt;/p&gt;

&lt;p&gt;It’s a state that tests whether your system respects financial invariants.&lt;/p&gt;

&lt;p&gt;If you treat UNKNOWN as FAILURE, you risk double-spend.&lt;br&gt;
If you treat UNKNOWN as SUCCESS, you risk false confirmation.&lt;br&gt;
If you ignore UNKNOWN growth, you risk accumulating exposure silently.&lt;/p&gt;

&lt;p&gt;Exactly-once execution is a comforting myth.&lt;br&gt;
What actually protects financial systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strict ledger invariants&lt;/li&gt;
&lt;li&gt;Locked funds modeling&lt;/li&gt;
&lt;li&gt;Exposure monitoring&lt;/li&gt;
&lt;li&gt;Delayed finality handling&lt;/li&gt;
&lt;li&gt;Honest UX&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most systems don’t collapse because engineers misunderstand distributed systems.&lt;/p&gt;

&lt;p&gt;They collapse because they optimize for perceived speed before protecting financial invariants.&lt;/p&gt;

&lt;p&gt;Exactly-once execution isn’t real.&lt;br&gt;
Delayed signals are.&lt;br&gt;
Retries are.&lt;br&gt;
Unknown states are.&lt;/p&gt;

&lt;p&gt;In financial systems, invariants don’t care about your timeout values or UX shortcuts.&lt;/p&gt;

&lt;p&gt;They either hold — or money leaks.&lt;/p&gt;

&lt;p&gt;And when money leaks, theory stops mattering.&lt;/p&gt;

</description>
      <category>fintech</category>
      <category>backend</category>
      <category>web3</category>
      <category>infrastructure</category>
    </item>
  </channel>
</rss>
