<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Praval Parikh</title>
    <description>The latest articles on DEV Community by Praval Parikh (@praval_parikh).</description>
    <link>https://dev.to/praval_parikh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3244418%2F4ebadf3b-0d2b-4779-b987-8b95b67031aa.jpg</url>
      <title>DEV Community: Praval Parikh</title>
      <link>https://dev.to/praval_parikh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/praval_parikh"/>
    <language>en</language>
    <item>
      <title>I Built a Scalable Financial Transaction System That Stays Correct Under Load</title>
      <dc:creator>Praval Parikh</dc:creator>
      <pubDate>Sat, 03 Jan 2026 13:28:12 +0000</pubDate>
      <link>https://dev.to/praval_parikh/i-built-a-scalable-financial-transaction-system-that-stays-correct-under-load-3b03</link>
      <guid>https://dev.to/praval_parikh/i-built-a-scalable-financial-transaction-system-that-stays-correct-under-load-3b03</guid>
      <description>&lt;p&gt;Financial systems rarely fail because they are slow.&lt;br&gt;&lt;br&gt;
They fail because they are &lt;strong&gt;wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A duplicated transaction, an out-of-order withdrawal, or a race condition in a balance update is enough to permanently break user trust. As traffic grows, correctness becomes harder not easier.&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through how I designed a &lt;strong&gt;scalable financial transaction system&lt;/strong&gt; that prioritizes &lt;strong&gt;correctness first&lt;/strong&gt;, while still achieving &lt;strong&gt;high throughput under heavy concurrency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Source code:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/plebsicle/microfin" rel="noopener noreferrer"&gt;https://github.com/plebsicle/microfin&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Why Naive Financial Systems Break at Scale&lt;/h2&gt;

&lt;p&gt;A typical starting point looks simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A request hits the server
&lt;/li&gt;
&lt;li&gt;The server validates it
&lt;/li&gt;
&lt;li&gt;A database transaction updates the balance
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works fine at low traffic. Under concurrency, it quickly breaks.&lt;/p&gt;
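&lt;p&gt;The lost-update problem is easy to reproduce. Here is a minimal Python sketch (illustrative only, not code from the repo) of the naive read-modify-write pattern racing with itself:&lt;/p&gt;

```python
import threading
import time

balance = {"amount": 100}

def naive_deposit(amount):
    # Read-modify-write with no coordination: two threads can both
    # read the same starting balance, and one update is lost.
    current = balance["amount"]
    time.sleep(0.05)  # widen the race window for the demo
    balance["amount"] = current + amount

threads = [threading.Thread(target=naive_deposit, args=(50,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two deposits of 50 should give 200, but the lost update leaves 150.
print(balance["amount"])
```

&lt;p&gt;Both threads read the starting balance before either writes, so one deposit simply vanishes: the exact failure a database transaction alone does not prevent when the read happens outside it.&lt;/p&gt;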

&lt;p&gt;Common failure modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple concurrent updates to the same balance&lt;/li&gt;
&lt;li&gt;Database connection exhaustion&lt;/li&gt;
&lt;li&gt;Lock contention increasing latency&lt;/li&gt;
&lt;li&gt;Race conditions causing incorrect balances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with database transactions, allowing &lt;strong&gt;concurrent writes to shared financial state&lt;/strong&gt; is dangerous. Locking harder reduces throughput, and scaling vertically only delays the problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Direct concurrent writes to financial state do not scale safely.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Defining the Non-Negotiables&lt;/h2&gt;

&lt;p&gt;Before choosing any technology, I defined the system requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong consistency for account balances
&lt;/li&gt;
&lt;li&gt;Sequential execution of transactions per user/account
&lt;/li&gt;
&lt;li&gt;High throughput under concurrent load
&lt;/li&gt;
&lt;li&gt;Horizontal scalability
&lt;/li&gt;
&lt;li&gt;Fault tolerance
&lt;/li&gt;
&lt;li&gt;A system that is easy to reason about
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every architectural decision follows from these constraints.&lt;/p&gt;




&lt;h2&gt;High-Level Architecture&lt;/h2&gt;

&lt;p&gt;The key idea is to &lt;strong&gt;separate request ingestion from state mutation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests are accepted and validated by application servers&lt;/li&gt;
&lt;li&gt;Valid transactions are queued&lt;/li&gt;
&lt;li&gt;Transactions are processed sequentially per user&lt;/li&gt;
&lt;li&gt;Durable state is updated in the database&lt;/li&gt;
&lt;li&gt;Caching is used purely as a performance optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows the system to absorb traffic spikes without immediately mutating shared state.&lt;/p&gt;




&lt;h2&gt;System Architecture Diagram&lt;/h2&gt;

&lt;p&gt;The diagram below shows the high-level architecture of the system and how requests flow through different components, from request ingestion to durable storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw84zjuciavze3sna4jdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw84zjuciavze3sna4jdu.png" alt="High-level architecture of the financial transaction system" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each incoming request is first load-balanced by Nginx, validated by application servers, and then published to Kafka. Kafka enforces strict per-user ordering before workers update PostgreSQL (the system of record) and Redis (the cache).&lt;/p&gt;




&lt;h2&gt;The Core Insight: Ordering Beats Locking&lt;/h2&gt;

&lt;p&gt;The most important insight in this design is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Correct ordering is more scalable than locking.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of allowing concurrent updates and then trying to protect state with locks, the system ensures that &lt;strong&gt;transactions for a given user are never processed concurrently&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is achieved through &lt;strong&gt;deterministic ordering&lt;/strong&gt;, not mutual exclusion.&lt;/p&gt;




&lt;h2&gt;Enforcing Sequential Execution with Kafka Partitioning&lt;/h2&gt;

&lt;p&gt;Each transaction is associated with a &lt;strong&gt;user ID / account number&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
A hash of this identifier determines which Kafka partition the transaction is sent to.&lt;/p&gt;

&lt;p&gt;This gives us three critical guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All transactions for a user always go to the same partition&lt;/li&gt;
&lt;li&gt;Within a consumer group, each partition is consumed by exactly one worker&lt;/li&gt;
&lt;li&gt;Kafka guarantees ordering within a partition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No two transactions for the same user are processed concurrently&lt;/li&gt;
&lt;li&gt;Race conditions are eliminated&lt;/li&gt;
&lt;li&gt;Database-level locking is unnecessary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scaling is achieved by increasing the number of partitions and workers, while preserving per-user ordering.&lt;/p&gt;
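&lt;p&gt;The partition-assignment property can be sketched in a few lines. Kafka's default partitioner applies murmur2 to the message key; the stable hash below is a stand-in, not Kafka's exact algorithm, but it illustrates the same guarantee:&lt;/p&gt;

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; the real topic's partition count may differ

def partition_for(user_id: str) -> int:
    # A deterministic hash of the user ID. Kafka's default partitioner
    # uses murmur2 on the message key, but any stable hash gives the
    # same property: one user always maps to one partition.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# The same user always lands on the same partition, so the single
# consumer of that partition sees the user's transactions in order.
p1 = partition_for("user-42")
p2 = partition_for("user-42")
print(p1 == p2)  # True
```

&lt;p&gt;Because the mapping is a pure function of the key, adding workers never splits one user's transactions across two consumers.&lt;/p&gt;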




&lt;h2&gt;What Happens When a User Double-Clicks?&lt;/h2&gt;

&lt;p&gt;Users double-click buttons, retry requests, or refresh pages during slow network conditions.&lt;/p&gt;

&lt;p&gt;In this system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each request is treated as a distinct transaction&lt;/li&gt;
&lt;li&gt;Multiple similar requests may be enqueued&lt;/li&gt;
&lt;li&gt;They are processed &lt;strong&gt;strictly sequentially&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than rejecting or deduplicating requests at the application layer, the system relies on &lt;strong&gt;deterministic execution order&lt;/strong&gt;: duplicates may still execute, but they execute one at a time, so the resulting state is never corrupted by a race.&lt;/p&gt;




&lt;h2&gt;Why Kafka&lt;/h2&gt;

&lt;p&gt;Kafka isn’t used here as just a queue; it’s a &lt;strong&gt;core correctness mechanism&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Kafka provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong ordering guarantees within partitions
&lt;/li&gt;
&lt;li&gt;Horizontal scalability through partitioning
&lt;/li&gt;
&lt;li&gt;Fault tolerance via replication
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka producers are configured with &lt;strong&gt;idempotency enabled&lt;/strong&gt;. This ensures that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate messages are not written during retries&lt;/li&gt;
&lt;li&gt;Ordering is preserved&lt;/li&gt;
&lt;li&gt;Exactly-once &lt;em&gt;production&lt;/em&gt; semantics are achieved per producer session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is &lt;strong&gt;producer-side idempotency&lt;/strong&gt;, not consumer-side deduplication. Correctness comes from ordering, not from rejecting duplicate-looking requests.&lt;/p&gt;
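&lt;p&gt;As a sketch, the relevant producer settings (librdkafka/confluent-kafka option names; the broker address is a placeholder, and the exact configuration in the repo may differ) look like this:&lt;/p&gt;

```python
# Producer settings for idempotent writes. Connecting requires a running
# broker, so only the configuration is shown here.
producer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "enable.idempotence": True,  # broker dedupes messages on producer retries
    "acks": "all",               # wait for all in-sync replicas
    # With idempotence enabled, retries also preserve per-partition ordering.
}

# Publishing keyed by user ID routes every transaction for that user
# to the same partition:
#   producer.produce("transactions", key=user_id, value=payload)
print(producer_conf["enable.idempotence"])
```

&lt;p&gt;Keying by user ID plus an idempotent producer is what makes retries safe: a retried send can neither duplicate a message nor reorder it past a later one on the same partition.&lt;/p&gt;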




&lt;h2&gt;PostgreSQL as the Source of Truth&lt;/h2&gt;

&lt;p&gt;PostgreSQL acts as the &lt;strong&gt;system of record&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong ACID guarantees
&lt;/li&gt;
&lt;li&gt;Mature transactional behavior
&lt;/li&gt;
&lt;li&gt;Predictable correctness under concurrency
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PgBouncer is used for connection pooling to prevent database exhaustion under heavy load.&lt;/p&gt;

&lt;p&gt;All balance updates occur inside database transactions, ensuring durability and correctness even in failure scenarios.&lt;/p&gt;
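&lt;p&gt;The pattern is a balance check and a conditional update inside one transaction. The runnable sketch below uses sqlite3 purely so it works without a server; the system itself uses PostgreSQL, and the repo's actual schema may differ:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('user-42', 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    # The sufficiency check and the debit happen in one atomic statement,
    # inside one transaction: a crash between steps rolls everything back.
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, account_id, amount),
            )
            if cur.rowcount == 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

print(withdraw(conn, "user-42", 60))  # True: 100 -> 40
print(withdraw(conn, "user-42", 60))  # False: only 40 left
```

&lt;p&gt;Putting the balance condition in the &lt;code&gt;WHERE&lt;/code&gt; clause means the database itself refuses an overdraft; there is no window between reading the balance and writing it.&lt;/p&gt;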




&lt;h2&gt;Redis: Performance Optimization, Not a Dependency&lt;/h2&gt;

&lt;p&gt;Redis is used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session management
&lt;/li&gt;
&lt;li&gt;Caching frequently accessed data
&lt;/li&gt;
&lt;li&gt;Reducing read load on PostgreSQL
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis is &lt;strong&gt;not part of the correctness path&lt;/strong&gt;. There is no distributed transaction protocol between PostgreSQL and Redis. If Redis fails, the system remains correct — only performance is affected.&lt;/p&gt;
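&lt;p&gt;A cache-aside read path makes this concrete. In the sketch below the dictionaries stand in for Redis and PostgreSQL, and the function names are hypothetical, not taken from the repo:&lt;/p&gt;

```python
db = {"user-42": 40}  # stands in for PostgreSQL, the source of truth
cache = {}            # stands in for Redis

def get_balance_from_db(account_id):
    return db[account_id]

def get_balance(account_id, cache_available=True):
    # Redis is an optimization: on a cache miss, or when the cache is
    # down, the read falls through to the database and stays correct.
    if cache_available and account_id in cache:
        return cache[account_id]
    value = get_balance_from_db(account_id)
    if cache_available:
        cache[account_id] = value  # populate for later reads
    return value

print(get_balance("user-42"))                         # from the DB
print(get_balance("user-42"))                         # now from the cache
print(get_balance("user-42", cache_available=False))  # cache "down", still correct
```

&lt;p&gt;Because every path bottoms out in the database, losing the cache costs latency, never correctness.&lt;/p&gt;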




&lt;h2&gt;End-to-End Transaction Flow&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Client sends a transaction request
&lt;/li&gt;
&lt;li&gt;Nginx load-balances the request
&lt;/li&gt;
&lt;li&gt;Application server validates it
&lt;/li&gt;
&lt;li&gt;Transaction is published to Kafka
&lt;/li&gt;
&lt;li&gt;The producer’s partitioner assigns it to a partition by hashing the user key
&lt;/li&gt;
&lt;li&gt;A worker processes transactions sequentially
&lt;/li&gt;
&lt;li&gt;PostgreSQL updates occur inside a transaction
&lt;/li&gt;
&lt;li&gt;Redis is updated opportunistically
&lt;/li&gt;
&lt;li&gt;Response is returned to the client
&lt;/li&gt;
&lt;/ol&gt;
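&lt;p&gt;Steps 6 and 7 are where ordering does the work of a lock. The sketch below, an in-memory stand-in for one partition’s worker rather than repo code, shows sequential processing keeping a balance consistent without any locking:&lt;/p&gt;

```python
# Messages for one partition arrive in the order they were produced.
partition = [
    {"user": "user-42", "op": "deposit",  "amount": 100},
    {"user": "user-42", "op": "withdraw", "amount": 30},
    {"user": "user-42", "op": "withdraw", "amount": 90},
]

balances = {}

def process(msg):
    # Exactly one worker owns this partition, so no lock is needed:
    # ordering alone keeps the balance correct.
    bal = balances.get(msg["user"], 0)
    if msg["op"] == "deposit":
        balances[msg["user"]] = bal + msg["amount"]
    elif msg["op"] == "withdraw" and bal >= msg["amount"]:
        balances[msg["user"]] = bal - msg["amount"]
    # else: reject the withdrawal; the balance never goes negative

for msg in partition:
    process(msg)

print(balances["user-42"])  # 70: the oversized final withdrawal is rejected
```

&lt;p&gt;Run the same messages in any interleaving under the naive concurrent design and the final balance depends on scheduling; run them sequentially and it is deterministic.&lt;/p&gt;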




&lt;h2&gt;Load Testing and Results&lt;/h2&gt;

&lt;p&gt;The system was stress-tested using &lt;strong&gt;K6&lt;/strong&gt; with scenarios including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Account creation
&lt;/li&gt;
&lt;li&gt;User sign-ins
&lt;/li&gt;
&lt;li&gt;Deposits, withdrawals, and transfers
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sustained &lt;strong&gt;3600+ requests per second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;100% success rate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Sub-second p95 latencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Correctness was maintained throughout.&lt;/p&gt;




&lt;h2&gt;Observability (and Its Limits)&lt;/h2&gt;

&lt;p&gt;Prometheus and Grafana are used to monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request rate&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;CPU and memory usage of application servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Infrastructure-level metrics for Kafka, Redis, and PostgreSQL are not instrumented in this version.&lt;/p&gt;




&lt;h2&gt;Trade-offs and Limitations&lt;/h2&gt;

&lt;p&gt;This system intentionally avoids:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic request deduplication
&lt;/li&gt;
&lt;li&gt;Distributed transactions across PostgreSQL and Redis
&lt;/li&gt;
&lt;li&gt;Full infrastructure-level observability
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These trade-offs keep the system simpler and easier to reason about.&lt;/p&gt;




&lt;h2&gt;What I’d Improve Next&lt;/h2&gt;

&lt;p&gt;Future improvements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka, Redis, and PostgreSQL metrics
&lt;/li&gt;
&lt;li&gt;Database sharding strategies
&lt;/li&gt;
&lt;li&gt;Advanced failure recovery mechanisms
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Final Takeaways&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ordering simplifies concurrency
&lt;/li&gt;
&lt;li&gt;Correctness matters more than raw speed
&lt;/li&gt;
&lt;li&gt;Distributed systems are about trade-offs, not perfection
&lt;/li&gt;
&lt;li&gt;Designing for correctness early prevents painful rewrites
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scalability isn’t about handling more requests —&lt;br&gt;&lt;br&gt;
it’s about handling them &lt;strong&gt;correctly&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;🔗 Project Link&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/plebsicle/microfin" rel="noopener noreferrer"&gt;https://github.com/plebsicle/microfin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
