Part 1: The Problem Space - Why Modern Banking Infrastructure is Broken
Series: Building a 100K TPS Financial Ledger
Part: 1 of 7
Reading Time: 8 minutes
Introduction
Imagine you're the CTO of a major bank. It's Black Friday, and your payment processing system just hit a wall. Transactions are queueing up. Customers can't pay. Revenue is bleeding. Your core banking system—the one that cost $50 million to implement in 2005—is choking on modern transaction volumes.
This isn't a hypothetical. It's happening at banks around the world, right now.
I recently spent several weeks designing a reference architecture for a high-performance financial ledger system. The challenge: handle 100,000+ transactions per second with five-nines availability (99.999% uptime), maintain perfect financial correctness, and meet strict regulatory requirements.
This is Part 1 of a 7-part series where I'll share everything I learned. We'll start by understanding why this problem exists and why it's so hard to solve.
The Core Banking Crisis
Legacy Systems Built for a Different Era
Most Tier-1 banks run on core banking systems designed in the 1980s-2000s. These systems were architected before:
- Cloud computing - Everything ran on mainframes and AS/400s
- Real-time payments - Batch processing overnight was acceptable
- Mobile banking - Branch transactions were the norm
- Fintech competition - Banks had monopolies on financial services
- Regulatory complexity - Compliance was simpler
The numbers tell the story:
| Then (2000) | Now (2025) |
|---|---|
| 1,000 TPS peak | 100,000+ TPS sustained |
| Batch overnight | Real-time settlement |
| 99% uptime acceptable | 99.999% required |
| Single-region | Multi-region, global |
| Mainframe | Distributed cloud |
These legacy systems can't be easily upgraded. They're:
- Written in COBOL - Limited talent pool
- Monolithic - Can't scale horizontally
- Stateful - Hard to distribute
- Undocumented - Original engineers retired
- Business-critical - Can't afford downtime for rewrites
The Real-Time Payment Revolution
FedNow, RTP (Real-Time Payments), and instant settlement systems have changed customer expectations. You can Venmo someone instantly, but your bank transfer takes 3 days? That gap is widening.
Modern requirements:
- Instant balance updates
- 24/7/365 availability
- Sub-second transaction confirmation
- Real-time fraud detection
- Immediate reconciliation
Legacy batch systems process transactions overnight. Modern fintech processes them in milliseconds.
The Fintech Challenge
Stripe, Square, Revolut, Chime - they're not burdened by legacy systems. They can:
- Deploy multiple times per day
- Scale horizontally on cloud infrastructure
- Adopt new technologies quickly
- Iterate on customer feedback rapidly
Traditional banks are losing market share to companies that didn't exist 10 years ago.
The Requirements Dilemma
Building modern banking infrastructure requires balancing seemingly contradictory requirements:
1. Performance vs. Correctness
Performance demands:
- 100,000+ transactions per second
- Sub-50ms p99 latency
- Minimal resource consumption
- Horizontal scalability
Correctness demands:
- Every cent accounted for
- No duplicate transactions
- No lost transactions
- Perfect audit trail
- Atomic operations
Most systems optimize for one at the expense of the other. Financial systems need both.
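To make the tension concrete, here is a minimal Python sketch of the correctness side: an in-memory ledger that rejects duplicate transactions via an idempotency key and applies both legs of a transfer atomically. Every name here (Ledger, transfer, etc.) is invented for illustration; a real system needs durable storage, replication, and much more.

```python
from decimal import Decimal
import threading

class Ledger:
    """Toy in-memory ledger: atomic double-entry transfers, idempotent writes.

    Illustrative only -- not a production design.
    """

    def __init__(self):
        self._balances = {}            # account -> Decimal balance
        self._seen_keys = set()        # idempotency keys already applied
        self._lock = threading.Lock()  # one big lock: correct, but caps throughput

    def transfer(self, idempotency_key, debit_acct, credit_acct, amount):
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        with self._lock:  # atomicity: both legs apply, or neither does
            if idempotency_key in self._seen_keys:
                return "duplicate-ignored"  # a client retry can't double-charge
            self._balances[debit_acct] = self._balances.get(debit_acct, Decimal(0)) - amount
            self._balances[credit_acct] = self._balances.get(credit_acct, Decimal(0)) + amount
            self._seen_keys.add(idempotency_key)
            return "applied"

ledger = Ledger()
print(ledger.transfer("txn-001", "alice", "bob", "25.00"))  # applied
print(ledger.transfer("txn-001", "alice", "bob", "25.00"))  # duplicate-ignored
```

Notice the trade-off hiding in plain sight: the global lock makes correctness trivial but serializes every write, which is the opposite of what 100K TPS demands. Escaping that trade-off is what the rest of this series is about.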
2. Availability vs. Consistency
The CAP theorem tells us a distributed system can't guarantee all three of Consistency, Availability, and Partition tolerance at once: when a network partition occurs, you must sacrifice either consistency or availability.
Banking reality:
- Can't sacrifice consistency (money is exact, not "eventual")
- Can't sacrifice availability (downtime = lost revenue + regulatory issues)
- Can't avoid network partitions (they WILL happen)
Traditional databases force you to choose. Financial systems need a different approach.
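Here's a hedged sketch of what that choice looks like in code: a toy CP-style write path that commits only if a majority of replicas acknowledge. During a partition it refuses writes (consistent but unavailable); an AP system would accept them and reconcile later. The replica model is deliberately simplified.

```python
def quorum_write(replicas, key, value):
    """Toy CP-style write: commit only if a majority of replicas acknowledge.

    `replicas` is a list of dicts standing in for nodes; an unreachable
    node is represented as None. Illustrative only.
    """
    reachable = [r for r in replicas if r is not None]
    quorum = len(replicas) // 2 + 1
    if len(reachable) < quorum:
        # The CP choice: reject the write rather than risk divergent balances.
        raise RuntimeError("partition: quorum unavailable, write rejected")
    for r in reachable:
        r[key] = value
    return f"committed on {len(reachable)}/{len(replicas)} replicas"

cluster = [{}, {}, {}]
print(quorum_write(cluster, "balance:alice", "100.00"))  # committed on 3/3 replicas

partitioned = [{}, None, None]  # two nodes unreachable
# quorum_write(partitioned, "balance:alice", "90.00")  # raises: write rejected
```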
3. Innovation vs. Regulation
Regulators require:
- Complete audit trails
- Data retention (7-10 years)
- Immutable records
- Disaster recovery capability
- Third-party audits
- Compliance certifications (SOC 2, ISO 27001, etc.)
Business needs:
- Fast feature development
- Competitive time-to-market
- Cost efficiency
- Modern architectures
- Cloud deployment
These often conflict. Compliance slows innovation. But non-compliance isn't an option.
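One widely used technique for the "immutable records" and "complete audit trail" requirements above is a hash-chained, append-only log: each entry commits to the hash of its predecessor, so any retroactive edit breaks the chain. A minimal sketch follows; it's illustrative, not a compliance-grade implementation.

```python
import hashlib, json, time

def append_entry(log, payload):
    """Append a tamper-evident entry: each record hashes its predecessor."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "payload": payload, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)
    return body

def verify_chain(log):
    """Recompute every hash; any edited or deleted entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        check = {k: v for k, v in entry.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(check, sort_keys=True).encode()).hexdigest()
        if digest != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, {"txn": "txn-001", "amount": "25.00"})
append_entry(audit_log, {"txn": "txn-002", "amount": "10.00"})
print(verify_chain(audit_log))                # True
audit_log[0]["payload"]["amount"] = "999.00"  # tamper with history...
print(verify_chain(audit_log))                # False -- the chain detects it
```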
Why Existing Solutions Fall Short
General-Purpose Databases
PostgreSQL, MySQL, MongoDB - Excellent databases, but not optimized for ledger workloads.
Challenges:
- Not designed for append-only patterns
- Don't enforce double-entry bookkeeping at the schema level
- Performance degrades with transaction volume
- Require complex application logic for financial correctness (see the sketch after this list)
- Horizontal scaling is difficult
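To illustrate that last point: because a general-purpose schema won't enforce double-entry invariants, every write path has to validate them itself. A hedged Python sketch of that application-level logic (the journal structure and helper name are invented for illustration):

```python
from decimal import Decimal

def validate_journal_entry(postings):
    """App-level checks a general-purpose database won't do for you.

    `postings` is a list of (account, amount) pairs; debits are negative,
    credits positive. Invented structure, for illustration only.
    """
    if len(postings) < 2:
        raise ValueError("double-entry requires at least two postings")
    total = sum(Decimal(amount) for _, amount in postings)
    if total != 0:
        raise ValueError(f"entry does not balance: off by {total}")
    return True

# Balanced: alice pays bob 25.00, with a 0.50 fee to the bank.
validate_journal_entry([
    ("alice",    "-25.50"),
    ("bob",      "25.00"),
    ("fee:bank", "0.50"),
])
# An unbalanced entry raises before anything reaches the database:
# validate_journal_entry([("alice", "-25.00"), ("bob", "24.99")])
```

Every such rule lives in application code rather than in the schema, so every service that writes to the ledger must reimplement the same checks and never get them wrong.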
Distributed SQL Databases
CockroachDB, Spanner, TiDB - Better for scale, but:
Challenges:
- Higher latency (consensus overhead; a back-of-envelope estimate follows this list)
- Complex operational model
- Expensive at scale
- Still not ledger-optimized
- Consistency comes at performance cost
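A back-of-envelope on that latency point, using illustrative (not measured) round-trip times: a Raft- or Paxos-style commit needs at least one round trip from the leader to a quorum of replicas, so replica placement sets a hard floor under write latency.

```python
# Illustrative RTTs -- real numbers depend entirely on your topology.
rtt_ms = {"same_zone": 0.5, "same_region": 2.0, "cross_region": 60.0}

for placement, rtt in rtt_ms.items():
    # One leader->quorum round trip per commit, ignoring disk and queueing.
    floor = rtt
    verdict = "fits" if floor < 50 else "blows"
    print(f"{placement:12s}: commit floor ≈ {floor:5.1f} ms ({verdict} a 50 ms p99 budget)")
```

Cross-region consensus alone can exceed a sub-50ms p99 budget before the database does any actual work.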
NoSQL Databases
Cassandra, DynamoDB - Great for availability and scale, but:
Challenges:
- Eventual consistency by default (unacceptable for money; see the sketch after this list)
- No ACID guarantees across records
- Complex application logic required
- Difficult reconciliation
- Not built for financial workloads
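Why eventual consistency is dangerous for balances: if two withdrawals read from replicas that haven't converged, both can see the old balance and both can be approved. The replica model below is deliberately simplified to show the hazard.

```python
# Two replicas of alice's balance; replication lag means they can diverge.
replica_a = {"alice": 100}
replica_b = {"alice": 100}  # hasn't yet seen the withdrawal applied to A

def withdraw(replica, account, amount):
    """Check-then-act against one replica -- the classic stale-read hazard."""
    if replica[account] >= amount:
        replica[account] -= amount
        return "approved"
    return "declined"

print(withdraw(replica_a, "alice", 80))  # approved: replica A now shows 20
print(withdraw(replica_b, "alice", 80))  # approved again! B still shows 100
# Once the replicas converge, alice has withdrawn 160 from a 100 balance.
```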
Blockchain/DLT
Ethereum, Hyperledger - Immutable and distributed, but:
Challenges:
- Orders of magnitude too slow (Ethereum's base layer processes tens of TPS; even permissioned DLTs top out far below 100K sustained)
- High latency (seconds to minutes)
- Complex consensus mechanisms
- Expensive operations
- Not designed for traditional banking
The Real Challenge: It's Not Just Technical
Building high-performance financial infrastructure isn't just a technical problem. It's also:
Organizational
- Risk aversion - Banks can't afford to fail
- Regulatory scrutiny - Every change is audited
- Stakeholder complexity - Multiple departments, competing priorities
- Change management - Migrating from legacy systems without downtime
Financial
- Massive investment - Core banking replacements cost $100M-$1B
- Long timelines - 5-10 year implementation
- Opportunity cost - Resources diverted from other initiatives
- Risk of failure - High-profile core banking failures are common
Human
- Skills gap - Modern distributed systems expertise is rare
- Institutional knowledge - Only a few people understand the legacy system
- Resistance to change - "If it ain't broke, don't fix it" mentality
- Burnout - Critical systems run on skeleton crews
What Success Looks Like
A modern financial ledger system needs to achieve ALL of these:
Performance
- ✅ 100,000+ transactions per second sustained
- ✅ Sub-50ms p99 latency
- ✅ Linear scalability with nodes
- ✅ Efficient resource utilization
Correctness
- ✅ ACID guarantees for all transactions
- ✅ Double-entry bookkeeping enforced
- ✅ No lost or duplicate transactions
- ✅ Perfect reconciliation (the core invariant is sketched after these lists)
- ✅ Immutable audit trail
Reliability
- ✅ 99.999% availability (max 5.26 min/year downtime)
- ✅ Multi-region disaster recovery
- ✅ Automatic failover
- ✅ Zero data loss (RPO = 0)
- ✅ Fast recovery (RTO < 5 minutes)
Operational
- ✅ Observable and debuggable
- ✅ Secure by default
- ✅ Regulatory compliant
- ✅ Cost-effective at scale
- ✅ Maintainable long-term
Business
- ✅ Migration path from legacy systems
- ✅ Incremental adoption possible
- ✅ Reasonable implementation timeline
- ✅ Acceptable risk profile
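Several of the correctness items above collapse into a single invariant: across the entire ledger, postings must sum to zero, because every debit has a matching credit. A minimal reconciliation check, with invented structures:

```python
from decimal import Decimal

def trial_balance(postings):
    """The fundamental double-entry invariant: all postings sum to zero.

    `postings` is an iterable of (account, amount) pairs. If this ever
    returns nonzero, money was created or destroyed somewhere.
    """
    return sum(Decimal(amount) for _, amount in postings)

day_postings = [
    ("alice", "-25.50"), ("bob", "25.00"), ("fee:bank", "0.50"),
    ("bob", "-10.00"), ("carol", "10.00"),
]
assert trial_balance(day_postings) == 0, "ledger out of balance!"
print("reconciled:", trial_balance(day_postings))  # reconciled: 0
```

"Perfect reconciliation" means this check passes continuously at 100K TPS, not just in an overnight batch.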
The Path Forward
So how do we build a system that achieves all of this?
Over the next 6 parts of this series, I'll share a complete reference architecture that addresses each of these challenges:
- Part 2: Core Architecture - Hot + Historical pattern with CQRS
- Part 3: NFR Deep Dive - Achieving 100K TPS with five-nines availability
- Part 4: Financial Correctness - Double-entry bookkeeping at the database level
- Part 5: Operational Excellence - Disaster recovery and observability
- Part 6: Technology Choices - Why specific technologies won
- Part 7: Lessons Learned - What surprised me and what I'd do differently
The complete reference architecture is open source and publicly available.
Key Takeaways
Legacy banking systems are fundamentally incompatible with modern requirements - They can't be incrementally improved; they need rethinking.
The requirements paradox is real - Performance vs. correctness, availability vs. consistency, innovation vs. regulation. All must be solved simultaneously.
Existing database technologies aren't optimized for ledgers - General-purpose solutions require complex application logic and still underperform.
It's not just a technical problem - Organizational, financial, and human factors are equally critical.
Success requires purpose-built architecture - Specialized solutions for specialized problems.
What's Next?
In Part 2, we'll dive into the core architecture: the Hot + Historical pattern that separates high-speed transactional writes from immutable audit storage, and how CQRS enables us to optimize read and write paths independently.
Questions to ponder until then:
- What would you sacrifice first: performance, correctness, or availability?
- How would you migrate a bank's entire transaction history to a new system with zero downtime?
- Is eventual consistency ever acceptable when dealing with money?
Drop your thoughts in the comments. I'd love to hear about your experiences with financial systems or high-throughput architectures.
About this series:
This is based on a real architecture I designed for a technical challenge. While I didn't get the job, the work was too valuable to keep private. I've open-sourced the complete reference architecture (MIT + Apache 2.0 licensed) so the community can learn from it.
Next in series: Part 2: Core Architecture - Hot + Historical with CQRS (coming next week)
Follow me for more posts on distributed systems, software architecture, and building production-grade financial infrastructure.