Part 1: The Problem Space - Why Modern Banking Infrastructure is Broken
Series: Building a 100K TPS Financial Ledger
Part: 1 of 7
Reading Time: 8 minutes
Introduction
Imagine you're the CTO of a major bank. It's Black Friday, and your payment processing system just hit a wall. Transactions are queueing up. Customers can't pay. Revenue is bleeding. Your core banking system—the one that cost $50 million to implement in 2005—is choking on modern transaction volumes.
This isn't a hypothetical. It's happening at banks around the world, right now.
I recently spent several weeks designing a reference architecture for a high-performance financial ledger system. The challenge: handle 100,000+ transactions per second with five-nines availability (99.999% uptime), maintain perfect financial correctness, and meet strict regulatory requirements.
This is Part 1 of a 7-part series where I'll share everything I learned. We'll start by understanding why this problem exists and why it's so hard to solve.
The Core Banking Crisis
Legacy Systems Built for a Different Era
Most Tier-1 banks run on core banking systems designed in the 1980s-2000s. These systems were architected before:
- Cloud computing - Everything ran on mainframes and AS/400s
- Real-time payments - Batch processing overnight was acceptable
- Mobile banking - Branch transactions were the norm
- Fintech competition - Banks had monopolies on financial services
- Regulatory complexity - Compliance was simpler
The numbers tell the story:
| Then (2000) | Now (2025) |
|---|---|
| 1,000 TPS peak | 100,000+ TPS sustained |
| Batch overnight | Real-time settlement |
| 99% uptime acceptable | 99.999% required |
| Single-region | Multi-region, global |
| Mainframe | Distributed cloud |
These legacy systems can't be easily upgraded. They're:
- Written in COBOL - Limited talent pool
- Monolithic - Can't scale horizontally
- Stateful - Hard to distribute
- Undocumented - Original engineers retired
- Business-critical - Can't afford downtime for rewrites
The Real-Time Payment Revolution
FedNow, RTP (Real-Time Payments), and instant settlement systems have changed customer expectations. You can Venmo someone instantly, but your bank transfer takes 3 days? That gap is widening.
Modern requirements:
- Instant balance updates
- 24/7/365 availability
- Sub-second transaction confirmation
- Real-time fraud detection
- Immediate reconciliation
Legacy batch systems process transactions overnight. Modern fintech processes them in milliseconds.
The Fintech Challenge
Stripe, Square, Revolut, Chime - they're not burdened by legacy systems. They can:
- Deploy multiple times per day
- Scale horizontally on cloud infrastructure
- Adopt new technologies quickly
- Iterate on customer feedback rapidly
Traditional banks are losing market share to companies that didn't exist 10 years ago.
The Requirements Dilemma
Building modern banking infrastructure requires balancing seemingly contradictory requirements:
1. Performance vs. Correctness
Performance demands:
- 100,000+ transactions per second
- Sub-50ms p99 latency
- Minimal resource consumption
- Horizontal scalability
Correctness demands:
- Every cent accounted for
- No duplicate transactions
- No lost transactions
- Perfect audit trail
- Atomic operations
Most systems optimize for one at the expense of the other. Financial systems need both.
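To make the tension concrete, here is a minimal Python sketch of the correctness side: an in-memory ledger that rejects duplicate transactions via an idempotency key and applies both legs of a transfer atomically. Every name here (Ledger, transfer, etc.) is invented for illustration; a real system needs durable storage, replication, and much more.

```python
from decimal import Decimal
import threading

class Ledger:
    """Toy in-memory ledger: atomic double-entry transfers, idempotent writes.

    Illustrative only -- not a production design.
    """

    def __init__(self):
        self._balances = {}            # account -> Decimal balance
        self._seen_keys = set()        # idempotency keys already applied
        self._lock = threading.Lock()  # one big lock: correct, but caps throughput

    def transfer(self, idempotency_key, debit_acct, credit_acct, amount):
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        with self._lock:  # atomicity: both legs apply, or neither does
            if idempotency_key in self._seen_keys:
                return "duplicate-ignored"  # a client retry can't double-charge
            self._balances[debit_acct] = self._balances.get(debit_acct, Decimal(0)) - amount
            self._balances[credit_acct] = self._balances.get(credit_acct, Decimal(0)) + amount
            self._seen_keys.add(idempotency_key)
            return "applied"

ledger = Ledger()
print(ledger.transfer("txn-001", "alice", "bob", "25.00"))  # applied
print(ledger.transfer("txn-001", "alice", "bob", "25.00"))  # duplicate-ignored
```

Notice the trade-off hiding in plain sight: the global lock makes correctness trivial but serializes every write, which is the opposite of what 100K TPS demands. Escaping that trade-off is what the rest of this series is about.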
2. Availability vs. Consistency
The CAP theorem tells us a distributed system can't guarantee all three of Consistency, Availability, and Partition tolerance at once: when a network partition occurs, you must sacrifice either consistency or availability.
Banking reality:
- Can't sacrifice consistency (money is exact, not "eventual")
- Can't sacrifice availability (downtime = lost revenue + regulatory issues)
- Can't avoid network partitions (they WILL happen)
Traditional databases force you to choose. Financial systems need a different approach.
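Here's a hedged sketch of what that choice looks like in code: a toy CP-style write path that commits only if a majority of replicas acknowledge. During a partition it refuses writes (consistent but unavailable); an AP system would accept them and reconcile later. The replica model is deliberately simplified.

```python
def quorum_write(replicas, key, value):
    """Toy CP-style write: commit only if a majority of replicas acknowledge.

    `replicas` is a list of dicts standing in for nodes; an unreachable
    node is represented as None. Illustrative only.
    """
    reachable = [r for r in replicas if r is not None]
    quorum = len(replicas) // 2 + 1
    if len(reachable) < quorum:
        # The CP choice: reject the write rather than risk divergent balances.
        raise RuntimeError("partition: quorum unavailable, write rejected")
    for r in reachable:
        r[key] = value
    return f"committed on {len(reachable)}/{len(replicas)} replicas"

cluster = [{}, {}, {}]
print(quorum_write(cluster, "balance:alice", "100.00"))  # committed on 3/3 replicas

partitioned = [{}, None, None]  # two nodes unreachable
# quorum_write(partitioned, "balance:alice", "90.00")  # raises: write rejected
```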
3. Innovation vs. Regulation
Regulators require:
- Complete audit trails
- Data retention (7-10 years)
- Immutable records
- Disaster recovery capability
- Third-party audits
- Compliance certifications (SOC 2, ISO 27001, etc.)
Business needs:
- Fast feature development
- Competitive time-to-market
- Cost efficiency
- Modern architectures
- Cloud deployment
These often conflict. Compliance slows innovation. But non-compliance isn't an option.
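One widely used technique for the "immutable records" and "complete audit trail" requirements above is a hash-chained, append-only log: each entry commits to the hash of its predecessor, so any retroactive edit breaks the chain. A minimal sketch follows; it's illustrative, not a compliance-grade implementation.

```python
import hashlib, json, time

def append_entry(log, payload):
    """Append a tamper-evident entry: each record hashes its predecessor."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "payload": payload, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)
    return body

def verify_chain(log):
    """Recompute every hash; any edited or deleted entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        check = {k: v for k, v in entry.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(check, sort_keys=True).encode()).hexdigest()
        if digest != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, {"txn": "txn-001", "amount": "25.00"})
append_entry(audit_log, {"txn": "txn-002", "amount": "10.00"})
print(verify_chain(audit_log))                # True
audit_log[0]["payload"]["amount"] = "999.00"  # tamper with history...
print(verify_chain(audit_log))                # False -- the chain detects it
```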
Why Existing Solutions Fall Short
General-Purpose Databases
PostgreSQL, MySQL, MongoDB - Excellent databases, but not optimized for ledger workloads.
Challenges:
- Not designed for append-only patterns
- Don't enforce double-entry bookkeeping at the schema level
- Performance degrades with transaction volume
- Require complex application logic for financial correctness (see the sketch after this list)
- Horizontal scaling is difficult
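To illustrate that last point: because a general-purpose schema won't enforce double-entry invariants, every write path has to validate them itself. A hedged Python sketch of that application-level logic (the journal structure and helper name are invented for illustration):

```python
from decimal import Decimal

def validate_journal_entry(postings):
    """App-level checks a general-purpose database won't do for you.

    `postings` is a list of (account, amount) pairs; debits are negative,
    credits positive. Invented structure, for illustration only.
    """
    if len(postings) < 2:
        raise ValueError("double-entry requires at least two postings")
    total = sum(Decimal(amount) for _, amount in postings)
    if total != 0:
        raise ValueError(f"entry does not balance: off by {total}")
    return True

# Balanced: alice pays bob 25.00, with a 0.50 fee to the bank.
validate_journal_entry([
    ("alice",    "-25.50"),
    ("bob",      "25.00"),
    ("fee:bank", "0.50"),
])
# An unbalanced entry raises before anything reaches the database:
# validate_journal_entry([("alice", "-25.00"), ("bob", "24.99")])
```

Every such rule lives in application code rather than in the schema, so every service that writes to the ledger must reimplement the same checks and never get them wrong.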
Distributed SQL Databases
CockroachDB, Spanner, TiDB - Better for scale, but:
Challenges:
- Higher latency (consensus overhead; a back-of-envelope estimate follows this list)
- Complex operational model
- Expensive at scale
- Still not ledger-optimized
- Consistency comes at performance cost
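A back-of-envelope on that latency point, using illustrative (not measured) round-trip times: a Raft- or Paxos-style commit needs at least one round trip from the leader to a quorum of replicas, so replica placement sets a hard floor under write latency.

```python
# Illustrative RTTs -- real numbers depend entirely on your topology.
rtt_ms = {"same_zone": 0.5, "same_region": 2.0, "cross_region": 60.0}

for placement, rtt in rtt_ms.items():
    # One leader->quorum round trip per commit, ignoring disk and queueing.
    floor = rtt
    verdict = "fits" if floor < 50 else "blows"
    print(f"{placement:12s}: commit floor ≈ {floor:5.1f} ms ({verdict} a 50 ms p99 budget)")
```

Cross-region consensus alone can exceed a sub-50ms p99 budget before the database does any actual work.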
NoSQL Databases
Cassandra, DynamoDB - Great for availability and scale, but:
Challenges:
- Eventual consistency by default (unacceptable for money; see the sketch after this list)
- No ACID guarantees across records
- Complex application logic required
- Difficult reconciliation
- Not built for financial workloads
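Why eventual consistency is dangerous for balances: if two withdrawals read from replicas that haven't converged, both can see the old balance and both can be approved. The replica model below is deliberately simplified to show the hazard.

```python
# Two replicas of alice's balance; replication lag means they can diverge.
replica_a = {"alice": 100}
replica_b = {"alice": 100}  # hasn't yet seen the withdrawal applied to A

def withdraw(replica, account, amount):
    """Check-then-act against one replica -- the classic stale-read hazard."""
    if replica[account] >= amount:
        replica[account] -= amount
        return "approved"
    return "declined"

print(withdraw(replica_a, "alice", 80))  # approved: replica A now shows 20
print(withdraw(replica_b, "alice", 80))  # approved again! B still shows 100
# Once the replicas converge, alice has withdrawn 160 from a 100 balance.
```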
Blockchain/DLT
Ethereum, Hyperledger - Immutable and distributed, but:
Challenges:
- Orders of magnitude too slow (Ethereum's base layer processes tens of TPS; even permissioned DLTs top out far below 100K sustained)
- High latency (seconds to minutes)
- Complex consensus mechanisms
- Expensive operations
- Not designed for traditional banking
The Real Challenge: It's Not Just Technical
Building high-performance financial infrastructure isn't just a technical problem. It's also:
Organizational
- Risk aversion - Banks can't afford to fail
- Regulatory scrutiny - Every change is audited
- Stakeholder complexity - Multiple departments, competing priorities
- Change management - Migrating from legacy systems without downtime
Financial
- Massive investment - Core banking replacements cost $100M-$1B
- Long timelines - 5-10 year implementation
- Opportunity cost - Resources diverted from other initiatives
- Risk of failure - High-profile core banking failures are common
Human
- Skills gap - Modern distributed systems expertise is rare
- Institutional knowledge - Only a few people understand the legacy system
- Resistance to change - "If it ain't broke, don't fix it" mentality
- Burnout - Critical systems run on skeleton crews
What Success Looks Like
A modern financial ledger system needs to achieve ALL of these:
Performance
- ✅ 100,000+ transactions per second sustained
- ✅ Sub-50ms p99 latency
- ✅ Linear scalability with nodes
- ✅ Efficient resource utilization
Correctness
- ✅ ACID guarantees for all transactions
- ✅ Double-entry bookkeeping enforced
- ✅ No lost or duplicate transactions
- ✅ Perfect reconciliation (the core invariant is sketched after these lists)
- ✅ Immutable audit trail
Reliability
- ✅ 99.999% availability (max 5.26 min/year downtime)
- ✅ Multi-region disaster recovery
- ✅ Automatic failover
- ✅ Zero data loss (RPO = 0)
- ✅ Fast recovery (RTO < 5 minutes)
Operational
- ✅ Observable and debuggable
- ✅ Secure by default
- ✅ Regulatory compliant
- ✅ Cost-effective at scale
- ✅ Maintainable long-term
Business
- ✅ Migration path from legacy systems
- ✅ Incremental adoption possible
- ✅ Reasonable implementation timeline
- ✅ Acceptable risk profile
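Several of the correctness items above collapse into a single invariant: across the entire ledger, postings must sum to zero, because every debit has a matching credit. A minimal reconciliation check, with invented structures:

```python
from decimal import Decimal

def trial_balance(postings):
    """The fundamental double-entry invariant: all postings sum to zero.

    `postings` is an iterable of (account, amount) pairs. If this ever
    returns nonzero, money was created or destroyed somewhere.
    """
    return sum(Decimal(amount) for _, amount in postings)

day_postings = [
    ("alice", "-25.50"), ("bob", "25.00"), ("fee:bank", "0.50"),
    ("bob", "-10.00"), ("carol", "10.00"),
]
assert trial_balance(day_postings) == 0, "ledger out of balance!"
print("reconciled:", trial_balance(day_postings))  # reconciled: 0
```

"Perfect reconciliation" means this check passes continuously at 100K TPS, not just in an overnight batch.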
The Path Forward
So how do we build a system that achieves all of this?
Over the next 6 parts of this series, I'll share a complete reference architecture that addresses each of these challenges:
- Part 2: Core Architecture - Hot + Historical pattern with CQRS
- Part 3: NFR Deep Dive - Achieving 100K TPS with five-nines availability
- Part 4: Financial Correctness - Double-entry bookkeeping at the database level
- Part 5: Operational Excellence - Disaster recovery and observability
- Part 6: Technology Choices - Why specific technologies won
- Part 7: Lessons Learned - What surprised me and what I'd do differently
The complete reference architecture is open source and publicly available.
Key Takeaways
Legacy banking systems are fundamentally incompatible with modern requirements - They can't be incrementally improved; they need rethinking.
The requirements paradox is real - Performance vs. correctness, availability vs. consistency, innovation vs. regulation. All must be solved simultaneously.
Existing database technologies aren't optimized for ledgers - General-purpose solutions require complex application logic and still underperform.
It's not just a technical problem - Organizational, financial, and human factors are equally critical.
Success requires purpose-built architecture - Specialized solutions for specialized problems.
What's Next?
In Part 2, we'll dive into the core architecture: the Hot + Historical pattern that separates high-speed transactional writes from immutable audit storage, and how CQRS enables us to optimize read and write paths independently.
Questions to ponder until then:
- What would you sacrifice first: performance, correctness, or availability?
- How would you migrate a bank's entire transaction history to a new system with zero downtime?
- Is eventual consistency ever acceptable when dealing with money?
Drop your thoughts in the comments. I'd love to hear about your experiences with financial systems or high-throughput architectures.
About this series:
This is based on a real architecture I designed for a technical challenge. While I didn't get the job, the work was too valuable to keep private. I've open-sourced the complete reference architecture (MIT + Apache 2.0 licensed) so the community can learn from it.
Next in series: Part 2: Core Architecture - Hot + Historical with CQRS (coming next week)
Follow me for more posts on distributed systems, software architecture, and building production-grade financial infrastructure.