Stripe · Databases · 17 May 2026
Stripe processed over $1 trillion in payment volume in 2023 while maintaining 99.999% uptime: five nines, or roughly 5.26 minutes of downtime across the whole year. The infrastructure behind it is a database platform called DocDB and a migration engine that moves petabytes of financial data between shards without any application knowing it happened.
- $1T+ payment volume 2023
- 99.999% uptime achieved
- 5M database queries/sec
- 1.5 PB migrated in 2023
- Thousands of shards managed
- Zero-downtime migrations
The Story
- $1T+ — Payment volume processed by Stripe in 2023 — making their database reliability requirements some of the most demanding in the industry
- 99.999% — Uptime achieved — five nines means less than 5.26 minutes of total downtime per year across all Stripe APIs and payment processing
- 5M QPS — Database queries per second sustained across Stripe's DocDB fleet — comparable to some of the largest databases in the world
- 1.5 PB — Data migrated between shards in 2023 alone using the Data Movement Platform — transparently to all applications
When Stripe launched in 2011, it chose MongoDB, a document-oriented NoSQL database that stores data as flexible JSON-like documents rather than fixed relational schemas, because it offered better developer productivity than standard relational databases for a fast-moving startup. Over the next decade, as Stripe grew into a financial infrastructure company processing trillion-dollar payment volumes, the team built a layer on top of MongoDB called DocDB: a Database-as-a-Service that gives application developers a simple API for data access while hiding the complexity of sharding, replication, failover, and migrations. DocDB handles horizontal sharding (distributing data across many independent database instances, or shards, by partition key, so that no single instance holds all the data or serves all the traffic) across thousands of shards, manages replication for high availability, and, crucially, enables the zero-downtime data migrations that let Stripe's database fleet scale continuously without ever taking payments offline.
The central innovation of DocDB is its Data Movement Platform, a system that migrates chunks of data between shards while both the source and target shards continue serving live production traffic. This capability is essential to Stripe's operations. When merchants grow rapidly and their shard fills up, the shard needs to be split. When shards become underutilized, they can be consolidated. When a new MongoDB version is released, shards can be upgraded by fork-lifting: migrating data to a new instance that already runs the target version, which avoids a multi-step in-place upgrade through each intermediate version. All of these operations share one requirement: Stripe cannot stop accepting payments while they happen.
THE FIVE NINES CONSTRAINT
99.999% uptime means less than 5.26 minutes of downtime per year. For a payment processor, downtime is not just an SLA violation: it means merchants unable to complete sales, customers unable to pay, and real-time revenue loss for the businesses Stripe serves. Every database operation (migration, split, consolidation, upgrade) must happen transparently. The constraint is absolute: there is no maintenance window at Stripe's scale.
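The 5.26-minute figure is plain arithmetic on the availability target. A small sketch, assuming a 365-day year (the function name is illustrative):

```python
# Downtime budget implied by an availability target (plain arithmetic).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    print(f"{label}: {downtime_budget_minutes(target):.2f} minutes/year")
```

Each extra nine cuts the budget by a factor of ten, which is why a single multi-minute maintenance window would consume the entire five-nines allowance for a year.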
The Six-Step Migration Protocol
The Data Movement Platform executes every shard migration through a six-step protocol: (1) register the migration plan in the chunk metadata service (a central catalog that tracks which data chunks live on which shards — the source of truth for query routing across Stripe's fleet), (2) build indexes on the target shard before data arrives (avoiding the performance penalty of indexing after a large data load), (3) bulk-copy a snapshot of the chunk from source to target, (4) stream async replication to apply changes made on the source since the snapshot was taken, (5) perform correctness checks to verify data consistency, (6) switch traffic to the target and deregister the chunk from the source. Steps 3 and 4 were where Stripe hit unexpected engineering challenges — and where the most creative solutions emerged.
Problem
Shard Splits and Consolidations Required Downtime
Without the Data Movement Platform, scaling Stripe's database fleet required either accepting downtime during shard operations or building complex dual-write logic for every migration. As Stripe's fleet grew to thousands of shards, this was operationally unsustainable and created real risk for every migration event.
Cause
Financial Data Cannot Tolerate Inconsistency
Payment data has zero tolerance for consistency errors — a payment record that exists on the source shard but hasn't yet appeared on the target is a payment that could be double-charged, lost, or corrupted if traffic switches at the wrong moment. The six-step protocol was designed specifically to guarantee that by the time traffic switches, the target is exactly consistent with the source including all writes made during migration.
Solution
CDC Replication + Correctness Verification
Stripe solved the consistency problem using Change Data Capture (a technique that continuously reads the MongoDB operation log (oplog) to stream every write applied to the source shard to the target, keeping the target synchronized even as live traffic modifies the source data) streaming from the source shard's oplog. After CDC replication catches up to near-real-time, correctness checks compare source and target before traffic is switched. The switch itself is atomic from the application's perspective.
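The snapshot-plus-delta idea can be sketched with a toy in-memory shard. This is a minimal illustration, not Stripe's implementation: `SourceShard`, `OplogEntry`, and `replay` are hypothetical names, and a real oplog carries far richer operations than "set" and "delete".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OplogEntry:
    ts: int                       # monotonically increasing timestamp
    op: str                       # "set" or "delete"
    key: str
    value: Optional[Any] = None


class SourceShard:
    """Toy shard: a dict plus an append-only, ordered change log."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.oplog: List[OplogEntry] = []
        self._clock = 0

    def write(self, key: str, value: Any) -> None:
        self._clock += 1
        self.data[key] = value
        self.oplog.append(OplogEntry(self._clock, "set", key, value))

    def delete(self, key: str) -> None:
        self._clock += 1
        self.data.pop(key, None)
        self.oplog.append(OplogEntry(self._clock, "delete", key))

    def snapshot(self) -> Tuple[Dict[str, Any], int]:
        """Return a copy of the data and the timestamp T it reflects."""
        return dict(self.data), self._clock


def replay(target: Dict[str, Any], oplog: List[OplogEntry], since: int) -> int:
    """Apply, in order, every entry newer than `since`; return the last ts applied."""
    last = since
    for entry in oplog:
        if entry.ts <= since:
            continue  # already contained in the snapshot
        if entry.op == "set":
            target[entry.key] = entry.value
        else:
            target.pop(entry.key, None)
        last = entry.ts
    return last


source = SourceShard()
source.write("pay_1", {"amount": 100})
source.write("pay_2", {"amount": 250})

target, t = source.snapshot()            # bulk copy at time T
source.write("pay_3", {"amount": 75})    # live traffic keeps hitting the source
source.write("pay_1", {"amount": 120})
source.delete("pay_2")

replay(target, source.oplog, since=t)    # stream the delta onto the target
print(target == source.data)
```

Because each entry here is an absolute "set" or "delete", replaying an entry twice leaves the target unchanged, which is the property that makes resuming a paused replication stream safe in this toy model.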
Result
1.5 Petabytes Moved in 2023 Transparently
In 2023 alone, Stripe migrated 1.5 petabytes of data between shards, consolidated thousands of databases through bin packing, and upgraded the entire MongoDB fleet — all with zero application downtime and no payment processing interruptions. 99.999% uptime was maintained throughout.
DocDB's ability to migrate data between shards in a consistent, granular and reliable way has made it significantly easier for Stripe to scale.
Jimmy Morzaria and Suraj Narkhede, via Stripe Engineering Blog, June 2024
⚠️
The Bulk Load Throughput Problem
Step 3 of the migration, bulk loading a snapshot of the chunk onto the target shard, hit a significant throughput limitation during testing. Stripe's engineering team tried batching writes and tuning DocDB engine parameters, but neither approach resolved the bottleneck. According to Stripe's blog post, the breakthrough came from optimizing insertion order: DocDB arranges its data in a B-tree, so sorting the snapshot by the most common index attributes of each collection and inserting it in that order improved the proximity of writes and raised write throughput by roughly 10x.
🗃️
Stripe manages thousands of DocDB shards — and periodically performs bin-packing consolidations where underutilized shards are merged to reduce operational overhead and hardware costs. In 2023 they reduced the total number of underlying DocDB shards by approximately three-quarters through such consolidation, migrating 1.5 petabytes of data in the process.
⬆️
The Fork-Lift Upgrade Strategy
Traditional in-place major-version upgrades must pass through each intermediate release in sequence (for MongoDB, 4.0 to 4.2 to 4.4 to 5.0, for example), and each step requires careful validation. Stripe's Data Movement Platform enables a fork-lift strategy: provision a new shard running the target version, migrate the data to it, switch traffic, and decommission the old shard. A shard can move from its current version to the target version in a single migration step, which avoids the risk that accumulates across multi-step in-place upgrades.
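The difference in step count can be made concrete with a short sketch. The release list is only an illustrative slice of MongoDB's release train, and both function names are hypothetical; whether a given pair of versions can actually exchange data through a migration is an assumption that would need checking case by case.

```python
from typing import List

# Illustrative slice of the release train. In-place upgrades move one
# release at a time along this list.
RELEASES = ["4.0", "4.2", "4.4", "5.0", "6.0"]


def in_place_path(current: str, target: str) -> List[str]:
    """Versions an in-place upgrade must install, in order."""
    start, end = RELEASES.index(current), RELEASES.index(target)
    if end < start:
        raise ValueError("downgrades are not modeled in this sketch")
    return RELEASES[start + 1 : end + 1]


def fork_lift_path(current: str, target: str) -> List[str]:
    """A fork-lift provisions the target version directly and migrates data to it."""
    return [] if current == target else [target]


print(in_place_path("4.0", "6.0"))   # four validated upgrade steps
print(fork_lift_path("4.0", "6.0"))  # one migration
```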
ℹ️
DocDB: Not a Rewrite, an Extension
A key decision in Stripe's database evolution was building DocDB on top of MongoDB Community rather than replacing MongoDB with a different database. This preserved compatibility with existing application code, the existing data model, and years of operational knowledge. The extensions — sharding, proxy routing, migration tooling — were added as a platform layer, not a fork. This pragmatic approach to building on existing foundations rather than starting from scratch is characteristic of Stripe's infrastructure philosophy.
The Fix
DocDB Architecture: The Database-as-a-Service Abstraction
DocDB's architecture is a three-tier system sitting between Stripe's application code and raw MongoDB instances. The Database Proxy is the entry point for all application read/write requests — it performs access control checks, validates queries, and routes requests to the correct shard by consulting the chunk metadata service. The Chunk Metadata Service maintains the authoritative map of which data chunks live on which shards. The Database Shards are replicated MongoDB instances that store the actual data. Applications talk only to the proxy; they are completely unaware of sharding, shard splits, or migrations in progress.
# Simplified 6-step Data Movement Platform migration flow.
# Each step is atomic and resumable, so migrations can be paused and continued.
# Helper methods (build_indexes_on_target, bulk_copy_snapshot, ...) are elided.
class MigrationVerificationError(Exception):
    """Raised when source and target disagree before the traffic switch."""


class DataMovementPlatform:
    def __init__(self, chunk_metadata):
        # Client for the chunk metadata service (illustrative dependency)
        self.chunk_metadata = chunk_metadata

    def migrate_chunk(self, chunk_id: str, source_shard: str, target_shard: str) -> None:
        # Step 1: Register migration plan so the migration is visible to monitoring
        self.chunk_metadata.register_migration(
            chunk_id=chunk_id,
            source=source_shard,
            target=target_shard,
        )

        # Step 2: Pre-build indexes on target BEFORE data arrives
        # Avoids the performance penalty of indexing a large loaded dataset
        self.build_indexes_on_target(target_shard, chunk_id)

        # Step 3: Bulk copy snapshot at time T
        # Returns T so replication knows where to resume from
        snapshot_timestamp = self.bulk_copy_snapshot(chunk_id, source_shard, target_shard)

        # Step 4: Stream CDC replication to catch up all writes since the snapshot
        # Reads MongoDB oplog on source; applies to target until near-real-time
        self.cdc_replicate_to_target(
            source_shard, target_shard, since=snapshot_timestamp
        )

        # Step 5: Correctness verification, comparing source and target
        # Financial data requires full consistency before any traffic switch.
        # An explicit raise is used because `assert` is skipped under `python -O`.
        if not self.verify_consistency(chunk_id, source_shard, target_shard):
            raise MigrationVerificationError(f"chunk {chunk_id}: source and target differ")

        # Step 6: Atomic traffic switch: update chunk metadata, switch routing
        self.chunk_metadata.set_active_shard(chunk_id, target_shard)
        # Applications querying the chunk now get routed to target

        # Deregister from source after confirmation
        self.chunk_metadata.deregister_from_source(chunk_id, source_shard)
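The "resumable" property mentioned in the snippet's header comment can be sketched with a checkpoint that records the last completed step. This is an assumption about how resumption might work, not a description of Stripe's platform; `Step`, `run_migration`, and the checkpoint dictionary are hypothetical.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Step(IntEnum):
    REGISTER = 1
    BUILD_INDEXES = 2
    BULK_COPY = 3
    CDC_REPLICATE = 4
    VERIFY = 5
    SWITCH_TRAFFIC = 6


def run_migration(
    checkpoint: Dict[str, int],
    handlers: Dict[Step, Callable[[], None]],
) -> List[Step]:
    """Run the steps not yet completed, recording progress after each one.

    checkpoint["done"] holds the number of the last completed step (0 if none).
    """
    executed: List[Step] = []
    for step in Step:  # iterates in definition order, 1 through 6
        if step <= checkpoint.get("done", 0):
            continue  # finished in an earlier run
        handlers[step]()  # an exception here leaves the checkpoint untouched
        checkpoint["done"] = int(step)
        executed.append(step)
    return executed


# Usage: verification fails once, then the migration resumes from step 5.
attempts = {"verify": 0}


def flaky_verify() -> None:
    attempts["verify"] += 1
    if attempts["verify"] == 1:
        raise RuntimeError("replication lag too high, retry later")


handlers: Dict[Step, Callable[[], None]] = {step: (lambda: None) for step in Step}
handlers[Step.VERIFY] = flaky_verify

checkpoint: Dict[str, int] = {}
try:
    run_migration(checkpoint, handlers)
except RuntimeError:
    pass  # paused with steps 1 to 4 recorded as done

resumed = run_migration(checkpoint, handlers)
print([step.name for step in resumed])
```

In a real system the checkpoint would live in durable storage, and each handler would need to be safe to re-run, since a crash can land between finishing a step and recording it.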
BIN-PACKING: REDUCING THE FLEET BY 75%
In 2023, Stripe used the Data Movement Platform to bin-pack thousands of underutilized shards into a smaller number of larger shards. Bin-packing is the reverse of splitting: instead of one shard becoming two, many small shards are consolidated into fewer shards with more data. This reduced the total number of DocDB shards by approximately 75% while moving 1.5 petabytes — dramatically reducing operational overhead and hardware costs without any application code changes.
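The source does not say which packing algorithm Stripe uses. As an illustration of the general idea only, here is first-fit decreasing, a standard bin-packing heuristic, applied to hypothetical shard sizes and a hypothetical per-shard capacity.

```python
from typing import Dict, List


def first_fit_decreasing(shard_sizes_gb: Dict[str, float], capacity_gb: float) -> List[List[str]]:
    """Group small shards into as few target shards as the heuristic finds.

    Places shards largest-first into the first target with enough free space.
    """
    groups: List[List[str]] = []   # source shards assigned to each target
    free: List[float] = []         # remaining capacity of each target
    ordered = sorted(shard_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > capacity_gb:
            raise ValueError(f"{name} ({size} GB) exceeds target capacity")
        for i, room in enumerate(free):
            if size <= room:
                groups[i].append(name)
                free[i] -= size
                break
        else:  # no existing target had room, so open a new one
            groups.append([name])
            free.append(capacity_gb - size)
    return groups


sizes = {"s1": 40, "s2": 10, "s3": 30, "s4": 20, "s5": 50, "s6": 25, "s7": 15, "s8": 10}
plan = first_fit_decreasing(sizes, capacity_gb=100)
print(len(sizes), "shards ->", len(plan), "shards")
```

In this made-up example eight shards collapse into two, a 75% reduction. A production planner would also have to weigh query load, tenant isolation, and headroom for growth, none of which this sketch models.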
ℹ️
Multitenant to Single-Tenant: Isolation on Demand
DocDB supports migrating a large merchant's data from a shared multitenant shard (multiple merchants on one shard) to a dedicated single-tenant shard (one merchant per shard). This is done transparently via the Data Movement Platform: the merchant's data is migrated to a dedicated shard, traffic routing is updated atomically, and the merchant gets dedicated resources without any downtime or visible change in behavior. This capability is increasingly important as Stripe's largest customers reach the scale of Shopify, Amazon, or OpenAI.
✅
The Heat Management System: Next Chapter
At the time of the June 2024 blog post, Stripe was prototyping a heat management system that proactively balances data across shards based on real-time access patterns. Rather than waiting for a shard to become a bottleneck and then splitting it reactively, the heat management system would detect access pattern shifts and pre-emptively migrate hot data to shards with more capacity. Reactive sharding at Stripe's scale will eventually give way to predictive sharding.
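Since the heat management system was only a prototype at the time, its design is not public. The following is a purely hypothetical sketch of what "migrate hot data to shards with more capacity" could look like as a greedy planner; the threshold, the data shape, and `plan_moves` are all assumptions.

```python
from typing import Dict, List, Tuple


def plan_moves(
    chunk_qps: Dict[str, Dict[str, float]],
    threshold_qps: float,
) -> List[Tuple[str, str, str]]:
    """Propose (chunk, from_shard, to_shard) moves for shards above the threshold.

    Greedy and deliberately simple: at most one move per overloaded shard,
    taking its hottest chunk to the currently coolest shard if it fits.
    """
    load = {shard: sum(chunks.values()) for shard, chunks in chunk_qps.items()}
    moves: List[Tuple[str, str, str]] = []
    for shard in sorted(load, key=load.get, reverse=True):
        if load[shard] <= threshold_qps or not chunk_qps[shard]:
            continue
        chunk, qps = max(chunk_qps[shard].items(), key=lambda kv: kv[1])
        coolest = min((s for s in load if s != shard), key=load.get, default=None)
        if coolest is None or load[coolest] + qps > threshold_qps:
            continue  # nowhere to put it without creating a new hot spot
        moves.append((chunk, shard, coolest))
        load[shard] -= qps
        load[coolest] += qps
    return moves


observed = {
    "shard_a": {"c1": 700.0, "c2": 500.0},
    "shard_b": {"c3": 200.0},
    "shard_c": {"c4": 300.0},
}
print(plan_moves(observed, threshold_qps=1000.0))
```

Each proposed move would then be handed to the same migration protocol described above, so the planner only decides what to move, never how.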
Correctness verification (Step 5) is the most cautious part of the migration protocol, and deliberately so. After CDC replication has caught up, the platform checks completeness and correctness by comparing point-in-time snapshots of the source and target shards rather than querying the live shards, a design choice Stripe describes as a way to avoid impacting shard throughput. For financial data, even a single inconsistency before the traffic switch would be unacceptable: a payment that exists on the source but not on the target could be double-charged or lost if the switch happens before it replicates. The verification step is the safety gate that makes five-nines availability compatible with live shard migrations. The cost is time, because migrations take longer with a verification window, but that cost is the explicit price of correctness guarantees on financial data.
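One way to picture a full comparison of two snapshots is to hash a canonical form of every document and report each difference. This is a minimal sketch under the assumption that documents are JSON-serializable; `doc_digest` and `diff_snapshots` are hypothetical names, not part of DocDB.

```python
import hashlib
import json
from typing import Any, Dict


def doc_digest(doc: Any) -> str:
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(source: Dict[str, Any], target: Dict[str, Any]) -> Dict[str, str]:
    """Return every key on which the two snapshots disagree, with a reason."""
    problems: Dict[str, str] = {}
    for key in source.keys() | target.keys():
        if key not in target:
            problems[key] = "missing on target"
        elif key not in source:
            problems[key] = "unexpected on target"
        elif doc_digest(source[key]) != doc_digest(target[key]):
            problems[key] = "content mismatch"
    return problems


source_snapshot = {
    "pay_1": {"amount": 120, "currency": "usd"},
    "pay_3": {"amount": 75, "currency": "usd"},
}
target_snapshot = {
    "pay_1": {"currency": "usd", "amount": 120},  # same content, different key order
    "pay_3": {"amount": 70, "currency": "usd"},   # diverged
}
print(diff_snapshots(source_snapshot, target_snapshot))
```

An empty result is the only outcome that should allow the traffic switch to proceed; anything else stops the migration before applications can observe the difference.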
⚠️
The Bulk Load Throughput Engineering Challenge
During testing, Stripe found that straightforward bulk loading onto a DocDB shard was too slow for shard migrations. Batching writes and tuning engine parameters both failed to resolve the throughput bottleneck. The fix reported in Stripe's blog post was to change the insertion order rather than the engine: because DocDB stores data in a B-tree, sorting the data by the most common index attributes and inserting it in sorted order keeps consecutive writes close together, which Stripe says increased write throughput by about 10x.
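Stripe's engineering post attributes the eventual speed-up to inserting data in index-sorted order, which suits a B-tree layout. A toy model shows why ordering matters for locality. It treats every run of 100 adjacent keys as one leaf page and counts how often consecutive inserts jump to a different page; this is an illustration of write proximity, not a model of the actual storage engine.

```python
import random
from typing import Iterable, Optional


def leaf_page_switches(keys: Iterable[int], keys_per_page: int = 100) -> int:
    """Count how often consecutive inserts land on a different leaf page."""
    switches = 0
    previous_page: Optional[int] = None
    for key in keys:
        page = key // keys_per_page
        if previous_page is not None and page != previous_page:
            switches += 1
        previous_page = page
    return switches


rng = random.Random(42)          # fixed seed so the run is repeatable
arrival_order = list(range(10_000))
rng.shuffle(arrival_order)       # snapshot rows in arbitrary order

unsorted_switches = leaf_page_switches(arrival_order)
sorted_switches = leaf_page_switches(sorted(arrival_order))

print("unsorted inserts:", unsorted_switches, "page switches")
print("sorted inserts:  ", sorted_switches, "page switches")
```

With sorted input every page is filled once and never revisited, so the number of page switches equals the number of pages minus one. With shuffled input nearly every insert touches a different page than the one before it.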
THE OPLOG AND FINANCIAL CONSISTENCY
MongoDB's oplog (a capped collection that stores all write operations in order, used for replication across MongoDB replica sets) is the technical foundation of CDC replication in DocDB. Every write to the source shard appears in the oplog in order. By replaying the oplog on the target shard in sequence, the Data Movement Platform ensures that every write applied to the source during migration is also applied to the target, preserving the consistency of financial records. The oplog is not just a replication mechanism: it is the ordered record of every write that the target must reproduce.
Architecture
DocDB's architecture enforces a clean separation between application code and database topology. Applications at Stripe never connect directly to MongoDB instances. They connect to the Database Proxy, which is the single entry point for routing and access control and which consults the chunk metadata service as the source of truth for where each chunk lives. This indirection is what makes zero-downtime migrations possible: routing can be updated atomically as migrations complete, and applications never see anything other than consistent data.
Figure: DocDB Architecture: Three-Tier Database-as-a-Service (interactive diagram on TechLogStack)
Figure: Data Movement Platform: Six-Step Migration Protocol (interactive diagram on TechLogStack)
THE PROXY MAKES MIGRATIONS TRANSPARENT
The Database Proxy's role is the architectural key to zero-downtime migrations. By abstracting away shard topology from application code, the proxy can update routing atomically at Step 6, the traffic switch, without any application restarting, reconnecting, or changing behavior. Applications see a continuous stream of consistent reads and writes before and after the switch. The migration is completely invisible from the application layer, which is the entire point.
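A minimal in-memory sketch of this indirection follows. The class and method names are hypothetical stand-ins rather than DocDB's API, and a lock-guarded dictionary stands in for whatever mechanism the real proxies use to pick up new routes.

```python
import threading
from typing import Any, Callable, Dict


class ChunkMetadataService:
    """Hypothetical in-memory stand-in for the chunk-to-shard catalog."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self._routes = dict(routes)
        self._lock = threading.Lock()

    def shard_for(self, chunk_id: str) -> str:
        with self._lock:
            return self._routes[chunk_id]

    def set_active_shard(self, chunk_id: str, shard: str) -> None:
        # One guarded assignment: a reader sees the old route or the new one,
        # never a partially updated state.
        with self._lock:
            self._routes[chunk_id] = shard


class DatabaseProxy:
    """Applications call the proxy; only the proxy knows about shards."""

    def __init__(
        self,
        metadata: ChunkMetadataService,
        shards: Dict[str, Dict[str, Any]],
        chunk_of: Callable[[str], str],
    ) -> None:
        self._metadata = metadata
        self._shards = shards
        self._chunk_of = chunk_of

    def read(self, key: str) -> Any:
        shard = self._metadata.shard_for(self._chunk_of(key))
        return self._shards[shard].get(key)


doc = {"id": "cus_1", "balance": 500}
shards = {
    "shard_a": {"cus_1": doc},
    "shard_b": {"cus_1": dict(doc)},   # target already brought in sync
}
metadata = ChunkMetadataService({"chunk_7": "shard_a"})
proxy = DatabaseProxy(metadata, shards, chunk_of=lambda key: "chunk_7")

before = proxy.read("cus_1")
metadata.set_active_shard("chunk_7", "shard_b")   # the traffic switch
after = proxy.read("cus_1")
print(before == after)
```

The application-facing call, `proxy.read("cus_1")`, is identical before and after the switch, which is the whole benefit of routing through the proxy.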
ℹ️
Change Data Capture: Reading the Oplog
MongoDB maintains an oplog (operation log — a capped MongoDB collection that records every write operation applied to the database, used for replication and CDC streaming) that records every write in sequence. DocDB's CDC service reads this oplog on the source shard and replays every operation on the target shard in order. This keeps the target continuously synchronized with the source during the migration window. When replication lag drops to near-zero, the correctness verification and traffic switch can proceed safely.
✅
Transparent Application Layer: The Developer Experience
Stripe's application engineers interact with DocDB through a simple API: read a document, write a document, query by index. They never configure sharding keys, never think about which shard holds a specific customer's data, and never coordinate with the database infrastructure team before their code ships. The abstraction layer is what makes it possible for Stripe's product engineering velocity to be decoupled from its database scaling complexity — two teams that would otherwise be in each other's way operate independently.
Lessons
Stripe's DocDB and Data Movement Platform represent a decade of investment in making financial database operations invisible to application code. The lessons here are about architectural abstraction, the price of correctness, and why migration tooling is a competitive advantage.
- 01. A database abstraction layer is an operational multiplier. Stripe's applications never talk to MongoDB directly — they talk to the proxy. This indirection cost engineering time upfront but enabled zero-downtime migrations, transparent sharding, and fleet-wide upgrades for a decade of scale growth. The abstraction layer is where scaling strategies live.
- 02. Change Data Capture (reading a database's operation log to stream every change to a downstream consumer in real time) is the foundation of live migration. Without CDC, migrating a live database requires a maintenance window. With CDC, you copy a snapshot, stream the delta, verify consistency, then switch traffic atomically. Build CDC capability into your database infrastructure before you need live migrations.
- 03. Pre-build indexes on the target before loading data. Loading data first and then building indexes on a large dataset is far more expensive than building the indexes on empty data and then inserting. For petabyte-scale migrations, this ordering difference can be the difference between hours and days. Stripe explicitly sequences index creation before bulk data arrival.
- 04. Correctness verification before the switch is not optional for financial data. A migration that completes fast but introduces even a single data inconsistency is worse than a slow correct migration. For domains where correctness is non-negotiable, treat Step 5 (verification) as the most important step in your migration protocol.
- 05. Bin-packing (consolidating many small, underutilized database shards into fewer larger shards to reduce operational overhead and hardware costs) is as important as shard splitting for long-term database fleet health. As traffic patterns shift, some shards become cold. Without consolidation, you accumulate operational overhead and hardware waste. Plan for bidirectional shard topology management from day one.
⚠️
The Correctness vs Speed Tradeoff
Stripe's Data Movement Platform deliberately accepts slower migrations in exchange for guaranteed correctness. The CDC replication phase, the correctness verification step, and the atomic traffic switch all add latency to the migration timeline that a less careful system could avoid. For a company processing $1 trillion in payments, data inconsistency risk is not a speed-for-correctness tradeoff — it's a business continuity risk. The migration protocol encodes this priority explicitly.
MIGRATION TOOLING AS INFRASTRUCTURE
Stripe's Data Movement Platform is not a script that gets run during migrations — it is production infrastructure that runs continuously, managing ongoing shard operations across thousands of databases. The platform has its own SLOs, its own monitoring, its own oncall rotation. Building migration tooling as first-class infrastructure rather than ad-hoc tooling is what enables Stripe to migrate petabytes per year without extraordinary engineering effort per migration.
Stripe moved 1.5 petabytes of financial data between database shards in 2023 and nobody noticed — which is either the most boring success story in engineering history or the most impressive one.
TechLogStack — built at scale, broken in public, rebuilt by engineers
This case is a plain-English retelling of publicly available engineering material.