- $1T+ payment volume processed in 2023 while maintaining 99.999% uptime
- 5M database queries/second sustained across Stripe's DocDB fleet
- 1.5 petabytes migrated between shards in 2023 — transparently to all applications
- Thousands of shards managed; reduced by ~75% through bin-packing consolidation
- Six-step migration protocol — snapshot, CDC replication, correctness verification, atomic switch
- Zero downtime — applications never see a migration in progress; the proxy routes transparently
Stripe processed over $1 trillion in payment volume in 2023 while maintaining 99.999% uptime — five nines, fewer than 6 minutes of downtime all year. The infrastructure secret is a database platform called DocDB and a migration engine that moves petabytes of financial data between shards without any application knowing it happened.
The Story
DocDB's ability to migrate data between shards in a consistent, granular and reliable way has made it significantly easier for Stripe to scale.
— Jimmy Morzaria, Suraj Narkhede, via Stripe Engineering Blog, June 2024
When Stripe launched in 2011, they chose MongoDB (a document-oriented NoSQL database that stores data as flexible JSON-like documents rather than fixed relational table schemas) because it was more developer-friendly for a fast-moving startup. Over the next decade, as Stripe grew into a financial infrastructure company processing trillion-dollar payment volumes, the team built a layer on top of MongoDB called DocDB — a Database-as-a-Service (an abstraction layer that gives application developers a simple API for data access while hiding all the complexity of sharding, replication, failover, and migrations beneath it).
DocDB handles horizontal sharding (distributing data rows across multiple independent database instances based on a partition key, so no single instance holds all the data) across thousands of shards, manages replication for high availability, and enables zero-downtime data migrations. As certain merchants grow rapidly and their shard fills up, it needs to be split. As the fleet evolves and some shards become underutilised, they can be consolidated. When MongoDB releases a new version, shards can be fork-lifted (migrating data to a new instance running the target version, avoiding multi-step in-place upgrades) to the new version rather than performing risky in-place upgrades. All of these operations have one requirement in common: Stripe cannot stop accepting payments while they happen.
Problem
Shard Splits and Consolidations Required Downtime
Without the Data Movement Platform, scaling Stripe's database fleet required either accepting downtime during shard operations or building complex dual-write logic for every migration. As Stripe's fleet grew to thousands of shards, this was operationally unsustainable and created real risk for every migration event.
Cause
Financial Data Cannot Tolerate Inconsistency
Payment data has zero tolerance for consistency errors — a payment record that exists on the source shard but hasn't yet appeared on the target is a payment that could be double-charged, lost, or corrupted if traffic switches at the wrong moment. The six-step protocol was designed specifically to guarantee that by the time traffic switches, the target is exactly consistent with the source including all writes made during migration.
Solution
CDC Replication + Correctness Verification
Stripe solved the consistency problem using Change Data Capture (a technique that continuously reads the MongoDB operation log (oplog) to stream every write applied to the source shard to the target, keeping the target synchronised even as live traffic modifies the source data). After CDC replication catches up to near-real-time, correctness checks compare source and target before traffic is switched. The switch itself is atomic from the application's perspective.
Result
1.5 Petabytes Moved in 2023 Transparently
In 2023 alone, Stripe migrated 1.5 petabytes of data between shards, consolidated thousands of databases through bin packing, and upgraded the entire MongoDB fleet — all with zero application downtime and no payment processing interruptions.
The Fix
DocDB Architecture: The Database-as-a-Service Abstraction
DocDB's architecture is a three-tier system sitting between Stripe's application code and raw MongoDB instances. The Database Proxy is the entry point for all application read/write requests — it performs access control checks, validates queries, and routes requests to the correct shard by consulting the chunk metadata service. The Chunk Metadata Service maintains the authoritative map of which data chunks live on which shards. The Database Shards are replicated MongoDB instances that store the actual data. Applications talk only to the proxy; they are completely unaware of sharding, shard splits, or migrations in progress.
- $1T+ — payment volume processed in 2023; the financial stakes that make data consistency non-negotiable
- 5M QPS — database queries per second sustained throughout fleet-wide migrations
- 1.5 PB — data migrated between shards in 2023 alone, transparently to all applications
- ~75% — reduction in total DocDB shard count through bin-packing consolidation in 2023
# Simplified 6-step Data Movement Platform migration flow
# Each step is atomic and resumable — migrations can be paused and continued
class DataMovementPlatform:
def migrate_chunk(self, chunk_id: str, source_shard: str, target_shard: str):
# Step 1: Register migration plan — visible to monitoring and audit
self.chunk_metadata.register_migration(
chunk_id=chunk_id,
source=source_shard,
target=target_shard
)
# Step 2: Pre-build indexes on target BEFORE data arrives
# Building indexes on empty data is far cheaper than on a loaded dataset
self.build_indexes_on_target(target_shard, chunk_id)
# Step 3: Bulk copy snapshot at time T
# Uses purpose-built I/O patterns for high-throughput sequential writes
# (Standard MongoDB write path is too slow for petabyte-scale bulk loads)
snapshot_timestamp = self.bulk_copy_snapshot(chunk_id, source_shard, target_shard)
# Step 4: Stream CDC replication — catch up all writes since snapshot
# Reads MongoDB oplog on source; applies to target until near-real-time
# Target is now fully synchronised with source
self.cdc_replicate_to_target(
source_shard, target_shard, since=snapshot_timestamp
)
# Step 5: Correctness verification — compare source and target
# Financial data requires FULL consistency before any traffic switch
# A single inconsistency = potential double charge or lost payment
assert self.verify_consistency(chunk_id, source_shard, target_shard)
# Step 6: Atomic traffic switch — update chunk metadata, switch routing
# Applications querying this chunk now get routed to target
# The proxy handles this transparently; no application restart needed
self.chunk_metadata.set_active_shard(chunk_id, target_shard)
self.chunk_metadata.deregister_from_source(chunk_id, source_shard)
Correctness verification (Step 5) is the most cautious part of the migration protocol. The platform compares a sample of records between source and target shards after CDC replication has caught up. For financial data, even a single inconsistency before the traffic switch would be unacceptable — a payment that exists on the source but not on the target could be double-charged or lost. The verification step is the safety gate that makes five-nines availability compatible with live shard migrations. The cost is time — migrations take longer because of the verification window — but that cost is the explicit price of correctness guarantees on financial data.
The bulk load throughput engineering challenge
During testing, Stripe found that standard MongoDB write patterns were insufficiently fast for bulk data loading during shard migrations. Batching writes and tuning engine parameters both failed to resolve the throughput bottleneck. The root cause: the standard MongoDB write path is optimised for low-latency individual writes, not for high-throughput sequential bulk loads. The engineering team built custom I/O patterns specifically for the bulk copy phase — patterns that bypassed some standard write overhead in favour of throughput.
The fork-lift upgrade strategy
Traditional in-place database major version upgrades require going through each intermediate version sequentially — upgrading from MongoDB 4.0 to 5.0 to 6.0, each step requiring careful validation. Stripe's Data Movement Platform enables a fork-lift strategy: provision a new shard running the target version, migrate the data to it, switch traffic, decommission the old shard. Any version can jump to any other version in a single migration step. This eliminates the risk accumulation of multi-step in-place upgrades.
Multitenant to single-tenant: isolation on demand
DocDB supports migrating a large merchant's data from a shared multitenant shard (multiple merchants on one shard) to a dedicated single-tenant shard (one merchant per shard). This is done transparently via the Data Movement Platform: the merchant's data is migrated to a dedicated shard, traffic routing is updated atomically, and the merchant gets dedicated resources without any downtime or visible change in behaviour. This capability is increasingly important as Stripe's largest customers grow to Shopify, Amazon, and OpenAI scale.
Architecture
DocDB's architecture enforces a clean separation between application code and database topology. Applications at Stripe never connect directly to MongoDB instances — they connect to the Database Proxy, which is the single point of truth for routing, access control, and scalability decisions. This indirection is what makes zero-downtime migrations possible: the proxy can update its routing table atomically as migrations complete, and applications never see anything other than consistent data.
DocDB Architecture: Three-Tier Database-as-a-Service
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
Data Movement Platform: Six-Step Migration Protocol
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
Lessons
A database abstraction layer is an operational multiplier. Stripe's applications never talk to MongoDB directly — they talk to the proxy. This indirection cost engineering time upfront but enabled zero-downtime migrations, transparent sharding, and fleet-wide upgrades for a decade of scale growth. The abstraction layer is where scaling strategies live.
Change Data Capture (reading a database's operation log to stream every change to a downstream consumer in real time) is the foundation of live migration. Without CDC, migrating a live database requires a maintenance window. With CDC, you copy a snapshot, stream the delta, verify consistency, then switch traffic atomically. Build CDC capability into your database infrastructure before you need live migrations.
Pre-build indexes on the target before loading data. Loading data first and then building indexes on a large dataset is far more expensive than building indexes on empty data and then inserting. For petabyte-scale migrations, this ordering difference can be the difference between hours and days.
Correctness verification before the traffic switch is not optional for financial data. A migration that completes fast but introduces even a single data inconsistency is worse than a slow correct migration. For domains where correctness is non-negotiable, treat Step 5 (verification) as the most important step in your migration protocol.
Bin-packing (consolidating many small underutilised shards into fewer larger shards) is as important as shard splitting for long-term database fleet health. As traffic patterns shift, some shards become cold. Without consolidation, you accumulate operational overhead and hardware waste. Plan for bidirectional shard topology management from day one.
Engineering Glossary
Bin-packing — consolidating many small, underutilised database shards into fewer larger shards to reduce operational overhead and hardware costs. The reverse of shard splitting. Stripe reduced their total shard count by ~75% through bin-packing in 2023 while moving 1.5 petabytes.
Change Data Capture (CDC) — a technique that reads a database's operation log (oplog in MongoDB) to continuously stream every write applied to the source to a downstream consumer in real time. The foundation of Stripe's live migration capability — allows the target shard to stay synchronised with the source while both are serving live traffic.
Chunk metadata service — the central catalog that tracks which data chunks live on which shards, serving as the source of truth for query routing across Stripe's fleet. Updated atomically during the traffic switch step of every migration.
Database-as-a-Service (DBaaS) — an abstraction layer that gives application developers a simple API for data access while hiding the complexity of sharding, replication, failover, and migrations beneath it. DocDB is Stripe's internal DBaaS built on top of MongoDB Community.
Fork-lift upgrade — migrating data to a new database instance running the target version, then switching traffic, rather than performing multi-step in-place upgrades through each intermediate version. Enables any version jump in a single migration step.
Horizontal sharding — distributing data rows across multiple independent database instances (shards) based on a partition key, so no single instance holds all the data and traffic is distributed. Managed transparently by DocDB's proxy and chunk metadata service.
Oplog — MongoDB's operation log; a capped collection that records every write operation applied to the database in sequence, used for replication across MongoDB replica sets. The technical foundation of CDC streaming in DocDB — replaying the oplog on the target shard keeps it synchronised with the source during migration.
This case is a plain-English retelling of publicly available engineering material.
Read the full case on TechLogStack →
(Interactive diagrams, source links, and the full reader experience)
TechLogStack — built at scale, broken in public, rebuilt by engineers.
Top comments (0)