Multi-Region Disaster Recovery: What Your RPO/RTO Decisions Actually Cost

Every RPO and RTO target in your DR plan has a line item attached to it. A 15-minute RPO costs a specific amount per month. A 5-minute RPO costs measurably more. Most teams discover these numbers on their cloud bill, not during architecture review.

This piece works through the cost structure of each DR tier, using a representative 3-tier application as the base case. By the end you will have a model you can apply to your own workload.

Your RPO Is a Price Tag, Not a Policy

RPO and RTO are often treated as compliance checkboxes, agreed in a governance meeting and forgotten until an incident. They are actually financial commitments. Honoring a 5-minute RPO on a write-heavy PostgreSQL database costs real money every hour the database runs.

The cost driver is replication. Tighter RPO means more frequent replication, which means more cross-region data transfer, more replication instances, and in some cases synchronous writes that add latency to every transaction.

Each step up the DR tiers, from backup and restore through warm standby and active-passive hot to active-active, raises the monthly infrastructure cost relative to a single-region baseline. The jump from warm standby to active-active is smaller than most teams expect, which is the source of a common budget miscalculation.

Active-Active vs Active-Passive: The 50% Illusion

Teams frequently choose active-passive to avoid the cost of active-active, then discover that warm standby still carries 60 to 70% of the added cost of a full active-active deployment. The reason is that "passive" does not mean "off."

A warm standby runs your full stack at reduced capacity in the DR region. Your database replica is running. Your application tier is running at minimum scale. Your load balancer and networking are provisioned. All of that costs money continuously, not just during a failover.

DR Tier	Monthly Cost Multiplier	RTO	RPO	What Is Running in DR Region
Backup and restore	1.1x	4-24 hours	1-24 hours	Nothing, restore from S3
Warm standby	1.6x	15-60 min	15-60 min	Scaled-down app, replica DB
Active-passive hot	1.8x	5-15 min	5-15 min	Full stack, scaled-down
Active-active	2.0x	Under 1 min	Near-zero	Full stack, full scale

For a $10,000 per month single-region deployment, warm standby costs $16,000 and active-active costs $20,000. The difference is $4,000, not $10,000. If your business case justifies warm standby at $16,000, it probably justifies active-active at $20,000. The gap between "somewhat protected" and "fully protected" is narrower than the headline costs suggest.
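The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the multipliers in the tier table apply linearly to whatever baseline you plug in; the function and dictionary names are illustrative, not part of any tool.

```python
# Multipliers copied from the DR tier table; treating them as linear
# scale factors on the single-region bill is an assumption.
TIER_MULTIPLIERS = {
    "backup_and_restore": 1.1,
    "warm_standby": 1.6,
    "active_passive_hot": 1.8,
    "active_active": 2.0,
}


def monthly_cost(baseline: float, tier: str) -> float:
    """Total monthly cost (primary region plus DR region) for a tier."""
    return round(baseline * TIER_MULTIPLIERS[tier], 2)


def premium_share(tier: str, reference: str = "active_active") -> float:
    """DR premium of `tier` as a fraction of the premium of `reference`."""
    tier_premium = TIER_MULTIPLIERS[tier] - 1
    reference_premium = TIER_MULTIPLIERS[reference] - 1
    return round(tier_premium / reference_premium, 4)


baseline = 10_000  # the article's single-region example, in dollars per month
warm = monthly_cost(baseline, "warm_standby")
active = monthly_cost(baseline, "active_active")
print(f"warm standby:  ${warm:,.0f}/month")
print(f"active-active: ${active:,.0f}/month")
print(f"gap:           ${active - warm:,.0f}/month")
print(f"warm standby share of active-active premium: {premium_share('warm_standby'):.0%}")
```

Swapping in your own baseline shows how quickly the gap between tiers shrinks relative to the total bill.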

The case for active-passive holds when your RTO tolerance is measured in minutes rather than seconds. If a 15 to 60 minute outage is acceptable, warm standby is the right call. If it is not, the $4,000 difference is a straightforward investment. Kubernetes autoscaling can reduce the standby cost further by right-sizing the passive fleet in the DR region.

The Replication Tax: Where the Real Money Goes

Cross-region replication has two cost components: the compute cost of running replica infrastructure and the transfer cost of moving data between regions. Transfer cost is the one that surprises teams.

AWS charges $0.02 per GB for data transferred between US-East and EU-West. That adds $2,000 per month for every 100TB replicated. A write-heavy application generating 10TB of database changes per day incurs roughly $73,000 per year in transfer charges alone, before touching compute.
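The transfer math is simple enough to script. The sketch below assumes decimal terabytes (1TB = 1,000GB), which is what makes 100TB come out at $2,000, and a flat $0.02 per GB with no tiering or compression; `transfer_cost` and `annual_transfer_cost` are illustrative names. At 10TB per day over a 365-day year, the same rate works out to about $73,000.

```python
GB_PER_TB = 1_000  # decimal terabytes (assumption; matches $2,000 per 100TB)


def transfer_cost(terabytes: float, rate_per_gb: float = 0.02) -> float:
    """Cross-region transfer charge in dollars for a given data volume."""
    return round(terabytes * GB_PER_TB * rate_per_gb, 2)


def annual_transfer_cost(tb_per_day: float, rate_per_gb: float = 0.02,
                         days: int = 365) -> float:
    """Yearly transfer charge for a steady daily replication volume."""
    return transfer_cost(tb_per_day * days, rate_per_gb)


print(f"100TB replicated:  ${transfer_cost(100):,.0f}")
print(f"10TB/day for 365d: ${annual_transfer_cost(10):,.0f}")
```

Replace `rate_per_gb` with the current rate for your region pair before relying on the output.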

Synchronous replication costs more than transfer fees: it also costs latency. Guaranteeing a near-zero RPO on a PostgreSQL database requires synchronous commits, which means every write waits for the DR replica to acknowledge before returning success. Cross-region round-trip latency between US-East and EU-West is 80 to 120ms, so every write in your application now has an 80ms floor on its response time. This is why near-zero RPO targets often force cloud architecture decisions that have broader performance implications.
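To see what that latency floor does to throughput, consider a single database session that commits one transaction after another. The sketch below is a back-of-the-envelope bound, not a benchmark: `local_commit_ms` is a hypothetical figure for the local commit time, and real systems overlap many sessions, so aggregate throughput can be much higher than the per-session ceiling.

```python
def sync_commit_floor_ms(rtt_ms: float, local_commit_ms: float = 1.0) -> float:
    """Lower bound on commit latency when each commit waits for a remote ack."""
    return local_commit_ms + rtt_ms


def max_serial_commits_per_second(rtt_ms: float, local_commit_ms: float = 1.0) -> float:
    """Upper bound for one session committing transactions back to back."""
    return 1000.0 / sync_commit_floor_ms(rtt_ms, local_commit_ms)


# Round-trip times from the range quoted in the text.
for rtt in (80, 120):
    floor = sync_commit_floor_ms(rtt)
    ceiling = max_serial_commits_per_second(rtt)
    print(f"RTT {rtt}ms: commit floor {floor:.0f}ms, "
          f"at most {ceiling:.1f} serial commits/s per session")
```

Workloads that batch many writes into one transaction feel this far less than workloads that commit row by row.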

RDS Multi-AZ, which is in-region rather than cross-region, roughly doubles the database instance cost. It does not protect against a regional outage. Teams frequently confuse Multi-AZ availability (for hardware failures) with DR readiness (for regional failures). They are different products at different price points.

A Real 3-Tier App DR Cost Model

The base case: a 3-tier web application running in us-east-1, consisting of an application layer on EKS, a PostgreSQL database on RDS, and static assets on S3. Single-region cost is $10,000 per month.

Component	Single Region	Backup/Restore	Warm Standby	Active-Active
Application tier (EKS)	$4,000	$0	$1,200	$4,000
Database (RDS)	$3,000	$300 (snapshot)	$2,100	$3,000
Cross-region transfer	$0	$200	$800	$1,200
S3 replication	$0	$0	$200	$200
Networking and LB	$1,500	$0	$600	$1,500
Route 53 health checks	$0	$0	$50	$50
Monthly total	$10,000	$11,000	$16,450	$19,950
Annual DR premium	-	$12,000	$77,400	$119,400

The backup and restore tier adds only $12,000 per year but delivers a 4 to 24 hour RTO. For internal tools and non-revenue workloads, this is often the right answer.

Warm standby at $77,400 per year is the most common choice for production SaaS. The 15 to 60 minute RTO is acceptable for most applications that are not processing real-time payments or trading. The cost scales predictably: applying the same 64.5% premium, a $50,000 per month application at warm standby costs roughly $387,000 per year in DR overhead.
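The annual premiums follow directly from the monthly totals in the table. The sketch below takes those totals as given and assumes the premium scales linearly with the single-region bill, which is a simplification; fixed costs such as health checks do not actually grow with spend. All names are illustrative.

```python
BASELINE_MONTHLY = 10_000

# Monthly totals copied from the cost model table.
MONTHLY_TOTALS = {
    "backup_restore": 11_000,
    "warm_standby": 16_450,
    "active_active": 19_950,
}


def annual_premium(tier: str) -> int:
    """Extra yearly spend over the single-region baseline."""
    return (MONTHLY_TOTALS[tier] - BASELINE_MONTHLY) * 12


def premium_ratio(tier: str) -> float:
    """DR premium as a fraction of the single-region monthly bill."""
    return (MONTHLY_TOTALS[tier] - BASELINE_MONTHLY) / BASELINE_MONTHLY


def scaled_annual_premium(tier: str, monthly_spend: float) -> float:
    """Annual premium for a different baseline, assuming linear scaling."""
    return round(monthly_spend * premium_ratio(tier) * 12, 2)


for tier in MONTHLY_TOTALS:
    print(f"{tier:15s} ${annual_premium(tier):>9,}/year")
print(f"warm standby at $50,000/month: "
      f"${scaled_annual_premium('warm_standby', 50_000):,.0f}/year")
```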

Matching DR Spend to Business Downtime Cost

The right DR tier is the cheapest one where the annual DR premium is less than the expected annual cost of downtime without it. This calculation requires knowing your revenue-per-minute during peak hours.

Revenue per Minute (Peak)	Acceptable RTO	Recommended DR Tier	Annual DR Investment
Under $500	Hours	Backup and restore	$10,000-20,000
$500-$2,000	15-60 min	Warm standby	$50,000-150,000
$2,000-$10,000	5-15 min	Active-passive hot	$80,000-250,000
Over $10,000	Under 1 min	Active-active	$100,000-400,000
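The table can be turned into a lookup for quick triage. This is a sketch only: the table's bands share their endpoints, so the code assumes a value exactly on a shared boundary ($500 or $2,000) falls into the higher tier, and that exactly $10,000 stays in active-passive hot because the top band reads "Over $10,000". `recommend_tier` is a hypothetical helper name.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    tier: str
    acceptable_rto: str


def recommend_tier(revenue_per_minute: float) -> Recommendation:
    """Map peak revenue per minute to the DR tier suggested in the table."""
    if revenue_per_minute < 500:
        return Recommendation("Backup and restore", "Hours")
    if revenue_per_minute < 2_000:
        return Recommendation("Warm standby", "15-60 min")
    if revenue_per_minute <= 10_000:
        return Recommendation("Active-passive hot", "5-15 min")
    return Recommendation("Active-active", "Under 1 min")


for revenue in (200, 1_000, 5_000, 25_000):
    rec = recommend_tier(revenue)
    print(f"${revenue:>6,}/min -> {rec.tier} (RTO {rec.acceptable_rto})")
```

Treat the result as a starting point for the break-even calculation rather than a decision.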

The break-even math for warm standby: if your application generates $1,000 per minute in revenue and you experience one 2-hour outage per year, your expected downtime cost is $120,000. Warm standby for a $10,000 per month application costs $77,400 per year. The investment pays for itself in less than one full incident.
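The same break-even check can be written as a small function. The `rto_minutes` parameter is an addition that the calculation above does not include: it subtracts the downtime you still incur while failing over, so the default of zero reproduces the article's numbers and a nonzero value gives a more conservative estimate. Function names are illustrative.

```python
def expected_downtime_cost(revenue_per_minute: float, outage_minutes: float,
                           outages_per_year: float = 1.0) -> float:
    """Expected yearly revenue lost to outages."""
    return revenue_per_minute * outage_minutes * outages_per_year


def net_annual_benefit(revenue_per_minute: float, outage_minutes: float,
                       annual_dr_premium: float, rto_minutes: float = 0.0,
                       outages_per_year: float = 1.0) -> float:
    """Avoided downtime cost minus the DR premium (positive means DR pays off)."""
    avoided_minutes = max(outage_minutes - rto_minutes, 0.0)
    avoided = expected_downtime_cost(revenue_per_minute, avoided_minutes,
                                     outages_per_year)
    return avoided - annual_dr_premium


# Article's example: $1,000/min, one 2-hour outage per year, $77,400 premium.
print(f"downtime cost without DR: ${expected_downtime_cost(1_000, 120):,.0f}")
for rto in (0, 15, 60):
    benefit = net_annual_benefit(1_000, 120, 77_400, rto_minutes=rto)
    print(f"failover takes {rto:>2} min: net benefit ${benefit:,.0f}")
```

With these inputs the investment clears break-even when failover lands near the fast end of the warm standby RTO range, and falls short if failover takes the full 60 minutes, which is worth checking against your own outage history.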

FinOps cost allocation practices make this calculation easier by attributing DR costs directly to the revenue streams they protect, rather than pooling them into shared infrastructure overhead.

Teams that skip this math tend to either over-provision DR (paying for active-active when warm standby covers the risk) or under-provision it (using backup-and-restore for payment processing). Both are expensive in different ways. The downtime cost of under-provisioned DR is visible on P&L reports. The waste cost of over-provisioned DR only shows up when someone runs a cloud cost optimization review across the full infrastructure spend.

Build the downtime cost model before the architecture review. It makes every DR design decision a financial decision with clear inputs rather than a risk conversation with no anchor.