AWS Multi-Region Failover Infrastructure

#aws #devops #cloud #region

Introduction

So you have selected a region close to your target customers and you want to make your application highly available. For that we typically create subnets spanning two or more AZs (Availability Zones) and call it a day.
But hey, a region consists of multiple AZs, so what if the whole region goes down? Also, what if your target customers are spread across different regions?
In both cases we can use AWS Global Accelerator, which does the following:

Routes user traffic through the AWS global network (not the public internet), reducing latency and jitter.
Continuously monitors endpoint health and routes traffic away from unhealthy ones.
Supports intelligent traffic routing to multiple AWS regions or endpoints based on health, geography, or weights.
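
To give a feel for how the multi-region routing above gets wired up, here is a minimal Python sketch that builds one `create_endpoint_group` request per region for the Global Accelerator API (boto3). The helper names, region names, and ARNs are placeholders I made up for illustration, and the listener ARN is assumed to come from an earlier `create_listener` call.

```python
from typing import Dict, List


def endpoint_group_requests(
    listener_arn: str,
    endpoints_by_region: Dict[str, str],
) -> List[dict]:
    """Build one create_endpoint_group request per region.

    endpoints_by_region maps a region name to the ARN of the load balancer
    serving that region (hypothetical values in the usage note).
    """
    requests = []
    for region, alb_arn in endpoints_by_region.items():
        requests.append(
            {
                "ListenerArn": listener_arn,
                "EndpointGroupRegion": region,
                # Weight only balances endpoints inside the same group.
                "EndpointConfigurations": [{"EndpointId": alb_arn, "Weight": 128}],
                # 100 = this region accepts its full share of traffic.
                "TrafficDialPercentage": 100.0,
                # For load balancer endpoints, health is taken from the
                # load balancer's own target health checks.
            }
        )
    return requests


def apply_requests(client, requests: List[dict]) -> None:
    """Send the requests using a boto3 'globalaccelerator' client."""
    for request in requests:
        client.create_endpoint_group(**request)
```

In practice you would create the client with `boto3.client("globalaccelerator", region_name="us-west-2")` (the Global Accelerator API is served from that region) and pass it to `apply_requests`. Keeping the request building separate from the API call makes it easy to review what will be created before anything is sent.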

Why are we here?

Well, when talking about a multi-region failover architecture, it's not that straightforward. A typical application consists of a user-facing frontend, a backend, a database layer and so on, and all of them need to be designed so that the application serves the same content from any of the selected regions at any time.

This post discusses those design decisions.

1. Stateless Frontend & Backend: The First Step Toward Scalability

The first thing to do is to make your application stateless, meaning it won't keep any data inside itself but will fetch it from a DB.
For better performance and faster deployments we can containerise the applications, because, why not!

And the same application will be deployed across multiple regions. Writing the infra using an IaC tool like Terraform simplifies things here.
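
To make "stateless" a bit more concrete, here is a small Python sketch. The names `CartStore`, `InMemoryStore`, and `add_to_cart` are hypothetical, and the in-memory store only stands in for the shared database so the example can run on its own.

```python
from typing import Dict, List, Protocol


class CartStore(Protocol):
    """Anything that can load and save a user's cart (e.g. the shared DB)."""

    def load(self, user_id: str) -> List[str]: ...
    def save(self, user_id: str, items: List[str]) -> None: ...


class InMemoryStore:
    """Stand-in for the shared database, used here only for demonstration."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = {}

    def load(self, user_id: str) -> List[str]:
        return list(self._rows.get(user_id, []))

    def save(self, user_id: str, items: List[str]) -> None:
        self._rows[user_id] = list(items)


def add_to_cart(store: CartStore, user_id: str, item: str) -> List[str]:
    """Handle one request without keeping anything in process memory.

    Everything the handler needs is read from the store and written back,
    so any container in any region can serve the next request.
    """
    items = store.load(user_id)
    items.append(item)
    store.save(user_id, items)
    return items
```

The point is that the handler has no module-level variables or local files holding user data, so killing a container or moving traffic to another region loses nothing.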

2. Shared & Synced Data Layer Across Regions

Now comes the most challenging part. We will have multiple DB instances across multiple regions, and they need to be kept in sync, otherwise users in different regions will not get the same results. The solution is to deploy the DB across 2 regions, treating one as primary and the other as secondary (read replica).
AWS Aurora Global Database can be used here, but it does not fail over automatically (promotion of the database), and since cluster endpoints are regional, the DB endpoint will change after a promotion. To handle that, a Route 53 record (Route 53 is global) can be used to provide a custom DB URL like db.my_app.com.
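
As a sketch of the custom DB URL idea, the Python below builds a Route 53 change batch that points a CNAME at whichever cluster endpoint is currently the writer. The helper names, hosted zone ID, and endpoint values are assumptions for illustration; `change_resource_record_sets` is the boto3 Route 53 call it is meant to feed.

```python
def db_cname_change(record_name: str, target_endpoint: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that points record_name at target_endpoint.

    A short TTL keeps clients from caching the old endpoint for long.
    """
    return {
        "Comment": "Point the application's DB hostname at the current writer",
        "Changes": [
            {
                # UPSERT creates the record if missing, otherwise replaces it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_endpoint}],
                },
            }
        ],
    }


def repoint_db(route53, hosted_zone_id: str, record_name: str, target_endpoint: str) -> None:
    """Apply the change using a boto3 'route53' client."""
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=db_cname_change(record_name, target_endpoint),
    )
```

The application only ever connects to the custom name, so after a promotion a single `repoint_db(...)` call with the new writer endpoint is enough to move it over once the TTL expires.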

As of May 24, 2025, AWS does not promote the DB automatically when a failover happens; we still have to trigger it ourselves. A Lambda function can be used to promote the DB in the other region to become the master (read/write).
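
Here is a rough Python sketch of what such a Lambda function could look like. It assumes a reasonably recent boto3 (one where `failover_global_cluster` accepts `AllowDataLoss`), and the environment variable names are made up for this example. I am also assuming the function runs in, and talks to, the secondary region, since the primary may be unreachable.

```python
import os


def promote_secondary(rds, global_cluster_id: str, target_cluster_arn: str) -> dict:
    """Ask RDS to fail the global cluster over to the secondary cluster.

    AllowDataLoss=True requests an unplanned failover, which fits the case
    where the primary region is down; writes that had not replicated yet
    may be lost.
    """
    return rds.failover_global_cluster(
        GlobalClusterIdentifier=global_cluster_id,
        TargetDbClusterIdentifier=target_cluster_arn,
        AllowDataLoss=True,
    )


def lambda_handler(event, context):
    # Imported here so promote_secondary can be exercised without boto3.
    import boto3

    # Hypothetical environment variable names; set them on the function.
    rds = boto3.client("rds", region_name=os.environ["SECONDARY_REGION"])
    response = promote_secondary(
        rds,
        os.environ["GLOBAL_CLUSTER_ID"],
        os.environ["SECONDARY_CLUSTER_ARN"],
    )
    # The failover is asynchronous; this only reports the status at call time.
    return {"status": response["GlobalCluster"]["Status"]}
```

Because the call is asynchronous, anything that depends on the new writer (DNS updates, application restarts) should wait until the cluster reports that the failover has finished.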

3. Static Content with Cross-Region Replication

For static assets we usually use an S3 bucket along with CloudFront to serve the content. Since buckets are also regional, we need a bucket in each region. With S3 Cross-Region Replication the data is synced from one bucket to the other automatically. On CloudFront we can define an origin group with a primary and a secondary origin, and it will automatically fall back to the secondary when the primary fails.
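
The two pieces above can be sketched in Python as plain configuration builders. The first returns a `ReplicationConfiguration` for boto3's `s3.put_bucket_replication`, the second returns one origin group entry for a CloudFront distribution config. The function names, rule ID, IAM role, bucket ARNs, and origin IDs are all placeholders, and the list of status codes that trigger failover is just one reasonable choice.

```python
def replication_config(role_arn: str, destination_bucket_arn: str) -> dict:
    """ReplicationConfiguration for s3.put_bucket_replication.

    Versioning must already be enabled on both source and destination buckets.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


def origin_group(group_id: str, primary_origin_id: str, secondary_origin_id: str) -> dict:
    """One entry for the OriginGroups.Items list of a CloudFront DistributionConfig."""
    # Responses with these status codes from the primary trigger a retry
    # against the secondary origin.
    status_codes = [500, 502, 503, 504]
    return {
        "Id": group_id,
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": len(status_codes), "Items": status_codes}
        },
        "Members": {
            "Quantity": 2,
            # Order matters: the first member is the primary origin.
            "Items": [{"OriginId": primary_origin_id}, {"OriginId": secondary_origin_id}],
        },
    }
```

With a boto3 S3 client this would be applied as `s3.put_bucket_replication(Bucket="my-primary-bucket", ReplicationConfiguration=replication_config(role_arn, dest_arn))`, where the bucket name is again hypothetical. Note that replication in this form is one-way, from the primary bucket to the secondary.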

Final Infrastructure

Finally, here's a sample infrastructure where:
a. The frontend is a static website served from an S3 bucket using CloudFront
b. The other static content is also served from an S3 bucket
c. The business logic layer (backend) runs on ECS, spread across multiple AZs and deployed in multiple regions at the same time
d. The DB has 1 master in one AZ of the primary region, and the others are read replicas. A replica will be promoted by a Lambda function, which can run periodically or be triggered manually if needed.