Overview
In real-world systems, downtime is not an option. Cloud applications must survive instance failures, service outages, and even full regional failures.
In this blog, I'll walk you through a hands-on AWS Multi-Region Disaster Recovery (DR) architecture that automatically shifts traffic to a secondary region without manual intervention.
This project simulates how enterprise systems survive regional outages.
Architecture Diagram
Architecture Components
Primary Region (ap-south-1)
- Application Load Balancer (ALB)
- Auto Scaling Group (EC2 instances from AMI)
- Amazon RDS (Primary)
- Amazon EFS (Shared storage)
Disaster Recovery Region (us-east-1)
- Application Load Balancer (ALB)
- Auto Scaling Group (EC2 from copied AMI)
- Amazon RDS Read Replica
- Amazon EFS backup / replicated data
Global Services
- Amazon Route 53 (DNS Failover & Health Checks)
Disaster Recovery Flow
- User traffic enters through Amazon Route 53
- Route 53 routes traffic to Primary ALB (ap-south-1)
- Health checks continuously monitor application health
- On failure detection:
  - Route 53 redirects traffic to DR ALB (us-east-1)
  - Auto Scaling launches EC2 instances from copied AMI
  - RDS Read Replica is promoted to Primary
- Application becomes available from DR region without manual intervention
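The failover decision in the flow above can be sketched as a small state machine. This is a simplified model, not Route 53's actual implementation: it assumes a single health checker (Route 53 aggregates results from many checker locations), reuses the failure threshold of 3 from the health check configured in Step 3, and the names `HealthState`, `observe`, and `serving_region` are made up for illustration.

```python
from dataclasses import dataclass

PRIMARY_REGION = "ap-south-1"
DR_REGION = "us-east-1"
FAILURE_THRESHOLD = 3  # consecutive results needed to flip the status


@dataclass(frozen=True)
class HealthState:
    healthy: bool = True
    streak: int = 0  # consecutive results that contradict `healthy`


def observe(state: HealthState, check_passed: bool) -> HealthState:
    """Fold one health-check result into the current state."""
    if check_passed == state.healthy:
        return HealthState(state.healthy, 0)  # result agrees, reset the streak
    streak = state.streak + 1
    if streak >= FAILURE_THRESHOLD:
        return HealthState(not state.healthy, 0)  # status flips
    return HealthState(state.healthy, streak)


def serving_region(state: HealthState) -> str:
    """Failover routing: primary while healthy, otherwise the DR region."""
    return PRIMARY_REGION if state.healthy else DR_REGION


state = HealthState()
for passed in [True, False, False, False]:
    state = observe(state, passed)
print(serving_region(state))  # us-east-1, after three consecutive failures
```

The same threshold applies in the other direction in this model: the primary has to pass three checks in a row before traffic returns to ap-south-1.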
Implementation Details
Detailed, step-by-step implementation guides with screenshots and diagrams
are available in the /steps directory:
- EC2 & AMI creation
- ALB & Auto Scaling setup
- Route 53 DNS failover
- RDS cross-region replication & promotion
- EFS backup strategy
Step 1: EC2 & AMI Setup (Primary Region)
In this step, we launch an EC2 instance in the primary region
(ap-south-1) and deploy an NGINX application.
Objective
Launch an EC2 instance and prepare a reusable AMI for Auto Scaling and DR.
Why This Step?
- EC2 hosts the application
- AMI ensures consistent server configuration
- Enables fast recovery in another region
Services Used
- EC2
- AMI
- Security Groups
Implementation Steps
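1️⃣ Launch EC2
2️⃣ Install Application
# Install NGINX, start it, and enable it at boot so instances launched from the AMI serve traffic
sudo yum install nginx -y
sudo systemctl start nginx
sudo systemctl enable nginx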
Verify Application
Open browser
http://<ec2-public-ip>
Create AMI
- EC2 → Actions → Create Image
- Name: primary-app-ami
Outcome
- Application is running
- AMI created for Auto Scaling and the DR region
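The DR region launches instances from a copy of this AMI, so the image has to be copied from ap-south-1 to us-east-1. Below is a minimal sketch of that copy, assuming a boto3 EC2 client created for the destination region is passed in. The function name, the `-dr` name suffix, and the image ID in the usage comment are placeholders, not values from this project.

```python
def copy_ami_to_dr(ec2_dr_client, source_image_id: str,
                   source_region: str = "ap-south-1",
                   name: str = "primary-app-ami-dr") -> str:
    """Copy an AMI into the region the client is bound to; return the new image ID.

    `ec2_dr_client` is expected to behave like
    boto3.client("ec2", region_name="us-east-1").
    """
    response = ec2_dr_client.copy_image(
        Name=name,
        SourceImageId=source_image_id,
        SourceRegion=source_region,
    )
    return response["ImageId"]


# Usage (requires boto3 and AWS credentials; the AMI ID is a placeholder):
#   import boto3
#   dr_ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(copy_ami_to_dr(dr_ec2, "ami-0123456789abcdef0"))
```

The copy gets a new AMI ID in us-east-1, so the DR launch template has to reference that ID rather than the primary one.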
Step 2: ALB & Auto Scaling Setup
Objective
Ensure high availability and self-healing EC2 infrastructure.
Why This Step?
- ALB distributes traffic
- Auto Scaling replaces failed instances automatically
Services Used
- ALB
- Target Group
- Auto Scaling Group
- Launch Template
Implementation Steps
1️⃣ Create Target Group
- Type: Instance
- Protocol: HTTP
- Health check path: /
2️⃣ Create Application Load Balancer
- Internet-facing
- Select ALL AZs (Best Practice)
- Attach target group
3️⃣ Create Launch Template
- Use AMI from Step 1
- Select correct VPC Security Group
4️⃣ Create Auto Scaling Group
- Desired: 2
- Min: 1
- Max: 4
- Attach to the ALB target group
Outcome
- EC2 instances auto-heal
- Application stays available when individual instances fail
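The Auto Scaling settings above map onto a single API request. The sketch below only builds and validates that request as a plain dictionary shaped like the parameters of boto3's `create_auto_scaling_group`. The group name, launch template name, target group ARN, and subnet IDs are placeholders. `HealthCheckType="ELB"` with a 120-second grace period is an assumption not stated in the steps above; it lets the group replace instances that fail the ALB health check, not only instances that fail EC2 status checks.

```python
def asg_request(name, launch_template_name, target_group_arn, subnet_ids,
                min_size=1, desired=2, max_size=4):
    """Build keyword arguments for an Auto Scaling group (sizes from Step 2)."""
    if not (min_size <= desired <= max_size):
        raise ValueError("sizes must satisfy min <= desired <= max")
    if len(subnet_ids) < 2:
        raise ValueError("use subnets in at least two AZs for high availability")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        "TargetGroupARNs": [target_group_arn],
        "VPCZoneIdentifier": ",".join(subnet_ids),  # comma-separated subnet IDs
        "HealthCheckType": "ELB",       # assumption: follow ALB health checks
        "HealthCheckGracePeriod": 120,  # assumption: seconds to let NGINX boot
    }


request = asg_request(
    name="primary-app-asg",                         # placeholder
    launch_template_name="primary-app-template",    # placeholder
    target_group_arn="<target-group-arn>",          # placeholder
    subnet_ids=["<subnet-az-a>", "<subnet-az-b>"],  # placeholders
)
print(request["MinSize"], request["DesiredCapacity"], request["MaxSize"])  # 1 2 4
```

With boto3 installed, the dictionary could be passed as `boto3.client("autoscaling", region_name="ap-south-1").create_auto_scaling_group(**request)`.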
Step 3: Route 53 Failover Routing
Route 53 continuously monitors the health of the primary ALB
and redirects traffic to the DR region during failure.
Objective
Automatically route traffic to DR region during failure.
Why This Step?
- DNS-based failover
- No manual intervention needed
Services Used
- Route 53
- Health Checks
- ALB
Implementation Steps
1️⃣ Create Hosted Zone
- Domain: riteshdev.me
2️⃣ Create Health Check
- Endpoint: Primary ALB
- Path: /
- Failure threshold: 3
3️⃣ Create DNS Records
Primary Record
- Routing: Failover (Primary)
- Alias → Primary ALB
- Evaluate target health: Yes
Secondary Record
- Routing: Failover (Secondary)
- Alias → DR ALB
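The two failover records above can also be expressed as a Route 53 change batch. The sketch below only builds the JSON. The ALB DNS names, ALB hosted zone IDs, and health check ID are placeholders to fill in from your own account, the `SetIdentifier` values are arbitrary labels, and enabling target-health evaluation on the secondary record is an assumption (the steps above only specify it for the primary).

```python
import json


def failover_record(name, role, alb_dns_name, alb_zone_id,
                    evaluate_target_health=True, health_check_id=None):
    """Build one UPSERT change for an alias A record with failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-alb",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,  # the ALB's zone ID, not your hosted zone's
            "DNSName": alb_dns_name,
            "EvaluateTargetHealth": evaluate_target_health,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Comment": "Failover routing between ap-south-1 and us-east-1",
    "Changes": [
        failover_record("riteshdev.me.", "PRIMARY",
                        "<primary-alb-dns-name>", "<primary-alb-zone-id>",
                        health_check_id="<health-check-id>"),
        failover_record("riteshdev.me.", "SECONDARY",
                        "<dr-alb-dns-name>", "<dr-alb-zone-id>"),
    ],
}
print(json.dumps(change_batch, indent=2))
```

Saved to a file, this could be applied with `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://failover.json`.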
Outcome
- Traffic shifts automatically on failure
Step 4: RDS Disaster Recovery
A cross-region read replica is maintained and promoted during
regional failure.
Objective
Protect application data using cross-region replication.
Why This Step?
- EC2 is stateless
- Database must survive region failure
Services Used
- RDS
- Cross-Region Read Replica
Implementation Steps
1️⃣ Create Primary RDS
- Region: ap-south-1
- Engine: MySQL
- Optional: Multi-AZ
2️⃣ Create Read Replica
- Region: us-east-1
- Continuous replication
3️⃣ Promote Replica (DR Test)
- RDS → Promote read replica
Outcome
- Data loss limited to replication lag (typically seconds)
- Production-ready DR database
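The promotion step above is an explicit RDS API call. Route 53 failover does not trigger it by itself, so a fully hands-off failover needs something to invoke it. Below is a minimal sketch of the promotion, assuming a boto3 RDS client for us-east-1 is passed in. The function name and the replica identifier in the usage comment are placeholders.

```python
def promote_replica(rds_dr_client, replica_id: str) -> str:
    """Promote a read replica to a standalone instance and return its endpoint.

    `rds_dr_client` is expected to behave like
    boto3.client("rds", region_name="us-east-1").
    """
    rds_dr_client.promote_read_replica(DBInstanceIdentifier=replica_id)
    # Block until the instance reports the "available" status.
    waiter = rds_dr_client.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=replica_id)
    described = rds_dr_client.describe_db_instances(DBInstanceIdentifier=replica_id)
    return described["DBInstances"][0]["Endpoint"]["Address"]


# Usage (requires boto3 and AWS credentials; the identifier is a placeholder):
#   import boto3
#   dr_rds = boto3.client("rds", region_name="us-east-1")
#   print(promote_replica(dr_rds, "<dr-replica-identifier>"))
```

Promotion is one-way: replication from the primary stops and the instance cannot return to being a replica. The application in the DR region also has to be pointed at the returned endpoint. The waiter can return early if the instance status has not yet changed after the promote call, so a production script may need an extra status check.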
Key Objectives
- Build highly available infrastructure across multiple Availability Zones and Regions
- Implement automatic DNS-based failover
- Enable stateless application recovery using AMIs and Auto Scaling
- Protect stateful data (RDS & EFS) against regional failures
- Validate DR using real failure simulations
AWS Services Used
| Category | Services |
|---|---|
| Compute | EC2, AMI, Auto Scaling |
| Networking | VPC, ALB, Route 53 |
| Storage | EBS Snapshots, EFS |
| Database | RDS (Primary + Cross-Region Read Replica) |
| Security | IAM, Security Groups |
| Monitoring | Route 53 Health Checks |
Regions Used
- Primary Region: ap-south-1
- Disaster Recovery Region: us-east-1
Project Structure
aws-multi-region-dr/
│
├── architecture/
│   └── dr-architecture.png
│
├── steps/
│   ├── ec2-setup.md
│   ├── alb-asg.md
│   ├── route53.md
│   └── rds-dr.md
│
├── screenshots/
│
└── README.md
Disaster Recovery Flow
- User traffic enters through Route 53
- Primary Application Load Balancer (ALB) serves traffic from ap-south-1
- Route 53 continuously monitors application health
- On failure:
  - Traffic is automatically routed to the DR region
  - Auto Scaling launches EC2 instances from the copied AMI
  - RDS Read Replica is promoted to primary database
No manual intervention required
Failure Scenarios Tested
- Primary EC2 instance stopped
- Application (NGINX) service stopped
- Auto Scaling instance termination
- Route 53 failover validation
- RDS Read Replica promotion
Result: Application remained accessible via the DR region
RTO & RPO (Design Targets)
| Metric | Value |
|---|---|
| RTO (Recovery Time Objective) | ~1–2 minutes |
| RPO (Recovery Point Objective) | Seconds (replication lag) |
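A rough way to sanity-check the RTO target is to add up health-check detection time and DNS caching time. The arithmetic below is an estimate under stated assumptions, not a measurement from this project: a failure threshold of 3 (from Step 3), a Route 53 request interval of either 30 seconds (standard) or 10 seconds (fast), and a 60-second TTL on alias records that point to a load balancer. It ignores RDS promotion time and instance launch time in the DR region.

```python
def dns_failover_estimate_s(request_interval_s: int,
                            failure_threshold: int,
                            dns_ttl_s: int) -> int:
    """Estimate: time to detect the failure plus time for cached DNS to expire."""
    detection_s = request_interval_s * failure_threshold
    return detection_s + dns_ttl_s


for interval_s in (30, 10):
    total_s = dns_failover_estimate_s(interval_s, failure_threshold=3, dns_ttl_s=60)
    print(f"interval={interval_s}s -> ~{total_s}s ({total_s / 60:.1f} min)")
# interval=30s -> ~150s (2.5 min)
# interval=10s -> ~90s (1.5 min)
```

Under these assumptions, the ~1–2 minute target lines up with the 10-second request interval. With the 30-second interval, DNS failover alone comes to about 2.5 minutes.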
Why This Project Stands Out
- Real production-style disaster recovery design
- Hands-on failure testing (not just theoretical concepts)
- Clean and modular documentation
- Covers both stateless (EC2) and stateful (RDS, EFS) components
- Strong interview-ready cloud project
Key Learnings
- Difference between EC2 failover and RDS failover
- DNS-based failover using Route 53
- Importance of AMI-based recovery
- Cross-region replication trade-offs
- Auto Scaling behavior during instance and service failures
Future Enhancements
- Infrastructure automation using Terraform
- CI/CD pipeline integration
- CloudWatch alarms and notifications
- Centralized AWS Backup policies
- S3 cross-region replication
About the Author
Ritesh
Aspiring Cloud & DevOps Engineer
Focused on building resilient, scalable, and secure AWS architectures
For Recruiters
This repository demonstrates:
- Cloud architecture design skills
- Disaster recovery planning and execution
- Operational and troubleshooting mindset
- Strong technical documentation practices
Please explore the /steps directory.
















