
Mary Mutua

Building a 3-Tier Multi-Region High Availability Architecture with Terraform

Day 27 of my Terraform journey moved from a single-region scalable app to a multi-region high-availability design.

Yesterday, I built a scalable web application in one AWS region. Today, I expanded that pattern into a 3-tier architecture spread across two regions using:

  • a VPC per region
  • an ALB per region
  • an Auto Scaling Group per region
  • a primary Multi-AZ RDS instance
  • a cross-region read replica
  • optional Route53 failover DNS
  • reusable Terraform modules
  • remote state with S3 and DynamoDB

GitHub reference:

https://github.com/mary20205090/30-day-Terraform-Challenge/tree/main/day_27

Project Structure

For Day 27, I separated the infrastructure into five focused modules:

day27-multi-region-ha/
├── modules/
│   ├── vpc/
│   ├── alb/
│   ├── asg/
│   ├── rds/
│   └── route53/
├── envs/
│   └── prod/
├── bootstrap/
├── backend.tf
└── provider.tf

The goal was not just to make the stack work.

The goal was to make the design reusable, understandable, and safe to change across regions.
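To deploy the same modules into two regions, the root configuration needs two AWS provider configurations. Here is a minimal sketch of provider.tf, assuming us-west-2 as the secondary region (us-east-1 is the primary region I verified later):

provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "secondary"
  region = "us-west-2" # assumed secondary region
}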

Why Five Modules Instead of One?

I split the project into five modules because each part has a different responsibility.

The vpc module owns networking:

  • VPC
  • public subnets
  • private subnets
  • internet gateway
  • NAT gateways
  • route tables

The alb module owns traffic entry:

  • Application Load Balancer
  • target group
  • listener
  • ALB security group

The asg module owns compute and scaling:

  • launch template
  • instance security group
  • Auto Scaling Group
  • scaling policies
  • CloudWatch CPU alarms

The rds module owns the database layer:

  • DB subnet group
  • RDS security group
  • primary RDS instance
  • cross-region replica logic

The route53 module owns failover DNS:

  • health checks
  • primary failover record
  • secondary failover record

If everything lived in one large file, it would still work, but it would be harder to reuse and harder to reason about.

Modules make the boundaries clear.
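Concretely, the same vpc module is instantiated once per region, with the secondary copy pointed at the aliased provider. A sketch, with input names like cidr_block assumed for illustration:

module "vpc_primary" {
  source     = "./modules/vpc"
  name       = "prod-primary"
  cidr_block = "10.0.0.0/16"
}

module "vpc_secondary" {
  source = "./modules/vpc"
  providers = {
    aws = aws.secondary # route this copy to the secondary region
  }
  name       = "prod-secondary"
  cidr_block = "10.1.0.0/16"
}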

How the Modules Connect

The most important part of today was understanding the data flow between modules.

The VPC modules create the base networking:

module.vpc_primary.vpc_id
module.vpc_primary.public_subnet_ids
module.vpc_primary.private_subnet_ids

module.vpc_secondary.vpc_id
module.vpc_secondary.public_subnet_ids
module.vpc_secondary.private_subnet_ids

Those outputs feed the rest of the stack.
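For those values to exist, the vpc module has to export them. A sketch of modules/vpc/outputs.tf, assuming the module names its resources aws_vpc.this, aws_subnet.public, and aws_subnet.private:

output "vpc_id" {
  value = aws_vpc.this.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}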

The ALB module in the primary region creates a target group:

module.alb_primary.target_group_arn

That output flows into the primary ASG module:

target_group_arns = [module.alb_primary.target_group_arn]

That tells the Auto Scaling Group where its EC2 instances should register.
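Put together, the primary ASG module call pulls in both the VPC outputs and the ALB target group. A sketch, with the input variable names and sizing assumed:

module "asg_primary" {
  source = "./modules/asg"

  vpc_id            = module.vpc_primary.vpc_id
  subnet_ids        = module.vpc_primary.private_subnet_ids   # instances stay in private subnets
  target_group_arns = [module.alb_primary.target_group_arn]   # register instances with the ALB
  min_size          = 2
  max_size          = 4
}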

Then the RDS primary module creates the main database and outputs:

module.rds_primary.db_instance_arn

That output flows into the replica module:

replicate_source_db = module.rds_primary.db_instance_arn

That tells AWS to create the secondary database as a cross-region read replica of the primary database.
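Inside the rds module, the replica path looks roughly like this. This is a sketch with assumed variable names; the kms_key_id matters because a cross-region replica of an encrypted primary has to be encrypted with a key that lives in the destination region:

resource "aws_db_instance" "replica" {
  count = var.replicate_source_db != null ? 1 : 0

  identifier          = "prod-db-replica"
  replicate_source_db = var.replicate_source_db  # ARN of the primary in the other region
  instance_class      = var.instance_class
  storage_encrypted   = true
  kms_key_id          = var.replica_kms_key_id   # key created in the secondary region
  skip_final_snapshot = true
}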

This closes the loop:

VPC → ALB → Target Group → ASG → Primary RDS → Cross-Region Replica

That is what made the architecture feel like one connected system instead of separate AWS resources.

Deployment Output

After the apply completed, Terraform returned the regional ALB DNS names as outputs.

I verified the primary ALB in the browser and the app responded with:

Region: us-east-1 | AZ: us-east-1b | Environment: prod

That confirmed the ALB, target group, Auto Scaling Group, and EC2 user data were all working together correctly.
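That response comes from the instance user data rendering the metadata into a page. A sketch of the launch template in the asg module, assuming an Amazon Linux AMI with httpd and instance metadata reachable without a session token:

resource "aws_launch_template" "app" {
  name_prefix   = "prod-app-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum install -y httpd
    # read placement details from instance metadata
    AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
    REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
    echo "Region: $REGION | AZ: $AZ | Environment: prod" > /var/www/html/index.html
    systemctl enable --now httpd
  EOF
  )
}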

Route53 Failover Design

The Route53 module was included in the project design to support DNS failover between the two regions.

In my actual lab run, I left Route53 disabled because I did not have a hosted zone and domain ready in the account. So I verified the stack through the ALB DNS names directly instead.

Still, the failover behavior is important to understand.

If the primary region fails, the sequence looks like this:

  1. the primary Route53 health check fails
  2. Route53 stops returning the primary record
  3. clients continue using cached DNS until TTL expires
  4. after TTL expiry, new DNS lookups resolve to the secondary record
  5. traffic shifts to the secondary ALB
  6. the secondary ALB continues serving the application tier in the backup region

That means failover is not instant in the same way an internal service failover might be. It depends on:

  • health check detection
  • Route53 failover policy
  • DNS cache expiry
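For reference, the failover records would look roughly like this (shown at the root for readability; in the real layout these values would arrive as module inputs). The domain is hypothetical and the ALB output names dns_name and zone_id are assumed:

resource "aws_route53_health_check" "primary" {
  fqdn              = module.alb_primary.dns_name
  port              = 80
  type              = "HTTP"
  resource_path     = "/"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "A"
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = module.alb_primary.dns_name
    zone_id                = module.alb_primary.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "A"
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = module.alb_secondary.dns_name
    zone_id                = module.alb_secondary.zone_id
    evaluate_target_health = true
  }
}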

Multi-AZ vs Cross-Region Read Replicas

One of the most useful lessons today was understanding that these solve different problems.

Multi-AZ protects against Availability Zone failure within one region.

If one AZ in us-east-1 fails, AWS can fail the database over to another AZ in that same region.

Cross-region read replicas protect against a full regional outage.

If the primary region has a larger failure, the replica already exists in the secondary region and can become part of the recovery plan.

So the difference is:

  • Multi-AZ = resilience within a region
  • cross-region replica = resilience across regions

You need both ideas to think clearly about high availability.
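In Terraform terms, the first is a single argument on the primary instance, while the second is the replicate_source_db wiring shown earlier. A sketch of the primary, with the engine, sizing, and resource names inside the module all assumed:

resource "aws_db_instance" "primary" {
  identifier        = "prod-db-primary"
  engine            = "mysql"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = var.db_username
  password          = var.db_password
  multi_az          = true   # standby in another AZ of the same region

  db_subnet_group_name   = aws_db_subnet_group.this.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  storage_encrypted      = true
  skip_final_snapshot    = true
}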

A Useful Debugging Lesson

Day 27 had a few real-world AWS constraints that made the project more realistic.

A few examples:

  • ALB naming had to be shortened to stay within AWS limits
  • the RDS module needed the application security group ID, not the ASG name
  • the cross-region replica needed a KMS key in the destination region, since it is created from an encrypted source
  • the backend had to be bootstrapped first before the main environment could use remote state

That was a good reminder that Terraform can describe the architecture, but AWS service rules still matter.
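The security group mix-up, for example, comes down to passing the right ID into the rds module. A sketch of the ingress rule inside that module, assuming a MySQL port, an internal security group named aws_security_group.rds, and an input variable called app_security_group_id:

resource "aws_security_group_rule" "db_ingress_from_app" {
  type                     = "ingress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id      # the database security group
  source_security_group_id = var.app_security_group_id      # the app instances' SG ID, not the ASG name
}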

Remote State and Bootstrap

Like previous days, I used a remote backend with:

  • S3 for Terraform state
  • DynamoDB for state locking

For this project, I also used a separate bootstrap stack to create the backend resources first.

That mattered because Terraform cannot use a backend bucket and lock table until they already exist.

Remote state keeps the stack safer and more realistic, especially once infrastructure starts growing beyond one simple environment.
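The backend configuration itself is small. A sketch of backend.tf, with placeholder bucket and table names standing in for whatever the bootstrap stack actually created:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "day27/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}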

Cleanup

After verifying the app worked, I destroyed both:

  • the Day 27 production stack
  • the bootstrap backend resources

This matters because NAT Gateways, ALBs, EC2 instances, and RDS resources can keep generating cost even after the learning task is complete.

Final Takeaway

Day 27 helped me connect several Terraform lessons into one practical multi-region system.

A high-availability architecture is not just “more resources.” It is the relationship between networking, load balancing, scaling, database topology, failover strategy, and state management.

The biggest lesson:

Terraform modules are not just for organizing files. They help define the boundaries of responsibility in infrastructure.

That is what makes the system easier to understand, reuse, and safely change.

Follow My Journey

This is Day 27 of my 30-Day Terraform Challenge.

See you on Day 28.
