DEV Community

Udoh Deborah

Day 4: Scaling to the Clouds – Building a Self-Healing Cluster on AWS with Terraform

After successfully launching a single web server on Day 3, today was all about High Availability (HA). I moved away from the "Single Point of Failure" model and built a distributed system that can handle traffic spikes and hardware failures automatically.

The Architecture: From One to Many
Instead of one lonely EC2 instance, I deployed an Application Load Balancer (ALB) sitting in front of an Auto Scaling Group (ASG).

The Stack:

Infrastructure as Code: Terraform

Compute: AWS EC2 (t3.micro)

Scaling: Auto Scaling Group (min: 2, max: 5)

Networking: Application Load Balancer (ALB)

Configuration: Launch Templates with Base64 User Data

The Reality Check: Troubleshooting the "502 Bad Gateway"
If you think Cloud Engineering is just writing code and hitting "Apply," Day 4 will humble you. I ran into several "final bosses" today:

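The Launch Template Trap: Unlike the aws_instance resource, which accepts plain-text user data, Terraform's aws_launch_template resource expects user_data to be Base64 encoded (for example, with the built-in base64encode() function). Without this, the bash script never runs, and the web server never starts.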
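
A minimal sketch of that fix, under some assumptions: the resource and variable names are hypothetical, the AMI ID is passed in as a variable rather than guessed, and the startup script assumes an image that ships with busybox. It is not the exact configuration from this project.

```hcl
# Hypothetical variables; supply values that match your own account.
variable "ami_id" {
  description = "AMI for the web servers (assumed to include busybox)"
  type        = string
}

variable "server_port" {
  description = "Port the web server listens on"
  type        = number
  default     = 8080
}

resource "aws_launch_template" "web" {
  name_prefix   = "day4-web-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  # aws_launch_template expects Base64 here, so the script is wrapped
  # in base64encode(); aws_instance would take the plain text instead.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    echo "Hello from Day 4" > index.html
    nohup busybox httpd -f -p ${var.server_port} &
    EOF
  )
}
```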

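The Silent Firewall: I had my security groups open for inbound traffic, but I forgot the egress (outbound) rules. Terraform removes the allow-all egress rule that AWS normally attaches to a new security group, so outbound traffic stays blocked until you declare it yourself. If the ALB can't "talk" to the instances to check their health, it marks them as unhealthy and gives you a 502 Bad Gateway.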
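
As a rough illustration, a security group for the ALB with an explicit egress block might look like the sketch below. The group name and the listener port (80) are assumptions, and the wide-open egress rule is the simplest possible choice rather than a hardened one.

```hcl
# Hypothetical security group for the ALB (created in the default VPC
# because no vpc_id is set).
resource "aws_security_group" "alb" {
  name = "day4-alb-sg"

  # Inbound: allow HTTP from anywhere (port 80 is an assumption).
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Outbound: must be declared explicitly, otherwise the ALB cannot
  # reach the instances for health checks or forwarded requests.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1" # all protocols
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

A tighter variant would limit egress to the instance port and the instances' security group instead of allowing everything.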

Target Group Health Checks: I learned that the "Health Check" is a literal conversation. If the checks are too aggressive (a short timeout, a short interval, or a low failure threshold), the instance gets marked unhealthy and killed before it even finishes booting. Relaxing the health check intervals was the key to stability.
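
The sketch below shows where those timings live. The numbers are illustrative, not the values used in this project, and every variable is a hypothetical input. It also assumes the ASG uses ELB health checks, which is what lets a failing target-group check trigger a replacement.

```hcl
# Hypothetical inputs for this sketch.
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

variable "server_port" {
  type    = number
  default = 8080
}

resource "aws_lb_target_group" "web" {
  name     = "day4-web-tg"
  port     = var.server_port
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/"
    matcher             = "200"
    interval            = 30 # seconds between checks
    timeout             = 5  # must be shorter than the interval
    healthy_threshold   = 2
    unhealthy_threshold = 3  # consecutive failures before "unhealthy"
  }
}

resource "aws_autoscaling_group" "web" {
  min_size            = 2
  max_size            = 5
  vpc_zone_identifier = var.subnet_ids
  target_group_arns   = [aws_lb_target_group.web.arn]

  # Assumption: replace instances based on the ALB's view of health,
  # but give new instances time to boot before judging them.
  health_check_type         = "ELB"
  health_check_grace_period = 300

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}
```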

Key Takeaways
DRY (Don't Repeat Yourself): I used a single Terraform variable for the server port, so the Load Balancer and the EC2 instances were always aligned on the same port (8080).

Self-Healing: I tested the ASG by manually terminating an instance, and watched as AWS automatically detected the loss and spun up a replacement.

Decoupling: By using an ALB, my "users" only ever see one DNS name, even if the servers behind it are constantly changing.
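
To make the single DNS name concrete, here is a hedged sketch of an ALB, a listener, and an output that exposes the name. The resource names, the listener port (80), and the input variables are all assumptions.

```hcl
# Hypothetical inputs for this sketch.
variable "subnet_ids" {
  type = list(string)
}

variable "alb_security_group_id" {
  type = string
}

variable "target_group_arn" {
  type = string
}

resource "aws_lb" "web" {
  name               = "day4-web-alb"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
  security_groups    = [var.alb_security_group_id]
}

# Forward incoming HTTP traffic to whatever instances are currently
# registered in the target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = var.target_group_arn
  }
}

# The one address users see, no matter which instances are behind it.
output "alb_dns_name" {
  value = aws_lb.web.dns_name
}
```

After an apply, running terraform output alb_dns_name prints the address to open in a browser.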

What’s Next?
Day 4 was a masterclass in networking and state management. Tomorrow, I’ll be diving into Terraform State and how to manage these resources in a team environment without causing a "state-file war."
