From One Lonely Server to a Party of Instances
Or: How I Learned to Stop Worrying and Love the Load Balancer
Remember yesterday? I was that person celebrating a single server like I'd just landed a rocket on Mars. "Look, everyone! It says Hello World!" Cute, right? Well, today we're trading in that tricycle for a Ferrari.
Welcome to Day 4 of the 30-Day Terraform Challenge, where we're about to make our infrastructure so "highly available" that even if an AWS data center decides to take an unscheduled nap, our website will still be serving cat memes (or whatever you're hosting).
The "Wait, What's DRY?" Moment
Before we go all "architect" on this, let's talk about DRY. No, not your skin after a long day in the sun. DRY = Don't Repeat Yourself.
Picture this: You're on a team of 10 developers. Everyone hardcodes their instance types. Dave uses "t2.micro" because he's cheap. Sarah uses "t3.large" because she likes to live dangerously. Meanwhile, poor DevOps Patricia is losing hair trying to figure out why dev, staging, and prod all look like a chaotic game of Mad Libs.
That's where input variables come in to save the day (and Patricia's remaining hair follicles).
The Magic of Variables
Instead of this:

```hcl
instance_type = "t2.micro" # Hardcoded like it's 1999
```

We do this:

```hcl
instance_type = var.instance_type # Smart like it's 2026
```
And then in variables.tf:
```hcl
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t2.micro"
}
```
Now Dave can use terraform apply -var="instance_type=t2.micro" and Sarah can use -var="instance_type=t3.large", and they're both using the SAME CODE. Mind. Blown.
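And if typing `-var` flags every time gets old, the same variable can be set in a `terraform.tfvars` file, which Terraform loads automatically. A minimal sketch (the value is just an example):

```hcl
# terraform.tfvars -- picked up automatically by plan/apply
instance_type = "t3.large"
```

A `TF_VAR_instance_type` environment variable works too. Same code, three ways to feed it values.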
The Cluster: Because One Server is Lonely
Remember our single server from Day 3? It was like that one friend who's always reliable but if they get sick, your whole social life crashes. Literally. No server = no website.
Enter the cluster: think of it as having multiple friends who can all host the party. If one gets tired (or an AZ goes down), the others keep the music playing.
What We Built:
```
Internet → Load Balancer (The Bouncer) → Target Group (The VIP List) →
Auto Scaling Group (The Clone Army) → Multiple EC2 Instances (The Party Hosts)
```
The Cool Parts:
1. Data Sources = Terraform's Google Search
```hcl
data "aws_availability_zones" "available" {
  state = "available"
}
```
This is Terraform asking AWS: "Hey, which neighborhoods (AZs) are open for business right now?" No hardcoding means if AWS opens a new AZ tomorrow, my code already knows about it. It's like having a friend who always knows which clubs are open.
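To actually use AWS's answer, you reference the data source's exported `names` attribute elsewhere in the config. A quick sketch (the `locals` block is hypothetical, just to show the reference):

```hcl
# e.g. grab the first two currently-available AZs for our subnets
locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 2)
}
```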
2. Security Group Chaining = The Velvet Rope Strategy
```hcl
# Load Balancer Security Group
ingress {
  from_port   = 80
  to_port     = 80
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"] # Everyone welcome to the club!
}

# Instance Security Group
ingress {
  from_port       = 80
  to_port         = 80
  protocol        = "tcp"
  security_groups = [aws_security_group.alb_sg.id] # ONLY the bouncer can let you in
}
```
The instances ONLY trust the load balancer. No one can sneak in directly. It's like having a VIP section that only the bouncer knows about. 🥂
3. Launch Template = The Clone Blueprint
```hcl
resource "aws_launch_template" "web" {
  name_prefix   = "party-host-"
  image_id      = data.aws_ami.amazon_linux_2.id
  instance_type = var.instance_type

  user_data = base64encode(<<-EOF
    #!/bin/bash
    # Party setup script goes here
  EOF
  )
}
```
This is the recipe for creating identical instances. Like a cloning machine, but for servers. 🧬
4. Auto Scaling Group = The Clone Army Commander
```hcl
resource "aws_autoscaling_group" "web" {
  min_size         = 2 # Never fewer than 2 party hosts
  max_size         = 5 # But we can scale up to 5 if the party gets wild
  desired_capacity = 2 # Start with 2, because balance

  # Connect to the load balancer
  target_group_arns = [aws_lb_target_group.web.arn]
}
```
This ensures we always have between 2 and 5 instances running. If one crashes, the ASG spawns a replacement like it's playing whack-a-mole. 🔄
5. Load Balancer = The Traffic Cop
```hcl
resource "aws_lb" "web" {
  name = "the-bouncer"
  # ... config ...
}

resource "aws_lb_target_group" "web" {
  # The VIP list - tracks healthy instances
}

resource "aws_lb_listener" "web" {
  # Listens on port 80 and forwards to the target group
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}
```
The load balancer:
- Gets all traffic (one DNS name to rule them all)
- Checks which instances are healthy (are they serving pages?)
- Forwards requests to healthy instances (round-robin style)
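Those health checks are configured on the target group itself. Here's a sketch of what that might look like (the `vpc_id` reference is an assumption, since the VPC setup isn't shown in this post):

```hcl
resource "aws_lb_target_group" "web" {
  name     = "the-vip-list"
  port     = 80
  protocol = "HTTP"
  vpc_id   = data.aws_vpc.default.id # assumed; depends on your VPC setup

  health_check {
    path                = "/"   # the page the bouncer pings
    matcher             = "200" # only HTTP 200 counts as healthy
    interval            = 30    # check every 30 seconds
    healthy_threshold   = 2     # 2 passes in a row = on the VIP list
    unhealthy_threshold = 2     # 2 fails in a row = off the list
  }
}
```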
The "Oh No, It's Broken!" Moments (A.K.A. Learning)
Disaster #1: The Dependency Tango
```
Error: Cycle: aws_security_group.instance_sg, aws_security_group.alb_sg
```
I created a circular dependency where both security groups referenced each other. It was like two people waiting for the other to open the door. The fix? The instance SG references the ALB SG, but the ALB SG doesn't need to reference back. Simple when you know how!
Disaster #2: The Ghost Instances
My instances were created but the load balancer kept marking them as unhealthy. Turns out, they needed a "grace period" to finish setting up before being judged:
```hcl
health_check_grace_period = 300 # Give 'em 5 minutes to get ready!
```
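Worth noting: that setting lives on the Auto Scaling Group, and it pairs with `health_check_type`, which controls whose verdict the ASG trusts. A sketch:

```hcl
resource "aws_autoscaling_group" "web" {
  # ...
  health_check_type         = "ELB" # trust the load balancer's checks, not just EC2 status
  health_check_grace_period = 300   # don't judge new instances for their first 5 minutes
}
```

Without `health_check_type = "ELB"`, the ASG only replaces instances that fail EC2 status checks, even if the load balancer thinks they're dead.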
Disaster #3: The Forgotten Connection
Instances were running, load balancer was running, but no traffic. I forgot to connect them:
```hcl
# This line in the ASG was missing:
target_group_arns = [aws_lb_target_group.web.arn]
```
Without this, the ASG and Target Group were like two ships passing in the night. 🌙
The "It Works!" Moment
After wrestling with configurations, debugging errors, and questioning my life choices, I ran:
```
terraform apply
```
And then the magic happened:
```
alb_url = "http://dev-web-alb-1234567890.eu-north-1.elb.amazonaws.com:80"
```
I opened my browser, pasted the URL, and THERE IT WAS. My webpage, being served by... wait, let me refresh... DIFFERENT INSTANCE! And again... ANOTHER INSTANCE!
It was like watching a tennis match, but instead of a ball, it's HTTP requests flying between servers.
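That `alb_url` value comes from an output block. A sketch of how it might be defined (the interpolated string format is an assumption; the resource name matches the `aws_lb` above):

```hcl
output "alb_url" {
  description = "Public URL of the load balancer"
  value       = "http://${aws_lb.web.dns_name}:80"
}
```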
What I Learned (Besides How to Spell "Availability")
- **Single Server = Single Point of Failure** - Like having one friend with the party snacks. If they're stuck in traffic, everyone's hungry.
- **Clustered = Party Proof** - Multiple hosts, load balancer directing traffic, auto-scaling replacing fallen soldiers. It's infrastructure with a safety net.
- **Variables = Sanity** - Without them, you're copy-pasting values like a caveman. With them, you're a civilized engineer who can change everything in one place.
- **Data Sources = Intelligence** - Querying AWS for live data means your code works everywhere, every time. It's like writing a recipe that asks "what ingredients are fresh today?" instead of assuming.
- **Errors are Teachers** - Every "Cycle" error, every "unhealthy instance" taught me more about how AWS resources actually connect. The errors aren't bugs, they're plot twists.
The Bottom Line
What took me hours of debugging and multiple cups of coffee today will save me (and my team) days of work down the line. We've moved from "it works on my machine" to "it works across multiple machines, in multiple availability zones, even if some machines die."
And that, my friends, is the difference between a toy deployment and production-ready infrastructure.
Now if you'll excuse me, I need to go destroy all these resources before my AWS bill looks like my student loans.
```
terraform destroy -auto-approve
```