Introduction
Availability, repeatability, and security are table stakes for production workloads. This guide provisions a baseline AWS stack with Terraform that's resilient to AZ failures, enforces least-privilege boundaries, integrates with CI/CD, and manages Terraform state with an S3 and DynamoDB backend. We'll deploy four Apache web servers across two availability zones with a Multi-AZ RDS database, all automated through GitHub Actions.
Why This Matters
Incidents rarely happen at convenient times. You want deterministic deployments, blast-radius isolation, and multi-AZ redundancy so failures degrade gracefully. When one availability zone experiences issues, your application continues running on servers in the healthy zone without any manual intervention.
Equally important is proper Terraform state management. State must be remote rather than stored on a laptop, it must be locked to prevent concurrent modifications, it must be encrypted and versioned for security and recovery, and it should be accessible via IAM rather than passed around in Slack or email. This becomes critical when working in teams or using automated CI/CD pipelines because everyone needs access to the same source of truth about what infrastructure exists.
This article covers both infrastructure resources and state management so you can run with confidence. We'll also set up a complete GitHub Actions pipeline that automatically detects changes to your Terraform code and deploys them, replacing the manual apply process with automated continuous deployment.
What We're Building
The architecture we're building includes these components working together to create a highly available system. We'll provision a VPC with public and private subnets spread across two availability zones in the London region. In the public subnets, we'll deploy four EC2 instances running Apache web servers, with two instances in each availability zone. An Application Load Balancer will distribute incoming traffic across these four servers, automatically routing requests away from any unhealthy instances.
For the database tier, we'll create a Multi-AZ RDS MySQL instance that automatically maintains a standby replica in a different availability zone. If the primary database fails, RDS automatically promotes the standby to become the new primary without requiring any code changes. The database will live in private subnets with no internet access, protected by security groups that only allow connections from the web servers.
For state management, we'll configure an S3 bucket with versioning and encryption to store the Terraform state file, along with a DynamoDB table that provides locking to prevent multiple people or automation pipelines from modifying the infrastructure simultaneously. Finally, we'll set up GitHub Actions workflows that automatically run terraform plan on pull requests so you can review changes, and terraform apply when changes merge to the main branch, giving you the same automation benefits that Jenkins provides but using GitHub's native platform.
Here's what we're deploying:
- Networking: VPC, 2 public subnets, 2 private subnets, Internet Gateway, route tables
- Compute: 4 Apache web servers across 2 availability zones in an Auto Scaling Group
- Ingress: Application Load Balancer with health checks and automatic failover
- Data: RDS MySQL Multi-AZ in isolated private subnets
- Security: Security groups scoped per role, encrypted storage, IMDSv2 enforcement
- State: S3 remote state with DynamoDB locking, versioned and encrypted
- CI/CD: GitHub Actions pipeline for automated terraform plan and apply
- Observability: CloudWatch metrics that the ALB, Auto Scaling Group, and RDS publish out of the box
Prerequisites
Before starting, you'll need Terraform version 1.6 or higher installed on your local machine. You'll also need the AWS CLI configured with an IAM user or role that has permissions to create VPC, EC2, RDS, and S3 resources. While we'll create the S3 bucket and DynamoDB table for state management as our first step, you'll need initial AWS credentials to bootstrap that infrastructure.
You should also have a GitHub account and a repository where you'll store your Terraform code. The GitHub Actions workflows will run directly in your repository, so you'll need to configure AWS credentials as GitHub Secrets to allow the automation to deploy infrastructure on your behalf.
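Before moving on, a couple of quick checks confirm the tooling is ready; the exact output will vary by machine and account:
# Confirm Terraform is 1.6 or newer
terraform version
# Confirm the AWS CLI is configured and which identity it will use
aws sts get-caller-identity
# Confirm which region the CLI will default to
aws configure get region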
Step 1. Remote State Backend (S3 + DynamoDB)
The first thing we need to do is create the infrastructure that will manage our Terraform state. This is a one-time bootstrap process. We're creating an S3 bucket to store the state file and a DynamoDB table to provide state locking. The bucket will have versioning enabled so you can recover from accidental deletions or corrupted state, and we'll enforce encryption at rest using AES256. We're also blocking all public access to ensure the state file, which may contain sensitive information like database passwords, remains private.
The DynamoDB table uses on-demand billing so you only pay for the lock operations that actually occur, which is minimal. Terraform will write a lock entry to this table whenever someone runs an apply or plan operation, preventing others from making concurrent changes that could corrupt your infrastructure.
Create a file called backend-bootstrap.tf:
# This is a one-time setup file to create the S3 bucket and DynamoDB table
# After running this once, you can delete this file or move it to a separate directory
provider "aws" {
region = "eu-west-2"
}
# S3 bucket to store Terraform state files
resource "aws_s3_bucket" "state" {
bucket = "tf-state-prod-stack-eu-west-2"
# Prevent accidental deletion of the state bucket
lifecycle {
prevent_destroy = true
}
}
# Enable versioning so we can recover from bad state changes
resource "aws_s3_bucket_versioning" "state" {
bucket = aws_s3_bucket.state.id
versioning_configuration {
status = "Enabled"
}
}
# Encrypt state files at rest for security
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
bucket = aws_s3_bucket.state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
# Block all public access to the state bucket
resource "aws_s3_bucket_public_access_block" "state" {
bucket = aws_s3_bucket.state.id
block_public_acls = true
block_public_policy = true
restrict_public_buckets = true
ignore_public_acls = true
}
# DynamoDB table for state locking to prevent concurrent modifications
resource "aws_dynamodb_table" "lock" {
name = "tf-state-locks-prod-stack"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
Run this bootstrap process once:
terraform init
terraform apply
After the bucket and table are created, you can delete this bootstrap file or move it to a separate directory. The state infrastructure is now ready to use.
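If you'd like to double-check the bootstrap before wiring up the backend, a few read-only AWS CLI calls confirm versioning, encryption, and the lock table, using the names from the file above:
# Versioning should report "Enabled"
aws s3api get-bucket-versioning --bucket tf-state-prod-stack-eu-west-2
# Encryption should report AES256
aws s3api get-bucket-encryption --bucket tf-state-prod-stack-eu-west-2
# The lock table should be ACTIVE
aws dynamodb describe-table --table-name tf-state-locks-prod-stack --query 'Table.TableStatus'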
Step 2. Backend Configuration
Now that we have our state storage infrastructure, we need to configure Terraform to use it. We'll create a backend configuration file for each environment. This separation allows you to have different state files for development, staging, and production environments, preventing changes in one environment from affecting others.
Create a directory structure for your environments and add a backend configuration file. For production, create envs/prod/backend.hcl:
bucket = "tf-state-prod-stack-eu-west-2"
key = "envs/prod/global.tfstate"
region = "eu-west-2"
dynamodb_table = "tf-state-locks-prod-stack"
encrypt = true
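A second environment only needs its own state key; some teams also prefer a dedicated bucket per environment. As a sketch, a hypothetical envs/dev/backend.hcl might look like this:
bucket = "tf-state-prod-stack-eu-west-2"
key = "envs/dev/global.tfstate"
region = "eu-west-2"
dynamodb_table = "tf-state-locks-prod-stack"
encrypt = true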
In your main Terraform directory, create a main.tf file and add the backend configuration block. Notice that we don't specify the actual bucket name here because we'll pass that in via the backend config file. This allows us to use the same Terraform code across multiple environments:
terraform {
# Require Terraform version 1.6 or higher
required_version = ">= 1.6"
# Backend configuration for remote state storage
backend "s3" {}
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "eu-west-2"
}
Initialize Terraform with the backend configuration:
terraform init -backend-config=envs/prod/backend.hcl
Terraform will now store its state remotely in S3 and use DynamoDB for locking. You can verify this worked by checking that Terraform created a .terraform directory containing the backend configuration.
Step 3. Variables
We need to define variables for values that change between environments or that should not be hardcoded. Database credentials are particularly important to handle as variables because you never want to commit passwords to version control. The sensitive flag ensures these values won't appear in Terraform's output logs.
Create a variables.tf file:
variable "db_username" {
description = "Database administrator username"
type = string
sensitive = true
}
variable "db_password" {
description = "Database administrator password"
type = string
sensitive = true
}
variable "environment" {
description = "Environment name used for resource tagging and naming"
type = string
default = "prod"
}
variable "aws_region" {
description = "AWS region for resource deployment"
type = string
default = "eu-west-2"
}
variable "instance_type" {
description = "EC2 instance type for web servers"
type = string
default = "t3.micro"
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.micro"
}
Create a terraform.tfvars file with actual values. This file should never be committed to version control, so we'll add it to .gitignore in a moment:
db_username = "admin"
db_password = "YourSecurePasswordHere123!"
environment = "prod"
aws_region = "eu-west-2"
instance_type = "t3.micro"
db_instance_class = "db.t3.micro"
Add these lines to your .gitignore:
terraform.tfvars
*.tfvars
.terraform/
For GitHub Actions, we'll pass these values as GitHub Secrets instead of using a tfvars file.
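For local runs, you can also keep the credentials out of files entirely: Terraform reads any environment variable prefixed with TF_VAR_ as a variable value. A minimal sketch:
# Exported values override nothing sensitive in git and never touch disk
# (consider your shell history or a secrets manager for real passwords)
export TF_VAR_db_username="admin"
export TF_VAR_db_password="YourSecurePasswordHere123!"
terraform plan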
Step 4. Networking Infrastructure
The networking layer is the foundation of your infrastructure. We're creating a VPC with a CIDR block that gives us over 65,000 possible IP addresses, which is more than enough for most applications. We're enabling DNS support and hostnames so that resources within the VPC can resolve each other by DNS names rather than having to use IP addresses.
We'll create two public subnets and two private subnets, with one of each type in each availability zone. The public subnets will host the load balancer and web servers, while the private subnets will host the database. By spreading resources across two availability zones, we ensure that if one entire data center goes offline, our application continues running in the other.
The Internet Gateway provides the connection point between our VPC and the internet. We'll create route tables that define how traffic flows. The public route table will direct internet-bound traffic to the Internet Gateway, while the private route table will have no internet route, keeping the database completely isolated.
Create a network.tf file:
# Fetch available availability zones in the current region
data "aws_availability_zones" "available" {
state = "available"
}
# Main VPC - this is the container for all our networking resources
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
}
}
# Internet Gateway provides internet access for public subnets
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
Environment = var.environment
}
}
# Public Subnet 1 - hosts ALB and web servers in first AZ
resource "aws_subnet" "public_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-1"
Environment = var.environment
Type = "Public"
}
}
# Public Subnet 2 - hosts ALB and web servers in second AZ
resource "aws_subnet" "public_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-2"
Environment = var.environment
Type = "Public"
}
}
# Private Subnet 1 - hosts RDS in first AZ (completely isolated)
resource "aws_subnet" "private_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.10.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
tags = {
Name = "${var.environment}-private-1"
Environment = var.environment
Type = "Private"
}
}
# Private Subnet 2 - hosts RDS in second AZ (completely isolated)
resource "aws_subnet" "private_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.11.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
tags = {
Name = "${var.environment}-private-2"
Environment = var.environment
Type = "Private"
}
}
# Route table for public subnets - routes internet traffic to Internet Gateway
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-public-rt"
Environment = var.environment
}
}
# Route that directs all internet-bound traffic to the Internet Gateway
resource "aws_route" "public_internet" {
route_table_id = aws_route_table.public.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
# Associate public subnet 1 with the public route table
resource "aws_route_table_association" "public_1" {
subnet_id = aws_subnet.public_1.id
route_table_id = aws_route_table.public.id
}
# Associate public subnet 2 with the public route table
resource "aws_route_table_association" "public_2" {
subnet_id = aws_subnet.public_2.id
route_table_id = aws_route_table.public.id
}
# Route table for private subnets - no internet route, completely isolated
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-private-rt"
Environment = var.environment
}
}
# Associate private subnet 1 with the private route table
resource "aws_route_table_association" "private_1" {
subnet_id = aws_subnet.private_1.id
route_table_id = aws_route_table.private.id
}
# Associate private subnet 2 with the private route table
resource "aws_route_table_association" "private_2" {
subnet_id = aws_subnet.private_2.id
route_table_id = aws_route_table.private.id
}
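Once the stack has been applied (the deployment flow is covered in Step 11), a quick read-only query confirms the four subnets landed where we expect. This sketch assumes the vpc_id output defined in Step 9 is in place:
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$(terraform output -raw vpc_id)" \
--query 'Subnets[*].[SubnetId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch]' \
--output table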
Step 5. Security Groups
Security groups act as virtual firewalls that control traffic to and from your resources. We're implementing a defense-in-depth strategy where each tier of the application can only communicate with the tiers it needs to. The load balancer accepts traffic from the internet, the web servers accept traffic only from the load balancer, and the database accepts traffic only from the web servers.
This layered security approach means that even if someone discovers the IP address of a web server, they cannot connect to it directly because the security group will reject any traffic that doesn't originate from the load balancer. Similarly, the database is completely inaccessible except from the web servers, even though it exists in the same VPC.
Create a security-groups.tf file:
# Security Group for Application Load Balancer
# Accepts HTTP and HTTPS from the internet, forwards to web servers
resource "aws_security_group" "alb" {
name = "${var.environment}-alb-sg"
description = "Security group for application load balancer"
vpc_id = aws_vpc.main.id
# Allow HTTP from anywhere on the internet
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow HTTPS from anywhere on the internet
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow all outbound traffic so ALB can forward to web servers
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-alb-sg"
Environment = var.environment
}
}
# Security Group for Web Servers
# Only accepts HTTP from the load balancer, not directly from internet
resource "aws_security_group" "web" {
name = "${var.environment}-web-sg"
description = "Security group for web server instances"
vpc_id = aws_vpc.main.id
# Only allow HTTP from the load balancer security group
# This prevents direct access to web servers from the internet
ingress {
description = "HTTP from ALB only"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
# Allow all outbound for package updates and external API calls
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-web-sg"
Environment = var.environment
}
}
# Security Group for RDS Database
# Only accepts MySQL connections from web servers
resource "aws_security_group" "database" {
name = "${var.environment}-db-sg"
description = "Security group for RDS database"
vpc_id = aws_vpc.main.id
# Only allow MySQL from the web server security group
# Database is completely inaccessible from the internet
ingress {
description = "MySQL from web servers only"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.web.id]
}
tags = {
Name = "${var.environment}-db-sg"
Environment = var.environment
}
}
Step 6. IAM Roles for EC2
We need to create an IAM role that our EC2 instances will assume. This role grants permissions for AWS Systems Manager Session Manager, which allows you to connect to instances without needing SSH keys or opening port 22. This is a more secure approach because you don't have to manage SSH keys, and all session activity is logged in CloudTrail for audit purposes.
Create an iam.tf file:
# IAM role that EC2 instances will assume
resource "aws_iam_role" "ec2_role" {
name = "${var.environment}-ec2-role"
# Trust policy allowing EC2 service to assume this role
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.environment}-ec2-role"
Environment = var.environment
}
}
# Attach AWS-managed policy for Systems Manager access
# This allows SSM Session Manager connections without SSH
resource "aws_iam_role_policy_attachment" "ec2_ssm" {
role = aws_iam_role.ec2_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Instance profile wraps the role so it can be attached to EC2 instances
resource "aws_iam_instance_profile" "ec2_profile" {
name = "${var.environment}-ec2-profile"
role = aws_iam_role.ec2_role.name
}
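Once an instance boots with this profile, you can open a shell with no SSH key and no open port 22. A sketch, assuming the Session Manager plugin for the AWS CLI is installed and you substitute one of your own instance IDs:
# List instances registered with Systems Manager
aws ssm describe-instance-information \
--query 'InstanceInformationList[*].[InstanceId,PingStatus]' --output table
# Open an interactive shell on one of them
aws ssm start-session --target i-0123456789abcdef0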
Step 7. Load Balancer and Web Servers
Now we'll create the Application Load Balancer and the Auto Scaling Group with four web servers. The load balancer will perform health checks on each web server, and if a server fails its health check, the load balancer automatically stops sending traffic to it until it becomes healthy again.
The Auto Scaling Group will maintain exactly four instances running at all times, distributed evenly across the two availability zones. If an instance fails or is terminated, the Auto Scaling Group automatically launches a replacement. The user data script installs Apache and creates a simple HTML page that displays the instance ID and availability zone, allowing you to see which server is responding to each request.
Create a compute.tf file:
# Fetch the latest Ubuntu 20.04 AMI
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical's AWS account ID
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# Application Load Balancer distributes traffic across web servers
resource "aws_lb" "main" {
name = "${var.environment}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = [aws_subnet.public_1.id, aws_subnet.public_2.id]
enable_deletion_protection = false
tags = {
Name = "${var.environment}-alb"
Environment = var.environment
}
}
# Target group defines the pool of web servers
resource "aws_lb_target_group" "web" {
name = "${var.environment}-tg"
port = 80
protocol = "HTTP"
vpc_id = aws_vpc.main.id
# Health check configuration
# ALB will mark instances as unhealthy if they fail these checks
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 2
timeout = 5
interval = 30
path = "/"
protocol = "HTTP"
matcher = "200"
}
tags = {
Name = "${var.environment}-tg"
Environment = var.environment
}
}
# HTTP listener on port 80
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.web.arn
}
}
# Launch template defines the configuration for EC2 instances
resource "aws_launch_template" "web" {
name_prefix = "${var.environment}-web-"
image_id = data.aws_ami.ubuntu.id
instance_type = var.instance_type
# Attach IAM role for SSM access
iam_instance_profile {
arn = aws_iam_instance_profile.ec2_profile.arn
}
# Enforce IMDSv2 for enhanced security
# This prevents SSRF attacks against the instance metadata service
metadata_options {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 1
}
network_interfaces {
associate_public_ip_address = true
security_groups = [aws_security_group.web.id]
}
# User data script installs Apache and publishes a test page showing
# the instance ID and availability zone (fetched via IMDSv2)
user_data = base64encode(<<-EOF
#!/bin/bash
set -e
apt-get update
apt-get install -y apache2
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
echo "<h1>Apache on instance $INSTANCE_ID in $AZ</h1>" > /var/www/html/index.html
systemctl enable apache2
systemctl start apache2
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.environment}-web-server"
Environment = var.environment
}
}
}
# Auto Scaling Group maintains 4 web servers across 2 AZs
resource "aws_autoscaling_group" "web" {
name = "${var.environment}-asg"
vpc_zone_identifier = [aws_subnet.public_1.id, aws_subnet.public_2.id]
target_group_arns = [aws_lb_target_group.web.arn]
# Maintain exactly 4 instances (2 per AZ)
desired_capacity = 4
min_size = 4
max_size = 8
# Use ELB health checks so unhealthy instances are replaced
health_check_type = "ELB"
health_check_grace_period = 300
launch_template {
id = aws_launch_template.web.id
version = "$Latest"
}
tag {
key = "Name"
value = "${var.environment}-web-instance"
propagate_at_launch = true
}
tag {
key = "Environment"
value = var.environment
propagate_at_launch = true
}
}
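Once the stack is up, you can confirm that all four web servers have passed the ALB health checks. This is a sketch using read-only AWS CLI calls; it assumes the default environment name of prod, which makes the target group name prod-tg:
# Look up the target group ARN by name, then list each target's health state
TG_ARN=$(aws elbv2 describe-target-groups --names prod-tg \
--query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
--query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State]' --output table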
Step 8. RDS Database
The RDS database will be deployed in Multi-AZ mode, which means AWS automatically maintains a standby replica in a different availability zone. If the primary database fails, RDS promotes the standby to primary automatically, typically within 60 to 120 seconds. Your application continues working because the database endpoint DNS name stays the same, it just points to the new primary instance.
The database will be completely isolated in the private subnets with no route to the internet. It can only be accessed from the web servers through the security group rules we configured earlier.
Create a database.tf file:
# DB subnet group defines which subnets RDS can use
resource "aws_db_subnet_group" "main" {
name = "${var.environment}-db-subnet-group"
subnet_ids = [aws_subnet.private_1.id, aws_subnet.private_2.id]
tags = {
Name = "${var.environment}-db-subnet-group"
Environment = var.environment
}
}
# RDS MySQL instance with Multi-AZ for high availability
resource "aws_db_instance" "main" {
identifier = "${var.environment}-mysql"
# Database configuration
engine = "mysql"
engine_version = "8.0.40"
instance_class = var.db_instance_class
allocated_storage = 20
storage_type = "gp3"
storage_encrypted = true
# Multi-AZ creates standby replica in different AZ
multi_az = true
# Network configuration
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.database.id]
publicly_accessible = false
# Authentication
username = var.db_username
password = var.db_password
# Backup configuration
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "mon:04:00-mon:05:00"
# Disable final snapshot for easier cleanup (change for production)
skip_final_snapshot = true
# Enable deletion protection in production
deletion_protection = false
tags = {
Name = "${var.environment}-mysql"
Environment = var.environment
}
}
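When you want to prove the database tier is reachable only from the web tier, test from one of the web servers (for example through a Session Manager session from Step 6) rather than from your laptop. A sketch, assuming you install a MySQL client on the instance and substitute the db_address output from Step 9:
# On a web server instance, via Session Manager
sudo apt-get install -y mysql-client
# Connect using the endpoint from the db_address output
mysql -h <db-address> -u admin -p
# Or just confirm port 3306 is reachable
# (install netcat first if it's missing: sudo apt-get install -y netcat-openbsd)
nc -zv <db-address> 3306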
Step 9. Outputs
Outputs display important information after Terraform completes. We'll output the load balancer DNS name, which is the URL you'll use to access your application, and the database endpoint for connecting your application to the database.
Create an outputs.tf file:
output "alb_dns_name" {
description = "DNS name of the Application Load Balancer"
value = aws_lb.main.dns_name
}
output "alb_url" {
description = "URL to access the application"
value = "http://${aws_lb.main.dns_name}"
}
output "db_endpoint" {
description = "RDS database endpoint"
value = aws_db_instance.main.endpoint
sensitive = true
}
output "db_address" {
description = "RDS database address"
value = aws_db_instance.main.address
sensitive = true
}
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
Step 10. GitHub Actions CI/CD Pipeline
Now we'll set up GitHub Actions to automatically deploy infrastructure changes. This replaces Jenkins from the original article but provides the same functionality. When you push changes to your Terraform code, GitHub Actions will automatically run terraform plan to show you what will change. When you merge a pull request to the main branch, it will automatically run terraform apply to deploy those changes.
Create .github/workflows/terraform.yml:
name: 'Terraform CI/CD'
on:
push:
branches:
- main
pull_request:
branches:
- main
env:
TF_VERSION: '1.6.0'
AWS_REGION: 'eu-west-2'
jobs:
terraform:
name: 'Terraform'
runs-on: ubuntu-latest
# These permissions are needed for the GitHub token
permissions:
contents: read
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Terraform Format Check
id: fmt
run: terraform fmt -check
continue-on-error: true
- name: Terraform Init
id: init
run: terraform init -backend-config=envs/prod/backend.hcl
- name: Terraform Validate
id: validate
run: terraform validate -no-color
- name: Terraform Plan
id: plan
if: github.event_name == 'pull_request'
run: |
terraform plan -no-color -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
continue-on-error: true
- name: Comment Plan on PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
#### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
#### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
#### Terraform Plan 📖\`${{ steps.plan.outcome }}\`
<details><summary>Show Plan</summary>
\`\`\`terraform
${{ steps.plan.outputs.stdout }}
\`\`\`
</details>
*Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: output
})
- name: Terraform Plan Status
if: steps.plan.outcome == 'failure'
run: exit 1
- name: Terraform Apply
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: |
terraform apply -auto-approve -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
Create a destroy workflow at .github/workflows/terraform-destroy.yml:
name: 'Terraform Destroy'
on:
workflow_dispatch:
inputs:
confirm:
description: 'Type "destroy" to confirm'
required: true
jobs:
destroy:
name: 'Destroy Infrastructure'
runs-on: ubuntu-latest
steps:
- name: Verify Confirmation
if: github.event.inputs.confirm != 'destroy'
run: |
echo "Confirmation failed. You must type 'destroy' to proceed."
exit 1
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: '1.6.0'
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-west-2
- name: Terraform Init
run: terraform init -backend-config=envs/prod/backend.hcl
- name: Terraform Destroy
run: |
terraform destroy -auto-approve -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
Setting Up GitHub Secrets
In your GitHub repository, go to Settings → Secrets and variables → Actions, and add these secrets:
- AWS_ACCESS_KEY_ID: Your AWS access key
- AWS_SECRET_ACCESS_KEY: Your AWS secret key
- DB_USERNAME: Database admin username (e.g., "admin")
- DB_PASSWORD: Database admin password
These secrets allow GitHub Actions to deploy infrastructure on your behalf without exposing credentials in your code.
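If you prefer the command line, the GitHub CLI can store the same secrets without the web UI. A sketch, assuming gh is installed and authenticated against this repository (each command prompts for the value):
gh secret set AWS_ACCESS_KEY_ID
gh secret set AWS_SECRET_ACCESS_KEY
gh secret set DB_USERNAME
gh secret set DB_PASSWORD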
Step 11. Deployment Process
Now that everything is set up, here's how to deploy your infrastructure. First, you'll create the remote state backend locally, then push your code to GitHub where the automated pipeline takes over.
Initial Setup
First, create the state backend infrastructure:
# Create the bootstrap file and run it
terraform init
terraform apply
After the S3 bucket and DynamoDB table are created, update your main configuration to use the remote backend:
# Initialize with remote backend
terraform init -backend-config=envs/prod/backend.hcl
Terraform will ask if you want to migrate your local state to S3. Type "yes" to proceed.
Deploy via GitHub Actions
Commit and push your code to GitHub:
git add .
git commit -m "Initial infrastructure setup"
git push origin main
GitHub Actions will automatically run terraform plan and terraform apply. You can watch the progress in the Actions tab of your repository.
Making Changes
To make infrastructure changes, create a feature branch:
git checkout -b update-instance-type
# Make your changes to the Terraform files
git add .
git commit -m "Update instance type to t3.small"
git push origin update-instance-type
Create a pull request on GitHub. The GitHub Actions workflow will automatically run terraform plan and post the results as a comment on your PR. Review the plan to see exactly what will change. If everything looks good, merge the pull request. GitHub Actions will automatically run terraform apply to deploy your changes.
Monitoring Deployments
You can view deployment progress in real-time by going to the Actions tab in your GitHub repository. Each workflow run shows all the steps and their outputs. If a deployment fails, you can see the exact error message and debug from there.
Step 12. Testing Your Infrastructure
Once deployment completes, you can test your infrastructure. Get the load balancer URL from the Terraform outputs:
terraform output alb_url
Visit that URL in your browser. You should see the custom welcome page showing the instance ID and availability zone. Refresh the page multiple times and you'll notice the instance ID changes as the load balancer distributes requests across your four web servers.
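You can watch the same rotation from a terminal by requesting the page a few times in a loop, using the alb_dns_name output:
# Each response should show a different instance ID as the ALB rotates targets
for i in $(seq 1 8); do
curl -s "http://$(terraform output -raw alb_dns_name)"
echo
done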
To verify Multi-AZ deployment:
# Check Auto Scaling Group distribution
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names prod-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,AvailabilityZone]' \
--output table
# Check RDS Multi-AZ status
aws rds describe-db-instances \
--db-instance-identifier prod-mysql \
--query 'DBInstances[0].MultiAZ'
You should see two instances in each availability zone, and the RDS Multi-AZ status should return true.
Deployed Architecture Overview
The deployed stack flows as follows: internet traffic enters through the Application Load Balancer in the two public subnets, which forwards requests to the Auto Scaling Group of four Apache web servers (two per availability zone); the web servers connect to the Multi-AZ RDS MySQL instance in the private subnets; and Terraform state lives in the S3 bucket with DynamoDB providing locking.
Troubleshooting Common Issues
Let me walk you through solutions to common problems you might encounter. If your web servers aren't showing up as healthy in the load balancer target group, first check that Apache is actually running on the instances. Connect via Systems Manager Session Manager and run systemctl status apache2 to verify. Also check the security group rules to ensure the web server security group allows traffic from the load balancer security group on port 80.
If you can't connect to the database from your web servers, verify the security group rules allow MySQL traffic from the web server security group. Check that the database is in the available state using the RDS console. Verify that the web servers can resolve the database endpoint DNS name. Test connectivity using telnet or nc to the database endpoint on port 3306.
If Terraform apply fails with state locking errors, someone else might be running Terraform at the same time. Wait for their operation to complete. If Terraform crashed and left a stale lock, you can forcefully unlock using terraform force-unlock followed by the lock ID shown in the error message. Use this carefully because unlocking while someone else is actively making changes can corrupt your state.
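For reference, here is roughly what inspecting and clearing a stale lock looks like. The lock ID comes from Terraform's own error message; only run force-unlock when you're certain no other run is in progress:
# See any active lock entries in the DynamoDB table
aws dynamodb scan --table-name tf-state-locks-prod-stack
# Release a stale lock using the ID from the error message
terraform force-unlock <LOCK_ID>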
If instances launch but immediately fail health checks, check the user data script logs at /var/log/cloud-init-output.log on the instance. The user data script might be failing, preventing Apache from starting. Verify that the instance can reach the internet to download packages by checking the route tables and internet gateway attachment.
If your GitHub Actions pipeline fails, check that you've configured all the required secrets in GitHub. Verify that the AWS credentials have sufficient permissions to create all the resources. Check the Actions logs for specific error messages that will point you to the problem.
Conclusion
You've now built a production-grade, highly available infrastructure on AWS using Terraform and GitHub Actions. This infrastructure can handle availability zone failures gracefully, automatically scales to meet demand, and deploys changes through an automated pipeline. The four Apache web servers distributed across two availability zones ensure your application remains available even when problems occur.
The Multi-AZ RDS database provides automatic failover if the primary database fails, and the remote state management with S3 and DynamoDB ensures your team can collaborate safely on infrastructure changes. The GitHub Actions pipeline replaces manual terraform apply commands with automated deployments that happen consistently every time.
This foundation gives you a solid starting point that you can evolve as your needs grow.
If you would prefer to use Jenkins as the CI/CD tool, check this out