Introduction
Availability, repeatability, and security are table stakes for production workloads. This guide provisions a baseline AWS stack with Terraform that's resilient to AZ failures, enforces least-privilege boundaries, integrates with CI/CD, and manages Terraform state with an S3 and DynamoDB backend. We'll deploy four Apache web servers across two availability zones with a Multi-AZ RDS database, all automated through GitHub Actions.
Why This Matters
Incidents rarely happen at convenient times. You want deterministic deployments, blast-radius isolation, and multi-AZ redundancy so failures degrade gracefully. When one availability zone experiences issues, your application continues running on servers in the healthy zone without any manual intervention.
Equally important is proper Terraform state management. State must be remote rather than stored on a laptop, it must be locked to prevent concurrent modifications, it must be encrypted and versioned for security and recovery, and it should be accessible via IAM rather than passed around in Slack or email. This becomes critical when working in teams or using automated CI/CD pipelines because everyone needs access to the same source of truth about what infrastructure exists.
This article covers both infrastructure resources and state management so you can run with confidence. We'll also set up a complete GitHub Actions pipeline that automatically detects changes to your Terraform code and deploys them, replacing the manual apply process with automated continuous deployment.
What We're Building
The architecture we're building includes these components working together to create a highly available system. We'll provision a VPC with public and private subnets spread across two availability zones in the London region. In the public subnets, we'll deploy four EC2 instances running Apache web servers, with two instances in each availability zone. An Application Load Balancer will distribute incoming traffic across these four servers, automatically routing requests away from any unhealthy instances.
For the database tier, we'll create a Multi-AZ RDS MySQL instance that automatically maintains a standby replica in a different availability zone. If the primary database fails, RDS automatically promotes the standby to become the new primary without requiring any code changes. The database will live in private subnets with no internet access, protected by security groups that only allow connections from the web servers.
For state management, we'll configure an S3 bucket with versioning and encryption to store the Terraform state file, along with a DynamoDB table that provides locking to prevent multiple people or automation pipelines from modifying the infrastructure simultaneously. Finally, we'll set up GitHub Actions workflows that automatically run terraform plan on pull requests so you can review changes, and terraform apply when changes merge to the main branch, giving you the same automation benefits that Jenkins provides but using GitHub's native platform.
Here's what we're deploying:
- Networking: VPC, 2 public subnets, 2 private subnets, Internet Gateway, route tables
- Compute: 4 Apache web servers across 2 availability zones in an Auto Scaling Group
- Ingress: Application Load Balancer with health checks and automatic failover
- Data: RDS MySQL Multi-AZ in isolated private subnets
- Security: Security groups scoped per role, encrypted storage, IMDSv2 enforcement
- State: S3 remote state with DynamoDB locking, versioned and encrypted
- CI/CD: GitHub Actions pipeline for automated terraform plan and apply
- Observability: CloudWatch metrics that the ALB, Auto Scaling Group, and RDS publish out of the box
Prerequisites
Before starting, you'll need Terraform version 1.6 or higher installed on your local machine. You'll also need the AWS CLI configured with an IAM user or role that has permissions to create VPC, EC2, RDS, and S3 resources. While we'll create the S3 bucket and DynamoDB table for state management as our first step, you'll need initial AWS credentials to bootstrap that infrastructure.
You should also have a GitHub account and a repository where you'll store your Terraform code. The GitHub Actions workflows will run directly in your repository, so you'll need to configure AWS credentials as GitHub Secrets to allow the automation to deploy infrastructure on your behalf.
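Before moving on, a couple of quick checks confirm the tooling is ready; the exact output will vary by machine and account:
# Confirm Terraform is 1.6 or newer
terraform version
# Confirm the AWS CLI is configured and which identity it will use
aws sts get-caller-identity
# Confirm which region the CLI will default to
aws configure get region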
Step 1. Remote State Backend (S3 + DynamoDB)
The first thing we need to do is create the infrastructure that will manage our Terraform state. This is a one-time bootstrap process. We're creating an S3 bucket to store the state file and a DynamoDB table to provide state locking. The bucket will have versioning enabled so you can recover from accidental deletions or corrupted state, and we'll enforce encryption at rest using AES256. We're also blocking all public access to ensure the state file, which may contain sensitive information like database passwords, remains private.
The DynamoDB table uses on-demand billing so you only pay for the lock operations that actually occur, which is minimal. Terraform will write a lock entry to this table whenever someone runs an apply or plan operation, preventing others from making concurrent changes that could corrupt your infrastructure.
Create a file called backend-bootstrap.tf:
# This is a one-time setup file to create the S3 bucket and DynamoDB table
# After running this once, you can delete this file or move it to a separate directory
provider "aws" {
region = "eu-west-2"
}
# S3 bucket to store Terraform state files
resource "aws_s3_bucket" "state" {
bucket = "tf-state-prod-stack-eu-west-2"
# Prevent accidental deletion of the state bucket
lifecycle {
prevent_destroy = true
}
}
# Enable versioning so we can recover from bad state changes
resource "aws_s3_bucket_versioning" "state" {
bucket = aws_s3_bucket.state.id
versioning_configuration {
status = "Enabled"
}
}
# Encrypt state files at rest for security
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
bucket = aws_s3_bucket.state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
# Block all public access to the state bucket
resource "aws_s3_bucket_public_access_block" "state" {
bucket = aws_s3_bucket.state.id
block_public_acls = true
block_public_policy = true
restrict_public_buckets = true
ignore_public_acls = true
}
# DynamoDB table for state locking to prevent concurrent modifications
resource "aws_dynamodb_table" "lock" {
name = "tf-state-locks-prod-stack"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
Run this bootstrap process once:
terraform init
terraform apply
After the bucket and table are created, you can delete this bootstrap file or move it to a separate directory. The state infrastructure is now ready to use.
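If you'd like to double-check the bootstrap before wiring up the backend, a few read-only AWS CLI calls confirm versioning, encryption, and the lock table, using the names from the file above:
# Versioning should report "Enabled"
aws s3api get-bucket-versioning --bucket tf-state-prod-stack-eu-west-2
# Encryption should report AES256
aws s3api get-bucket-encryption --bucket tf-state-prod-stack-eu-west-2
# The lock table should be ACTIVE
aws dynamodb describe-table --table-name tf-state-locks-prod-stack --query 'Table.TableStatus'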
Step 2. Backend Configuration
Now that we have our state storage infrastructure, we need to configure Terraform to use it. We'll create a backend configuration file for each environment. This separation allows you to have different state files for development, staging, and production environments, preventing changes in one environment from affecting others.
Create a directory structure for your environments and add a backend configuration file. For production, create envs/prod/backend.hcl:
bucket = "tf-state-prod-stack-eu-west-2"
key = "envs/prod/global.tfstate"
region = "eu-west-2"
dynamodb_table = "tf-state-locks-prod-stack"
encrypt = true
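A second environment only needs its own state key; some teams also prefer a dedicated bucket per environment. As a sketch, a hypothetical envs/dev/backend.hcl might look like this:
bucket = "tf-state-prod-stack-eu-west-2"
key = "envs/dev/global.tfstate"
region = "eu-west-2"
dynamodb_table = "tf-state-locks-prod-stack"
encrypt = true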
In your main Terraform directory, create a main.tf file and add the backend configuration block. Notice that we don't specify the actual bucket name here because we'll pass that in via the backend config file. This allows us to use the same Terraform code across multiple environments:
terraform {
# Require Terraform version 1.6 or higher
required_version = ">= 1.6"
# Backend configuration for remote state storage
backend "s3" {}
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "eu-west-2"
}
Initialize Terraform with the backend configuration:
terraform init -backend-config=envs/prod/backend.hcl
Terraform will now store its state remotely in S3 and use DynamoDB for locking. You can verify this worked by checking that Terraform created a .terraform directory containing the backend configuration.
Step 3. Variables
We need to define variables for values that change between environments or that should not be hardcoded. Database credentials are particularly important to handle as variables because you never want to commit passwords to version control. The sensitive flag ensures these values won't appear in Terraform's output logs.
Create a variables.tf file:
variable "db_username" {
description = "Database administrator username"
type = string
sensitive = true
}
variable "db_password" {
description = "Database administrator password"
type = string
sensitive = true
}
variable "environment" {
description = "Environment name used for resource tagging and naming"
type = string
default = "prod"
}
variable "aws_region" {
description = "AWS region for resource deployment"
type = string
default = "eu-west-2"
}
variable "instance_type" {
description = "EC2 instance type for web servers"
type = string
default = "t3.micro"
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.micro"
}
Create a terraform.tfvars file with actual values. This file should never be committed to version control, so we'll add it to .gitignore in a moment:
db_username = "admin"
db_password = "YourSecurePasswordHere123!"
environment = "prod"
aws_region = "eu-west-2"
instance_type = "t3.micro"
db_instance_class = "db.t3.micro"
Add these lines to your .gitignore:
terraform.tfvars
*.tfvars
.terraform/
For GitHub Actions, we'll pass these values as GitHub Secrets instead of using a tfvars file.
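For local runs, you can also keep the credentials out of files entirely: Terraform reads any environment variable prefixed with TF_VAR_ as a variable value. A minimal sketch:
# Exported values override nothing sensitive in git and never touch disk
# (consider your shell history or a secrets manager for real passwords)
export TF_VAR_db_username="admin"
export TF_VAR_db_password="YourSecurePasswordHere123!"
terraform plan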
Step 4. Networking Infrastructure
The networking layer is the foundation of your infrastructure. We're creating a VPC with a CIDR block that gives us over 65,000 possible IP addresses, which is more than enough for most applications. We're enabling DNS support and hostnames so that resources within the VPC can resolve each other by DNS names rather than having to use IP addresses.
We'll create two public subnets and two private subnets, with one of each type in each availability zone. The public subnets will host the load balancer and web servers, while the private subnets will host the database. By spreading resources across two availability zones, we ensure that if one entire data center goes offline, our application continues running in the other.
The Internet Gateway provides the connection point between our VPC and the internet. We'll create route tables that define how traffic flows. The public route table will direct internet-bound traffic to the Internet Gateway, while the private route table will have no internet route, keeping the database completely isolated.
Create a network.tf file:
# Fetch available availability zones in the current region
data "aws_availability_zones" "available" {
state = "available"
}
# Main VPC - this is the container for all our networking resources
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
}
}
# Internet Gateway provides internet access for public subnets
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
Environment = var.environment
}
}
# Public Subnet 1 - hosts ALB and web servers in first AZ
resource "aws_subnet" "public_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-1"
Environment = var.environment
Type = "Public"
}
}
# Public Subnet 2 - hosts ALB and web servers in second AZ
resource "aws_subnet" "public_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-2"
Environment = var.environment
Type = "Public"
}
}
# Private Subnet 1 - hosts RDS in first AZ (completely isolated)
resource "aws_subnet" "private_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.10.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
tags = {
Name = "${var.environment}-private-1"
Environment = var.environment
Type = "Private"
}
}
# Private Subnet 2 - hosts RDS in second AZ (completely isolated)
resource "aws_subnet" "private_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.11.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
tags = {
Name = "${var.environment}-private-2"
Environment = var.environment
Type = "Private"
}
}
# Route table for public subnets - routes internet traffic to Internet Gateway
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-public-rt"
Environment = var.environment
}
}
# Route that directs all internet-bound traffic to the Internet Gateway
resource "aws_route" "public_internet" {
route_table_id = aws_route_table.public.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
# Associate public subnet 1 with the public route table
resource "aws_route_table_association" "public_1" {
subnet_id = aws_subnet.public_1.id
route_table_id = aws_route_table.public.id
}
# Associate public subnet 2 with the public route table
resource "aws_route_table_association" "public_2" {
subnet_id = aws_subnet.public_2.id
route_table_id = aws_route_table.public.id
}
# Route table for private subnets - no internet route, completely isolated
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-private-rt"
Environment = var.environment
}
}
# Associate private subnet 1 with the private route table
resource "aws_route_table_association" "private_1" {
subnet_id = aws_subnet.private_1.id
route_table_id = aws_route_table.private.id
}
# Associate private subnet 2 with the private route table
resource "aws_route_table_association" "private_2" {
subnet_id = aws_subnet.private_2.id
route_table_id = aws_route_table.private.id
}
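Once the stack has been applied (the deployment flow is covered in Step 11), a quick read-only query confirms the four subnets landed where we expect. This sketch assumes the vpc_id output defined in Step 9 is in place:
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$(terraform output -raw vpc_id)" \
--query 'Subnets[*].[SubnetId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch]' \
--output table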
Step 5. Security Groups
Security groups act as virtual firewalls that control traffic to and from your resources. We're implementing a defense-in-depth strategy where each tier of the application can only communicate with the tiers it needs to. The load balancer accepts traffic from the internet, the web servers accept traffic only from the load balancer, and the database accepts traffic only from the web servers.
This layered security approach means that even if someone discovers the IP address of a web server, they cannot connect to it directly because the security group will reject any traffic that doesn't originate from the load balancer. Similarly, the database is completely inaccessible except from the web servers, even though it exists in the same VPC.
Create a security-groups.tf file:
# Security Group for Application Load Balancer
# Accepts HTTP and HTTPS from the internet, forwards to web servers
resource "aws_security_group" "alb" {
name = "${var.environment}-alb-sg"
description = "Security group for application load balancer"
vpc_id = aws_vpc.main.id
# Allow HTTP from anywhere on the internet
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow HTTPS from anywhere on the internet
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow all outbound traffic so ALB can forward to web servers
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-alb-sg"
Environment = var.environment
}
}
# Security Group for Web Servers
# Only accepts HTTP from the load balancer, not directly from internet
resource "aws_security_group" "web" {
name = "${var.environment}-web-sg"
description = "Security group for web server instances"
vpc_id = aws_vpc.main.id
# Only allow HTTP from the load balancer security group
# This prevents direct access to web servers from the internet
ingress {
description = "HTTP from ALB only"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
# Allow all outbound for package updates and external API calls
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-web-sg"
Environment = var.environment
}
}
# Security Group for RDS Database
# Only accepts MySQL connections from web servers
resource "aws_security_group" "database" {
name = "${var.environment}-db-sg"
description = "Security group for RDS database"
vpc_id = aws_vpc.main.id
# Only allow MySQL from the web server security group
# Database is completely inaccessible from the internet
ingress {
description = "MySQL from web servers only"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.web.id]
}
tags = {
Name = "${var.environment}-db-sg"
Environment = var.environment
}
}
Step 6. IAM Roles for EC2
We need to create an IAM role that our EC2 instances will assume. This role grants permissions for AWS Systems Manager Session Manager, which allows you to connect to instances without needing SSH keys or opening port 22. This is a more secure approach because you don't have to manage SSH keys, and all session activity is logged in CloudTrail for audit purposes.
Create an iam.tf file:
# IAM role that EC2 instances will assume
resource "aws_iam_role" "ec2_role" {
name = "${var.environment}-ec2-role"
# Trust policy allowing EC2 service to assume this role
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.environment}-ec2-role"
Environment = var.environment
}
}
# Attach AWS-managed policy for Systems Manager access
# This allows SSM Session Manager connections without SSH
resource "aws_iam_role_policy_attachment" "ec2_ssm" {
role = aws_iam_role.ec2_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Instance profile wraps the role so it can be attached to EC2 instances
resource "aws_iam_instance_profile" "ec2_profile" {
name = "${var.environment}-ec2-profile"
role = aws_iam_role.ec2_role.name
}
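Once an instance boots with this profile, you can open a shell with no SSH key and no open port 22. A sketch, assuming the Session Manager plugin for the AWS CLI is installed and you substitute one of your own instance IDs:
# List instances registered with Systems Manager
aws ssm describe-instance-information \
--query 'InstanceInformationList[*].[InstanceId,PingStatus]' --output table
# Open an interactive shell on one of them
aws ssm start-session --target i-0123456789abcdef0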
Step 7. Load Balancer and Web Servers
Now we'll create the Application Load Balancer and the Auto Scaling Group with four web servers. The load balancer will perform health checks on each web server, and if a server fails its health check, the load balancer automatically stops sending traffic to it until it becomes healthy again.
The Auto Scaling Group will maintain exactly four instances running at all times, distributed evenly across the two availability zones. If an instance fails or is terminated, the Auto Scaling Group automatically launches a replacement. The user data script installs Apache and creates a simple HTML page that displays the instance ID and availability zone, allowing you to see which server is responding to each request.
Create a compute.tf file:
# Fetch the latest Ubuntu 20.04 AMI
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical's AWS account ID
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# Application Load Balancer distributes traffic across web servers
resource "aws_lb" "main" {
name = "${var.environment}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = [aws_subnet.public_1.id, aws_subnet.public_2.id]
enable_deletion_protection = false
tags = {
Name = "${var.environment}-alb"
Environment = var.environment
}
}
# Target group defines the pool of web servers
resource "aws_lb_target_group" "web" {
name = "${var.environment}-tg"
port = 80
protocol = "HTTP"
vpc_id = aws_vpc.main.id
# Health check configuration
# ALB will mark instances as unhealthy if they fail these checks
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 2
timeout = 5
interval = 30
path = "/"
protocol = "HTTP"
matcher = "200"
}
tags = {
Name = "${var.environment}-tg"
Environment = var.environment
}
}
# HTTP listener on port 80
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.web.arn
}
}
# Launch template defines the configuration for EC2 instances
resource "aws_launch_template" "web" {
name_prefix = "${var.environment}-web-"
image_id = data.aws_ami.ubuntu.id
instance_type = var.instance_type
# Attach IAM role for SSM access
iam_instance_profile {
arn = aws_iam_instance_profile.ec2_profile.arn
}
# Enforce IMDSv2 for enhanced security
# This prevents SSRF attacks against the instance metadata service
metadata_options {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 1
}
network_interfaces {
associate_public_ip_address = true
security_groups = [aws_security_group.web.id]
}
# User data script installs Apache and publishes a test page showing
# the instance ID and availability zone (fetched via IMDSv2)
user_data = base64encode(<<-EOF
#!/bin/bash
set -e
apt-get update
apt-get install -y apache2
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
echo "<h1>Apache on instance $INSTANCE_ID in $AZ</h1>" > /var/www/html/index.html
systemctl enable apache2
systemctl start apache2
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.environment}-web-server"
Environment = var.environment
}
}
}
# Auto Scaling Group maintains 4 web servers across 2 AZs
resource "aws_autoscaling_group" "web" {
name = "${var.environment}-asg"
vpc_zone_identifier = [aws_subnet.public_1.id, aws_subnet.public_2.id]
target_group_arns = [aws_lb_target_group.web.arn]
# Maintain exactly 4 instances (2 per AZ)
desired_capacity = 4
min_size = 4
max_size = 8
# Use ELB health checks so unhealthy instances are replaced
health_check_type = "ELB"
health_check_grace_period = 300
launch_template {
id = aws_launch_template.web.id
version = "$Latest"
}
tag {
key = "Name"
value = "${var.environment}-web-instance"
propagate_at_launch = true
}
tag {
key = "Environment"
value = var.environment
propagate_at_launch = true
}
}
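Once the stack is up, you can confirm that all four web servers have passed the ALB health checks. This is a sketch using read-only AWS CLI calls; it assumes the default environment name of prod, which makes the target group name prod-tg:
# Look up the target group ARN by name, then list each target's health state
TG_ARN=$(aws elbv2 describe-target-groups --names prod-tg \
--query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
--query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State]' --output table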
Step 8. RDS Database
The RDS database will be deployed in Multi-AZ mode, which means AWS automatically maintains a standby replica in a different availability zone. If the primary database fails, RDS promotes the standby to primary automatically, typically within 60 to 120 seconds. Your application continues working because the database endpoint DNS name stays the same, it just points to the new primary instance.
The database will be completely isolated in the private subnets with no route to the internet. It can only be accessed from the web servers through the security group rules we configured earlier.
Create a database.tf file:
# DB subnet group defines which subnets RDS can use
resource "aws_db_subnet_group" "main" {
name = "${var.environment}-db-subnet-group"
subnet_ids = [aws_subnet.private_1.id, aws_subnet.private_2.id]
tags = {
Name = "${var.environment}-db-subnet-group"
Environment = var.environment
}
}
# RDS MySQL instance with Multi-AZ for high availability
resource "aws_db_instance" "main" {
identifier = "${var.environment}-mysql"
# Database configuration
engine = "mysql"
engine_version = "8.0.40"
instance_class = var.db_instance_class
allocated_storage = 20
storage_type = "gp3"
storage_encrypted = true
# Multi-AZ creates standby replica in different AZ
multi_az = true
# Network configuration
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.database.id]
publicly_accessible = false
# Authentication
username = var.db_username
password = var.db_password
# Backup configuration
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "mon:04:00-mon:05:00"
# Disable final snapshot for easier cleanup (change for production)
skip_final_snapshot = true
# Enable deletion protection in production
deletion_protection = false
tags = {
Name = "${var.environment}-mysql"
Environment = var.environment
}
}
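When you want to prove the database tier is reachable only from the web tier, test from one of the web servers (for example through a Session Manager session from Step 6) rather than from your laptop. A sketch, assuming you install a MySQL client on the instance and substitute the db_address output from Step 9:
# On a web server instance, via Session Manager
sudo apt-get install -y mysql-client
# Connect using the endpoint from the db_address output
mysql -h <db-address> -u admin -p
# Or just confirm port 3306 is reachable
# (install netcat first if it's missing: sudo apt-get install -y netcat-openbsd)
nc -zv <db-address> 3306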
Step 9. Outputs
Outputs display important information after Terraform completes. We'll output the load balancer DNS name, which is the URL you'll use to access your application, and the database endpoint for connecting your application to the database.
Create an outputs.tf file:
output "alb_dns_name" {
description = "DNS name of the Application Load Balancer"
value = aws_lb.main.dns_name
}
output "alb_url" {
description = "URL to access the application"
value = "http://${aws_lb.main.dns_name}"
}
output "db_endpoint" {
description = "RDS database endpoint"
value = aws_db_instance.main.endpoint
sensitive = true
}
output "db_address" {
description = "RDS database address"
value = aws_db_instance.main.address
sensitive = true
}
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
Step 10. GitHub Actions CI/CD Pipeline
Now we'll set up GitHub Actions to automatically deploy infrastructure changes. This replaces Jenkins from the original article but provides the same functionality. When you push changes to your Terraform code, GitHub Actions will automatically run terraform plan to show you what will change. When you merge a pull request to the main branch, it will automatically run terraform apply to deploy those changes.
Create .github/workflows/terraform.yml:
name: 'Terraform CI/CD'
on:
push:
branches:
- main
pull_request:
branches:
- main
env:
TF_VERSION: '1.6.0'
AWS_REGION: 'eu-west-2'
jobs:
terraform:
name: 'Terraform'
runs-on: ubuntu-latest
# These permissions are needed for the GitHub token
permissions:
contents: read
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Terraform Format Check
id: fmt
run: terraform fmt -check
continue-on-error: true
- name: Terraform Init
id: init
run: terraform init -backend-config=envs/prod/backend.hcl
- name: Terraform Validate
id: validate
run: terraform validate -no-color
- name: Terraform Plan
id: plan
if: github.event_name == 'pull_request'
run: |
terraform plan -no-color -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
continue-on-error: true
- name: Comment Plan on PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
#### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
#### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
#### Terraform Plan 📖\`${{ steps.plan.outcome }}\`
<details><summary>Show Plan</summary>
\`\`\`terraform
${{ steps.plan.outputs.stdout }}
\`\`\`
</details>
*Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: output
})
- name: Terraform Plan Status
if: steps.plan.outcome == 'failure'
run: exit 1
- name: Terraform Apply
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: |
terraform apply -auto-approve -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
Create a destroy workflow at .github/workflows/terraform-destroy.yml:
name: 'Terraform Destroy'
on:
workflow_dispatch:
inputs:
confirm:
description: 'Type "destroy" to confirm'
required: true
jobs:
destroy:
name: 'Destroy Infrastructure'
runs-on: ubuntu-latest
steps:
- name: Verify Confirmation
if: github.event.inputs.confirm != 'destroy'
run: |
echo "Confirmation failed. You must type 'destroy' to proceed."
exit 1
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: '1.6.0'
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: eu-west-2
- name: Terraform Init
run: terraform init -backend-config=envs/prod/backend.hcl
- name: Terraform Destroy
run: |
terraform destroy -auto-approve -input=false \
-var="db_username=${{ secrets.DB_USERNAME }}" \
-var="db_password=${{ secrets.DB_PASSWORD }}"
Setting Up GitHub Secrets
In your GitHub repository, go to Settings → Secrets and variables → Actions, and add these secrets:
- AWS_ACCESS_KEY_ID: Your AWS access key
- AWS_SECRET_ACCESS_KEY: Your AWS secret key
- DB_USERNAME: Database admin username (e.g., "admin")
- DB_PASSWORD: Database admin password
These secrets allow GitHub Actions to deploy infrastructure on your behalf without exposing credentials in your code.
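If you prefer the command line, the GitHub CLI can store the same secrets without the web UI. A sketch, assuming gh is installed and authenticated against this repository (each command prompts for the value):
gh secret set AWS_ACCESS_KEY_ID
gh secret set AWS_SECRET_ACCESS_KEY
gh secret set DB_USERNAME
gh secret set DB_PASSWORD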
Step 11. Deployment Process
Now that everything is set up, here's how to deploy your infrastructure. First, you'll create the remote state backend locally, then push your code to GitHub where the automated pipeline takes over.
Initial Setup
First, create the state backend infrastructure:
# Create the bootstrap file and run it
terraform init
terraform apply
After the S3 bucket and DynamoDB table are created, update your main configuration to use the remote backend:
# Initialize with remote backend
terraform init -backend-config=envs/prod/backend.hcl
Terraform will ask if you want to migrate your local state to S3. Type "yes" to proceed.
Deploy via GitHub Actions
Commit and push your code to GitHub:
git add .
git commit -m "Initial infrastructure setup"
git push origin main
GitHub Actions will automatically run terraform plan and terraform apply. You can watch the progress in the Actions tab of your repository.
Making Changes
To make infrastructure changes, create a feature branch:
git checkout -b update-instance-type
# Make your changes to the Terraform files
git add .
git commit -m "Update instance type to t3.small"
git push origin update-instance-type
Create a pull request on GitHub. The GitHub Actions workflow will automatically run terraform plan and post the results as a comment on your PR. Review the plan to see exactly what will change. If everything looks good, merge the pull request. GitHub Actions will automatically run terraform apply to deploy your changes.
Monitoring Deployments
You can view deployment progress in real-time by going to the Actions tab in your GitHub repository. Each workflow run shows all the steps and their outputs. If a deployment fails, you can see the exact error message and debug from there.
Step 12. Testing Your Infrastructure
Once deployment completes, you can test your infrastructure. Get the load balancer URL from the Terraform outputs:
terraform output alb_url
Visit that URL in your browser. You should see the custom welcome page showing the instance ID and availability zone. Refresh the page multiple times and you'll notice the instance ID changes as the load balancer distributes requests across your four web servers.
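You can watch the same rotation from a terminal by requesting the page a few times in a loop, using the alb_dns_name output:
# Each response should show a different instance ID as the ALB rotates targets
for i in $(seq 1 8); do
curl -s "http://$(terraform output -raw alb_dns_name)"
echo
done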
To verify Multi-AZ deployment:
# Check Auto Scaling Group distribution
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names prod-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,AvailabilityZone]' \
--output table
# Check RDS Multi-AZ status
aws rds describe-db-instances \
--db-instance-identifier prod-mysql \
--query 'DBInstances[0].MultiAZ'
You should see two instances in each availability zone, and the RDS Multi-AZ status should return true.
Deployed Architecture Overview
The deployed stack flows as follows: internet traffic enters through the Application Load Balancer in the two public subnets, which forwards requests to the Auto Scaling Group of four Apache web servers (two per availability zone); the web servers connect to the Multi-AZ RDS MySQL instance in the private subnets; and Terraform state lives in the S3 bucket with DynamoDB providing locking.
Troubleshooting Common Issues
Let me walk you through solutions to common problems you might encounter. If your web servers aren't showing up as healthy in the load balancer target group, first check that Apache is actually running on the instances. Connect via Systems Manager Session Manager and run systemctl status apache2 to verify. Also check the security group rules to ensure the web server security group allows traffic from the load balancer security group on port 80.
If you can't connect to the database from your web servers, verify the security group rules allow MySQL traffic from the web server security group. Check that the database is in the available state using the RDS console. Verify that the web servers can resolve the database endpoint DNS name. Test connectivity using telnet or nc to the database endpoint on port 3306.
If Terraform apply fails with state locking errors, someone else might be running Terraform at the same time. Wait for their operation to complete. If Terraform crashed and left a stale lock, you can forcefully unlock using terraform force-unlock followed by the lock ID shown in the error message. Use this carefully because unlocking while someone else is actively making changes can corrupt your state.
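For reference, here is roughly what inspecting and clearing a stale lock looks like. The lock ID comes from Terraform's own error message; only run force-unlock when you're certain no other run is in progress:
# See any active lock entries in the DynamoDB table
aws dynamodb scan --table-name tf-state-locks-prod-stack
# Release a stale lock using the ID from the error message
terraform force-unlock <LOCK_ID>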
If instances launch but immediately fail health checks, check the user data script logs at /var/log/cloud-init-output.log on the instance. The user data script might be failing, preventing Apache from starting. Verify that the instance can reach the internet to download packages by checking the route tables and internet gateway attachment.
If your GitHub Actions pipeline fails, check that you've configured all the required secrets in GitHub. Verify that the AWS credentials have sufficient permissions to create all the resources. Check the Actions logs for specific error messages that will point you to the problem.
Conclusion
You've now built a production-grade, highly available infrastructure on AWS using Terraform and GitHub Actions. This infrastructure can handle availability zone failures gracefully, automatically scales to meet demand, and deploys changes through an automated pipeline. The four Apache web servers distributed across two availability zones ensure your application remains available even when problems occur.
The Multi-AZ RDS database provides automatic failover if the primary database fails, and the remote state management with S3 and DynamoDB ensures your team can collaborate safely on infrastructure changes. The GitHub Actions pipeline replaces manual terraform apply commands with automated deployments that happen consistently every time.
This foundation gives you a solid starting point that you can evolve as your needs grow.
If you would prefer to use Jenkins as the CI/CD tool, check this out