DEV Community

Mukami

Automating Terraform Testing: From Unit Tests to End-to-End Validation

How to Stop Wondering If Your Infrastructure Works and Start Knowing It Does


Day 18 of the 30-Day Terraform Challenge — and today I finally solved the problem that's been bothering me since Day 1.

How do you know your infrastructure actually works?

Manual testing gave me confidence, but it didn't scale. Every change meant re-running the same checks. Every environment meant more time. Every team member meant more coordination.

Today I automated everything.


The Three Layers of Testing

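| Test Type   | Tool           | Deploys Real Infra | Time      | Cost   |
|-------------|----------------|--------------------|-----------|--------|
| Unit        | terraform test | No                 | Seconds   | Free   |
| Integration | Terratest      | Yes                | Minutes   | Low    |
| End-to-End  | Terratest      | Yes                | 15-30 min | Medium |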

Each layer catches different failures. Together, they create confidence.


Layer 1: Unit Tests (Fast, Free, No AWS)

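Terraform 1.6+ includes a native testing framework with no external dependencies. When a run block uses command = plan, nothing is deployed and the assertions are evaluated against the plan. (The default, command = apply, does create real resources.) Planning still configures the AWS provider, so either supply credentials or mock the provider with mock_provider, available from Terraform 1.7.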

# webserver_cluster_test.tftest.hcl

variables {
  cluster_name  = "test-cluster"
  instance_type = "t3.micro"
  environment   = "dev"
}

run "validate_asg_name" {
  command = plan

  assert {
    condition     = can(regex("^test-cluster-asg-", aws_autoscaling_group.web.name_prefix))
    error_message = "ASG name prefix must start with cluster_name"
  }
}

run "validate_instance_type" {
  command = plan

  assert {
    condition     = aws_launch_template.web.instance_type == var.instance_type
    error_message = "Instance type must match var.instance_type"
  }
}

run "validate_tags" {
  command = plan

  assert {
    condition     = aws_lb.web.tags["Environment"] == "dev"
    error_message = "ALB must have Environment tag = dev"
  }
}

Run with: terraform test

What it catches: Syntax errors, naming conventions, tag consistency, logic mistakes.

What it doesn't catch: DNS propagation, health check failures, actual HTTP responses.


Layer 2: Integration Tests (Real Infra, Real Assertions)

Integration tests deploy real infrastructure, run assertions against it, then destroy it.

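// test/webserver_cluster_test.go
package test

import (
  "fmt"
  "testing"
  "time"

  http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
  "github.com/gruntwork-io/terratest/modules/random"
  "github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebserverClusterIntegration(t *testing.T) {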
  t.Parallel()

  uniqueID := random.UniqueId()
  clusterName := fmt.Sprintf("test-cluster-%s", uniqueID)

  terraformOptions := &terraform.Options{
    TerraformDir: "../manual-test",
    Vars: map[string]interface{}{
      "cluster_name":  clusterName,
      "instance_type": "t3.micro",
      "min_size":      1,
      "max_size":      2,
      "environment":   "dev",
    },
  }

  // CRITICAL: Always destroy, even if test fails
  defer terraform.Destroy(t, terraformOptions)

  terraform.InitAndApply(t, terraformOptions)

  albDnsName := terraform.Output(t, terraformOptions, "alb_dns_name")
  url := fmt.Sprintf("http://%s", albDnsName)

  // Retry for 5 minutes (ALB takes time)
  http_helper.HttpGetWithRetryWithCustomValidation(
    t, url, nil, 30, 10*time.Second,
    func(status int, body string) bool {
      return status == 200
    },
  )
}

Run with: go test -v -timeout 30m ./...

What it catches: ALB DNS resolution, health check passing, actual HTTP responses, deployment ordering.

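The critical piece: registering defer terraform.Destroy before InitAndApply means cleanup runs even when apply or an assertion fails, which avoids most orphaned resources and surprise AWS bills. It does not run if the test process is killed, for example when go test hits its -timeout or the CI job is cancelled, so leave generous headroom in the timeout.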
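Why does a deferred destroy still run after a failed assertion? Go's testing package stops a failed test with runtime.Goexit (that is what t.FailNow calls), and Goexit runs deferred calls before the goroutine exits. Here is a small sketch of that behaviour using only the standard library. The function runWithCleanup and the event names are hypothetical, and no Terraform is involved.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runWithCleanup simulates a test body: it registers cleanup with defer,
// "applies", and optionally aborts the way t.FailNow does.
func runWithCleanup(fail bool) []string {
	var events []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Registered before the "apply", like defer terraform.Destroy.
		defer func() { events = append(events, "destroy") }()

		events = append(events, "apply")
		if fail {
			runtime.Goexit() // stops this goroutine; deferred calls still run
		}
		events = append(events, "assertions passed")
	}()
	wg.Wait()
	return events
}

func main() {
	fmt.Println("passing run:", runWithCleanup(false))
	fmt.Println("failing run:", runWithCleanup(true))
}
```

Both runs end with the destroy event. What the sketch cannot show is a killed process: in that case no deferred call runs at all.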


Layer 3: End-to-End Tests (Full Stack)

E2E tests deploy everything — VPC, database, application — and verify the whole system works.

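package test

import (
  "fmt"
  "testing"
  "time"

  http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
  "github.com/gruntwork-io/terratest/modules/random"
  "github.com/gruntwork-io/terratest/modules/terraform"
)

func TestFullStackEndToEnd(t *testing.T) {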
  t.Parallel()
  uniqueID := random.UniqueId()

  // Deploy VPC
  vpcOptions := &terraform.Options{
    TerraformDir: "../modules/networking/vpc",
    Vars: map[string]interface{}{
      "vpc_name": fmt.Sprintf("test-vpc-%s", uniqueID),
    },
  }
  defer terraform.Destroy(t, vpcOptions)
  terraform.InitAndApply(t, vpcOptions)

  vpcID := terraform.Output(t, vpcOptions, "vpc_id")
  subnetIDs := terraform.OutputList(t, vpcOptions, "private_subnet_ids")

  // Deploy app using VPC outputs
  appOptions := &terraform.Options{
    TerraformDir: "../modules/services/webserver-cluster",
    Vars: map[string]interface{}{
      "cluster_name": fmt.Sprintf("test-app-%s", uniqueID),
      "vpc_id":       vpcID,
      "subnet_ids":   subnetIDs,
    },
  }
  defer terraform.Destroy(t, appOptions)
  terraform.InitAndApply(t, appOptions)

  albDnsName := terraform.Output(t, appOptions, "alb_dns_name")
  // Expect HTTP 200 with a body equal to "Hello"; retry up to 30 times, 10s apart
  http_helper.HttpGetWithRetry(t, fmt.Sprintf("http://%s", albDnsName), nil, 200, "Hello", 30, 10*time.Second)
}

What it catches: Cross-module integration issues, networking problems, full stack failures that unit and integration tests miss.
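One detail in the test above is easy to miss: deferred calls run in reverse order of registration, so the app is destroyed before the VPC it depends on. Here is a minimal sketch of that ordering using only the standard library. The function teardownOrder and the stage names are hypothetical.

```go
package main

import "fmt"

// teardownOrder mimics the E2E test: each stage is applied, then its
// destroy is deferred. Defers run last-in, first-out when the inner
// function returns.
func teardownOrder() []string {
	var log []string
	func() {
		log = append(log, "apply vpc")
		defer func() { log = append(log, "destroy vpc") }()

		log = append(log, "apply app")
		defer func() { log = append(log, "destroy app") }()

		log = append(log, "http check")
	}()
	return log
}

func main() {
	for _, step := range teardownOrder() {
		fmt.Println(step)
	}
}
```

Registering each destroy immediately after its options are built, as the test does, is what keeps the teardown order correct when more stages are added.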


The CI/CD Pipeline

Run the tests automatically on every pull request and every push to main:

name: Terraform Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init && terraform test
        working-directory: manual-test

  integration-tests:
    name: Integration Tests
    runs-on: ubuntu-latest
    if: github.event_name == 'push'  # Only on merge to main
    needs: unit-tests
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
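      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with: { terraform_wrapper: false }  # Terratest shells out to terraform and parses its raw output
      - uses: actions/setup-go@v4
        with: { go-version: "1.21" }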
      - run: go test -v -timeout 30m ./...
        working-directory: test

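Job dependencies:

  • Unit tests run on every PR and every push to main (fast, cheap)
  • Integration tests only run on merge to main, after the unit tests pass (slower, costs money)
  • E2E tests run on a schedule (once a day); the workflow above has no schedule trigger, so that needs a separate scheduled job

Because go test ./... runs every test in the test directory, the E2E test needs a -run filter or a skip condition to stay out of the integration job.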
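One way to keep the E2E test for the scheduled run is an opt-in environment variable. This is a sketch under assumptions: the variable name RUN_E2E and the helper e2eEnabled are hypothetical and do not appear in the workflow above.

```go
package main

import (
	"fmt"
	"os"
)

// e2eEnabled reports whether the hypothetical RUN_E2E variable opts in
// to end-to-end tests. getenv is injected so the logic is easy to test.
func e2eEnabled(getenv func(string) string) bool {
	switch getenv("RUN_E2E") {
	case "1", "true":
		return true
	}
	return false
}

func main() {
	if !e2eEnabled(os.Getenv) {
		fmt.Println("skipping end-to-end tests (RUN_E2E not set)")
		return
	}
	fmt.Println("running end-to-end tests")
}
```

In a real test file the same check would sit at the top of TestFullStackEndToEnd and call t.Skip when it returns false, with only the scheduled job setting the variable.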

Why This Matters

Before automation, every change meant:

  1. Run terraform apply manually
  2. Wait 5 minutes
  3. Test with curl
  4. Remember to destroy
  5. Repeat for every environment

Now the pipeline runs:

  1. Unit tests on every PR (10 seconds)
  2. Integration tests on every merge to main (5 minutes)
  3. Confidence that it works

Infrastructure that is tested automatically is infrastructure you can trust.


The Results

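| Test Type   | What It Found                     | Time  | Result                   |
|-------------|-----------------------------------|-------|--------------------------|
| Unit        | Missing tags, wrong naming        | 10s   | ✅ Caught before PR      |
| Integration | Health check failures, 502 errors | 5min  | ✅ Caught before merge   |
| E2E         | Cross-module networking           | 15min | ✅ Caught before release |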

What I Learned

Unit tests are your safety net. Run them on every commit. They cost nothing and catch naming, tagging, and logic mistakes before anything is deployed.

Integration tests are your confidence builder. Run them before merging. They cost a little but find real issues.

E2E tests are your release gate. Run them less frequently. They cost more but verify everything works together.

defer terraform.Destroy is critical. Without it, failed tests leave resources running. With it, cleanup runs even when an assertion fails, as long as the test process itself is not killed.

Secrets never go in code. Use GitHub Secrets for AWS credentials.
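Since the credentials arrive as environment variables, a test run can check for them up front and skip cleanly instead of failing halfway through an apply. Here is a minimal sketch using only the standard library. The helper missingEnv is hypothetical, and it reports only variable names, never values.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names in required that lookup reports as unset
// or empty. Values are never returned or printed.
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	required := []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}
	if missing := missingEnv(required, os.LookupEnv); len(missing) > 0 {
		fmt.Println("skipping integration tests; missing:", missing)
		return
	}
	fmt.Println("credentials present in environment")
}
```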


The Bottom Line

Manual testing gave me confidence for one deployment. Automated testing gives me confidence for every deployment.

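| Before                    | After                                   |
|---------------------------|-----------------------------------------|
| Test once a day           | Test every commit                       |
| Manual curl checks        | Automated HTTP assertions               |
| Hope cleanup works        | defer runs cleanup even when tests fail |
| 30 minutes of manual work | 5 minutes of automated trust            |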

If you're not testing your infrastructure automatically, you're deploying with blind faith.

