Mukami

The Importance of Manual Testing in Terraform

Why "It Works" Isn't Enough Until You Prove It


Day 17 of the 30-Day Terraform Challenge — and today I learned that my infrastructure "worked" until I actually tested it.

I had a webserver cluster. Terraform applied without errors. Everything looked perfect in the AWS Console. I was confident.

Then I ran a structured manual test. The results were humbling.


The Problem: Code Success ≠ Functional Success

Terraform told me:

  • ✅ 11 resources created
  • ✅ No errors
  • ✅ State matches configuration

But when I actually tried to use my infrastructure:

$ curl http://my-alb-dns
502 Bad Gateway

The code worked. The infrastructure didn't.

This is why manual testing matters.


My Test Checklist

I built a structured test plan covering five categories:

1. Provisioning Verification

  • terraform init completes without errors
  • terraform validate passes cleanly
  • terraform plan shows expected resources
  • terraform apply completes successfully
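
These four checks can be scripted so that each one records a PASS or FAIL. The following is a minimal sketch, assuming a bash shell started in the configuration directory; `run_step`, `provisioning_checks`, and the `tfplan` file name are illustrative names, not part of my original setup. Exit codes alone do not confirm that the plan shows the expected resources, so that check still needs a human read of the plan output.

```shell
#!/usr/bin/env bash

# Run one checklist step and report PASS/FAIL based on its exit code.
run_step() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Provisioning checks in checklist order; stops at the first failure.
# "tfplan" is an assumed file name for the saved plan.
provisioning_checks() {
  run_step "terraform init"     terraform init -input=false &&
  run_step "terraform validate" terraform validate &&
  run_step "terraform plan"     terraform plan -input=false -out=tfplan &&
  run_step "terraform apply"    terraform apply -input=false tfplan
}
```

Calling `provisioning_checks` prints one line per step, which can be pasted straight into a results table.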

2. Resource Correctness

  • Resources visible in AWS Console
  • Names match variables
  • Tags match expected values
  • Security group rules exactly as defined

3. Functional Verification

  • ALB DNS resolves
  • curl returns expected response
  • ASG instances pass health checks
  • Instance termination triggers replacement
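
The curl check is easier to repeat when it compares the status code against an expected value instead of relying on reading the response by eye. This is a small sketch under assumptions: `http_status` and `check_status` are hypothetical helper names, and 200 is assumed to be the expected status for the web page.

```shell
#!/usr/bin/env bash

# Print only the HTTP status code for a URL.
# curl reports 000 when it cannot connect at all (for example, DNS not resolving yet).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Compare an observed status code against the expected one.
check_status() {
  local expected="$1" actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "PASS  got HTTP $actual"
  else
    echo "FAIL  expected HTTP $expected, got HTTP $actual"
    return 1
  fi
}
```

With the placeholder hostname from earlier, the check would be `check_status 200 "$(http_status http://my-alb-dns)"`. A 502 then shows up as an explicit FAIL line rather than something to notice in the terminal.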

4. State Consistency

  • terraform plan returns "No changes"
  • State file matches AWS resources
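
The "No changes" check does not have to be read from the plan output. `terraform plan -detailed-exitcode` exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. A minimal sketch of turning that into a test result, where `interpret_plan_exit` is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# Translate the exit code of `terraform plan -detailed-exitcode` into a result.
# 0 = no changes, 1 = error, 2 = changes pending.
interpret_plan_exit() {
  case "$1" in
    0) echo "PASS  no changes" ;;
    2) echo "FAIL  plan shows pending changes"; return 1 ;;
    *) echo "FAIL  terraform plan errored"; return 1 ;;
  esac
}
```

After an apply, run `terraform plan -detailed-exitcode -input=false >/dev/null` and then `interpret_plan_exit $?` on the next line.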

5. Cleanup

  • terraform destroy completes
  • AWS Console verification shows no resources

What I Found

Passed: 11 tests

  • Provisioning worked perfectly
  • All resources created with correct tags
  • State consistency was perfect
  • Destroy cleaned up properly

Failed: 2 tests

  • ALB DNS resolution (timeout)
  • ALB returned 502 Bad Gateway

The Root Cause

The infrastructure was created, but the application wasn't working. Why?

  1. ALB DNS takes time to propagate — I tested too early
  2. Health checks were failing — Instances weren't responding to HTTP
  3. User-data script may have failed — Apache probably wasn't running
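
The second and third causes can be narrowed down by asking the load balancer which targets it considers unhealthy and why. This is a sketch, assuming the AWS CLI is configured and the target group ARN is known; `target_health` and `unhealthy_targets` are hypothetical helper names, and the ARN is passed in as an argument rather than guessed here.

```shell
#!/usr/bin/env bash

# List each target's id, health state, and reason code for one target group.
# $1 is the target group ARN.
target_health() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# Read "id state reason" lines on stdin and print only targets that are not healthy.
unhealthy_targets() {
  awk '$2 != "healthy" { print $1, $2, $3 }'
}
```

Piping one into the other (`target_health "$TARGET_GROUP_ARN" | unhealthy_targets`) prints nothing when every target is healthy. If instances show up as unhealthy, the user-data output is normally written to /var/log/cloud-init-output.log on the instance, which is the next place to look.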

The code was correct. The application was not.


What Manual Testing Taught Me

Terraform applies successfully ≠ infrastructure works

Terraform only confirms that each resource was created through the AWS API. It doesn't verify that your application is actually running on those resources.

DNS propagation is real — Just because the ALB exists doesn't mean it's reachable immediately.

Health checks are the real indicator — A running instance isn't enough. It needs to respond correctly.

Cleanup is harder than it looks — After terraform destroy, I found leftover instances. Manual verification is essential.


The Value of a Test Checklist

Before today, I'd run terraform apply and call it done.

Now I have a checklist that catches:

  • DNS propagation issues
  • Application startup failures
  • Health check problems
  • Cleanup gaps

Each failed test is a gap I can fix and later automate.


What I Learned About Cleanup

After terraform destroy, I verified with:

aws ec2 describe-instances --filters "Name=tag:Name,Values=*test-webserver*"

I found five instances still listed. Terraform had destroyed the ASG, but its instances were still terminating. Manual verification caught what automation missed.

Lesson: Always verify cleanup. Don't trust destroy alone.
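
The command above lists instances in every state, including ones that are already terminated, because terminated instances stay visible in describe-instances for a while. Filtering on instance state separates real leftovers from instances that are already gone. A minimal sketch, assuming the same Name tag pattern; `leftover_instances` and `cleanup_result` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# List instances matching a Name tag pattern that are NOT yet terminated.
# $1 is the tag pattern, e.g. "*test-webserver*".
leftover_instances() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" \
              "Name=instance-state-name,Values=pending,running,shutting-down,stopping,stopped" \
    --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
    --output text
}

# Turn the (possibly empty) listing into a PASS/FAIL result.
cleanup_result() {
  if [ -z "$1" ]; then
    echo "PASS  no leftover instances"
  else
    echo "FAIL  leftover instances:"
    echo "$1"
    return 1
  fi
}
```

Usage would be `cleanup_result "$(leftover_instances '*test-webserver*')"`. Instances that are still shutting down appear with the state `shutting-down`, so re-running the check a minute or two later shows whether they eventually disappear.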


The Manual Test Results

| Test | Result |
| --- | --- |
| terraform init | ✅ PASS |
| terraform validate | ✅ PASS |
| terraform plan | ✅ PASS |
| terraform apply | ✅ PASS |
| Resources in AWS | ✅ PASS |
| Tags correct | ✅ PASS |
| Security group rules | ✅ PASS |
| ALB DNS resolution | ❌ FAIL |
| ALB returns webpage | ❌ FAIL |
| ASG instances running | ✅ PASS |
| State consistency | ✅ PASS |
| terraform destroy | ✅ PASS |
| Cleanup verification | ✅ PASS |

11 passed, 2 failed.


Why This Matters

Manual testing isn't about checking boxes. It's about finding gaps before they become outages.

If I had deployed this infrastructure without testing:

  • Users would see 502 errors
  • I'd be debugging under pressure
  • The problem would take longer to find

Instead, I found the failure in a controlled environment. I can now fix it and write automated tests to prevent it from happening again.


The Big Lesson

Terraform applies successfully ≠ Infrastructure works

The gap between "code success" and "functional success" is where outages happen. Manual testing closes that gap.


Next Steps

  1. Fix the user-data script to ensure Apache starts reliably
  2. Add wait_for_capacity_timeout to ASG
  3. Wait 2-3 minutes after apply before testing
  4. Write automated tests to catch these issues in CI

P.S. The 502 Bad Gateway was humbling. But finding it manually before deployment was a win. Test early, test often, test manually before you automate. 🚀
