Victor Robin

The Importance of Manual Testing in Terraform

It is incredibly tempting to write Terraform code, see the green Apply complete! text, and assume your job is done. But there is a massive difference between infrastructure that provisions successfully and infrastructure that actually works.

For Day 17 of my 30-Day Terraform Challenge, I stepped back from writing code to focus on something even more critical: breaking it. I built a structured manual testing process to verify my AWS Blue/Green architecture. Here is why manual testing is the unavoidable foundation of Infrastructure as Code.

Why Manual Testing Still Matters

With tools like Terratest available, manual testing might feel archaic. However, automated tests are only as good as the assertions you write. You cannot automate a test until you manually discover exactly what edge cases, timing issues, or cloud provider quirks need verifying. Manual testing is the exploratory phase that defines your automated test suite.

Building a Structured Test Checklist

A manual test without a checklist is just aimlessly clicking around the AWS console. To build a robust checklist, you must break your infrastructure down into four distinct verification categories:

  1. Provisioning Verification: Do the core Terraform commands (init, validate, plan, apply) execute cleanly?
  2. Resource Correctness: Do the tags, names, and Security Group rules visible in the AWS Console match your configuration exactly?
  3. Functional Verification: Does the architecture actually behave as intended? (e.g., hitting the ALB URL, checking instance health).
  4. State Consistency: Does running a subsequent terraform plan return zero changes?
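
The state-consistency check in step 4 does not have to be eyeballed. `terraform plan` has a real `-detailed-exitcode` flag (exit 0 = no changes, 1 = error, 2 = changes pending), which makes the check scriptable; the `interpret_plan_exit` helper below is my own sketch, not part of Terraform:

```shell
# Interpret the exit code of `terraform plan -detailed-exitcode`:
#   0 = success, no changes; 1 = error; 2 = success, but changes pending.
interpret_plan_exit() {
  case "$1" in
    0) echo "PASS: state matches configuration" ;;
    2) echo "FAIL: drift detected, plan is not empty" ;;
    *) echo "ERROR: plan could not run" ;;
  esac
}

# Real usage, run immediately after a successful apply:
#   terraform plan -detailed-exitcode >/dev/null 2>&1
#   interpret_plan_exit "$?"
```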

Provisioning vs. Functional Verification

A successful terraform apply only tells you one thing: AWS accepted your API calls. This is provisioning verification. It does not tell you if your Security Groups allow the right traffic, if your ALB is routing to the correct Target Group, or if your EC2 instances actually boot up your application.

To prove the code works, you need functional verification. For my Blue/Green deployment, this meant explicitly curling the ALB DNS name to ensure it returned the correct web page, and manually terminating an EC2 instance to ensure my Auto Scaling Group successfully self-healed.
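
That curl check can be wrapped in a small helper so each functional test states its expected string explicitly. A minimal sketch: `expect_body` is a hypothetical helper of mine, and the ALB URL shown in the comment is the placeholder from my test results below:

```shell
# Hypothetical helper: PASS if the response body contains the expected marker.
expect_body() {
  body="$1"; expected="$2"
  if printf '%s' "$body" | grep -q "$expected"; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Real usage against the live ALB:
#   body=$(curl -s http://prod-app-alb-123456.us-east-1.elb.amazonaws.com)
#   expect_body "$body" "Welcome to the BLUE Environment! (prod)"
```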

The Cleanup Discipline

Perhaps the most crucial part of manual testing is cleaning up afterward. Terraform state files can become corrupted, or a destroy command can fail partway through due to an API timeout. If you don't manually verify your cleanup using the AWS CLI (aws ec2 describe-instances), you risk leaving orphaned resources running in the background. In the cloud, orphaned resources equal runaway bills.
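
The cleanup sweep can be made explicit too. A sketch of a post-destroy check: the `--filters` and `--query` flags in the comment are standard AWS CLI syntax, while the `count_orphans` helper is my own invention:

```shell
# Hypothetical helper: count instance IDs left over after `terraform destroy`.
# An empty ID list means the cleanup really finished.
count_orphans() {
  set -- $1          # re-split the whitespace-separated ID list into args
  echo "$#"
}

# Real usage after destroy (standard AWS CLI flags):
#   ids=$(aws ec2 describe-instances \
#     --filters "Name=instance-state-name,Values=running" \
#     --query 'Reservations[].Instances[].InstanceId' --output text)
#   [ "$(count_orphans "$ids")" -eq 0 ] || echo "Orphaned instances: $ids"
```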

Actual Test Results: Passes and Failures

Testing isn't about proving your code is perfect; it's about exposing the flaws. Here are two examples from my test run today:

Test 1: Functional Routing (PASS)

  • Command: curl -s http://prod-app-alb-123456.us-east-1.elb.amazonaws.com
  • Expected: "Welcome to the BLUE Environment! (prod)"
  • Actual: "Welcome to the BLUE Environment! (prod)"
  • Result: PASS - The ALB successfully routed traffic to the active Target Group.

Test 2: State Consistency (FAIL)

  • Command: terraform plan (Run immediately after a successful apply)
  • Expected: "No changes. Your infrastructure matches the configuration."
  • Actual: 1 resource change detected: a missing tag on the security group.
  • Result: FAIL - The aws_security_group resource was missing merge(local.common_tags, ...) in its tags argument. I caught this state drift manually and fixed the module code before it spread to hundreds of resources.
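
The fix itself is small. Here is a sketch of the corrected resource, assuming a local.common_tags map already exists in the module; the resource and variable names are illustrative, not my actual module code:

```hcl
# Shared tags defined once in the module.
locals {
  common_tags = {
    Project     = "blue-green-demo"
    Environment = "prod"
    ManagedBy   = "terraform"
  }
}

resource "aws_security_group" "app" {
  name   = "prod-app-sg"
  vpc_id = var.vpc_id

  # The missing piece: merge the shared tags into this resource's tags,
  # so the follow-up terraform plan comes back empty.
  tags = merge(local.common_tags, {
    Name = "prod-app-sg"
  })
}
```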

Manual testing is not optional. It is the necessary prerequisite to confidence in the cloud.
