Yash

The Terraform Mistakes I Made So You Do Not Have To

I have been writing Terraform professionally for 4 years. I have also been making Terraform mistakes for 4 years.

Here are the ones that actually cost me time, money, or sleep, and what I learned from each.

Mistake 1: One Giant State File for Everything

When I started, putting everything in one state file felt elegant. One source of truth. Simple to reason about.

Then a mistake in one module left the entire state file unusable mid-apply. Then two engineers tried to apply changes at the same time and hit state lock conflicts. Then the state file grew to 50 MB and plans started taking 8 minutes.

What I do now: one state file per service per environment. More overhead to set up, dramatically safer to operate.
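As a rough sketch of what "one state file per service per environment" can look like, here is an S3 backend block, assuming state lives in S3. The bucket name, region, service name, and key layout are placeholders chosen for illustration, not values from my setup.

```hcl
# Hypothetical layout: one backend key per service per environment.
# Bucket, region, and key path are placeholders.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "payments-api/staging/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
```

The production configuration for the same service would point at its own key, for example `payments-api/production/terraform.tfstate`, so a broken apply or a held lock in one place does not block the others.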

Mistake 2: Storing Secrets in Variables

I know. Obviously bad. I still did it.

The situation: I was moving fast on a new project, needed to pass a database password to the application config, and told myself I would fix it properly later. Later never came. The password lived in a tfvars file for 8 months. That tfvars file sat committed to git for 3 of those months before I noticed.

What I do now: AWS Secrets Manager or Parameter Store for all secrets. Terraform reads the secret ARN, not the secret value.
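One way this can look, assuming the application runs on ECS and the secret was created outside this configuration. The secret name, task family, image, and the `execution_role_arn` variable are all hypothetical.

```hcl
# Look up the secret's metadata only. The secret name is a placeholder.
data "aws_secretsmanager_secret" "db_password" {
  name = "example/app/db-password"
}

variable "execution_role_arn" {
  type        = string
  description = "Hypothetical ECS task execution role that is allowed to read the secret"
}

resource "aws_ecs_task_definition" "app" {
  family             = "example-app"
  execution_role_arn = var.execution_role_arn

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "example/app:latest"
      memory    = 512
      essential = true
      # ECS resolves the value when the container starts.
      # Terraform only ever handles the ARN.
      secrets = [
        {
          name      = "DB_PASSWORD"
          valueFrom = data.aws_secretsmanager_secret.db_password.arn
        }
      ]
    }
  ])
}
```

Because the configuration references only the ARN, there is no password to put in a tfvars file in the first place.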

Mistake 3: Using Terraform for Configuration Management

Terraform provisions infrastructure. It is not designed for configuration management.

I used it anyway. I wrote Terraform resources to install packages, write config files, and manage services on EC2 instances using remote-exec and user_data.

It worked until someone made a manual change on a server. Terraform only tracks the resources it created, not the packages, files, and services on the host, so it kept reporting that everything matched. The servers disagreed.

What I do now: hard separation. Terraform provisions. Ansible configures. Never overlap.
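A minimal sketch of the Terraform side of that split, under the assumption that Ansible picks up hosts afterwards. The variables, instance type, and tag values are illustrative only.

```hcl
variable "ami_id" {
  type        = string
  description = "Hypothetical base AMI"
}

variable "subnet_id" {
  type        = string
  description = "Hypothetical subnet for the instance"
}

# Terraform creates the machine and nothing more:
# no provisioner blocks, no package installs in user_data.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.small"
  subnet_id     = var.subnet_id

  tags = {
    Name = "example-app"
    # Assumed convention: a tag that an Ansible inventory could group hosts by.
    Role = "app"
  }
}

# Exposed so the configuration tool can find the host.
output "app_private_ip" {
  value = aws_instance.app.private_ip
}
```

Everything that happens inside the instance is then owned by the playbooks, which can be re-run to correct drift.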

Mistake 4: Ignoring the Blast Radius

Early in a project I had a root-level module that managed VPC, subnets, security groups, RDS, ECS cluster, and application services all together.

A typo in the application service configuration caused Terraform to evaluate dependencies across everything. The plan showed changes to resources I did not expect to touch.

I panicked. I ran terraform apply -target. I introduced more drift. It took 4 hours to sort out and we had 20 minutes of unnecessary downtime.

What I do now: separate modules for infrastructure layers with explicit interfaces between them. Changes to application config cannot accidentally touch networking.
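One possible shape for an "explicit interface", assuming each layer has its own state and the network layer declares an output named `private_subnet_ids`. The bucket, key, and resource names are placeholders.

```hcl
# Application layer: reads the network layer's published outputs.
# It has no resource blocks for networking, so it cannot plan changes to it.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "network/production/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "app" {
  name       = "example-app"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

The outputs of the network layer become the contract. A typo in the application layer can break the application plan, but the VPC and subnets are not in that plan at all.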

Mistake 5: Not Planning for Terraform State Migration

When you outgrow your state structure, you need to migrate. Moving resources between state files requires terraform state mv commands, one per resource. With 200+ resources, this is a significant project. Get it wrong and you orphan resources or create duplicates.

I did this twice. Both times were painful.

What I do now: design the state structure for where the project will be in 12 months, not where it is today. It is much easier to start with more granular state than to split a monolith later.
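If a split does become unavoidable, newer Terraform versions offer a declarative alternative to running terraform state mv by hand. This is not what I used in my migrations; it is a sketch that assumes Terraform 1.7 or later, with a hypothetical `aws_db_instance.main` and a placeholder database identifier. The two fragments belong in two different configurations.

```hcl
# --- In the OLD configuration (Terraform 1.7 or later) ---
# Delete the resource block itself and add this instead.
# It drops the resource from this state without destroying the real database.
removed {
  from = aws_db_instance.main

  lifecycle {
    destroy = false
  }
}

# --- In the NEW configuration (Terraform 1.5 or later) ---
# Adopts the existing database into this state on the next apply.
# A matching resource "aws_db_instance" "main" block must also exist here.
import {
  to = aws_db_instance.main
  id = "example-db-identifier"
}
```

Both changes show up in terraform plan before anything happens, which makes orphaned or duplicated resources easier to catch than with a long series of manual commands.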

Mistake 6: Manual Workspace Management

I used Terraform workspaces to manage staging and production environments. Different workspace equals different state equals different environment.

The problem: nothing prevents you from running terraform apply in the wrong workspace. I applied a destructive change to production instead of staging once. Short outage, we caught it fast, but entirely preventable.

What I do now: separate directories for separate environments. The directory structure makes it obvious which environment you are operating on.
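A sketch of how this can be laid out, assuming a shared module and an S3 backend. The directory names, bucket, and the `environment` input are hypothetical.

```hcl
# Hypothetical layout:
#   modules/app/                      shared module
#   environments/staging/main.tf      this file
#   environments/production/main.tf   same shape, different values

terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "app/staging/terraform.tfstate"
    region = "us-east-1"
  }
}

module "app" {
  source      = "../../modules/app"
  environment = "staging"
}
```

The environment is fixed by the directory you are standing in, not by whichever workspace happened to be selected last.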

The Pattern in All of These

Every mistake I made with Terraform came from optimizing for the short term. One state file is simpler to set up than many. User data is faster than setting up Ansible. Workspaces are less overhead than separate directories.

The short-term convenience always came with a long-term cost.

Terraform is a tool for managing infrastructure that will outlast any individual engineer's involvement. Design for that.

I am building Step2Dev to make the right Terraform practices the default, not the exception. More at step2dev.com.

What Terraform mistake cost you the most? Drop it in the comments.
