# Terraform Best Practices Guide
A field-tested collection of patterns for writing maintainable, secure, and team-friendly Terraform configurations. These practices come from managing production AWS infrastructure across dozens of projects.
## 1. Project Structure
The single most impactful decision is how you organize files. A flat directory with everything in main.tf works for tutorials but breaks down fast in real projects.
Recommended layout:

```
project/
├── backend.tf          # Provider and backend config
├── variables.tf        # All input variables
├── outputs.tf          # All outputs
├── terraform.tfvars    # Variable values (git-ignored)
├── modules/
│   ├── vpc/
│   ├── ecs/
│   └── rds/
└── environments/
    ├── dev/main.tf     # Composes modules for dev
    └── prod/main.tf    # Composes modules for prod
```
Why separate environments into directories instead of workspaces? Workspaces share the same backend config and state bucket key prefix. If you need different provider configurations, different module versions, or different backend settings per environment, directory-based separation is cleaner. Workspaces work well for identical environments that differ only in variable values.
## 2. Module Design
Good modules are the building blocks of maintainable infrastructure. Follow these principles:
### Keep modules focused
Each module should manage one logical resource group. A VPC module creates a VPC, subnets, route tables, and gateways. It should not also create EC2 instances or RDS databases.
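As a sketch, a caller composing focused modules might look like this (the module names and variables are illustrative, not part of any specific kit):

```hcl
# Hypothetical call to a focused VPC module: networking only.
module "vpc" {
  source             = "./modules/vpc"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

# Compute and databases get their own module calls (e.g. modules/ecs,
# modules/rds), wired together via module outputs rather than bundled
# into one oversized module.
```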
### Expose configuration, hide implementation
```hcl
# Good: the caller decides the behavior
variable "multi_az" {
  type    = bool
  default = true
}

# Bad: the caller decides an implementation detail
variable "availability_zone_count" {
  type    = number
  default = 3
}
```
### Always set sensible defaults
Every variable should have a default that works for the most common case. This lets new team members use the module immediately without reading every variable description.
```hcl
variable "instance_class" {
  type        = string
  default     = "db.t3.medium"
  description = "RDS instance class. Use db.r6g.* for production workloads."
}
```
### Use validation blocks for input constraints
Catch configuration mistakes at plan time instead of discovering them during apply:
```hcl
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  type = string

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be a valid CIDR block."
  }
}
```
### Output everything the caller might need
If a module creates a resource, output its ID, ARN, and any connection strings. It costs nothing and saves future refactoring.
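For example, an RDS module might expose outputs like these (the resource and output names here are illustrative):

```hcl
# Hypothetical outputs for an RDS module.
output "db_instance_id" {
  value = aws_db_instance.main.id
}

output "db_instance_arn" {
  value = aws_db_instance.main.arn
}

output "db_endpoint" {
  value       = aws_db_instance.main.endpoint
  description = "Connection endpoint in host:port form"
}
```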
## 3. State Management
Terraform state is the source of truth for what infrastructure exists. Mismanaging state is the #1 cause of Terraform disasters.
### Always use remote state
Local state files get lost, cannot be shared, and offer no locking. Use S3 + DynamoDB (AWS), GCS (GCP), or Azure Blob Storage as your backend.
```hcl
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "project/env/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```
### Enable state locking
Without locking, two engineers running terraform apply simultaneously will corrupt state. DynamoDB-based locking is the standard for AWS.
### Separate state per environment
Never share state between dev and prod. A mistake in dev should not be able to affect prod resources. Use different S3 keys:
- `dev/terraform.tfstate`
- `staging/terraform.tfstate`
- `prod/terraform.tfstate`
### Encrypt state at rest
State files contain sensitive data: database passwords, private keys, API tokens. Always enable server-side encryption on your state bucket and consider using a KMS key for additional control.
### Never edit state manually
If you need to move or remove resources from state, use terraform state mv and terraform state rm. Hand-editing terraform.tfstate will break checksums and can corrupt your infrastructure mapping.
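For example (the resource addresses here are hypothetical):

```bash
# Rename a resource in state after a refactor, without destroying it
terraform state mv aws_instance.web aws_instance.app

# Stop tracking a resource in state without deleting the real infrastructure
terraform state rm aws_s3_bucket.legacy
```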
## 4. Security Patterns
### Never hardcode secrets
Use AWS Secrets Manager or SSM Parameter Store, and reference them in Terraform:
```hcl
resource "aws_db_instance" "main" {
  # Let AWS manage the password in Secrets Manager
  manage_master_user_password = true
}
```
For secrets needed during apply (API keys, tokens), use environment variables:
```bash
export TF_VAR_datadog_api_key="abc123"
terraform apply
```
### Use IAM roles, not access keys
For CI/CD pipelines, use OIDC federation (GitHub Actions, GitLab CI) or IAM roles (EC2, ECS) instead of long-lived access keys. This starter kit includes an OIDC provider configuration for GitHub Actions.
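A minimal sketch of a GitHub Actions OIDC provider in Terraform might look like this (the thumbprint value is a placeholder you must obtain from the provider's TLS certificate, and a trust policy on the assumed role is still required):

```hcl
# Sketch only: lets GitHub Actions assume AWS roles via OIDC instead of access keys.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  # Placeholder thumbprint; fetch the current one for the GitHub OIDC endpoint
  thumbprint_list = ["0000000000000000000000000000000000000000"]
}
```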
### Apply least-privilege IAM policies
Scope IAM policies to specific resources using ARN patterns:
```hcl
# Good: scoped to specific bucket
Resource = "arn:aws:s3:::myapp-prod-*/*"

# Bad: wildcard access
Resource = "*"
```
### Block public access by default
S3 buckets, RDS instances, and Elasticsearch domains should never be publicly accessible. Use security groups and bucket policies to control access.
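For S3, a bucket-level public access block covers all four public-access vectors; a sketch (the bucket reference is illustrative):

```hcl
# Deny all forms of public access on a bucket: a safe default
resource "aws_s3_bucket_public_access_block" "main" {
  bucket = aws_s3_bucket.main.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```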
## 5. Tagging Strategy
Consistent tagging is critical for cost allocation, access control, and incident response.
### Use `default_tags` in the provider
Apply organization-wide tags automatically:
```hcl
provider "aws" {
  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
      Team        = var.team
    }
  }
}
```
### Required tags for every resource
| Tag | Purpose | Example |
|---|---|---|
| `Project` | Group resources by project | `myapp` |
| `Environment` | Identify the environment | `prod` |
| `ManagedBy` | Distinguish IaC from manual changes | `terraform` |
| `Team` | Cost allocation and ownership | `platform` |
## 6. CI/CD Integration
### Plan on PR, apply on merge
Never auto-apply on pull request. The workflow should be:
1. PR is opened: `terraform plan` runs and posts the plan as a PR comment.
2. The team reviews the plan diff alongside the code changes.
3. The PR is merged to main: `terraform apply -auto-approve` runs.
### Use `-no-color` for CI logs
Terraform color codes don't render well in CI logs. Always pass -no-color in CI pipelines.
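A typical CI invocation might look like this (the plan-file name is arbitrary):

```bash
# No color codes, no interactive prompts, and a saved plan
# that the apply step can consume exactly as reviewed.
terraform plan -no-color -input=false -out=tfplan
terraform apply -no-color -input=false tfplan
```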
### Pin provider and Terraform versions
Inconsistent versions across team members and CI cause drift:
```hcl
terraform {
  required_version = "~> 1.7.0" # Allow 1.7.x patches only

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}
```
## 7. Cost Optimization
### Use `count` and `for_each` to conditionally create resources
Don't pay for NAT Gateways in dev if you don't need them:
```hcl
resource "aws_nat_gateway" "main" {
  # Prod: one NAT per AZ for HA. Dev: one NAT to save ~$32/month per gateway.
  count = var.environment == "prod" ? length(var.availability_zones) : 1
}
```
### Right-size from the start
Use db.t3.medium in dev and db.r6g.large in prod. Make instance classes a variable with per-environment defaults.
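One way to express per-environment defaults, assuming an `environment` variable like the one validated earlier, is a simple lookup map (the names are illustrative):

```hcl
# Per-environment instance classes in one place
variable "instance_class_by_env" {
  type = map(string)
  default = {
    dev     = "db.t3.medium"
    staging = "db.t3.medium"
    prod    = "db.r6g.large"
  }
}

locals {
  # Resolved once, then referenced wherever an instance class is needed
  instance_class = var.instance_class_by_env[var.environment]
}
```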
### Set lifecycle rules on S3 buckets
Transition objects to cheaper storage tiers automatically:
```hcl
# Inside the rule block of an aws_s3_bucket_lifecycle_configuration resource:
transition {
  days          = 90
  storage_class = "STANDARD_IA" # ~40% cheaper for infrequent access
}

transition {
  days          = 365
  storage_class = "GLACIER" # ~80% cheaper for archival
}
```
## 8. Testing and Validation
### Run `terraform validate` and `terraform fmt` in CI
These catch syntax errors and enforce consistent formatting with zero configuration.
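A minimal CI step for this could be:

```bash
# Fail the build on formatting drift or invalid configuration
terraform fmt -check -recursive
terraform init -backend=false # providers are needed for validate, state is not
terraform validate
```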
### Use `terraform plan` as a test
A clean plan against an existing environment confirms your changes are additive and non-destructive. Watch for destroy actions (resources marked `-` or `-/+`) in the plan output; they usually indicate a mistake.
### Consider Terratest for critical modules
For modules that manage production databases or networking, write Go tests with Terratest that create real infrastructure, validate it, and tear it down.
## 9. Common Pitfalls
### Forgetting `lifecycle.create_before_destroy`
Security groups, parameter groups, and IAM policies often need replacement. Without this lifecycle rule, Terraform deletes the old resource before creating the new one, causing downtime.
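A sketch of the fix for a security group (note that `name_prefix` lets the old and new groups coexist briefly, which a fixed `name` would prevent):

```hcl
resource "aws_security_group" "app" {
  # Prefix instead of a fixed name, so the replacement can be
  # created while the old group still exists
  name_prefix = "app-"

  lifecycle {
    create_before_destroy = true
  }
}
```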
### Ignoring `prevent_destroy` for stateful resources
Databases and S3 buckets with important data should use prevent_destroy:
```hcl
lifecycle {
  prevent_destroy = true
}
```
### Not using `depends_on` when needed
Most dependencies are inferred from resource references. But some (like IAM policy propagation) need explicit depends_on to avoid race conditions during apply.
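A common example is waiting for an IAM policy attachment before creating a resource that assumes the role (the resource names here are illustrative):

```hcl
resource "aws_lambda_function" "app" {
  # ... function configuration omitted

  # Terraform cannot infer this: wait for the attachment so the role's
  # permissions have propagated before the function first assumes it.
  depends_on = [aws_iam_role_policy_attachment.lambda_exec]
}
```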
### Over-using `terraform import`
If you find yourself importing many resources, consider whether Terraform is the right tool for that resource. Some resources (DNS records managed by external teams, legacy VPCs) are better left outside Terraform.
Part of the Terraform Starter Kit by Datanest Digital (datanest.dev)