
Thesius Code

Originally published at datanest-stores.pages.dev

Terraform Starter Kit: Terraform Best Practices Guide


A field-tested collection of patterns for writing maintainable, secure, and team-friendly Terraform configurations. These practices come from managing production AWS infrastructure across dozens of projects.


1. Project Structure

The single most impactful decision is how you organize files. A flat directory with everything in main.tf works for tutorials but breaks down fast in real projects.

Recommended layout:

project/
├── backend.tf          # Provider and backend config
├── variables.tf        # All input variables
├── outputs.tf          # All outputs
├── terraform.tfvars    # Variable values (git-ignored)
├── modules/
│   ├── vpc/
│   ├── ecs/
│   └── rds/
└── environments/
    ├── dev/main.tf     # Composes modules for dev
    └── prod/main.tf    # Composes modules for prod

Why separate environments into directories instead of workspaces? Workspaces share a single backend configuration: every workspace's state lands in the same bucket, separated only by a workspace key prefix, and every workspace runs the same code. If you need different provider configurations, different module versions, or different backend settings per environment, directory-based separation is cleaner. Workspaces work well for identical environments that differ only in variable values.
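To make the layout concrete, here is a minimal sketch of what environments/dev/main.tf might contain. The module inputs (environment, vpc_cidr, multi_az, instance_class, subnet_ids) and the private_subnet_ids output are hypothetical; use whatever your modules actually declare. The prod directory would call the same modules with production values.

```hcl
# environments/dev/main.tf (sketch; input and output names are hypothetical)
module "vpc" {
  source = "../../modules/vpc"

  environment = "dev"
  vpc_cidr    = "10.10.0.0/16"
  multi_az    = false # dev does not need AZ redundancy
}

module "rds" {
  source = "../../modules/rds"

  environment    = "dev"
  instance_class = "db.t3.medium"
  subnet_ids     = module.vpc.private_subnet_ids
}
```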

2. Module Design

Good modules are the building blocks of maintainable infrastructure. Follow these principles:

Keep modules focused

Each module should manage one logical resource group. A VPC module creates a VPC, subnets, route tables, and gateways. It should not also create EC2 instances or RDS databases.

Expose configuration, hide implementation

# Good: The caller decides the behavior
variable "multi_az" {
  type    = bool
  default = true
}

# Bad: The caller decides the implementation detail
variable "availability_zone_count" {
  type    = number
  default = 3
}
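A sketch of how a module might turn the caller's multi_az flag into an AZ count internally. The cap of three AZs and the local names are assumptions for illustration:

```hcl
# Inside the module: translate the caller's intent into an implementation detail
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # How many AZs "multi-AZ" means is the module's decision (assumed cap: 3)
  az_count = var.multi_az ? min(3, length(data.aws_availability_zones.available.names)) : 1

  # The first az_count available AZ names in the current region
  azs = slice(data.aws_availability_zones.available.names, 0, local.az_count)
}
```

If the module later needs a different AZ strategy, only these locals change; callers keep passing the same boolean.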

Always set sensible defaults

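Give every variable a default that works for the most common case, so new team members can use the module immediately without reading every variable description. The exception is an input with no sensible default, such as a VPC ID; leave it required so the caller must supply it.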

variable "instance_class" {
  type        = string
  default     = "db.t3.medium"
  description = "RDS instance class. Use db.r6g.* for production workloads."
}

Use validation blocks for input constraints

Catch configuration mistakes at plan time instead of discovering them during apply:

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  type = string
  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be a valid CIDR block."
  }
}

Output everything the caller might need

If a module creates a resource, output its ID, ARN, and any connection strings. It costs nothing and saves future refactoring.
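For example, a module that manages an RDS instance (assumed here to be named aws_db_instance.main) might expose:

```hcl
output "db_instance_id" {
  description = "Identifier of the RDS instance."
  value       = aws_db_instance.main.id
}

output "db_instance_arn" {
  description = "ARN of the RDS instance, for IAM policies and monitoring."
  value       = aws_db_instance.main.arn
}

output "db_endpoint" {
  description = "Connection endpoint in address:port form."
  value       = aws_db_instance.main.endpoint
}
```

The address and port attributes are also available individually if callers need them separately.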

3. State Management

Terraform state records which real resources each block in your configuration manages. Mismanaged state is one of the most common causes of Terraform disasters.

Always use remote state

Local state files are easily lost, cannot be shared across a team, and only lock against concurrent runs on the same machine. Use S3 + DynamoDB (AWS), GCS (GCP), or Azure Blob Storage as your backend.

terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "project/env/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Enable state locking

Without locking, two engineers running terraform apply at the same time can overwrite each other's changes and corrupt state. DynamoDB-based locking is the standard for the S3 backend on AWS.
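If you manage the lock table with Terraform, the S3 backend expects a partition key named LockID of type string. A minimal sketch, assuming the table name from the backend block above and on-demand billing. Create it in a separate bootstrap configuration, since the table must exist before any configuration that uses it can run terraform init:

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # lock traffic is tiny and bursty
  hash_key     = "LockID"          # attribute name required by the S3 backend

  attribute {
    name = "LockID"
    type = "S" # string
  }
}
```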

Separate state per environment

Never share state between dev and prod. A mistake in dev should not be able to affect prod resources. Use different S3 keys:

  • dev/terraform.tfstate
  • staging/terraform.tfstate
  • prod/terraform.tfstate
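Backend blocks cannot reference variables or locals, so each environment's root module needs its own hardcoded key (or a partial configuration supplied with terraform init -backend-config). A sketch for prod, reusing the bucket and table names from the earlier backend example:

```hcl
# environments/prod/backend.tf (sketch)
# Backend blocks cannot use variables, so the key is written out per environment.
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```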

Encrypt state at rest

State files can contain sensitive data in plain text: database passwords, private keys, API tokens. Always enable server-side encryption on your state bucket and consider using a customer-managed KMS key for additional control over who can decrypt it.
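A minimal sketch of SSE-KMS on the state bucket, assuming the bucket itself is declared elsewhere as aws_s3_bucket.terraform_state (a hypothetical name):

```hcl
resource "aws_kms_key" "terraform_state" {
  description         = "Encrypts Terraform state objects"
  enable_key_rotation = true
}

# Default encryption: every object written to the bucket uses the KMS key above
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}
```

The S3 backend also accepts a kms_key_id argument, which can point at the same key.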

Never edit state manually

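If you need to move or remove resources from state, use terraform state mv and terraform state rm. Hand-editing terraform.tfstate is error-prone: the S3 backend can reject an edited file because its checksum no longer matches the digest stored in DynamoDB, and a single typo can silently break the mapping between your configuration and real infrastructure.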
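Since Terraform 1.1 (moved) and 1.7 (removed), you can also express these changes declaratively in code, where they show up in the plan and get reviewed like any other change. The resource addresses below are hypothetical:

```hcl
# Rename a resource without destroying it (declarative `terraform state mv`).
# The new block, aws_instance.app, must exist in the configuration.
moved {
  from = aws_instance.web
  to   = aws_instance.app
}

# Stop managing a resource without destroying it (declarative `terraform state rm`).
# The original aws_s3_bucket.legacy_logs block is deleted from the configuration.
removed {
  from = aws_s3_bucket.legacy_logs

  lifecycle {
    destroy = false
  }
}
```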

4. Security Patterns

Never hardcode secrets

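Store secrets in AWS Secrets Manager or SSM Parameter Store, or let AWS generate them for you. For RDS, AWS can create the master password and keep it in Secrets Manager, so it never appears in your configuration: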

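resource "aws_db_instance" "main" {
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "dbadmin"
  # RDS generates the master password and stores it in Secrets Manager
  manage_master_user_password = true
}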
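To read a secret that already exists, a data source works. A sketch, assuming a SecureString parameter at a hypothetical path:

```hcl
data "aws_ssm_parameter" "datadog_api_key" {
  name            = "/myapp/prod/datadog_api_key" # hypothetical parameter path
  with_decryption = true                          # decrypt the SecureString value
}

# Reference it wherever the secret is needed:
#   data.aws_ssm_parameter.datadog_api_key.value
```

Values read this way are stored in the state file, which is another reason to encrypt state at rest.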

For secrets needed during apply (API keys, tokens), use environment variables:

export TF_VAR_datadog_api_key="abc123"
terraform apply
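Declare the matching variable as sensitive so Terraform redacts it in plan and apply output. A sketch; note that sensitive does not keep the value out of the state file:

```hcl
variable "datadog_api_key" {
  type        = string
  sensitive   = true # redacted in CLI output, still stored in state
  description = "Datadog API key, supplied via the TF_VAR_datadog_api_key environment variable."
}
```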

Use IAM roles, not access keys

For CI/CD pipelines, use OIDC federation (GitHub Actions, GitLab CI) or IAM roles (EC2, ECS) instead of long-lived access keys. This starter kit includes an OIDC provider configuration for GitHub Actions.
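The role that GitHub Actions assumes needs a trust policy that accepts only tokens from your repository. A sketch, assuming the kit's OIDC provider resource is named aws_iam_openid_connect_provider.github and the repository is myorg/myapp (both hypothetical):

```hcl
data "aws_iam_policy_document" "github_actions_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    # Only accept tokens issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only accept tokens from the main branch of one repository
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:myorg/myapp:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "github_actions" {
  name               = "github-actions-terraform"
  assume_role_policy = data.aws_iam_policy_document.github_actions_assume.json
}
```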

Apply least-privilege IAM policies

Scope IAM policies to specific resources using ARN patterns:

# Good: scoped to specific bucket
Resource = "arn:aws:s3:::myapp-prod-*/*"

# Bad: wildcard access
Resource = "*"
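In a complete resource, the scoped ARN sits inside a policy document. A sketch with hypothetical names; object-level actions such as s3:GetObject match the /* object ARN, while bucket-level actions such as s3:ListBucket would need the bucket ARN without /*:

```hcl
resource "aws_iam_policy" "app_s3_access" {
  name = "myapp-prod-s3-access"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::myapp-prod-*/*" # objects in myapp-prod-* buckets only
      }
    ]
  })
}
```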

Block public access by default

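S3 buckets, RDS instances, and OpenSearch (formerly Elasticsearch) domains should never be publicly accessible. Enable S3 Block Public Access, keep publicly_accessible set to false on RDS instances, and use security groups and bucket policies to control access.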
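For S3, the bucket-level Block Public Access settings guard against a permissive policy or ACL being added later. A sketch, assuming a bucket declared elsewhere as aws_s3_bucket.app (hypothetical):

```hcl
resource "aws_s3_bucket_public_access_block" "app" {
  bucket = aws_s3_bucket.app.id

  block_public_acls       = true # reject requests that add public ACLs
  block_public_policy     = true # reject bucket policies that grant public access
  ignore_public_acls      = true # ignore any public ACLs already present
  restrict_public_buckets = true # limit access under a public policy to AWS principals
}
```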

5. Tagging Strategy

Consistent tagging is critical for cost allocation, access control, and incident response.

Use default_tags in the provider

Apply organization-wide tags automatically:

provider "aws" {
  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
      Team        = var.team
    }
  }
}

Required tags for every resource

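| Tag         | Purpose                                              | Example   |
|-------------|------------------------------------------------------|-----------|
| Project     | Group resources by project                           | myapp     |
| Environment | Identify the environment                             | prod      |
| ManagedBy   | Distinguish Terraform-managed from manual resources  | terraform |
| Team        | Cost allocation and ownership                        | platform  |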

6. CI/CD Integration

Plan on PR, apply on merge

Never auto-apply on pull request. The workflow should be:

  1. PR is opened: terraform plan runs and posts the plan as a PR comment
  2. Team reviews the plan diff alongside the code changes
  3. PR is merged to main: terraform apply -auto-approve runs

Use -no-color for CI logs

Terraform's ANSI color codes show up as escape-sequence noise in many CI log viewers and in plans posted as PR comments. Pass -no-color in CI pipelines.

Pin provider and Terraform versions

Inconsistent Terraform and provider versions across team members and CI produce different plans for the same code and make failures hard to reproduce. Pin both:

terraform {
  required_version = "~> 1.7.0"  # Allow 1.7.x patches only
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40" # Allow 5.40 and any later 5.x release
    }
  }
}

7. Cost Optimization

Use count and for_each to conditionally create resources

Don't pay for more NAT Gateways than an environment needs:

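resource "aws_nat_gateway" "main" {
  # Prod: one NAT per AZ for HA. Other environments: one NAT, saving ~$32/month per gateway.
  # Assumes aws_eip.nat and aws_subnet.public are created with the same count.
  count         = var.environment == "prod" ? length(var.availability_zones) : 1
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}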
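When an environment needs none of a resource, the usual count pattern is a 0/1 toggle. A sketch with a hypothetical enable_nat_gateway flag; one() returns null instead of failing when the resource was not created:

```hcl
variable "enable_nat_gateway" {
  type        = bool
  default     = true
  description = "Set to false in sandboxes that need no outbound internet access."
}

resource "aws_eip" "nat" {
  count  = var.enable_nat_gateway ? 1 : 0 # 0 means the resource is not created
  domain = "vpc"
}

output "nat_public_ip" {
  # one() turns a zero- or one-element list into null or its single element
  value = one(aws_eip.nat[*].public_ip)
}
```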

Right-size from the start

Use db.t3.medium in dev and db.r6g.large in prod. Make instance classes a variable with per-environment defaults.
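One way to express per-environment sizing is a map-typed variable whose default holds one class per environment, indexed by the environment variable from section 2. The variable name is hypothetical:

```hcl
variable "instance_class_by_env" {
  type        = map(string)
  description = "RDS instance class per environment."

  default = {
    dev     = "db.t3.medium"
    staging = "db.t3.medium"
    prod    = "db.r6g.large"
  }
}

locals {
  # Indexing fails at plan time if var.environment has no entry in the map
  instance_class = var.instance_class_by_env[var.environment]
}
```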

Set lifecycle rules on S3 buckets

Transition objects to cheaper storage tiers automatically:

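resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id # assumes a bucket defined elsewhere
  rule {
    id     = "tier-down-old-objects"
    status = "Enabled"
    filter {} # apply to every object in the bucket
    transition {
      days          = 90
      storage_class = "STANDARD_IA" # ~40% cheaper for infrequent access
    }
    transition {
      days          = 365
      storage_class = "GLACIER" # ~80% cheaper for archival
    }
  }
}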

8. Testing and Validation

Run terraform validate and terraform fmt in CI

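These catch syntax errors and enforce consistent formatting with almost no setup. In CI, use terraform fmt -check -recursive, which exits non-zero instead of rewriting files; terraform validate needs an initialized directory, and terraform init -backend=false is enough for that.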

Use terraform plan as a test

A plan against an existing environment shows exactly what your change will do. Read the summary line (Plan: N to add, N to change, N to destroy) and look for resources marked for destruction or replacement; unexpected ones usually indicate a mistake.

Consider Terratest for critical modules

For modules that manage production databases or networking, write Go tests with Terratest that create real infrastructure, validate it, and tear it down.

9. Common Pitfalls

Forgetting lifecycle.create_before_destroy

Security groups, parameter groups, and IAM policies often need replacement. Without this lifecycle rule, Terraform deletes the old resource before creating the new one, causing downtime.
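A sketch for a security group, assuming a hypothetical var.vpc_id. Because the old and new groups briefly coexist, use name_prefix rather than a fixed name so AWS can create the replacement without a name collision:

```hcl
resource "aws_security_group" "app" {
  # A fixed name would clash while old and new groups both exist
  name_prefix = "myapp-app-"
  vpc_id      = var.vpc_id

  lifecycle {
    # Create the replacement first, then delete the old group
    create_before_destroy = true
  }
}
```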

Ignoring prevent_destroy for stateful resources

Databases and S3 buckets with important data should set prevent_destroy, which makes Terraform reject any plan that would destroy the resource:

resource "aws_s3_bucket" "data" {
  bucket = "myapp-prod-data"
  lifecycle {
    prevent_destroy = true
  }
}

Not using depends_on when needed

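Most dependencies are inferred from resource references. Some are not visible in any reference, such as an IAM policy attachment that must exist before a compute resource uses the role, and these need an explicit depends_on to avoid race conditions during apply.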
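A sketch of the IAM case, with hypothetical resource names: the instance references the instance profile, but nothing in its arguments points at the policy attachment, so without depends_on Terraform could launch the instance before the policy is attached:

```hcl
resource "aws_iam_role_policy_attachment" "app_s3" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.app_s3_access.arn
}

resource "aws_instance" "app" {
  ami                  = var.ami_id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # No attribute reference links the instance to the attachment,
  # so declare the ordering explicitly.
  depends_on = [aws_iam_role_policy_attachment.app_s3]
}
```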

Over-using terraform import

If you find yourself importing many resources, consider whether Terraform is the right tool for that resource. Some resources (DNS records managed by external teams, legacy VPCs) are better left outside Terraform.




Part of the Terraform Starter Kit by Datanest Digital (datanest.dev)


This is 1 of 6 resources in the DevOps Toolkit Pro toolkit. Get the complete [Terraform Starter Kit] with all files, templates, and documentation for $XX.


Or grab the entire DevOps Toolkit Pro bundle (6 products) for $178 — save 30%.



