Infrastructure as Code Patterns Guide
Best practices for composable, maintainable, and scalable Terraform infrastructure.
Datanest Digital — datanest.dev
Table of Contents
- Pattern Philosophy
- Composition vs Inheritance
- Module Design Principles
- State Management
- Environment Strategy
- Drift Detection & Remediation
- Cost Estimation & Optimization
- Security Considerations
- CI/CD Integration
- Troubleshooting
Pattern Philosophy
These patterns follow three core principles:
- **Composable:** Each pattern is a self-contained unit that can be used independently or combined with others. A VPC pattern doesn't assume you'll use ECS — it outputs everything downstream patterns need.
- **Environment-aware:** Every pattern accepts an `environment` variable and adjusts defaults accordingly. Dev gets cost-optimized settings; prod gets HA and monitoring.
- **Opinionated defaults, flexible overrides:** Patterns ship with production-ready defaults but expose variables for every meaningful configuration point.
```
┌─────────────────────────────────────────────────┐
│                 Terragrunt Root                 │
│          (state, provider, common tags)         │
├──────────────┬──────────────┬───────────────────┤
│     Dev      │   Staging    │       Prod        │
│ (2 AZ, min)  │ (2 AZ, mid)  │ (3 AZ, HA, mon)   │
├──────────────┴──────────────┴───────────────────┤
│                 Pattern Library                 │
│ ┌─────┐ ┌──────────┐ ┌─────┐ ┌──────┐ ┌─────┐  │
│ │ VPC │ │ECS Farg. │ │ RDS │ │Lambda│ │Site │  │
│ └──┬──┘ └────┬─────┘ └──┬──┘ └──┬───┘ └──┬──┘  │
│    └─────────┴──────────┴───────┴─────────┘     │
│                 Shared Modules                  │
│                   ┌──────┐                      │
│                   │ Tags │                      │
│                   └──────┘                      │
└─────────────────────────────────────────────────┘
```
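In HCL, the environment-aware principle usually reduces to a lookup map of per-environment defaults resolved once in `locals`. A minimal sketch — the setting names here are illustrative, not the patterns' actual variables:

```hcl
variable "environment" { type = string }

variable "overrides" {
  type    = map(any)
  default = {}
}

locals {
  # Per-environment defaults (illustrative values)
  env_defaults = {
    dev     = { az_count = 2, nat_gateways = 1, multi_az = false }
    staging = { az_count = 2, nat_gateways = 2, multi_az = true }
    prod    = { az_count = 3, nat_gateways = 3, multi_az = true }
  }

  # Caller overrides win; otherwise fall back to the environment default
  settings = merge(local.env_defaults[var.environment], var.overrides)
}
```

Resources then reference `local.settings.az_count` and friends, so the dev/staging/prod differences live in one place instead of being scattered through conditionals.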
Composition vs Inheritance
The Problem with Monolithic Modules
A single module that provisions "everything" becomes unmaintainable:
```hcl
# DON'T: Monolithic "platform" module
module "platform" {
  source = "./modules/platform"
  # 200+ variables covering VPC, ECS, RDS, Lambda...
}
```
Composition Pattern (Recommended)
Break infrastructure into independent patterns connected by outputs:
```hcl
# DO: Compose independent patterns
module "network" {
  source      = "../patterns/vpc-three-tier"
  environment = var.environment
  vpc_cidr    = "10.0.0.0/16"
}

module "database" {
  source      = "../patterns/rds-aurora"
  vpc_id      = module.network.vpc_id # ← composed
  environment = var.environment
}

module "app" {
  source      = "../patterns/ecs-fargate-service"
  vpc_id      = module.network.vpc_id # ← composed
  subnet_ids  = module.network.private_subnet_ids
  db_endpoint = module.database.cluster_endpoint
}
```
When to Use a Shared Module
Create a shared module when:
- The same logic appears in 3+ patterns (e.g., tagging)
- The logic is purely computational (no resources, just data transformation)
- It enforces organizational policy
```hcl
# Good shared module: consistent tagging
module "tags" {
  source      = "../../modules/tags"
  project     = var.project
  environment = var.environment
  team        = "platform"
}
```
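Downstream resources then merge the shared tags into their own. This sketch assumes the tags module exposes a `common_tags` map output — a guess at its interface, not the module's documented API:

```hcl
# Hypothetical consumer: merge org-wide tags with resource-specific ones
resource "aws_s3_bucket" "artifacts" {
  bucket = "${var.project}-${var.environment}-artifacts"
  tags   = merge(module.tags.common_tags, { Purpose = "build-artifacts" })
}
```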
Module Design Principles
1. Explicit Inputs, No Hidden Dependencies
Every external dependency should be a variable, never a hard-coded data source lookup inside the module:
```hcl
# DON'T: Hidden dependency
data "aws_vpc" "main" {
  tags = { Name = "main" } # Assumes a VPC named "main" exists
}

# DO: Explicit input
variable "vpc_id" {
  description = "VPC ID where resources will be deployed"
  type        = string
}
```
2. Validate Early
Use variable validation blocks to catch misconfigurations before plan:
```hcl
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
```
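Validation is not limited to enumerations; `can()` turns any failing expression into a boolean, so you can probe formats too. A sketch that rejects malformed CIDR input before plan:

```hcl
variable "vpc_cidr" {
  description = "IPv4 CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost() errors on a malformed CIDR; can() converts that to false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid IPv4 CIDR block, e.g. 10.0.0.0/16."
  }
}
```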
3. Conditional Resources
Use count or for_each for optional features, not separate modules:
```hcl
# Optional read replicas
resource "aws_rds_cluster_instance" "readers" {
  count = var.reader_count # 0 in dev, 2 in prod
  # ...
}
```
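A boolean feature flag works the same way. Here is a sketch of an optional log group gated by a hypothetical `enable_flow_logs` variable (not one of the patterns' documented inputs):

```hcl
variable "enable_flow_logs" {
  type    = bool
  default = false # off in dev, enabled explicitly in prod
}

# count = 0 means the resource simply doesn't exist in this environment
resource "aws_cloudwatch_log_group" "flow_logs" {
  count             = var.enable_flow_logs ? 1 : 0
  name              = "/vpc/${var.environment}/flow-logs"
  retention_in_days = var.environment == "prod" ? 365 : 14
}
```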
4. Output Everything Downstream Needs
Outputs are your module's API. Be generous — it's easier to ignore an output than to add one later:
```hcl
output "cluster_endpoint" { value = aws_rds_cluster.main.endpoint }
output "cluster_arn" { value = aws_rds_cluster.main.arn }
output "security_group_id" { value = aws_security_group.db.id }
output "kms_key_arn" { value = aws_kms_key.db.arn }
```
State Management
State File Organization
One state file per environment per pattern:
```
s3://my-terraform-state/
├── dev/
│   ├── vpc/terraform.tfstate
│   ├── ecs/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
│   └── ...
└── prod/
    └── ...
```
State Locking
Always use DynamoDB locking. The Terragrunt root config handles this automatically, but for standalone Terraform:
```hcl
# A backend block must live inside a terraform block
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
Cross-State References
Use terraform_remote_state or SSM Parameter Store for cross-pattern data:
```hcl
# Option A: Remote state (tightly coupled)
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "${var.environment}/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Option B: SSM Parameter Store (loosely coupled, preferred)
data "aws_ssm_parameter" "vpc_id" {
  name = "/${var.project}/${var.environment}/vpc/id"
}
```
State Operations Safety
Use the tf-wrapper.sh script for all operations — it enforces:
- Lock timeout to prevent deadlocks
- Plan-before-apply workflow
- Production confirmation prompts
- Automatic plan archival
Environment Strategy
Dev Environment
- Goal: Fast iteration, low cost
- 2 AZs, single NAT gateway
- Serverless v2 for databases (scales near-zero)
- Minimal monitoring, short log retention
- `deletion_protection = false`
Staging Environment
- Goal: Production-like validation
- Same architecture as prod but smaller instances
- Full monitoring enabled
- Mirrors prod security settings
Prod Environment
- Goal: High availability, full observability
- 3 AZs, NAT per AZ
- Provisioned database instances with read replicas
- Enhanced monitoring, Performance Insights, X-Ray
- `deletion_protection = true`
- 35-day backup retention
Promoting Between Environments
```shell
# Plan against staging with prod-like settings
./scripts/tf-wrapper.sh plan -e staging

# Review the plan carefully
less .logs/plan-staging.log

# Apply to staging
./scripts/tf-wrapper.sh apply -e staging

# After validation, plan and apply to prod
./scripts/tf-wrapper.sh plan -e prod
./scripts/tf-wrapper.sh apply -e prod
```
Drift Detection & Remediation
Drift occurs when real infrastructure diverges from Terraform state. Common causes:
- Manual console changes
- Auto-scaling events modifying desired counts
- AWS service updates changing defaults
Detecting Drift
```shell
# Check for drift
./scripts/tf-wrapper.sh drift -e prod

# Terraform will show a plan with unexpected changes
# Exit code 2 = drift detected
```
Remediation Strategies
- **Accept drift:** Update the configuration to match reality, or bring a resource created outside Terraform under management:

```shell
terraform import aws_instance.web i-1234567890abcdef0
```

- **Reject drift:** Apply to revert infrastructure to the desired state:

```shell
./scripts/tf-wrapper.sh plan -e prod
./scripts/tf-wrapper.sh apply -e prod
```

- **Prevent drift:** Use `lifecycle { ignore_changes }` for expected drift:

```hcl
lifecycle {
  ignore_changes = [desired_count] # Managed by auto-scaling
}
```
Scheduled Drift Detection
Run drift detection in CI/CD on a schedule:
```yaml
# .github/workflows/drift-detection.yml
on:
  schedule:
    - cron: '0 8 * * 1-5' # Weekdays at 8 AM UTC
jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: |
          # Capture the exit code: the step shell runs with -e, so a bare
          # non-zero exit would fail the step before we can inspect it
          ./scripts/tf-wrapper.sh drift -e prod || ec=$?
          if [ "${ec:-0}" -eq 2 ]; then
            # Exit code 2 = drift detected: send Slack notification
            curl -X POST "$SLACK_WEBHOOK" \
              -d '{"text":"⚠️ Infrastructure drift detected in prod"}'
          fi
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }} # assumes a repo secret of this name
```
Cost Estimation & Optimization
Using Infracost
```shell
# Estimate costs before applying
./scripts/tf-wrapper.sh cost -e prod

# Compare costs between branches
infracost diff \
  --path . \
  --compare-to infracost-base.json
```
Cost Optimization Tips
| Resource | Dev Optimization | Prod Optimization |
|---|---|---|
| RDS | Serverless v2 (min 0.5 ACU) | Right-size with Performance Insights data |
| NAT Gateway | Single NAT ($32/mo saved) | Keep multi-AZ for HA |
| ECS | 0.25 vCPU / 512 MB | Auto-scale based on actual metrics |
| CloudFront | PriceClass_100 | PriceClass_200 (covers most users) |
| S3 | No versioning | Lifecycle rules for old versions |
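The NAT row in particular maps directly to a count expression. A sketch assuming the usual `aws_eip`/`aws_nat_gateway` pairing and a hypothetical `public_subnet_ids` variable:

```hcl
# One NAT gateway in dev, one per AZ in prod
locals {
  nat_count = var.environment == "prod" ? length(var.public_subnet_ids) : 1
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc"
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = var.public_subnet_ids[count.index]
}
```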
Tagging for Cost Allocation
The tags module ensures every resource has Project, Environment, Team, and CostCenter tags, enabling AWS Cost Explorer filtering.
Security Considerations
Secrets Management
- Never store secrets in `.tfvars` files or state
- Use `random_password` + Secrets Manager (as in the RDS pattern)
- Reference secrets by ARN, not value
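The `random_password` + Secrets Manager flow, sketched end to end. This is an illustration of the technique, not the RDS pattern's actual code — and note the caveat that the generated value still lands in Terraform state, so state encryption and access control remain essential:

```hcl
resource "random_password" "db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "/${var.project}/${var.environment}/db/password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}

# Consumers reference the secret ARN, never the value itself
output "db_password_secret_arn" {
  value = aws_secretsmanager_secret.db_password.arn
}
```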
Encryption
All patterns encrypt data at rest by default:
- RDS: KMS with automatic key rotation
- S3: AES-256 with bucket keys
- DynamoDB: AWS managed encryption
Network Security
- All databases in private subnets (no public access)
- Security group rules reference other SGs, not CIDRs where possible
- SSL/TLS enforced on all database connections
- CloudFront → S3 via Origin Access Control (not public buckets)
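The SG-to-SG rule style looks like this in practice (ports and resource names are illustrative):

```hcl
# Only the app tier's security group may reach the database; no CIDR ranges
resource "aws_security_group_rule" "db_ingress_from_app" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}
```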
IAM
- Lambda roles follow least privilege (only the specific table, not `dynamodb:*`)
- No `*` resource ARNs in production policies
- IAM database authentication enabled for Aurora
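A least-privilege Lambda policy in this spirit, scoped to a single table — a sketch, with an illustrative action list:

```hcl
data "aws_iam_policy_document" "lambda_dynamodb" {
  statement {
    # Only the operations the function actually performs
    actions = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"]
    # Only this one table, never "*"
    resources = [aws_dynamodb_table.app.arn]
  }
}

resource "aws_iam_role_policy" "lambda" {
  name   = "lambda-dynamodb"
  role   = aws_iam_role.lambda.id
  policy = data.aws_iam_policy_document.lambda_dynamodb.json
}
```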
CI/CD Integration
GitHub Actions Workflow
```yaml
name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init
        run: terraform init
      - name: Validate
        run: terraform validate
      - name: Plan
        run: terraform plan -out=tfplan
        env:
          TF_VAR_environment: ${{ github.base_ref == 'main' && 'prod' || 'dev' }}
      - name: Upload plan # Make the saved plan available to the apply job
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan
      - name: Comment PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        with:
          script: |
            // Post plan output as PR comment
  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production # Requires manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - name: Download plan # Apply exactly what was planned and reviewed
        uses: actions/download-artifact@v4
        with:
          name: tfplan
      - run: terraform apply tfplan
```
Pre-commit Hooks
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.88.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_docs
      - id: terraform_checkov
        args: ['--args=--quiet']
```
Troubleshooting
Common Issues
State lock stuck
```shell
# List locks
aws dynamodb scan --table-name terraform-locks

# Force unlock (use with caution)
terraform force-unlock LOCK_ID
```
Provider version conflicts
```shell
# Upgrade providers
terraform init -upgrade

# Pin to specific version in required_providers
```
Resource already exists
```shell
# Import existing resource into state
terraform import aws_s3_bucket.site my-bucket-name
```
Dependency cycles
- Break cycles with an explicit `depends_on`
- Or split into separate state files with remote state references
Useful Commands
```shell
# Show current state
terraform state list

# Show specific resource
terraform state show aws_rds_cluster.main

# Move resource in state (rename without destroy)
terraform state mv aws_instance.old aws_instance.new

# Remove from state without destroying
terraform state rm aws_instance.imported

# Refresh state from real infrastructure
# (terraform refresh is deprecated; prefer -refresh-only)
terraform apply -refresh-only
```
Pattern Quick Reference
| Pattern | Use Case | Key Features |
|---|---|---|
| VPC Three-Tier | Network foundation | Public/private/database subnets, NAT, flow logs |
| ECS Fargate | Container workloads | ALB, auto-scaling, health checks, rolling deploys |
| RDS Aurora | Relational databases | Encryption, read replicas, automated backups |
| Lambda API | Serverless APIs | API Gateway, DynamoDB, X-Ray, throttling |
| Static Site | Frontend hosting | S3, CloudFront, security headers, SPA support |
Part of the Infrastructure as Code Patterns collection by Datanest Digital.
For support: hello@datanest.dev
This is 1 of 6 resources in the DevOps Toolkit Pro toolkit. Get the complete [Infrastructure As Code Patterns] with all files, templates, and documentation for $XX.
Or grab the entire DevOps Toolkit Pro bundle (6 products) for $178 — save 30%.