
Thesius Code

Originally published at datanest-stores.pages.dev

Infrastructure as Code Patterns Guide

Best practices for composable, maintainable, and scalable Terraform infrastructure.

Datanest Digital — datanest.dev


Table of Contents

  1. Pattern Philosophy
  2. Composition vs Inheritance
  3. Module Design Principles
  4. State Management
  5. Environment Strategy
  6. Drift Detection & Remediation
  7. Cost Estimation & Optimization
  8. Security Considerations
  9. CI/CD Integration
  10. Troubleshooting

Pattern Philosophy

These patterns follow three core principles:

  1. Composable: Each pattern is a self-contained unit that can be used independently or combined with others. A VPC pattern doesn't assume you'll use ECS — it outputs everything downstream patterns need.

  2. Environment-aware: Every pattern accepts an environment input variable (dev, staging, or prod) and adjusts its defaults accordingly. Dev gets cost-optimized settings; prod gets high availability (HA) and monitoring.

  3. Opinionated defaults, flexible overrides: Patterns ship with production-ready defaults but expose variables for every meaningful configuration point.

┌─────────────────────────────────────────────────┐
│                 Terragrunt Root                 │
│         (state, provider, common tags)          │
├──────────────┬──────────────┬───────────────────┤
│     Dev      │   Staging    │      Prod         │
│  (2 AZ, min) │  (2 AZ, mid) │  (3 AZ, HA, mon)  │
├──────────────┴──────────────┴───────────────────┤
│                 Pattern Library                 │
│  ┌─────┐ ┌──────────┐ ┌─────┐ ┌──────┐ ┌──────┐ │
│  │ VPC │ │ECS Farg. │ │ RDS │ │Lambda│ │ Site │ │
│  └──┬──┘ └────┬─────┘ └──┬──┘ └──┬───┘ └──┬───┘ │
│     └─────────┴──────────┼───────┴────────┘     │
│  Shared Modules          ▼                      │
│                       ┌──────┐                  │
│                       │ Tags │                  │
│                       └──────┘                  │
└─────────────────────────────────────────────────┘

Composition vs Inheritance

The Problem with Monolithic Modules

A single module that provisions "everything" becomes unmaintainable:

# DON'T: Monolithic "platform" module
module "platform" {
  source = "./modules/platform"
  # 200+ variables covering VPC, ECS, RDS, Lambda...
}

Composition Pattern (Recommended)

Break infrastructure into independent patterns connected by outputs:

# DO: Compose independent patterns
module "network" {
  source      = "../patterns/vpc-three-tier"
  environment = var.environment
  vpc_cidr    = "10.0.0.0/16"
}

module "database" {
  source      = "../patterns/rds-aurora"
  vpc_id      = module.network.vpc_id          # ← composed
  environment = var.environment
}

module "app" {
  source      = "../patterns/ecs-fargate-service"
  environment = var.environment
  vpc_id      = module.network.vpc_id          # ← composed
  subnet_ids  = module.network.private_subnet_ids
  db_endpoint = module.database.cluster_endpoint
}

When to Use a Shared Module

Create a shared module when:

  • The same logic appears in 3+ patterns (e.g., tagging)
  • The logic is purely computational (no resources, just data transformation)
  • It enforces organizational policy
# Good shared module: consistent tagging
module "tags" {
  source      = "../../modules/tags"
  project     = var.project
  environment = var.environment
  team        = "platform"
}
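A shared tags module of this kind can consist almost entirely of variables, locals, and outputs. The sketch below is one possible shape, assuming the Project, Environment, Team, and CostCenter keys this guide uses for cost allocation; the cost_center and extra_tags inputs and their defaults are hypothetical.

```hcl
# Hypothetical modules/tags/main.tf: declares no resources, it only
# computes a tag map from its inputs.

variable "project" {
  description = "Project name"
  type        = string
}

variable "environment" {
  description = "Deployment environment (dev, staging, or prod)"
  type        = string
}

variable "team" {
  description = "Owning team"
  type        = string
}

variable "cost_center" {
  description = "Cost center code (hypothetical default)"
  type        = string
  default     = "unassigned"
}

variable "extra_tags" {
  description = "Additional tags; they cannot override the standard keys"
  type        = map(string)
  default     = {}
}

locals {
  standard_tags = {
    Project     = var.project
    Environment = var.environment
    Team        = var.team
    CostCenter  = var.cost_center
  }
}

output "tags" {
  description = "Tag map to pass to resources or to provider default_tags"
  # merge() lets later maps win, so the standard tags take precedence
  value = merge(var.extra_tags, local.standard_tags)
}
```

Patterns would then pass tags = module.tags.tags to each resource, or hand the same map to the AWS provider's default_tags block. Putting var.extra_tags first in merge() is a deliberate choice: callers can add keys but cannot overwrite the policy-mandated ones.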

Module Design Principles

1. Explicit Inputs, No Hidden Dependencies

Every external dependency should be a variable, never a hard-coded data source lookup inside the module:

# DON'T: Hidden dependency
data "aws_vpc" "main" {
  tags = { Name = "main" }  # Assumes a VPC named "main" exists
}

# DO: Explicit input
variable "vpc_id" {
  description = "VPC ID where resources will be deployed"
  type        = string
}

2. Validate Early

Use variable validation blocks to reject misconfigurations with a clear error during plan, before any resources are created or changed:

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
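The same mechanism can guard other inputs. A sketch with two hypothetical checks: vpc_cidr (as passed to the VPC pattern in the composition example) must parse as a CIDR block, and reader_count must be a whole number within Aurora's limit of 15 replicas per cluster.

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost() fails on malformed input; can() turns that failure into false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}

variable "reader_count" {
  description = "Number of Aurora reader instances"
  type        = number
  default     = 0

  validation {
    # Aurora allows at most 15 replicas per cluster
    condition     = var.reader_count >= 0 && var.reader_count <= 15 && floor(var.reader_count) == var.reader_count
    error_message = "The reader_count value must be a whole number from 0 to 15."
  }
}
```

Writing each error_message as a full sentence keeps it readable in plan output and satisfies older Terraform releases that enforced that form.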

3. Conditional Resources

Use count or for_each for optional features, not separate modules:

# Optional read replicas
resource "aws_rds_cluster_instance" "readers" {
  count = var.reader_count  # 0 in dev, 2 in prod
  # ... other instance arguments omitted
}
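For an optional resource that is not a simple instance count, a 0-or-1 count plus the one() function keeps references safe. The sketch below assumes a hypothetical enable_enhanced_monitoring flag that creates the IAM role RDS Enhanced Monitoring needs; in a real pattern the flag would likely default from the environment (off in dev, on in prod).

```hcl
variable "enable_enhanced_monitoring" {
  description = "Create the IAM role for RDS Enhanced Monitoring (hypothetical toggle)"
  type        = bool
  default     = false
}

# count = 0 or 1 switches a single optional resource off or on
resource "aws_iam_role" "rds_monitoring" {
  count       = var.enable_enhanced_monitoring ? 1 : 0
  name_prefix = "rds-monitoring-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  count      = var.enable_enhanced_monitoring ? 1 : 0
  role       = aws_iam_role.rds_monitoring[0].name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}

# one() returns the single element, or null when the list is empty
output "monitoring_role_arn" {
  value = one(aws_iam_role.rds_monitoring[*].arn)
}
```

Because the output is null when the flag is off, a caller can pass it straight to an optional argument such as monitoring_role_arn on aws_rds_cluster_instance, where null means "not set".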

4. Output Everything Downstream Needs

Outputs are your module's API. Be generous — it's easier to ignore an output than to add one later:

output "cluster_endpoint"  { value = aws_rds_cluster.main.endpoint }
output "cluster_arn"       { value = aws_rds_cluster.main.arn }
output "security_group_id" { value = aws_security_group.db.id }
output "kms_key_arn"       { value = aws_kms_key.db.arn }

State Management

State File Organization

One state file per environment per pattern:

s3://my-terraform-state/
├── dev/
│   ├── vpc/terraform.tfstate
│   ├── ecs/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
│   └── ...
└── prod/
    └── ...
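With Terragrunt, this layout can be derived from the directory tree instead of hand-written keys. A minimal root config sketch, assuming a hypothetical live/<environment>/<pattern>/ directory layout and a root file named root.hcl:

```hcl
# root.hcl at the top of live/ (hypothetical layout:
# live/dev/vpc/terragrunt.hcl, live/prod/rds/terragrunt.hcl, ...)
remote_state {
  backend = "s3"

  # Terragrunt writes this backend block into each child before running Terraform
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket = "my-terraform-state"
    # For live/dev/vpc this resolves to "dev/vpc/terraform.tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```

Each child terragrunt.hcl would pull this in with include "root" { path = find_in_parent_folders("root.hcl") }, so every environment/pattern pair lands in its own state file without any per-directory backend settings.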

State Locking

Always use DynamoDB locking. The Terragrunt root config handles this automatically, but for standalone Terraform:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    # Terraform 1.10+ can also lock natively in S3 with: use_lockfile = true
  }
}
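The bucket and lock table have to exist before any configuration can use them as a backend. A bootstrap sketch, applied once with local state, could look like this; the names and region mirror the backend example above, and the one hard requirement from the S3 backend is a lock table whose partition key is a string attribute named LockID.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # the split S3 bucket resources below need v4+
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Versioned, encrypted, private bucket for state files
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string partition key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Bucket versioning is what makes a corrupted or accidentally overwritten state file recoverable, so it is worth keeping on even where other buckets skip it.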

Cross-State References

Use terraform_remote_state or SSM Parameter Store for cross-pattern data:

# Option A: Remote state (tightly coupled)
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "${var.environment}/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Option B: SSM Parameter Store (loosely coupled, preferred)
data "aws_ssm_parameter" "vpc_id" {
  name = "/${var.project}/${var.environment}/vpc/id"
}
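Option B relies on the producing pattern publishing its outputs. A producer-side sketch for the VPC pattern, where the bare aws_vpc resource is a stand-in for whatever the real pattern creates and the parameter path matches the lookup above:

```hcl
variable "project" {
  type = string
}

variable "environment" {
  type = string
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

# Stand-in for the VPC the real pattern creates
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
}

# Publish the ID at the path consumers read with data "aws_ssm_parameter"
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/${var.project}/${var.environment}/vpc/id"
  type  = "String"
  value = aws_vpc.main.id
}
```

Consumers then read data.aws_ssm_parameter.vpc_id.value. Lists such as subnet IDs can be published with type = "StringList" and a join(",", ...) value, then turned back into a list with split(",", ...) on the consumer side.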

State Operations Safety

Use the tf-wrapper.sh script for all operations — it enforces:

  • Lock timeout to prevent deadlocks
  • Plan-before-apply workflow
  • Production confirmation prompts
  • Automatic plan archival

Environment Strategy

Dev Environment

  • Goal: Fast iteration, low cost
  • 2 AZs, single NAT gateway
  • Serverless v2 for databases (scales near-zero)
  • Minimal monitoring, short log retention
  • deletion_protection = false

Staging Environment

  • Goal: Production-like validation
  • Same architecture as prod but smaller instances
  • Full monitoring enabled
  • Mirrors prod security settings

Prod Environment

  • Goal: High availability, full observability
  • 3 AZs, NAT per AZ
  • Provisioned database instances with read replicas
  • Enhanced monitoring, Performance Insights, X-Ray
  • deletion_protection = true
  • 35-day backup retention
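One way to keep these differences in one place is a per-environment lookup map that patterns read from. In the sketch below, the AZ counts, NAT layout, dev and prod deletion protection, and prod's 35-day backup retention follow the strategy above; the remaining values (dev and staging retention, log retention days, staging's NAT layout and deletion protection) are hypothetical placeholders.

```hcl
variable "environment" {
  description = "Deployment environment (dev, staging, or prod)"
  type        = string
}

locals {
  # Hypothetical setting names; see the lead-in for which values are placeholders
  env_settings = {
    dev = {
      az_count                = 2
      single_nat_gateway      = true
      deletion_protection     = false
      backup_retention_period = 1
      log_retention_days      = 7
    }
    staging = {
      az_count                = 2
      single_nat_gateway      = false
      deletion_protection     = false
      backup_retention_period = 7
      log_retention_days      = 30
    }
    prod = {
      az_count                = 3
      single_nat_gateway      = false
      deletion_protection     = true
      backup_retention_period = 35
      log_retention_days      = 90
    }
  }

  # Patterns read this one object instead of branching on var.environment
  settings = local.env_settings[var.environment]
}

output "settings" {
  value = local.settings
}
```

A VPC pattern could then choose its zones with slice(data.aws_availability_zones.available.names, 0, local.settings.az_count). An unknown environment name fails at plan time, because the map has no such key.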

Promoting Between Environments

# Plan against staging with prod-like settings
./scripts/tf-wrapper.sh plan -e staging

# Review the plan carefully
less .logs/plan-staging.log

# Apply to staging
./scripts/tf-wrapper.sh apply -e staging

# After validation, plan and apply to prod
./scripts/tf-wrapper.sh plan -e prod
./scripts/tf-wrapper.sh apply -e prod

Drift Detection & Remediation

Drift occurs when real infrastructure diverges from Terraform state. Common causes:

  • Manual console changes
  • Auto-scaling events modifying desired counts
  • AWS service updates changing defaults

Detecting Drift

# Check for drift
./scripts/tf-wrapper.sh drift -e prod

# Terraform will show a plan with unexpected changes
# Exit code 2 = drift detected (the terraform plan -detailed-exitcode
# convention: 0 = no changes, 1 = error, 2 = changes present)

Remediation Strategies

  1. Accept drift: Update the configuration to match the real resource; if the resource was created outside Terraform, import it into state
   terraform import aws_instance.web i-1234567890abcdef0
  2. Reject drift: Apply to revert to desired state
   ./scripts/tf-wrapper.sh plan -e prod
   ./scripts/tf-wrapper.sh apply -e prod
  3. Tolerate expected drift: Use lifecycle { ignore_changes } for attributes that are deliberately changed outside Terraform
   lifecycle {
     ignore_changes = [desired_count]  # Managed by auto-scaling
   }
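On Terraform 1.5 and later, the import from strategy 1 can also be declared in configuration, so it appears in a reviewable plan before state changes. A sketch reusing the instance ID from the command above:

```hcl
# imports.tf: declarative import (Terraform 1.5+). The plan lists the
# import alongside any other changes, so it is reviewed before apply.
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
```

If no aws_instance.web block exists yet, terraform plan -generate-config-out=generated.tf drafts one from the live resource; review and tidy it, then apply. After the import has been applied, the import block can be deleted.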

Scheduled Drift Detection

Run drift detection in CI/CD on a schedule:

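# .github/workflows/drift-detection.yml
name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * 1-5'  # Weekdays at 8 AM UTC

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # (cloud credential setup omitted here)
      - name: Check for drift
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}  # secret name is an assumption
        run: |
          # The default bash shell runs with -e, so capture the exit code
          # rather than letting a non-zero status end the step early
          status=0
          ./scripts/tf-wrapper.sh drift -e prod || status=$?
          if [ "$status" -eq 2 ]; then
            # Send Slack notification about drift
            curl -X POST -H 'Content-type: application/json' \
              -d '{"text":"⚠️ Infrastructure drift detected in prod"}' \
              "$SLACK_WEBHOOK"
          elif [ "$status" -ne 0 ]; then
            # Any other non-zero code is a real error, so fail the job
            exit "$status"
          fi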

Cost Estimation & Optimization

Using Infracost

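# Estimate costs before applying
./scripts/tf-wrapper.sh cost -e prod

# Compare costs between branches: first save a baseline
# while the base branch (e.g. main) is checked out
infracost breakdown --path . \
  --format json \
  --out-file infracost-base.json

# Then check out the feature branch and diff against the baseline
infracost diff \
  --path . \
  --compare-to infracost-base.json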

Cost Optimization Tips

| Resource | Dev Optimization | Prod Optimization |
|---|---|---|
| RDS | Serverless v2 (min 0.5 ACU) | Right-size with Performance Insights data |
| NAT Gateway | Single NAT gateway (about $32/mo saved per gateway dropped) | Keep one NAT per AZ for HA |
| ECS | 0.25 vCPU / 512 MB | Auto-scale based on actual metrics |
| CloudFront | PriceClass_100 | PriceClass_200 (covers most users) |
| S3 | No versioning | Lifecycle rules for old versions |
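The prod S3 row might translate into something like the following sketch: versioning stays on for recovery, while noncurrent versions move to Standard-IA after 30 days and expire after 90. The bucket name and both day counts are hypothetical.

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets-prod" # hypothetical bucket name
}

resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  # Create lifecycle rules only after versioning is enabled on the bucket
  depends_on = [aws_s3_bucket_versioning.assets]

  rule {
    id     = "tidy-noncurrent-versions"
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object

    noncurrent_version_transition {
      noncurrent_days = 30
      storage_class   = "STANDARD_IA"
    }

    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}
```

Old versions remain restorable for the 90-day window but stop accruing Standard storage charges after the first 30 days.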

Tagging for Cost Allocation

The tags module ensures every resource has Project, Environment, Team, and CostCenter tags. Once these keys are activated as cost allocation tags in the AWS Billing console, Cost Explorer can filter and group spend by them.


Security Considerations

Secrets Management

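  • Never store secrets in .tfvars files or version control. Values generated with random_password are still written to state, so keep the state bucket encrypted and access-restricted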
  • Use random_password + Secrets Manager (as in the RDS pattern)
  • Reference secrets by ARN, not value
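A sketch of the random_password + Secrets Manager approach, assuming hypothetical project/environment naming and an app_admin user; it is not the RDS pattern's actual code. The generated value never appears in a .tfvars file, but Terraform still records it in state.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    random = {
      source = "hashicorp/random"
    }
  }
}

variable "project" {
  type = string
}

variable "environment" {
  type = string
}

resource "random_password" "db_master" {
  length  = 32
  special = true
  # RDS master passwords may not contain /, @, quotes, or spaces
  override_special = "!#$%^&*()-_=+[]{}<>:?"
}

resource "aws_secretsmanager_secret" "db_master" {
  name = "${var.project}/${var.environment}/db/master"
}

resource "aws_secretsmanager_secret_version" "db_master" {
  secret_id = aws_secretsmanager_secret.db_master.id
  secret_string = jsonencode({
    username = "app_admin" # hypothetical user name
    password = random_password.db_master.result
  })
}

# Downstream consumers receive the ARN, never the value
output "db_secret_arn" {
  value = aws_secretsmanager_secret.db_master.arn
}
```

Application code or task definitions then resolve the secret at runtime from the ARN, so the password itself never has to travel through module outputs.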

Encryption

All patterns encrypt data at rest by default:

  • RDS: KMS with automatic key rotation
  • S3: AES-256 server-side encryption (S3 Bucket Keys only reduce costs when SSE-KMS is used)
  • DynamoDB: AWS managed encryption

Network Security

  • All databases in private subnets (no public access)
  • Security group rules reference other security groups rather than CIDR ranges wherever possible
  • SSL/TLS enforced on all database connections
  • CloudFront → S3 via Origin Access Control (not public buckets)
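Referencing a security group instead of a CIDR means the rule follows whichever tasks are attached to that group, even as their IP addresses change. A minimal sketch, assuming a PostgreSQL engine on port 5432 and hypothetical app/db group names; it uses the standalone aws_vpc_security_group_ingress_rule resource available in recent AWS provider releases.

```hcl
variable "vpc_id" {
  description = "VPC ID where resources will be deployed"
  type        = string
}

resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "db" {
  name_prefix = "db-"
  vpc_id      = var.vpc_id
}

# Allow PostgreSQL only from members of the app group, not from a CIDR range
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "PostgreSQL from app tasks"
}
```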

IAM

  • Lambda roles follow least-privilege (only the specific table, not dynamodb:*)
  • No * resource ARNs in production policies
  • IAM database authentication enabled for Aurora
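A least-privilege Lambda policy of the kind described above could be sketched with aws_iam_policy_document. The table name, key schema, role prefix, and the exact list of DynamoDB actions are assumptions; trim the actions to what the function actually calls.

```hcl
resource "aws_dynamodb_table" "items" {
  name         = "items" # hypothetical table
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk"

  attribute {
    name = "pk"
    type = "S"
  }
}

data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda" {
  name_prefix        = "items-api-"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

# Only the item-level actions the function needs, scoped to this one table ARN
data "aws_iam_policy_document" "table_access" {
  statement {
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:UpdateItem",
      "dynamodb:DeleteItem",
      "dynamodb:Query",
    ]
    resources = [aws_dynamodb_table.items.arn]
  }
}

resource "aws_iam_role_policy" "table_access" {
  name   = "items-table-access"
  role   = aws_iam_role.lambda.id
  policy = data.aws_iam_policy_document.table_access.json
}
```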

CI/CD Integration

GitHub Actions Workflow

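name: Terraform

on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

permissions:
  contents: read
  pull-requests: write  # needed to post the plan comment

defaults:
  run:
    working-directory: terraform  # assumes the root module lives in terraform/

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # (cloud credential setup omitted here)

      - name: Init
        run: terraform init

      - name: Validate
        run: terraform validate

      - name: Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        env:
          # PRs into main and pushes to main both target prod
          TF_VAR_environment: ${{ (github.base_ref == 'main' || github.ref == 'refs/heads/main') && 'prod' || 'dev' }}

      # The apply job runs on a fresh runner, so hand it the saved plan.
      # Plan files can contain sensitive values; restrict artifact access.
      - name: Upload plan
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: terraform/tfplan

      - name: Comment PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        env:
          # stdout is exposed by setup-terraform's default wrapper
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            // Post plan output as PR comment
            const body = '#### Terraform Plan\n```\n' + process.env.PLAN + '\n```';
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body,
            });

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production  # Manual approval comes from this environment's required reviewers
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Init
        run: terraform init

      - name: Download plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan
          path: terraform

      - name: Apply
        run: terraform apply tfplan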

Pre-commit Hooks

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.88.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_docs
      - id: terraform_checkov
        args: ['--args=--quiet']

Troubleshooting

Common Issues

State lock stuck

# List locks
aws dynamodb scan --table-name terraform-locks

# Force unlock (use with caution)
terraform force-unlock LOCK_ID

Provider version conflicts

# Upgrade providers
terraform init -upgrade

# Pin to specific version in required_providers
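A sketch of such pins; the version numbers are illustrative, not a recommendation:

```hcl
terraform {
  # Reject Terraform CLI versions older than the configuration expects
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0" allows any 5.x release but never 6.0
      version = "~> 5.0"
    }
    random = {
      source = "hashicorp/random"
      # "~> 3.6" allows 3.6 and later 3.x releases, but not 4.0
      version = "~> 3.6"
    }
  }
}
```

Terraform records the exact versions it selected in .terraform.lock.hcl; terraform init -upgrade picks newer versions only within these constraints, so committing the lock file keeps every machine on the same provider build.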

Resource already exists

# Import existing resource into state
terraform import aws_s3_bucket.site my-bucket-name

Cycle dependencies

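  • Break cycles by splitting mutually dependent resources apart, e.g. moving inline security group rules into standalone rule resources; depends_on only adds edges to the dependency graph, so it cannot remove a cycle
  • Or split into separate state files with remote state references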

Useful Commands

# Show current state
terraform state list

# Show specific resource
terraform state show aws_rds_cluster.main

# Move resource in state (rename without destroy)
terraform state mv aws_instance.old aws_instance.new

# Remove from state without destroying
terraform state rm aws_instance.imported

# Sync state with real infrastructure (terraform refresh is deprecated;
# -refresh-only shows the detected changes and asks before writing state)
terraform apply -refresh-only
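On newer Terraform versions, the rename and removal operations above can also be expressed declaratively, so they appear in a reviewed plan instead of being run by hand. A sketch reusing the resource addresses from the commands above: moved needs Terraform 1.1+ and a resource block for aws_instance.new, while removed needs 1.7+ and the aws_instance.imported block deleted from the configuration.

```hcl
# Declarative equivalent of `terraform state mv` (Terraform 1.1+)
moved {
  from = aws_instance.old
  to   = aws_instance.new
}

# Declarative equivalent of `terraform state rm` (Terraform 1.7+):
# the resource leaves state, but the real instance is not destroyed
removed {
  from = aws_instance.imported

  lifecycle {
    destroy = false
  }
}
```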

Pattern Quick Reference

| Pattern | Use Case | Key Features |
|---|---|---|
| VPC Three-Tier | Network foundation | Public/private/database subnets, NAT, flow logs |
| ECS Fargate | Container workloads | ALB, auto-scaling, health checks, rolling deploys |
| RDS Aurora | Relational databases | Encryption, read replicas, automated backups |
| Lambda API | Serverless APIs | API Gateway, DynamoDB, X-Ray, throttling |
| Static Site | Frontend hosting | S3, CloudFront, security headers, SPA support |

Part of the Infrastructure as Code Patterns collection by Datanest Digital.
For support: hello@datanest.dev


This is 1 of 6 resources in the DevOps Toolkit Pro toolkit. Get the complete [Infrastructure As Code Patterns] with all files, templates, and documentation for $XX.

Or grab the entire DevOps Toolkit Pro bundle (6 products) for $178 — save 30%.
