Infrastructure as Code Patterns Guide
Best practices for composable, maintainable, and scalable Terraform infrastructure.
Datanest Digital — datanest.dev
Table of Contents
- Pattern Philosophy
- Composition vs Inheritance
- Module Design Principles
- State Management
- Environment Strategy
- Drift Detection & Remediation
- Cost Estimation & Optimization
- Security Considerations
- CI/CD Integration
- Troubleshooting
Pattern Philosophy
These patterns follow three core principles:
- **Composable:** Each pattern is a self-contained unit that can be used independently or combined with others. A VPC pattern doesn't assume you'll use ECS — it outputs everything downstream patterns need.
- **Environment-aware:** Every pattern accepts an `environment` variable and adjusts defaults accordingly. Dev gets cost-optimized settings; prod gets HA and monitoring.
- **Opinionated defaults, flexible overrides:** Patterns ship with production-ready defaults but expose variables for every meaningful configuration point.
```
┌─────────────────────────────────────────────────┐
│                 Terragrunt Root                 │
│          (state, provider, common tags)         │
├──────────────┬──────────────┬───────────────────┤
│     Dev      │   Staging    │       Prod        │
│ (2 AZ, min)  │ (2 AZ, mid)  │ (3 AZ, HA, mon)   │
├──────────────┴──────────────┴───────────────────┤
│                 Pattern Library                 │
│ ┌─────┐ ┌──────────┐ ┌─────┐ ┌──────┐ ┌─────┐  │
│ │ VPC │ │ECS Farg. │ │ RDS │ │Lambda│ │Site │  │
│ └──┬──┘ └────┬─────┘ └──┬──┘ └──┬───┘ └──┬──┘  │
│    └─────────┴──────────┴───────┴─────────┘     │
│                 Shared Modules                  │
│                   ┌──────┐                      │
│                   │ Tags │                      │
│                   └──────┘                      │
└─────────────────────────────────────────────────┘
```
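In HCL, the environment-aware principle usually reduces to a lookup map of per-environment defaults resolved once in `locals`. A minimal sketch — the setting names here are illustrative, not the patterns' actual variables:

```hcl
variable "environment" { type = string }

variable "overrides" {
  type    = map(any)
  default = {}
}

locals {
  # Per-environment defaults (illustrative values)
  env_defaults = {
    dev     = { az_count = 2, nat_gateways = 1, multi_az = false }
    staging = { az_count = 2, nat_gateways = 2, multi_az = true }
    prod    = { az_count = 3, nat_gateways = 3, multi_az = true }
  }

  # Caller overrides win; otherwise fall back to the environment default
  settings = merge(local.env_defaults[var.environment], var.overrides)
}
```

Resources then reference `local.settings.az_count` and friends, so the dev/staging/prod differences live in one place instead of being scattered through conditionals.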
Composition vs Inheritance
The Problem with Monolithic Modules
A single module that provisions "everything" becomes unmaintainable:
```hcl
# DON'T: Monolithic "platform" module
module "platform" {
  source = "./modules/platform"
  # 200+ variables covering VPC, ECS, RDS, Lambda...
}
```
Composition Pattern (Recommended)
Break infrastructure into independent patterns connected by outputs:
```hcl
# DO: Compose independent patterns
module "network" {
  source      = "../patterns/vpc-three-tier"
  environment = var.environment
  vpc_cidr    = "10.0.0.0/16"
}

module "database" {
  source      = "../patterns/rds-aurora"
  vpc_id      = module.network.vpc_id # ← composed
  environment = var.environment
}

module "app" {
  source      = "../patterns/ecs-fargate-service"
  vpc_id      = module.network.vpc_id # ← composed
  subnet_ids  = module.network.private_subnet_ids
  db_endpoint = module.database.cluster_endpoint
}
```
When to Use a Shared Module
Create a shared module when:
- The same logic appears in 3+ patterns (e.g., tagging)
- The logic is purely computational (no resources, just data transformation)
- It enforces organizational policy
```hcl
# Good shared module: consistent tagging
module "tags" {
  source      = "../../modules/tags"
  project     = var.project
  environment = var.environment
  team        = "platform"
}
```
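Downstream resources then merge the shared tags into their own. This sketch assumes the tags module exposes a `common_tags` map output — a guess at its interface, not the module's documented API:

```hcl
# Hypothetical consumer: merge org-wide tags with resource-specific ones
resource "aws_s3_bucket" "artifacts" {
  bucket = "${var.project}-${var.environment}-artifacts"
  tags   = merge(module.tags.common_tags, { Purpose = "build-artifacts" })
}
```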
Module Design Principles
1. Explicit Inputs, No Hidden Dependencies
Every external dependency should be a variable, never a hard-coded data source lookup inside the module:
```hcl
# DON'T: Hidden dependency
data "aws_vpc" "main" {
  tags = { Name = "main" } # Assumes a VPC named "main" exists
}

# DO: Explicit input
variable "vpc_id" {
  description = "VPC ID where resources will be deployed"
  type        = string
}
```
2. Validate Early
Use variable validation blocks to catch misconfigurations before plan:
```hcl
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
```
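Validation is not limited to enumerations; `can()` turns any failing expression into a boolean, so you can probe formats too. A sketch that rejects malformed CIDR input before plan:

```hcl
variable "vpc_cidr" {
  description = "IPv4 CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost() errors on a malformed CIDR; can() converts that to false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid IPv4 CIDR block, e.g. 10.0.0.0/16."
  }
}
```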
3. Conditional Resources
Use count or for_each for optional features, not separate modules:
```hcl
# Optional read replicas
resource "aws_rds_cluster_instance" "readers" {
  count = var.reader_count # 0 in dev, 2 in prod
  # ...
}
```
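A boolean feature flag works the same way. Here is a sketch of an optional log group gated by a hypothetical `enable_flow_logs` variable (not one of the patterns' documented inputs):

```hcl
variable "enable_flow_logs" {
  type    = bool
  default = false # off in dev, enabled explicitly in prod
}

# count = 0 means the resource simply doesn't exist in this environment
resource "aws_cloudwatch_log_group" "flow_logs" {
  count             = var.enable_flow_logs ? 1 : 0
  name              = "/vpc/${var.environment}/flow-logs"
  retention_in_days = var.environment == "prod" ? 365 : 14
}
```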
4. Output Everything Downstream Needs
Outputs are your module's API. Be generous — it's easier to ignore an output than to add one later:
```hcl
output "cluster_endpoint" { value = aws_rds_cluster.main.endpoint }
output "cluster_arn" { value = aws_rds_cluster.main.arn }
output "security_group_id" { value = aws_security_group.db.id }
output "kms_key_arn" { value = aws_kms_key.db.arn }
```
State Management
State File Organization
One state file per environment per pattern:
```
s3://my-terraform-state/
├── dev/
│   ├── vpc/terraform.tfstate
│   ├── ecs/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
│   └── ...
└── prod/
    └── ...
```
State Locking
Always use DynamoDB locking. The Terragrunt root config handles this automatically, but for standalone Terraform:
```hcl
# A backend block must live inside a terraform block
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
Cross-State References
Use terraform_remote_state or SSM Parameter Store for cross-pattern data:
```hcl
# Option A: Remote state (tightly coupled)
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "${var.environment}/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Option B: SSM Parameter Store (loosely coupled, preferred)
data "aws_ssm_parameter" "vpc_id" {
  name = "/${var.project}/${var.environment}/vpc/id"
}
```
State Operations Safety
Use the tf-wrapper.sh script for all operations — it enforces:
- Lock timeout to prevent deadlocks
- Plan-before-apply workflow
- Production confirmation prompts
- Automatic plan archival
Environment Strategy
Dev Environment
- Goal: Fast iteration, low cost
- 2 AZs, single NAT gateway
- Serverless v2 for databases (scales near-zero)
- Minimal monitoring, short log retention
- `deletion_protection = false`
Staging Environment
- Goal: Production-like validation
- Same architecture as prod but smaller instances
- Full monitoring enabled
- Mirrors prod security settings
Prod Environment
- Goal: High availability, full observability
- 3 AZs, NAT per AZ
- Provisioned database instances with read replicas
- Enhanced monitoring, Performance Insights, X-Ray
- `deletion_protection = true`
- 35-day backup retention
Promoting Between Environments
```shell
# Plan against staging with prod-like settings
./scripts/tf-wrapper.sh plan -e staging

# Review the plan carefully
less .logs/plan-staging.log

# Apply to staging
./scripts/tf-wrapper.sh apply -e staging

# After validation, plan and apply to prod
./scripts/tf-wrapper.sh plan -e prod
./scripts/tf-wrapper.sh apply -e prod
```
Drift Detection & Remediation
Drift occurs when real infrastructure diverges from Terraform state. Common causes:
- Manual console changes
- Auto-scaling events modifying desired counts
- AWS service updates changing defaults
Detecting Drift
```shell
# Check for drift
./scripts/tf-wrapper.sh drift -e prod

# Terraform will show a plan with unexpected changes
# Exit code 2 = drift detected
```
Remediation Strategies
- **Accept drift:** Update the configuration to match reality, or bring a resource created outside Terraform under management:

```shell
terraform import aws_instance.web i-1234567890abcdef0
```

- **Reject drift:** Apply to revert infrastructure to the desired state:

```shell
./scripts/tf-wrapper.sh plan -e prod
./scripts/tf-wrapper.sh apply -e prod
```

- **Prevent drift:** Use `lifecycle { ignore_changes }` for expected drift:

```hcl
lifecycle {
  ignore_changes = [desired_count] # Managed by auto-scaling
}
```
Scheduled Drift Detection
Run drift detection in CI/CD on a schedule:
```yaml
# .github/workflows/drift-detection.yml
on:
  schedule:
    - cron: '0 8 * * 1-5' # Weekdays at 8 AM UTC
jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: |
          # Capture the exit code: the step shell runs with -e, so a bare
          # non-zero exit would fail the step before we can inspect it
          ./scripts/tf-wrapper.sh drift -e prod || ec=$?
          if [ "${ec:-0}" -eq 2 ]; then
            # Exit code 2 = drift detected: send Slack notification
            curl -X POST "$SLACK_WEBHOOK" \
              -d '{"text":"⚠️ Infrastructure drift detected in prod"}'
          fi
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }} # assumes a repo secret of this name
```
Cost Estimation & Optimization
Using Infracost
```shell
# Estimate costs before applying
./scripts/tf-wrapper.sh cost -e prod

# Compare costs between branches
infracost diff \
  --path . \
  --compare-to infracost-base.json
```
Cost Optimization Tips
| Resource | Dev Optimization | Prod Optimization |
|---|---|---|
| RDS | Serverless v2 (min 0.5 ACU) | Right-size with Performance Insights data |
| NAT Gateway | Single NAT ($32/mo saved) | Keep multi-AZ for HA |
| ECS | 0.25 vCPU / 512 MB | Auto-scale based on actual metrics |
| CloudFront | PriceClass_100 | PriceClass_200 (covers most users) |
| S3 | No versioning | Lifecycle rules for old versions |
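The NAT row in particular maps directly to a count expression. A sketch assuming the usual `aws_eip`/`aws_nat_gateway` pairing and a hypothetical `public_subnet_ids` variable:

```hcl
# One NAT gateway in dev, one per AZ in prod
locals {
  nat_count = var.environment == "prod" ? length(var.public_subnet_ids) : 1
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc"
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = var.public_subnet_ids[count.index]
}
```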
Tagging for Cost Allocation
The tags module ensures every resource has Project, Environment, Team, and CostCenter tags, enabling AWS Cost Explorer filtering.
Security Considerations
Secrets Management
- Never store secrets in `.tfvars` files or state
- Use `random_password` + Secrets Manager (as in the RDS pattern)
- Reference secrets by ARN, not value
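The `random_password` + Secrets Manager flow, sketched end to end. This is an illustration of the technique, not the RDS pattern's actual code — and note the caveat that the generated value still lands in Terraform state, so state encryption and access control remain essential:

```hcl
resource "random_password" "db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "/${var.project}/${var.environment}/db/password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}

# Consumers reference the secret ARN, never the value itself
output "db_password_secret_arn" {
  value = aws_secretsmanager_secret.db_password.arn
}
```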
Encryption
All patterns encrypt data at rest by default:
- RDS: KMS with automatic key rotation
- S3: AES-256 with bucket keys
- DynamoDB: AWS managed encryption
Network Security
- All databases in private subnets (no public access)
- Security group rules reference other SGs, not CIDRs where possible
- SSL/TLS enforced on all database connections
- CloudFront → S3 via Origin Access Control (not public buckets)
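The SG-to-SG rule style looks like this in practice (ports and resource names are illustrative):

```hcl
# Only the app tier's security group may reach the database; no CIDR ranges
resource "aws_security_group_rule" "db_ingress_from_app" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}
```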
IAM
- Lambda roles follow least privilege (only the specific table, not `dynamodb:*`)
- No `*` resource ARNs in production policies
- IAM database authentication enabled for Aurora
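A least-privilege Lambda policy in this spirit, scoped to a single table — a sketch, with an illustrative action list:

```hcl
data "aws_iam_policy_document" "lambda_dynamodb" {
  statement {
    # Only the operations the function actually performs
    actions = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"]
    # Only this one table, never "*"
    resources = [aws_dynamodb_table.app.arn]
  }
}

resource "aws_iam_role_policy" "lambda" {
  name   = "lambda-dynamodb"
  role   = aws_iam_role.lambda.id
  policy = data.aws_iam_policy_document.lambda_dynamodb.json
}
```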
CI/CD Integration
GitHub Actions Workflow
```yaml
name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init
        run: terraform init
      - name: Validate
        run: terraform validate
      - name: Plan
        run: terraform plan -out=tfplan
        env:
          TF_VAR_environment: ${{ github.base_ref == 'main' && 'prod' || 'dev' }}
      - name: Upload plan # Make the saved plan available to the apply job
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan
      - name: Comment PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        with:
          script: |
            // Post plan output as PR comment
  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production # Requires manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - name: Download plan # Apply exactly what was planned and reviewed
        uses: actions/download-artifact@v4
        with:
          name: tfplan
      - run: terraform apply tfplan
```
Pre-commit Hooks
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.88.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_docs
      - id: terraform_checkov
        args: ['--args=--quiet']
```
Troubleshooting
Common Issues
State lock stuck
```shell
# List locks
aws dynamodb scan --table-name terraform-locks

# Force unlock (use with caution)
terraform force-unlock LOCK_ID
```
Provider version conflicts
```shell
# Upgrade providers
terraform init -upgrade

# Pin to specific version in required_providers
```
Resource already exists
```shell
# Import existing resource into state
terraform import aws_s3_bucket.site my-bucket-name
```
Dependency cycles
- Break cycles with an explicit `depends_on`
- Or split into separate state files with remote state references
Useful Commands
```shell
# Show current state
terraform state list

# Show specific resource
terraform state show aws_rds_cluster.main

# Move resource in state (rename without destroy)
terraform state mv aws_instance.old aws_instance.new

# Remove from state without destroying
terraform state rm aws_instance.imported

# Refresh state from real infrastructure
# (terraform refresh is deprecated; prefer -refresh-only)
terraform apply -refresh-only
```
Pattern Quick Reference
| Pattern | Use Case | Key Features |
|---|---|---|
| VPC Three-Tier | Network foundation | Public/private/database subnets, NAT, flow logs |
| ECS Fargate | Container workloads | ALB, auto-scaling, health checks, rolling deploys |
| RDS Aurora | Relational databases | Encryption, read replicas, automated backups |
| Lambda API | Serverless APIs | API Gateway, DynamoDB, X-Ray, throttling |
| Static Site | Frontend hosting | S3, CloudFront, security headers, SPA support |
Part of the Infrastructure as Code Patterns collection by Datanest Digital.
For support: hello@datanest.dev
This is 1 of 6 resources in the DevOps Toolkit Pro toolkit. Get the complete [Infrastructure As Code Patterns] with all files, templates, and documentation for $XX.
Or grab the entire DevOps Toolkit Pro bundle (6 products) for $178 — save 30%.