S, Sanjay

Terraform Modules: Building Reusable Infrastructure for Azure

Writing Terraform without modules is like writing code without functions — it works until it doesn't.

I built a reusable module library that lets multiple teams provision Azure infrastructure 90% faster. Here's how to design modules that actually get adopted.


The Problem With Flat Terraform

# Every project starts like this...
main.tf          (2000+ lines)
variables.tf     (500+ lines)
outputs.tf       (200+ lines)

# Then you need a second environment...
# Copy-paste everything.
# Then a third...
# Now you have 3 copies with subtle drift.

# 6 months later:
"Why does staging have a different VM size than production?"
"Who changed the network config in dev?"
"This S3—I mean Storage Account—was supposed to have versioning."

Module Architecture

The Three-Layer Pattern

Layer 1: ROOT MODULES (composition layer)
├── Combines child modules for a specific use case
├── Example: "web-app" = VNet + AKS + ACR + Key Vault
└── Teams consume this layer

Layer 2: CHILD MODULES (resource layer)
├── Provisions a single Azure resource type
├── Example: "aks-cluster", "sql-database", "key-vault"
└── Reusable building blocks

Layer 3: UTILITY MODULES (helper layer)
├── Naming conventions, tagging, diagnostics
├── Example: "naming", "tags", "diagnostic-settings"
└── Consistency enforcement

Repository Layout

terraform-modules/
├── modules/
│   ├── aks-cluster/          # Child module
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   └── README.md
│   ├── sql-database/         # Child module
│   ├── key-vault/            # Child module
│   ├── storage-account/      # Child module
│   ├── virtual-network/      # Child module
│   ├── container-registry/   # Child module
│   └── naming/               # Utility module
├── compositions/
│   ├── web-app/              # Root module
│   │   ├── main.tf           # Composes child modules
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── data-platform/        # Root module
├── examples/
│   ├── aks-basic/
│   └── aks-production/
└── tests/
    ├── aks_test.go
    └── sql_test.go
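
The `naming` utility module is called from the composition later in this post but never shown. Here is a minimal sketch of what it could look like. The `<type>-<project>-<environment>-<region>` convention and the truncation rule are assumptions for illustration, not the author's implementation; only the input and output names come from the composition's module call.

```hcl
# modules/naming/main.tf (hypothetical sketch)

variable "project" {
  description = "Project or workload name"
  type        = string
}

variable "environment" {
  description = "Deployment environment, e.g. dev, staging, production"
  type        = string
}

variable "region" {
  description = "Azure region, e.g. eastus"
  type        = string
}

locals {
  # Assumed convention: <type>-<project>-<environment>-<region>
  suffix = lower("${var.project}-${var.environment}-${var.region}")
}

output "virtual_network" {
  value = "vnet-${local.suffix}"
}

output "kubernetes_cluster" {
  value = "aks-${local.suffix}"
}

output "key_vault" {
  # Key Vault names are capped at 24 characters; naive truncation for illustration
  value = substr("kv-${local.suffix}", 0, 24)
}

output "container_registry" {
  # Registry names allow only alphanumeric characters, so hyphens are stripped
  value = replace("acr${local.suffix}", "-", "")
}
```

Keeping every name in one module means a convention change is a single pull request rather than a search-and-replace across every team's repository.
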

Building a Production-Grade Module

Example: AKS Cluster Module

# modules/aks-cluster/versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.0" # main.tf uses the 4.x argument name auto_scaling_enabled (3.x: enable_auto_scaling)
    }
  }
}
# modules/aks-cluster/variables.tf
variable "name" {
  description = "Name of the AKS cluster"
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9-]+$", var.name))
    error_message = "Cluster name must contain only lowercase letters, numbers, and hyphens."
  }
}

variable "resource_group_name" {
  description = "Name of the resource group"
  type        = string
}

variable "location" {
  description = "Azure region"
  type        = string
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = null  # Uses latest if not specified
}

variable "sku_tier" {
  description = "AKS SKU tier: Free or Standard"
  type        = string
  default     = "Standard"

  validation {
    condition     = contains(["Free", "Standard"], var.sku_tier)
    error_message = "SKU tier must be Free or Standard."
  }
}

variable "system_node_pool" {
  description = "System node pool configuration"
  type = object({
    vm_size    = optional(string, "Standard_D4s_v5")
    node_count = optional(number, 3)
    min_count  = optional(number, 3)
    max_count  = optional(number, 5)
    zones      = optional(list(number), [1, 2, 3])
  })
  default = {}
}

variable "app_node_pools" {
  description = "Map of application node pools"
  type = map(object({
    vm_size    = optional(string, "Standard_D8s_v5")
    min_count  = optional(number, 2)
    max_count  = optional(number, 10)
    zones      = optional(list(number), [1, 2, 3])
    node_labels = optional(map(string), {})
    node_taints = optional(list(string), [])
    priority   = optional(string, "Regular")
  }))
  default = {}
}

variable "network_config" {
  description = "Network configuration"
  type = object({
    plugin         = optional(string, "azure")
    policy         = optional(string, "calico")
    subnet_id      = string
    service_cidr   = optional(string, "10.0.0.0/16")
    dns_service_ip = optional(string, "10.0.0.10")
  })
}

variable "enable_azure_rbac" {
  description = "Enable Azure AD RBAC integration"
  type        = bool
  default     = true
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}
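
The object-typed variables above accept any combination of values that matches the type, including a `min_count` larger than `max_count`. A possible extension is to add `validation` blocks that check each pool. The sketch below repeats the `app_node_pools` declaration with three assumed rules added; the rules themselves are illustrative and not part of the original module.

```hcl
# Possible replacement for the app_node_pools declaration (sketch)
variable "app_node_pools" {
  description = "Map of application node pools"
  type = map(object({
    vm_size     = optional(string, "Standard_D8s_v5")
    min_count   = optional(number, 2)
    max_count   = optional(number, 10)
    zones       = optional(list(number), [1, 2, 3])
    node_labels = optional(map(string), {})
    node_taints = optional(list(string), [])
    priority    = optional(string, "Regular")
  }))
  default = {}

  validation {
    condition = alltrue([
      for pool in values(var.app_node_pools) : pool.min_count <= pool.max_count
    ])
    error_message = "Each node pool's min_count must be less than or equal to its max_count."
  }

  validation {
    condition = alltrue([
      for pool in values(var.app_node_pools) : contains(["Regular", "Spot"], pool.priority)
    ])
    error_message = "Node pool priority must be Regular or Spot."
  }

  validation {
    # AKS Linux node pool names: lowercase alphanumeric, starting with a letter, at most 12 characters
    condition = alltrue([
      for name in keys(var.app_node_pools) : can(regex("^[a-z][a-z0-9]{0,11}$", name))
    ])
    error_message = "Node pool names must be 1-12 lowercase alphanumeric characters and start with a letter."
  }
}
```

Failing at `terraform plan` with a readable message is cheaper than failing halfway through an apply with an Azure API error.
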
# modules/aks-cluster/main.tf

# Fetch latest stable K8s version if not specified
data "azurerm_kubernetes_service_versions" "current" {
  location        = var.location
  include_preview = false
}

locals {
  kubernetes_version = coalesce(
    var.kubernetes_version,
    data.azurerm_kubernetes_service_versions.current.latest_version
  )

  default_tags = {
    "managed-by"  = "terraform"
    "module"      = "aks-cluster"
    "k8s-version" = local.kubernetes_version
  }

  tags = merge(local.default_tags, var.tags)
}

resource "azurerm_kubernetes_cluster" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = var.name
  kubernetes_version  = local.kubernetes_version
  sku_tier            = var.sku_tier

  # System node pool
  default_node_pool {
    name                         = "system"
    vm_size                      = var.system_node_pool.vm_size
    node_count                   = var.system_node_pool.node_count
    min_count                    = var.system_node_pool.min_count
    max_count                    = var.system_node_pool.max_count
    auto_scaling_enabled         = true
    zones                        = var.system_node_pool.zones
    only_critical_addons_enabled = true
    temporary_name_for_rotation  = "systemtemp"
    os_disk_type                 = "Ephemeral" # Requires a VM size whose cache or temp disk can hold os_disk_size_gb
    os_disk_size_gb              = 128
    vnet_subnet_id               = var.network_config.subnet_id

    node_labels = {
      "nodepool-type" = "system"
    }
  }

  # Network configuration
  network_profile {
    network_plugin    = var.network_config.plugin
    network_policy    = var.network_config.policy
    load_balancer_sku = "standard"
    service_cidr      = var.network_config.service_cidr
    dns_service_ip    = var.network_config.dns_service_ip
  }

  # Managed identity (no service principals)
  identity {
    type = "SystemAssigned"
  }

  # Azure AD RBAC (azurerm 4.x always uses the managed integration,
  # so the 3.x-only `managed` argument is omitted)
  dynamic "azure_active_directory_role_based_access_control" {
    for_each = var.enable_azure_rbac ? [1] : []
    content {
      azure_rbac_enabled = true
    }
  }

  # Monitoring
  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.aks.id
  }

  # Maintenance window
  maintenance_window_auto_upgrade {
    frequency   = "Weekly"
    interval    = 1
    day_of_week = "Sunday"
    start_time  = "02:00"
    duration    = 4
    utc_offset  = "+05:30"
  }

  tags = local.tags

  lifecycle {
    ignore_changes = [
      default_node_pool[0].node_count,  # Managed by autoscaler
      kubernetes_version,                # Managed by auto-upgrade
    ]
  }
}

# Application node pools
resource "azurerm_kubernetes_cluster_node_pool" "apps" {
  for_each = var.app_node_pools

  name                  = each.key
  kubernetes_cluster_id = azurerm_kubernetes_cluster.this.id
  vm_size               = each.value.vm_size
  min_count             = each.value.min_count
  max_count             = each.value.max_count
  auto_scaling_enabled  = true
  zones                 = each.value.zones
  os_disk_type          = "Ephemeral"
  os_disk_size_gb       = 128
  vnet_subnet_id        = var.network_config.subnet_id
  priority              = each.value.priority
  node_labels           = each.value.node_labels
  node_taints           = each.value.node_taints

  eviction_policy = each.value.priority == "Spot" ? "Delete" : null
  spot_max_price  = each.value.priority == "Spot" ? -1 : null

  tags = local.tags

  lifecycle {
    ignore_changes = [node_count]
  }
}

# Log Analytics workspace for monitoring
resource "azurerm_log_analytics_workspace" "aks" {
  name                = "${var.name}-logs"
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = 30

  tags = local.tags
}
# modules/aks-cluster/outputs.tf
output "cluster_id" {
  description = "AKS cluster resource ID"
  value       = azurerm_kubernetes_cluster.this.id
}

output "cluster_name" {
  description = "AKS cluster name"
  value       = azurerm_kubernetes_cluster.this.name
}

output "kube_config" {
  description = "Kubeconfig for cluster access"
  value       = azurerm_kubernetes_cluster.this.kube_config_raw
  sensitive   = true
}

output "kubelet_identity" {
  description = "Kubelet managed identity"
  value = {
    client_id = azurerm_kubernetes_cluster.this.kubelet_identity[0].client_id
    object_id = azurerm_kubernetes_cluster.this.kubelet_identity[0].object_id
  }
}

output "node_resource_group" {
  description = "Auto-generated resource group for nodes"
  value       = azurerm_kubernetes_cluster.this.node_resource_group
}

output "oidc_issuer_url" {
  description = "OIDC issuer URL for workload identity (empty unless oidc_issuer_enabled is set on the cluster)"
  value       = azurerm_kubernetes_cluster.this.oidc_issuer_url
}
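
To show why `oidc_issuer_url` is worth exporting, here is a consumer-side sketch that federates a Kubernetes service account with a user-assigned managed identity. It assumes the cluster has `oidc_issuer_enabled` and `workload_identity_enabled` turned on, which the module as shown does not set. The identity, resource group, namespace, and service account names are all hypothetical.

```hcl
# Consumer-side sketch; every name below is hypothetical.

resource "azurerm_user_assigned_identity" "orders_api" {
  name                = "id-orders-api"
  location            = "eastus"
  resource_group_name = "rg-orders"
}

resource "azurerm_federated_identity_credential" "orders_api" {
  name                = "orders-api-aks"
  resource_group_name = "rg-orders"
  parent_id           = azurerm_user_assigned_identity.orders_api.id

  # Trust tokens issued by this cluster for one specific service account
  issuer   = module.aks.oidc_issuer_url
  audience = ["api://AzureADTokenExchange"]
  subject  = "system:serviceaccount:orders:orders-api"
}
```

The consumer never needs to know how the cluster was built; the output is the whole contract.
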

Module Consumption

Simple Usage

module "aks" {
  source = "git::https://github.com/myorg/terraform-modules.git//modules/aks-cluster?ref=v2.1.0"

  name                = "aks-prod-eastus"
  resource_group_name = azurerm_resource_group.main.name
  location            = "eastus"

  network_config = {
    subnet_id = module.vnet.subnet_ids["aks"]
  }

  app_node_pools = {
    apps = {
      vm_size   = "Standard_D8s_v5"
      min_count = 3
      max_count = 20
    }
    spot = {
      vm_size   = "Standard_D8s_v5"
      min_count = 0
      max_count = 10
      priority  = "Spot"
      node_taints = [
        "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
      ]
    }
  }

  tags = {
    environment = "production"
    team        = "platform"
    cost-center = "engineering"
  }
}

5 minutes to provision a production-grade AKS cluster. Without the module, this would be 200+ lines of hand-written Terraform.


State Management Strategy

Rule 1: NEVER use local state in production
Rule 2: One state file per environment per component
Rule 3: State locking is mandatory
Rule 4: Enable versioning for rollback

Backend Configuration

# backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstateproduction"
    container_name       = "tfstate"
    key                  = "production/aks/terraform.tfstate"
    use_oidc             = true  # Workload identity authentication
  }
}
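
The backend block assumes the storage account already exists. A minimal sketch of a bootstrap configuration that creates it with blob versioning (Rule 4) might look like this. The resource names match the backend block above; the region, redundancy level, and retention window are assumptions. State locking (Rule 3) needs no extra resource, because the `azurerm` backend locks state with a blob lease.

```hcl
# bootstrap/state-storage.tf (hypothetical file; applied once from its own configuration)

resource "azurerm_resource_group" "state" {
  name     = "rg-terraform-state"
  location = "eastus" # assumed region
}

resource "azurerm_storage_account" "state" {
  name                     = "sttfstateproduction"
  resource_group_name      = azurerm_resource_group.state.name
  location                 = azurerm_resource_group.state.location
  account_tier             = "Standard"
  account_replication_type = "GRS" # assumed; choose the redundancy your recovery targets need
  min_tls_version          = "TLS1_2"

  blob_properties {
    versioning_enabled = true # Rule 4: earlier state versions stay recoverable

    delete_retention_policy {
      days = 30 # assumed retention window
    }
  }
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.state.name # recent azurerm releases also accept storage_account_id
  container_access_type = "private"
}
```
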

State File Layout

Storage Account: sttfstateproduction
└── Container: tfstate
    ├── production/
    │   ├── networking/terraform.tfstate
    │   ├── aks/terraform.tfstate
    │   ├── databases/terraform.tfstate
    │   └── monitoring/terraform.tfstate
    ├── staging/
    │   ├── networking/terraform.tfstate
    │   ├── aks/terraform.tfstate
    │   └── databases/terraform.tfstate
    └── dev/
        └── ... (same structure)

Why Separate State Files?

❌ One giant state file:
├── terraform plan takes 10 minutes
├── Lock contention between teams
├── Blast radius = everything
└── One mistake = destroy all

✅ Scoped state files:
├── terraform plan takes 30 seconds
├── Teams work independently
├── Blast radius = one component
└── Mistakes are contained
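
Splitting state raises an obvious question: how does the AKS component find the subnet that the networking component created? One option is the `terraform_remote_state` data source. The sketch below assumes the networking configuration declares an output named `subnet_ids`, and the resource group name is hypothetical.

```hcl
# production/aks/main.tf (sketch)

data "terraform_remote_state" "networking" {
  backend = "azurerm"

  config = {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstateproduction"
    container_name       = "tfstate"
    key                  = "production/networking/terraform.tfstate"
    use_oidc             = true
  }
}

module "aks" {
  source = "git::https://github.com/myorg/terraform-modules.git//modules/aks-cluster?ref=v2.1.0"

  name                = "aks-prod-eastus"
  resource_group_name = "rg-prod-eastus" # hypothetical
  location            = "eastus"

  network_config = {
    # Read-only lookup into the networking component's state
    subnet_id = data.terraform_remote_state.networking.outputs.subnet_ids["aks"]
  }
}
```

The trade-off is that the reader needs read access to the other component's state file. If that is too broad, looking the subnet up with an `azurerm_subnet` data source is a common alternative.
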

Module Versioning

Always pin module versions. Use Git tags:

# ✅ Pinned to specific version
module "aks" {
  source = "git::https://github.com/myorg/terraform-modules.git//modules/aks-cluster?ref=v2.1.0"
}

# ❌ Tracking main (any upstream merge can change your next plan)
module "aks" {
  source = "git::https://github.com/myorg/terraform-modules.git//modules/aks-cluster?ref=main"
}

Semantic Versioning for Modules

v1.0.0 → Initial release
v1.1.0 → New optional variable added (non-breaking)
v1.1.1 → Bug fix (non-breaking)
v2.0.0 → Variable renamed or removed (BREAKING)
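
A rename does not have to land as a surprise in a major release. One way to soften it is to accept both names for a release or two. The sketch below assumes a hypothetical rename from `name` to `cluster_name`; neither the rename nor the timeline comes from the original module.

```hcl
variable "name" {
  description = "DEPRECATED: use cluster_name instead. Scheduled for removal in the next major version."
  type        = string
  default     = null
}

variable "cluster_name" {
  description = "Name of the AKS cluster"
  type        = string
  default     = null
}

locals {
  # Prefer the new variable and fall back to the old one.
  # coalesce() raises an error when both are null, so callers must set one of them.
  cluster_name = coalesce(var.cluster_name, var.name)
}
```

Under this scheme, adding `cluster_name` ships as a minor release and removing `name` ships as the next major one, with the CHANGELOG pointing from one to the other.
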

Release Process

# After merging to main
git tag -a v2.1.0 -m "feat: add spot node pool support"
git push origin v2.1.0

The Composition Pattern

Combine child modules into higher-level abstractions:

# compositions/web-app/main.tf

# A "web app" composition; this excerpt wires up VNet + AKS + ACR + Key Vault

module "naming" {
  source      = "../../modules/naming"
  project     = var.project_name
  environment = var.environment
  region      = var.location
}

module "vnet" {
  source              = "../../modules/virtual-network"
  name                = module.naming.virtual_network
  resource_group_name = azurerm_resource_group.main.name
  location            = var.location
  address_space       = [var.vnet_cidr]

  subnets = {
    aks = {
      address_prefix = var.aks_subnet_cidr
    }
    database = {
      address_prefix                            = var.db_subnet_cidr
      private_endpoint_network_policies_enabled = true
    }
  }

  tags = var.tags
}

module "acr" {
  source              = "../../modules/container-registry"
  name                = module.naming.container_registry
  resource_group_name = azurerm_resource_group.main.name
  location            = var.location
  sku                 = "Premium"  # For geo-replication and private endpoints

  allowed_subnet_ids = [module.vnet.subnet_ids["aks"]]

  tags = var.tags
}

module "aks" {
  source              = "../../modules/aks-cluster"
  name                = module.naming.kubernetes_cluster
  resource_group_name = azurerm_resource_group.main.name
  location            = var.location

  network_config = {
    subnet_id = module.vnet.subnet_ids["aks"]
  }

  app_node_pools = var.node_pools

  tags = var.tags
}

module "key_vault" {
  source              = "../../modules/key-vault"
  name                = module.naming.key_vault
  resource_group_name = azurerm_resource_group.main.name
  location            = var.location

  # Grant AKS managed identity access
  access_policies = {
    aks = {
      object_id          = module.aks.kubelet_identity.object_id
      secret_permissions = ["Get", "List"]
    }
  }

  # Private endpoint
  subnet_id = module.vnet.subnet_ids["database"]

  tags = var.tags
}

# Grant AKS → ACR pull access
resource "azurerm_role_assignment" "aks_acr" {
  principal_id         = module.aks.kubelet_identity.object_id
  role_definition_name = "AcrPull"
  scope                = module.acr.registry_id
}
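
The composition refers to `azurerm_resource_group.main` and a set of input variables that the excerpt does not declare. A sketch of the missing pieces could look like the following. Variable names are taken from the consuming module call in the next section; the types, the CIDR check, and the resource group naming are assumptions.

```hcl
# compositions/web-app/variables.tf (sketch)

variable "project_name" {
  type = string
}

variable "environment" {
  type = string
}

variable "location" {
  type = string
}

variable "vnet_cidr" {
  type = string

  validation {
    condition     = can(cidrhost(var.vnet_cidr, 0))
    error_message = "vnet_cidr must be a valid CIDR block, for example 10.1.0.0/16."
  }
}

variable "aks_subnet_cidr" {
  type = string
}

variable "db_subnet_cidr" {
  type = string
}

variable "node_pools" {
  description = "Passed through to the aks-cluster module's app_node_pools, which enforces the shape"
  type        = any
  default     = {}
}

variable "tags" {
  type    = map(string)
  default = {}
}

# The resource group that every module call in the composition refers to
resource "azurerm_resource_group" "main" {
  name     = "rg-${var.project_name}-${var.environment}" # assumed naming
  location = var.location
  tags     = var.tags
}
```
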

Consuming the Composition

# A team provisions their entire stack in 30 lines:
module "web_app" {
  source = "git::https://github.com/myorg/terraform-modules.git//compositions/web-app?ref=v3.0.0"

  project_name = "order-platform"
  environment  = "production"
  location     = "eastus"

  vnet_cidr       = "10.1.0.0/16"
  aks_subnet_cidr = "10.1.0.0/20"
  db_subnet_cidr  = "10.1.16.0/24"

  node_pools = {
    apps = {
      vm_size   = "Standard_D8s_v5"
      min_count = 3
      max_count = 15
    }
  }

  tags = {
    team        = "order-platform"
    cost-center = "CC-1234"
  }
}

From zero to full production infrastructure in 30 lines. VNet, AKS, ACR, Key Vault, RBAC — all wired together with best practices baked in.


Validation and Testing

Variable Validation

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "vm_size" {
  type = string
  validation {
    condition     = can(regex("^Standard_", var.vm_size))
    error_message = "VM size must be a Standard_ SKU."
  }
}

Pre-commit Hooks

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.86.0
    hooks:
      - id: terraform_fmt        # Format check
      - id: terraform_validate   # Syntax validation
      - id: terraform_tflint     # Linting
      - id: terraform_docs       # Auto-generate README
      - id: terraform_tfsec      # Security scanning

Automated Testing with Terratest

// tests/aks_test.go
package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestAksModule(t *testing.T) {
    t.Parallel()

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../examples/aks-basic",
        Vars: map[string]interface{}{
            "name":     "aks-test-ci",
            "location": "eastus",
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify outputs
    clusterName := terraform.Output(t, terraformOptions, "cluster_name")
    assert.Equal(t, "aks-test-ci", clusterName)

    nodeRG := terraform.Output(t, terraformOptions, "node_resource_group")
    assert.Contains(t, nodeRG, "aks-test-ci")
}
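
Terratest creates real infrastructure, which is thorough but slow and costs money. For checks that only need a plan, such as defaults and input validation, Terraform's built-in `terraform test` command is a lighter complement. The sketch below is an assumption-laden example rather than part of the original test suite: the file path is hypothetical, the subnet ID is a placeholder, and `mock_provider` requires Terraform 1.7 or later.

```hcl
# modules/aks-cluster/tests/validation.tftest.hcl (hypothetical file)
# Run with `terraform test` from the module directory.

# Mocking the provider means no Azure credentials are needed and nothing is created
mock_provider "azurerm" {}

variables {
  name                = "aks-test-ci"
  resource_group_name = "rg-test"
  location            = "eastus"

  network_config = {
    # Placeholder ID; the mocked provider never contacts Azure
    subnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.Network/virtualNetworks/vnet-test/subnets/aks"
  }
}

run "defaults_to_standard_tier" {
  command = plan

  assert {
    condition     = azurerm_kubernetes_cluster.this.sku_tier == "Standard"
    error_message = "sku_tier should default to Standard."
  }
}

run "rejects_uppercase_cluster_name" {
  command = plan

  variables {
    name = "AKS_Invalid"
  }

  # The test passes only if the validation block on var.name rejects this value
  expect_failures = [var.name]
}
```

A reasonable split is plan-only tests like these on every pull request and the Terratest suite before a release tag.
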

CI/CD for Modules

# .github/workflows/module-ci.yml
name: Module CI

on:
  pull_request:
    paths:
      - 'modules/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        module: [aks-cluster, sql-database, key-vault, virtual-network]
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Format
        run: terraform fmt -check -recursive
        working-directory: modules/${{ matrix.module }}

      - name: Terraform Init
        run: terraform init -backend=false
        working-directory: modules/${{ matrix.module }}

      - name: Terraform Validate
        run: terraform validate
        working-directory: modules/${{ matrix.module }}

      - name: Setup TFLint
        uses: terraform-linters/setup-tflint@v4

      - name: Run TFLint
        run: |
          tflint --init
          tflint --recursive
        working-directory: modules/${{ matrix.module }}

      - name: TFSec Security Scan
        uses: aquasecurity/tfsec-action@v1.0.3
        with:
          working_directory: modules/${{ matrix.module }}
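
The `tflint --init` step only does something useful when a `.tflint.hcl` file declares plugins to install. A possible configuration is sketched below. The plugin version is an example, so pin whichever release you have verified, and the extra rules are a suggested selection rather than the author's setup.

```hcl
# .tflint.hcl (sketch)

plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "azurerm" {
  enabled = true
  version = "0.27.0" # example version; check the ruleset's releases before pinning
  source  = "github.com/terraform-linters/tflint-ruleset-azurerm"
}

# Not in the recommended preset; these enforce documentation and naming hygiene
rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_documented_outputs" {
  enabled = true
}

rule "terraform_naming_convention" {
  enabled = true
}
```

TFLint looks for `.tflint.hcl` in the directory it runs in, so with the per-module `working-directory` used in the workflow the file would live in each module directory or be passed explicitly with `--config`.
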

The Module Adoption Checklist

Before releasing a module to your org:

  • [ ] README.md with examples (terraform-docs auto-generates this)
  • [ ] Input validation on all variables
  • [ ] Sensible defaults (works out of the box)
  • [ ] Outputs for everything downstream modules need
  • [ ] Version pinning for providers
  • [ ] Lifecycle rules (ignore autoscaler changes, etc.)
  • [ ] Tags applied consistently to all resources
  • [ ] Examples in an /examples directory
  • [ ] Tests (at minimum, terraform validate)
  • [ ] Security scan passing (tfsec/checkov)
  • [ ] Semantic version tag on release
  • [ ] CHANGELOG.md documenting breaking changes

Build the module library once. Let every team provision infrastructure in minutes instead of weeks. That's the power of reusable Terraform.


Building Terraform modules? Share your patterns in the comments. Follow for more infrastructure-as-code content.
