How We Stopped Infrastructure Drift Between Environments — One Module, One Pipeline, No Exceptions

#terraform #devops #infrastructure #gitlab

Every new client engagement starts the same way.

I open the infrastructure repo and find three separate sets of Terraform config — one for dev, one for staging, one for prod. Sometimes they started as copies of each other. By the time I show up, they've diverged completely. Nobody can tell me when or why.

After 14 years of seeing this pattern, I stopped trying to fix it with discipline. Discipline doesn't scale. The fix is removing the conditions that allow drift in the first place.

Here's the pattern I landed on.

The core idea: one module, one workspace convention

A Terraform workspace is a named, separate state attached to the same configuration. One configuration, three environments, zero separate codebases to keep in sync.

terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

Environment-specific differences — instance sizes, replica counts, feature flags — live in .tfvars files, not scattered through the module itself. The only legitimate differences between environments are the ones I explicitly write down.

# environments/prod.tfvars
instance_type    = "t3.medium"
db_replica_count = 2
min_capacity     = 3

# environments/dev.tfvars
instance_type    = "t3.micro"
db_replica_count = 0
min_capacity     = 1
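
For those .tfvars files to be the single place where environments differ, the module has to declare each value as an input. A minimal sketch of what that could look like; the file name, descriptions, and validation rules are my assumptions for illustration, not part of the original setup:

```hcl
# variables.tf (hypothetical file name)

variable "instance_type" {
  type        = string
  description = "Instance type for this environment, e.g. t3.micro in dev."
}

variable "db_replica_count" {
  type        = number
  description = "Number of database replicas; 0 means none."

  validation {
    condition     = var.db_replica_count >= 0
    error_message = "The db_replica_count value must be zero or greater."
  }
}

variable "min_capacity" {
  type        = number
  description = "Minimum capacity for this environment."

  validation {
    condition     = var.min_capacity >= 1
    error_message = "The min_capacity value must be at least 1."
  }
}
```

None of these variables has a default on purpose. If a run forgets its -var-file, Terraform asks for the value (or fails in a non-interactive run) rather than silently falling back to something nobody reviewed.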

The pipeline enforces promotion order

Changes flow in sequence: dev → staging → prod. If something breaks in dev, it never reaches prod.

Nobody on my team runs terraform apply locally. Everything goes through GitLab CI/CD with a manual gate before every apply.

# .gitlab-ci.yml (simplified; the plan, apply-dev, and apply-staging jobs are omitted)
stages:
  - plan
  - apply-dev
  - apply-staging
  - apply-prod

apply-prod:
  stage: apply-prod
  script:
    - terraform workspace select prod
    - terraform apply -var-file=environments/prod.tfvars -auto-approve
  when: manual
  needs: ["apply-staging"]

The two settings do different jobs. needs: ["apply-staging"] keeps the prod job from becoming runnable until the staging apply has succeeded, and when: manual means it then waits for a person to start it from the pipeline.
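
One thing the job above does not check is that the selected workspace and the var file belong together. A possible guard inside the module, sketched under two assumptions: a hypothetical environment variable that each .tfvars file would set (it is not in the files shown earlier), and Terraform 1.4 or later for the terraform_data resource:

```hcl
# Hypothetical variable: each environments/<name>.tfvars would set
# environment = "<name>" alongside its other values.
variable "environment" {
  type        = string
  description = "Environment these variable values were written for."

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}

# Fails the run if the selected workspace and the var file disagree,
# e.g. workspace "prod" combined with environments/dev.tfvars.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Selected workspace does not match the environment in the var file."
    }
  }
}
```

With something like this in place, a mismatched pair stops with an error before any real resource changes.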

A few things that made this actually work

Tag every resource with the workspace name. Costs me nothing. Saves hours when debugging cost anomalies or access issues later.

locals {
  common_tags = {
    Environment = terraform.workspace
    ManagedBy   = "terraform"
    Team        = var.team_name
  }
}
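
One way to get these tags onto every resource without repeating tags = local.common_tags everywhere is the provider's default_tags block. This is a sketch that assumes the AWS provider, version 3.38 or later; the region is also an assumption, copied from the state bucket example:

```hcl
provider "aws" {
  region = "us-east-1" # assumption: same region as the state bucket

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = local.common_tags
  }
}
```

Tags set directly on a resource are merged with these defaults. A few resource types handle tags differently, so it is worth checking the provider documentation for anything that shows up untagged.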

Keep .tfvars files in the repo, reviewed like code. Environment differences are visible, diffable, and part of the change history. No more mystery configs.

One state backend with a separate state per workspace, not a backend per environment folder. One S3 bucket, workspace-namespaced keys. Clean, simple, auditable.

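terraform {
  backend "s3" {
    bucket = "your-terraform-state"
    # The default workspace is stored at this key. Every other workspace is
    # stored under env:/<workspace>/ in front of it by default,
    # e.g. env:/prod/app/terraform.tfstate
    key    = "app/terraform.tfstate"
    region = "us-east-1"
  }
}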

The result

When a change has gone through dev, I know exactly what's going to happen in prod, because the same module runs there. The only question is whether the variable values are right, and those are in the merge request diff.

Full writeup on the NextLink blog: https://nextlinklabs.com/resources/insights/using-terraform-workspaces-to-keep-infrastructure-consistent-across-environments

If you want more of this — practical DevOps, IaC patterns, AI-assisted engineering — I publish it monthly in the NextLink newsletter: https://nextlinklabs.com/newsletter

Happy to answer questions in the comments — always curious what patterns others are using for environment consistency.