The real reason your staging environment is always broken
"Don't trust staging" is engineering folklore. Teams treat it as inevitable. It's not — it's a symptom.
What actually causes staging drift
1. Manual setup at different times: Prod was provisioned in Q1, staging in Q3, by different engineers with different habits. Identical at the start. Diverging ever since.
2. Hotfix culture: Prod incident at 2 AM. IAM permission patched in the console. Terraform state doesn't know. Staging doesn't have the patch.
3. Cost pressure on non-prod: "We don't need the full setup in staging." Different ALB, smaller RDS, different security groups. "Close enough" becomes "completely different."
4. Nobody owns staging: Prod has on-call. Staging has... whoever notices it's broken.
The fix: same module, different vars
```hcl
# prod
module "payment_service" {
  source         = "../../modules/service"
  environment    = "prod"
  instance_count = 3
  db_class       = "db.r6g.large"
}

# staging — SAME MODULE
module "payment_service_staging" {
  source         = "../../modules/service"
  environment    = "staging"
  instance_count = 1
  db_class       = "db.t3.medium"
}
```
Same module → same IAM → same security groups → same monitoring → same deploy process.
The difference between prod and staging is scale, not structure.
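To make that concrete, here is a minimal sketch of what the shared module's interface could look like. The variable names match the calls above; the resource body is hypothetical and heavily trimmed:

```hcl
# modules/service/variables.tf — the only knobs environments are allowed to turn
variable "environment" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 1
}

variable "db_class" {
  type    = string
  default = "db.t3.medium"
}

# modules/service/main.tf — structure defined once, shared by every environment
resource "aws_db_instance" "this" {
  identifier     = "payment-${var.environment}"
  engine         = "postgres"
  instance_class = var.db_class
  # security groups, IAM, monitoring, etc. live here too,
  # so staging can never silently get a different shape
}
```

Because scale lives in variables and structure lives in the module, "make staging cheaper" is a one-line change instead of a parallel hand-built stack.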
The hotfix rule
Every manual change to prod needs a Terraform change in the same sprint. Non-negotiable.
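One way to honor that rule (assuming Terraform 1.5 or later, which supports `import` blocks) is to adopt the console-made resource back into state in the follow-up PR. The resource address, role, and policy names below are illustrative:

```hcl
# Hypothetical follow-up to a 2 AM console hotfix: bring the hand-created
# IAM policy under Terraform management instead of leaving it as drift.
import {
  to = aws_iam_role_policy.payment_hotfix
  id = "payment-service-role:payment-hotfix-policy"
}

resource "aws_iam_role_policy" "payment_hotfix" {
  name = "payment-hotfix-policy"
  role = "payment-service-role"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["sqs:SendMessage"]
      Resource = "*"
    }]
  })
}
```

Once the import is applied, the same module change rolls out to staging on the next plan, and the 2 AM fix stops being prod-only folklore.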
Named ownership
Staging needs a named owner. Not "the team." A specific person.
Step2Dev provisions staging and prod from the same template — parity by default.
What's your most memorable "staging was broken and we didn't know" story?