DEV Community

Yash
The real reason your staging environment is always broken

"Don't trust staging" is engineering folklore. Teams treat it as inevitable. It's not — it's a symptom.

What actually causes staging drift

1. Manual setup at different times: Prod was provisioned in Q1, staging in Q3, by different engineers following different patterns. Identical at the start, diverging ever since.

2. Hotfix culture: Prod incident at 2 AM. IAM permission patched in the console. Terraform state doesn't know. Staging doesn't have the patch.

3. Cost pressure on non-prod: "We don't need the full setup in staging." Different ALB, smaller RDS, different security groups. "Close enough" becomes "completely different."

4. Nobody owns staging: Prod has on-call. Staging has... whoever notices it's broken.
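Drift from causes like these is detectable before anyone hits it. One approach (a sketch, not from the post; the workflow name, paths, and schedule are assumptions) is a nightly `terraform plan` against staging with `-detailed-exitcode`, which exits 2 whenever real infrastructure no longer matches the code:

```yaml
# .github/workflows/drift-check.yml — hypothetical nightly drift check
name: staging-drift-check
on:
  schedule:
    - cron: "0 6 * * *" # every morning, before anyone deploys
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Detect drift in staging
        working-directory: envs/staging
        run: |
          terraform init -input=false
          # exit 0 = no changes, 2 = plan is non-empty, i.e. infra drifted from code
          terraform plan -detailed-exitcode -input=false
```

A failing run here is the signal that a manual change slipped past the code, in staging or anywhere else you point it.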

The fix: same module, different vars

# prod
module "payment_service" {
  source         = "../../modules/service"
  environment    = "prod"
  instance_count = 3
  db_class       = "db.r6g.large"
}

# staging — SAME MODULE
module "payment_service_staging" {
  source         = "../../modules/service"
  environment    = "staging"
  instance_count = 1
  db_class       = "db.t3.medium"
}
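For this pattern to work, the shared module has to expose scale, and only scale, as variables. A minimal sketch of what `modules/service/variables.tf` could look like (variable names mirror the calls above; the defaults are assumptions):

```hcl
# modules/service/variables.tf — hypothetical interface of the shared module.
# Only scale knobs vary per environment; IAM, security groups, and
# monitoring are defined inside the module, identically everywhere.

variable "environment" {
  type        = string
  description = "Deployment environment, e.g. prod or staging"
}

variable "instance_count" {
  type        = number
  description = "Number of service instances (scale knob)"
  default     = 1
}

variable "db_class" {
  type        = string
  description = "RDS instance class (scale knob)"
  default     = "db.t3.medium"
}
```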

Same module → same IAM → same security groups → same monitoring → same deploy process.

The difference between prod and staging is scale, not structure.

The hotfix rule

Every manual change to prod needs a Terraform change in the same sprint. Non-negotiable.
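Concretely, codifying the 2 AM console patch means writing the resource and adopting the one that already exists, rather than recreating it. A sketch using a Terraform import block (requires Terraform 1.5+; the role, policy, queue, and account identifiers are made up for illustration):

```hcl
# The IAM permission that was clicked into the console at 2 AM, now in code.
# All names here are illustrative, not from a real incident.
resource "aws_iam_role_policy" "payment_hotfix" {
  name = "payment-service-sqs-access"
  role = "payment-service-role"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["sqs:SendMessage"]
      Resource = "arn:aws:sqs:us-east-1:123456789012:payment-events"
    }]
  })
}

# Adopt the existing console-created policy instead of creating a duplicate.
import {
  to = aws_iam_role_policy.payment_hotfix
  id = "payment-service-role:payment-service-sqs-access"
}
```

Once the import has applied cleanly, staging gets the same permission through the shared module, and the drift is closed instead of papered over.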

Named ownership

Staging needs a named owner. Not "the team." A specific person.

Step2Dev provisions staging and prod from the same template — parity by default.

👉 step2dev.com

What's your most memorable "staging was broken and we didn't know" story?
