DEV Community

Cover image for Infrastructure Drift: Detecting and Preventing It
Samson Tanimawo
Samson Tanimawo

Posted on

Infrastructure Drift: Detecting and Preventing It

Your Terraform says the firewall rule is A. The cloud console says it's B. Someone changed it manually and didn't tell anyone. This is infrastructure drift, and it's a reliability killer.

How drift happens

The emergency edit. At 2 AM, someone clicks a button in the console to fix an outage. They promise to 'put it in Terraform later.' They don't.

The helpful colleague. Someone from a different team edits a shared resource because it was blocking them. They didn't know it was Terraform-managed.

The console-first culture. Some teams never adopt IaC. They edit everything manually, and then one day someone runs Terraform plan and sees 200 unexpected changes.

The partial migration. You migrated some resources to Terraform but not others. Nobody documented which is which.

Why drift is dangerous

  1. The next Terraform apply will either revert the manual change (surprise!) or fail because of conflicts.
  2. The manual change often has no audit trail. 'Why is this firewall rule open?' No answer.
  3. Disaster recovery becomes a guessing game. You can redeploy from Terraform, but you'll lose the manual edits.
  4. Security audits become impossible because your declared state doesn't match real state.

Detection

Run terraform plan on a schedule. Daily or more. Alert on any unexpected diffs. The first time I set this up, we had 40 drift items in our main AWS account. Now we have zero because the alert forces cleanup within a day.

Cloud-native drift detection. AWS Config, GCP Asset Inventory. These track changes at the cloud level and can flag manual edits independently of IaC tools.

Git as source of truth. If your IaC is in Git and the cloud state differs, Git wins — either manually reconcile or explicitly accept the drift by updating the code.

Prevention

Remove manual edit permissions. This is the nuclear option and the most effective. If engineers can't edit resources manually, drift stops happening. Instead, they have to write Terraform (or equivalent) to make changes.

Yes, this slows down emergency responses. That's why you build fast-path deployment pipelines for IaC — commit, auto-approve, auto-apply within 5 minutes.

Write-only service accounts. Grant humans read-only in prod. Only CI/CD can write. Breakglass accounts exist for emergencies but require audit trail and review.

The cultural shift

The hardest part isn't the tools. It's the mindset shift from 'the console is the source of truth' to 'Git is the source of truth.' Teams used to clicking in the console will resist. Give them better tooling (fast apply pipelines, good Terraform examples) and the resistance fades.

You'll know you've made it when 'put it in Terraform first' becomes the default response to any infrastructure change request. Until then, drift will bite you.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com

Top comments (0)