I Built a Free Terraform Drift Detector — Here's Why
If you manage Terraform infrastructure, you've probably experienced this: someone tweaks a security group in the AWS console "just for testing," forgets about it, and three months later your terraform apply blows up — or worse, that change silently creates a security hole that nobody catches.
This is infrastructure drift. And there's no good free tool to detect it properly.
So I built tfdrift — a free, open-source CLI that detects Terraform drift, classifies it by severity, and alerts your team.
The problem with "just run terraform plan"
Yes, terraform plan detects drift. But in practice, it falls short for teams managing real infrastructure:
No severity awareness. Someone changed a tag? Same alert as someone opening port 22 to the world. terraform plan treats all changes equally — but they're not. An IAM policy change is a security incident. A tag change is noise.
No multi-workspace scanning. If you have 20 Terraform workspaces across environments, you need to cd into each one and run plan manually. Nobody does this consistently.
No notifications. You only discover drift when you happen to run plan. By then, the damage may already be done.
No ignore rules. Auto-scaling groups constantly change desired_capacity. ECS services change desired_count. These are expected — but plan flags them every time, creating alert fatigue.
What tfdrift does
pip install tfdrift
tfdrift scan --path ./infrastructure
It recursively discovers all Terraform workspaces, runs terraform plan -json on each, parses the output, and gives you a structured report with severity classification:
- CRITICAL — Security group ingress/egress changes, IAM policy modifications, S3 public access changes
- HIGH — Instance type changes, RDS public accessibility, encryption config changes
- MEDIUM — Most other attribute changes
- LOW — Tags, descriptions
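The heavy lifting happens in the parser. Here's a rough sketch (illustrative only, not tfdrift's actual code) of pulling drifted resources out of the streaming JSON that `terraform plan -json` emits, where drift outside of Terraform appears as messages of type `resource_drift`:

```python
import json

def parse_drift(plan_json_lines):
    """Collect drifted resources from `terraform plan -json` output.

    Terraform emits one JSON object per line; messages whose "type" is
    "resource_drift" describe resources changed outside of Terraform.
    """
    drifted = []
    for line in plan_json_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("type") == "resource_drift":
            change = msg.get("change", {})
            drifted.append({
                "address": change.get("resource", {}).get("addr"),
                "action": change.get("action"),
            })
    return drifted

# A single captured line, for illustration:
sample = ['{"type": "resource_drift", "change": '
          '{"resource": {"addr": "aws_instance.web"}, "action": "update"}}']
print(parse_drift(sample))  # [{'address': 'aws_instance.web', 'action': 'update'}]
```

Each drifted resource then gets run through the severity rules before it lands in the report.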
Here's what the output looks like:
⚠️ Drift detected: 7 resource(s) across 2/4 workspace(s)
🔴 critical: 2 🟠 high: 2 🟡 medium: 2 🔵 low: 1
📂 infrastructure/production (5 drifted, 3.2s)
┌──────────┬───────────────────────────────────┬────────┬─────────────────────┐
│ Severity │ Resource                          │ Action │ Changed attributes  │
├──────────┼───────────────────────────────────┼────────┼─────────────────────┤
│ CRITICAL │ aws_security_group.api_sg         │ update │ ingress             │
│ CRITICAL │ aws_iam_role_policy.lambda_exec   │ update │ policy              │
│ HIGH     │ aws_instance.web_server           │ update │ instance_type, ami  │
│ HIGH     │ aws_rds_instance.primary          │ update │ publicly_accessible │
│ LOW      │ aws_s3_bucket.assets              │ update │ tags                │
└──────────┴───────────────────────────────────┴────────┴─────────────────────┘
Now you immediately know what matters.
I tested it against real AWS infrastructure
To validate the tool beyond unit tests, I set up a sample AWS environment with EC2 instances managed by Terraform. I ran tfdrift against it:
tfdrift scan --path ./dev-terraform/development/ec2 --verbose
In about four seconds, it detected a drifted EC2 instance: an aws_instance resource that had been modified outside of Terraform. The tool correctly classified it as HIGH severity because instance_type was among the changed attributes:
⚠️ Drift detected: 1 resource(s) across 1/4 workspace(s)
🟠 high: 1
📂 development/ec2/dev-machines (1 drifted, 4.2s)
┌──────────┬──────────────────────┬────────┬────────────────────────────────────┐
│ Severity │ Resource             │ Action │ Changed attributes                 │
├──────────┼──────────────────────┼────────┼────────────────────────────────────┤
│ HIGH     │ aws_instance.this[0] │ create │ instance_type, tags, monitoring,   │
│          │                      │        │ metadata_options, volume_tags      │
└──────────┴──────────────────────┴────────┴────────────────────────────────────┘
That drift had been sitting there undetected. A simple terraform plan would have found it too — but nobody was running plan against that workspace regularly. With tfdrift's watch mode, this could have been caught automatically and sent to Slack within minutes.
Key features
Watch mode — continuous monitoring with Slack alerts:
tfdrift watch --interval 30m --slack-webhook https://hooks.slack.com/services/XXX
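Under the hood, a Slack alert is just an HTTP POST of a `{"text": ...}` payload to the incoming-webhook URL. A minimal sketch with hypothetical helper names (not tfdrift's actual internals):

```python
import json
import urllib.request

def build_slack_payload(workspace, drifted):
    """Format a drift summary as a Slack incoming-webhook message."""
    lines = [f"⚠️ Drift detected in `{workspace}`:"]
    for res in drifted:
        lines.append(f"• [{res['severity']}] {res['address']} ({res['action']})")
    return {"text": "\n".join(lines)}

def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Watch mode wraps this in a loop: scan, diff against the last run, and post only when something new drifts, so the channel isn't spammed every interval.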
Auto-remediation — with safety guards that block destructive changes on critical resources:
tfdrift scan --auto-fix --confirm --env dev
Ignore rules — a .tfdriftignore file for expected drift:
aws_autoscaling_group.*.desired_capacity
aws_ecs_service.*.desired_count
*.tags.LastModified
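Matching those rules comes down to glob matching against `resource_address.attribute` paths. A minimal sketch of how that can work (illustrative, using Python's `fnmatch`; not the actual implementation):

```python
from fnmatch import fnmatch

def is_ignored(resource_address, attribute, ignore_patterns):
    """Check a drifted attribute against .tfdriftignore-style glob patterns.

    Patterns match against "<resource_address>.<attribute>", e.g.
    "aws_ecs_service.api.desired_count".
    """
    path = f"{resource_address}.{attribute}"
    return any(fnmatch(path, pattern) for pattern in ignore_patterns)

rules = [
    "aws_autoscaling_group.*.desired_capacity",
    "aws_ecs_service.*.desired_count",
    "*.tags.LastModified",
]
print(is_ignored("aws_ecs_service.api", "desired_count", rules))   # True
print(is_ignored("aws_security_group.api_sg", "ingress", rules))   # False
```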
CI/CD integration — exit codes designed for pipelines (0 = clean, 1 = drift, 2 = error, 3 = remediated), plus a ready-made GitHub Actions workflow.
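With that contract, a pipeline wrapper only needs to branch on the return code. A hypothetical sketch (the exit codes come from the list above; the wrapper itself is illustrative):

```python
import subprocess
import sys

# Exit-code contract documented by tfdrift:
EXIT_CLEAN, EXIT_DRIFT, EXIT_ERROR, EXIT_REMEDIATED = 0, 1, 2, 3

def gate_pipeline(path="./infrastructure"):
    """Run tfdrift in CI and fail the build only on real drift or errors."""
    result = subprocess.run(["tfdrift", "scan", "--path", path])
    if result.returncode == EXIT_CLEAN:
        print("No drift — proceeding.")
    elif result.returncode == EXIT_REMEDIATED:
        print("Drift found and auto-remediated — proceeding.")
    elif result.returncode == EXIT_DRIFT:
        sys.exit("Drift detected — failing the pipeline.")
    else:
        sys.exit("tfdrift scan failed.")
```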
JSON and Markdown output — for programmatic use or generating reports:
tfdrift scan --format json --output drift-report.json
tfdrift scan --format markdown --output drift-report.md
How it compares
| Feature | tfdrift | terraform plan | Terraform Enterprise |
|---|---|---|---|
| Actively maintained | ✅ | ✅ | ✅ |
| Severity classification | ✅ | ❌ | ❌ |
| Multi-workspace scan | ✅ | ❌ | ✅ |
| Auto-remediation | ✅ | ❌ | ✅ |
| Watch mode + alerts | ✅ | ❌ | ✅ |
| Ignore rules | ✅ | ❌ | ❌ |
| Cost | Free | Free | $15K+/yr |
The architecture
tfdrift is written in Python and built with a modular architecture:
tfdrift/
├── detectors/ # Drift detection engine (terraform plan parser)
├── reporters/ # Output formatters (JSON, Markdown, table, Slack)
├── remediators/ # Auto-fix logic with safety guards
├── severity.py # 30+ built-in AWS severity rules
├── config.py # .tfdrift.yml and .tfdriftignore parser
└── cli.py # CLI (Click)
The severity engine uses pattern matching to classify drift. For example, any change to aws_security_group.*.ingress is automatically classified as CRITICAL, while *.tags changes are classified as LOW. You can extend these rules in your .tfdrift.yml:
severity:
  critical:
    - aws_security_group.*.ingress
    - aws_iam_policy.*.policy
  high:
    - aws_instance.*.instance_type
    - aws_rds_instance.*.publicly_accessible
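Conceptually, the classifier walks the rules from most to least severe and returns the first glob match, falling back to MEDIUM. A simplified sketch (illustrative only; the rule set mirrors the config above):

```python
from fnmatch import fnmatch

# Rule order matters: more severe buckets are checked first.
SEVERITY_RULES = {
    "critical": ["aws_security_group.*.ingress", "aws_iam_policy.*.policy"],
    "high": ["aws_instance.*.instance_type", "aws_rds_instance.*.publicly_accessible"],
    "low": ["*.tags", "*.description"],
}

def classify(resource_address, attribute):
    """Return the severity for a drifted attribute, defaulting to medium."""
    path = f"{resource_address}.{attribute}"
    for severity, patterns in SEVERITY_RULES.items():
        if any(fnmatch(path, p) for p in patterns):
            return severity
    return "medium"

print(classify("aws_security_group.api_sg", "ingress"))  # critical
print(classify("aws_s3_bucket.assets", "tags"))          # low
```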
Why I built this
I'm a DevOps engineer working with AWS, Terraform, and Kubernetes daily. Infrastructure drift was a recurring problem — security-critical changes mixed in with harmless noise, and no good free tool to sort through it.
The existing options were either expensive (Terraform Enterprise at $15K+/year) or too basic (raw terraform plan in a cron job).
I wanted something that could:
- Tell me what drifted and how bad it is
- Scan all my workspaces in one command
- Send me a Slack alert at 2 AM when someone modifies a security group
- Let me ignore the noise (auto-scaling changes, tag updates)
- Run in CI/CD and fail the pipeline if critical drift is found
So I built it.
Try it
pip install tfdrift
# Scan your infrastructure
tfdrift scan --path ./your-terraform-dir
# Set up continuous monitoring
tfdrift watch --interval 1h --slack-webhook $SLACK_WEBHOOK
# Generate a config file
tfdrift init
The code is fully open source: github.com/sudarshan8417/tfdrift
I'd love feedback, bug reports, and contributions. If you've dealt with Terraform drift at your organization, I'd especially love to hear what features you'd want next — Azure/GCP support, a web dashboard, PagerDuty integration?
Drop a comment or open an issue on GitHub.
If you found this useful, consider giving the repo a ⭐ on GitHub — it helps other engineers find the tool.
Top comments (1)
This is a solid build. What stands out is the shift from “detect drift” to “classify how risky it is.” That's already closer to control than most setups.

But it also shows the same pattern across systems: detection improves, response time improves, yet the system continues operating after drift is found. Drift becomes visible, but not constrained. That's the gap between detection and governance.

Right now tfdrift tells you what changed, where it changed, and how severe it is. The next layer is what happens when severity crosses a decision boundary: does escalation trigger automatically? Does execution pause for critical conditions? Does stop authority exist for high-risk drift?

Without that, the loop is still drift → detect → alert → human response, which works, but only if someone acts fast enough. Tying severity directly to enforcement conditions would turn this from visibility into control.