Sudarshan Thakur

I Built tfdrift: Free Terraform Drift Detection With Severity Alerts

I Built a Free Terraform Drift Detector — Here's Why

If you manage Terraform infrastructure, you've probably experienced this: someone tweaks a security group in the AWS console "just for testing," forgets about it, and three months later your terraform apply blows up — or worse, that change silently creates a security hole that nobody catches.

This is infrastructure drift. And there's no good free tool to detect it properly.

So I built tfdrift — a free, open-source CLI that detects Terraform drift, classifies it by severity, and alerts your team.


The problem with "just run terraform plan"

Yes, terraform plan detects drift. But in practice, it falls short for teams managing real infrastructure:

No severity awareness. Someone changed a tag? Same alert as someone opening port 22 to the world. terraform plan treats all changes equally — but they're not. An IAM policy change is a security incident. A tag change is noise.

No multi-workspace scanning. If you have 20 Terraform workspaces across environments, you need to cd into each one and run plan manually. Nobody does this consistently.

No notifications. You only discover drift when you happen to run plan. By then, the damage may already be done.

No ignore rules. Auto-scaling groups constantly change desired_capacity. ECS services change desired_count. These are expected — but plan flags them every time, creating alert fatigue.


What tfdrift does

pip install tfdrift
tfdrift scan --path ./infrastructure

It recursively discovers all Terraform workspaces, runs terraform plan -json on each, parses the output, and gives you a structured report with severity classification:

  • CRITICAL — Security group ingress/egress changes, IAM policy modifications, S3 public access changes
  • HIGH — Instance type changes, RDS public accessibility, encryption config changes
  • MEDIUM — Most other attribute changes
  • LOW — Tags, descriptions

Here's what the output looks like:

⚠️  Drift detected: 7 resource(s) across 2/4 workspace(s)

🔴 critical: 2  🟠 high: 2  🟡 medium: 2  🔵 low: 1

📂 infrastructure/production (5 drifted, 3.2s)
┌──────────┬───────────────────────────────────┬────────┬─────────────────────┐
│ Severity │ Resource                          │ Action │ Changed attributes  │
├──────────┼───────────────────────────────────┼────────┼─────────────────────┤
│ CRITICAL │ aws_security_group.api_sg         │ update │ ingress             │
│ CRITICAL │ aws_iam_role_policy.lambda_exec   │ update │ policy              │
│ HIGH     │ aws_instance.web_server           │ update │ instance_type, ami  │
│ HIGH     │ aws_rds_instance.primary          │ update │ publicly_accessible │
│ LOW      │ aws_s3_bucket.assets              │ update │ tags                │
└──────────┴───────────────────────────────────┴────────┴─────────────────────┘

Now you immediately know what matters.
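Under the hood, a pipeline like this can be approximated with `terraform plan -out` followed by `terraform show -json`, which emits a `resource_changes` list with `before`/`after` values for every resource. Here's a minimal sketch of that discover-plan-parse loop — not tfdrift's actual code, and the helper names are mine:

```python
# Sketch of the scan pipeline: find workspaces, ask Terraform for a
# machine-readable plan, and diff each resource's before/after attributes.
import json
import subprocess
from pathlib import Path

def find_workspaces(root: str):
    """Treat any directory containing .tf files as a workspace."""
    return sorted({p.parent for p in Path(root).rglob("*.tf")})

def changed_attributes(before, after):
    """Top-level attributes whose value differs between the two states."""
    before, after = before or {}, after or {}
    return sorted(k for k in set(before) | set(after)
                  if before.get(k) != after.get(k))

def drifted_resources(plan: dict):
    """Extract (address, actions, changed attrs) from `terraform show -json` output."""
    drifted = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing actually changed for this resource
        drifted.append((rc["address"], actions,
                        changed_attributes(rc["change"]["before"],
                                           rc["change"]["after"])))
    return drifted

def scan(workspace: Path):
    """Plan one workspace and return its drifted resources."""
    subprocess.run(["terraform", "plan", "-out=drift.tfplan"],
                   cwd=workspace, check=True, capture_output=True)
    show = subprocess.run(["terraform", "show", "-json", "drift.tfplan"],
                          cwd=workspace, check=True, capture_output=True)
    return drifted_resources(json.loads(show.stdout))
```

The real tool adds severity classification, ignore rules, and parallel workspace scanning on top of this core loop.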


I tested it against real AWS infrastructure

To validate the tool beyond unit tests, I set up a sample AWS environment with EC2 instances managed by Terraform. I ran tfdrift against it:

tfdrift scan --path ./dev-terraform/development/ec2 --verbose

Within 4 seconds, it detected a drifted EC2 instance — an aws_instance resource that had been modified outside of Terraform. The tool correctly classified it as HIGH severity because instance_type was among the changed attributes:

⚠️  Drift detected: 1 resource(s) across 1/4 workspace(s)

🟠 high: 1

📂 development/ec2/dev-machines (1 drifted, 4.2s)
┌──────────┬──────────────────────┬────────┬────────────────────────────────────┐
│ Severity │ Resource             │ Action │ Changed attributes                 │
├──────────┼──────────────────────┼────────┼────────────────────────────────────┤
│ HIGH     │ aws_instance.this[0] │ create │ instance_type, tags, monitoring,  │
│          │                      │        │ metadata_options, volume_tags      │
└──────────┴──────────────────────┴────────┴────────────────────────────────────┘

That drift had been sitting there undetected. A simple terraform plan would have found it too — but nobody was running plan against that workspace regularly. With tfdrift's watch mode, this could have been caught automatically and sent to Slack within minutes.


Key features

Watch mode — continuous monitoring with Slack alerts:

tfdrift watch --interval 30m --slack-webhook https://hooks.slack.com/services/XXX

Auto-remediation — with safety guards that block destructive changes on critical resources:

tfdrift scan --auto-fix --confirm --env dev

Ignore rules — a .tfdriftignore file for expected drift:

aws_autoscaling_group.*.desired_capacity
aws_ecs_service.*.desired_count
*.tags.LastModified

CI/CD integration — exit codes designed for pipelines (0 = clean, 1 = drift, 2 = error, 3 = remediated), plus a ready-made GitHub Actions workflow.
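A scheduled workflow built around those exit codes might look like this. This is an illustrative sketch, not the workflow shipped with tfdrift — the schedule, step layout, and action versions are my assumptions:

```yaml
# Hypothetical scheduled drift check; exit code 1 from tfdrift fails the job.
name: drift-check
on:
  schedule:
    - cron: "0 */6 * * *"   # every 6 hours
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: pip install tfdrift
      - run: tfdrift scan --path ./infrastructure
```

Because a drifted scan exits non-zero, the scheduled run shows up red and can page whoever is on call.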

JSON and Markdown output — for programmatic use or generating reports:

tfdrift scan --format json --output drift-report.json
tfdrift scan --format markdown --output drift-report.md

How it compares

Feature                   tfdrift   terraform plan   Terraform Enterprise
Actively maintained       ✅         ✅                ✅
Severity classification   ✅         ❌                ❌
Multi-workspace scan      ✅         ❌                ✅
Auto-remediation          ✅         ❌                ❌
Watch mode + alerts       ✅         ❌                ✅
Ignore rules              ✅         ❌                ❌
Cost                      Free      Free             $15K+/yr

The architecture

tfdrift is written in Python and built with a modular architecture:

tfdrift/
├── detectors/       # Drift detection engine (terraform plan parser)
├── reporters/       # Output formatters (JSON, Markdown, table, Slack)
├── remediators/     # Auto-fix logic with safety guards
├── severity.py      # 30+ built-in AWS severity rules
├── config.py        # .tfdrift.yml and .tfdriftignore parser
└── cli.py           # CLI (Click)

The severity engine uses pattern matching to classify drift. For example, any change to aws_security_group.*.ingress is automatically classified as CRITICAL, while *.tags changes are classified as LOW. You can extend these rules in your .tfdrift.yml:

severity:
  critical:
    - aws_security_group.*.ingress
    - aws_iam_policy.*.policy
  high:
    - aws_instance.*.instance_type
    - aws_rds_instance.*.publicly_accessible
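Evaluating rules like these comes down to glob-matching a `resource_type.name.attribute` path against each tier in priority order. A minimal sketch of such a matcher, using Python's `fnmatch` — the real rule engine may differ, and the function name is mine:

```python
# Glob-based severity classification: first matching tier wins,
# anything unmatched falls through to MEDIUM.
from fnmatch import fnmatch

DEFAULT_RULES = {
    "critical": ["aws_security_group.*.ingress", "aws_iam_policy.*.policy"],
    "high": ["aws_instance.*.instance_type", "aws_rds_instance.*.publicly_accessible"],
    "low": ["*.tags", "*.description"],
}
TIERS = ["critical", "high", "medium", "low"]

def classify(resource_type: str, name: str, attribute: str,
             rules=DEFAULT_RULES) -> str:
    """Match '<type>.<name>.<attribute>' against each tier's patterns in order."""
    path = f"{resource_type}.{name}.{attribute}"
    for tier in TIERS:
        if any(fnmatch(path, pattern) for pattern in rules.get(tier, [])):
            return tier
    return "medium"
```

Merging your `.tfdrift.yml` overrides into the defaults is then just a dict update before calling the classifier.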

Why I built this

I'm a DevOps engineer working with AWS, Terraform, and Kubernetes daily. Infrastructure drift was a recurring problem — security-critical changes mixed in with harmless noise, and no good free tool to sort through it.

The existing options were either expensive (Terraform Enterprise at $15K+/year) or too basic (raw terraform plan in a cron job).

I wanted something that could:

  • Tell me what drifted and how bad it is
  • Scan all my workspaces in one command
  • Send me a Slack alert at 2 AM when someone modifies a security group
  • Let me ignore the noise (auto-scaling changes, tag updates)
  • Run in CI/CD and fail the pipeline if critical drift is found

So I built it.


Try it

pip install tfdrift

# Scan your infrastructure
tfdrift scan --path ./your-terraform-dir

# Set up continuous monitoring
tfdrift watch --interval 1h --slack-webhook $SLACK_WEBHOOK

# Generate a config file
tfdrift init

The code is fully open source: github.com/sudarshan8417/tfdrift

I'd love feedback, bug reports, and contributions. If you've dealt with Terraform drift at your organization, I'd especially love to hear what features you'd want next — Azure/GCP support, a web dashboard, PagerDuty integration?

Drop a comment or open an issue on GitHub.


If you found this useful, consider giving the repo a ⭐ on GitHub — it helps other engineers find the tool.

Top comments (1)

Hollow House Institute

This is a solid build.

What stands out is the shift from “detect drift” to “classify how risky it is.”

That’s already closer to control than most setups.

But it also shows the same pattern across systems.

Detection improves.
Response time improves.

The system still continues operating after drift is found.

So drift becomes visible, but not constrained.

That’s the gap between detection and governance.

Right now tfdrift tells you:
what changed
where it changed
how severe it is

The next layer is what happens when severity crosses a Decision Boundary.

Does Escalation trigger automatically?
Does execution pause for critical conditions?
Does Stop Authority exist for high-risk drift?

Without that, the loop is still:

drift → detect → alert → human response

Which works, but only if someone acts fast enough.

Tying severity directly to enforcement conditions would turn this from visibility into control.