Most teams discover Terraform drift the hard way — someone runs terraform plan before a deploy and gets a screen full of unexpected changes. By then the drift might have been sitting there for weeks. Maybe longer.
What if you could catch it automatically? Run a scan every few hours, get a Slack message only when something important drifts, and ignore the noise?
That's what this tutorial sets up. By the end, you'll have:
- A GitHub Actions workflow that scans your Terraform infrastructure on a schedule
- Slack alerts that only fire for High and Critical severity drift
- A JSON report saved as an artifact for audit trails
- The whole thing running hands-free, zero maintenance
I'm using tfdrift for this — it's an open-source CLI I built that classifies drift by severity. But the GitHub Actions + Slack pattern works with any drift detection approach.
Prerequisites
Before we start, you'll need:
- A GitHub repo with your Terraform code
- AWS credentials (or whatever cloud provider you use)
- A Slack workspace where you can create a webhook
- Python 3.9+ (tfdrift is a Python CLI)
- About 20 minutes
Step 1: Create a Slack webhook
First, let's set up where the alerts will go.
- Go to api.slack.com/apps
- Click "Create New App" → "From scratch"
- Name it something like "Drift Alerts" and pick your workspace
- Go to "Incoming Webhooks" in the sidebar → toggle it On
- Click "Add New Webhook to Workspace"
- Pick the channel you want alerts in (I use #infra-alerts)
- Copy the webhook URL. It looks like https://hooks.slack.com/services/T00000/B00000/XXXX
Save that URL. We'll need it in a minute.
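Before wiring the webhook into CI, it can help to confirm it actually posts to the channel. Here's a minimal sketch using only the Python standard library. The function names are mine, not part of tfdrift, and it assumes the URL is exported as SLACK_WEBHOOK_URL in your shell:

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str) -> int:
    """Send one test message and return the HTTP status code."""
    request = build_slack_request(webhook_url, "Test message from the drift alert setup")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Read the URL from the environment so it never lands in a committed file.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        print("Slack responded with HTTP", send_test_message(url))
    else:
        print("Set SLACK_WEBHOOK_URL to send a test message")
```

If the message shows up in your channel, the webhook is good to go.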
Step 2: Add secrets to your GitHub repo
You need to store your cloud credentials and Slack webhook as GitHub secrets so they're not exposed in your workflow file.
Go to your repo → Settings → Secrets and variables → Actions → New repository secret
Add these:
AWS_ACCESS_KEY_ID → your AWS access key
AWS_SECRET_ACCESS_KEY → your AWS secret key
AWS_DEFAULT_REGION → us-east-1 (or whatever region you use)
SLACK_WEBHOOK_URL → the webhook URL from Step 1
If you're using AWS SSO or assuming roles, you'll need AWS_ROLE_ARN too — but the basic access key approach works for getting started.
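A secret that was never created shows up in the workflow as an empty string rather than an error, which can make failures confusing. One way to catch that early is a small preflight check. This is a sketch, not something tfdrift ships; the variable list simply mirrors the four secrets above, and it only works if each secret is mapped into the step's env:

```python
import os

# Names taken from the secrets listed above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "SLACK_WEBHOOK_URL",
)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    print("Missing: " + ", ".join(missing) if missing else "All required variables are set")
```

Treating empty values as missing is deliberate, since that is how an undefined secret appears inside a job.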
Step 3: Create the GitHub Actions workflow
Create this file in your repo at .github/workflows/drift-check.yml:
name: Terraform Drift Check
on:
  schedule:
    # Run every 6 hours
    - cron: '0 */6 * * *'
  # Allow manual triggering from the Actions tab
  workflow_dispatch:
jobs:
  drift-check:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          terraform_wrapper: false
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install tfdrift
        run: pip install tfdrift
      - name: Run drift scan
        id: drift_scan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
          # Passed as an env var so the secret is not interpolated into the
          # script and so ${SLACK_WEBHOOK_URL} in .tfdrift.yml can resolve
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          tfdrift scan \
            --path ./infrastructure \
            --format json \
            --output drift-report.json \
            --slack-webhook "$SLACK_WEBHOOK_URL" \
            --quiet
        continue-on-error: true
      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report-${{ github.run_number }}
          path: drift-report.json
          retention-days: 30
      - name: Fail on drift
        if: steps.drift_scan.outcome == 'failure'
        run: |
          echo "::warning::Terraform drift detected! Check the drift report artifact."
          exit 1
Let me walk through what each part does.
The schedule trigger: cron: '0 */6 * * *' runs the scan every 6 hours — at midnight, 6am, noon, and 6pm UTC. You can adjust this. Every hour is '0 * * * *', once a day at midnight is '0 0 * * *'. I find every 6 hours is a good balance between catching drift quickly and not burning through GitHub Actions minutes.
workflow_dispatch: This lets you trigger the scan manually from the Actions tab. Useful for testing or when you want an immediate check.
terraform_wrapper: false: This is important. The hashicorp/setup-terraform action wraps the terraform binary by default, which breaks JSON output parsing. Setting this to false gives you the raw terraform binary.
continue-on-error: true on the scan step: tfdrift exits with code 1 when drift is detected. Without continue-on-error, the workflow would stop and skip the artifact upload. We want to always save the report, then handle the failure separately in the last step.
The --quiet flag: Suppresses terminal output since we're running in CI. The JSON report captures everything we need.
Artifact upload with if: always(): Saves the JSON report regardless of whether drift was found. This gives you an audit trail — you can go back and see what your infrastructure looked like at any point in the last 30 days.
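If you change the schedule trigger, it's easy to misread when a cron expression will actually fire. Here's a rough sketch that expands the hour field for the simple forms used in this post ('*', '*/N', or a single hour). It's a toy helper I'm writing for illustration, not a full cron parser, and it ignores the day, month, and weekday fields:

```python
def expand_hour_field(field: str) -> list[int]:
    """Expand a cron hour field such as '*', '*/6', or '8' into hours of the day."""
    if field == "*":
        return list(range(24))
    if field.startswith("*/"):
        return list(range(0, 24, int(field[2:])))
    return [int(field)]


def daily_run_hours(cron: str) -> list[int]:
    """Return the UTC hours at which a five-field cron expression fires each day."""
    _minute, hour, _day, _month, _weekday = cron.split()
    return expand_hour_field(hour)


print(daily_run_hours("0 */6 * * *"))  # [0, 6, 12, 18]
```

Those four hours match the midnight, 6am, noon, and 6pm UTC runs described for the schedule trigger.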
Step 4: Configure severity filtering
Right now the workflow scans everything and alerts on all drift. To filter out the noise, create a .tfdrift.yml in your repo root:
# .tfdrift.yml
scan:
  paths:
    - ./infrastructure
  exclude:
    - "**/.terraform/**"
    - "**/test/**"
    - "**/modules/**"
notifications:
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    channel: "#infra-alerts"
    min_severity: high  # Only alert on High and Critical
severity:
  critical:
    - aws_security_group.*.ingress
    - aws_security_group.*.egress
    - aws_iam_policy.*.policy
    - aws_iam_role.*.assume_role_policy
    - aws_s3_bucket_public_access_block.*
  high:
    - aws_instance.*.instance_type
    - aws_rds_instance.*.publicly_accessible
    - aws_rds_instance.*.storage_encrypted
The key setting is min_severity: high. This means:
- Critical drift (security groups, IAM policies) → Slack alert immediately
- High drift (instance types, encryption) → Slack alert immediately
- Medium drift → logged in the JSON report but no alert
- Low drift (tags) → logged in the JSON report but no alert
This is what cut our alert volume to 27% of what it had been (a 73% reduction) while still catching 94% of the changes that actually mattered.
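To make the threshold concrete, here's a minimal sketch of how pattern-based classification and a min_severity cutoff could work. This is not tfdrift's actual implementation: the function names, the fnmatch-style wildcard matching, and the medium default for unmatched attributes are all assumptions for illustration. The patterns themselves are copied from the config above.

```python
from fnmatch import fnmatchcase

# Lowest to highest; a higher index means more severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Patterns copied from the .tfdrift.yml example.
RULES = {
    "critical": [
        "aws_security_group.*.ingress",
        "aws_security_group.*.egress",
        "aws_iam_policy.*.policy",
        "aws_iam_role.*.assume_role_policy",
        "aws_s3_bucket_public_access_block.*",
    ],
    "high": [
        "aws_instance.*.instance_type",
        "aws_rds_instance.*.publicly_accessible",
        "aws_rds_instance.*.storage_encrypted",
    ],
}


def classify(attribute_path, rules=RULES, default="medium"):
    """Return the most severe level whose patterns match the attribute path."""
    for severity in reversed(SEVERITY_ORDER):
        for pattern in rules.get(severity, []):
            if fnmatchcase(attribute_path, pattern):
                return severity
    return default  # Assumed fallback for attributes no rule mentions


def should_alert(severity, min_severity="high"):
    """True when severity is at or above the configured threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)


print(classify("aws_security_group.api_sg.ingress"))  # critical
print(should_alert("medium"))  # False
```

Checking the most severe level first means an attribute that matches both a critical and a high pattern is reported as critical.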
Step 5: Add an ignore file
Create .tfdriftignore in your repo root for drift you never want to see:
# Autoscaling — these change every few minutes by design
aws_autoscaling_group.*.desired_capacity
aws_ecs_service.*.desired_count
# Tags managed by external cost allocation tools
*.tags.CostCenter
*.tags.LastModified
*.tags.UpdatedBy
*.tags.ManagedBy
# Terraform internal metadata
*.tags.terraform
*.tags_all.*
Without this file, you'll get constant alerts about autoscaling changes. Those aren't drift — that's the system working correctly.
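If you're unsure whether a pattern will catch what you expect, you can try it out locally. The sketch below assumes the ignore file uses shell-style wildcards with # comments, which is how the example above reads. It is my approximation, not tfdrift's real matcher, and the helper names are hypothetical:

```python
from fnmatch import fnmatchcase


def parse_ignore_patterns(text):
    """Return the non-empty, non-comment lines of an ignore file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(attribute_path, patterns):
    """True if any ignore pattern matches the full attribute path."""
    return any(fnmatchcase(attribute_path, pattern) for pattern in patterns)


EXAMPLE = """\
# Autoscaling
aws_autoscaling_group.*.desired_capacity
# Tags managed by external tools
*.tags.CostCenter
*.tags_all.*
"""

patterns = parse_ignore_patterns(EXAMPLE)
print(is_ignored("aws_autoscaling_group.web.desired_capacity", patterns))  # True
print(is_ignored("aws_security_group.api_sg.ingress", patterns))  # False
```

Running a few real resource addresses through something like this before committing the file is a cheap way to avoid accidentally silencing a security group change.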
Step 6: Test it
Push everything and trigger a manual run:
- Commit and push .github/workflows/drift-check.yml, .tfdrift.yml, and .tfdriftignore
- Go to your repo → Actions tab → Terraform Drift Check → Run workflow
- Watch it run
If there's drift, you'll see:
- A Slack message in #infra-alerts with the severity breakdown
- A JSON artifact attached to the workflow run
- The workflow marked as failed (exit code 1)
If there's no drift:
- No Slack message (nothing to alert about)
- A JSON artifact with an empty drift report
- The workflow marked as passed (exit code 0)
What the Slack alert looks like
When tfdrift finds High or Critical drift, it sends a message like:
⚠️ Terraform Drift Detected — 3 resource(s)
Workspaces scanned: 12
With drift: 2
Severity: critical: 1 | high: 2
🔴 CRITICAL — aws_security_group.api_sg
Action: update | Changed: ingress
🟠 HIGH — aws_instance.web_server
Action: update | Changed: instance_type, ami
🟠 HIGH — aws_rds_instance.primary
Action: update | Changed: publicly_accessible
You immediately know what drifted, how bad it is, and which workspace it's in. No log diving, no running terraform plan manually, no guessing.
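For a sense of how the severity line in that message could be derived, here's a small sketch. The list-of-dicts shape is hypothetical (I'm not reproducing tfdrift's report schema here), and the resources are the three from the example message:

```python
from collections import Counter

# Most severe first, so the summary leads with what matters.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def severity_summary(items):
    """Build a 'Severity: critical: 1 | high: 2' style line from drift items."""
    counts = Counter(item["severity"] for item in items)
    parts = [f"{level}: {counts[level]}" for level in SEVERITY_ORDER if counts[level]]
    return "Severity: " + " | ".join(parts)


# Hypothetical in-memory representation of the three drifted resources.
drifted = [
    {"resource": "aws_security_group.api_sg", "severity": "critical", "changed": ["ingress"]},
    {"resource": "aws_instance.web_server", "severity": "high", "changed": ["instance_type", "ami"]},
    {"resource": "aws_rds_instance.primary", "severity": "high", "changed": ["publicly_accessible"]},
]

print(severity_summary(drifted))  # Severity: critical: 1 | high: 2
```

Levels with a zero count are left out, which keeps the line short when only one or two severities are present.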
Making it production-ready
A few things I'd add for a real production setup:
Multiple environments
If you have separate Terraform directories for dev, staging, and production, you can either scan them all in one workflow or create separate workflows with different schedules:
# Production — check every 2 hours
- cron: '0 */2 * * *'
# Staging — check every 6 hours
- cron: '0 */6 * * *'
# Dev — check once a day
- cron: '0 8 * * *'
You can also use separate .tfdrift.yml files per environment with different severity thresholds. Maybe you only alert on Critical for dev but alert on Medium+ for production.
Branch protection
You can add drift checking to your PR workflow so that drift must be resolved before merging:
on:
  pull_request:
    branches: [main]
    paths:
      - 'infrastructure/**'
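This catches drift before it becomes a deployment surprise. If someone opened a security group in the console and you're about to deploy, the PR pipeline will flag it. To make the check actually block merges, mark it as a required status check in the branch protection rules for main.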
PagerDuty for critical drift
For production infrastructure, you might want Critical drift to page someone instead of just going to Slack:
notifications:
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    min_severity: high
  pagerduty:
    routing_key: ${PAGERDUTY_ROUTING_KEY}
    min_severity: critical
This means High drift goes to Slack, but Critical drift (someone modified a security group or IAM policy) pages the on-call engineer. That's the kind of change you want someone looking at within minutes, not hours.
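Here's a sketch of how per-channel thresholds like these could decide where an alert goes. It assumes min_severity is an inclusive lower bound, as in Step 4, so Critical drift reaches both channels. The function and dictionary names are mine, not tfdrift internals:

```python
# Lowest to highest; a higher index means more severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Thresholds mirror the notifications config above.
CHANNEL_THRESHOLDS = {"slack": "high", "pagerduty": "critical"}


def channels_for(severity, thresholds=CHANNEL_THRESHOLDS):
    """Return the channels whose min_severity is at or below the given severity."""
    rank = SEVERITY_ORDER.index(severity)
    return [
        channel
        for channel, minimum in thresholds.items()
        if rank >= SEVERITY_ORDER.index(minimum)
    ]


print(channels_for("critical"))  # ['slack', 'pagerduty']
print(channels_for("high"))      # ['slack']
print(channels_for("medium"))    # []
```

Under that assumption a Critical change both pages the on-call engineer and leaves a record in the Slack channel.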
HTML reports
For weekly reviews or compliance, you can generate an HTML report:
- name: Generate HTML report
  run: |
    tfdrift report \
      --path ./infrastructure \
      --output drift-report.html
The HTML report gives you a standalone page with severity breakdowns, drifted resources, and workspace details — useful for sharing with security teams or attaching to compliance documentation.
Cost and limits
A few practical things to know:
GitHub Actions minutes: The free tier gives you 2,000 minutes/month for private repositories (standard runners on public repositories are free). A drift scan typically takes 2-5 minutes depending on how many workspaces you have. Running every 6 hours is 4 runs/day × 30 days = ~120 runs/month, and ~120 runs × ~3 minutes = ~360 minutes. Well within the free tier.
Terraform API calls: Each terraform plan makes API calls to your cloud provider to refresh state. AWS doesn't charge for most describe/get API calls, but if you have hundreds of workspaces, the volume might matter. Monitor your AWS CloudTrail to be safe.
Slack rate limits: Slack webhooks are limited to 1 message per second. If you're scanning 50 workspaces and 10 have drift, tfdrift batches them into a single Slack message, so this shouldn't be an issue.
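If you're weighing a tighter schedule, the minutes arithmetic is easy to redo. This sketch assumes a 30-day month and the ~3 minute average scan from above; your own scan time is the number to plug in:

```python
FREE_TIER_MINUTES = 2000  # Monthly allowance quoted above


def monthly_minutes(runs_per_day, minutes_per_run, days=30):
    """Estimate the GitHub Actions minutes a scheduled scan uses per month."""
    return runs_per_day * days * minutes_per_run


for label, runs_per_day in [("every 6 hours", 4), ("every 2 hours", 12), ("hourly", 24)]:
    used = monthly_minutes(runs_per_day, 3)
    print(f"{label}: {used} min/month ({used / FREE_TIER_MINUTES:.0%} of free tier)")
# every 6 hours: 360 min/month (18% of free tier)
# every 2 hours: 1080 min/month (54% of free tier)
# hourly: 2160 min/month (108% of free tier)
```

At a 3 minute average, hourly scans on their own would go past the free allowance, so the 2-hour production schedule is about as tight as you can go before you start paying.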
The full file structure
After setup, your repo should look like:
your-terraform-repo/
├── .github/
│   └── workflows/
│       └── drift-check.yml    # The scan workflow
├── .tfdrift.yml               # Severity rules and notification config
├── .tfdriftignore             # Expected drift exclusions
└── infrastructure/
    ├── production/
    │   ├── main.tf
    │   └── terraform.tfvars
    ├── staging/
    │   └── main.tf
    └── dev/
        └── main.tf
Try it
If you don't have tfdrift installed yet:
pip install tfdrift
tfdrift scan --path ./your-terraform-dir
The GitHub Actions workflow, .tfdrift.yml, and .tfdriftignore templates are all in the tfdrift repo under the examples/ directory.
If you set this up and run into issues, open a GitHub issue — I'd genuinely like to hear about edge cases I haven't hit yet.
This is part 4 of a series on infrastructure drift detection. Part 1: I Built a Free Terraform Drift Detector — Here's Why. Part 2: Why Severity Classification Changes Everything. Part 3: How I Built a Terraform Plan JSON Parser in Python.