Raj Patil

I got tired of manually digging CloudTrail every time Terraform drifted — so I built tf-why

The problem

Every time terraform plan showed drift, I'd see something like this:
  ~ aws_security_group.web will be updated in-place
      ~ ingress = [
          + {
              from_port = 22
              protocol  = "tcp"
              to_port   = 22
            },
        ]

Terraform tells you what changed. It tells you nothing about who
changed it, when, or how.

So I'd open the AWS Console, navigate to CloudTrail, set the time range,
filter by resource, scroll through dozens of unrelated events, and
eventually find the culprit — 20 minutes later.

Every. Single. Time.


What I built

tf-why — a CLI that takes terraform show -json output, correlates each
drifted resource with its AWS CloudTrail events, and gives you
attribution in seconds.

terraform show -json plan.tfplan | tf-why

Output:
aws_security_group.web (ingress rules changed)
├── Changed by: john.doe@company.com
├── When: 2 days ago (2026-05-02 14:32 UTC)
├── Via: AWS Console
└── Event: AuthorizeSecurityGroupIngress

aws_s3_bucket.assets (versioning changed)
├── Changed by: ci-deploy-role
├── When: 1 day ago (2026-05-03 09:15 UTC)
├── Via: AWS CLI / SDK
└── Event: PutBucketVersioning

2 drifted resources found.

That's it.

How it works internally

Three steps:
1. Parse the plan JSON
terraform show -json plan.tfplan outputs clean structured JSON.
I filter resource_changes[] for entries where change.actions
includes "update" — these are your drifted resources.
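The filtering in step 1 boils down to a few lines. This is a rough sketch of the idea, not the actual tf-why source — the function name and the minimal plan shape below are my own:

```python
import json

def find_drifted(plan: dict) -> list:
    """Addresses of resources whose planned change actions include 'update'."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "update" in rc["change"]["actions"]
    ]

# Minimal shape of the plan JSON that matters here:
sample = json.loads("""
{"resource_changes": [
  {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
  {"address": "aws_s3_bucket.logs",     "change": {"actions": ["no-op"]}}
]}
""")
print(find_drifted(sample))  # → ['aws_security_group.web']
```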

2. Map resource to CloudTrail lookup key
Each Terraform resource type maps to a real AWS resource identifier.
For example:

  • aws_s3_bucket.my_bucket → bucket name from change.after.bucket
  • aws_security_group.web → security group ID from change.after.id

I maintain a mapper for 25+ resource types covering S3, EC2, IAM,
RDS, Lambda, ECS, EKS, DynamoDB, SQS, SNS, CloudFront, and more.
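The mapper is essentially a lookup table from Terraform resource type to an extractor over the planned after attributes. A sketch with three illustrative entries (the real tf-why table covers 25+ types; these names are assumptions):

```python
# Hypothetical extractor table: Terraform type -> CloudTrail lookup value.
EXTRACTORS = {
    "aws_s3_bucket": lambda after: after.get("bucket"),
    "aws_security_group": lambda after: after.get("id"),
    "aws_lambda_function": lambda after: after.get("function_name"),
}

def cloudtrail_lookup_value(resource_type, after):
    """Return the ResourceName to look up in CloudTrail, or None if unmapped."""
    extract = EXTRACTORS.get(resource_type)
    return extract(after) if extract else None

print(cloudtrail_lookup_value("aws_s3_bucket", {"bucket": "assets"}))  # → assets
```

Returning None for unmapped types lets the CLI skip a resource gracefully instead of crashing mid-report.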

3. Query CloudTrail

import boto3
from datetime import datetime, timedelta, timezone

client = boto3.client('cloudtrail')

now = datetime.now(timezone.utc)
client.lookup_events(
    LookupAttributes=[{
        'AttributeKey': 'ResourceName',
        'AttributeValue': resource_id
    }],
    StartTime=now - timedelta(days=lookback),
    EndTime=now
)

Filter for write events on the drifted resource, sort by time, and
return the most recent match.
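That last filter-and-sort step can be sketched like this. A minimal version under my own assumptions — the prefix list used to recognize read-only events is illustrative, not the actual tf-why heuristic:

```python
from datetime import datetime, timezone

# Assumed heuristic: events with these name prefixes are read-only.
READ_ONLY_PREFIXES = ("Describe", "Get", "List", "Head")

def most_recent_write(events):
    """Drop read-only events, return the newest remaining one (or None)."""
    writes = [e for e in events
              if not e["EventName"].startswith(READ_ONLY_PREFIXES)]
    return max(writes, key=lambda e: e["EventTime"], default=None)

events = [
    {"EventName": "DescribeSecurityGroups",
     "EventTime": datetime(2026, 5, 2, 14, 0, tzinfo=timezone.utc)},
    {"EventName": "AuthorizeSecurityGroupIngress",
     "EventTime": datetime(2026, 5, 2, 14, 32, tzinfo=timezone.utc)},
]
print(most_recent_write(events)["EventName"])  # → AuthorizeSecurityGroupIngress
```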

What it doesn't do (being honest)

  • AWS only for v1 — Azure Activity Log and GCP Audit Log support is on the roadmap, community PRs welcome
  • 90-day CloudTrail limit — free tier CloudTrail only stores 90 days of event history. Documented upfront.
  • Not 100% perfect ARN matching — some complex resources with nested ARN formats aren't mapped yet; coverage is still expanding

Install

pip install tf-why

Requirements:

  • Python 3.8+
  • AWS credentials with CloudTrail read access

  • A Terraform plan to inspect (terraform show -json plan.tfplan output)

Why open source

Because every team using Terraform hits this exact wall during
incident response and post-mortems. This should exist as a standard
part of the Terraform workflow — not buried inside paid SaaS platforms.

Built this solo. 101 tests, 81% coverage.
Roast it, break it, open issues, send PRs.

GitHub: https://github.com/Raj-glitch-max/tf.why

PyPI: https://pypi.org/project/tf-why

What's the most painful part of your Terraform debugging workflow?
Genuinely curious — next feature might solve yours.
