Glenn Gray

Originally published at graycloudarch.com

Terraform CI Is Green. Here's What It Missed.


The apply produced a diff nobody expected. The plan had been green. The PR had been approved. Two engineers had been moving fast through a Terraform monorepo — module changes, stack updates, new resources in parallel — and the CI was green on every single PR. Nobody saw the problem until the change was already in.

The cause wasn't bad code. It was a CI pattern so common it's nearly a default: run terraform plan only for stacks where files changed in the PR.

That sounds right. It is wrong.

The specific failure: changed-files detection doesn't know about consumers

Here's the shape of a typical monorepo CI setup:

modules/
  network/
    main.tf
stacks/
  prod-vpc/
    main.tf   ← sources from modules/network/
  dev-vpc/
    main.tf   ← sources from modules/network/

A PR modifies modules/network/main.tf. The changed-files action sees changes in modules/network/. At most it runs a plan for modules/network/, which says little about deployed infrastructure because a module directory is not a stack. It does not run a plan for stacks/prod-vpc/ or stacks/dev-vpc/, because those directories have no changed files.

Both of those stacks will produce a different plan when they're next applied. Nobody saw it before merge.

The logic is seductive: why run plans for stacks that haven't changed? But the premise is wrong. A stack that sources a changed module has changed — you just can't see it in the diff. The module change is the diff.
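The gap is easy to reproduce. The sketch below is a hypothetical stand-in for a path filter, not the code of any particular action: the helper name dirs_with_changes is made up for illustration, and the paths come from the layout above.

```python
from pathlib import PurePosixPath


def dirs_with_changes(changed_files):
    """Directories that directly contain a changed file, which is all a path filter sees."""
    return {str(PurePosixPath(f).parent) for f in changed_files}


changed = ["modules/network/main.tf"]
stacks = ["stacks/prod-vpc", "stacks/dev-vpc"]

# A stack is planned only if one of its own files appears in the diff.
planned = [s for s in stacks if s in dirs_with_changes(changed)]
print(planned)  # → []
```

Both stacks consume the changed module, and neither is selected.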

What actually works

Three approaches, in order of correctness:

Run plan for every stack on every PR. Expensive on a large monorepo, but correct. Terragrunt's run-all plan with --terragrunt-parallelism 8 makes this tractable in most codebases. If it's too slow, it's a signal the monorepo has grown past what a single pipeline can handle — and that's a different problem worth surfacing.

Build a dependency graph. Parse source = references to find all consumers of changed modules, and add those stacks to the plan set. This is the right answer architecturally, but it requires build tooling to maintain the graph. Terragrunt's dependency blocks help with stack-to-stack ordering if your declarations are complete, but they do not record which stacks source a given module, so the module-to-consumer edges still have to come from the source references.

Practical middle ground. Run plan for all stacks in the same directory subtree as any changed module. Not as precise as a graph, but catches the most common failure: a module and its primary consumers living near each other in the directory structure. Works well for codebases where modules/ and stacks/ are adjacent siblings and team conventions keep related things together.
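The second and third approaches can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a drop-in tool: the function names (local_module_sources, consumers_of, subtree_stacks) are hypothetical, module sources are assumed to be relative paths written as source = "../..." on one line, changed file paths are assumed to be repo-relative with forward slashes, and only direct consumers are found (a module that sources another module is not followed).

```python
import os
import re
from pathlib import Path, PurePosixPath

# Matches lines such as:  source = "../../modules/network"
SOURCE_RE = re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE)


def local_module_sources(repo_root, stack):
    """Return repo-relative paths of local modules sourced by a stack's .tf files."""
    found = set()
    for tf_file in (Path(repo_root) / stack).glob("*.tf"):
        for src in SOURCE_RE.findall(tf_file.read_text()):
            # Only relative paths are local modules; registry and git sources are skipped.
            if src.startswith(("./", "../")):
                resolved = os.path.normpath(os.path.join(stack, src))
                found.add(resolved.replace(os.sep, "/"))
    return found


def consumers_of(repo_root, changed_files, stacks):
    """Graph approach: stacks that directly source a module containing a changed file."""
    affected = set()
    for stack in stacks:
        for module in local_module_sources(repo_root, stack):
            if any(f.startswith(module + "/") for f in changed_files):
                affected.add(stack)
    return sorted(affected)


def subtree_stacks(changed_files, stacks, modules_name="modules", stacks_name="stacks"):
    """Middle ground: every stack under the stacks/ sibling of a changed modules/ directory."""
    selected = set()
    for f in changed_files:
        parts = PurePosixPath(f).parts
        if modules_name not in parts:
            continue
        # Path components before modules/ identify the shared parent directory.
        parent = parts[: parts.index(modules_name)]
        sibling = PurePosixPath(*parent, stacks_name)
        selected.update(s for s in stacks if sibling in PurePosixPath(s).parents)
    return sorted(selected)
```

Either result would be unioned with the stacks whose own files changed. In the flat layout shown earlier, subtree_stacks selects every stack, which is the point: it trades precision for not needing a graph.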

What doesn't work: paths-filter or the changed-files action scoped to the stack directory. It sees no diff, skips the plan, CI stays green, and the module change is invisible to reviewers until apply runs post-merge.

Three supporting fixes that complete the picture

The module consumer problem is the silent failure mode — it requires a deliberate fix to CI architecture. But there are three other common issues that are cheaper to address and eliminate most of the remaining review friction.

Put the plan in the PR comment, not in the logs.

A plan that lives in the Actions logs requires a reviewer to click through to the workflow run, find the right job, scroll to the plan output, and read it in isolation from the PR diff. Most reviewers don't. They check whether CI is green and click approve.

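- name: Post plan to PR
  run: |
    # Render the saved plan file (tfplan, written by an earlier plan step) and keep the first 200 lines.
    PLAN=$(terraform show -no-color tfplan 2>&1 | head -200)
    # Post the plan as a fenced code block on the pull request.
    gh pr comment ${{ github.event.pull_request.number }} \
      --body "### Terraform Plan
    \`\`\`
    ${PLAN}
    \`\`\`"
  env:
    # The job needs pull-requests: write permission for this token to comment.
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}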

A reviewer who sees the plan inline — showing N resources to add, M to change, 0 to destroy — can make a real decision before clicking approve. The plan comment also becomes a lightweight audit trail: what did we expect to happen, and what actually happened.
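One caveat with truncating via head: Terraform prints its "Plan: N to add, M to change, K to destroy." line near the end of the output, so a long plan can lose exactly the line reviewers scan for. A rough sketch of pulling that line into the comment header before truncating follows. The function names (summarize_plan, build_comment) and the comment layout are illustrative assumptions, and the regex assumes Terraform's usual summary wording.

```python
import re

# Terraform's summary line; newer versions may prefix an "N to import," clause.
SUMMARY_RE = re.compile(
    r"Plan: (?:\d+ to import, )?(\d+) to add, (\d+) to change, (\d+) to destroy\."
)


def summarize_plan(plan_text):
    """Return add/change/destroy counts, or None if no summary line is present."""
    match = SUMMARY_RE.search(plan_text)
    if match is None:
        return None
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def build_comment(stack, plan_text, max_lines=200):
    """Build a PR comment body: summary headline first, then the truncated plan."""
    counts = summarize_plan(plan_text)
    if counts is None:
        headline = "no summary line found (possibly no changes)"
    else:
        headline = "{add} to add, {change} to change, {destroy} to destroy".format(**counts)

    lines = plan_text.splitlines()
    body = "\n".join(lines[:max_lines])
    if len(lines) > max_lines:
        body += f"\n... truncated {len(lines) - max_lines} more lines"

    fence = "`" * 3  # built at runtime so this example nests cleanly in a code block
    return f"### Terraform Plan: {stack}\n**{headline}**\n{fence}\n{body}\n{fence}"
```

The returned string could be written to a file and passed to gh pr comment with --body-file, which avoids shell quoting around the backticks.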

Enable terraform fmt --check. For real this time.

Most codebases have it disabled. The comment is usually # TODO: fix formatting first. The fix is a one-time operation:

terraform fmt -recursive .
git commit -am "terraform fmt: normalize formatting before enforcing check"

Then enable the check as a separate fast job. It runs in under 10 seconds, has no false positives, and eliminates the category of review comments that are pure style — freeing reviewers to focus on substance.

Add tflint with the AWS ruleset.

terraform validate catches syntax errors and internal inconsistencies, including an argument passed to a module that no longer declares that variable. It does not check values against the provider's API, and it does not fail on deprecated resource types, instance types that no longer exist, or an incomplete required_providers block. Invalid values of that kind surface at apply time.

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.32.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

The practical value is catching things that Terraform itself won't catch until it's talking to the AWS API — like an instance_type that was deprecated, or a required_providers block that's incomplete after a module upgrade.

What good Terraform CI looks like end-to-end

Terraform CI pipeline: fmt check → tflint → plan all stacks → post plan to PR comment → apply on merge

PR opened
  → terraform fmt --check        (fast; fails on style)
  → tflint                       (fast; catches deprecated/missing config)
  → terraform plan (all affected stacks)
  → plan posted to PR comment

PR merged
  → terraform apply (gated on approval + merge)

The critical constraint is the last line: apply never runs on open PRs. Plan runs freely and often; apply runs exactly once per PR, after merge, and only on approved changes.

On a monorepo with Terragrunt, run-all plan handles the multi-stack case. The plan comment step posts one comment per stack with a summary header, so reviewers can scan affected stacks without opening each workflow run.
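For a repo without Terragrunt, the same fan-out can be approximated with plain Terraform. The sketch below is an assumption-laden illustration, not what run-all does internally: discover_stacks, terraform_plan, and plan_all are hypothetical names, stacks are assumed to live one level under stacks/, and each stack is assumed to be initialized already. The runner is injectable so the fan-out can be exercised without Terraform installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def discover_stacks(repo_root, stacks_dir="stacks"):
    """Return repo-relative stack directories that contain at least one .tf file."""
    base = Path(repo_root) / stacks_dir
    return sorted(
        d.relative_to(repo_root).as_posix()
        for d in base.iterdir()
        if d.is_dir() and any(d.glob("*.tf"))
    )


def terraform_plan(stack):
    """Run terraform plan in one stack; return (exit code, combined output)."""
    result = subprocess.run(
        ["terraform", f"-chdir={stack}", "plan", "-input=false", "-no-color"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr


def plan_all(stacks, runner=terraform_plan, parallelism=8):
    """Plan every stack with bounded parallelism; map each stack to its runner result."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(runner, stacks))
    return dict(zip(stacks, results))
```

Each entry in the returned mapping is one stack's output, which lines up with posting one comment per stack.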

What "CI is green" actually means

Green CI on a Terraform PR means syntax is valid, the workflow ran, and the specific stacks with changed files produced a plan. It does not mean the change is safe. It does not mean the full blast radius is visible.

The module consumer problem is the clearest example of this gap, but it's not the only one. Infrastructure review requires actually reading the plan — which requires the plan to be somewhere reviewers will look. Green CI that nobody reads is a false signal, and a fast-moving codebase will eventually prove that.

The four fixes here don't require new tools or platform investment. They require deciding that CI should actually help reviewers make decisions, not just confirm the workflow completed.

Working through Terraform CI gaps in a fast-moving monorepo? This is the kind of platform work I do regularly. Get in touch.
