Daniel Westgaard

Posted on Jul 3 • Edited on Jul 18 • Originally published at riftmap.dev

How to add a blast-radius gate to your merge pipeline

#blastradius #cicd #platformengineering #githubactions

A pull request to a repository that a hundred others build on should not merge with one approval from a phone. Here is a CI gate that routes the review by measured downstream exposure, in two HTTP calls and about forty lines, on GitLab CI or GitHub Actions.

Someone opens a one-line pull request. It bumps the default in a shared Terraform module, or edits the FROM line in a base image, or changes an include in a CI template. The plan is clean. The diff is three characters. CI goes green, one reviewer approves on their phone between meetings, and it merges. Then the next terraform init in six other repositories resolves the new version, and the people who own those repositories find out from their own pipelines.

The change was correct in isolation. What went wrong was the review. A repository that a hundred others build on had exactly one person look at the thing before it shipped, and that person had no way to see, from inside the pull request, who was standing downstream.

The industry's answer to this has arrived as a wave of pre-merge blast-radius gates, and they are worth taking seriously. Overmind ships a GitHub Action that submits each pull request's Terraform plan and comments the blast radius straight onto the PR. An open-source project reads live dependency relationships out of AWS Config and fails the build with a threshold gate when a change fans out too far. Amazon's answer, after its own change-failure numbers moved, was blunter: require senior sign-off on AI-assisted changes from junior and mid-level engineers. Three gates, and every one of them checks a different graph.

A blast-radius merge gate is only ever as good as the graph it queries. And for the class of change that most needs a second pair of eyes, a base image bump, a shared module rename, a CI-template edit, the graph you want is the artifact graph: which repositories declare a build-time dependency on the thing this pull request changes. That is the graph none of the gates above reads, because a FROM bump has no Terraform plan and no running resource and no code symbol, and it is the one you can query from CI today in two HTTP calls. This post is Post A made operational: the three-graph argument, turned into a job you can paste into a pipeline.

The gate is two GET requests

The whole gate is two GET requests and a threshold. You need one thing that is not in the pipeline, a Riftmap graph of your organisation, built by a one-off read-only scan I will come back to at the end. Given that, the gate resolves itself, because both platforms hand a CI job the repository's own path for free. It is $CI_PROJECT_PATH on GitLab and ${{ github.repository }} on GitHub Actions, and that is exactly what the lookup call takes. Nested GitLab subgroups are included: on a project at platform/runtime/base-images, $CI_PROJECT_PATH is that whole three-segment path, the same form the scan stores, so the lookup matches nested namespaces without any massaging.

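# REPO_PATH is the repository's own path: $CI_PROJECT_PATH on GitLab, $GITHUB_REPOSITORY on GitHub Actions.
# 1. Resolve owner/repo to its Riftmap id.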
REPO_ID=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
  "https://api.riftmap.dev/api/v1/repositories/lookup?full_path=$REPO_PATH" | jq -r '.id')

# 2. Ask who declares a dependency on it.
curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
  "https://api.riftmap.dev/api/v1/repositories/$REPO_ID/impact?max_depth=3&min_confidence=0.8"

The impact call walks the dependency graph outward from your repository and returns every repository that depends on it, each tagged with a depth and a confidence, plus a total_affected count. Depth 1 is who breaks first: the repositories whose manifests name yours directly, a source block resolving to your module, a FROM line pinned to your image, an include pointing at your template. Deeper hops are the amplification. min_confidence defaults to 0.8, which drops the heuristic matches and keeps the edges Riftmap parsed rather than guessed, and for a gate you want it there. (That number is resolution confidence, not existence probability, and the 0.8 floor is doing something more specific than it looks; a companion post works through how a tool knows an edge exists at all and what the score actually measures.)

One thing has to be honest before you wire this to anything, because it decides whether the whole idea is useful or noise. The count is the standing consumer population as of the last scan, at the level of the whole repository. It is not a diff of which consumers your specific change breaks. A repository with 147 downstream consumers returns 147 whether this pull request renames an output every one of them uses or fixes a typo in a comment. So this is a gate on exposure, not on breakage. Read the rest of this post with that framing and it stays sharp. Sell it to your team as a breakage detector, and the first person to run it on a busy shared repository will watch it fire on every pull request, including their own README fix, and quietly conclude the tool is broken. It is not measuring danger. It is measuring how many people a mistake here could reach.

The GitLab CI recipe

Here is the entire gate as a GitLab CI job that runs on every merge request. It needs one thing configured, a masked CI/CD variable called RIFTMAP_API_KEY holding a read-only Riftmap key (mint one labelled ci so you can revoke it independently of the rest).

# .gitlab-ci.yml
blast-radius:
  stage: test
  image: alpine:3.20
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  variables:
    RIFTMAP_BASE_URL: "https://api.riftmap.dev/api/v1"
    THRESHOLD: "10"                    # direct consumers that warrant the review lane
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
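      # GitLab hands the job this repo's path for free.
      # "|| true" stops a failed lookup (for example a 404 under the runner's pipefail)
      # from aborting the job before the guard below can report it.
      REPO_ID=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
        "$RIFTMAP_BASE_URL/repositories/lookup?full_path=$CI_PROJECT_PATH" | jq -r '.id' || true)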

      # A repo Riftmap has not scanned yet returns nothing. Skip loudly rather than pass silently.
      if [ -z "$REPO_ID" ] || [ "$REPO_ID" = "null" ]; then
        echo "Repo not in the Riftmap graph yet; skipping blast-radius check."
        exit 0
      fi

      IMPACT=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
        "$RIFTMAP_BASE_URL/repositories/$REPO_ID/impact?max_depth=3&min_confidence=0.8")

      DIRECT=$(echo "$IMPACT" | jq '[.affected_repositories[] | select(.depth == 1)] | length')
      TOTAL=$(echo "$IMPACT"  | jq '.total_affected')

      echo "Downstream consumers: $DIRECT direct, $TOTAL transitive."
      if [ "$DIRECT" -ge "$THRESHOLD" ]; then
        echo "Over threshold ($THRESHOLD): this change touches a high-fan-in repository."
      fi

That is the complete gate. It resolves the repository, asks who depends on it, counts the direct consumers, and prints the number. It makes only GET requests, so it never trips Riftmap's rate limits, which apply to writes and not reads, and it holds no cloud credentials, because the cloud was never involved. The one guard that earns its place is the empty-REPO_ID check: a repository the scan has not reached yet, and every brand-new repository is one of those, returns nothing from the lookup, and without the guard the rest of the job would quietly compute nothing and go green while printing blank counts. That is the single worst failure mode a gate can have, looking like it ran and found no exposure when it simply never ran. Skipping loudly keeps the gate honest about the one thing it is entitled to speak on, which is repositories Riftmap has actually scanned. The job then exits 0, which is deliberate. What to do with the number is the next section, and blocking the merge is the option you should reach for last, not first.
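
The same silent-green failure can hide one step later, if the impact response comes back malformed and DIRECT or TOTAL ends up empty or as the literal string null. Here is a minimal sketch of a guard for that, assuming you would rather report "unknown" than print a blank count. The helper names is_count and check_counts are mine, not part of Riftmap or either CI platform:

```shell
# is_count: succeeds only when $1 is a non-empty string of digits.
# Hypothetical helper, not part of Riftmap or either CI platform.
is_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# check_counts DIRECT TOTAL: prints one status line for the job log.
# "unknown" means the gate could not measure, which is not the same as zero exposure.
check_counts() {
  direct="$1"; total="$2"
  if is_count "$direct" && is_count "$total"; then
    echo "measured: $direct direct, $total total"
  else
    echo "unknown: impact response was missing or malformed"
  fi
}
```

Call check_counts "$DIRECT" "$TOTAL" right after the two jq lines, and skip the threshold comparison whenever it reports unknown.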

The same gate in GitHub Actions

The GitHub Actions version is the identical two calls wearing GitHub's pull-request plumbing. curl and jq are already on the ubuntu-latest runner, so there is no install step, and the repository path arrives as ${{ github.repository }}.

# .github/workflows/blast-radius.yml
name: Blast radius
on:
  pull_request:
    branches: [main]

jobs:
  blast-radius:
    runs-on: ubuntu-latest
    env:
      RIFTMAP_BASE_URL: https://api.riftmap.dev/api/v1
      THRESHOLD: "10"
    steps:
      - name: Measure downstream exposure
        env:
          RIFTMAP_API_KEY: ${{ secrets.RIFTMAP_API_KEY }}
        run: |
          # GitHub hands the job this repo's path for free.
          REPO_ID=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
            "$RIFTMAP_BASE_URL/repositories/lookup?full_path=${{ github.repository }}" | jq -r '.id')

          # A repo Riftmap has not scanned yet returns nothing. Skip loudly rather than pass silently.
          if [ -z "$REPO_ID" ] || [ "$REPO_ID" = "null" ]; then
            echo "Repo not in the Riftmap graph yet; skipping blast-radius check."
            exit 0
          fi

          IMPACT=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
            "$RIFTMAP_BASE_URL/repositories/$REPO_ID/impact?max_depth=3&min_confidence=0.8")

          DIRECT=$(echo "$IMPACT" | jq '[.affected_repositories[] | select(.depth == 1)] | length')
          TOTAL=$(echo "$IMPACT"  | jq '.total_affected')

          echo "### Blast radius: $DIRECT direct consumers, $TOTAL transitive" >> "$GITHUB_STEP_SUMMARY"

Store the key as an Actions secret named RIFTMAP_API_KEY. Same forty lines, same two calls, same read-only key. The only real difference between the platforms shows up when you want the gate to say something on the pull request rather than in the job log, which is where they diverge, and it is worth being exact about why.

The self-updating version of CODEOWNERS

The useful version of this gate does not block the merge. It routes the review. The instinct with a number and a threshold is to fail the build, and that instinct is the one thing to unlearn here, because the far more valuable job the number can do is decide who should be looking at the change.

Think about what CODEOWNERS already does for you. It routes review by path: touch files under modules/, and the platform team is added as a reviewer automatically. This gate routes review by measured downstream exposure instead: this repository currently has N repositories building on it, so a change to it deserves an owner of that shared surface on the pull request, not just whoever opened it. The difference from a hand-written CODEOWNERS line is that the number tracks the graph. A module that grows from three consumers to forty crosses your threshold on its own, the day the fortieth repository adds the dependency, with nobody remembering to edit a rule. A module that loses its consumers drops out of the lane the same way. It is CODEOWNERS that maintains itself against what is actually downstream, rather than against what someone believed was downstream the last time they touched the file.

That reframing is also why the exposure-not-breakage limit from earlier stops mattering. Routing review by exposure never needed to know whether your change was breaking. It only needs to know how many teams are downstream, because that is what makes pulling in a senior reviewer proportionate. You are sizing a coordination cost, not predicting a failure, and the count is exactly the right instrument for sizing a coordination cost.

Underneath the routing sits a policy split I proposed in an earlier post and never actually shipped. A change with no external consumers gets the fast lane, because there is nothing downstream to coordinate and speed is free. A change under the threshold passes with the consumer list posted as a courtesy, so the author knows what they are near. A change over the threshold, or one that touches a repository tagged customer-critical, engages the review lane. Amazon reached for seniority as its proxy because seniority is trivial to encode: junior author, therefore review. Downstream exposure is the proxy Amazon actually meant. A senior engineer changing a shared base image needs the extra eyes more than a junior engineer fixing a log line in a leaf service, and only the graph can tell those two apart.
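
The three lanes reduce to one small function. This is a sketch under assumptions, not shipped policy: pick_lane is a hypothetical name, the customer-critical flag is passed in as a plain true/false because where that tag lives is up to you, and "over the threshold" is read as -ge to match the comparison the recipes above already use.

```shell
# pick_lane DIRECT THRESHOLD [CRITICAL]
#   DIRECT    - count of depth-1 consumers from the impact call
#   THRESHOLD - the review-lane cutoff
#   CRITICAL  - "true" if the repository is tagged customer-critical
#               (assumption: the caller looks this up; the post does not say where it lives)
pick_lane() {
  direct="$1"; threshold="$2"; critical="${3:-false}"
  if [ "$critical" = "true" ] || [ "$direct" -ge "$threshold" ]; then
    echo "review"     # pull in an owner of the shared surface
  elif [ "$direct" -eq 0 ]; then
    echo "fast"       # nothing downstream to coordinate
  else
    echo "courtesy"   # pass, but post the consumer list
  fi
}
```

A job could then branch on the result, for example LANE=$(pick_lane "$DIRECT" "$THRESHOLD"), and only label or comment when the lane is review or courtesy.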

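Turning the number into a review is where the platforms differ. On GitHub, the built-in token does it for free. Add permissions: { contents: read, pull-requests: write } to the job and one step. Two details matter because it is a separate step. Shell variables do not survive from one step to the next, so the measuring step has to end by exporting its counts, with echo "DIRECT=$DIRECT" >> "$GITHUB_ENV" and the same line for TOTAL. And because the workflow never checks the repository out, gh needs GH_REPO to know which repository the pull request belongs to:

        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}   # no checkout in this job, so tell gh which repo
        run: |
          PR=${{ github.event.pull_request.number }}
          # DIRECT and TOTAL arrive via $GITHUB_ENV from the measuring step;
          # default to 0 so a skipped measurement does not break the comparison.
          # The blast-radius/high label must already exist in the repository.
          if [ "${DIRECT:-0}" -ge "$THRESHOLD" ]; then
            gh pr edit "$PR" --add-label "blast-radius/high"
            gh pr comment "$PR" --body \
              "This change affects **$DIRECT** repositories directly ($TOTAL transitively). Engaging the shared-artifact review lane."
          fi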

On GitLab the merge-request notes API needs a token with api scope, and the pipeline's own $CI_JOB_TOKEN will not post notes, so store a project access token as a masked variable (GITLAB_NOTE_TOKEN) and call the notes endpoint directly:

      curl -sf --request POST \
        --header "PRIVATE-TOKEN: $GITLAB_NOTE_TOKEN" \
        --data-urlencode "body=This change affects $DIRECT repositories directly ($TOTAL transitively)." \
        "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes"

Here is what that comment looks like on a pull request to a genuinely high-fan-in repository. When Riftmap scanned Cloud Posse, terraform-null-label came back with 147 direct consumers, 61% of the whole organisation:

Blast radius: 147 repositories build on this

This change is to terraform-null-label, which 147 repositories in the organisation declare as a direct dependency. Engaging the shared-artifact review lane.

That is the comment the forty-line recipe produces from the impact call alone. Pull the consumer view for the artifact as well, the same shape the worked example in Post A returns, and the comment can carry the version detail that tells a reviewer where the coordination actually lands: of those 147, 138 are on the latest tag and 9 are lagging across six older ones. The reviewer arrives already knowing there are nine repositories to nudge onto the new version, not 147 to panic about. That is the difference between an exposure number and a coordination plan, and it is why the number is worth surfacing where a human will read it.
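
The arithmetic behind that sentence is small enough to show. A sketch, assuming you have already pulled the three counts out of the consumer view; its response shape is not shown in this post, so the function takes plain numbers, and comment_body is a hypothetical name:

```shell
# comment_body REPO DIRECT ON_LATEST OLDER_TAGS
# Builds the comment text from counts the caller has already extracted.
# No API parsing happens here; all four inputs are assumptions about what you have in hand.
comment_body() {
  repo="$1"; direct="$2"; on_latest="$3"; older_tags="$4"
  lagging=$((direct - on_latest))   # consumers not yet on the latest tag
  printf '%s repositories declare %s as a direct dependency. ' "$direct" "$repo"
  printf '%s are on the latest tag; %s are lagging across %s older tags.\n' \
    "$on_latest" "$lagging" "$older_tags"
}
```

With the Cloud Posse figures, comment_body terraform-null-label 147 138 6 reports nine lagging consumers, which is the number the reviewer actually has to act on.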

Those numbers are a real scan, not a hypothetical. The whole org behind them (why the 147 stays flat at depth one, where the 133 identical context.tf lines live, and what grep and a symbol graph each return for the same module) is walked through in "What 242 Cloud Posse repos actually depend on".

If you do want the gate to hard-stop a merge, that is a one-line change and an opt-in, not the default. Exit non-zero over the threshold, and mark the job as a required check in branch protection or merge-request approval settings:

      [ "$DIRECT" -ge "$THRESHOLD" ] && exit 1 || true

Start with the label and the comment. Reach for exit 1 only on the handful of repositories where a large blast radius genuinely should stop the world, and even then, expect to spend a week tuning THRESHOLD before anyone trusts a red check that came from a consumer count.

One last discipline, because the neighbours are already crossing it. The open-source AWS Config gate has an ai-gate mode, and Port's guide has an LLM reason over catalogue relations and score the risk. Both are reasonable, and a written risk narrative is a genuinely nice thing to drop into a pull request. But it is a judgement, and you should not block a merge on a judgement that can vary between two runs on the same diff. The consumer count is not a judgement. It is a graph traversal that is either right or wrong about who declares the dependency, and it is the part you can safely automate a routing decision on. Gate on the enumeration. Treat the narrative as advice.

Where this gate stops

This gate sees one layer of dependency, and it has three limits. Being exact about all three is the difference between a tool your team keeps and one they mute.

The first is the one already covered: it enumerates, it does not diff. It answers "who is downstream," not "does this change break them." You keep the false alarms down by holding min_confidence at 0.8 and, if a particular repository is noisy, by only running the job when the files that actually declare interfaces change, through paths: on GitHub or changes: on GitLab. What you never do is describe it as catching breaking changes, because it does not, and the copy that says it does is the copy that gets the tool uninstalled.
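
If you would rather keep that filter inside the script than in platform config, the same idea is a few lines of shell. This is a sketch with made-up patterns: touches_interface is a hypothetical helper, and which files count as interface-declaring (*.tf, Dockerfile, and a templates/ directory here) is an assumption you should replace per repository.

```shell
# touches_interface: reads changed file paths on stdin, one per line, and
# succeeds if any of them looks like a file that declares an interface.
# The patterns below are illustrative assumptions; tune them per repository.
touches_interface() {
  while IFS= read -r path; do
    case "$path" in
      *.tf|Dockerfile|*/Dockerfile|templates/*.yml) return 0 ;;
    esac
  done
  return 1   # nothing interface-shaped changed
}
```

Feed it the pull request's changed files, for instance the output of git diff --name-only against the target branch, which needs a checkout step the recipes above do not include. When it returns non-zero, print a skip message and exit 0 before making either API call.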

The second is freshness. The graph is a scan artifact, not live state, and a gate makes a merge decision, so eventual consistency with your scan cadence is a sharper caveat here than the same lag would be in a dashboard. State it symmetrically and it reads as engineering judgement rather than apology. Overmind and the AWS Config gate read live state, so their answer is current to the second, and they pay for it: both need cloud credentials inside the pipeline and a real plan step before they can say anything at all. Riftmap reads a graph built by a scan, so the answer is current to your scan cadence, and that is precisely why it is two GET requests with no cloud credentials in CI and no terraform plan in the job. One buys freshness with access. The other buys cheapness with lag. For a coarse routing decision on a mature shared artifact, whose fan-in barely moves week to week, the lag is not what bites you, and you can surface it explicitly anyway. Every repository the lookup returns carries last_scanned_at and last_activity_at, so keep the whole lookup payload instead of pulling out only the id, and check the two against each other:

# capture the whole lookup response, not just .id
REPO=$(curl -sf -H "X-API-Key: $RIFTMAP_API_KEY" \
  "$RIFTMAP_BASE_URL/repositories/lookup?full_path=$REPO_PATH")
REPO_ID=$(echo "$REPO" | jq -r '.id')

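# Compares the two timestamps as strings, which assumes both use the same ISO-8601 format and timezone.
STALE=$(echo "$REPO" | jq '.last_activity_at > .last_scanned_at')
# An if block rather than "[ ... ] && echo": as the last line of a script, a false test would exit non-zero and fail the job.
if [ "$STALE" = "true" ]; then
  echo "Note: this repo has changed since Riftmap last scanned it; the count may be behind."
fi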

The third is scope, and it is where the honest line with the live-state tools lives. This gate is artifact-scoped and source-scoped. Overmind's is plan-scoped and runtime-scoped. Overmind sees the resource someone created in the console that your plan is about to touch, and the GitLab Blast Radius Reviewer walks the symbol graph and prunes any change with no public symbols, so it catches an exported function's callers and never sees a FROM bump at all. This gate sees the six repositories whose FROM line resolves to the image you just rebuilt, and cannot see a runtime HTTP call that no manifest declares. A serious platform team at scale plausibly wants more than one of these running side by side, because they are blind in opposite directions, and the artifact layer is the one that has been empty until now.

The pull requests nobody looks at twice

A merge gate is a bet about which mistakes are worth stopping to look at. The blast-radius gates shipping this year all make that bet on a graph, and the graph decides which mistakes the gate can even see. Bet on the live-cloud graph and you catch the console resource and miss the cross-repo consumer. Bet on the artifact graph and you catch the change your reviewer, and your coding agent, both think is safe: the FROM line, the source ref, the shared module whose consumer list nobody has counted since the person who set it up handed in their notice. And it matters more for the agent than the human, because a person opening that pull request at least half-remembers what is downstream, whereas an agent making the same change from one repository's clone knows nothing about the other repositories at all. Wire the number to a review lane, and the pull requests with the largest blast radius stop being the ones that merge with a single approval from a phone. They become the ones the right person was pulled in to see.

None of this runs until Riftmap has a graph of your organisation to answer the two calls, and that graph is the part worth having. It is one read-only token across your GitLab group or GitHub organisation, no per-repo config and no YAML catalogue to keep current, and once it exists the blast radius of any repository is a single API call. The recipe above is just the reason the scan pays for itself. And there is deliberately no Riftmap Action or GitLab component yet: the gate is forty lines you can read, own, and change, and I would rather ship that than a black box you have to trust. If you would use a maintained drop-in instead, tell me, and I will build it.

Questions engineers actually ask

How do I add a blast radius check to my CI pipeline?

Compose two Riftmap API calls in a CI job. Look the repository up by its path ($CI_PROJECT_PATH on GitLab, ${{ github.repository }} on GitHub Actions) to get its id, then call the impact endpoint to get every repository that declares a dependency on it. Count the direct consumers and comment or route on a threshold. It runs in seconds, needs only a read-only Riftmap key, and works the same whether a person or an agent opened the pull request.

How do I fail a pull request when it affects too many repos?

Set a threshold on the direct downstream consumer count, exit the CI job non-zero when a change is over it, and mark that job as a required check in branch protection. The more useful default is not to block but to route: keep the job advisory and use the number to request review from the owners of the shared surface when exposure is high. Gate on the consumer count, which is a deterministic graph traversal, rather than on an AI-generated risk score, which is a judgement that can vary between runs.

Do I need cloud credentials to check blast radius in CI?

Not for the artifact layer. Riftmap answers from a dependency graph it built during a one-off scan of your organisation, so the pipeline needs only a read-only Riftmap API key and never touches your cloud. Live-state tools like Overmind and the AWS Config based gates do need cloud access, because they read the current state of your running infrastructure at pull-request time. The trade-off is scope: the artifact graph sees cross-repo build-time dependencies, live-state tools see runtime resources including ones created outside your IaC.

Can an AI coding agent run the same blast-radius check?

Yes, and it matters more for the agent than for a human. A person opening the pull request often half-remembers what is downstream, whereas an agent making the same cross-repo change from a single repository's clone knows nothing about the other repositories at all. The same impact call gives either one the downstream consumer list before the merge, which is context the agent structurally cannot reconstruct on its own.