Anthony Johnson II

Posted on May 18 • Originally published at etherealogic.ai

The Agent's Word Is Not Enough: External Validation in the Agentic Governance Stack

#softwareengineering #githubactions #ai #devops


The first two articles in this series established a distinction that anchors the whole governance framework: the layers that live in documents tell the agent what to do, and the layers that run as code are what make the system trustworthy. Documents explain the rules. Hooks make rules physically impossible to violate inside the harness. But hooks intercept actions, not claims. The hook in GovForge blocks a direct push to main. It says nothing about whether the code the agent wrote is correct, whether the tests the agent ran actually covered the changed behavior, or whether a dependency in the last install carries a known CVE the agent did not flag. Those questions live outside the hook's jurisdiction — and outside the agent's own reporting as well.

On April 20, 2026, in the post-PR #258 sync record, the GovForge project's primary agent reported 1,361 passing tests from its local validation run: 1,159 backend and 202 frontend, all green. The most recently completed CI run on main at that moment reported 1,152 passing backend tests. The backend discrepancy was 7 tests, and the agent had not misreported anything. The tests it ran locally were real. They passed. CI's lower count reflects GOVFORGE_RUN_LLM_TESTS=0, which GovForge's CI configuration sets explicitly to disable LLM-integration tests that require a local Ollama endpoint — unsuitable for a clean CI runner that has no GPU or local model dependency. The agent's count was accurate for its local development environment. It was not accurate for the environment that governs a merge to main. CI is the layer that knows the difference.

That is what the external-validation layer is for. This is the third and final article in the EthereaLogic series on the agentic governance stack. It goes inside Layer 5 — the one that runs independently of the agent, from a clean environment with no access to the agent's session state, and treats the agent's self-report as a starting point, not a conclusion.

Layer 5 runs in an environment the agent did not configure, with tools the agent does not control, producing reports the agent cannot overwrite. The three-job shape — quality, static analysis, dependency scanning — independently covers the three principal failure modes an agent can produce without triggering a hook.

The Failure Mode Hooks Cannot Close

Hooks intercept actions. They stop an agent from taking a destructive step inside the harness — committing to a protected branch, deleting a file outside the permitted root, constructing a shell pipeline that smuggles a protected operation through a nested command. The GovForge pre-tool-use.js guard covers all of that. What it cannot cover is everything that happens once the allowed action lands.

An agent can write a test suite that passes because it tests the wrong behavior. An agent can run a dependency install and report success without checking whether any installed package has a known vulnerability. An agent can complete a typecheck under a configuration that silences the errors the updated code introduced. None of these are destructive operations in the sense the hook is designed to block. All of them are failure modes that a clean external CI run — starting from a fresh checkout with an authoritative environment definition — is positioned to surface before the PR merges.

The failure mode hooks cannot close is not about what the agent does wrong. It is about what the agent cannot see. The agent's session state is its own. Its test runner runs in its own process. Its dependency install uses its own cache. Its environment variables are its own. None of that maps cleanly onto what CI sees, because CI starts over from nothing every time. The divergence the April 20 sync record surfaced — 1,159 local backend vs. 1,152 CI backend — is not a failure. It is a correct representation of two different environments answering the same question differently. The external-validation layer's contribution is precisely that: it answers the question from outside.

What External Validation Actually Is

External validation, in the context of this governance stack, means a CI suite that runs in a clean runner environment, on a fresh checkout, with no access to the agent's session state, and produces reports the agent cannot overwrite or amend.

Each of those properties matters independently. A clean runner means no implicit carry-over from the agent's local environment: no agent-generated environment variables, no state from prior sessions, and no packages except those explicitly cached by the workflow definition itself. A fresh checkout means CI sees exactly the committed code, not the agent's working tree. No session-state access means CI does not know what the agent ran locally, what the agent reported, or what the agent believes to be true. Reports the agent cannot overwrite are what make the layer's findings tamper-resistant: Codacy's analysis, Snyk's dependency scan, and Codecov's coverage upload are generated by tools the agent did not write and does not control, and they are attached to the commit or PR as artifacts that exist independently of anything the agent says.

The tool configuration that produces those properties is not complex, but it has a shape that recurs across the production projects in the development directory: a quality job, a static-analysis job, and a dependency-scanning job, each running independently with blocking behavior assigned deliberately. GovForge, AetheriaForge, and DriftSentinel use that shape directly; ADWS Pro implements the same intent with a different job layout. The shape is the point. Any one job can miss a failure mode the other two catch; all three together cover the principal failure modes an agent can introduce without triggering a hook.

The Three-Job Shape

The GovForge ci.yml is 79 lines and contains three jobs: lint-and-test, codacy, snyk. It is the reference shape for the pattern.

The lint-and-test job runs on Python 3.11 and 3.12 in a matrix, which means every push produces two job instances — each running the full gate sequence: marker scan, Ruff lint, mypy typecheck, pytest with coverage, frontend tests, and frontend build. The matrix is load-bearing: an agent can inadvertently introduce a type annotation or syntax form that is valid in one Python version and invalid in another, and the matrix catches the divergence before the PR merges. The job sets GOVFORGE_RUN_LLM_TESTS: "0" at the env level, which is the environment variable whose value explains the April 20 test delta. That variable is the CI configuration's way of stating that LLM-integration tests are out of scope for a clean runner: they require a local Ollama endpoint (http://localhost:11434), and a clean CI runner has no GPU or locally running model to satisfy that dependency. The agent runs them locally because local development benefits from the full test surface. CI does not run them because CI is not local development.

The codacy job runs Codacy's analysis CLI from a SHA-pinned action. Codacy is a static-analysis platform that inspects code for quality issues, security patterns, complexity violations, and duplication — patterns that pass linting and typechecking but signal structural problems. It applies its own rule set, not the project's. An agent that writes code that passes Ruff and mypy can still produce code that Codacy flags as a cyclomatic complexity violation or a security anti-pattern. The codacy job has no continue-on-error flag, which means a Codacy block fails the overall CI status.

The snyk job runs Snyk's dependency scanner from a SHA-pinned action, with continue-on-error: true. The continue-on-error flag is not a concession; it is a deliberate design choice. Snyk operates against a live vulnerability database that is updated continuously. A Snyk finding on a push may reflect a CVE disclosed hours ago against a dependency that has not yet shipped a patched version. Blocking the merge on a finding with no available fix produces a CI configuration that generates blocked PRs with no actionable resolution path. continue-on-error: true means the scan executes in CI and its output is visible in the workflow logs without blocking the merge; the finding is produced independently of the agent's self-report and is the operator's responsibility to triage. AetheriaForge and DriftSentinel carry the same Snyk configuration for the same reason.
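The blocking-versus-advisory split described above can be sketched in a few lines of Python. This is a simplified model, not GitHub Actions' actual evaluation logic: the Job class, its field names, and the overall_status function are hypothetical, and the job outcomes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical model of one CI job and how its failure is treated."""
    name: str
    succeeded: bool
    continue_on_error: bool = False  # True = advisory, False = blocking


def overall_status(jobs: list[Job]) -> tuple[str, list[str]]:
    """Return the run status plus the advisory jobs that still need triage."""
    blocking_failures = [
        j.name for j in jobs if not j.succeeded and not j.continue_on_error
    ]
    advisories = [j.name for j in jobs if not j.succeeded and j.continue_on_error]
    status = "failure" if blocking_failures else "success"
    return status, advisories


# Illustrative outcomes only: Snyk reports a finding, the other jobs pass.
run = [
    Job("lint-and-test", succeeded=True),
    Job("codacy", succeeded=True),
    Job("snyk", succeeded=False, continue_on_error=True),
]
print(overall_status(run))  # ('success', ['snyk'])
```

The second element of the result is the point of the design: an advisory failure does not change the status, but it does not disappear either. It becomes a named item for the operator to triage.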

AetheriaForge and DriftSentinel add a fourth element: Codecov upload via codecov/codecov-action@75cd11691c0faa626561e295848008c8a7dddffe # v5, configured with fail_ci_if_error: true. Codecov is a coverage-tracking service. The upload produces a coverage report attached to the PR that is independent of the agent's local coverage output. fail_ci_if_error: true means that if the upload fails — network error, invalid token, malformed report — CI fails rather than silently omitting the coverage signal. The Codecov report is not a gate on a coverage percentage floor in these projects, but it makes coverage trends visible across PRs and does so from outside the agent's session. The agent's local pytest run also produces coverage output; the Codecov report is the one that is persisted, diffed against prior runs, and attached to the PR as an independent artifact.

ADWS Pro implements the same governance intent with a different job layout: a test job that runs the quality and coverage gates with local Codacy-equivalent and Codecov-equivalent checks inline, a separate security job for the local Snyk-equivalent vulnerability gate, a post-merge-signal job that writes the CI outcome (passed or regressed) to a named artifact (adws-post-merge-outcome), and an sbom job that generates a software bill of materials on every push. A separate drift-sentinel.yml workflow (50 lines) adds PR drift detection via drift_report.json. The ADWS Pro CI surface across both workflow files totals 163 lines.

The Environment Gate and the Test Delta

The April 20, 2026 sync record is the clearest available example of why the external-validation layer matters even when the agent is operating in complete good faith.

The agent's local count — 1,361 passing tests — was a correct measurement of the GovForge test suite running in the local development environment. Every test the agent ran passed against real code. The sync record documents the measurement in detail: 1,160 backend tests collected, 1,159 passed, 1 skipped; 202 frontend tests passed across 33 test files; make validate exited 0. The agent reported what it measured. The sync record documents the claim with precision.

CI's count — 1,152 passing backend tests, 8 skipped — reflects a different environment definition. The GOVFORGE_RUN_LLM_TESTS=0 environment variable, declared at the job level in ci.yml, disables the LLM-integration test suite. That suite has 7 tests marked @pytest.mark.llm that require a locally running Ollama endpoint at http://localhost:11434. Those tests exercise real production code paths through GovForge's model-routing layer, but a clean CI runner has no GPU or local model process to satisfy the endpoint check in conftest.py. The CI configuration excludes them deliberately. The agent's local development environment, where Ollama is running, does not.
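A minimal sketch of how an environment gate of this kind can work. The GovForge conftest.py is not reproduced in this article, so everything here other than the GOVFORGE_RUN_LLM_TESTS variable and the llm marker name is an assumption: the helper names are hypothetical, the sketch assumes the suite runs unless the variable is explicitly "0", and it models only the variable, not the Ollama endpoint check.

```python
from typing import Iterable, Mapping

LLM_MARKER = "llm"  # marker name taken from the article (@pytest.mark.llm)
GATE_VAR = "GOVFORGE_RUN_LLM_TESTS"


def llm_tests_enabled(env: Mapping[str, str]) -> bool:
    """Assumption: the suite runs unless the variable is explicitly "0"."""
    return env.get(GATE_VAR, "1") != "0"


def should_skip(markers: Iterable[str], env: Mapping[str, str]) -> bool:
    """Skip a test only when it carries the llm marker and the gate is off."""
    return LLM_MARKER in set(markers) and not llm_tests_enabled(env)


ci_env = {GATE_VAR: "0"}         # what the CI job declares
local_env: dict[str, str] = {}   # hypothetical local shell, variable unset

print(should_skip(["llm"], ci_env))     # True: gated out in CI
print(should_skip(["llm"], local_env))  # False: runs locally
print(should_skip([], ci_env))          # False: ordinary tests always run
```

In a real conftest.py, a decision like this is typically applied inside pytest's pytest_collection_modifyitems hook by adding a skip marker to each matching item, which is why the gated tests show up as skipped rather than missing in CI's count.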

The result is a documented, reproducible, and fully explained divergence between the agent's self-report and CI's independent count — 7 backend tests' difference. Neither number is wrong. Both are correct descriptions of different environments applying different criteria to the same question. The external-validation layer's role is not to catch the agent lying. It is to answer the question from the environment that governs whether code ships. The agent's local environment is useful evidence. It is not authoritative evidence. CI is.
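The arithmetic behind the delta can be checked directly. The figures below are the ones quoted in this article for the sync record and the reference CI run; the variable names are illustrative only.

```python
# Figures quoted in this article for the April 20, 2026 sync record.
collected_backend = 1160
local_passed, local_skipped = 1159, 1
ci_passed, ci_skipped = 1152, 8
frontend_passed = 202
llm_gated_tests = 7  # tests marked @pytest.mark.llm

# Each environment accounts for every collected backend test.
assert local_passed + local_skipped == collected_backend
assert ci_passed + ci_skipped == collected_backend

# The passing-count delta equals the extra skips in CI,
# which in turn equals the size of the gated suite.
delta = local_passed - ci_passed
assert delta == ci_skipped - local_skipped == llm_gated_tests

# The agent's headline figure is backend passes plus frontend passes.
local_total = local_passed + frontend_passed
print(delta, local_total)  # 7 1361
```

Every test is accounted for in both environments. The 7 tests did not fail or vanish in CI; they moved from the passed column to the skipped column.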

The agent's local count and CI's count are both correct. Both accurately describe their respective environments. Only CI's count governs whether the branch merges — and CI's environment is defined by the workflow file, not by the agent's session.

This distinction is the single most important property of the external-validation layer, and it is the one most likely to be papered over in an agentic deployment that has only the first four layers. A team that treats an agent's self-reported test pass as a merge signal without independent CI confirmation is implicitly trusting that the agent's environment matches the CI environment, that the agent's test configuration matches the CI configuration, and that the agent's dependency state matches what a fresh install would produce. All three assumptions are wrong on a long enough timeline, and all three are corrected by the time an external CI run finishes.

SHA-Pinning and the Infrastructure CI Runs On

The external-validation layer depends on the CI infrastructure itself being trustworthy. If the actions that CI invokes are mutable — that is, if the identifier used to reference them can resolve to different code on different days — then CI is not actually independent. It is dependent on whatever the action maintainer most recently published under a given tag.

This is not a theoretical risk. GitHub's own documentation on hardening workflows for third-party actions names mutable version tags as a documented attack vector. A tag like v4 points to the latest commit on the v4 release line; if the action maintainer pushes a new commit to that line, every workflow referencing @v4 begins running the new code on its next invocation. The workflow author may not know. CI does not inherently warn that the referenced tag now resolves to different code. The behavior change is silent.

SHA-pinning closes this class entirely. A reference like actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 resolves to exactly one commit, permanently. If the action maintainer later moves the v6.0.2 tag to a different commit, the SHA-pinned workflow is unaffected: it still runs the commit that was pinned when the workflow was authored. The comment annotation (# v6.0.2) serves the human reader; the SHA serves the runtime. Both are required, in the same way that a well-written hook has a clear stderr message for the agent and an exit code 2 for the harness.
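The difference between a tag and a SHA can be shown with a toy model of ref resolution. This is an illustration of the concept, not of how Git or GitHub is implemented: the commit IDs are shortened placeholders and the resolve function is hypothetical.

```python
# Toy model of ref resolution; the commit IDs are shortened placeholders.
tags = {"v4": "aaa111"}  # a tag is a movable pointer to a commit
commits = {"aaa111": "code as audited", "bbb222": "code after retag"}


def resolve(ref: str) -> str:
    """Resolve a tag through the tag table; treat anything else as a commit ID."""
    commit_id = tags.get(ref, ref)
    return commits[commit_id]


pinned_ref = tags["v4"]  # pin to the commit the tag pointed at when audited
before = (resolve("v4"), resolve(pinned_ref))

tags["v4"] = "bbb222"    # the maintainer moves the tag
after = (resolve("v4"), resolve(pinned_ref))

print(before)  # ('code as audited', 'code as audited')
print(after)   # ('code after retag', 'code as audited')
```

Nothing in the workflow file changed between the two calls. Only the tag reference started returning different code.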

Representative SHA-pinned actions from across the four production projects include:

# Checkout — GovForge and ADWS Pro
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

# Checkout — AetheriaForge and DriftSentinel
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1

# Python setup — GovForge
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0

# uv setup — GovForge
- uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0

# Bun setup — GovForge
- uses: oven-sh/setup-bun@0c5077e51419868618aeaa5fe8019c62421857d6 # v2

# Codecov — AetheriaForge and DriftSentinel
- uses: codecov/codecov-action@75cd11691c0faa626561e295848008c8a7dddffe # v5

# Codacy — GovForge, AetheriaForge, DriftSentinel
- uses: codacy/codacy-analysis-cli-action@d43360362776a6789b47b99ae8973510854e2d3d # master

# Snyk — GovForge, AetheriaForge, DriftSentinel
- uses: snyk/actions/python@9adf32b1121593767fc3c057af55b55db032dc04 # master

# PyPI publish — DriftSentinel and AetheriaForge
- uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0

The two earlier-stage projects, spec-driven-docs-system and sdlc_app, reference their standard setup actions by floating version tags (actions/checkout@v6, actions/checkout@v4, actions/setup-node@v4) rather than by commit SHA. spec-driven-docs-system SHA-pins the non-standard gitleaks action (gitleaks/gitleaks-action@ff98106e4c7b2bc287b24eaf42907196329070c7 # v2.3.9) while leaving the standard actions on floating tags. sdlc_app pins nothing. Both gaps are documented as known rather than deliberate: the production-project standard has not yet been backported to either earlier-stage project.

SHA-pinning is the practice that most visibly distinguishes a CI configuration that has been audited from one that has been copied from a tutorial. Most tutorials use version tags because version tags are easier to read and maintain. That ease is the same property that makes them mutable. SHAs trade legibility for integrity. The # v6.0.2 comment restores most of the legibility without giving up the integrity. For an agentic project where CI is the independent verifier, allowing the verifier to silently change its behavior is the same class of problem as allowing the agent to modify its own test suite. The SHA is not legibility overhead. It is integrity.
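An audit for floating references is easy to automate. The following is a minimal sketch, not a tool used by any of the projects described here: the function name and the sample workflow text are hypothetical, and the check deliberately ignores local (./path) and docker:// references, which contain no @ref to inspect.

```python
import re

# Matches a `uses:` line and captures the action name and the ref after `@`.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
# A full commit SHA is exactly 40 lowercase hexadecimal characters.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference whose ref is not a full commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match and not SHA_RE.match(match.group("ref")):
            findings.append(f"{match.group('action')}@{match.group('ref')}")
    return findings


sample = """\
steps:
  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
  - uses: actions/setup-node@v4
  - uses: actions/checkout@v6
"""
print(unpinned_actions(sample))  # ['actions/setup-node@v4', 'actions/checkout@v6']
```

A check like this could itself run as a CI step, so that a floating tag introduced by an agent or a human fails the build instead of waiting for the next audit.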

The Rule That Makes the Layer Load-Bearing

External validation collapses into theater if the agent's self-report can substitute for CI's report when the two disagree.

The rule that prevents this is stated in the first article in this series and worth repeating precisely: if the agent claims tests pass, CI confirms it; if CI disagrees, the claim is unverified. There is no third case. A PR does not merge because the agent says it should. A PR merges because CI says the gates passed. The distinction is operationally significant: when an agent reports that a branch is ready to merge, the response is not to merge it but to wait for CI.

This rule has to be enforced at the workflow level, not the instruction level. An instruction that says "wait for CI before merging" is a document-layer directive with document-layer enforceability: the agent reads it and acts on it correctly, or it does not. The recommended enforcement mechanism is branch protection — a GitHub branch protection rule that requires all CI checks to pass before a PR can be merged, with no administrative override available to the agent. When configured, that setting exists outside the agent's control and outside the operator's day-to-day attention; if CI fails, the merge button is unavailable. The rule becomes structural rather than advisory.
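The merge rule can be stated as a function. This is a sketch of the decision that branch protection enforces, not GitHub's implementation: the function name, the check names, and the status strings are hypothetical.

```python
def merge_allowed(
    required_checks: set[str],
    reported: dict[str, str],
    agent_says_ready: bool,
) -> bool:
    """A PR is mergeable only when every required check reported success.

    agent_says_ready is accepted and deliberately ignored: the agent's claim
    is evidence for the operator, never an input to the merge decision.
    """
    del agent_says_ready  # intentionally unused
    return all(reported.get(name) == "success" for name in required_checks)


required = {"lint-and-test", "codacy"}  # hypothetical required-check names

# The agent reports ready while codacy has not reported yet: still blocked.
print(merge_allowed(required, {"lint-and-test": "success"}, True))  # False
# CI disagrees with the agent: blocked.
print(merge_allowed(required, {"lint-and-test": "failure", "codacy": "success"}, True))  # False
# Every required check reported success: mergeable, whatever the agent said.
print(merge_allowed(required, {"lint-and-test": "success", "codacy": "success"}, False))  # True
```

A required check that never reports is treated the same as one that fails. That is the "no third case" property: pending, missing, and failing all mean the claim is unverified.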

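This independence holds only if the workflow definition and the required status checks are themselves protected from agent modification. An agent with repository write access can propose a change to .github/workflows/ci.yml in a PR, but it cannot merge that PR unless the required checks pass, and renaming or removing a required job leaves that check unreported, which keeps the merge blocked. That circularity is a real guarantee, but a partial one: required checks are matched by name, and for pull_request-triggered workflows the PR runs the workflow file as modified in the PR, so a change that keeps the job names while weakening their steps can still report success. Changes under .github/workflows/ therefore also need human review, for example through a CODEOWNERS rule combined with required code-owner review.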

For agentic workflows specifically, branch protection is the intended analog of the hook: both transform a written policy into a structural barrier. The hook prevents the agent from committing to main directly. Branch protection prevents anyone — agent or operator — from merging a PR without a clean CI run. Together they close the full path from agent action to main: the hook closes the direct-push path; branch protection closes the PR-merge path. Neither is sufficient without the other, and the external-validation layer is what branch protection is designed to enforce. Whether that enforcement is currently wired is a per-project configuration decision; the pattern described here is the target state.

Facts

The following are measured facts drawn from the development directory and the local workflow configurations of the projects referenced, verified on May 17, 2026. They should be read within the scope of those projects.

Across the six active projects in the development directory, 465 total lines of GitHub Actions workflow YAML are in place across the primary CI workflows: GovForge ci.yml (79 lines, 3 jobs), AetheriaForge ci.yml (72 lines, 3 jobs), DriftSentinel ci.yml (74 lines, 3 jobs), ADWS Pro ci.yml (113 lines, 4 jobs), spec-driven-docs-system ci.yml (97 lines, 3 jobs), and sdlc_app ci.yml (30 lines, 1 job). Additional workflow files — project-sync.yml in GovForge, drift-sentinel.yml in ADWS Pro (50 lines), and separate publish.yml files in AetheriaForge and DriftSentinel — are not included in this count.
The three-job production shape (quality via lint-and-test, static analysis via codacy, dependency scanning via snyk) is present in GovForge, AetheriaForge, and DriftSentinel. ADWS Pro implements the same governance intent with a different layout — test, security, post-merge-signal, and sbom jobs, with quality and coverage checks inline in test and the vulnerability gate in security — plus a separate drift-sentinel.yml workflow. spec-driven-docs-system carries a three-job shape with different tools (smoke, security, isolated-install). sdlc_app carries a single validate job.
The April 20, 2026 GovForge sync record (anchor commit cabee9e72ca57b860bc1a967ec8d40fe9b37cda5) documents the agent-local vs. CI test count divergence: 1,361 passing tests locally (1,159 backend and 202 frontend, with 1 further backend test skipped) vs. 1,152 passing backend tests in the reference CI run (with 8 skipped across 1,160 collected). The 7-test backend delta (local 1,159 vs. CI 1,152) is the LLM-integration test suite, disabled in CI via GOVFORGE_RUN_LLM_TESTS: "0" declared at the lint-and-test job level. The CI run for PR #258 (commit cabee9e, which added 4 frontend tests) was in progress at the moment the sync record was finalized; its CI counts were not yet confirmed at that timestamp.
All GitHub Actions invoked in ADWS Pro, GovForge, AetheriaForge, and DriftSentinel workflow files are pinned to specific commit SHAs rather than version tags. Representative SHA-pinned references are shown in the SHA-Pinning section above; the full set of pinned actions in each workflow file exceeds what is listed there. The # <version> comment annotation appears alongside each SHA to preserve human readability.
spec-driven-docs-system and sdlc_app reference standard GitHub actions by floating version tags rather than commit SHAs (actions/checkout@v6, actions/checkout@v4, actions/setup-node@v4, actions/setup-python@v6). spec-driven-docs-system SHA-pins the non-standard gitleaks action (gitleaks/gitleaks-action@ff98106e4c7b2bc287b24eaf42907196329070c7 # v2.3.9) while leaving the standard actions on floating tags. Both gaps are documented as known rather than deliberate.
Snyk is configured with continue-on-error: true in GovForge, AetheriaForge, and DriftSentinel — the dependency scan executes and its output is visible in the workflow logs, but a Snyk finding does not block the merge. Codecov is configured with fail_ci_if_error: true in AetheriaForge and DriftSentinel — a coverage upload error is CI-blocking. The codacy job in all three projects has no continue-on-error flag, meaning a Codacy analysis failure causes the check to fail.
The GovForge lint-and-test job declares GOVFORGE_RUN_LLM_TESTS: "0" at the env level and runs a Python 3.11 / 3.12 matrix. No equivalent LLM-test gate appears in the AetheriaForge, DriftSentinel, or ADWS Pro CI configurations at the time of writing — AetheriaForge and DriftSentinel also run a Python 3.11 / 3.12 matrix but do not use an environment-gated test suite.
DriftSentinel runs 416 tests under pytest as of the measurements in the first article in this series (verified April 30, 2026).

Interpretation

The following are engineering judgments drawn from operating the external-validation layer on these projects. They should be read as claims about the author's experience, not universal prescriptions.

The self-reporting problem is structural, not behavioral. The April 20 test delta is not a case where the agent made an error. It is a case where the agent's environment and CI's environment differ in a defined, documented, and deliberate way. The agent's count is true in the agent's environment. CI's count is true in CI's environment. The difference between the two is load-bearing — it reflects a decision about what should and should not gate a merge to main. Without CI, that decision has no enforcement mechanism. The agent cannot know what CI knows, because CI's environment is not the agent's environment and is designed not to be.

The three-job shape is the minimum, not the target. Lint-and-test, static analysis, and dependency scanning together cover the three principal failure modes an agent can produce without triggering a hook: incorrect behavior, code-quality regressions, and supply-chain vulnerabilities. Any one job alone misses the other two. A team that runs only tests will ship code that passes tests and fails static analysis. A team that runs only Codacy will have no coverage signal and no dependency exposure. The three jobs are the minimum surface for an external-validation layer that can plausibly verify an agent's self-report across the dimensions that matter most in a production context.

The continue-on-error: true decision for Snyk is an operational judgment, not a governance gap. Snyk reports against a live vulnerability database. A CVE can be disclosed and Snyk's database updated within hours of a merge. Blocking a merge on a finding with no available fix produces a situation where the project cannot merge until someone patches a transitive dependency that the project does not control. The right response is to surface the finding in CI output and make it the operator's responsibility to triage. Treating continue-on-error: true as a gap misunderstands the tradeoff; treating it as equivalent to not running Snyk misunderstands the value. The scan runs, the output exists in CI logs, and that output is produced independently of the agent's self-report regardless of whether it blocks the merge.

SHA-pinning is the practice that distinguishes a configured CI pipeline from a tutorial copy. The cost of SHA-pinning an action is seconds per action: look up the SHA for the version you want, substitute it in the workflow file, and add the version comment. The benefit is that the CI pipeline's behavior is frozen at the version you chose, regardless of what the action maintainer does next. That matters more in an agentic project than elsewhere, because CI is the independent verifier, and a verifier whose behavior can change silently is no longer independent.

The external-validation layer makes one assumption the rest of the stack does not. Every other layer in the governance stack works with the agent: documents guide it, hooks constrain it, agent specialization shapes it. The external-validation layer does not work with the agent at all. It assumes the agent's self-report is not authoritative, and it provides the authoritative answer from outside. That assumption is the one most agentic coding deployments quietly omit, because the agent's self-report is usually right and building a layer that assumes it might not be feels like friction. It is not friction. It is the layer in the stack least exposed to the influence of an adversarial subagent, a misconfigured local environment, a stale cache, an undeclared environment variable, a mutable action tag, or a newly disclosed CVE — and the one whose reports exist independently of whatever the agent says about them. It is the layer that makes the output of the whole stack verifiable.

Practical Implications for Teams Considering the Pattern

If your team has hooks and no external validation, the next step is to wire a CI workflow with at least a quality job, a static-analysis job, and a dependency-scanning job. Each job should run independently, with blocking behavior assigned deliberately: quality and static analysis are good candidates for merge-blocking required checks, while dependency scanning may be better configured as advisory — surfacing findings in CI output without blocking merges on CVEs that have no available fix yet. The recommended enforcement mechanism for the blocking jobs is branch protection configured at the repository level — required status checks that block the merge button until those checks pass — rather than an agent instruction that relies on the agent reading it correctly. An instruction to wait for CI is a document; branch protection is the structural control.

When wiring the workflow, SHA-pin every action you reference. This step is the one most teams defer because it feels like premature hardening. It is not premature. The cost is minutes per repository. The benefit is that your CI infrastructure does not silently change behavior because an action maintainer updated a tag. For a project that relies on CI to independently verify agent output, CI's own stability is not a detail. A workflow that SHA-pins its own actions and then uses those actions to verify agent-produced code is consistent end-to-end. A workflow that uses floating tags is consistent except for the part that matters most.

Choose your error-handling flags deliberately. Snyk with continue-on-error: true and Codecov with fail_ci_if_error: true are not inconsistent. They reflect different judgments about what should block a merge and what should surface as a report. The choice is not "block or ignore" but "block or surface." A blocking Snyk finding with no available fix produces a stalled project; a non-blocking Snyk scan still produces independent CI output about the dependency surface regardless of what the agent reported.

If your team has CI but still treats the agent's self-report as sufficient before CI completes, the operational habit to build is: the agent's count does not close the question, CI's count does. This habit is mechanical in principle and harder in practice than it sounds, because the agent's self-report arrives earlier — usually before CI has finished — and it is usually right. The times it diverges from CI are exactly the times the external-validation layer earns its place in the stack. Those times are not rare; they are the scheduled condition of every project that has environment-gated tests, matrix builds, or a dependency surface that drifts faster than local installs.

If you are starting a new project, wire the three-job shape on the first commit alongside the hook and the governance documents. The reference workflow is 79 lines. The SHA-pinning adds one annotation comment per action. The branch protection rule is a repository setting, not an agent instruction. A project that ships its first commit with a working hook, a working CI pipeline, and SHA-pinned actions has answered the three questions engineering leaders actually ask about agentic coding — governance, error rates, and security vulnerabilities — from its first day of operation. Retrofitting this layer onto a project that has been running without it requires re-auditing every previous agent-produced output that was merged on the agent's word alone. Starting governed is the lower-cost path, and it is only available at the beginning.

The five-layer governance stack is complete when all five layers are in place. The external-validation layer is the last one, and it is the one that makes the whole stack verifiable from outside. Without it, the stack is better than documentation alone — the hooks hold, the agents are specialized, the constitution governs the directives. But the output is still self-reported. The external-validation layer changes "the agent says it passed" to "CI confirms it passed." That distinction is what regulated businesses need before they can ship agentic output into a production environment with confidence.

Get the templates

The CI workflow configurations described in this article — the three-job GovForge reference shape with SHA-pinned actions, the branch protection rule guidance, and the Snyk/Codecov configuration patterns — are available as part of the agentic governance starter kit at etherealogic.ai/agentic-governance-stack-templates. The starter kit includes the document-foundation templates from the first article, the protected-branch hook from the second, and the CI workflow from this one.

References

Anthropic Claude Code documentation — Claude Hooks specification and Settings reference.
GitHub — "Security hardening for GitHub Actions" — recommends pinning third-party actions to a full commit SHA to defend against mutable-tag supply-chain risk.
AGENTS.md open standard — agentsmd/agents.md, governed by the Linux Foundation's Agentic AI Foundation.
Codacy analysis CLI action — codacy/codacy-analysis-cli-action.
Snyk GitHub Actions — snyk/actions.
Codecov GitHub Action — codecov/codecov-action.
First article in this series — CLAUDE.md Is Not Enough: The Governance Stack for Agentic Development.
Second article in this series — Exit Code 2: How Claude Hooks Turn Agentic Rules Into Runtime Barriers.

This is the third and final article in the EthereaLogic series on the agentic governance stack. The full five-layer stack — navigation files, constitutional governance, agent specialization, runtime enforcement, and external validation — is available as a drop-in starter kit at etherealogic.ai/agentic-governance-stack-templates.
