Aisha

I Built a Pre-Deployment Governance Tool. Here's What It Couldn't Answer.

About three months into building ObsidianWall Verdict, I kept running into a question I couldn't answer.

Verdict works well at what it does. You give it a Terraform plan and a policy, and it tells you whether the deployment should proceed — risk score, condition trace, full audit artifact. Deterministic. Explainable. Fast.

verdict evaluate \
  --plan   terraform_plan.json \
  --policy policies/cost/basic_budget.yaml \
  --role   engineer

DENY_WITH_OVERRIDE. Risk 75/100. Budget exceeded.

Clean output. Clear decision.

Except I kept thinking: then what?

The deployment got blocked. Did someone override it? Did they just deploy from a different terminal? Did the cost actually come in under budget when they eventually ran it? Did the policy even matter?

Verdict had no idea. It made a decision and moved on. That's a policy engine, not a governance system.


The gap nobody talks about

Most policy-as-code tools stop at pass/fail. Check runs, result comes back, pipeline continues or doesn't. That's where the category ends for most teams.

But the whole premise of what I'm building — Programmable Assurance — is that enforcement alone isn't assurance. Assurance means you can demonstrate that reality stayed aligned with intent over time. Not just at the moment of deployment. After it.

So I started building Sentinel.

Verdict's question: should this deployment be allowed?
Sentinel's question: did reality stay aligned after the decision was made?

Those sound similar. They're not.


The design problem I didn't see coming

My first draft of Sentinel required passing --policy on every scan:

verdict sentinel scan \
  --plan   terraform_plan.json \
  --policy policies/cost/basic_budget.yaml

Explicit. Works fine. Also wrong.

The problem isn't the UX. The problem is what question it's asking. When you require --policy, you're asking: which policy file should I use? But when you run Sentinel, the real question is: which governance decision am I verifying against?

Those are different questions. One is filesystem-centric. The other is governance-centric.

If a policy file gets moved, renamed, or updated between Verdict's evaluation and when Sentinel runs, you lose the connection. You're no longer comparing against the original decision — you're comparing against whatever the current policy says. That's not reality verification. That's just running Verdict twice.

The fix was small but it mattered architecturally. Store the policy path inside the governance decision record at evaluation time:

record_decision(
    result=result,
    plan_path=plan,
    policy_path=policy,  # now stored in decisions table
)

Now Sentinel loads the comparison decision from governance history, reads the stored policy_path, and uses that to re-evaluate the current plan. No --policy flag required.

verdict sentinel scan --plan terraform_plan.json

The governance record is the source of truth. Not the filesystem. The decision ID is the stable reference — not a file path that might change.
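
To make the idea concrete, here is a minimal sketch of a decision record that carries its own policy path, assuming a SQLite-backed history. The table layout and the function names (`open_history`, `store_decision`, `latest_decision_for_plan`) are illustrative assumptions, not Verdict's actual schema or API.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def open_history(db_path=":memory:"):
    """Open a (hypothetical) governance history and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS decisions (
               decision_id TEXT PRIMARY KEY,
               decided_at  TEXT NOT NULL,
               decision    TEXT NOT NULL,
               risk_score  INTEGER NOT NULL,
               plan_path   TEXT NOT NULL,
               policy_path TEXT NOT NULL
           )"""
    )
    return conn


def store_decision(conn, decision, risk_score, plan_path, policy_path):
    """Persist a decision together with the policy path used to reach it."""
    decision_id = uuid.uuid4().hex[:8]  # short ID, assumed format
    conn.execute(
        "INSERT INTO decisions VALUES (?, ?, ?, ?, ?, ?)",
        (decision_id, datetime.now(timezone.utc).isoformat(),
         decision, risk_score, plan_path, policy_path),
    )
    conn.commit()
    return decision_id


def latest_decision_for_plan(conn, plan_path):
    """Return the most recently inserted decision for a plan, or None."""
    row = conn.execute(
        "SELECT decision_id, decision, risk_score, policy_path "
        "FROM decisions WHERE plan_path = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (plan_path,),
    ).fetchone()
    if row is None:
        return None
    keys = ("decision_id", "decision", "risk_score", "policy_path")
    return dict(zip(keys, row))
```

With something like this, a scan only needs the plan path: it looks up the latest decision and reads `policy_path` from the record instead of taking it from the command line.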


What the output actually looks like

When nothing has drifted:

────────────────────────────────────────────────────────────────────────
  ObsidianWall Sentinel — Drift Detection Report
────────────────────────────────────────────────────────────────────────

  Decision:  5a6f5869  (2026-06-08 05:55:04)
  Policy:    basic_budget_verdict
  Plan:      samples/terraform_plan.json

  Decision Comparison
────────────────────────────────────────────────────────────────────────
  Previous:  ⚠️  DENY_WITH_OVERRIDE    risk: 75/100
  Current:   ⚠️  DENY_WITH_OVERRIDE    risk: 75/100

  Condition Comparison
────────────────────────────────────────────────────────────────────────
  budget_check    ✗ FAIL → ✗ FAIL    unchanged

  Outcome
────────────────────────────────────────────────────────────────────────
  ✅ No drift detected
  Recorded: no_drift

Pay attention to what gets recorded: no_drift. Not deployment_success.

That distinction is intentional. Sentinel at this stage has no cloud API access. It cannot know whether a deployment actually ran or succeeded. It only knows whether the governance state of the plan matches the previous evaluation. So it records what it can actually observe.

deployment_success would be a claim about what happened in production. Sentinel has no evidence for that claim. no_drift is a claim about governance state. That it can verify directly.

Small distinction. But this kind of precision is what separates a tool that's honest about its own limitations from one that manufactures confidence.
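
The comparison itself can be sketched in a few lines. This is a simplified illustration under my own assumptions, not Sentinel's real code: the `Evaluation` type and the `compare` function are hypothetical, and `drift_detected` is a placeholder name, since `no_drift` is the only outcome shown in the report above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    decision: str     # e.g. "DENY_WITH_OVERRIDE"
    risk_score: int   # 0-100
    conditions: dict  # condition name -> True (pass) / False (fail)


def compare(previous, current):
    """Return (outcome, per-condition changes) for two evaluations."""
    changes = {}
    for name in sorted(set(previous.conditions) | set(current.conditions)):
        before = previous.conditions.get(name)  # None if the condition is new
        after = current.conditions.get(name)    # None if it disappeared
        changes[name] = "unchanged" if before == after else "changed"

    drifted = (
        previous.decision != current.decision
        or previous.risk_score != current.risk_score
        or "changed" in changes.values()
    )
    # Only governance state is compared, so the outcome says nothing
    # about whether a deployment ran or succeeded.
    return ("drift_detected" if drifted else "no_drift"), changes
```

Note that nothing in this function touches a cloud API, which is exactly why the only honest outcomes it can emit are statements about governance state.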


Three evidence streams, not one

Building Sentinel forced me to think more clearly about the different kinds of evidence a governance system actually produces.

Decision Evidence is what Verdict creates — what was decided, why, with what risk score, based on which conditions.

Reality Evidence is what Sentinel creates — whether the governance state held after the decision was made. Did the plan drift? Did new failures appear?

Operational Evidence is what neither of them creates yet — whether the deployment actually succeeded or caused an incident in production. That requires cloud API integration and comes later.

Most governance tools collapse all three into a single log entry. "Passed." "Failed." Full stop.

Keeping them separate matters because they answer genuinely different questions. Once you have all three, you can start asking things that governance tooling almost never supports: Did the policies that denied deployments actually prevent incidents? Are there controls that get overridden constantly without any subsequent problems — suggesting they're miscalibrated? Which policies generate friction without generating safety?

That's governance intelligence. Not governance logging.
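
As a rough sketch of what one of those questions could look like in code, assume the streams are kept as separate record types joined on a decision ID. Everything here is hypothetical: the type names, the fields, and the `override_stats` helper are illustrations, and Operational Evidence does not exist in the tool yet.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DecisionEvidence:       # what was decided (Verdict)
    decision_id: str
    policy: str
    decision: str
    overridden: bool


@dataclass
class RealityEvidence:        # whether governance state held (Sentinel)
    decision_id: str
    outcome: str              # e.g. "no_drift"


@dataclass
class OperationalEvidence:    # what happened in production (not built yet)
    decision_id: str
    incident: bool


def override_stats(decisions, operations):
    """Per policy: count overridden decisions and how many preceded an incident."""
    incident_by_id = {op.decision_id: op.incident for op in operations}
    stats = defaultdict(lambda: {"overridden": 0, "incidents": 0})
    for d in decisions:
        if not d.overridden:
            continue
        stats[d.policy]["overridden"] += 1
        # Missing operational evidence is treated as "no known incident".
        if incident_by_id.get(d.decision_id, False):
            stats[d.policy]["incidents"] += 1
    return dict(stats)
```

A policy with many overrides and zero subsequent incidents is a candidate for recalibration. That question can only be asked because the decision and the operational outcome are stored as separate records.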


Where it stands

Verdict is live and open source. Sentinel MVP shipped this week. The audit output is starting to show evidence from multiple streams in one place:

Deployment Outcomes

no_drift              2
deployment_success    2

Two outcome types, two records each. Small numbers. But real evidence: not assumed outcomes, not synthetic telemetry. As that history grows, the insights engine stops guessing and starts reasoning from actual data.

The repo is at github.com/ObsidianWall/obsidianwall-verdict if you want to look at the implementation or try it against your own Terraform plans.
