Making LLM security verdicts verifiable: the evidence gate pattern

Jaskarn Singh — Fri, 03 Jul 2026 20:01:15 +0000

Every "AI security analyst" I tried had the same flaw: a correct verdict and a confident-but-wrong one are indistinguishable on screen. In security that's not a UX nit — it's the whole problem. So I built USAP around a single rule, and this post is about that rule and three things that fell out of it.

The evidence gate

USAP's output contract is 11 typed JSON fields. The uncompromising one is evidence_references: every verdict must carry at least one source that resolves, in one of four accepted forms:

mcp:<logical>:<tool>:<call_id> — evidence fetched live from a connected MCP
https://... — a canonical external source (CVE, EPSS feed, MITRE)
s3://... — an operator artifact
local://<repo-path> — an in-repo standard, for advisory verdicts (the path must exist)
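The four forms can be checked mechanically. The sketch below classifies a single reference string, or rejects it. The function name classify_ref, the rule that each mcp segment is non-empty and colon-free, and the check that a local path stays inside the repo root are all assumptions, not USAP's code. It also only checks shape and local path existence. Confirming that an mcp call_id matches a real tool call, or that an https URL answers, would have to happen behind it.

```python
import os
import re
from typing import Optional
from urllib.parse import urlparse

# mcp:<logical>:<tool>:<call_id>; segment rules are an assumption for this sketch
MCP_REF = re.compile(r"^mcp:[^:\s]+:[^:\s]+:[^:\s]+$")


def classify_ref(ref: str, repo_root: str = ".") -> Optional[str]:
    """Return 'mcp', 'https', 's3' or 'local' for an accepted reference, else None."""
    if MCP_REF.match(ref):
        return "mcp"
    parsed = urlparse(ref)
    if parsed.scheme in ("https", "s3") and parsed.netloc:
        # canonical external source (CVE, EPSS, MITRE) or an operator artifact bucket
        return parsed.scheme
    if ref.startswith("local://"):
        rel = ref[len("local://"):]
        root = os.path.realpath(repo_root)
        target = os.path.realpath(os.path.join(root, rel))
        # advisory refs must name a path that exists and stays inside the repo
        inside = os.path.commonpath([root, target]) == root
        if rel and inside and os.path.exists(target):
            return "local"
    # free text such as "scanner" falls through to rejection
    return None
```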

A source like "scanner" or "the SIEM showed it" is rejected at the contract boundary. That one check removes most hand-wavy output, because the model can't satisfy the contract without pointing at something real.
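Here is a minimal sketch of that boundary check applied to a whole verdict. It assumes a verdict arrives as a dict, and it uses prefix matching only (the fuller per-form checks would sit behind it). ContractError and enforce_evidence_gate are hypothetical names. Rejecting the entire verdict when any reference is free text, rather than silently dropping that reference, is a design choice of this sketch.

```python
from typing import Any, Dict

# The four accepted prefixes from the contract; anything else counts as free text.
ACCEPTED_PREFIXES = ("mcp:", "https://", "s3://", "local://")


class ContractError(ValueError):
    """Raised when a verdict fails the evidence gate (hypothetical name)."""


def enforce_evidence_gate(verdict: Dict[str, Any]) -> Dict[str, Any]:
    """Return the verdict unchanged if it passes; raise ContractError otherwise."""
    refs = verdict.get("evidence_references")
    if not isinstance(refs, list) or not refs:
        raise ContractError("evidence_references must be a non-empty list")
    free_text = [
        r for r in refs
        if not (isinstance(r, str) and r.startswith(ACCEPTED_PREFIXES))
    ]
    if free_text:
        # e.g. "scanner" or "the SIEM showed it": fail the whole verdict
        raise ContractError(f"unresolvable evidence_references: {free_text!r}")
    return verdict
```

Because the gate raises instead of warning, the model's only way through is to cite something real.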

Fallout 1: connectors have to be abstract

If a verdict cites mcp:siem:search:<call_id>, the siem:search capability can't be hard-coded to Splunk, because not everyone runs Splunk. So agents declare logical capabilities and a registry resolves them to whatever the operator connected (Splunk, Elastic, Sentinel). If nothing implements a capability, the agent degrades gracefully and marks that axis of the verdict UNKNOWN rather than inventing telemetry.
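A capability registry of this kind can be quite small. The sketch below maps a logical capability such as "siem:search" to whichever provider callable the operator registered, and returns UNKNOWN when none is registered. CapabilityRegistry, investigate, the "OBSERVED" status, the event shape, and generating call_id with uuid4 are all illustrative assumptions. The point is that the emitted citation names the logical capability and never the vendor behind it.

```python
import uuid
from typing import Callable, Dict, List, Optional

# A provider takes a query string and returns matching events (shape is hypothetical).
Provider = Callable[[str], List[dict]]


class CapabilityRegistry:
    """Hypothetical registry mapping logical capabilities to connected backends."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, capability: str, provider: Provider) -> None:
        # e.g. a Splunk, Elastic or Sentinel connector registered as "siem:search"
        self._providers[capability] = provider

    def resolve(self, capability: str) -> Optional[Provider]:
        return self._providers.get(capability)


def investigate(registry: CapabilityRegistry, capability: str, query: str) -> dict:
    """Run one axis of an investigation, degrading to UNKNOWN if nothing is connected."""
    provider = registry.resolve(capability)
    if provider is None:
        # No connector implements the capability: say so instead of inventing telemetry.
        return {"capability": capability, "status": "UNKNOWN", "evidence": None}
    events = provider(query)
    call_id = uuid.uuid4().hex
    # Citation uses the mcp:<logical>:<tool>:<call_id> form, vendor-neutral.
    return {
        "capability": capability,
        "status": "OBSERVED",
        "events": events,
        "evidence": f"mcp:{capability}:{call_id}",
    }


if __name__ == "__main__":
    registry = CapabilityRegistry()
    # Stand-in for a real SIEM connector.
    registry.register("siem:search", lambda q: [{"host": "web-01", "match": q}])
    print(investigate(registry, "siem:search", "jndi:ldap")["evidence"])
    print(investigate(registry, "edr:process_tree", "web-01")["status"])
```

Swapping Splunk for Elastic then means registering a different provider under the same name, with no change to the agent or to its citations.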

Fallout 2: numbers can't be narrated

Once you demand resolvable evidence, numbers the model merely narrates feel wrong too. The CVSS score is computed from the published vector string. EPSS is pulled live from the FIRST feed. Confidence is assigned from a written rubric. If a number can't be computed, the tool returns "qualitative" instead of fabricating one, and the contract rejects a cvss_score that disagrees with its cited vector.
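To make "computed, not narrated" concrete for CVSS, here is a sketch of the CVSS v3.1 base-score equations and weights from the FIRST specification, plus the kind of score-versus-vector check described above. It covers v3.1 base metrics only (no temporal, environmental, or v4.0 scoring). The names cvss31_base and cvss_consistent are hypothetical, and returning "qualitative" for an unparseable vector mirrors the behavior described above, not USAP's actual code.

```python
import math
from typing import Union

# CVSS v3.1 base-metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
BASE_KEYS = ("AV", "AC", "PR", "UI", "S", "C", "I", "A")


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, tolerant of float error."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def cvss31_base(vector: str) -> Union[float, str]:
    """Base score for a CVSS:3.1 vector, or 'qualitative' if it can't be computed."""
    try:
        parts = vector.split("/")
        if parts[0] != "CVSS:3.1":
            return "qualitative"
        m = dict(p.split(":", 1) for p in parts[1:])
        if any(k not in m for k in BASE_KEYS) or m["S"] not in ("U", "C"):
            return "qualitative"
        changed = m["S"] == "C"
        pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
        iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
        if changed:
            impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        else:
            impact = 6.42 * iss
        exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    except (KeyError, ValueError):
        # unknown metric value or malformed "key:value" pair
        return "qualitative"
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def cvss_consistent(claimed: float, vector: str) -> bool:
    """Reject a numeric cvss_score that disagrees with (or can't be tied to) its vector."""
    computed = cvss31_base(vector)
    if isinstance(computed, str):
        return False
    return abs(claimed - computed) < 1e-6
```

For example, a verdict claiming 7.5 for CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H fails the check, because that vector computes to 9.8.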

Fallout 3: you can't grade yourself

A system that enforces evidence shouldn't grade itself against its own answer key. USAP ships a held-out corpus of real public incidents (Log4Shell, xz, Capital One, MOVEit) plus benign false-positive traps, and a stdlib-only harness that reports precision, recall, false-positive rate (FPR), and mean time to detect (MTTD).
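The four metrics themselves are standard. Here is a minimal sketch of how a stdlib-only harness could compute them over a labeled corpus where real incidents are positives and benign traps are negatives. The CaseResult shape, its field names, and measuring MTTD in minutes over true positives only are assumptions for illustration, not USAP's schema. Ratios with an empty denominator come back as None rather than a made-up number.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class CaseResult:
    """One corpus case (hypothetical shape): ground truth, verdict, detection time."""
    case_id: str
    malicious: bool  # ground truth: real incident (True) or benign trap (False)
    flagged: bool    # did the verdict call it malicious?
    minutes_to_detect: Optional[float] = None  # used only for true positives


def _ratio(num: int, den: int) -> Optional[float]:
    # Undefined ratios are reported as None instead of being invented.
    return num / den if den else None


def score(results: List[CaseResult]) -> dict:
    tp = sum(r.malicious and r.flagged for r in results)
    fp = sum((not r.malicious) and r.flagged for r in results)
    fn = sum(r.malicious and not r.flagged for r in results)
    tn = sum((not r.malicious) and not r.flagged for r in results)
    delays = [
        r.minutes_to_detect for r in results
        if r.malicious and r.flagged and r.minutes_to_detect is not None
    ]
    return {
        "precision": _ratio(tp, tp + fp),  # TP / (TP + FP)
        "recall": _ratio(tp, tp + fn),     # TP / (TP + FN)
        "fpr": _ratio(fp, fp + tn),        # FP / (FP + TN)
        "mttd_minutes": mean(delays) if delays else None,
    }
```

The benign traps are what make FPR measurable at all: a corpus of only real incidents has no true negatives, so FP + TN is zero and FPR is undefined.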

USAP is open-source (Apache-2.0), stdlib-only, and runs as an MCP server or as paste-in system prompts for any model. Every example in the repo is a real command you can re-run. I'd value feedback on the gate design.

Repo: https://github.com/jaskaranhundal/usap-skills

DEV Community: Jaskarn Singh
