Every "AI security analyst" I tried had the same flaw: a correct verdict and a confident-but-wrong one are indistinguishable on screen. In security that's not a UX nit — it's the whole problem. So I built USAP around a single rule, and this post is about that rule and three things that fell out of it.
The evidence gate
USAP's output contract is 11 typed JSON fields. The uncompromising one is evidence_references: every verdict must carry at least one source that resolves. Four accepted forms:
- mcp:<logical>:<tool>:<call_id> (evidence fetched live from a connected MCP)
- https://... (a canonical external source: CVE, EPSS feed, MITRE)
- s3://... (an operator artifact)
- local://<repo-path> (an in-repo standard, for advisory verdicts; the path must exist)
A source like "scanner" or "the SIEM showed it" is rejected at the contract boundary. That one check removes most hand-wavy output, because the model can't satisfy the contract without pointing at something real.
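To make the gate concrete, here is a minimal sketch of a check along these lines. It assumes Python, and the names `reference_resolves` and `check_evidence`, the regex for the mcp: form, and the verdict-as-dict shape are all illustrative assumptions, not USAP's actual code. For https:// and s3:// it only checks the shape of the reference; a real gate would presumably also confirm the target exists.

```python
import re
from pathlib import Path

# Hypothetical pattern for the four-part mcp: form listed above;
# the real grammar may allow other characters.
MCP_REF = re.compile(r"^mcp:[\w.-]+:[\w.-]+:[\w.-]+$")


def reference_resolves(ref, repo_root: Path) -> bool:
    """Return True if ref matches one of the four accepted forms."""
    if not isinstance(ref, str):
        return False
    if ref.startswith("mcp:"):
        return MCP_REF.match(ref) is not None
    if ref.startswith("https://") or ref.startswith("s3://"):
        # Shape check only: something must follow the scheme.
        return len(ref.split("://", 1)[1]) > 0
    if ref.startswith("local://"):
        rel = ref[len("local://"):]
        # Advisory references must point at a path that exists in the repo.
        return bool(rel) and (repo_root / rel).exists()
    # Free-text sources such as "scanner" land here and are rejected.
    return False


def check_evidence(verdict: dict, repo_root: Path) -> list[str]:
    """Return a list of contract violations; an empty list means the gate passes."""
    refs = verdict.get("evidence_references") or []
    if not refs:
        return ["evidence_references must contain at least one source"]
    return [
        f"unresolvable evidence reference: {ref!r}"
        for ref in refs
        if not reference_resolves(ref, repo_root)
    ]
```

With this sketch, a verdict whose only source is "scanner" comes back with one violation, while one citing an existing local:// path comes back clean.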
Fallout 1: connectors have to be abstract
If a verdict cites mcp:siem:search, the siem part can't be hard-wired to Splunk, because not everyone runs Splunk. So agents declare logical capabilities, and a registry resolves them to whatever the operator has connected (Splunk, Elastic, Sentinel). If nothing implements a capability, the agent degrades gracefully and marks that axis UNKNOWN rather than inventing telemetry.
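The resolve-or-degrade behaviour can be sketched in a few lines. This assumes Python and a deliberately simplified model: `CapabilityRegistry`, `query_axis`, and the result dictionary are hypothetical names for illustration, and the backend is just a callable rather than a real MCP connection.

```python
from typing import Callable, Optional, Tuple


class CapabilityRegistry:
    """Maps a logical capability (e.g. "siem:search") to one connected backend."""

    def __init__(self) -> None:
        self._impls: dict = {}

    def register(self, capability: str, backend: str, fn: Callable) -> None:
        # The operator wires a concrete backend to a logical capability.
        self._impls[capability] = (backend, fn)

    def resolve(self, capability: str) -> Optional[Tuple[str, Callable]]:
        return self._impls.get(capability)


def query_axis(registry: CapabilityRegistry, capability: str, query: str) -> dict:
    """Run a query through whatever implements the capability, or report UNKNOWN."""
    impl = registry.resolve(capability)
    if impl is None:
        # Nothing connected: say so explicitly instead of inventing telemetry.
        return {"status": "UNKNOWN", "capability": capability, "results": None}
    backend, fn = impl
    return {
        "status": "OK",
        "capability": capability,
        "backend": backend,
        "results": fn(query),
    }
```

The agent code only ever names "siem:search"; swapping Elastic for Splunk is a change to one `register` call, not to the agent.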
Fallout 2: numbers can't be narrated
Once you demand resolvable evidence, narrated numbers feel wrong too. CVSS is computed from the published vector. EPSS is pulled live from the FIRST.org feed. Confidence comes from a written rubric. If a number can't be computed, the tool returns "qualitative" instead of fabricating one, and the contract rejects a cvss_score that disagrees with its cited vector.
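As an illustration of "computed, not narrated", here is a sketch that derives a CVSS v3.1 base score from its vector using the weights and Roundup rule from the public specification, and returns "qualitative" when the vector can't be parsed. It assumes Python and handles base metrics only; `base_score` and `score_matches` are hypothetical names, not USAP's API.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Return the base score for a CVSS:3.1 vector, or "qualitative" if it can't be computed."""
    try:
        prefix, *parts = vector.split("/")
        if prefix != "CVSS:3.1":
            return "qualitative"
        m = dict(p.split(":") for p in parts)
        scope = m["S"]
        pr = PR_WEIGHTS[scope][m["PR"]]
        av, ac, ui = (WEIGHTS[k][m[k]] for k in ("AV", "AC", "UI"))
        c, i, a = (WEIGHTS[k][m[k]] for k in ("C", "I", "A"))
    except (AttributeError, KeyError, ValueError):
        # Missing, malformed, or unsupported vector: refuse to invent a number.
        return "qualitative"

    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


def score_matches(claimed: float, vector) -> bool:
    """True only if the claimed score equals the score computed from the vector."""
    computed = base_score(vector)
    return isinstance(computed, float) and abs(claimed - computed) < 0.05
```

For Log4Shell's published vector, CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H, `base_score` returns 10.0, so a verdict claiming 7.5 against that vector would fail `score_matches`.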
Fallout 3: you can't grade yourself
A system that enforces evidence shouldn't grade itself against its own answer key. USAP ships a held-out corpus of real public incidents (Log4Shell, xz, Capital One, MOVEit) plus benign false-positive traps, and a stdlib harness that reports precision, recall, FPR, and MTTD.
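For readers unfamiliar with the four metrics, this is roughly how a stdlib-only scorer could compute them. It is a sketch under assumptions: Python, and a hypothetical per-case record with `malicious`, `flagged`, and `detect_seconds` fields, which is not necessarily how USAP's harness stores results.

```python
from statistics import mean


def score_corpus(cases: list) -> dict:
    """Compute precision, recall, FPR, and mean time to detect over labelled cases."""
    tp = sum(1 for c in cases if c["malicious"] and c["flagged"])
    fp = sum(1 for c in cases if not c["malicious"] and c["flagged"])
    fn = sum(1 for c in cases if c["malicious"] and not c["flagged"])
    tn = sum(1 for c in cases if not c["malicious"] and not c["flagged"])

    def ratio(numerator: int, denominator: int):
        # Undefined ratios are reported as None rather than as a made-up 0 or 1.
        return numerator / denominator if denominator else None

    # MTTD is averaged over true positives that recorded a detection delay.
    delays = [
        c["detect_seconds"]
        for c in cases
        if c["malicious"] and c["flagged"] and c.get("detect_seconds") is not None
    ]
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fpr": ratio(fp, fp + tn),
        "mttd_seconds": mean(delays) if delays else None,
    }
```

The benign traps matter here: without them `fp + tn` is zero and the false-positive rate is undefined, which this sketch reports as None.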
USAP is open-source (Apache-2.0), stdlib-only, and runs either as an MCP server or as paste-in system prompts for any model. Every example in the repo is a real command you can re-run. I'd value feedback on the gate design.