Three checks that separate an agent demo from a production agent

Alex Duch — Sat, 06 Jun 2026 09:05:58 +0000

Shipping an agent demo takes an afternoon. Shipping one that survives a quarter in production is a different job — and the gap is almost never the model. It's three boring things that are usually missing entirely.

I maintain an open, MIT-licensed Agentic Product Standard, and v2.0 was mostly about turning those three things from advice into code you can run. Here they are, with the actual code.

Security is structural, not a filter
The most common mistake is treating safety as a guardrail: an input/output filter near the edge. The problem is that filters have a ceiling. The best content classifiers run around 97% accuracy, which means ~3% of prompt-injection attempts land by design. That's not a bug you tune away; it's the nature of filtering.

Real safety comes from architecture. The check I reach for first is Simon Willison's lethal trifecta: an agent becomes an exfiltration tool the moment it has all three of —

access to private data,
exposure to untrusted content, and
the ability to communicate externally.

Any one or two is manageable. All three together is a data-exfiltration channel waiting for a payload hidden in a retrieved document. The fix is never "better filter"; it's to break a leg: gate egress, quarantine untrusted input, or scope the data.

Here's the gate, as a CI step. You declare what your agent can touch; the build fails if the trifecta is unmitigated:

agent-capabilities.json

    {
      "name": "support-agent",
      "private_data": true,
      "untrusted_content": true,
      "external_comms": true,
      "mitigations": [
        {"leg": "external_comms", "control": "egress allow-list + approval"}
      ]
    }

    import json
    import sys

    LEGS = ("private_data", "untrusted_content", "external_comms")

    def evaluate(spec):
        """Return (exit_code, message) for one agent capability spec."""
        present = [leg for leg in LEGS if spec.get(leg) is True]
        if len(present) < 3:
            return 0, f"OK: only {len(present)}/3 legs present"
        # A leg counts as broken only if it is a known leg with a non-empty control.
        mitigated = {m.get("leg") for m in spec.get("mitigations", []) if m.get("control")}
        broken = sorted(mitigated & set(LEGS))
        if broken:
            return 0, f"OK: trifecta present but broken at {', '.join(broken)}"
        return 1, "FAIL: lethal trifecta, no leg broken"

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:
            code, message = evaluate(json.load(f))
        print(message)
        sys.exit(code)  # non-zero exit fails the CI step

The second structural control is the MCP supply chain. Community MCP servers are untrusted code, and a server can hand you a benign tool description at approval time, then mutate it later (a "rug pull," or tool-definition poisoning). So: pin tool definitions by hash and alert on change. A few lines of bash in CI catch it:

    # canonical hash per tool, sorted so reordering doesn't trip it but content changes do
    jq -cS '(.tools)|sort_by(.name)[]|{name,description,inputSchema}' tools.json \
      | while read -r t; do
          printf '%s %s\n' "$(printf '%s' "$t" | sha256sum | cut -d' ' -f1)" \
                           "$(printf '%s' "$t" | jq -r .name)"
        done > current.lock
    diff -u tools.lock current.lock || { echo "::error::MCP tool def changed: possible rug pull"; exit 1; }

Cost is a circuit breaker, not a dashboard
Agents burn tokens at a rate chat never did. Anthropic has put multi-agent systems at roughly 15× the token usage of a chat interaction. Most published multi-agent architectures leave the cost ceiling wide open: there's no breaker on a runaway loop.

v2.0 makes cost a hard control:

a per-run token/cost ceiling enforced in code (a breaker, checked before each model call), not a bill you read later;
prompt/KV caching on stable prefixes (system prompt, tool schemas);
model routing — small model for classification, flagship for reasoning;
and the economics rule people learn the expensive way: only pay the 15× for multi-agent when the task value justifies it. If one agent clears the bar, the orchestra is waste.

The point is that "cost" stops being a surprise the moment it's a number your code enforces and your traces record per task.
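
To make the breaker concrete, here is a minimal sketch of a per-run token ceiling checked before each model call. It is not the standard's own implementation: `TokenBreaker`, `BudgetExceeded`, `run_agent`, and the `call_model` callable (assumed to return an output plus the tokens it used) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the next model call would push the run past its ceiling."""


@dataclass
class TokenBreaker:
    max_tokens: int  # hard per-run ceiling; the value is the caller's choice
    used: int = 0    # tokens recorded so far in this run

    def check(self, estimated_tokens: int) -> None:
        """Call before each model call; refuse it if the budget would be exceeded."""
        projected = self.used + estimated_tokens
        if projected > self.max_tokens:
            raise BudgetExceeded(
                f"run would use {projected} tokens, ceiling is {self.max_tokens}"
            )

    def record(self, actual_tokens: int) -> None:
        """Call after each model call with the usage actually reported."""
        self.used += actual_tokens


def run_agent(steps, breaker, call_model):
    """Run (prompt, estimated_tokens) steps in order under the breaker.

    `call_model` is a stand-in for a real client: prompt -> (output, tokens_used).
    """
    outputs = []
    for prompt, estimate in steps:
        breaker.check(estimate)  # trips before money is spent, not after
        output, tokens = call_model(prompt)
        breaker.record(tokens)
        outputs.append(output)
    return outputs
```

The design choice worth noting is that the check happens before the call, so a runaway loop stops at the ceiling instead of one call past it. Recording `breaker.used` in the trace for each task gives the per-task number described above.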

Delete your own scaffolding
This is the one I think about most. Every rule you write to patch a model's weakness becomes dead weight the day that weakness is gone, and worse, it competes for the model's attention and degrades the rules that still matter.

So the standard adds a maintenance doctrine (adapted, with credit, from Daniel Miessler's PAI): audit every instruction on a cadence with one question —

Would a smarter model make this rule unnecessary?

If yes, it's scaffolding, not architecture — cut it. Tag rules anti-fragile (eval sets, verification harnesses, tool contracts, real failure gotchas → keep) vs fragile (chain-of-thought orchestrators, output parsers, retry cascades → cut or re-test on the next model upgrade). A standard that only grows is one that rots.
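
One way to make that tagging auditable is to keep the rules in a small registry and list the fragile ones that have not been re-tested since the last model upgrade. This is a sketch under assumptions: the `Rule` fields, the `rules_to_retest` helper, and the example rules and dates are all hypothetical, not part of the standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Rule:
    text: str
    tag: str             # "anti-fragile" (keep) or "fragile" (cut or re-test)
    last_reviewed: date  # when someone last asked the audit question of this rule


def rules_to_retest(rules, model_upgraded_on):
    """Return fragile rules whose last review predates the current model upgrade."""
    return [
        rule for rule in rules
        if rule.tag == "fragile" and rule.last_reviewed < model_upgraded_on
    ]


# Illustrative registry; the rule texts and dates are made up for the example.
rules = [
    Rule("Run the eval set before every merge", "anti-fragile", date(2026, 1, 10)),
    Rule("Retry up to 3 times on malformed JSON output", "fragile", date(2026, 1, 10)),
    Rule("Force step-by-step planning before tool calls", "fragile", date(2026, 5, 20)),
]

for rule in rules_to_retest(rules, model_upgraded_on=date(2026, 5, 1)):
    print(f"RE-TEST OR CUT: {rule.text}")
```

Anti-fragile rules never show up in this list, which matches the intent: they are reviewed on the normal cadence, while fragile ones are forced back onto the table at every upgrade.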

Making it checkable
Principles are easy to agree with and impossible to audit. So v2.0 ships a self-assessment scorecard — a binary Yes/No maturity check (M0 Prototype → M1 Shippable → M2 Production → M3 Autonomous-ready), mapped to an autonomy ladder. Your level is the highest band where every gate passes; the first "No" is your next task. "Is it production-ready?" becomes a checklist instead of a vibe.
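
The scoring rule ("highest band where every gate passes; the first No is your next task") can be sketched in a few lines. The band names come from the scorecard above; the gate names in the example and the `maturity_level` function are hypothetical, and the sketch assumes bands are cumulative, so a failure in a lower band caps the level.

```python
BANDS = ("M0", "M1", "M2", "M3")  # Prototype, Shippable, Production, Autonomous-ready


def maturity_level(scorecard):
    """Score a Yes/No scorecard.

    `scorecard` maps band -> {gate name: bool}. Returns (level, next_task):
    level is the highest band fully passed (None if M0 fails), and next_task
    names the first failing gate (None if every gate in every band passes).
    """
    level = None
    for band in BANDS:
        gates = scorecard.get(band)
        if not gates:
            # An unanswered band cannot be claimed as passed.
            return level, f"{band}: no gates answered"
        for gate, passed in gates.items():
            if not passed:
                return level, f"{band}: {gate}"
        level = band
    return level, None


# Illustrative answers; these gate names are made up for the example.
scorecard = {
    "M0": {"runs end to end": True},
    "M1": {"evals run in CI": True, "cost ceiling enforced": False},
    "M2": {"tracing enabled": True},
}
level, next_task = maturity_level(scorecard)
print(level, "| next:", next_task)
```

Here the agent scores M0 even though its M2 gate passes, because the first "No" sits in M1.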

Alongside it: the red-team kit above plus a CI workflow template that blocks merges when the eval pass-rate slips, and a 2026 refresh (MCP + OAuth 2.1, A2A at the Linux Foundation, OpenTelemetry GenAI tracing, trajectory + pass^k eval metrics).
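
For readers who have not met pass^k: it is commonly defined as the chance that all k independent trials of a task succeed, which punishes flaky agents far more than a single-shot pass rate. The sketch below uses the usual combinatorial estimator from n recorded trials with c successes. Whether the standard uses exactly this definition is an assumption, and `pass_hat_k` and `eval_gate` are hypothetical names, not the shipped CI template.

```python
from math import comb


def pass_hat_k(num_trials, num_successes, k):
    """Estimate the probability that k trials of one task all succeed.

    Uses C(c, k) / C(n, k): the fraction of k-sized subsets of the n recorded
    trials that contain only successes.
    """
    if not 0 < k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    return comb(num_successes, k) / comb(num_trials, k)


def eval_gate(task_results, k, threshold):
    """Return (exit_code, mean pass^k) over a list of (num_trials, num_successes)."""
    scores = [pass_hat_k(n, c, k) for n, c in task_results]
    mean = sum(scores) / len(scores)
    return (0 if mean >= threshold else 1), mean
```

With 6 successes in 8 trials, `pass_hat_k(8, 6, 1)` is 0.75 but `pass_hat_k(8, 6, 2)` drops to 15/28, roughly 0.54. A merge gate would exit non-zero when `eval_gate` returns 1.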

The one rule under all of it
The model is the variable. The harness is the constant. Invest proportionally.

…with the v2.0 twist: the harness isn't something you accumulate forever. You curate it — growing the parts that compound, deleting the parts that only propped up a weaker model.

It's MIT and vendor-neutral (deliberately not a framework), with an optional Claude Code skill set. I'd rather be told what's wrong with it than collect stars — the deferred list is where I'm least sure.