Where this fits: This article extends the Zero-Glue series. If you haven't read The End of Glue Code: Why MCP Is the USB-C Moment for AI Systems, t...
The USB-C analogy works because it captures both sides: standardization is powerful, but plugging everything in without intent creates a new kind of mess.
For developer tools, the next layer has to be skill-level routing: which tool should be exposed, under what constraints, with what audit trail, and what success check at the end.
Bingo. Standardization without intent is just highly efficient chaos.
Your concept of "skill-level routing" is exactly where the industry has to head to survive the agentic transition. If an MCP server is just a dumb port that exposes raw system capabilities to any model that asks, you haven’t built an architecture; you’ve just built a highly streamlined attack vector.
In a sovereign system, skill-level routing shouldn't reside in the LLM's prompt context—it must be enforced at the Gateway layer. We look at this through three precise engineering dimensions:
Dynamic Capability Scoping: The infrastructure checks the authenticated "session skill level" before compiling the MCP schema manifest for the model. A junior routing loop literally shouldn't have the write-side tools rendered into its context window. If the model doesn't know the tool exists, it can't be tricked into invoking it.
Deterministic Success Checks: To your point about a "success check at the end"—this is the exact floor of the Minimum Viable Audit Trail (MVAT). The Gateway doesn't just pass the tool output back to the model and trust it to evaluate success. The Gateway runs a hard, rule-based assertion against the payload. If the tool call fails the programmatic boundary, the circuit breaker trips before the model can hallucinate a fake success state.
The Audit Trail Envelope: This is where the ForensicReceipt (see the definition here) becomes load-bearing. Every routed skill invocation, its constraint matrix, and the post-execution success check must be bundled and cryptographically sealed. It ensures that when a tool does touch a production asset, we have an unalterable record of the exact routing authority that allowed it.
USB-C didn't just give us a universal shape; it gave us a negotiation protocol (Power Delivery) so a charger doesn't fry a pair of headphones. Skill-level routing is the context-delivery protocol that prevents an LLM from frying your infrastructure.
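To make the "Deterministic Success Checks" point concrete, here is a minimal sketch of a gateway that runs a rule-based assertion on the raw tool payload and trips a breaker on failure. The `Gateway` class, the `CircuitOpen` exception, the `db.update` tool name, and the `rows_affected` field are all hypothetical names chosen for illustration. They are not part of MCP or of any spec mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class CircuitOpen(Exception):
    """Raised when the gateway refuses to hand a result back to the model."""


@dataclass
class Gateway:
    # Hypothetical registry: tool name -> rule-based assertion on the raw payload.
    checks: Dict[str, Callable[[Dict[str, Any]], bool]] = field(default_factory=dict)
    tripped: bool = False

    def return_to_model(self, tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if self.tripped:
            raise CircuitOpen("breaker already open")
        check = self.checks.get(tool)
        # Fail closed: a tool with no registered check is treated as a failure.
        if check is None or not check(payload):
            self.tripped = True
            raise CircuitOpen(f"{tool} failed its success check")
        return payload


# Assumed rule for illustration: an update must touch exactly one row.
gateway = Gateway(checks={"db.update": lambda p: p.get("rows_affected") == 1})
```

The design choice worth noting is that the check sees the payload before the model does, so the model never gets a chance to narrate a failed call as a success.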
Incredibly sharp breakdown. You’re looking at the exact blueprint of the governance layer we’re finalizing.
This is the right layer for it. If the routing rule only lives in the prompt, the model is being asked to enforce the same boundary it is also trying to navigate. That is too soft for production systems.
The gateway-level manifest idea is especially important. A model should not merely be told “do not use write tools”; the write tools should not be rendered into its available surface unless the current skill, identity, and task envelope justify them.
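A rough sketch of that manifest filtering, assuming a made-up model where each tool carries a minimum skill level and write tools also need an explicit per-task grant. `Tool`, `Session`, `build_manifest`, and the tool names are illustrative assumptions, not MCP types.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool
    min_skill: int  # hypothetical skill threshold for exposing this tool


@dataclass(frozen=True)
class Session:
    identity: str
    skill_level: int
    task_scopes: FrozenSet[str]  # write tools explicitly granted for this task


def build_manifest(tools: List[Tool], session: Session) -> List[str]:
    """Return only the tool names this session is allowed to see."""
    visible = []
    for tool in tools:
        if session.skill_level < tool.min_skill:
            continue  # below the skill threshold: never rendered
        if tool.writes and tool.name not in session.task_scopes:
            continue  # write tools also need an explicit task grant
        visible.append(tool.name)
    return visible


TOOLS = [
    Tool("repo.read", writes=False, min_skill=0),
    Tool("repo.write", writes=True, min_skill=2),
    Tool("deploy.run", writes=True, min_skill=3),
]
junior = Session("agent-junior", skill_level=1, task_scopes=frozenset())
senior = Session("agent-senior", skill_level=3, task_scopes=frozenset({"repo.write"}))
```

With these inputs the junior session sees only `repo.read`, and the senior session sees `repo.read` and `repo.write` but not `deploy.run`, because skill alone does not grant a write tool without a matching task scope.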
I also like the Power Delivery analogy because it makes the standardization point less naive. The connector shape is not the safety system. The negotiation and enforcement layer is.
The one thing I would add is recovery semantics. The audit trail should not only prove what happened. It should help the system know what can be retried, what must be rolled back, and what now requires a human decision.
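One possible shape for those recovery semantics, sketched as a pure function over a few receipt flags. The flag names (`verified`, `side_effect_possible`, `idempotent`, `reversible`) and the ordering of the rules are assumptions for illustration. A real policy would likely need more states than this.

```python
from enum import Enum


class Recovery(Enum):
    NONE = "none"
    RETRY = "retry"
    ROLLBACK = "rollback"
    HUMAN = "human_decision"


def recovery_for(receipt: dict) -> Recovery:
    """Map a receipt's recorded facts to the next safe step."""
    if receipt["verified"]:
        return Recovery.NONE  # success check passed, nothing to recover
    if not receipt["side_effect_possible"]:
        return Recovery.RETRY  # call provably never reached the target
    if receipt["idempotent"]:
        return Recovery.RETRY  # repeating the call cannot compound the damage
    if receipt["reversible"]:
        return Recovery.ROLLBACK  # undo first, then decide whether to try again
    return Recovery.HUMAN  # unknown state on a one-way action


# Example: an unverified, non-idempotent, irreversible write needs a person.
stuck = {"verified": False, "side_effect_possible": True,
         "idempotent": False, "reversible": False}
```

The point of the sketch is that the decision reads only fields recorded by the gateway, so recovery does not depend on what the model believes happened.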
on a work machine, plugging in unknown peripherals is a policy violation. I treat each MCP connector like a package dep - don’t add it without knowing its blast radius and what it can touch upstream.
Exactly.
The USB-C analogy breaks down pretty quickly if we forget that most organizations already have policies around what can be plugged into a corporate machine.
An MCP connector isn't just a cable—it's a trust relationship.
I tend to think about them the same way I think about package dependencies or cloud IAM permissions: every new integration expands the system's potential blast radius. Before adding one, it's worth understanding what data it can access, what actions it can perform, and what other systems it can reach transitively.
The protocol standardizes the connection. It doesn't automatically make every connection a good idea.
the lockfile equivalent is what’s still missing — you can vet the initial add, but connector drift is nearly invisible without it. most orgs discover MCP scope at incident time, not review time. package model is right, the tooling just hasn’t caught up.
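A lockfile for connectors could be as small as a digest of each server's advertised tool list, checked at review time and again at startup. This is a sketch under that assumption: `manifest_digest`, `detect_drift`, and the tool dictionaries are hypothetical, and how the live tool list is fetched from a server is left out.

```python
import hashlib
import json
from typing import Dict, List


def manifest_digest(tools: List[dict]) -> str:
    """Hash a tool list in a canonical form so ordering does not matter."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(lock: Dict[str, str], live: Dict[str, List[dict]]) -> List[str]:
    """Return the servers whose live manifest no longer matches the lockfile."""
    drifted = [
        server for server, tools in live.items()
        if lock.get(server) != manifest_digest(tools)
    ]
    return sorted(drifted)


# Reviewed state: the connector exposed one read-only tool.
reviewed = {"tickets": [{"name": "ticket.read", "scopes": ["read"]}]}
lock = {server: manifest_digest(tools) for server, tools in reviewed.items()}

# Later: the same connector quietly gained a write tool.
live = {"tickets": [{"name": "ticket.read", "scopes": ["read"]},
                    {"name": "ticket.close", "scopes": ["write"]}]}
```

Here `detect_drift(lock, live)` flags `tickets`, which moves scope discovery from incident time to review time.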
Armorer Labs perspective: the "power delivery spec" framing is exactly the part that makes MCP operational rather than just connective tissue.
One pattern that has worked well for us is to treat every write-capable tool call as needing a small receipt before it executes: requested capability, resolved target, policy decision, human approval boundary if any, and verification artifact after the call. That gives you something more concrete than "the agent had permission" when a downstream action surprises you.
The subtle bit is that the receipt should be produced by the gateway/runtime, not by the model narrating what it thinks happened. Otherwise prompt injection can compromise the audit trail as easily as the action.
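A minimal sketch of that pattern, with the receipt built by a runtime wrapper rather than by the model. The function and field names are illustrative assumptions that mirror the list above (requested capability, resolved target, policy decision, approval, verification). This is not Armorer Labs' implementation.

```python
import uuid
from typing import Any, Callable, Dict, Optional


def execute_with_receipt(
    capability: str,
    target: str,
    policy: Callable[[str, str], str],
    tool_fn: Callable[[str], Any],
    verify: Callable[[str, Any], bool],
    approved_by: Optional[str] = None,
) -> Dict[str, Any]:
    """Record the decision before the call and the verification after it."""
    receipt: Dict[str, Any] = {
        "receipt_id": str(uuid.uuid4()),
        "requested_capability": capability,
        "resolved_target": target,
        "policy_decision": policy(capability, target),
        "approved_by": approved_by,
        "executed": False,
        "verification": None,
    }
    if receipt["policy_decision"] != "allow":
        return receipt  # denied calls still leave a receipt
    result = tool_fn(target)
    receipt["executed"] = True
    # Verification is computed by the runtime, not reported by the model.
    receipt["verification"] = verify(target, result)
    return receipt


# Toy environment, assumed for illustration only.
store: Dict[str, str] = {}
policy = lambda cap, tgt: "allow" if tgt.startswith("staging/") else "deny"
write = lambda tgt: store.__setitem__(tgt, "v2")
check = lambda tgt, _result: store.get(tgt) == "v2"

ok = execute_with_receipt("config.write", "staging/app", policy, write, check)
denied = execute_with_receipt("config.write", "prod/app", policy, write, check)
```

Nothing in the receipt comes from model output, so a prompt injection that changes what the model says cannot change what the record says.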
That's a great distinction.
The phrase that jumped out at me was:
That's exactly the boundary many teams miss.
A model-generated explanation is just another output token stream. If the same system that can be manipulated is also responsible for explaining itself, you've created a very fragile audit trail.
What you're describing is much closer to what I've been calling a Forensic Receipt: a runtime-generated record containing the requested action, authorization context, policy decisions, execution metadata, and resulting artifacts. The important property is that the evidence is produced by the system boundary itself, not by the model's recollection of events after the fact.
MCP gives us a standardized way to connect capabilities. The next challenge is proving what actually happened once those capabilities are exercised.
Yes, exactly. The useful property of a forensic receipt is that it is produced at the boundary where the action happens, not afterwards by the same model that may be wrong about what happened.
The minimum fields I’d want are: requested action, actor/session, active policy or scope, approval result, tool/server invoked, normalized args, execution result, artifact references, and a correlation id back to the trace. Then the receipt becomes falsifiable instead of narrative.
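Those fields map fairly directly onto a record type. The sketch below adds one assumption of my own: an HMAC tag over the canonical JSON, so that later edits to the record are detectable. The field names paraphrase the list above, and key management is out of scope here.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class ForensicReceipt:
    requested_action: str
    actor_session: str
    active_policy: str
    approval_result: str
    tool_server: str
    normalized_args: Dict[str, Any]
    execution_result: str
    artifact_refs: List[str]
    correlation_id: str


def seal(receipt: ForensicReceipt, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a canonical serialization."""
    body = json.dumps(asdict(receipt), sort_keys=True, separators=(",", ":"))
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_seal(receipt: ForensicReceipt, key: bytes, tag: str) -> bool:
    """Constant-time comparison of the stored tag against a fresh one."""
    return hmac.compare_digest(seal(receipt, key), tag)


KEY = b"demo-key-held-by-the-gateway"  # placeholder, not a real secret
receipt = ForensicReceipt(
    requested_action="close_ticket", actor_session="sess-1",
    active_policy="support-write-v1", approval_result="auto",
    tool_server="tickets", normalized_args={"ticket_id": 42},
    execution_result="ok", artifact_refs=["log://run/1"],
    correlation_id="trace-abc",
)
tag = seal(receipt, KEY)
tampered = replace(receipt, approval_result="human:alice")
```

`verify_seal(receipt, KEY, tag)` holds for the original and fails for `tampered`, which is one way to make the receipt falsifiable rather than narrative.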
"Then the receipt becomes falsifiable instead of narrative."
That's the key insight.
A lot of current agent observability focuses on reconstruction: logs, traces, model reasoning summaries, and post-hoc explanations assembled after the fact. Those are useful, but they're still fundamentally descriptive.
A forensic receipt is stronger because it's generated at the decision boundary itself and tied to the actual execution path. As you point out, once you have policy context, normalized inputs, execution artifacts, and trace correlation, you're no longer asking the model what happened—you have independent evidence of what happened.
That's the distinction I keep coming back to:
Safety mechanisms try to prevent failures.
Forensic receipts make failures explainable when prevention inevitably falls short.
In distributed systems, that's often the more durable guarantee.
That prevention/explanation split is the right way to put it. I also think receipts become most useful when they are boring enough to be generated on every action, not only exceptional ones.
If the normal path always emits the same kind of receipt, then incidents are not special archaeology. You compare the failed run against ordinary successful runs: same policy? same tool scope? same normalized args? same verification step? That is where patterns start showing up.
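That comparison is simple once every run emits the same receipt shape. A small sketch, assuming receipts are plain dictionaries and that the four questions above correspond to four keys (the key names and sample values are made up):

```python
from typing import Any, Dict, Tuple

COMPARED_FIELDS = ("policy", "tool_scope", "normalized_args", "verification")


def diff_receipts(
    baseline: Dict[str, Any],
    failed: Dict[str, Any],
    fields: Tuple[str, ...] = COMPARED_FIELDS,
) -> Dict[str, Tuple[Any, Any]]:
    """Return only the fields where the failed run departs from a normal one."""
    return {
        name: (baseline.get(name), failed.get(name))
        for name in fields
        if baseline.get(name) != failed.get(name)
    }


normal_run = {"policy": "support-write-v1", "tool_scope": ["ticket.read"],
              "normalized_args": {"ticket_id": 42}, "verification": True}
failed_run = {"policy": "support-write-v1",
              "tool_scope": ["ticket.read", "ticket.close"],
              "normalized_args": {"ticket_id": 42}, "verification": False}

delta = diff_receipts(normal_run, failed_run)
```

In this example `delta` contains only `tool_scope` and `verification`, which points the investigation at a scope change rather than at the arguments or the policy.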
The prompt injection scenario you describe in Zone 3 is measurable. I ran Fable 5 through 27 injection cases before the export control pulled it. 87.5% resistance on the cases it was allowed to answer, but 70% of tests were blocked by the content filter at the infrastructure layer. When the filter doesn't fire, delimiter escape and fiction framing get through consistently.
The three new benchmarks I shipped (fairness, accountability behavior, model identity) map directly to write-side custody. Does the model defer before writing? Does it accurately represent what it is? Does it treat people equitably in decision scenarios? Those are testable questions now.
Hard data like this is exactly what the industry needs to stop the "prompt security" illusion.
Your benchmark numbers are a massive validation of the Zone 3 thesis: if 70% of the attacks were caught at the infrastructure layer, it proves that the Gateway—not the model—is the only defensive line that matters in production. Prompt boundaries are inherently leaky; delimiter escaping and fiction framing work because models are fundamentally designed to process language, not enforce security parameters.
I really like the framing of your three new benchmarks—specifically Model Identity and Deferral Control. A model must accurately represent its current capability envelope and explicitly defer to a deterministic check before invoking a write-side tool.
In our spec, we treat this as a strict state transition: the model doesn't just call the tool; it signs an intent request. If it fails to accurately identify its scope or tries to bypass the deferral gate, the gateway trips the circuit breaker before the payload ever reaches the execution environment. Fantastic data—I’d love to see the full test matrix if it's published anywhere.
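The spec itself isn't shown in this thread, so the following is only a guess at the shape of that deferral gate. It assumes an intent request is a dictionary carrying the tool, the scope the model believes it has, and a flag showing it deferred to the check. The signing step is omitted, and every name is hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class GateState(Enum):
    APPROVED = "approved"
    TRIPPED = "tripped"


def evaluate_intent(intent: dict, granted_tools: FrozenSet[str]) -> GateState:
    """Approve an intent only if scope, tool, and deferral all line up."""
    declared = frozenset(intent["declared_scope"])
    if declared != granted_tools:
        return GateState.TRIPPED  # model misstated its capability envelope
    if intent["tool"] not in granted_tools:
        return GateState.TRIPPED  # requested tool is outside the grant
    if not intent.get("deferred", False):
        return GateState.TRIPPED  # tried to skip the deferral gate
    return GateState.APPROVED


GRANTED = frozenset({"ticket.read", "ticket.close"})
good_intent = {"tool": "ticket.close",
               "declared_scope": ["ticket.read", "ticket.close"],
               "deferred": True}
overreach = {"tool": "db.drop",
             "declared_scope": ["ticket.read", "ticket.close"],
             "deferred": True}
```

Only `APPROVED` would let the payload continue to the execution environment; any other outcome stops before the tool is touched.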
Ken, here's the breakdown. The actual prompts stay in a held-out set, because the moment I publish them the models I test can train to pass them and the benchmark is dead. So this is what happened, not how it's built.
I ran Fable 5 through 27 injection cases, three times, about 100 seconds apart. Median resistance was 87.5%. The number isn't the interesting part. The runs disagreed with each other.
Each run, the content filter caught a different set of cases, around 70% each time, and which 70% changed. Same model, same prompts, 100 seconds apart, different cases blocked, different cases through. The model's own resistance wasn't the variable. The infrastructure filter was.
That's your Zone 3 thesis with a number on it. The gateway did roughly 70% of the security work, and it did it nondeterministically. When it fired, the attack never reached the model. When it didn't, the model's resistance showed holes.
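For anyone who wants to quantify that kind of run-to-run disagreement on their own tests, one option is the Jaccard overlap between the sets of cases the filter blocked in each run. The case IDs below are made-up placeholders, not the held-out benchmark or its results.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(runs: List[Set[str]]) -> List[float]:
    """Jaccard overlap for every pair of runs."""
    return [jaccard(a, b) for a, b in combinations(runs, 2)]


# Placeholder data: which case IDs the filter blocked in three repeated runs.
blocked_per_run = [
    {"c1", "c2", "c3"},
    {"c2", "c3", "c4"},
    {"c1", "c3", "c4"},
]
scores = pairwise_agreement(blocked_per_run)
```

A deterministic filter would score 1.0 on every pair. In this toy data each run blocks the same number of cases, yet every pair overlaps at only 0.5, which is the pattern described above: a stable block rate with an unstable set of blocked cases.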
By vector:
So it lines up with what you're building toward. Prompt-level resistance is real but porous and inconsistent. The deterministic check has to live at the gateway, because the model's own boundary moves between identical runs.
Why this dataset is what it is: I ran it about 12 hours before the export-control order pulled Fable 5 worldwide, so nobody can reproduce it now. The bypass categories I hit overlapped with the jailbreak techniques that were publicly reported right before it got pulled.