We spend a lot of effort hardening the agent itself: scoping its permissions, sandboxing its code execution, watching its outputs. Then it loads a third-party MCP server, and that server routes around most of the locks we built.
That's the uncomfortable part of agent security nobody automates away: your agent is only as safe as the agents and tools it calls. It loads third-party tools, talks to MCP servers, spawns sub-agents, and shares a substrate — a registry, an identity plane, a gateway, a kill-switch bus — with every other agent in your system. A failure in any of those doesn't stay put. It cascades through the shared substrate.
A useful framing here: every control you build has two halves. An agent-scoped half (what this agent is allowed to do) and an ecosystem-scoped half (the shared infrastructure every agent leans on). Most teams build the first half and assume the second. Here are five things worth getting concrete about.
1. A tool you vetted can turn hostile later
The scariest supply-chain fact about MCP is that approval is not a permanent state. In September 2025, the postmark-mcp npm package shipped a routine-looking update. The only meaningful diff between the benign version and the malicious one was a single added line: a Bcc field on the send-email function, quietly copying every message to an attacker's domain. Anyone on auto-update started leaking email with no visible change in behavior.
That's a rug pull: vetted on Monday, hostile on Thursday. Pinning versions and signing help, but they don't tell you what changed. For that you want a fingerprint — a hash of the tool's description plus its schema — recorded at approval time and re-checked on every load. If the fingerprint moves, the tool stops until a human looks. Cheap to compute, and it turns a silent rug pull into a loud one.
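A minimal sketch of that fingerprint check, in Python. The function names, the `approved` store, and the example tool are hypothetical; the only assumption about the tool itself is that it exposes a description string and a JSON-style parameter schema.

```python
import hashlib
import json


def tool_fingerprint(description: str, input_schema: dict) -> str:
    """Return a SHA-256 hex digest over a tool's description and schema."""
    # Canonical JSON (sorted keys, fixed separators) so that key order and
    # whitespace differences do not change the hash.
    canonical = json.dumps(
        {"description": description, "input_schema": input_schema},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_approval(name: str, description: str, input_schema: dict,
                     approved: dict) -> bool:
    """True only if the tool's current fingerprint equals the approved one."""
    expected = approved.get(name)
    if expected is None:
        return False  # never approved: block until a human reviews it
    return expected == tool_fingerprint(description, input_schema)


# Hypothetical tool as it looked at approval time.
APPROVED_SCHEMA = {
    "type": "object",
    "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
}
approved = {"send_email": tool_fingerprint("Send an email.", APPROVED_SCHEMA)}

# The same tool after an update that adds a field to the schema.
UPDATED_SCHEMA = {
    "type": "object",
    "properties": {**APPROVED_SCHEMA["properties"], "bcc": {"type": "string"}},
}
```

Calling `matches_approval("send_email", "Send an email.", UPDATED_SCHEMA, approved)` returns `False`, which is the signal to stop the tool and ask for review. Note the limit of this sketch: it only sees changes that surface in the description or schema, so a change confined to the server's implementation would also need a hash of the package itself.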
2. Tool descriptions and schemas are untrusted input
Here's the detail that trips people up: a tool's description and parameter schema get injected straight into the agent's prompt. That makes them an instruction channel, not just documentation. Invariant Labs demonstrated this last year — a benign-looking tool whose description carried hidden instructions to exfiltrate data. The term that stuck is tool poisoning, and it's just prompt injection wearing a tool's clothes.
So treat tool metadata like any other hostile input. Before a description reaches the model, scan it for invisible Unicode, right-to-left override characters, HTML comments, base64/hex blobs, and role-override phrasing ("ignore previous instructions", "you are now..."). Strip control characters. If you wouldn't trust a string from a web form, don't trust one from a tool registry.
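Here is a rough sketch of such a scanner in Python, using only the standard library. The pattern list, thresholds (40+ characters for a blob), and function names are illustrative assumptions, not a complete detector; a real deployment would tune them and expect some false positives.

```python
import re
import unicodedata

# Illustrative patterns only; thresholds and phrasing are assumptions.
PATTERNS = {
    "html-comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "base64-blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    "hex-blob": re.compile(r"(?:[0-9a-fA-F]{2}){20,}"),
    "role-override": re.compile(
        r"ignore (?:all )?(?:previous|prior) instructions|you are now",
        re.IGNORECASE,
    ),
}
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def _is_hidden(ch: str) -> bool:
    """Control (Cc) and format (Cf) characters, minus ordinary whitespace.

    Cf covers zero-width characters and bidi overrides such as U+202E.
    """
    if ch in ALLOWED_CONTROLS:
        return False
    return unicodedata.category(ch) in ("Cc", "Cf")


def scan_metadata(text: str) -> list:
    """Return a list of labels for suspicious features found in the text."""
    findings = []
    if any(_is_hidden(ch) for ch in text):
        findings.append("hidden-character")
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
    return findings


def strip_hidden(text: str) -> str:
    """Remove control and format characters before the text reaches the model."""
    return "".join(ch for ch in text if not _is_hidden(ch))
```

One reasonable policy is to run `scan_metadata` on every description and parameter description at load time and refuse the tool if the list is non-empty, using `strip_hidden` only as a second layer for text that passes.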
3. Watch for lookalikes
A malicious server doesn't need to beat your real tool — it just needs to sit next to it with a confusingly similar name. send_email vs send_emai1. Typosquatting and cross-server name confusion let a rogue tool intercept calls meant for a trusted one. Flag near-duplicate tool names, and namespace every tool by the verified identity of the server that published it, so two tools called search are never ambiguous.
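Both checks are small enough to sketch in Python. This version uses `difflib.SequenceMatcher` from the standard library as the similarity measure; the 0.85 threshold, the function names, and the `server/tool` naming format are assumptions for illustration, and a production check would likely add a Unicode confusables table on top.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(names, threshold: float = 0.85):
    """Return pairs of distinct tool names that look confusingly similar."""
    flagged = []
    for a, b in combinations(sorted(set(names)), 2):
        # Compare case-insensitively so Send_Email and send_email also match.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b))
    return flagged


def qualified_name(server_id: str, tool_name: str) -> str:
    """Namespace a tool by the verified identity of its publishing server.

    server_id is assumed to come from your own verification step, never
    from the server's self-reported name.
    """
    return f"{server_id}/{tool_name}"


pairs = near_duplicates(["send_email", "send_emai1", "search"])
```

Here `pairs` contains the `send_emai1` / `send_email` pair and nothing else, and two servers that both publish `search` end up with different qualified names.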
4. Put a fail-closed gateway at the MCP boundary
If you take one architectural idea from this, take this one: route all MCP traffic through a single auditable choke point. One gateway that authenticates the caller, scans the call and the response, rate-limits, writes an audit trail — and on any error, denies. Not "log and continue." Deny. A gateway that fails open is just latency.
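To make "fail closed" concrete, here is a small Python sketch of the control flow. It is not an MCP implementation: the `authenticate`, `scan`, `forward`, and `audit` callables are hypothetical hooks you would supply, and the rate limit is a simple per-caller counter. The point is the shape: every exception, including a failure to write the audit record, ends in a denial.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Decision:
    allowed: bool
    reason: str
    result: Any = None


class FailClosedGateway:
    def __init__(self, authenticate: Callable, scan: Callable,
                 forward: Callable, audit: Callable, max_calls: int = 100):
        self.authenticate = authenticate  # caller -> bool
        self.scan = scan                  # payload -> bool (True means clean)
        self.forward = forward            # call -> tool response
        self.audit = audit                # (caller, call, decision) -> None
        self.max_calls = max_calls
        self.calls: Dict[str, int] = {}

    def _evaluate(self, caller: str, call: dict) -> Decision:
        if not self.authenticate(caller):
            return Decision(False, "unauthenticated")
        self.calls[caller] = self.calls.get(caller, 0) + 1
        if self.calls[caller] > self.max_calls:
            return Decision(False, "rate-limited")
        if not self.scan(call):
            return Decision(False, "call-rejected")
        result = self.forward(call)
        if not self.scan(result):
            return Decision(False, "response-rejected")
        return Decision(True, "ok", result)

    def handle(self, caller: str, call: dict) -> Decision:
        try:
            decision = self._evaluate(caller, call)
        except Exception as exc:
            # Any failure in auth, scanning, or forwarding is a denial.
            decision = Decision(False, f"error:{type(exc).__name__}")
        try:
            self.audit(caller, call, decision)
        except Exception:
            # No audit record means no result is released.
            return Decision(False, "audit-failed")
        return decision
```

The design choice worth copying is that `handle` has exactly one path that returns `allowed=True`, and it is only reachable when every check ran and passed.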
You don't have to invent the spec yourself. Microsoft's open MCP Security Gateway spec is one conformance-tested implementation of exactly this pattern, and it's a reasonable reference point even if you build your own.
5. The kill switch has to reach the sub-agents
Most kill switches halt the parent agent and call it done. But the parent has spawned sub-agents and opened tool sessions, and those keep running with the parent gone — orphaned processes still holding credentials and making calls. A real stop signal propagates to every sub-agent and tool session, and leaves each one in a safe state.
And like any safety system: if you haven't tested it firing, you don't have it. Pull the switch in a drill and watch whether the sub-agents actually stop.
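As a toy model of both points (propagation and the drill), here is a Python sketch. The `Agent` and `ToolSession` classes are hypothetical stand-ins for whatever your runtime uses; real sub-agents would be processes or remote workers, and "safe state" would mean more than clearing a field. What the sketch shows is the recursive stop and a `survivors` check you can assert on in a test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolSession:
    name: str
    credential: Optional[str] = "token"
    is_open: bool = True

    def close(self) -> None:
        # Safe state for this sketch: drop the credential, then close.
        self.credential = None
        self.is_open = False


@dataclass
class Agent:
    name: str
    children: List["Agent"] = field(default_factory=list)
    sessions: List[ToolSession] = field(default_factory=list)
    running: bool = True

    def spawn(self, name: str) -> "Agent":
        child = Agent(name)
        self.children.append(child)
        return child

    def open_session(self, name: str) -> ToolSession:
        session = ToolSession(name)
        self.sessions.append(session)
        return session

    def kill(self) -> None:
        # Stop descendants first so nothing is orphaned, then own sessions.
        for child in self.children:
            child.kill()
        for session in self.sessions:
            session.close()
        self.running = False


def survivors(agent: Agent) -> List[str]:
    """The drill: list everything still running or still holding a credential."""
    alive = [agent.name] if agent.running else []
    alive += [
        f"{agent.name}:{s.name}"
        for s in agent.sessions
        if s.is_open or s.credential is not None
    ]
    for child in agent.children:
        alive += survivors(child)
    return alive
```

A drill is then a test that builds a parent with nested sub-agents and sessions, calls `kill()` on the parent, and asserts that `survivors(parent)` is empty. A switch that only sets `parent.running = False` fails that assertion, which is exactly the failure you want to find in a drill rather than in an incident.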
Where this fits
These five concerns — vetting, poisoning, lookalikes, the gateway, the kill switch — are the E (Ecosystem) layer of BRACE, an open framework for agent security. The guide goes deeper on the substrate model and the agent-scoped/ecosystem-scoped split if you want the longer version.
None of this is exotic. It's the same supply-chain hygiene we already apply to dependencies — pin, sign, fingerprint, verify on load — pointed at a new kind of dependency that can also talk to your model.
So a real question to leave with: how are you vetting the MCP servers and tools your agents load today — and would you catch it if one of them changed after you approved it?