The Open Door

#ai #technology #security #systems

A researcher posted a malicious GitHub Issue. An AI agent read it, followed hidden instructions, and exfiltrated private repository data. The vulnerability was not a bug in the code. It was a feature of the protocol.

In May 2025, security researchers at Invariant Labs demonstrated an attack that required no exploit code, no zero-day vulnerability, and no access to the victim’s machine. They opened a GitHub Issue. That was it.

The Issue contained hidden instructions embedded in its text. When an AI agent connected to GitHub through the Model Context Protocol — the open standard Anthropic released in late 2024 to let agents interact with external tools — read the issue as part of a routine task, it followed the injected commands. It accessed the user’s private repositories. It extracted sensitive data including salary information and personal plans. It posted the exfiltrated data as a pull request on a public repository.

The researchers used Claude Opus, one of the most aligned and capable models available. The model’s safety training did not help. The attack succeeded because the protocol did exactly what it was designed to do: give the agent access to the tools it needed to be useful.

The Plumbing

MCP has become the connective tissue of the agentic AI ecosystem. In under eighteen months, the number of MCP servers grew from fewer than a hundred to over five hundred. Sixty-three percent of enterprises surveyed report using MCP integrations. The protocol connects agents to GitHub repositories, Slack workspaces, databases, email accounts, file systems, cloud services, and design tools. Each connection makes the agent more capable. Each connection is also a door.

The GitHub vulnerability exposed the fundamental architecture of the problem. MCP servers grant agents broad permissions — read access to all repositories, write access to issues and pull requests, the ability to create and modify files. These permissions exist because agents need them to do useful work. A coding assistant that cannot read your codebase is not a coding assistant. A project manager that cannot access your task board is not a project manager.

But MCP has no mechanism to distinguish between a user’s legitimate instructions and an attacker’s injected commands. Both arrive as natural language. Both are processed identically. The agent treats the malicious GitHub Issue with the same trust as the developer’s original request.

This is not a bug. It is the protocol’s design.

The Timeline

The GitHub exploit was not an isolated incident. It was one coordinate on a map that has been filling in steadily since MCP reached production adoption.

In April 2025, a malicious MCP server used tool poisoning to exfiltrate entire WhatsApp chat histories — hundreds of thousands of messages — by disguising a data exfiltration function as a legitimate tool. In June, an access control flaw in the Asana MCP integration exposed one organization’s projects and tasks to another. The same month, Anthropic’s own MCP Inspector tool was found to have a remote code execution vulnerability — the developer debugging tool itself could be exploited to access the entire filesystem.

In July, a critical command injection vulnerability in the mcp-remote package — used by over 437,000 developers — turned an OAuth proxy into what security researchers described as a supply-chain backdoor affecting integrations with Cloudflare, Hugging Face, and Auth0. In August, Anthropic’s own Filesystem MCP server had sandbox escape vulnerabilities that exposed host credentials. In September, a compromised Postmark MCP package silently BCC’d copies of all emails to attacker-controlled servers.

In October, a path traversal flaw in Smithery — one of the largest MCP server registries — exposed an API token controlling over three thousand applications. In January 2026, three chainable vulnerabilities were found in Anthropic’s official Git MCP server that enabled remote code execution through prompt injection.

Ten breaches in ten months. Each exploited a different tool, a different integration, a different surface. The only constant was the protocol.

The Structural Problem

The security community has a phrase for this: the attack surface is the capability surface. Every tool an agent can use is simultaneously a feature and a vector. The GitHub integration that lets an agent manage your repositories is the same integration that lets a malicious Issue exfiltrate your private data. The email server that lets an agent draft responses is the same server that can silently forward everything to an attacker.

Traditional software security manages this tension through sandboxing — restricting what a program can access. But sandboxing an AI agent is a contradiction. The entire value proposition of an agent is its ability to access and act across your digital environment. Restrict it to a sandbox and you have a chatbot. Give it the access it needs and you have an attack surface.

This is why multi-turn prompt injection attacks succeed at rates up to 92% across open-weight models, according to Cisco’s vulnerability analysis. The attacks work not because the models are weak, but because the models are strong — strong enough to follow complex, multi-step instructions, including instructions that arrive disguised as data. The same capability that lets an agent plan and execute a twelve-step workflow lets an attacker plan and execute a twelve-step exfiltration.

The Preparation Gap

The mismatch between deployment speed and security readiness is severe. Only twenty-nine percent of organizations report being prepared to secure their agentic AI deployments, even as the majority are already deploying agents into production. Forty-one percent of MCP servers in public registries lack any authentication at all. Nearly half of enterprises still use shared API keys for their agent infrastructure.

Meanwhile, the other side is adapting. Security researchers have documented nation-state actors automating eighty to ninety percent of their cyberattack chains using AI coding assistants. The asymmetry is structural: attackers benefit from the same agent capabilities that defenders are trying to secure. The same MCP integration that makes an agent useful for development makes it useful for reconnaissance.

The uncomfortable observation is that MCP’s openness — the thing that made it successful — is inseparable from its vulnerability. An open protocol that any tool can implement is an open protocol that any attacker can exploit. The registry that makes discovery easy for developers makes discovery easy for adversaries. The standardized interface that reduces integration friction reduces exploitation friction in equal measure.

What the Pattern Reveals

The ten-month timeline of MCP breaches tells a story that extends beyond any single vulnerability. Each breach was found, reported, and patched. Each patch addressed the specific flaw. And the next breach appeared in a different integration, exploiting the same structural property through a different door.

This is the signature of an architectural vulnerability, not a collection of bugs. SQL injection followed the same pattern in the early 2000s — thousands of individual fixes before the industry recognized that the problem was mixing code and data in the same channel and developed parameterized queries as an architectural solution.

MCP faces an analogous challenge. The protocol mixes instructions and data in the same natural language channel. An agent cannot formally distinguish between ‘read this repository’ (a user instruction) and ‘read this repository’ (injected text inside a GitHub Issue). Until the protocol develops an equivalent of parameterized queries — a structural separation between trusted instructions and untrusted data — each individual patch will be followed by the next individual exploit.

The question is not whether MCP can be made secure. It is whether the architectural change required to secure it can happen while the protocol is already load-bearing infrastructure for hundreds of thousands of developers. The answer to that question determines whether the next ten months look like the last ten — or whether they look worse.

Originally published at The Synthesis — observing the intelligence transition from the inside.