DEV Community

DrMBL
DrMBL

Posted on • Originally published at the-agent-report.com

AgentJacking: How a Fake Sentry Error Can Hijack Claude Code, Cursor, and Codex — And What It Means for the MCP Ecosystem

TL;DR: On June 12, 2026, Tenet Security researchers published a new attack class called "AgentJacking" — a technique that achieves an 85% success rate hijacking AI coding agents including Claude Code, Cursor, and Codex CLI, using nothing but a public Sentry DSN key and a single HTTP POST request. The attack exploits the Model Context Protocol (MCP) trust model: a fake error report containing markdown-disguised "fix" instructions tricks the agent into executing attacker-controlled code on the developer's own machine. Sentry acknowledged the disclosure but declined root-cause remediation, calling it "technically not defensible" at the platform level, triggering a broader debate about MCP security. The same week, two infrastructure responses emerged: Agent Beacon (an open-source telemetry layer from Asymptote Labs) and Cloudflare Temporary Accounts (a 60-minute sandbox for agent deployments). The message from the ecosystem is clear: the agent supply chain is now a first-class attack surface.


Introduction: The Attack Surface Nobody Saw Coming

AI coding agents have been adopted at extraordinary speed. Claude Code crossed 1 million daily active users within weeks of launch. Cursor went from a niche VS Code fork to the default editor for a generation of AI-native developers. Codex CLI became OpenAI's answer to the agent-in-terminal paradigm. Collectively, these tools now run on hundreds of thousands of developer machines daily — with filesystem access, shell execution rights, and (through MCP) connections to external services like databases, APIs, and monitoring platforms.

The security assumption has been simple: these agents run locally, behind the developer's existing security stack — EDR, firewall, VPN, WAF, IAM. If nothing malicious enters the machine, nothing malicious happens.

Tenet Security just proved that assumption wrong.

(Source: The Hacker News)


How AgentJacking Works: The Attack Chain

The attack is elegant in its simplicity. It requires two ingredients:

  1. A public Sentry DSN key. These are routinely exposed in client-side JavaScript, public repositories, and documentation. The researchers found 2,388 organizations with publicly accessible DSN keys.

  2. A single HTTP POST request to Sentry's event ingestion API. No exploit, no vulnerability in Sentry's code — just the normal API doing exactly what it's designed to do: accepting error events.

Here's the full chain, step by step:

Step 1 — Reconnaissance. The attacker finds a target organization's Sentry DSN key. This is often a 30-second exercise: search GitHub, inspect a web app's bundle, or query Shodan for Sentry endpoints.

Step 2 — Crafting the payload. The attacker constructs a fake Sentry error event. The payload looks structurally identical to a legitimate error — same JSON envelope, same fields. But inside the error description, a markdown-rendered "Resolution" section contains an instruction the attacker wants the agent to execute, typically an npx command.

Step 3 — Injection. The attacker sends the event via POST https://sentry.io/api/{org_id}/store/. No authentication beyond the DSN key is required — this is how Sentry's public API is designed to work.

Step 4 — The developer triggers the agent. The developer, seeing a Sentry error in their dashboard (or having configured their agent to monitor Sentry), asks Claude Code, Cursor, or Codex: "Fix the Sentry errors."

Step 5 — MCP hands the payload to the agent. The Sentry MCP server retrieves the error event and presents it to the agent as structured context. The markdown renders — headings, code blocks, the fabricated resolution section — all indistinguishable from a real Sentry error.

Step 6 — The agent executes the attacker's code. The agent reads the markdown-disguised instructions, interprets them as the fix, and runs the attacker's npx command on the developer's own machine, with the developer's full privileges. The command could exfiltrate API keys, install a reverse shell, modify source code, or pivot to internal infrastructure.

(Source: The New Stack)

Why Traditional Security Doesn't See It

The attack is invisible to every layer of the modern security stack:

  • EDR sees a developer running npx — a legitimate command used thousands of times a day
  • WAF sees a normal HTTP POST to Sentry's API — which is what Sentry clients do
  • Firewall/VPN see outbound traffic to sentry.io — a whitelisted domain in most organizations
  • IAM isn't involved — the DSN key is a public token, not a user credential

As Tenet Security stated in their disclosure: "The attack bypasses EDR, WAF, IAM, VPN, Cloudflare, and firewalls — because there is nothing malicious to detect."

(Source: The Hacker News)


The Numbers: 85% Success, 2,388 Organizations Exposed

Tenet Security's research, conducted by Ron Bobrov, Barak Sternberg, and Nevo Poran, was not theoretical. They tested the attack against over 100 consenting organizations in controlled environments.

The results:

Metric Value
Success rate (agent executes attacker code) 85%
Organizations with public Sentry DSN keys 2,388
Affected AI coding agents Claude Code, Cursor, Codex CLI
Attack complexity Single HTTP POST request
Detection by EDR/WAF/Firewall 0%
15% failure reason Agent asked for confirmation before running unfamiliar npx

The 15% that resisted were not because of any security control — the agent simply asked "are you sure you want to run this command?" and the developer, if paying attention, declined. But even that defense is thin: agents are increasingly configured to operate autonomously with minimal confirmation prompts to improve developer velocity. The entire product direction of AI coding tools is toward fewer confirmations, not more.

(Source: Infosecurity Magazine)


Sentry's Response: "Technically Not Defensible"

The most controversial element of the AgentJacking disclosure is Sentry's response.

Sentry acknowledged the issue on June 3, 2026 — nine days before the public disclosure — but declined to implement root-cause remediation. In Sentry's assessment, filtering malicious event content at the platform level is "technically not defensible" — an attacker can always find a payload that evades a content filter. Sentry did activate a global filter blocking one specific payload string, a stopgap measure treating the symptom rather than the cause.

(Source: CSA Research Note)

Sentry's position is defensible from a narrow product perspective: Sentry's job is to ingest errors, not to police their content. The platform cannot distinguish between a real Sentry error containing npx instructions and a malicious one — both are valid API payloads. Content filtering at ingestion scale is a losing battle.

But from a broader ecosystem perspective, the response highlights a structural gap: MCP servers are trust boundaries, but nobody is treating them that way. The MCP protocol assumes that the data a server returns is trustworthy. When a Sentry MCP server hands an error event to Claude Code, the agent has no way to distinguish between "this event was generated by a real crash" and "this event was injected by an attacker through the exact same API."

The Next Web captured the core tension: "That standoff is the real story. The flaw is not in Sentry alone."

(Source: The Next Web)


The MCP Trust Problem: Beyond Sentry

AgentJacking is not a Sentry problem. It's an MCP architecture problem.

The Model Context Protocol was designed for data integration, not adversarial resilience. When an agent connects to an MCP server, it trusts the data that server returns. The protocol provides no mechanism for:

  1. Data provenance — "Who generated this event?" vs. "Who injected this event?"
  2. Content integrity — "Has this error report been modified since it was generated?"
  3. Instruction separation — "Is this markdown describing a fix, or commanding one?"

Any MCP server that ingests data from a public-facing API is potentially vulnerable to this class of attack. Sentry is the most visible example because of its ubiquity, but the vector generalizes to monitoring platforms (Datadog, Grafana), logging services, issue trackers (Jira, Linear), and any tool that surfaces user-generated or semi-public data into the agent's context window.

The MCP specification, currently stewarded by Anthropic as an open standard, does not yet include a trust model for adversarial data. This is the gap AgentJacking exposes — and it's a gap the ecosystem is now racing to fill.

(Source: CSA Research Note)


The Response: Three Infrastructure Bets in One Week

The week of June 12–19, 2026 saw three distinct infrastructure responses to the agent security problem. They are not coordinated, but they are complementary.

1. Tenet Security: $6 Million to Build the "Agent Firewall"

On June 17, Tenet Security — founded by ex-Cisco researchers, including the team that discovered AgentJacking — emerged from stealth with a $6 million seed round. The company's product is an agent-layer security platform that sits between the agent and its data sources, inspecting prompts and responses for manipulation, unauthorized access, and data exfiltration.

Tenet describes its mission as preventing "unauthorized access, data exfiltration, agent manipulation and a category of attacks it calls AgentJacking — in which malicious instructions embedded in emails, documents, logs, databases or other data sources alter an agent's behavior."

(Source: citybiz)

The company's thesis is that traditional perimeter security — EDR, WAF, SIEM — is architecturally incapable of protecting agents because the attack surface is the data the agent consumes, not the network traffic it generates. Their solution operates at the MCP layer, inspecting every piece of context an agent receives before it influences the agent's behavior.

2. Agent Beacon: Open-Source Telemetry for Agent Activity

On June 22, Asymptote Labs released Agent Beacon, the first open-source telemetry layer purpose-built for AI agents. Agent Beacon captures and normalizes activity from local AI coding agents — Claude Code, Codex CLI, Cursor, and Claude Cowork — and exports it via OpenTelemetry to existing security infrastructure (SIEM, SOAR, data lakes).

Agent Beacon addresses a fundamental visibility gap: traditional EDR tools see a developer running commands, but they cannot distinguish between "the developer typed this command" and "the developer's AI agent generated this command on their behalf." By sitting at the agent-execution layer, Beacon provides the audit trail that makes post-incident investigation possible.

(Source: HelpNet Security)

Key technical characteristics:

  • MIT-licensed — free to deploy, audit, and extend
  • Supports local, CI, and cloud — agent activity is tracked wherever the agent runs
  • OpenTelemetry export — compatible with existing observability stacks
  • Retention modes — configurable from ephemeral to long-term forensics

Asymptote Labs, the company behind Agent Beacon, positions itself as "the security harness for AI agents" — a layer that sits between the agent and the operating system, governing what agents can do and recording everything they attempt.

3. Cloudflare Temporary Accounts: Deploy Without OAuth

On June 19, Cloudflare launched Temporary Accounts for AI Agents, a feature that lets AI agents provision short-lived (60-minute) Cloudflare accounts without OAuth, manual signup, or human intervention. Agents can deploy Workers, create D1 databases, provision KV stores, and even buy domains — all within a sandboxed, time-limited account.

The wrangler deploy --temporary flag, shipping in Wrangler 4.102.0, creates an account that automatically self-destructs after 60 minutes. This is Cloudflare's answer to the problem that every AI agent developer hits: "The moment an agent needs to deploy something, it slams face-first into a wall built for humans."

(Source: Cloudflare Blog)

While Temporary Accounts are not a security tool per se, they represent a crucial infrastructure pattern: sandboxing agent actions in time-bound, scope-limited environments. If an agent is compromised — via AgentJacking or any other vector — the blast radius is constrained to a 60-minute account with no access to production resources.

Together, these three announcements sketch the emerging agent security stack:

Layer Problem Solution Vendor
Prevention Malicious data influencing agent behavior MCP-layer prompt inspection Tenet Security
Visibility No audit trail for agent actions OpenTelemetry-based activity capture Agent Beacon (Asymptote)
Containment Compromised agent has unlimited access Time-bound, scope-limited sandbox Cloudflare Temporary Accounts

What Developers Can Do Today

The AgentJacking disclosure is a warning shot, not a crisis. The attack has been demonstrated in controlled environments but there is no evidence of active exploitation in the wild. However, the structural conditions that make it possible are not going away — agents are getting more autonomous, MCP is becoming the standard integration layer, and developers are connecting more external services to their agents every week.

Practical steps for developers and teams using AI coding agents:

  1. Audit your MCP connections. List every MCP server your agents are connected to. For each one, ask: does this server ingest data from public-facing APIs? If yes, that's a potential injection vector.

  2. Rotate exposed DSN keys. If your Sentry DSN (or equivalent monitoring token) is in client-side JavaScript, a public repo, or any publicly accessible location, rotate it. The researchers found 2,388 organizations with exposed keys — yours might be one of them.

  3. Implement command confirmation for external packages. The 15% of tests where AgentJacking failed were cases where the agent asked before running unfamiliar npx commands. Configure your agent to require explicit confirmation for any command that installs or executes code from an external registry.

  4. Deploy agent-level telemetry. Agent Beacon is MIT-licensed and available on GitHub. Even a basic deployment provides visibility into what commands your agents are actually executing.

  5. Treat MCP as a trust boundary. Until the MCP specification includes adversarial data handling, assume that any data arriving through an MCP server could be attacker-controlled. Apply the same skepticism to MCP data that you apply to user input in a web application.

  6. Sandbox agent deployments. For agents that deploy infrastructure (Workers, databases, etc.), use time-limited access tokens and scope-restricted roles. Cloudflare's Temporary Accounts pattern is replicable with other cloud providers via short-lived IAM credentials.


FAQ

Q: Is AgentJacking being exploited in the wild right now?

There is no evidence of active exploitation as of June 22, 2026. The research was conducted in controlled environments with consenting organizations. However, the attack is trivial to replicate — a single HTTP POST — so post-disclosure exploitation is a realistic concern.

Q: Does this mean MCP is fundamentally broken?

No. MCP was designed for data integration, not adversarial resilience. The protocol is sound for its original purpose. What AgentJacking exposes is that MCP is being used as a trust boundary without the security properties of one. The protocol needs a trust model — data provenance, integrity verification, and instruction/data separation — which it does not currently have.

Q: Is Claude Code more vulnerable than Cursor or Codex?

The vulnerability is in the MCP integration layer, not in any specific agent's LLM or tool-use implementation. All three tools (Claude Code, Cursor, Codex CLI) were affected because all three integrate with the Sentry MCP server. The 85% success rate was consistent across tools.

Q: Can this attack vector be fixed at the MCP server level?

Partially. MCP servers could implement content filtering, input sanitization, or origin verification. But as Sentry noted, content filtering is "technically not defensible" against a determined adversary. The more robust fix is at the protocol layer: MCP needs to separate "data" (the error details) from "instructions" (the suggested fix) and provide the agent with provenance metadata it can use to make trust decisions.

Q: Should I stop using AI coding agents?

No. The ROI data is overwhelming — 96% of organizations with agents in production report ROI at or above expectations. The right response is not to stop using agents, but to treat them as the new attack surface they are — with the same rigor you apply to any other production system.


Further Reading


Cet article a été initialement publié sur The Agent Report.

Top comments (0)