varun pratap Bhardwaj

Posted on Apr 21

Operation Pale Fire: What Block's Red Team Proved About AI Agent Security

#opensource #security #ai #webdev

In January 2026, Block's security team ran a red team exercise against Goose, their own open-source AI agent (42.9K stars, 368 contributors, Apache-2.0). They called it Operation Pale Fire. The results should concern anyone building or deploying AI agents in production.

They achieved full compromise. Not through some exotic zero-day. Through the same class of attacks that have plagued web applications for decades -- injection, social engineering, and trust boundary violations -- adapted for the age of autonomous agents.

This is the security wake-up call the AI agent ecosystem needs.

The Attack Surface

Goose connects to 3,000+ tools via MCP (Model Context Protocol). It can install software, execute code, edit files, and run tests autonomously. That power is exactly what makes it useful. It is also exactly what makes it dangerous.

Block's red team found four distinct attack vectors, each exploiting a different trust assumption.

1. Google Calendar MCP Injection

The attackers sent calendar invites through the Google Calendar API -- not via email, but directly through the API, which means no email notification reached the target. The invite descriptions contained prompt injection payloads. When Goose connected to Google Calendar via MCP, it ingested these malicious instructions as trusted context.

The attack worked because MCP treats all data from connected services as context for the LLM. There is no distinction between "data the user asked for" and "data an attacker planted."

2. Zero-Width Unicode Smuggling

Prompt injections were encoded using invisible Unicode characters -- zero-width spaces, joiners, and directional marks. A human reviewing the text sees nothing unusual. The LLM decodes and executes the hidden instructions.

This is not theoretical. The red team demonstrated working exploits. The injections were invisible to human reviewers but fully functional against the model.

3. Poisoned Recipes

Goose uses a "recipe" system -- shareable, base64-encoded JSON configurations that get appended to the system prompt. The red team created malicious recipes that looked legitimate but contained hidden instructions. Because recipes operate at the system prompt level, they have maximum influence over the LLM's behavior.

Think of it as dependency confusion for AI agents. You trust a package because it looks right, but the payload is hostile.

4. Social Engineering + Full Compromise

The red team convinced a developer on the Goose team to load a poisoned recipe disguised as a bug report. The result: full infostealer execution on the developer's machine.

This is the attack that should keep you up at night. It combines a legitimate-looking interaction (a bug report) with an agent that has real system access. The developer did not run untrusted code. They loaded a configuration file. The agent did the rest.

What Block Fixed

Credit where it is due -- Block published the full findings and shipped remediations:

Recipe transparency (PR #3537): Recipe instructions are now displayed before execution
Unicode stripping (PR #4080, #4047): Zero-width characters are stripped from inputs
Prompt injection detection (PR #4237): A detection system scans for injection patterns
MCP malware checking: Connected tool servers are scanned for malicious behavior

These are real fixes shipped to production. Block's transparency here sets a standard that every AI agent project should follow.

What Remains Unfixed

The architectural problem is still there: a single context window is a single point of compromise.

Every MCP extension pulls untrusted external data into the same context window where the LLM makes decisions. Auto-approve mode -- where the agent executes actions without human confirmation -- is still available and documented. The recipe system, even with transparency improvements, still allows users to load configurations from untrusted sources.

Prompt injection detection helps, but it is a signature-based defense against a generative attack. The attacker can always rephrase. This is the same arms race that made WAFs insufficient for SQL injection -- you need architectural isolation, not pattern matching.

The Systemic Problem

Goose is not uniquely vulnerable. Every AI agent that connects to external data sources via MCP, function calling, or tool use faces the same fundamental issue: the trust boundary between data and instructions does not exist inside an LLM context window.

In traditional security, we separate code from data. SQL parameterized queries prevent injection by ensuring user input is never interpreted as SQL. Content Security Policy prevents XSS by controlling what scripts can execute.

LLMs have no equivalent mechanism. When a calendar event, a recipe, or a tool response enters the context window, it has the same ontological status as the system prompt. The model cannot reliably distinguish between "instructions from the developer" and "instructions planted by an attacker in a calendar invite."

This is an architectural gap in every agent framework shipping today.

What AI Reliability Engineering Demands

Operation Pale Fire proves that agent security cannot be an afterthought. It requires the same rigor we apply to any production system handling sensitive operations:

Evaluate before you deploy. Every agent needs systematic evaluation across adversarial scenarios -- not just happy-path benchmarks. Tools like AgentAssay exist specifically for this: structured evaluation of agent behavior across LLM providers, including adversarial inputs and failure modes. If you are deploying an agent with MCP connections, evaluate it against injection attacks first.

Test your skills and extensions. The recipe/skill/extension layer is the new supply chain. Every external skill loaded into an agent is equivalent to a third-party dependency -- and we already know how supply chain attacks work. SkillFortify provides security testing for agent skill frameworks, validating that skills behave as declared and do not contain hidden behaviors. Twenty-two frameworks supported, 100% precision on detection.

Assume compromise. Design agent architectures with the assumption that the context window will be poisoned. That means: least-privilege tool access, human-in-the-loop for destructive operations, isolated execution environments, and output validation independent of the LLM.

Red team your own agents. Block did this right. Before your agent handles production workloads, run adversarial exercises. Test MCP injection, test Unicode smuggling, test social engineering of your operators. If you skip this step, someone else will do it for you -- without the courtesy of a disclosure.

The Bottom Line

Operation Pale Fire is not a story about Goose being insecure. It is a story about the entire AI agent ecosystem being architecturally vulnerable to a class of attacks we have not yet solved.

Block deserves credit for running the exercise, publishing the findings, and shipping fixes. Most organizations would have buried this. Instead, they gave the community a roadmap of what to test for.

The question is whether the rest of the ecosystem will take the hint.

If you are building or deploying AI agents, start with evaluation and security testing. AgentAssay handles agent evaluation across providers. SkillFortify handles skill security testing. Both are open source. Star them, try them, and file issues when they break.

The agents are getting more powerful. The security needs to keep up.

Varun Pratap Bhardwaj builds open-source tools for AI agent reliability engineering at Qualixar. This analysis is based on Block's published Operation Pale Fire report.

DEV Community