Anthropic confirmed this week that their next model poses "unprecedented cybersecurity risks" and can "exploit vulnerabilities in ways that far outpace the efforts of defenders." Cybersecurity stocks dropped 4–9% on the news. The story ran in Fortune, Axios, and CNBC.
Here's what those headlines missed: the threat isn't the next model. It's the one you're running right now.
In February 2026, Amazon's threat intelligence team documented a single attacker — low to medium skill, financially motivated — who used commercially available AI to compromise 600 FortiGate firewall devices across 55 countries in 38 days. Amazon's CISO noted that the volume and variety of custom tooling would typically indicate a well-resourced development team. Instead, one person with AI access built the entire toolkit. The model didn't change. The scaffolding — the agentic workflows — is what turned a general-purpose LLM into a global offensive capability.
OWASP published their Top 10 for Agentic Applications in December 2025. It's the most important security framework most AI developers haven't read yet.
This is a technical breakdown of each risk, the real CVEs behind them, and how to actually defend against them.
Why a new Top 10
The original OWASP LLM Top 10 was designed for single-turn applications: a user sends a message, the model responds. Agentic systems are different in three critical ways:
They act. Agents don't just generate text — they execute shell commands, call APIs, read and write files, send emails, and browse the web. A single compromised prompt can cause real-world irreversible damage.
They persist. Agents maintain memory across sessions. A single successful injection can poison an agent's behavior permanently, not just for one response.
They delegate. Multi-agent systems trust each other by default. A compromised sub-agent can influence the entire pipeline.
The OWASP ASI Top 10 formalizes 10 failure modes that don't exist in traditional applications. Here they are, with real incidents.
ASI01 — Agent Goal Hijack
What it is: An attacker redirects an agent's objectives through malicious text in any content the agent reads. The agent isn't hacked in the traditional sense — it's simply told to do something else, and it complies.
Why it's #1: Every other attack on this list is a pathway to this outcome. Prompt injection (ASI02), tool misuse (ASI05), memory poisoning (ASI04) — they're all mechanisms for achieving ASI01. A fully hijacked agent is an insider threat that works at machine speed.
The real incident — EchoLeak (CVE-2025-32711, CVSS 9.3)
In June 2025, researchers at Aim Security disclosed a zero-click vulnerability in Microsoft 365 Copilot. The attack required no user interaction whatsoever. Here's how it worked:
- Attacker sends a carefully crafted email to the target organization
- The email contains hidden prompt injection instructions, phrased as if directed at a human — never mentioning Copilot, AI, or anything that would trigger detection filters
- When Copilot later retrieves that email as context for an unrelated query, it reads the hidden instructions
- Copilot exfiltrates sensitive internal files by embedding them in an outbound image URL
- The victim's browser auto-fetches the image, completing the exfiltration without any click
The attack bypassed Microsoft's XPIA classifier, link redaction, and Content Security Policy. The payload was pure natural language. No code. No malware. No signatures to detect.
What this means for your agents: Every document, email, web page, or tool output your agent reads is a potential attack vector. The attack surface isn't your API endpoint. It's every piece of text your agent ingests.
Defense: Scan all retrieved content before it enters the context window. Don't just scan user inputs — scan tool outputs, web search results, and documents. Treat every external string as potentially hostile.
ASI02 — Prompt Injection
What it is: Direct injection of instructions that override the agent's system prompt or intended behavior. The classic "ignore all previous instructions."
Why it matters more for agents: A chatbot that gets prompt-injected gives a bad response. An agent that gets prompt-injected executes arbitrary actions with whatever permissions it was granted.
Real incident — IDEsaster (2026)
Security researcher Ari Marzouk disclosed 24 CVEs across GitHub Copilot, Cursor, Windsurf, and 5 other AI coding assistants. 100% of tested AI IDEs were vulnerable to prompt injection leading to code execution. AWS issued security advisory AWS-2025-019.
Attack vector: malicious repository content → agent reads it → agent executes attacker-controlled commands with developer-level privileges.
Defense: 5-layer local detection — pattern matching (27+ categories), semantic analysis (role hijacking, authority impersonation, boundary dissolution), indirect injection detection, session context tracking, and PII/credential exfiltration detection.
ASI03 — Identity and Privilege Abuse
What it is: Agents inherit user roles, cache credentials, and call each other. Attackers exploit the delegation chain to escalate privileges or reuse cached secrets.
The problem: When an agent is allowed to act "as the user," you extend that user's entire blast radius to anything the agent can be manipulated into doing. There's no least-privilege boundary between the agent and the user.
Real incident — Amazon Q Code Assistant (CVE-2025-8217)
Attackers compromised a GitHub token and merged malicious code into the Amazon Q VS Code extension (version 1.84.0). The injected code contained destructive prompt instructions including commands to delete filesystem and cloud resources. With --trust-all-tools --no-interactive flags active, the agent executed without confirmation. Nearly one million developers had the extension installed.
Defense: Cryptographic agent identity (Ed25519 keypairs), mTLS between agents and services, scoped credentials that expire, and audit trails that capture which agent performed which action under which identity.
ASI04 — Memory and Context Poisoning
What it is: Injecting malicious content into an agent's persistent memory, RAG database, or long-term context so that future behavior is corrupted — long after the initial attack.
Why it's worse than standard injection: Prompt injection affects one interaction. Memory poisoning affects every future interaction until the memory is cleared and audited. The agent "learns" the attacker's instructions.
Attack patterns:
- RAG poisoning: inject malicious content into a vector database the agent queries for context
- Cross-tenant leakage: agent memory shared across tenants leaks sensitive data
- Long-term drift: repeated exposure to adversarial content gradually shifts agent behavior
Defense: Merkle-chained memory with Ed25519 signatures. Any tampered memory entry fails verification at query time. Append-only audit log means you can always reconstruct what the agent was told and when.
ASI05 — Tool and Integration Misuse
What it is: Agents call tools — shell commands, database queries, API calls, file operations. If the agent can be convinced to pass attacker-controlled parameters to these tools, you have RCE through natural language.
Real incident — Langflow AI RCE (CVE-2025-34291)
CrowdStrike documented multiple threat actors exploiting an unauthenticated code injection vulnerability in Langflow AI. Attackers gained credentials and deployed malware through the agent's tool execution capability. The vulnerability wasn't in the LLM. It was in the trust boundary between the agent's output and the tool execution layer.
Real incident — OpenAI Operator Data Exposure
Security researcher Johann Rehberger demonstrated that malicious webpage content could trick OpenAI's Operator agent into accessing authenticated internal pages and exfiltrating data to an attacker-controlled server.
Defense: Policy engine that validates every tool call before execution. Scoped, signed, revocable tokens for each action. The agent proposes; the policy engine authorizes.
ASI06 — Resource and Service Abuse
What it is: Agents running in loops can be exploited for financial denial-of-service. An attacker who can trigger expensive inference loops, or cause an agent to repeatedly call costly external APIs, can run up massive costs or exhaust quotas.
Why it matters: Unlike traditional DDoS, this attack uses the victim's own authorized systems against them. The agent is behaving "correctly" from the provider's perspective.
Defense: Hard cost ceilings, rate limiting at the agent level, circuit breakers that pause agents when anomalous consumption patterns appear.
This is the ASI risk with the least coverage across the industry right now.
ASI07 — Data and Model Exfiltration
What it is: Agents exfiltrating training data, system prompts, model weights, or sensitive business data. Beyond PII — this includes intellectual property, strategic information, and the agent's own configuration.
The same mechanism that made EchoLeak work — agent reads malicious content → agent exfiltrates data to attacker-controlled URL — applies to any agent with outbound network access and sensitive context access.
Defense: 15-category PII and credential detection on all outbound content. Pattern matching for API keys, tokens, SSNs, internal URLs. Block exfiltration attempts before they reach the network.
ASI08 — Cascading Agent Failures
What it is: In multi-agent systems, a single compromised agent can corrupt the entire pipeline. Agents are often designed to trust collaborating agents by default.
Real incident — Agent Session Smuggling (November 2025)
Palo Alto Unit 42 demonstrated how malicious agents exploit built-in trust relationships in the Agent-to-Agent (A2A) protocol. Unlike single-shot prompt injection, a rogue agent can hold multi-turn conversations, adapt strategy, and build false trust over time.
Real incident — ServiceNow Now Assist
OWASP documented cases where spoofed inter-agent messages caused downstream procurement and payment agents to process orders from attacker front companies.
Defense: Cryptographic authentication of all inter-agent messages. An unsigned message claiming to be from a trusted agent gets blocked. Byzantine fault detection across agent clusters.
ASI09 — Human-Agent Trust Exploitation
What it is: Exploiting the human tendency to over-trust AI outputs. Agents producing authoritative-sounding responses for false premises. Attackers impersonating agents to humans or humans to agents.
Why this is different from misinformation: The agent isn't hallucinating — it's been injected with specific false information and is now confidently presenting it as fact.
Defense: This is primarily a UX and workflow problem. Agents should clearly attribute claims to verifiable sources. Humans should never make irreversible decisions based solely on agent output without independent verification.
ASI10 — Rogue and Emergent Agent Behavior
What it is: Agents that deviate from intended behavior in ways that weren't explicitly programmed or injected — emergent behavior from complex multi-agent interactions, unexpected capability combinations, or goal generalization.
This is the hardest one. No signature, no pattern, no injection. The agent is behaving according to its training and instructions in a way that produces harmful outcomes.
Defense: Immutable cryptographic audit trails. If something goes wrong and you can't explain why, you need to reconstruct every decision the agent made, what information it had, and what actions it took. Behavioral monitoring for statistical anomalies.
Where the industry is right now
OpenAI said in December 2025 that prompt injection may "never be solved" for browser agents. That's an honest statement — and it's not a reason to give up. It's a reason to build independent runtime security that doesn't rely on the model being incorruptible.
48% of security professionals now rank agentic AI as the #1 attack vector for 2026. Federal procurement guidance published in March 2026 recommends OWASP Agentic Top 10 compliance as a formal procurement standard.
The arms race is real. The defenses are real too.
What Crawdad covers today
Crawdad is a zero-knowledge runtime security layer for autonomous AI agents. One environment variable routes any agent framework through a local sidecar that scans every message in <1ms. No content leaves the customer's network.
| ASI Risk | Coverage |
|---|---|
| ASI01 Agent Goal Hijack | ✅ 27 pattern categories + semantic heuristics |
| ASI02 Prompt Injection | ✅ 5-layer pipeline, session context tracking |
| ASI03 Identity Abuse | ✅ Ed25519 identity, mTLS, scoped credentials |
| ASI04 Memory Poisoning | ✅ Merkle-chained memory, Ed25519 signed |
| ASI05 Tool Misuse | ✅ Policy engine, action authorization |
| ASI06 Resource Abuse | 🔄 Roadmap Q2 2026 |
| ASI07 Data Exfiltration | ✅ 15-category PII/credential detection outbound |
| ASI08 Cascading Failures | ✅ Byzantine fault detection (partial) |
| ASI09 Trust Exploitation | 🔄 Roadmap Q3 2026 |
| ASI10 Rogue Behavior | ✅ Cryptographic audit trail (partial) |
As of this week: live threat intelligence feeds monitoring 10 sources every 4 hours, with signatures that auto-update to deployed sidecars within minutes of admin approval — cryptographically signed, verified by each sidecar before loading. When the LiteLLM supply chain attack was confirmed on March 25, 2026, a blocking signature was proposed, tested, and available for deployment within 24 hours.
Getting started
# Install
curl -fsSL https://getcrawdad.dev/install.sh | sh
# Configure your agent
export ANTHROPIC_BASE_URL=http://localhost:7748
# Everything else stays the same
Works with OpenClaw, LangChain, CrewAI, AutoGen, Claude Code, and any agent framework using Anthropic, OpenAI, or Google SDKs.
Free tier: 10,000 scans/month. No credit card required.
Andrew Sispoidis is the founder of Crawdad. He has founded 7 companies and had 4 exits. Crawdad is live in production, source-available under BSL 1.1.
Top comments (0)