Google DeepMind published "AI Agent Traps" (SSRN 6372438) on April 1, 2026.
The paper documents 6 attack categories against autonomous AI agents:
- Content Injection — hidden HTML/CSS instructions
- Semantic Manipulation — authority framing, persona hyperstition
- Cognitive State — RAG poisoning, knowledge base contamination
- Behavioral Control — action hijacking, sub-agent spawning
- Systemic — flash crash patterns, fragment assembly
- Human-in-the-Loop — approval fatigue, summary deception
We shipped D17 (Adversarial Resilience) in Warden on April 10.
13 days from paper to production scanner dimension.
What D17 actually checks
D17 scans your codebase for evidence of defenses against each trap type.
It's not a runtime test — it checks whether your code has patterns that
indicate you've thought about adversarial content in your AI pipelines.
Strong signals (3pts each):
- Content sanitization before LLM context injection
- RAG document validation patterns
- Behavioral anomaly detection
- Approval gate verification
Weak signals (1pt each):
- Basic input filtering
- Output content checks
Why this matters
Every type of trap has a documented proof-of-concept.
The attack surface is combinatorial — traps chain.
A single compromised RAG document can trigger a behavioral control trap
that spawns an unauthorized child agent that exfiltrates data.
At each step, the agent is doing exactly what it was told.
Try it
uvx warden-ai scan --format html
The HTML report shows your D17 score with specific findings.
The citation to SSRN 6372438 is in the source code.
The market picture
We also scored 17 known governance vendors on D17.
11 of them score 0.
The three advanced dimensions (D7 HITL, D8 Identity, D17 Adversarial)
require inline gateway architecture to score meaningfully.
Full leaderboard: warden leaderboard (runs in your terminal)
Methodology: SCORING.md in the repo
SharkRouter
/
warden
AI Agent Governance Scanner — 17-dimension scoring across 7 scan layers. Local-only, privacy-first.
Warden — AI Agent Governance Scanner
Open-source, local-only CLI scanner that evaluates AI agent governance posture across 12 scan layers and 17 dimensions. Scans code patterns, MCP configs, infrastructure, secrets, agent architecture, dependencies, audit compliance, CI/CD pipelines, IaC security, framework-specific governance, multi-language code, and cloud AI services. No data leaves the machine.
Website: sharkrouter.ai · PyPI: warden-ai
Quick Start
# With uv (zero setup, one-shot — recommended)
uvx --from warden-ai warden scan /path/to/your-agent-project
# With pip
pip install warden-ai
warden scan /path/to/your-agent-project
# Optional extras
pip install 'warden-ai[pdf]' # adds `--format pdf` (weasyprint)
From zero to governance score in under 60 seconds.
HTML Report
Warden generates a self-contained HTML report with interactive score breakdown, actionable recommendations, and a comparison card — works offline and in air-gapped environments.
What It Does
Warden scores your AI agent project across 17 governance dimensions (out of 235 raw points, normalized to…
| MIT | uvx warden-ai scan
---

Top comments (0)