We added a dimension for DeepMind's Agent Traps to our AI governance scanner

Shay Gabay — Wed, 15 Apr 2026 11:54:25 +0000

Google DeepMind published "AI Agent Traps" (SSRN 6372438) on April 1, 2026.

The paper documents 6 attack categories against autonomous AI agents:

Content Injection — hidden HTML/CSS instructions
Semantic Manipulation — authority framing, persona hyperstition
Cognitive State — RAG poisoning, knowledge base contamination
Behavioral Control — action hijacking, sub-agent spawning
Systemic — flash crash patterns, fragment assembly
Human-in-the-Loop — approval fatigue, summary deception
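
To make the first category concrete, here is a minimal sketch of stripping hidden HTML before page text reaches an LLM context. It is not taken from the paper or from Warden. The names `VisibleTextExtractor`, `visible_text`, and `HIDDEN_STYLE_MARKERS` are hypothetical, the list of hiding techniques is deliberately incomplete, and the parser assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Inline-style fragments that hide an element from a human reader.
# Illustrative only: real pages can hide text in many other ways.
HIDDEN_STYLE_MARKERS = ("display:none", "visibility:hidden")
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside a hidden subtree."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden subtree

    def _is_hidden(self, tag, attrs):
        attr_map = dict(attrs)
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        return (
            tag in ("script", "style", "template")
            or "hidden" in attr_map
            or any(marker in style for marker in HIDDEN_STYLE_MARKERS)
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a hidden subtree, every nested element deepens it.
        if self.hidden_depth or self._is_hidden(tag, attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return only the text a human reader would plausibly see."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

HTML comments are dropped as a side effect, because `HTMLParser` ignores them unless `handle_comment` is overridden. External stylesheets, off-screen positioning, and zero-size fonts are not covered by this sketch.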

We shipped D17 (Adversarial Resilience) in Warden on April 10.
9 days from paper to production scanner dimension.

What D17 actually checks

D17 scans your codebase for evidence of defenses against each trap type.
It's not a runtime test — it checks whether your code has patterns that
indicate you've thought about adversarial content in your AI pipelines.
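
As a rough illustration of what a static, non-runtime check can look like, the sketch below searches Python source files for call names that hint at a defense. The patterns and the names `DEFENSE_PATTERNS` and `scan_for_defenses` are assumptions made up for this example. They are not Warden's actual rules, which live in the repo.

```python
import re
from pathlib import Path

# Hypothetical evidence patterns, one per defense type.
# These are NOT Warden's real rules; they only show the idea.
DEFENSE_PATTERNS = {
    "content_sanitization": re.compile(r"\bsanitiz\w*\s*\(", re.I),
    "rag_validation": re.compile(r"\bvalidate_\w*(doc|chunk|retriev)\w*\s*\(", re.I),
    "approval_gate": re.compile(r"\b(require|request)_\w*approval\w*\s*\(", re.I),
}


def scan_for_defenses(root):
    """Map each defense name to the 'file:line' locations that match it."""
    root = Path(root)
    findings = {name: [] for name in DEFENSE_PATTERNS}
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in DEFENSE_PATTERNS.items():
                if pattern.search(line):
                    location = f"{path.relative_to(root).as_posix()}:{lineno}"
                    findings[name].append(location)
    return findings
```

A match here is only evidence that someone thought about the problem. It says nothing about whether the defense works, which is the same limitation any pattern-based scan has.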

Strong signals (3pts each):

Content sanitization before LLM context injection
RAG document validation patterns
Behavioral anomaly detection
Approval gate verification

Weak signals (1pt each):

Basic input filtering
Output content checks
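
The point values above can be tallied as in the sketch below. It assumes each signal counts once when detected. How Warden caps or normalizes these points into the final D17 score is not described here, so that step is left out. The signal identifiers and `raw_signal_points` are hypothetical names.

```python
STRONG_POINTS = 3
WEAK_POINTS = 1

# Identifiers are made up for this example; the labels come from the lists above.
STRONG_SIGNALS = {
    "content_sanitization",
    "rag_document_validation",
    "behavioral_anomaly_detection",
    "approval_gate_verification",
}
WEAK_SIGNALS = {"basic_input_filtering", "output_content_checks"}


def raw_signal_points(detected):
    """Sum points for detected signals, counting each signal at most once."""
    detected = set(detected)
    unknown = detected - STRONG_SIGNALS - WEAK_SIGNALS
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    strong = len(detected & STRONG_SIGNALS)
    weak = len(detected & WEAK_SIGNALS)
    return STRONG_POINTS * strong + WEAK_POINTS * weak
```

For example, a project with content sanitization, approval gate verification, and basic input filtering would tally 3 + 3 + 1 = 7 raw points under these assumptions.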

Why this matters

Every type of trap has a documented proof-of-concept.
The attack surface is combinatorial — traps chain.

A single compromised RAG document can trigger a behavioral control trap
that spawns an unauthorized child agent that exfiltrates data.
At each step, the agent is doing exactly what it was told.
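
One way to break a chain like this is to track where context came from and gate high-risk actions on it. The sketch below is an assumption-laden illustration, not something described in the paper or implemented in Warden. `ContextItem`, `HIGH_RISK_ACTIONS`, and `decide` are hypothetical names, and the trust labels are assumed to be set by the retrieval layer.

```python
from dataclasses import dataclass

# Hypothetical action names that should never run on untrusted input alone.
HIGH_RISK_ACTIONS = {"spawn_agent", "send_external_request"}


@dataclass(frozen=True)
class ContextItem:
    source: str    # e.g. "system_prompt" or "rag:wiki/page-12"
    trusted: bool  # assumed to be assigned when the item enters the context


def decide(action: str, context: list[ContextItem]) -> str:
    """Return "allow", or "needs_approval" for risky actions on tainted context."""
    tainted = [item.source for item in context if not item.trusted]
    if action in HIGH_RISK_ACTIONS and tainted:
        return "needs_approval"
    return "allow"
```

With a gate like this, the poisoned document can still influence what the agent says, but spawning a child agent while untrusted content is in context requires a human decision instead of happening silently.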

Try it

uvx --from warden-ai warden scan --format html

The HTML report shows your D17 score with specific findings.
The citation to SSRN 6372438 is in the source code.

The market picture

We also scored 17 known governance vendors on D17.
11 of them score 0.
The three advanced dimensions (D7 HITL, D8 Identity, D17 Adversarial)
require inline gateway architecture to score meaningfully.

Full leaderboard: warden leaderboard (runs in your terminal)
Methodology: SCORING.md in the repo

SharkRouter / warden

AI Agent Governance Scanner: 17-dimension scoring across 12 scan layers. Local-only, privacy-first.

Warden — AI Agent Governance Scanner

Open-source, local-only CLI scanner that evaluates AI agent governance posture across 12 scan layers and 17 dimensions. Scans code patterns, MCP configs, infrastructure, secrets, agent architecture, dependencies, audit compliance, CI/CD pipelines, IaC security, framework-specific governance, multi-language code, and cloud AI services. No data leaves the machine.

Website: sharkrouter.ai · PyPI: warden-ai

Quick Start

# With uv (zero setup, one-shot — recommended)
uvx --from warden-ai warden scan /path/to/your-agent-project

# With pip
pip install warden-ai
warden scan /path/to/your-agent-project

# Optional extras
pip install 'warden-ai[pdf]'   # adds `--format pdf` (weasyprint)

From zero to governance score in under 60 seconds.

HTML Report

Warden generates a self-contained HTML report with interactive score breakdown, actionable recommendations, and a comparison card — works offline and in air-gapped environments.

What It Does

Warden scores your AI agent project across 17 governance dimensions (out of 235 raw points, normalized to…


MIT | uvx --from warden-ai warden scan

---

DEV Community: Shay Gabay