DEV Community

Shay Gabay

We added a dimension for DeepMind's Agent Traps to our AI governance scanner

Google DeepMind published "AI Agent Traps" (SSRN 6372438) on April 1, 2026.

The paper documents 6 attack categories against autonomous AI agents:

  1. Content Injection — hidden HTML/CSS instructions
  2. Semantic Manipulation — authority framing, persona hyperstition
  3. Cognitive State — RAG poisoning, knowledge base contamination
  4. Behavioral Control — action hijacking, sub-agent spawning
  5. Systemic — flash crash patterns, fragment assembly
  6. Human-in-the-Loop — approval fatigue, summary deception

We shipped D17 (Adversarial Resilience) in Warden on April 10.
Less than two weeks from paper to production scanner dimension.

What D17 actually checks

D17 scans your codebase for evidence of defenses against each trap type.
It's not a runtime test — it checks whether your code has patterns that
indicate you've thought about adversarial content in your AI pipelines.

Strong signals (3pts each):

  • Content sanitization before LLM context injection
  • RAG document validation patterns
  • Behavioral anomaly detection
  • Approval gate verification

Weak signals (1pt each):

  • Basic input filtering
  • Output content checks
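Here is a static check of this shape, sketched in Python. The post does not publish Warden's detection patterns, so every regex and signal name below is a hypothetical stand-in. The sketch also assumes each signal is counted once per codebase, not once per file. It only illustrates the 3-point/1-point split described above.

```python
import re
from pathlib import Path

# Hypothetical regexes standing in for each signal; Warden's real
# patterns are not shown in the post.
STRONG_SIGNALS = {  # 3 points each
    "content_sanitization": re.compile(r"sanitiz\w*\(|strip_hidden|bleach\.clean"),
    "rag_validation": re.compile(r"validate_(document|chunk|source)|verify_provenance"),
    "anomaly_detection": re.compile(r"anomaly|behavior(al)?_baseline"),
    "approval_gate": re.compile(r"require_approval|approval_gate|human_review"),
}
WEAK_SIGNALS = {  # 1 point each
    "input_filtering": re.compile(r"filter_input|input_filter"),
    "output_checks": re.compile(r"check_output|output_filter|moderat\w*"),
}


def score_sources(sources: dict[str, str]) -> tuple[int, list[str]]:
    """Score a {filename: source_text} mapping; each signal counts at most once."""
    found: list[str] = []
    score = 0
    for points, signals in ((3, STRONG_SIGNALS), (1, WEAK_SIGNALS)):
        for name, pattern in signals.items():
            if any(pattern.search(text) for text in sources.values()):
                found.append(name)
                score += points
    return score, found


def score_repo(root: str) -> tuple[int, list[str]]:
    """Read every .py file under `root` and score the whole tree."""
    sources = {
        str(path): path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(root).rglob("*.py")
    }
    return score_sources(sources)
```

A repository that calls `sanitize_html(...)` and `require_approval(...)` and also has a `filter_input(...)` helper would score 3 + 3 + 1 = 7 under this sketch. Pattern matching like this shows intent, not effectiveness. That matches the caveat above that D17 is not a runtime test.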

Why this matters

Every trap category has a documented proof of concept.
The attack surface is combinatorial: traps chain.

A single compromised RAG document can trigger a behavioral control trap
that spawns an unauthorized child agent that exfiltrates data.
At each step, the agent is doing exactly what it was told.
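Because the agent follows its instructions at every step, the chain has to be broken outside the model. One place to do that is a single choke point that every tool call passes through before execution. The Python sketch below shows a default-deny gate of this kind. It is not Warden code, and the tool names, the `ToolCall` type, and the policy sets are hypothetical. Sensitive actions, such as spawning a sub-agent, go to a human. Unknown tools never run.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical policy for one agent. Anything not listed is denied.
ALLOWED_TOOLS = {"search_docs", "summarize"}                    # run without review
APPROVAL_REQUIRED = {"spawn_agent", "send_email", "http_post"}  # a human must approve


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


class BlockedAction(Exception):
    """Raised when a tool call must not execute."""


def gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> ToolCall:
    """Return the call if it may run; raise BlockedAction otherwise.

    `approve` receives the raw call (name and arguments, not a
    model-written summary) and should return a human's decision.
    """
    if call.name in APPROVAL_REQUIRED:
        if not approve(call):
            raise BlockedAction(f"{call.name} rejected by reviewer")
        return call
    if call.name in ALLOWED_TOOLS:
        return call
    # Default-deny: a tool the policy has never heard of does not run.
    raise BlockedAction(f"{call.name} is not on the allowlist")
```

In the chain above, a poisoned RAG document can still make the model emit a `spawn_agent` call, but the call stops at the reviewer. Handing the reviewer the raw arguments rather than an LLM summary is aimed at the summary-deception trap. It does nothing about approval fatigue if reviewers rubber-stamp every request.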

Try it

uvx --from warden-ai warden scan --format html

The HTML report shows your D17 score with specific findings.
The citation to SSRN 6372438 is in the source code.

The market picture

We also scored 17 known governance vendors on D17.
11 of them score 0.
The three advanced dimensions (D7 HITL, D8 Identity, D17 Adversarial)
require inline gateway architecture to score meaningfully.

Full leaderboard: warden leaderboard (runs in your terminal)
Methodology: SCORING.md in the repo


SharkRouter / warden

AI Agent Governance Scanner: 17-dimension scoring across 12 scan layers. Local-only, privacy-first.

Warden — AI Agent Governance Scanner

License: MIT · Python 3.10+

Open-source, local-only CLI scanner that evaluates AI agent governance posture across 12 scan layers and 17 dimensions. Scans code patterns, MCP configs, infrastructure, secrets, agent architecture, dependencies, audit compliance, CI/CD pipelines, IaC security, framework-specific governance, multi-language code, and cloud AI services. No data leaves the machine.

Website: sharkrouter.ai · PyPI: warden-ai

Quick Start

# With uv (zero setup, one-shot — recommended)
uvx --from warden-ai warden scan /path/to/your-agent-project

# With pip
pip install warden-ai
warden scan /path/to/your-agent-project

# Optional extras
pip install 'warden-ai[pdf]'   # adds `--format pdf` (weasyprint)

From zero to governance score in under 60 seconds.

HTML Report

Warden generates a self-contained HTML report with interactive score breakdown, actionable recommendations, and a comparison card — works offline and in air-gapped environments.

(Screenshot: Warden HTML report)

What It Does

Warden scores your AI agent project across 17 governance dimensions (out of 235 raw points, normalized to…