<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shay Gabay</title>
    <description>The latest articles on DEV Community by Shay Gabay (@shay_gabay_005d3b9ca41233).</description>
    <link>https://dev.to/shay_gabay_005d3b9ca41233</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875472%2F61b04907-f332-4594-8da6-e2fbb3b8d12c.png</url>
      <title>DEV Community: Shay Gabay</title>
      <link>https://dev.to/shay_gabay_005d3b9ca41233</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shay_gabay_005d3b9ca41233"/>
    <language>en</language>
    <item>
      <title>We added a dimension for DeepMind's Agent Traps to our AI governance scanner</title>
      <dc:creator>Shay Gabay</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:54:25 +0000</pubDate>
      <link>https://dev.to/shay_gabay_005d3b9ca41233/we-added-a-dimension-for-deepminds-agent-traps-to-our-ai-governance-scanner-4p6p</link>
      <guid>https://dev.to/shay_gabay_005d3b9ca41233/we-added-a-dimension-for-deepminds-agent-traps-to-our-ai-governance-scanner-4p6p</guid>
      <description>&lt;p&gt;Google DeepMind published "AI Agent Traps" (SSRN 6372438) on April 1, 2026.&lt;/p&gt;

&lt;p&gt;The paper documents 6 attack categories against autonomous AI agents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Content Injection — hidden HTML/CSS instructions&lt;/li&gt;
&lt;li&gt;Semantic Manipulation — authority framing, persona hyperstition&lt;/li&gt;
&lt;li&gt;Cognitive State — RAG poisoning, knowledge base contamination&lt;/li&gt;
&lt;li&gt;Behavioral Control — action hijacking, sub-agent spawning&lt;/li&gt;
&lt;li&gt;Systemic — flash crash patterns, fragment assembly&lt;/li&gt;
&lt;li&gt;Human-in-the-Loop — approval fatigue, summary deception&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We shipped D17 (Adversarial Resilience) in Warden on April 10.&lt;br&gt;
9 days from paper to production scanner dimension.&lt;/p&gt;

&lt;h2&gt;What D17 actually checks&lt;/h2&gt;

&lt;p&gt;D17 scans your codebase for evidence of defenses against each trap type.&lt;br&gt;
It's not a runtime test — it checks whether your code has patterns that&lt;br&gt;
indicate you've thought about adversarial content in your AI pipelines.&lt;/p&gt;
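&lt;p&gt;To make "evidence of defenses" concrete, here is a minimal sketch of a static pattern check. It is not Warden's actual rule set: the signal names, the regexes, and the &lt;code&gt;find_signals&lt;/code&gt; helper are all hypothetical, chosen only to show the idea of matching source text against known defensive naming patterns.&lt;/p&gt;

```python
import re

# Hypothetical signal names and regexes, invented for illustration only;
# they are NOT Warden's real detection rules.
SIGNAL_PATTERNS = {
    "content_sanitization": re.compile(r"sanitize_\w*(html|content|context)", re.I),
    "rag_validation": re.compile(r"validate_\w*(document|chunk|source)", re.I),
    "approval_gate": re.compile(r"require_\w*approval", re.I),
}


def find_signals(source: str) -> set[str]:
    """Return the names of all signals whose pattern appears in the source text."""
    return {
        name
        for name, pattern in SIGNAL_PATTERNS.items()
        if pattern.search(source)
    }


# A made-up snippet of agent code to scan.
sample = '''
def build_context(docs):
    clean = [sanitize_html(d) for d in docs]
    return validate_documents(clean)
'''

print(sorted(find_signals(sample)))
# ['content_sanitization', 'rag_validation']
```

&lt;p&gt;A check like this only shows that a defense is named in the code, not that it works, which is why it complements a runtime test rather than replacing one.&lt;/p&gt;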

&lt;p&gt;Strong signals (3pts each):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content sanitization before LLM context injection&lt;/li&gt;
&lt;li&gt;RAG document validation patterns&lt;/li&gt;
&lt;li&gt;Behavioral anomaly detection&lt;/li&gt;
&lt;li&gt;Approval gate verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weak signals (1pt each):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic input filtering&lt;/li&gt;
&lt;li&gt;Output content checks&lt;/li&gt;
&lt;/ul&gt;
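&lt;p&gt;The point weights above can be sketched as a small scoring function. The signal identifiers and &lt;code&gt;score_signals&lt;/code&gt; are hypothetical names, and the sketch assumes each signal counts once; how Warden actually aggregates or caps D17 points is described in SCORING.md, not here.&lt;/p&gt;

```python
# Weights taken from the lists above: 3 points per strong signal, 1 per weak.
STRONG_POINTS = 3
WEAK_POINTS = 1

# Hypothetical identifiers for the signals listed above.
STRONG_SIGNALS = {
    "content_sanitization",
    "rag_validation",
    "behavioral_anomaly_detection",
    "approval_gate_verification",
}
WEAK_SIGNALS = {"basic_input_filtering", "output_content_checks"}


def score_signals(found: set[str]) -> int:
    """Sum the points for the detected signals, counting each signal once."""
    strong = len(found.intersection(STRONG_SIGNALS))
    weak = len(found.intersection(WEAK_SIGNALS))
    return strong * STRONG_POINTS + weak * WEAK_POINTS


# One strong signal plus one weak signal: 3 + 1.
print(score_signals({"rag_validation", "basic_input_filtering"}))  # 4
```

&lt;p&gt;Unknown signal names are ignored, so the function degrades quietly if a detector reports something outside these two lists.&lt;/p&gt;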

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;Every type of trap has a documented proof-of-concept.&lt;br&gt;
The attack surface is combinatorial — traps chain.&lt;/p&gt;

&lt;p&gt;A single compromised RAG document can trigger a behavioral control trap&lt;br&gt;
that spawns an unauthorized child agent that exfiltrates data.&lt;br&gt;
At each step, the agent is doing exactly what it was told.&lt;/p&gt;
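&lt;p&gt;One way to break a chain like this is to track where the content in an agent's context came from and to refuse privileged actions once untrusted content is present. The sketch below is an illustrative assumption, not something taken from the paper or from Warden: &lt;code&gt;AgentContext&lt;/code&gt;, &lt;code&gt;authorize&lt;/code&gt;, and the source and action names are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical labels: which content sources are trusted, and which
# actions are considered privileged.
TRUSTED_SOURCES = {"system_prompt", "operator"}
PRIVILEGED_ACTIONS = {"spawn_agent", "send_external_request"}


@dataclass
class AgentContext:
    """Tracks where the content in an agent's context came from."""

    sources: set[str] = field(default_factory=set)

    def add(self, source: str) -> None:
        self.sources.add(source)

    def is_tainted(self) -> bool:
        # Tainted as soon as any content came from outside the trusted set.
        return not self.sources.issubset(TRUSTED_SOURCES)


def authorize(action: str, ctx: AgentContext) -> bool:
    """Deny privileged actions whenever untrusted content is in context."""
    return not (action in PRIVILEGED_ACTIONS and ctx.is_tainted())


ctx = AgentContext()
ctx.add("system_prompt")
print(authorize("spawn_agent", ctx))  # True: only trusted content so far

ctx.add("rag_document")               # a retrieved document enters the context
print(authorize("spawn_agent", ctx))  # False: privileged action is blocked
print(authorize("summarize", ctx))    # True: non-privileged actions still run
```

&lt;p&gt;The gate does not try to decide whether the retrieved document is malicious. It only removes the assumption that instructions arriving through retrieved content may trigger a child agent or an outbound request.&lt;/p&gt;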

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;uvx --from warden-ai warden scan --format html&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The HTML report shows your D17 score with specific findings.&lt;br&gt;
The citation to SSRN 6372438 is in the source code.&lt;/p&gt;

&lt;h2&gt;The market picture&lt;/h2&gt;

&lt;p&gt;We also scored 17 known governance vendors on D17.&lt;br&gt;
11 of them score 0.&lt;br&gt;
The three advanced dimensions (D7 HITL, D8 Identity, D17 Adversarial)&lt;br&gt;
require an inline gateway architecture to score meaningfully.&lt;/p&gt;

&lt;p&gt;Full leaderboard: &lt;code&gt;warden leaderboard&lt;/code&gt; (runs in your terminal)&lt;br&gt;
Methodology: &lt;code&gt;SCORING.md&lt;/code&gt; in the repo&lt;/p&gt;





&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/SharkRouter" rel="noopener noreferrer"&gt;
        SharkRouter
      &lt;/a&gt; / &lt;a href="https://github.com/SharkRouter/warden" rel="noopener noreferrer"&gt;
        warden
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      AI Agent Governance Scanner — 17-dimension scoring across 7 scan layers. Local-only, privacy-first.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Warden — AI Agent Governance Scanner&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href="https://pypi.org/project/warden-ai/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ddd6ef3d93c8c536e5a73e0a04f6f77af373f40f4f65f99ccda97d6055a3ebec/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f77617264656e2d6169" alt="PyPI version"&gt;&lt;/a&gt;
&lt;a href="https://github.com/SharkRouter/warden/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/08cef40a9105b6526ca22088bc514fbfdbc9aac1ddbf8d4e6c750e3a88a44dca/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d626c75652e737667" alt="License: MIT"&gt;&lt;/a&gt;
&lt;a href="https://pypi.org/project/warden-ai/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e801a66299d2c15286fe0fee660d9ffa666c1c5576e5e1536acc07b49ce8ac8f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f707974686f6e2d332e31302532422d626c7565" alt="Python 3.10+"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open-source, local-only CLI scanner that evaluates AI agent governance posture across &lt;strong&gt;12 scan layers&lt;/strong&gt; and &lt;strong&gt;17 dimensions&lt;/strong&gt;. Scans code patterns, MCP configs, infrastructure, secrets, agent architecture, dependencies, audit compliance, CI/CD pipelines, IaC security, framework-specific governance, multi-language code, and cloud AI services. &lt;strong&gt;No data leaves the machine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://sharkrouter.ai" rel="nofollow noopener noreferrer"&gt;sharkrouter.ai&lt;/a&gt; · &lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/warden-ai/" rel="nofollow noopener noreferrer"&gt;warden-ai&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Quick Start&lt;/h2&gt;
&lt;/div&gt;

&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; With uv (zero setup, one-shot — recommended)&lt;/span&gt;
uvx --from warden-ai warden scan /path/to/your-agent-project

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; With pip&lt;/span&gt;
pip install warden-ai
warden scan /path/to/your-agent-project

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Optional extras&lt;/span&gt;
pip install &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;warden-ai[pdf]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;   &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; adds `--format pdf` (weasyprint)&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;From zero to governance score in under 60 seconds.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;HTML Report&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;Warden generates a self-contained HTML report with interactive score breakdown, actionable recommendations, and a comparison card — works offline and in air-gapped environments.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/SharkRouter/warden/docs/images/warden-report-preview.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FSharkRouter%2Fwarden%2FHEAD%2Fdocs%2Fimages%2Fwarden-report-preview.png" alt="Warden HTML Report"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What It Does&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;Warden scores your AI agent project across &lt;strong&gt;17 governance dimensions&lt;/strong&gt; (out of 235 raw points, normalized to…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/SharkRouter/warden" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;p&gt;MIT licensed | &lt;code&gt;uvx --from warden-ai warden scan&lt;/code&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>security</category>
      <category>news</category>
    </item>
  </channel>
</rss>
