<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: temp-noob</title>
    <description>The latest articles on DEV Community by temp-noob (@tempnoob).</description>
    <link>https://dev.to/tempnoob</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2126791%2Fafc6efd2-5e48-4b55-a268-b07978e84504.png</url>
      <title>DEV Community: temp-noob</title>
      <link>https://dev.to/tempnoob</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tempnoob"/>
    <language>en</language>
    <item>
      <title>The Claude Code Leak Changed the Threat Model. Here's How to Defend Your AI Agents.</title>
      <dc:creator>temp-noob</dc:creator>
      <pubDate>Sun, 05 Apr 2026 23:36:34 +0000</pubDate>
      <link>https://dev.to/tempnoob/the-claude-code-leak-changed-the-threat-model-heres-how-to-defend-your-ai-agents-24d</link>
      <guid>https://dev.to/tempnoob/the-claude-code-leak-changed-the-threat-model-heres-how-to-defend-your-ai-agents-24d</guid>
      <description>&lt;p&gt;&lt;em&gt;IntentGuard — a policy enforcement layer for MCP tool calls and AI coding agents&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Leak That Rewrote the Attacker's Playbook
&lt;/h2&gt;

&lt;p&gt;On March 31, 2026, &lt;a href="https://www.straiker.ai/blog/claude-code-source-leak-with-great-agency-comes-great-responsibility" rel="noopener noreferrer"&gt;512,000 lines of Claude Code source&lt;/a&gt; were accidentally published via an npm source map. Within hours the code was mirrored across GitHub. What was already extractable from the minified bundle became &lt;strong&gt;instantly readable&lt;/strong&gt;: the compaction pipeline, every bash-security regex, the permission short-circuit logic, and the exact MCP interface contract.&lt;/p&gt;

&lt;p&gt;The leak didn't create new vulnerability &lt;em&gt;classes&lt;/em&gt; — it collapsed the &lt;strong&gt;cost of exploiting them&lt;/strong&gt;. Attackers no longer need to brute-force prompt injections or reverse-engineer shell validators. They can read the code, study the gaps, and craft payloads that a cooperative model will execute and a reasonable developer will approve.&lt;/p&gt;

&lt;p&gt;Three findings from the leak are especially alarming:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context poisoning via compaction&lt;/strong&gt; — MCP tool results are never micro-compacted; the auto-compact prompt faithfully preserves "user feedback." A malicious instruction embedded in a cloned repo's &lt;code&gt;CLAUDE.md&lt;/code&gt; can survive context compression and become a persistent, trusted directive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox bypass via parser differentials&lt;/strong&gt; — Claude Code's bash permission chain uses three separate parsers with known edge-case divergence. Early-allow validators can short-circuit the entire chain, skipping critical downstream checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supply-chain amplification&lt;/strong&gt; — The readable source makes crafting convincing malicious MCP servers trivial by revealing the exact interface contract. A concurrent &lt;code&gt;axios&lt;/code&gt; supply-chain attack the same day underscores that these threats don't arrive in isolation.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Academic Evidence: MCP's Architecture Is the Problem
&lt;/h2&gt;

&lt;p&gt;These aren't just theoretical risks. &lt;a href="https://arxiv.org/abs/2601.17549" rel="noopener noreferrer"&gt;Maloyan &amp;amp; Namiot (arXiv:2601.17549)&lt;/a&gt; published the first rigorous security analysis of the MCP specification itself and found that &lt;strong&gt;MCP's architectural choices amplify attack success rates by 23–41%&lt;/strong&gt; compared to equivalent non-MCP integrations.&lt;/p&gt;

&lt;p&gt;Their PROTOAMP framework tested 847 attack scenarios across five MCP server implementations and three LLM backends. The results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attack Type&lt;/th&gt;
&lt;th&gt;Baseline (non-MCP)&lt;/th&gt;
&lt;th&gt;With MCP&lt;/th&gt;
&lt;th&gt;Amplification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Indirect Injection (Resource)&lt;/td&gt;
&lt;td&gt;31.2%&lt;/td&gt;
&lt;td&gt;47.8%&lt;/td&gt;
&lt;td&gt;+16.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Response Manipulation&lt;/td&gt;
&lt;td&gt;28.4%&lt;/td&gt;
&lt;td&gt;52.1%&lt;/td&gt;
&lt;td&gt;+23.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-Server Propagation&lt;/td&gt;
&lt;td&gt;19.7%&lt;/td&gt;
&lt;td&gt;61.3%&lt;/td&gt;
&lt;td&gt;+41.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sampling-Based Injection&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;67.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;26.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+26.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The paper identifies three protocol-level vulnerabilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Least Privilege Violation&lt;/strong&gt; (§3) — Capability declarations are self-asserted. A malicious server claiming only &lt;code&gt;resources&lt;/code&gt; can later invoke &lt;code&gt;sampling/createMessage&lt;/code&gt; to inject prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sampling Without Origin Authentication&lt;/strong&gt; (§3) — No tested MCP host distinguishes server-injected prompts from user-originated ones. The LLM trusts both identically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implicit Trust Propagation&lt;/strong&gt; (§3) — In multi-server deployments, compromise of &lt;em&gt;one&lt;/em&gt; server achieves 78.3% ASR with 5 concurrent servers and a 72.4% cascade rate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The paper also breaks down how malicious servers reach users in the real world: typosquatting on npm/pip (34%), supply-chain compromise (28%), social engineering via tutorials (23%), and marketplace poisoning (15%).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line:&lt;/strong&gt; if you're running AI agents with MCP tool access — Claude Code, Copilot, Cursor, or any custom agent — you are exposed by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing IntentGuard
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/temp-noob/intent-guard" rel="noopener noreferrer"&gt;IntentGuard&lt;/a&gt;&lt;/strong&gt; is an open-source Python guardrail layer I built for MCP tool calls and AI coding agents. It enforces &lt;strong&gt;static policy checks&lt;/strong&gt; (fast, deterministic, negligible added latency) and optional &lt;strong&gt;semantic intent checks&lt;/strong&gt; (LLM-powered, task-aware) on every tool call, before it touches your filesystem, database, or infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  AI Agent    │────▶│   IntentGuard    │────▶│  MCP Server │
│ (Claude,     │     │  ┌────────────┐  │     │ (filesystem, │
│  Copilot,    │◀────│  │ Static     │  │◀────│  git, DB,   │
│  Cursor)     │     │  │ Checks     │  │     │  Slack...)  │
│              │     │  ├────────────┤  │     │             │
│              │     │  │ Semantic   │  │     │             │
│              │     │  │ Analysis   │  │     │             │
│              │     │  ├────────────┤  │     │             │
│              │     │  │ Response   │  │     │             │
│              │     │  │ Inspection │  │     │             │
│              │     │  └────────────┘  │     │             │
└─────────────┘     └──────────────────┘     └─────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it plugs in
&lt;/h3&gt;

&lt;p&gt;IntentGuard currently supports two deployment modes, with more planned:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;stdio proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Shipped&lt;/td&gt;
&lt;td&gt;Wraps any MCP server command — intercepts every &lt;code&gt;tools/call&lt;/code&gt; and response on the wire&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Native hook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Shipped&lt;/td&gt;
&lt;td&gt;Runs behind Claude Code, GitHub Copilot, and Cursor's built-in hook systems via &lt;code&gt;intent-guard evaluate&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔜 Planned&lt;/td&gt;
&lt;td&gt;Network-deployable gateway for teams running MCP over HTTP/SSE — same policy engine, accessible as a service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker sidecar&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔜 Planned&lt;/td&gt;
&lt;td&gt;Containerized proxy for production deployments — drop into any &lt;code&gt;docker-compose.yaml&lt;/code&gt; or K8s pod spec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The native hook mode deserves emphasis. Claude Code, Copilot, and Cursor each have their own hook/extension mechanism, so security logic written for one does not carry over to the others and ends up fragmented. With IntentGuard, you &lt;strong&gt;write one &lt;code&gt;policy.yaml&lt;/code&gt; and it works across all three&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same command, same policy — whether it's Claude, Copilot, or Cursor calling it&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; | intent-guard evaluate &lt;span class="nt"&gt;--policy&lt;/span&gt; schema/policy.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hook templates are shipped ready to drop into each tool's config directory (&lt;code&gt;hooks/claude-code/&lt;/code&gt;, &lt;code&gt;hooks/copilot/&lt;/code&gt;, &lt;code&gt;hooks/cursor/&lt;/code&gt;). One policy file governs what every AI agent in your org is allowed to do — which matters for teams that don't want to maintain three separate security configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bidirectional by design
&lt;/h3&gt;

&lt;p&gt;Most MCP security tools focus exclusively on &lt;strong&gt;requests&lt;/strong&gt; — inspecting what the agent is &lt;em&gt;asking&lt;/em&gt; to do. But the Claude Code leak showed exactly why that's not enough. The leaked source reveals how MCP server responses flow directly into the agent's context window, and the &lt;a href="https://www.straiker.ai/blog/claude-code-source-leak-with-great-agency-comes-great-responsibility" rel="noopener noreferrer"&gt;Straiker analysis&lt;/a&gt; documents how tool results bypass micro-compaction entirely, persisting as trusted content.&lt;/p&gt;

&lt;p&gt;IntentGuard inspects &lt;strong&gt;both directions&lt;/strong&gt;. The &lt;code&gt;response_rules&lt;/code&gt; engine scans MCP server responses &lt;em&gt;before&lt;/em&gt; they reach the agent — detecting secrets, PII, and encoded payloads on the way out. A few other tools (like &lt;code&gt;mcp-firewall&lt;/code&gt; and &lt;code&gt;mcpwall&lt;/code&gt;) have started adding response scanning, but IntentGuard was designed around bidirectional inspection from the start, with policy-driven actions (&lt;code&gt;block&lt;/code&gt; / &lt;code&gt;warn&lt;/code&gt; / &lt;code&gt;redact&lt;/code&gt;) and Base64 decode on response payloads to catch encoded exfiltration attempts.&lt;/p&gt;




&lt;h2&gt;
  
  
  How IntentGuard Addresses the Vulnerabilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Fully Addressed
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Prompt Injection (Paper §3 — Protocol Specification Analysis)
&lt;/h4&gt;

&lt;p&gt;The paper shows indirect injection through resource content achieves 47.8% ASR via MCP. But the Claude Code leak made this far more practical. The &lt;a href="https://www.straiker.ai/blog/claude-code-source-leak-with-great-agency-comes-great-responsibility" rel="noopener noreferrer"&gt;Straiker analysis&lt;/a&gt; reveals that the auto-compact prompt instructs the model to &lt;em&gt;"pay special attention to specific user feedback"&lt;/em&gt; and preserve &lt;em&gt;"all user messages that are not tool results."&lt;/em&gt; Post-compaction, the model is told to &lt;em&gt;"continue without asking the user any further questions."&lt;/em&gt; This creates a laundering path: a malicious instruction in a &lt;code&gt;CLAUDE.md&lt;/code&gt; gets compacted into the summary as a "user directive," and the model follows it faithfully.&lt;/p&gt;

&lt;p&gt;IntentGuard counters this with &lt;strong&gt;three layers of defense&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;injection_patterns&lt;/code&gt;&lt;/strong&gt; — configurable regex patterns catch known injection phrases (&lt;code&gt;"ignore previous instructions"&lt;/code&gt;, &lt;code&gt;"override.*policy"&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;decode_arguments&lt;/code&gt;&lt;/strong&gt; — URL decoding, Base64 decoding, and Unicode NFKC normalization run &lt;em&gt;before&lt;/em&gt; pattern matching, defeating encoded bypass attempts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic intent checks&lt;/strong&gt; — the LLM-powered analysis layer evaluates whether the tool call's &lt;em&gt;purpose&lt;/em&gt; aligns with the declared task context, catching novel injections that bypass pattern matching
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;static_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;decode_arguments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;injection_patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;previous&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;instructions"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disregard.*instructions"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;override.*policy"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bypass.*security"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
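
&lt;p&gt;To make the decode-before-match ordering concrete, here is a minimal Python sketch. It is an illustration under assumptions, not IntentGuard's actual implementation: the names &lt;code&gt;candidate_views&lt;/code&gt; and &lt;code&gt;find_injection&lt;/code&gt; are hypothetical, and the pattern list simply mirrors the policy above.&lt;/p&gt;

```python
import base64
import binascii
import re
import unicodedata
from typing import List, Optional
from urllib.parse import unquote

# Hypothetical pattern list mirroring the injection_patterns policy example.
INJECTION_PATTERNS = [
    r"ignore previous instructions",
    r"disregard.*instructions",
    r"override.*policy",
    r"bypass.*security",
]


def candidate_views(value: str) -> List[str]:
    """Return the raw value plus its URL-decoded, NFKC-normalized, and Base64-decoded views."""
    views = [value, unquote(value), unicodedata.normalize("NFKC", value)]
    try:
        views.append(base64.b64decode(value, validate=True).decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, ValueError):
        pass  # not Base64-encoded text; the other views still get checked
    return views


def find_injection(value: str) -> Optional[str]:
    """Return the first pattern that matches any view of the argument, or None."""
    for view in candidate_views(value):
        for pattern in INJECTION_PATTERNS:
            if re.search(pattern, view, flags=re.IGNORECASE):
                return pattern
    return None
```

&lt;p&gt;The design point is that every pattern runs against every decoded view, so an attacker cannot dodge a regex just by URL-encoding or Base64-wrapping the same phrase. A real implementation would likely also handle nested encodings, which this sketch does not attempt.&lt;/p&gt;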



&lt;h4&gt;
  
  
  2. Tool Poisoning (Paper §4 — Attack Vectors)
&lt;/h4&gt;

&lt;p&gt;The paper demonstrates tool response manipulation achieving 52.1% ASR. The Claude Code leak compounds this: the readable source exposes the exact MCP interface contract, making it trivial to craft convincing malicious MCP servers that look legitimate. The leak also revealed that early-allow validators in the bash permission chain (like &lt;code&gt;validateGitCommit&lt;/code&gt;) can short-circuit &lt;em&gt;all&lt;/em&gt; downstream security checks — meaning a poisoned tool that mimics a "safe" command shape may bypass the entire validation chain.&lt;/p&gt;

&lt;p&gt;IntentGuard blocks poisoned tool usage with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;forbidden_tools&lt;/code&gt;&lt;/strong&gt; — blocklist of dangerous tools that are never allowed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;custom_policies&lt;/code&gt;&lt;/strong&gt; — per-tool argument requirements and forbidden argument checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;protected_paths&lt;/code&gt;&lt;/strong&gt; — glob-based path enforcement with traversal-safe normalization (so &lt;code&gt;../../etc/passwd&lt;/code&gt; can't sneak through)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic analysis&lt;/strong&gt; — evaluates whether the tool is appropriate for the stated task
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;static_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forbidden_tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete_database"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exec_shell"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;purge_all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;protected_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/etc/*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/auth/*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.ssh/*"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;custom_policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write_file&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;all_present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;should_not_present&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sudo"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
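
&lt;p&gt;The traversal-safe part of &lt;code&gt;protected_paths&lt;/code&gt; can be sketched in a few lines of Python. This is a hedged illustration of the idea (normalize first, glob second), not the project's code: &lt;code&gt;is_protected&lt;/code&gt; and the &lt;code&gt;/workspace&lt;/code&gt; root are assumptions made for the example.&lt;/p&gt;

```python
import fnmatch
import posixpath
from typing import Sequence

# Hypothetical values mirroring the protected_paths policy example.
PROTECTED_PATHS = (".env", "/etc/*", "src/auth/*", ".ssh/*")


def is_protected(
    path: str,
    workspace: str = "/workspace",
    patterns: Sequence[str] = PROTECTED_PATHS,
) -> bool:
    """Normalize the requested path, then test it against each glob."""
    # Collapse "." and ".." segments lexically so traversal cannot hide the target.
    absolute = posixpath.normpath(posixpath.join(workspace, path))
    # Workspace-relative form, so globs like "src/auth/*" can match.
    relative = posixpath.relpath(absolute, workspace)
    return any(
        fnmatch.fnmatchcase(absolute, pattern) or fnmatch.fnmatchcase(relative, pattern)
        for pattern in patterns
    )
```

&lt;p&gt;With this ordering, &lt;code&gt;../../etc/passwd&lt;/code&gt; is reduced to &lt;code&gt;/etc/passwd&lt;/code&gt; before the glob runs. Note that the sketch normalizes paths lexically only; it does not resolve symlinks, which a production check would need to consider.&lt;/p&gt;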



&lt;h4&gt;
  
  
  3. Rug-Pull Attacks (Paper §4 — Attack Vectors)
&lt;/h4&gt;

&lt;p&gt;MCP servers can change tool descriptions, schemas, or capabilities &lt;em&gt;after&lt;/em&gt; being initially trusted. This rug-pull pattern abuses the implicit trust the paper describes: once a server is accepted, its later changes are trusted too. IntentGuard's &lt;strong&gt;&lt;code&gt;ToolSnapshotStore&lt;/code&gt;&lt;/strong&gt; handles this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On first &lt;code&gt;tools/list&lt;/code&gt;, snapshots all tool metadata (name, description, inputSchema) to &lt;code&gt;.intent-guard/tool-snapshots/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;On every subsequent &lt;code&gt;tools/list&lt;/code&gt;, diffs against the snapshot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;warn&lt;/code&gt;&lt;/strong&gt; mode: logs the drift and continues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;block&lt;/code&gt;&lt;/strong&gt; mode: blocks the response when any tool definition has changed
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tool_change_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block&lt;/span&gt;  &lt;span class="c1"&gt;# block the tools/list response when any description/schema has drifted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
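
&lt;p&gt;The snapshot-and-diff idea can be illustrated with a short Python sketch. It assumes each tool is a dict with &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;inputSchema&lt;/code&gt; keys as returned by &lt;code&gt;tools/list&lt;/code&gt;; the function names are hypothetical, and on-disk storage is left out.&lt;/p&gt;

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(tool: dict) -> str:
    """Hash the fields a rug pull would change: name, description, inputSchema."""
    tracked = {key: tool.get(key) for key in ("name", "description", "inputSchema")}
    # Sorted keys give a canonical form, so key order alone never counts as drift.
    canonical = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools: List[dict]) -> Dict[str, str]:
    """Map each tool name to its fingerprint."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def drifted_tools(baseline: Dict[str, str], tools: List[dict]) -> List[str]:
    """Return names of tools that are new or differ from the baseline snapshot."""
    current = snapshot(tools)
    return sorted(name for name, digest in current.items() if baseline.get(name) != digest)
```

&lt;p&gt;In this sketch, an empty result from &lt;code&gt;drifted_tools&lt;/code&gt; means the listing matches the baseline; a non-empty result is what a &lt;code&gt;warn&lt;/code&gt; or &lt;code&gt;block&lt;/code&gt; action would act on. Treating newly added tools as drift is a choice made here for illustration.&lt;/p&gt;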



&lt;h4&gt;
  
  
  4. Data Exfiltration (Paper §4 — Attack Vectors)
&lt;/h4&gt;

&lt;p&gt;The paper shows 42–61% data exfiltration rates via sampling attacks. The Claude Code leak makes this worse in a specific way: the source reveals that MCP tool results are &lt;em&gt;never micro-compacted&lt;/em&gt; — they persist in context until auto-compact fires. An attacker-controlled MCP server can return a response containing embedded instructions or exfiltration payloads, and that content sits in the model's context window, trusted and unfiltered, for the entire session.&lt;/p&gt;

&lt;p&gt;IntentGuard provides &lt;strong&gt;bidirectional&lt;/strong&gt; protection:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inbound (request-side):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sensitive_data_patterns&lt;/code&gt; detect AWS keys, GitHub tokens, emails, SSNs, and custom patterns in tool call arguments before they're sent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Outbound (response-side):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;response_rules&lt;/code&gt; inspect MCP server responses &lt;em&gt;before&lt;/em&gt; forwarding to the agent&lt;/li&gt;
&lt;li&gt;Policy-driven actions: &lt;code&gt;block&lt;/code&gt; (suppress), &lt;code&gt;warn&lt;/code&gt; (log), or &lt;code&gt;redact&lt;/code&gt; (sanitize and forward)&lt;/li&gt;
&lt;li&gt;Base64 payload detection on responses catches encoded exfiltration
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;static_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sensitive_data_patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Access&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Key"&lt;/span&gt;
      &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AKIA[0-9A-Z]{16}"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GitHub&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Token"&lt;/span&gt;
      &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh[ps]_[A-Za-z0-9_]{36,}"&lt;/span&gt;

&lt;span class="na"&gt;response_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redact&lt;/span&gt;
  &lt;span class="na"&gt;detect_base64&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GitHub&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Token"&lt;/span&gt;
      &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh[ps]_[A-Za-z0-9_]{36,}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
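
&lt;p&gt;As a rough picture of what the &lt;code&gt;redact&lt;/code&gt; action does to a response, consider the Python sketch below. The placeholder format and the &lt;code&gt;redact&lt;/code&gt; function are assumptions for illustration; only the two regexes come from the policy above.&lt;/p&gt;

```python
import re
from typing import List, Tuple

# Rule names and regexes taken from the policy example; the list itself is hypothetical.
RESPONSE_PATTERNS = [
    ("GitHub Token", re.compile(r"gh[ps]_[A-Za-z0-9_]{36,}")),
    ("AWS Access Key", re.compile(r"AKIA[0-9A-Z]{16}")),
]


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace every match with a labeled placeholder and report which rules fired."""
    fired = []
    for name, pattern in RESPONSE_PATTERNS:
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            fired.append(name)
    return text, fired
```

&lt;p&gt;Returning the list of fired rules alongside the sanitized text lets the caller log what was caught without ever logging the secret itself.&lt;/p&gt;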



&lt;h4&gt;
  
  
  5. Token/Credential Theft (Paper §3, §4)
&lt;/h4&gt;

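&lt;p&gt;The &lt;code&gt;sensitive_data_patterns&lt;/code&gt; and &lt;code&gt;response_rules&lt;/code&gt; described above also catch credentials such as AWS keys and GitHub tokens in both directions. In addition, secrets detected in tool call arguments are redacted from IntentGuard's own decision logs, so sensitive data doesn't leak into your audit trail.&lt;/p&gt;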

&lt;h4&gt;
  
  
  6. Resource Exhaustion / DoS (Paper §6)
&lt;/h4&gt;

&lt;p&gt;Beyond &lt;code&gt;max_tokens_per_call&lt;/code&gt;, IntentGuard enforces per-tool sliding-window rate limiting. Each tool can have its own &lt;code&gt;max_calls&lt;/code&gt; and &lt;code&gt;window_seconds&lt;/code&gt;, so a runaway agent can't hammer &lt;code&gt;write_file&lt;/code&gt; 1,000 times in a minute.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;static_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rate_limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;max_calls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
      &lt;span class="na"&gt;window_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="na"&gt;by_tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;write_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;max_calls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
        &lt;span class="na"&gt;window_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
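
&lt;p&gt;For readers unfamiliar with sliding-window limiting, here is a minimal Python sketch of the technique. It is not IntentGuard's implementation: the class name, the injectable clock, and the in-memory storage are all assumptions, and the limits simply mirror the policy above.&lt;/p&gt;

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Hypothetical limits mirroring the rate_limits example: (max_calls, window_seconds).
DEFAULT_LIMIT = (60, 60.0)
BY_TOOL = {"write_file": (10, 60.0)}


class SlidingWindowLimiter:
    """Per-tool limiter that keeps the timestamps of recent calls."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tool: str) -> bool:
        """Record and allow the call unless the tool has used its quota in the window."""
        max_calls, window = BY_TOOL.get(tool, DEFAULT_LIMIT)
        now = self._clock()
        calls = self._calls[tool]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= window:
            calls.popleft()
        if len(calls) >= max_calls:
            return False
        calls.append(now)
        return True
```

&lt;p&gt;Unlike a fixed per-minute counter, the window here moves with each call, so a burst that straddles a minute boundary is still counted in full. Rejected calls are not recorded in this sketch, which means a blocked agent does not extend its own lockout.&lt;/p&gt;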






&lt;h2&gt;
  
  
  5-Minute Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A: Native Hook (Claude Code / Copilot / Cursor)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-intent-guard
&lt;span class="c"&gt;# or from source:&lt;/span&gt;
git clone https://github.com/temp-noob/intent-guard &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;intent-guard
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Write a policy&lt;/strong&gt; (or use a starter template from &lt;code&gt;policies/&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# schema/policy.yaml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-project-guard"&lt;/span&gt;

&lt;span class="na"&gt;static_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forbidden_tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete_database"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;purge_all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;protected_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.ssh/*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/auth/*"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens_per_call&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4000&lt;/span&gt;
  &lt;span class="na"&gt;decode_arguments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;injection_patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;previous&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;instructions"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;override.*policy"&lt;/span&gt;
  &lt;span class="na"&gt;sensitive_data_patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GitHub&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Token"&lt;/span&gt;
      &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh[ps]_[A-Za-z0-9_]{36,}"&lt;/span&gt;

&lt;span class="na"&gt;response_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redact&lt;/span&gt;
  &lt;span class="na"&gt;detect_base64&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GitHub&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Token"&lt;/span&gt;
      &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gh[ps]_[A-Za-z0-9_]{36,}"&lt;/span&gt;

&lt;span class="na"&gt;tool_change_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Wire the hook&lt;/strong&gt; (Claude Code example — &lt;code&gt;.claude/settings.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cat | intent-guard evaluate --policy schema/policy.yaml"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hook templates are also shipped for &lt;strong&gt;GitHub Copilot&lt;/strong&gt; (&lt;code&gt;hooks/copilot/hooks.json&lt;/code&gt;) and &lt;strong&gt;Cursor&lt;/strong&gt; (&lt;code&gt;hooks/cursor/hooks.json&lt;/code&gt;).&lt;/p&gt;
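
&lt;p&gt;For orientation, here is a rough sketch of what a hook command of this kind does with the JSON it receives on stdin. It is not IntentGuard's implementation: &lt;code&gt;PROTECTED_PATHS&lt;/code&gt;, &lt;code&gt;decide&lt;/code&gt;, and &lt;code&gt;run_hook&lt;/code&gt; are hypothetical names. It assumes the Claude Code hook contract, where the event carries &lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;tool_input&lt;/code&gt;, and &lt;code&gt;cwd&lt;/code&gt;, and exit status 2 blocks the call.&lt;/p&gt;

```python
import fnmatch
import json
import os

# Hypothetical policy values, for illustration only.
PROTECTED_PATHS = ["src/auth/*", ".env"]

ALLOW, BLOCK = 0, 2  # exit status 2 asks Claude Code to block the tool call


def decide(event):
    """Return (exit_status, message) for one PreToolUse event dict."""
    tool_input = event.get("tool_input") or {}
    # Built-in file tools use "file_path"; "path" is checked as a fallback.
    path = tool_input.get("file_path") or tool_input.get("path") or ""
    cwd = event.get("cwd")
    if cwd and os.path.isabs(path):
        # Globs are written relative to the repo, so strip the working dir.
        path = os.path.relpath(path, cwd)
    path = path.replace(os.sep, "/")
    for pattern in PROTECTED_PATHS:
        if fnmatch.fnmatchcase(path, pattern):
            return BLOCK, f"{event.get('tool_name')}: {path} matches {pattern}"
    return ALLOW, "ok"


def run_hook(stdin_text):
    """Parse the JSON the agent pipes in; malformed input fails closed."""
    try:
        event = json.loads(stdin_text)
    except json.JSONDecodeError:
        return BLOCK, "unreadable hook payload"
    if not isinstance(event, dict):
        return BLOCK, "unexpected hook payload shape"
    return decide(event)
```

&lt;p&gt;A real entry point would call &lt;code&gt;run_hook(sys.stdin.read())&lt;/code&gt;, print the message to stderr, and exit with the returned status.&lt;/p&gt;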

&lt;h3&gt;
  
  
  Option B: MCP Proxy Mode
&lt;/h3&gt;

&lt;p&gt;Wrap any MCP server with IntentGuard as a policy-enforcing proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;INTENT_GUARD_TASK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Refactor UI only; no auth or database changes"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; intent_guard.proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy&lt;/span&gt; schema/policy.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; &lt;span class="s2"&gt;"npx @modelcontextprotocol/server-filesystem /path/to/repo"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ask-approval&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



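&lt;p&gt;To launch the proxy from an MCP client (Claude Desktop, etc.), register it in the client's server configuration. Prefer absolute paths for &lt;code&gt;--policy&lt;/code&gt; and the target directory, since the client may not start the process from your repo root:&lt;br&gt;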
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"intent_guard.proxy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--policy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schema/policy.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--target"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx @modelcontextprotocol/server-filesystem /path/to/repo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--ask-approval"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"INTENT_GUARD_TASK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Refactor UI only; no auth or database changes"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Semantic Analysis: Beyond Pattern Matching
&lt;/h2&gt;

&lt;p&gt;Static checks are fast and deterministic, but a determined attacker can rephrase a payload until no regex matches. IntentGuard's semantic layer uses an LLM (local via Ollama or remote via LiteLLM) to evaluate whether a tool call's &lt;strong&gt;intent&lt;/strong&gt; matches the declared task. This layer is meant to catch novel attacks that a fixed pattern list cannot anticipate. A minimal configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;semantic_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
  &lt;span class="na"&gt;guardrail_model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llama3.1:8b&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enforce&lt;/span&gt;
  &lt;span class="na"&gt;prompt_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2"&lt;/span&gt;
  &lt;span class="na"&gt;critical_intent_threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.85&lt;/span&gt;

  &lt;span class="na"&gt;constraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;modify_source_code&lt;/span&gt;
      &lt;span class="na"&gt;allowed_scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;components&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;styles&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;only"&lt;/span&gt;
      &lt;span class="na"&gt;forbidden_scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Auth,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;schemas,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;infrastructure"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Signal Rubric Scoring (v2)
&lt;/h3&gt;

&lt;p&gt;Instead of asking the LLM for an opaque confidence number, v2 decomposes evaluation into &lt;strong&gt;four binary dimensions&lt;/strong&gt; with configurable weights:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Default Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_task_alignment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Is this tool appropriate for the stated task?&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;argument_scope_compliance&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Are arguments within the allowed scope?&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;no_forbidden_scope_violation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Do arguments avoid the forbidden scope?&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;no_side_effect_risk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Is the call free of destructive/exfiltration risk?&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

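&lt;p&gt;Each dimension returns &lt;code&gt;passed&lt;/code&gt; (true/false) and &lt;code&gt;evidence&lt;/code&gt; (a short explanation). The final score is then computed &lt;strong&gt;deterministically&lt;/strong&gt; from those verdicts: &lt;code&gt;Σ(weight × pass) / Σ(weight)&lt;/code&gt;, where &lt;code&gt;pass&lt;/code&gt; is 1 for a passed dimension and 0 otherwise. With the default weights, a call that fails only &lt;code&gt;no_side_effect_risk&lt;/code&gt; scores 0.85. Because the arithmetic is fixed, every decision can be traced back to the dimensions that failed and the evidence given for each, which keeps decisions auditable and debuggable.&lt;/p&gt;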
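
&lt;p&gt;The scoring formula is small enough to sketch directly. The weights below are the defaults from the table; the function name &lt;code&gt;rubric_score&lt;/code&gt; and the choice to treat a missing dimension as failed are assumptions for illustration, not IntentGuard's actual code.&lt;/p&gt;

```python
# Default weights from the table above.
DEFAULT_WEIGHTS = {
    "tool_task_alignment": 0.25,
    "argument_scope_compliance": 0.30,
    "no_forbidden_scope_violation": 0.30,
    "no_side_effect_risk": 0.15,
}


def rubric_score(verdicts, weights=DEFAULT_WEIGHTS):
    """Weighted pass rate: sum(weight * pass) / sum(weight).

    verdicts maps dimension name to True/False. A dimension missing from
    verdicts counts as failed (an assumption made for this sketch).
    """
    total = sum(weights.values())
    passed = sum(w for name, w in weights.items() if verdicts.get(name, False))
    return passed / total


# A write to an auth file during a UI-only task: the tool fits the task and
# has no destructive side effect, but both scope dimensions fail.
verdicts = {
    "tool_task_alignment": True,
    "argument_scope_compliance": False,
    "no_forbidden_scope_violation": False,
    "no_side_effect_risk": True,
}
score = round(rubric_score(verdicts), 2)
```

&lt;p&gt;Here only the 0.25 and 0.15 dimensions pass, so the score comes out at roughly 0.40. The LLM supplies the four verdicts; the number itself never comes from the model.&lt;/p&gt;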

&lt;h3&gt;
  
  
  Resilience Under Failure
&lt;/h3&gt;

&lt;p&gt;The semantic provider includes retries with exponential backoff + jitter, a circuit breaker, and &lt;strong&gt;per-tool fail modes&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;semantic_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider_fail_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;advisory&lt;/span&gt;        &lt;span class="c1"&gt;# fail-open for standard tools&lt;/span&gt;
    &lt;span class="na"&gt;by_tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;delete_database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enforce&lt;/span&gt;  &lt;span class="c1"&gt;# fail-closed for critical tools&lt;/span&gt;
      &lt;span class="na"&gt;purge_all&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enforce&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Ollama instance goes down, &lt;code&gt;delete_database&lt;/code&gt; calls are blocked (fail-closed) while normal file reads continue with a warning (fail-open). No all-or-nothing failure modes.&lt;/p&gt;
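
&lt;p&gt;A minimal sketch of that behaviour, assuming the provider call raises an exception when the backend is unreachable. The function names and return shapes are hypothetical, and the circuit breaker is left out to keep the example short.&lt;/p&gt;

```python
import random
import time

# Mirrors the provider_fail_mode YAML above.
FAIL_MODE = {
    "default": "advisory",
    "by_tool": {"delete_database": "enforce", "purge_all": "enforce"},
}


def call_with_retries(check, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Run check(); on an exception, wait and retry with backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return check()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; the caller applies the fail mode
            # Delay doubles per attempt; jitter scales it into the 50-100% range.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))


def evaluate(tool_name, check, fail_mode=FAIL_MODE, sleep=time.sleep):
    """Return the semantic verdict, or a fallback verdict if the provider is down."""
    try:
        return call_with_retries(check, sleep=sleep)
    except Exception:
        mode = fail_mode["by_tool"].get(tool_name, fail_mode["default"])
        if mode == "enforce":
            return {"allowed": False, "reason": "provider unavailable (fail-closed)"}
        return {"allowed": True, "warning": "provider unavailable (fail-open)"}
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; as a parameter keeps the retry loop testable without real delays.&lt;/p&gt;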




&lt;h2&gt;
  
  
  What Happens Without a Guard?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Poisoned CLAUDE.md survives compaction
&lt;/h3&gt;

&lt;p&gt;Here's a concrete scenario the Claude Code leak enables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A developer clones a PR branch that includes a poisoned &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The injected instruction survives context compaction and is then treated as a trusted directive&lt;/li&gt;
&lt;li&gt;The cooperative model proposes: &lt;code&gt;write_file(path="src/auth/config.py", content="...")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The developer sees a reasonable-looking code change and approves it&lt;/li&gt;
&lt;li&gt;Auth config is silently modified&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;With IntentGuard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;protected_paths: ["src/auth/*"]&lt;/code&gt; → &lt;strong&gt;blocked at static check&lt;/strong&gt; (&amp;lt; 1ms)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;injection_patterns&lt;/code&gt; on the compacted context → &lt;strong&gt;flagged if injection survived&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Semantic analysis against task &lt;code&gt;"Refactor UI only"&lt;/code&gt; → &lt;strong&gt;blocked&lt;/strong&gt; — auth file modification violates allowed scope&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;response_rules&lt;/code&gt; → even if a tool &lt;em&gt;did&lt;/em&gt; execute, sensitive data in the response would be redacted before reaching the agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The attack is stopped at multiple independent layers. That's defense in depth.&lt;/p&gt;
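
&lt;p&gt;The layering idea can be sketched as a list of independent checks, each of which can veto the call on its own. Everything here is illustrative: the layer functions are hypothetical, and &lt;code&gt;semantic_layer&lt;/code&gt; is a keyword stand-in for the LLM-backed check rather than a model call.&lt;/p&gt;

```python
import fnmatch

PROTECTED_PATHS = ["src/auth/*"]  # same glob as the first bullet above


def static_layer(call, task):
    """Glob match against protected paths; no model involved."""
    path = call["arguments"].get("path", "")
    if any(fnmatch.fnmatchcase(path, glob) for glob in PROTECTED_PATHS):
        return "static: path is protected"
    return None


def semantic_layer(call, task):
    """Stand-in for the LLM-backed check: a keyword test, not a model call."""
    path = call["arguments"].get("path", "")
    if "UI only" in task and "/auth/" in path:
        return "semantic: auth change is outside the declared task"
    return None


def evaluate(call, task, layers=(static_layer, semantic_layer)):
    """Run every layer independently; any single objection blocks the call."""
    reasons = [reason for reason in (layer(call, task) for layer in layers) if reason]
    return {"allowed": not reasons, "reasons": reasons}
```

&lt;p&gt;Because each layer returns its own reason, removing or bypassing one of them still leaves the others able to block the same call.&lt;/p&gt;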

&lt;h3&gt;
  
  
  Scenario 2: Supply-chain MCP server exfiltrates secrets
&lt;/h3&gt;

&lt;p&gt;The Claude Code leak exposed the exact interface contract for MCP servers. An attacker publishes a typosquatted package (e.g., &lt;code&gt;mcp-server-filesytem&lt;/code&gt;) — the paper found 34% of malicious server installations use this vector. A developer installs it. The server:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Returns normal-looking file contents for a few calls to build trust&lt;/li&gt;
&lt;li&gt;On the next &lt;code&gt;tools/call&lt;/code&gt;, the response payload includes the contents of &lt;code&gt;.env&lt;/code&gt; base64-encoded inside a markdown comment&lt;/li&gt;
&lt;li&gt;The agent's context now contains the developer's secrets&lt;/li&gt;
&lt;li&gt;The server's next &lt;code&gt;sampling/createMessage&lt;/code&gt; exfiltrates them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;With IntentGuard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
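&lt;code&gt;tool_change_rules&lt;/code&gt; → if the server's tool descriptions have drifted from the recorded snapshot, the change is &lt;strong&gt;caught before execution&lt;/strong&gt; (a warning under &lt;code&gt;action: warn&lt;/code&gt; as configured above; blocking the call requires a stricter action)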
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;response_rules&lt;/code&gt; with &lt;code&gt;detect_base64: true&lt;/code&gt; → the base64-encoded &lt;code&gt;.env&lt;/code&gt; in the response is &lt;strong&gt;caught and redacted&lt;/strong&gt; before it reaches the agent&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sensitive_data_patterns&lt;/code&gt; → even if decoded, AWS keys and tokens in the payload are &lt;strong&gt;flagged&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The secrets never reach the agent's context, so the server's sampling request has nothing to exfiltrate&lt;/li&gt;
&lt;/ul&gt;
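
&lt;p&gt;To make the base64 check concrete, here is one way a response filter could work: find long base64 runs, decode them, and redact those whose decoded text looks like secrets. This is a sketch under stated assumptions, not the behaviour of &lt;code&gt;detect_base64&lt;/code&gt; itself. The 24-character minimum, the env-style and AWS key ID shapes, and the function name are all choices made for this example.&lt;/p&gt;

```python
import base64
import binascii
import re

# Runs of 24 or more base64 characters, with optional padding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

# Illustrative secret shapes: env-style assignments, AWS access key IDs,
# and the GitHub token pattern from the policy shown earlier.
SECRET_SHAPES = re.compile(
    r"^[A-Z][A-Z0-9_]*=|AKIA[0-9A-Z]{16}|gh[ps]_[A-Za-z0-9_]{36,}",
    re.MULTILINE,
)


def redact_encoded_secrets(text):
    """Replace base64 runs whose decoded text looks like secrets."""

    def replace(match):
        blob = match.group(0)
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return blob  # not decodable as base64 text; leave it untouched
        if SECRET_SHAPES.search(decoded):
            return "[REDACTED base64]"
        return blob

    return BASE64_RUN.sub(replace, text)
```

&lt;p&gt;Decoding before matching is the point of the exercise: a plain regex over the raw response would never see &lt;code&gt;DB_PASSWORD=&lt;/code&gt; inside an encoded blob.&lt;/p&gt;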




&lt;h2&gt;
  
  
  For Defenders: Immediate Actions
&lt;/h2&gt;

&lt;p&gt;Whether or not you adopt IntentGuard, here's what you should do now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit &lt;code&gt;CLAUDE.md&lt;/code&gt; / &lt;code&gt;.cursorrules&lt;/code&gt; / &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;&lt;/strong&gt; in every repo you clone — especially PRs and forks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat MCP servers like npm dependencies&lt;/strong&gt; — vet them, pin them, monitor for changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid broad permission rules&lt;/strong&gt; like &lt;code&gt;Bash(git:*)&lt;/code&gt; — be specific&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin your AI tool versions&lt;/strong&gt; and verify hashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit session length&lt;/strong&gt; for sensitive work to reduce the compaction attack window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never use &lt;code&gt;dangerouslyDisableSandbox&lt;/code&gt;&lt;/strong&gt; in shared or production environments&lt;/li&gt;
&lt;/ol&gt;
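
&lt;p&gt;The first item is easy to partly automate. The helper below only lists the instruction files named above so a human can review them; it makes no attempt to judge their contents. The function name is hypothetical.&lt;/p&gt;

```python
import os

# File names taken from the first item in the list above.
INSTRUCTION_FILES = ("CLAUDE.md", ".cursorrules", ".github/copilot-instructions.md")


def find_instruction_files(repo_root):
    """List agent instruction files under repo_root as repo-relative paths."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip Git internals
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, repo_root).replace(os.sep, "/")
            if any(rel == name or rel.endswith("/" + name) for name in INSTRUCTION_FILES):
                hits.append(rel)
    return sorted(hits)
```

&lt;p&gt;Nested copies are included on purpose, since an instruction file can sit in any subdirectory of a cloned branch.&lt;/p&gt;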

&lt;p&gt;And if you want a policy layer that enforces these principles automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-intent-guard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Full Feature List
&lt;/h2&gt;

&lt;p&gt;The vulnerability mapping above covers the highlights. For the complete list of everything IntentGuard ships today — including advisory mode, interactive and webhook approvals, break-glass controls, hot-reload, policy validation, starter templates, decision caching, and audit-ready decision metadata — see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/temp-noob/intent-guard/blob/main/features.md" rel="noopener noreferrer"&gt;features.md&lt;/a&gt;&lt;/strong&gt; — all security features, organized by category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/temp-noob/intent-guard/blob/main/feature-example.md" rel="noopener noreferrer"&gt;feature-example.md&lt;/a&gt;&lt;/strong&gt; — copy/paste policy YAML for every feature, plus full end-to-end examples&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Feedback
&lt;/h2&gt;

&lt;p&gt;I'm building this in the open and want to hear what's working, what's missing, and what's wrong. If you have critiques, ideas, or bug reports, &lt;a href="https://github.com/temp-noob/intent-guard/issues" rel="noopener noreferrer"&gt;open an issue on GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.straiker.ai/blog/claude-code-source-leak-with-great-agency-comes-great-responsibility" rel="noopener noreferrer"&gt;Claude Code Source Leak Analysis&lt;/a&gt; — Straiker AI, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2601.17549" rel="noopener noreferrer"&gt;Breaking the Protocol: Security Analysis of MCP&lt;/a&gt; — Maloyan &amp;amp; Namiot, arXiv:2601.17549, January 2026&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/temp-noob/intent-guard" rel="noopener noreferrer"&gt;IntentGuard on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;IntentGuard is built by &lt;a href="https://github.com/temp-noob" rel="noopener noreferrer"&gt;temp-noob&lt;/a&gt;. If this helps secure your AI agent deployments or workflows, give it a ⭐ on GitHub and share it with your team.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
