<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Justin Kwon</title>
    <description>The latest articles on DEV Community by Justin Kwon (@ju571nk).</description>
    <link>https://dev.to/ju571nk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948463%2Fc6282acb-0624-4327-b97c-66033332afae.png</url>
      <title>DEV Community: Justin Kwon</title>
      <link>https://dev.to/ju571nk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ju571nk"/>
    <language>en</language>
    <item>
      <title>The real attack surface for AI coding agents is the config file</title>
      <dc:creator>Justin Kwon</dc:creator>
      <pubDate>Sun, 24 May 2026 07:32:00 +0000</pubDate>
      <link>https://dev.to/ju571nk/the-real-attack-surface-for-ai-coding-agents-is-the-config-file-1ma2</link>
      <guid>https://dev.to/ju571nk/the-real-attack-surface-for-ai-coding-agents-is-the-config-file-1ma2</guid>
      <description>&lt;p&gt;If you think the security risk of AI coding agents (Claude Code, Cursor, Gemini CLI) is "the model goes rogue and runs a dangerous command," the serious incidents from the past few months tell a different story. None of them were really about the model. The starting point was always a config file.&lt;/p&gt;

&lt;p&gt;This post walks through TrustFall and AWS Kiro, explains why config files became the attack surface, and introduces the open-source tool I built in response, Sigil.&lt;/p&gt;

&lt;h2&gt;
  
  
  TrustFall: clone, open, RCE
&lt;/h2&gt;

&lt;p&gt;In May 2026, Adversa AI published TrustFall: cloning a malicious repository and opening it was enough for one-click RCE across Claude Code, Cursor, Gemini CLI, and GitHub Copilot.&lt;/p&gt;

&lt;p&gt;The setup is two files in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.mcp.json&lt;/code&gt; pointing at an attacker-controlled MCP server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.claude/settings.json&lt;/code&gt; with project-scoped settings like &lt;code&gt;enableAllProjectMcpServers&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
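
&lt;p&gt;To make the two-file setup concrete, here is a minimal Python sketch of a check you could run on a fresh clone before opening it in an agent. It is an illustration only, not taken from the TrustFall write-up or from Sigil. The function names &lt;code&gt;load_json&lt;/code&gt; and &lt;code&gt;inspect_repo&lt;/code&gt; are made up, and the sketch assumes that &lt;code&gt;.mcp.json&lt;/code&gt; keeps its servers under a top-level &lt;code&gt;mcpServers&lt;/code&gt; object and that both files are plain JSON.&lt;/p&gt;

```python
import json
from pathlib import Path


def load_json(path):
    """Return parsed JSON from path, or None if missing or malformed."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def inspect_repo(repo_root):
    """List findings worth reading before trusting a freshly cloned repo."""
    root = Path(repo_root)
    findings = []

    # Assumption: .mcp.json holds a top-level "mcpServers" object.
    mcp = load_json(root / ".mcp.json")
    if isinstance(mcp, dict):
        servers = mcp.get("mcpServers")
        if isinstance(servers, dict):
            for name in sorted(servers):
                findings.append("project MCP server declared: " + name)

    # The project-scoped setting named in the list above.
    settings = load_json(root / ".claude" / "settings.json")
    if isinstance(settings, dict):
        if settings.get("enableAllProjectMcpServers") is True:
            findings.append("settings auto-approve all project MCP servers")

    return findings
```

&lt;p&gt;An empty list only means these two signals were not found. It does not mean the repository is safe to open.&lt;/p&gt;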

&lt;p&gt;When the user opens the repo and presses Enter on the "do you trust this folder?" dialog, the attacker's MCP server starts. From there it can read other projects' source and stored credentials, or open a long-lived outbound connection. On a headless CI runner the trust dialog never appears, so the attack lands with no human in the loop.&lt;/p&gt;

&lt;p&gt;And this isn't a one-off. Check Point Research reported the same class of problem as "project config is processed before the trust prompt": CVE-2025-59536 (RCE through &lt;code&gt;.claude/&lt;/code&gt; hooks or MCP server settings) and CVE-2026-21852 (API key exfiltration by abusing &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;). Both fire on clone-and-open, before you confirm the trust dialog.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Kiro: rewriting the config after the fact
&lt;/h2&gt;

&lt;p&gt;Where TrustFall ships a malicious config up front, the case of Kiro, AWS's agentic IDE, is about rewriting the config later.&lt;/p&gt;

&lt;p&gt;Johann Rehberger (Embrace The Red) showed that indirect prompt injection could rewrite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kiroAgent.trustedCommands: ["*"]&lt;/code&gt; in &lt;code&gt;.vscode/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.kiro/settings/mcp.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once &lt;code&gt;trustedCommands&lt;/code&gt; contains &lt;code&gt;*&lt;/code&gt;, the agent runs arbitrary commands without confirmation. Instructions injected from a web page or an issue quietly edit a local config file, and that turns into arbitrary command execution. It was fixed in Kiro 0.1.42.&lt;/p&gt;
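
&lt;p&gt;A check for that specific state is small. The Python sketch below is a hedged illustration rather than anything from Kiro or from the original research: &lt;code&gt;wildcard_entries&lt;/code&gt; is a made-up helper, and it assumes the settings file is plain JSON (a &lt;code&gt;.vscode/settings.json&lt;/code&gt; that contains comments would need a more lenient parser).&lt;/p&gt;

```python
import json

TRUST_KEY = "kiroAgent.trustedCommands"


def wildcard_entries(settings_text):
    """Return trustedCommands entries that are a bare wildcard."""
    settings = json.loads(settings_text)
    if not isinstance(settings, dict):
        return []
    entries = settings.get(TRUST_KEY, [])
    if not isinstance(entries, list):
        return []
    # Only an entry that is exactly "*" is flagged. Broader notions of
    # "too permissive" (prefix patterns and so on) are left out on purpose.
    return [e for e in entries if isinstance(e, str) and e.strip() == "*"]
```

&lt;p&gt;Calling &lt;code&gt;wildcard_entries&lt;/code&gt; on the text of a settings file and alerting on any non-empty result would catch the rewrite described above, whichever injected instruction made the edit.&lt;/p&gt;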

&lt;h2&gt;
  
  
  The common thread: config files grant the permission
&lt;/h2&gt;

&lt;p&gt;In all of these, the model never "decided" to do something malicious. What got attacked was the configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hooks&lt;/li&gt;
&lt;li&gt;permissions (allow / deny)&lt;/li&gt;
&lt;li&gt;MCP allowlists&lt;/li&gt;
&lt;li&gt;sandbox flags&lt;/li&gt;
&lt;li&gt;trustedCommands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These config files are what decide what an agent is allowed to do. The awkward part is that they take effect when you open the project, not when you read them. The permission is granted before you review anything.&lt;/p&gt;

&lt;p&gt;EDR can see the &lt;code&gt;rm -rf&lt;/code&gt; that ran, but not the config change that authorized it. The place to defend is the config that allowed the command, not the command itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you defend it?
&lt;/h2&gt;

&lt;p&gt;Two practical moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run AI coding agents inside a container or sandbox whenever you can.&lt;/li&gt;
&lt;li&gt;Watch the config files and notice when one turns dangerous.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Doing #2 by hand doesn't last. Eyeballing &lt;code&gt;.claude/settings.json&lt;/code&gt; and &lt;code&gt;.mcp.json&lt;/code&gt; every time they change is the kind of process that quietly stops happening.&lt;/p&gt;
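
&lt;p&gt;The simplest automated version of "notice when a config changes" is snapshot-and-compare. The Python sketch below is a deliberately naive illustration, assuming the two file paths named above. The helper names &lt;code&gt;snapshot&lt;/code&gt; and &lt;code&gt;changed_paths&lt;/code&gt; are hypothetical. It only tells you that a file changed, not whether the change is dangerous.&lt;/p&gt;

```python
import hashlib
from pathlib import Path

# Assumption: these are the only two files being tracked in this sketch.
WATCHED = [".claude/settings.json", ".mcp.json"]


def snapshot(repo_root, watched=WATCHED):
    """Map each watched relative path to its SHA-256, or None if absent."""
    digests = {}
    for rel in watched:
        path = Path(repo_root) / rel
        if path.is_file():
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            digests[rel] = None
    return digests


def changed_paths(before, after):
    """Return watched paths whose digest differs between two snapshots."""
    return sorted(rel for rel in after if before.get(rel) != after[rel])
```

&lt;p&gt;Take one snapshot before an agent session and one after. A file that appears, disappears, or changes content shows up in &lt;code&gt;changed_paths&lt;/code&gt;.&lt;/p&gt;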

&lt;h2&gt;
  
  
  What I built: Sigil
&lt;/h2&gt;

&lt;p&gt;So I built Sigil, a host-side AI Security Posture Management (AI-SPM) agent.&lt;/p&gt;

&lt;p&gt;It watches the config files that decide an agent's permissions (hooks, permissions, MCP allowlists, sandbox flags), scores a config when it turns dangerous, and ships the event to a log or SIEM.&lt;/p&gt;

&lt;p&gt;It doesn't block. It scores and records. It tells you "this config changed and the agent can now do X." Actually stopping the action is left to the agent runtime and your existing controls. Because it measures instead of blocking, a false positive costs a noisy log entry, not an interrupted developer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A normal config with read-only permissions and no hooks scores 0 / low.&lt;/li&gt;
&lt;li&gt;Add a PreToolUse hook with matcher &lt;code&gt;.*&lt;/code&gt; that runs &lt;code&gt;rm -rf $HOME&lt;/code&gt;, and it re-scores 7.5 / critical (no sandbox, overly broad matcher, destructive command in the hook).&lt;/li&gt;
&lt;/ul&gt;
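
&lt;p&gt;To show the idea of additive scoring, here is a toy Python sketch. It is not Sigil's rubric or code. The weights, the band edges, the simplified config shape (&lt;code&gt;sandbox&lt;/code&gt; plus a flat &lt;code&gt;hooks&lt;/code&gt; list), and the &lt;code&gt;score_config&lt;/code&gt; name are all assumptions, chosen only so that the toy reproduces the two results listed above.&lt;/p&gt;

```python
import bisect
import re

# Hypothetical weights and severity bands, NOT Sigil's actual rubric.
WEIGHTS = {"no_sandbox": 2.5, "broad_matcher": 2.5, "destructive_command": 2.5}
BAND_EDGES = [2.5, 5.0, 7.0]
BAND_NAMES = ["low", "medium", "high", "critical"]

# Matchers treated as "matches every tool" in this sketch.
BROAD_MATCHERS = {"", "*", ".*"}
# A very small, illustrative notion of "destructive".
DESTRUCTIVE = re.compile(r"\brm\s+-(rf|fr)\b")


def score_config(config):
    """Return (score, band, findings) for a simplified config dict."""
    hooks = config.get("hooks", [])
    findings = set()
    for hook in hooks:
        if hook.get("matcher", "") in BROAD_MATCHERS:
            findings.add("broad_matcher")
        if DESTRUCTIVE.search(hook.get("command", "")):
            findings.add("destructive_command")
    # A missing sandbox only counts here once something can execute.
    if hooks and not config.get("sandbox", False):
        findings.add("no_sandbox")
    score = sum(WEIGHTS[name] for name in findings)
    band = BAND_NAMES[bisect.bisect_right(BAND_EDGES, score)]
    return score, band, sorted(findings)
```

&lt;p&gt;With these made-up numbers, a config with no hooks scores 0 / low, and a config with &lt;code&gt;sandbox&lt;/code&gt; off and a single hook (matcher &lt;code&gt;.*&lt;/code&gt;, command &lt;code&gt;rm -rf $HOME&lt;/code&gt;) scores 7.5 / critical.&lt;/p&gt;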

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FJu571nK%2Fsigil%2Fmain%2Fdemo%2Faiguard-demo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FJu571nK%2Fsigil%2Fmain%2Fdemo%2Faiguard-demo.gif" alt="Sigil scoring a dangerous config 7.5/critical" width="720" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A single static binary (x86_64 musl, plus macOS arm64 and Windows)&lt;/li&gt;
&lt;li&gt;File watching with tokio and notify, no polling&lt;/li&gt;
&lt;li&gt;One-line install, Apache-2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the record: most of the implementation was vibe-coded with Claude Code. I drove the threat model, the scoring rubric, and the architecture, and let the AI write a lot of the code. Building a tool that watches what coding agents are allowed to do, with a coding agent, was a little funny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;When an AI coding agent gets attacked, the target isn't the model. It's a config file nobody reviewed. TrustFall, Kiro, and CVE-2025-59536 all hit the same spot.&lt;/p&gt;

&lt;p&gt;How are you handling untrusted repository configs today? Sandbox everything, review configs by hand, or just open them and hope?&lt;/p&gt;

&lt;p&gt;Repo, demo, and the config-watching details: &lt;a href="https://github.com/Ju571nK/sigil" rel="noopener noreferrer"&gt;https://github.com/Ju571nK/sigil&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/" rel="noopener noreferrer"&gt;TrustFall (Adversa AI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/" rel="noopener noreferrer"&gt;Caught in the Hook: CVE-2025-59536 / CVE-2026-21852 (Check Point Research)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://embracethered.com/blog/posts/2025/aws-kiro-aribtrary-command-execution-with-indirect-prompt-injection/" rel="noopener noreferrer"&gt;AWS Kiro: Arbitrary Command Execution with Indirect Prompt Injection (Embrace The Red)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
      <category>devsecops</category>
    </item>
  </channel>
</rss>
