<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Justin Kwon</title>
    <description>The latest articles on DEV Community by Justin Kwon (@ju571nk).</description>
    <link>https://dev.to/ju571nk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948463%2Fc6282acb-0624-4327-b97c-66033332afae.png</url>
      <title>DEV Community: Justin Kwon</title>
      <link>https://dev.to/ju571nk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ju571nk"/>
    <language>en</language>
    <item>
      <title>Is that AI coding agent safe by default? Codex / Claude Code / Antigravity across 3 postures</title>
      <dc:creator>Justin Kwon</dc:creator>
      <pubDate>Tue, 16 Jun 2026 13:02:33 +0000</pubDate>
      <link>https://dev.to/ju571nk/is-that-ai-coding-agent-safe-by-default-codex-claude-code-antigravity-across-3-postures-2f1g</link>
      <guid>https://dev.to/ju571nk/is-that-ai-coding-agent-safe-by-default-codex-claude-code-antigravity-across-3-postures-2f1g</guid>
      <description>&lt;p&gt;by &lt;strong&gt;Ju571nK&lt;/strong&gt; · 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This piece explains what each item in the comparison table means and where it can go wrong. It does not rate any specific product.&lt;br&gt;
As of: 2026-06&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When people talk about securing AI coding agents, they usually picture "a proxy watching the traffic." That's one valid approach. But agent security can be scoped broadly or narrowly, and one layer that often gets overlooked is the user-settings / environment / OS layer. Misconfigure it and it becomes a common source of incidents.&lt;/p&gt;

&lt;p&gt;A single score doesn't tell you much. What matters is which posture you're looking from. So the table is split into three: ① out of the box (default), ② locked down by a careful user (hardened), and ③ enforced by an admin (enterprise-managed). Below, each item explains what the mechanism is and where it breaks. "Demonstrated" marks real cases reported in 2025-2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhpysscgqnv56scv98vv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhpysscgqnv56scv98vv.png" alt="AI coding agent security: three tables by posture" width="800" height="1181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Premise:&lt;/strong&gt; defaults and behavior differ by product and edition, and enterprise (managed) builds can differ from personal ones. So "which product is safe" is rarely the useful question. The useful one is "how is this item set up on this host, and where is it weak?"&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ① Out of the box: default posture
&lt;/h2&gt;

&lt;p&gt;A fresh single-user install with no hardening. The question: is it safe the moment you install it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default execution posture&lt;/strong&gt;&lt;br&gt;
Whether the agent asks for human approval before it runs commands, edits files, or makes external calls. If the default is close to "run without asking," one prompt injection can run commands or leak data right away. But if approval prompts come too often, users flip on full auto-approve and disable the gate themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OS sandbox by default&lt;/strong&gt;&lt;br&gt;
Whether the kernel-level isolation around commands is on out of the box. If it's off by default, you spend most of your time running with no isolation at all. The most common incident isn't a clever exploit; it's a frustrated user turning isolation off to get unblocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network closed by default&lt;/strong&gt;&lt;br&gt;
Whether outbound network is blocked by default in the execution environment. If it's open, an injection has a ready path to send secrets out (exfil) or pull a second-stage payload.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▣ Demonstrated (2025):&lt;/strong&gt; in one agent's leak, the stolen secret was sent to an external webhook (&lt;code&gt;webhook.site&lt;/code&gt;) that was on the default allow-list.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Untrusted-code containment&lt;/strong&gt;&lt;br&gt;
Whether opening someone else's repo keeps its config and instruction files from auto-applying. If the trust gate is weak, a malicious repo's planted config takes effect the moment you open it, so your environment changes without you doing anything (zero-click).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▣ Demonstrated (2025):&lt;/strong&gt; a malicious repo's project-instruction file planted global config (an auto-run MCP setup), creating a backdoor that survived a reinstall. It was first classified as "intended behavior."&lt;/p&gt;
&lt;/blockquote&gt;
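&lt;p&gt;A minimal sketch of the two checks a trust gate needs, assuming a simple two-layer config (user and project). The function names, config keys, and paths are hypothetical, not any agent's real API. Project config is ignored until the folder is trusted, and agent writes that land outside the workspace are refused, which is the path the demonstrated backdoor used to plant global config.&lt;/p&gt;

```python
from pathlib import Path


def effective_config(user_cfg: dict, project_cfg: dict, folder_trusted: bool) -> dict:
    """Merge project settings over user settings only after the folder is trusted.

    Until then, keys from the cloned repo (MCP servers, hooks and so on) are
    ignored, so simply opening the folder changes nothing.
    """
    merged = dict(user_cfg)
    if folder_trusted:
        merged.update(project_cfg)
    return merged


def may_write(target: str, workspace: str) -> bool:
    """Allow agent file writes only inside the workspace (hypothetical rule).

    resolve() normalizes '..' segments (and symlinks that exist), so a
    relative escape such as proj/../.agent/mcp_config.json is caught.
    Requires Python 3.9+ for Path.is_relative_to.
    """
    return Path(target).resolve().is_relative_to(Path(workspace).resolve())
```

&lt;p&gt;The write check is only as strong as the layer that enforces it: if a shell tool can still write anywhere, it turns into a tool-scoped filter with the same weakness described in the next item.&lt;/p&gt;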

&lt;p&gt;&lt;strong&gt;Sensitive-path protection&lt;/strong&gt;&lt;br&gt;
Whether default rules keep secret files like &lt;code&gt;.env&lt;/code&gt; and keys unreadable. If the protection is only a tool-scoped filter, the model just reads the file through the shell instead.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▣ Demonstrated (2025):&lt;/strong&gt; when file reads were blocked at the tool layer, the model ran &lt;code&gt;cat .env&lt;/code&gt; through the shell instead to exfiltrate AWS keys (a subprocess bypass).&lt;/p&gt;
&lt;/blockquote&gt;
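&lt;p&gt;A toy illustration of why a tool-scoped filter is not a boundary, assuming a hypothetical agent with a &lt;code&gt;read_file&lt;/code&gt; tool and a &lt;code&gt;shell&lt;/code&gt; tool. The protected-path check runs only when the request arrives as &lt;code&gt;read_file&lt;/code&gt;, so the same secret stays reachable through &lt;code&gt;shell&lt;/code&gt;.&lt;/p&gt;

```python
import fnmatch
import os

# Hypothetical protected-file patterns, for illustration only.
PROTECTED = [".env", ".env.*", "*.pem", "id_rsa"]


def is_protected(path: str) -> bool:
    """Match the file name against the protected patterns."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in PROTECTED)


def tool_allowed(tool: str, args: dict) -> bool:
    """Tool-layer policy: only read_file consults the protected list."""
    if tool == "read_file":
        return not is_protected(args["path"])
    # Every other tool, including shell, passes through unchecked, so
    # "cat .env" reads the very file read_file refused.
    return True
```

&lt;p&gt;Enforcing the rule where every tool ends up, in the OS sandbox or plain file permissions, closes this gap. Copying the same check into the shell tool only moves it, since the shell can reach the file by other names.&lt;/p&gt;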




&lt;h2&gt;
  
  
  ② Locked down by a careful user: hardening ceiling
&lt;/h2&gt;

&lt;p&gt;How far a careful user can push the defenses through settings. The question: if you lock it down properly, what still breaks?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strict OS sandbox&lt;/strong&gt;&lt;br&gt;
Whether you can bind kernel isolation tightly: read-only, workspace-only. Even when it's on, a wide write scope, a bypass flag, or a fail-open fallback that runs commands unsandboxed when the sandbox fails to initialize weakens it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▣ Demonstrated (2026):&lt;/strong&gt; one agent had an RCE that escaped even its strictest "Secure Mode" (later patched).&lt;/p&gt;
&lt;/blockquote&gt;
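&lt;p&gt;The fail-open fallback fits in a few lines. A minimal sketch, assuming a hypothetical &lt;code&gt;init_sandbox&lt;/code&gt; callable that raises when the isolation backend cannot start: with &lt;code&gt;fail_open=True&lt;/code&gt; the session quietly drops to no isolation, while the default refuses to run.&lt;/p&gt;

```python
from typing import Callable


class SandboxInitError(RuntimeError):
    """Raised when the (hypothetical) sandbox backend cannot start."""


def start_session(init_sandbox: Callable[[], None], fail_open: bool = False) -> str:
    """Return the isolation level a session will actually run with.

    fail_open=True mimics the weak pattern: a startup error silently
    downgrades the session to no isolation. The default fails closed and
    lets the error propagate, so nothing runs unsandboxed.
    """
    try:
        init_sandbox()
    except SandboxInitError:
        if fail_open:
            return "unsandboxed"  # nothing tells the user isolation is gone
        raise
    return "sandboxed"
```

&lt;p&gt;Failing closed costs a blocked session; failing open costs the sandbox, usually without anyone noticing.&lt;/p&gt;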

&lt;p&gt;&lt;strong&gt;Network domain restriction&lt;/strong&gt;&lt;br&gt;
Whether you can narrow outbound traffic to an allow-list. An allow-list that's too broad (&lt;code&gt;*.example.com&lt;/code&gt;) can be bypassed, and so can one enforced without traffic inspection (abuse of an allowed domain, domain fronting).&lt;/p&gt;
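&lt;p&gt;A sketch of host matching for an egress allow-list, assuming hostnames are compared after lowercasing and stripping a trailing dot; the entries are hypothetical. It shows that a wildcard like &lt;code&gt;*.example.com&lt;/code&gt; admits every subdomain, including ones an attacker may be able to publish to, and why the suffix check has to keep the leading dot.&lt;/p&gt;

```python
# Hypothetical allow-list, for illustration only.
ALLOW = ("api.github.com", "*.example.com")


def host_allowed(host: str, allow: tuple = ALLOW) -> bool:
    """Exact match, or a '*.' wildcard matching any subdomain (not the apex)."""
    host = host.lower().rstrip(".")
    for pattern in allow:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # Keep the leading dot: a plain endswith("example.com") would
            # also accept evilexample.com.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False
```

&lt;p&gt;Name matching only sees the hostname the agent asks for. It cannot tell whether an allowed host relays data onward, or whether a fronted request is routed to a different backend behind the same CDN; catching those takes traffic inspection.&lt;/p&gt;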

&lt;p&gt;&lt;strong&gt;Path/command deny rules&lt;/strong&gt;&lt;br&gt;
Rules that explicitly block specific paths or commands. Compound commands (&lt;code&gt;a &amp;amp;&amp;amp; b&lt;/code&gt;), wrappers (&lt;code&gt;xargs&lt;/code&gt;, &lt;code&gt;npx&lt;/code&gt;, &lt;code&gt;docker exec&lt;/code&gt;), or encoding can reshape the same action into a form that slips past the rules.&lt;/p&gt;
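&lt;p&gt;A sketch of why first-word deny rules miss reshaped commands, using a hypothetical deny list of &lt;code&gt;rm&lt;/code&gt; and &lt;code&gt;curl&lt;/code&gt;. It uses Python's &lt;code&gt;shlex&lt;/code&gt; with &lt;code&gt;punctuation_chars&lt;/code&gt; to split on shell operators, then checks every word of every segment. That catches compound commands and wrappers at the cost of false positives (&lt;code&gt;echo rm&lt;/code&gt; is flagged too). It is an illustration, not a shell parser.&lt;/p&gt;

```python
import shlex

# Hypothetical deny list, for illustration only.
DENIED = {"rm", "curl"}


def naive_denied(command: str) -> bool:
    """Prefix-style rule: looks only at the first word of the whole string."""
    words = command.split()
    return bool(words) and words[0] in DENIED


def segments(command: str) -> list:
    """Split a shell string into command segments on ;, |, || and similar."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    out, current = [], []
    for token in lexer:
        # Operator tokens consist only of punctuation characters.
        if all(ch in lexer.punctuation_chars for ch in token):
            if current:
                out.append(current)
            current = []
        else:
            current.append(token)
    if current:
        out.append(current)
    return out


def denied(command: str) -> bool:
    """Flag the command if any word in any segment names a denied program."""
    return any(
        word.rsplit("/", 1)[-1] in DENIED  # /bin/rm counts as rm
        for words in segments(command)
        for word in words
    )
```

&lt;p&gt;An encoded payload piped into &lt;code&gt;base64 -d | sh&lt;/code&gt; still gets through: once the command is assembled at runtime, no string rule sees it.&lt;/p&gt;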

&lt;p&gt;&lt;strong&gt;Extra blocking hook (defense-in-depth)&lt;/strong&gt;&lt;br&gt;
A user-side hook that intercepts and denies a tool call right before it runs (PreToolUse-style). Regex matching is easy to bypass, the hook can fail open on error, and a broad allow rule can neuter it. Treat it as backup protection, not the main boundary; relying on it as the only block is dangerous.&lt;/p&gt;
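&lt;p&gt;A fail-closed hook sketch, assuming a Claude Code style PreToolUse contract: the event arrives as JSON on stdin with the tool arguments under &lt;code&gt;tool_input&lt;/code&gt;, and exit code 2 blocks the call. Check your agent's hook documentation before relying on either detail. The regex is a deliberately simple placeholder; the point is that an unreadable event blocks instead of allowing.&lt;/p&gt;

```python
import json
import re
import sys

# Placeholder rule; regex deny lists are easy to evade, as noted above.
BLOCK = re.compile(r"\brm\s+-[a-z]*r", re.IGNORECASE)

ALLOW_EXIT, BLOCK_EXIT = 0, 2  # assumed: 0 lets the call run, 2 blocks it


def decide(raw_event: str) -> int:
    """Return the exit code for one tool-call event, failing closed."""
    try:
        event = json.loads(raw_event)
        command = event["tool_input"].get("command", "")
        hit = BLOCK.search(command)
    except (ValueError, KeyError, TypeError, AttributeError):
        return BLOCK_EXIT  # unreadable event: block rather than wave it through
    return BLOCK_EXIT if hit else ALLOW_EXIT


def main() -> None:
    """Entry point for the script registered as the hook command."""
    code = decide(sys.stdin.read())
    if code == BLOCK_EXIT:
        print("blocked by guard hook", file=sys.stderr)
    sys.exit(code)
```

&lt;p&gt;Failing closed turns a broken hook into a visible outage instead of a silent gap. It does nothing about the regex being evadable, so the hook stays a second layer.&lt;/p&gt;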

&lt;p&gt;&lt;strong&gt;Guard tamper-evidence&lt;/strong&gt;&lt;br&gt;
Whether a once-trusted guard script is detected when it later changes (hash-pinning, etc.). Without it, a malicious package or post-install script can quietly swap the hook, the protection is gone from the next run onward, and the user never notices (TOCTOU).&lt;/p&gt;
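&lt;p&gt;A minimal hash-pinning sketch using Python's &lt;code&gt;hashlib&lt;/code&gt;, assuming the SHA-256 of the guard script was recorded when it was first trusted. The hypothetical &lt;code&gt;load_verified&lt;/code&gt; reads the file once and returns the exact bytes it checked, so the caller can run those bytes instead of reopening the file, leaving no window to swap it between check and use.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def pin(path: Path) -> str:
    """Record the SHA-256 of a guard script at the moment it is trusted."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_verified(path: Path, pinned_hex: str) -> bytes:
    """Return the script's bytes only if they still match the pinned hash."""
    data = path.read_bytes()  # read once; check and use the same bytes
    if hashlib.sha256(data).hexdigest() != pinned_hex:
        raise PermissionError(f"{path} changed since it was pinned")
    return data
```

&lt;p&gt;The pin itself has to live somewhere the post-install script cannot write, or the attacker simply updates the hash along with the hook.&lt;/p&gt;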

&lt;p&gt;&lt;strong&gt;Attack surface from extensibility&lt;/strong&gt;&lt;br&gt;
The extension surface: plugins, hooks, MCP. From a foot-gun view, fewer is safer. Extensibility cuts both ways: it is how policy gets delivered, but malicious code can also use it to inject global config or MCP servers, and that path has already been exploited.&lt;/p&gt;




&lt;h2&gt;
  
  
  ③ Enforced by an admin: enterprise-managed
&lt;/h2&gt;

&lt;p&gt;Whether the org can pin rules the user can't disable. The question: what protection still holds when a user makes a mistake?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforced policy file&lt;/strong&gt;&lt;br&gt;
A managed policy file an individual user can't undo. If that file sits in a user-writable location instead of an OS-protected path, editing it bypasses the policy. And shipping the policy without deploying the scripts it references makes it a paper policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overrides user flags&lt;/strong&gt;&lt;br&gt;
Whether the policy neutralizes personal bypass flags like &lt;code&gt;--yolo&lt;/code&gt;. If precedence is wrong and local settings or CLI options override org policy, the enforcement doesn't actually apply.&lt;/p&gt;
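&lt;p&gt;A sketch of the precedence this item asks for, assuming settings come from three layers. The layer names and the &lt;code&gt;bypass_permissions&lt;/code&gt; key are hypothetical. Lookup walks the layers from highest to lowest priority, so whatever the managed policy sets cannot be overridden by a CLI flag; putting the CLI layer first reproduces the bug.&lt;/p&gt;

```python
from typing import Any

SECURE_ORDER = ("managed", "cli", "user")  # highest priority first
BROKEN_ORDER = ("cli", "managed", "user")  # a local flag beats org policy


def resolve(key: str, layers: dict, order: tuple, default: Any = None) -> Any:
    """Return the value from the highest-priority layer that sets the key."""
    for name in order:
        layer = layers.get(name, {})
        if key in layer:
            return layer[key]
    return default


LAYERS = {
    "managed": {"bypass_permissions": False},  # org policy: never
    "cli": {"bypass_permissions": True},  # user passed a --yolo style flag
    "user": {},
}
```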

&lt;p&gt;&lt;strong&gt;Enforce MCP/plugin allow-list&lt;/strong&gt;&lt;br&gt;
Whether the org pins which MCP servers and plugin sources are allowed. Without it, a user can add arbitrary servers or marketplaces, leaving the supply-chain surface wide open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforce version / managed hooks&lt;/strong&gt;&lt;br&gt;
Whether you can require a minimum version and "managed hooks only." An unmanaged personal machine may have no enforcement layer at all, so you have to confirm and measure it separately: is this host managed?&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most incidents don't happen because a mechanism was missing. They happen when an existing one gets turned off, opened too wide, bypassed, or quietly unset by a person.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Watching the traffic answers one question. The one that decides your actual exposure is "how is each item on this host set up right now, and where is it weak?" And that depends on the posture: how conservative the default is, how high the hardening ceiling goes, whether the org can enforce anything.&lt;/p&gt;




&lt;p&gt;▣ &lt;strong&gt;Sources for the demonstrated cases:&lt;/strong&gt; the "Demonstrated" items come from 2025-2026 public reporting. All were reported or demonstrated on Google Antigravity. They're cited as examples of real-world failures, not as a product ranking, and behavior varies by product, version, and edition.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PromptArmor / TechRadar (prompt injection leaking cloud credentials; &lt;code&gt;.env&lt;/code&gt; protection bypassed via the terminal): &lt;a href="https://www.techradar.com/pro/googles-ai-powered-antigravity-ide-already-has-some-worrying-security-issues" rel="noopener noreferrer"&gt;https://www.techradar.com/pro/googles-ai-powered-antigravity-ide-already-has-some-worrying-security-issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Mindgard "Forced Descent" (a backdoor planted via &lt;code&gt;mcp_config.json&lt;/code&gt; that survives reinstall): &lt;a href="https://mindgard.ai/blog/google-antigravity-persistent-code-execution-vulnerability" rel="noopener noreferrer"&gt;https://mindgard.ai/blog/google-antigravity-persistent-code-execution-vulnerability&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pillar Security (&lt;code&gt;find_by_name&lt;/code&gt; → &lt;code&gt;fd -X&lt;/code&gt; Secure-Mode-bypass RCE; reported Jan 2026, fixed Feb): &lt;a href="https://www.pillar.security/blog/prompt-injection-leads-to-rce-and-sandbox-escape-in-antigravity" rel="noopener noreferrer"&gt;https://www.pillar.security/blog/prompt-injection-leads-to-rce-and-sandbox-escape-in-antigravity&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>"It's not a bug, it's spec": a zero-click RCE in AI coding agents that three vendors won't patch</title>
      <dc:creator>Justin Kwon</dc:creator>
      <pubDate>Tue, 02 Jun 2026 12:23:25 +0000</pubDate>
      <link>https://dev.to/ju571nk/its-not-a-bug-its-spec-a-zero-click-rce-in-ai-coding-agents-that-three-vendors-wont-patch-32o1</link>
      <guid>https://dev.to/ju571nk/its-not-a-bug-its-spec-a-zero-click-rce-in-ai-coding-agents-that-three-vendors-wont-patch-32o1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — A prompt injection can rewrite your AI IDE's &lt;code&gt;mcp.json&lt;/code&gt; the moment you open a project, with no dialog and no click, and get arbitrary code execution. It's one of 12+ CVEs in the same class. The root cause lives in the official MCP SDK — and Anthropic, Google, and Microsoft have declined to issue CVEs for their own tools, on the grounds that rewriting the file "requires explicit user permission." In practice, that permission is usually "you installed the IDE."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a &lt;a href="https://dev.to/ju571nk/the-real-attack-surface-for-ai-coding-agents-is-the-config-file-1ma2"&gt;previous post&lt;/a&gt; I argued that the real attack surface for AI coding agents isn't "the model goes rogue" — it's the config file. At the time, the worst case (TrustFall) still needed a human: clone a malicious repo, open it, press Enter on a trust dialog.&lt;/p&gt;

&lt;p&gt;CVE-2026-30615 removes the Enter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The zero-click chain
&lt;/h2&gt;

&lt;p&gt;Disclosed by OX Security on 2026-04-15. On Windsurf IDE 1.9544.26, the chain is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The attacker prepares HTML content the IDE will render — a malicious web page, a poisoned repo README, a tampered tool description.&lt;/li&gt;
&lt;li&gt;An injected instruction silently overwrites the local &lt;code&gt;mcp.json&lt;/code&gt; and registers an attacker-controlled STDIO server.&lt;/li&gt;
&lt;li&gt;The MCP SDK re-reads the config and launches the registered binary.&lt;/li&gt;
&lt;li&gt;Arbitrary command execution. CVSS 8 / High.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No approval dialog. No confirmation step. Among the IDEs OX tested, &lt;strong&gt;Windsurf was the only true zero-click&lt;/strong&gt; — Cursor, Claude Code, and Gemini CLI each required at least one user action.&lt;/p&gt;

&lt;p&gt;Codeium (Windsurf) shipped a patch. That's the part everyone agrees on. The argument starts with everyone else.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is a class, not a bug
&lt;/h2&gt;

&lt;p&gt;The same disclosure groups 12+ CVEs under one pattern — RCE via MCP STDIO:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LangFlow (CVE unassigned)&lt;/li&gt;
&lt;li&gt;GPT Researcher (CVE-2025-65720)&lt;/li&gt;
&lt;li&gt;LiteLLM (CVE-2026-30623)&lt;/li&gt;
&lt;li&gt;Agent Zero (CVE-2026-30624)&lt;/li&gt;
&lt;li&gt;Windsurf (CVE-2026-30615)&lt;/li&gt;
&lt;li&gt;DocsGPT (CVE-2026-26015)&lt;/li&gt;
&lt;li&gt;Flowise, Upsonic, Bisheng, Jaaz, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shared root cause: the official MCP SDK passes user-controllable config values into &lt;code&gt;StdioServerParameters&lt;/code&gt; without sanitization, and that flows straight into spawning a subprocess. OX filed this root cause under a category I haven't seen on a vuln report before — &lt;strong&gt;"Won't Be Patched"&lt;/strong&gt; — because Anthropic's position is that this is spec-conformant behavior, not a defect to fix at the protocol level.&lt;/p&gt;

&lt;p&gt;There's a known operational mitigation: allowlist the STDIO &lt;code&gt;command&lt;/code&gt; value to known launchers, e.g. &lt;code&gt;{npx, uvx, python, python3, node, docker, deno}&lt;/code&gt;. That closes the "point at any binary you like" path. But it's something each downstream implementation has to add itself. It is not the SDK's default.&lt;/p&gt;
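&lt;p&gt;A sketch of that allow-list as a pre-launch check, assuming the common &lt;code&gt;mcpServers&lt;/code&gt; layout where a STDIO server has a &lt;code&gt;command&lt;/code&gt; and &lt;code&gt;args&lt;/code&gt;. The function name is hypothetical. It matches the bare launcher name exactly, so a path such as &lt;code&gt;/tmp/x/node&lt;/code&gt; is rejected rather than trusted by its basename.&lt;/p&gt;

```python
# Launchers from the allow-list above; adjust for your environment.
ALLOWED_LAUNCHERS = {"npx", "uvx", "python", "python3", "node", "docker", "deno"}


def rejected_servers(mcp_config: dict) -> list:
    """Names of STDIO servers whose command is not an allowed launcher."""
    rejected = []
    for name, server in mcp_config.get("mcpServers", {}).items():
        command = server.get("command")
        if command is None:
            continue  # no command: not a STDIO entry (e.g. a URL-based server)
        # Exact match on the bare name, so /tmp/x/node is not trusted as "node".
        if command not in ALLOWED_LAUNCHERS:
            rejected.append(name)
    return rejected
```

&lt;p&gt;This narrows the binary, not the behavior: &lt;code&gt;npx&lt;/code&gt; with an attacker's package name, or &lt;code&gt;python -c&lt;/code&gt;, still runs attacker code, so the &lt;code&gt;args&lt;/code&gt; need review too.&lt;/p&gt;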

&lt;h2&gt;
  
  
  The review surface keeps shrinking
&lt;/h2&gt;

&lt;p&gt;Line up three incidents and the trend is hard to miss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Act 1 — TrustFall:&lt;/strong&gt; the config file is malicious from the start. Clone, open, press Enter on the trust dialog. At least the dialog appears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act 2 — AWS Kiro:&lt;/strong&gt; an indirect prompt injection writes &lt;code&gt;trustedCommands: ["*"]&lt;/code&gt;. The config changes &lt;em&gt;after&lt;/em&gt; you've reviewed it, so you miss the moment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act 3 — Windsurf zero-click:&lt;/strong&gt; opening HTML silently rewrites &lt;code&gt;mcp.json&lt;/code&gt;. No dialog at all. The fact that a rewrite happened isn't even surfaced in the IDE.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each act shaves off more of the surface where a human could notice something is wrong. By Act 3, the event itself is invisible.&lt;/p&gt;

&lt;h2&gt;
  
  
  So whose problem is it?
&lt;/h2&gt;

&lt;p&gt;Here's where I'd genuinely like the comments.&lt;/p&gt;

&lt;p&gt;Google, Microsoft, and Anthropic have declined to issue CVEs for their own tools in this class. The stated reason is reasonable on its face: modifying these files &lt;strong&gt;requires the user's explicit permission&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But walk through what that permission actually is. The injection rewrites a file inside a workspace the agent already has write access to — access you granted, in bulk, when you opened the project. There is no per-write prompt. So "explicit user permission" collapses into "you ran the IDE." If the threshold for &lt;em&gt;not a vulnerability&lt;/em&gt; is "the user once consented to use the software," almost nothing involving a config file is ever a vulnerability.&lt;/p&gt;

&lt;p&gt;I'm not claiming the vendors are acting in bad faith. A protocol-level fix is genuinely hard, and "spec-conformant" is technically true. But "technically spec" and "not the user's problem" are different claims, and the second one is the one that ends up on the user. When the people who own the protocol decline to treat it as a defect, the risk doesn't disappear — it just moves downstream to whoever's running the agent.&lt;/p&gt;

&lt;p&gt;Which is the actual question: &lt;strong&gt;if the vendor won't fix it and won't even name it, where does the responsibility land?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  If nobody patches, watch the config layer
&lt;/h2&gt;

&lt;p&gt;My answer, for what it's worth, is that this has to be observed somewhere other than the IDE.&lt;/p&gt;

&lt;p&gt;EDR sees the &lt;code&gt;npx&lt;/code&gt; or &lt;code&gt;python&lt;/code&gt; that got spawned. It does not see "a new STDIO server was added to &lt;code&gt;mcp.json&lt;/code&gt;." By the time the subprocess starts, the config change is already seconds in the past. The interesting signal — &lt;em&gt;the permission state changed while you weren't looking&lt;/em&gt; — happens one layer up from where most tooling is watching.&lt;/p&gt;
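&lt;p&gt;A minimal sketch of that config-layer signal, assuming you keep the last-seen copy of &lt;code&gt;mcp.json&lt;/code&gt; and compare it with the current one whenever the file changes. This is an illustration, not Sigil's implementation. Any server that is new, or whose definition changed, is reported as a config event, separate from whatever process it later spawns.&lt;/p&gt;

```python
import json


def changed_servers(before_text: str, after_text: str) -> dict:
    """Return mcpServers entries that are new or different in the new snapshot."""
    before = json.loads(before_text).get("mcpServers", {})
    after = json.loads(after_text).get("mcpServers", {})
    return {name: spec for name, spec in after.items() if before.get(name) != spec}
```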

&lt;p&gt;That's the layer I've been poking at. I've been building a small open-source thing (&lt;a href="https://github.com/Ju571nK/sigil" rel="noopener noreferrer"&gt;Sigil&lt;/a&gt;) that watches agent config files like &lt;code&gt;mcp.json&lt;/code&gt; and &lt;code&gt;.claude/settings.json&lt;/code&gt;, scores the risk, and emits an event to your SIEM — it doesn't block, it just tells you when the permission state changed while your hands were off the keyboard. Across a fleet of machines that shows up as triage-able alerts — the silent change, made visible:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9somgaz8htx9pld36zea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9somgaz8htx9pld36zea.png" alt="sigil-manager fleet dashboard: an Alerts console listing AI Guard risk events across hosts, including Sandbox Disabled on Codex and No Sandbox on Claude Code at CRITICAL" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And because it exposes that same posture as plain MCP, you can also just &lt;em&gt;ask&lt;/em&gt;. Here's Codex doing exactly that — pulling the riskiest host in a fleet and the reasons behind it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk569alhqu3771dyn9pai.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk569alhqu3771dyn9pai.gif" alt="A Codex session calls Sigil's fleet_risk and get_host MCP tools, finds the top-risk host at critical 10.0, and explains the reasons — including an untrusted remote MCP server, no sandbox, and a destructive inline command" width="480" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice one of the flagged reasons: an &lt;strong&gt;untrusted remote MCP server&lt;/strong&gt;. That's the same class of &lt;code&gt;mcpServers&lt;/code&gt; entry CVE-2026-30615 plants — except here it's surfaced as posture, where a human (or another agent) can actually see it after the fact. (The CVE itself is STDIO-command-based; Sigil's STDIO-command scoring is tracked in &lt;a href="https://github.com/Ju571nK/sigil/issues/53" rel="noopener noreferrer"&gt;#53&lt;/a&gt;. The attack surface — the &lt;code&gt;mcpServers&lt;/code&gt; key — is the same.)&lt;/p&gt;

&lt;p&gt;That's deliberately the last thing in this post, not the point of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The point
&lt;/h2&gt;

&lt;p&gt;Act 1 was "plant a malicious config file." Act 3 is "rewrite the config file the instant it's opened, silently." The time and surface a user has to review anything got measurably smaller in between — and the vendors who own the protocol have decided that's spec, not bug.&lt;/p&gt;

&lt;p&gt;The attack surface is the config file. So the thing you watch should be the config file too — its state, and the moment that state changes.&lt;/p&gt;

&lt;p&gt;How does your team handle config from untrusted repos today — sandbox the whole workspace, pin the agent's permissions, or just trust the trust dialog? I'd actually like to know.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://policylayer.com/mcp-incidents/windsurf-zero-click-mcp-rce-cve-2026-30615" rel="noopener noreferrer"&gt;CVE-2026-30615: Windsurf Zero-Click MCP Prompt Injection RCE — PolicyLayer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem/" rel="noopener noreferrer"&gt;MCP Supply Chain Advisory: RCE Vulnerabilities Across the AI Ecosystem — OX Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thehackernews.com/2026/04/anthropic-mcp-design-vulnerability.html" rel="noopener noreferrer"&gt;Anthropic MCP Design Vulnerability Enables RCE, Threatening AI Supply Chain — The Hacker News&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/" rel="noopener noreferrer"&gt;When prompts become shells: RCE vulnerabilities in AI agent frameworks — Microsoft Security Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://idanhabler.medium.com/agentic-ides-under-fire-dissecting-the-real-cves-that-exposed-cursor-windsurf-and-void-bd4fae316777" rel="noopener noreferrer"&gt;Agentic IDEs Under Fire: Dissecting the Real CVEs — Idan Habler&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>mcp</category>
      <category>devsecops</category>
    </item>
    <item>
      <title>The real attack surface for AI coding agents is the config file</title>
      <dc:creator>Justin Kwon</dc:creator>
      <pubDate>Sun, 24 May 2026 07:32:00 +0000</pubDate>
      <link>https://dev.to/ju571nk/the-real-attack-surface-for-ai-coding-agents-is-the-config-file-1ma2</link>
      <guid>https://dev.to/ju571nk/the-real-attack-surface-for-ai-coding-agents-is-the-config-file-1ma2</guid>
      <description>&lt;p&gt;If you think the security risk of AI coding agents (Claude Code, Cursor, Gemini CLI) is "the model goes rogue and runs a dangerous command," the serious incidents from the past few months tell a different story. None of them were really about the model. The starting point was always a config file.&lt;/p&gt;

&lt;p&gt;This post walks through TrustFall and AWS Kiro, explains why config files became the attack surface, and introduces the open-source tool I built in response, Sigil.&lt;/p&gt;

&lt;h2&gt;
  
  
  TrustFall: clone, open, RCE
&lt;/h2&gt;

&lt;p&gt;In May 2026, Adversa AI published TrustFall: cloning a malicious repository and opening it was enough for one-click RCE across Claude Code, Cursor, Gemini CLI, and GitHub Copilot.&lt;/p&gt;

&lt;p&gt;The setup is two files in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.mcp.json&lt;/code&gt; pointing at an attacker-controlled MCP server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.claude/settings.json&lt;/code&gt; with project-scoped settings like &lt;code&gt;enableAllProjectMcpServers&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the user opens the repo and presses Enter on the "do you trust this folder?" dialog, the attacker's MCP server starts. From there it can read other projects' source and stored credentials, or open a long-lived outbound connection. On a headless CI runner the trust dialog never appears, so it lands with no human in the loop.&lt;/p&gt;

&lt;p&gt;And this isn't a one-off. Check Point Research reported the same class of problem as "project config is processed before the trust prompt": CVE-2025-59536 (RCE through &lt;code&gt;.claude/&lt;/code&gt; hooks or MCP server settings) and CVE-2026-21852 (API key exfiltration by abusing &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;). Both fire on clone-and-open, before you confirm the trust dialog.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Kiro: rewriting the config after the fact
&lt;/h2&gt;

&lt;p&gt;TrustFall ships a malicious config up front. The case of Kiro, AWS's agentic IDE, is about rewriting the config later.&lt;/p&gt;

&lt;p&gt;Johann Rehberger (Embrace The Red) showed that indirect prompt injection could rewrite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kiroAgent.trustedCommands: ["*"]&lt;/code&gt; in &lt;code&gt;.vscode/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.kiro/settings/mcp.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once &lt;code&gt;trustedCommands&lt;/code&gt; contains &lt;code&gt;*&lt;/code&gt;, the agent runs arbitrary commands without confirmation. Instructions injected from a web page or an issue quietly edit a local config file, and that turns into arbitrary command execution. It was fixed in Kiro 0.1.42.&lt;/p&gt;
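&lt;p&gt;A sketch of why a single &lt;code&gt;*&lt;/code&gt; is enough, assuming trusted-command entries are matched as shell-style globs. That matching rule is an assumption for illustration, not Kiro's documented behavior. Under glob matching, &lt;code&gt;*&lt;/code&gt; approves every string, and even a trailing wildcard on a harmless prefix approves anything chained after it.&lt;/p&gt;

```python
from fnmatch import fnmatchcase


def auto_approved(command: str, trusted_commands: list) -> bool:
    """True if the command matches any trusted pattern (glob semantics assumed)."""
    return any(fnmatchcase(command, pattern) for pattern in trusted_commands)
```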

&lt;h2&gt;
  
  
  The common thread: config files grant the permission
&lt;/h2&gt;

&lt;p&gt;In all of these, the model never "decided" to do something malicious. What got attacked was the configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hooks&lt;/li&gt;
&lt;li&gt;permissions (allow / deny)&lt;/li&gt;
&lt;li&gt;MCP allowlists&lt;/li&gt;
&lt;li&gt;sandbox flags&lt;/li&gt;
&lt;li&gt;trustedCommands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These config files are what decide what an agent is allowed to do. The awkward part is that they take effect when you open the project, not when you read them. The permission is granted before you review anything.&lt;/p&gt;

&lt;p&gt;EDR can see the &lt;code&gt;rm -rf&lt;/code&gt; that ran, but not the config change that authorized it. The place to defend is the config that allowed the command, not the command itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you defend it?
&lt;/h2&gt;

&lt;p&gt;Two practical moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run AI coding agents inside a container or sandbox whenever you can.&lt;/li&gt;
&lt;li&gt;Watch the config files and notice when one turns dangerous.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Doing #2 by hand doesn't last: nobody keeps eyeballing &lt;code&gt;.claude/settings.json&lt;/code&gt; and &lt;code&gt;.mcp.json&lt;/code&gt; every time they change.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built: Sigil
&lt;/h2&gt;

&lt;p&gt;So I built Sigil, a host-side AI Security Posture Management (AI-SPM) agent.&lt;/p&gt;

&lt;p&gt;It watches the config files that decide an agent's permissions (hooks, permissions, MCP allowlists, sandbox flags), scores a config when it turns dangerous, and ships the event to a log or SIEM.&lt;/p&gt;

&lt;p&gt;It doesn't block. It scores and records. It tells you "this config changed and the agent can now do X." Actually stopping the action is left to the agent runtime and your existing controls. Because it measures instead of blocking, a false positive costs an alert, not a blocked developer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A normal config with read-only permissions and no hooks scores 0 / low.&lt;/li&gt;
&lt;li&gt;Add a PreToolUse hook with matcher &lt;code&gt;.*&lt;/code&gt; that runs &lt;code&gt;rm -rf $HOME&lt;/code&gt;, and it re-scores 7.5 / critical (no sandbox, overly broad matcher, destructive command in the hook).&lt;/li&gt;
&lt;/ul&gt;
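&lt;p&gt;A toy version of that kind of rubric, assuming a Claude Code style &lt;code&gt;settings.json&lt;/code&gt; with &lt;code&gt;hooks.PreToolUse&lt;/code&gt; entries and a &lt;code&gt;sandbox.enabled&lt;/code&gt; flag. The weights, thresholds, and the rule that a missing sandbox only counts when something dangerous is configured are made up for illustration; they are not Sigil's scoring model and are not meant to reproduce the 7.5 above.&lt;/p&gt;

```python
import re

# Hypothetical weights and patterns, for illustration only.
WEIGHTS = {"broad_matcher": 2.0, "destructive_hook": 4.0, "no_sandbox": 2.0}
DESTRUCTIVE = re.compile(r"\brm\s+-[a-z]*r")
BROAD_MATCHERS = {".*", "*", ""}


def findings(settings: dict) -> set:
    """Collect risk findings from PreToolUse hooks and the sandbox flag."""
    found = set()
    for entry in settings.get("hooks", {}).get("PreToolUse", []):
        if entry.get("matcher", "") in BROAD_MATCHERS:
            found.add("broad_matcher")
        for hook in entry.get("hooks", []):
            if DESTRUCTIVE.search(hook.get("command", "")):
                found.add("destructive_hook")
    # Toy rule: a missing sandbox only adds risk on top of another finding.
    if found and not settings.get("sandbox", {}).get("enabled", False):
        found.add("no_sandbox")
    return found


def score(settings: dict) -> float:
    """Sum the finding weights, capped at 10."""
    total = sum(WEIGHTS[f] for f in findings(settings))
    return min(10.0, float(total))


def severity(value: float) -> str:
    """Map a score to a label using made-up thresholds."""
    if value >= 7.0:
        return "critical"
    if value >= 4.0:
        return "high"
    if value > 0.0:
        return "medium"
    return "low"
```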

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8xkw573m5j2h2ei906ce.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8xkw573m5j2h2ei906ce.gif" alt="Sigil scoring a dangerous config 7.5/critical" width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A single static binary (x86_64 musl, plus macOS arm64 and Windows)&lt;/li&gt;
&lt;li&gt;File watching with tokio and notify, no polling&lt;/li&gt;
&lt;li&gt;One-line install, Apache-2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the record: most of the implementation was vibe-coded with Claude Code. I drove the threat model, the scoring rubric, and the architecture, and let the AI write a lot of the code. Building a tool that watches what coding agents are allowed to do, with a coding agent, was a little funny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;When an AI coding agent gets attacked, the target isn't the model. It's a config file nobody reviewed. TrustFall, Kiro, and CVE-2025-59536 all hit the same spot.&lt;/p&gt;

&lt;p&gt;How are you handling untrusted repository configs today? Sandbox everything, review configs by hand, or just open them and hope?&lt;/p&gt;

&lt;p&gt;Repo, demo, and the config-watching details: &lt;a href="https://github.com/Ju571nK/sigil" rel="noopener noreferrer"&gt;https://github.com/Ju571nK/sigil&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/" rel="noopener noreferrer"&gt;TrustFall (Adversa AI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/" rel="noopener noreferrer"&gt;Caught in the Hook: CVE-2025-59536 / CVE-2026-21852 (Check Point Research)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://embracethered.com/blog/posts/2025/aws-kiro-aribtrary-command-execution-with-indirect-prompt-injection/" rel="noopener noreferrer"&gt;AWS Kiro: Arbitrary Command Execution with Indirect Prompt Injection (Embrace The Red)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
      <category>devsecops</category>
    </item>
  </channel>
</rss>
