If you think the security risk of AI coding agents (Claude Code, Cursor, Gemini CLI) is "the model goes rogue and runs a dangerous command," the serious incidents from the past few months tell a different story. None of them were really about the model. The starting point was always a config file.
This post walks through TrustFall and AWS Kiro, explains why config files became the attack surface, and introduces the open-source tool I built in response, Sigil.
TrustFall: clone, open, RCE
In May 2026, Adversa AI published TrustFall: cloning a malicious repository and opening it was enough for one-click RCE across Claude Code, Cursor, Gemini CLI, and GitHub Copilot.
The setup is two files in the repo:
- .mcp.json pointing at an attacker-controlled MCP server
- .claude/settings.json with project-scoped settings like enableAllProjectMcpServers
When the user opens the repo and presses Enter on the "do you trust this folder?" dialog, the attacker's MCP server starts. From there it can read other projects' source and stored credentials, or open a long-lived outbound connection. On a headless CI runner the trust dialog never appears, so it lands with no human in the loop.
And this isn't a one-off. Check Point Research reported the same class of problem as "project config is processed before the trust prompt": CVE-2025-59536 (RCE through .claude/ hooks or MCP server settings) and CVE-2026-21852 (API key exfiltration by abusing ANTHROPIC_BASE_URL). Both fire on clone-and-open, before you confirm the trust dialog.
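The two-file setup can be checked for before a clone is ever opened in an agent. The sketch below is a minimal pre-open check, not a complete defense. It assumes project MCP servers are declared under a top-level mcpServers key in .mcp.json; precheck_repo and load_json are hypothetical helper names used only for illustration.

```python
import json
from pathlib import Path


def load_json(path: Path) -> dict:
    """Return the parsed JSON object at path, or {} if missing, invalid, or not an object."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def precheck_repo(repo: Path) -> list[str]:
    """List reasons this clone deserves a manual look before an agent opens it."""
    findings = []

    # Assumption: project MCP servers live under a top-level "mcpServers" key.
    servers = load_json(repo / ".mcp.json").get("mcpServers", {})
    if isinstance(servers, dict):
        for name in sorted(servers):
            findings.append(f".mcp.json declares MCP server '{name}'")

    # The project-scoped setting named in the TrustFall write-up.
    settings = load_json(repo / ".claude" / "settings.json")
    if settings.get("enableAllProjectMcpServers") is True:
        findings.append(".claude/settings.json sets enableAllProjectMcpServers")

    return findings
```

Running precheck_repo(Path("path/to/clone")) on a fresh clone and reading the findings before launching the agent is the intended use. An empty list only means these two patterns were not found, not that the repository is safe.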
AWS Kiro: rewriting the config after the fact
TrustFall ships a malicious config up front. The attack on Kiro, AWS's agentic IDE, rewrites the config later.
Johann Rehberger (Embrace The Red) showed that indirect prompt injection could rewrite:
- kiroAgent.trustedCommands: ["*"] in .vscode/settings.json
- .kiro/settings/mcp.json
Once trustedCommands contains *, the agent runs arbitrary commands without confirmation. Instructions injected from a web page or an issue quietly edit a local config file, and that turns into arbitrary command execution. It was fixed in Kiro 0.1.42.
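Because this attack is a change to an existing file, the useful signal is the transition from a narrow allowlist to a wildcard. The following is a minimal sketch of that comparison, assuming the setting is stored as the flat key kiroAgent.trustedCommands shown above and that the file is strict JSON (editor settings files may contain comments, which the stdlib parser rejects with a ValueError). The function names are hypothetical.

```python
import json

KEY = "kiroAgent.trustedCommands"


def has_wildcard_trust(settings_text: str) -> bool:
    """True if the settings JSON lists "*" under kiroAgent.trustedCommands."""
    settings = json.loads(settings_text)  # raises ValueError on invalid JSON
    if not isinstance(settings, dict):
        return False
    trusted = settings.get(KEY, [])
    return isinstance(trusted, list) and "*" in trusted


def newly_wildcarded(before_text: str, after_text: str) -> bool:
    """True only when the wildcard appears in the new version but not the old one."""
    return not has_wildcard_trust(before_text) and has_wildcard_trust(after_text)
```

Comparing the previous and current contents, rather than only the current one, separates "someone just rewrote this" from "this was already permissive".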
The common thread: config files grant the permission
In all of these, the model never "decided" to do something malicious. What got attacked was the configuration:
- hooks
- permissions (allow / deny)
- MCP allowlists
- sandbox flags
- trustedCommands
These config files are what decide what an agent is allowed to do. The awkward part is that they take effect when you open the project, not when you read them. The permission is granted before you review anything.
EDR can see the rm -rf that ran, but not the config change that authorized it. The place to defend is the config that allowed the command, not the command itself.
How do you defend it?
Two practical moves:
- Run AI coding agents inside a container or sandbox whenever you can.
- Watch the config files and notice when one turns dangerous.
Doing the second one by hand doesn't last. Eyeballing .claude/settings.json and .mcp.json every time they change is a process that breaks down.
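To make "notice when a config changes" concrete, here is a minimal sketch that fingerprints the config paths mentioned in this post and reports which ones differ between two snapshots. It uses only the Python standard library and compares snapshots on demand; it is not how Sigil works internally, and snapshot and changed are hypothetical names.

```python
import hashlib
from pathlib import Path

# Config paths named in this post; a real deployment would likely track more.
WATCHED = (
    ".mcp.json",
    ".claude/settings.json",
    ".vscode/settings.json",
    ".kiro/settings/mcp.json",
)


def snapshot(repo: Path) -> dict[str, str]:
    """Map each watched file that exists to the SHA-256 of its contents."""
    digests = {}
    for rel in WATCHED:
        path = repo / rel
        if path.is_file():
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Watched paths that were added, removed, or modified between snapshots."""
    return sorted(
        rel for rel in before.keys() | after.keys()
        if before.get(rel) != after.get(rel)
    )
```

Taking a snapshot before an agent session and another after it shows whether the session touched its own permissions. A change alone is not a verdict; the changed file still has to be evaluated.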
What I built: Sigil
So I built Sigil, a host-side AI Security Posture Management (AI-SPM) agent.
It watches the config files that decide an agent's permissions (hooks, permissions, MCP allowlists, sandbox flags), scores a config when it turns dangerous, and ships the event to a log or SIEM.
It doesn't block. It scores and records. It tells you "this config changed and the agent can now do X." Actually stopping the action is left to the agent runtime and your existing controls. Because it measures instead of blocking, it doesn't get in a developer's way with false positives.
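As an illustration of "score and record", the sketch below builds one JSON line per config change, a shape most log pipelines and SIEMs can ingest. The field names and the build_event helper are assumptions made for this example and are not Sigil's actual event schema.

```python
import json
from datetime import datetime, timezone


def build_event(path: str, score: float, level: str, reasons: list[str]) -> str:
    """Serialize one config-change finding as a single JSON line.

    All field names here are hypothetical, chosen for illustration only.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "path": path,        # which config file changed
        "score": score,      # numeric risk score
        "level": level,      # human-readable band, e.g. "low" or "critical"
        "reasons": reasons,  # why the score is what it is
    }
    return json.dumps(event, sort_keys=True)
```

Nothing in this function blocks anything. It only records what changed and why it matters, and leaves enforcement to the runtime and existing controls.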
Demo
- A normal config with read-only permissions and no hooks scores 0 / low.
- Add a PreToolUse hook with matcher .* that runs rm -rf $HOME, and it re-scores 7.5 / critical (no sandbox, overly broad matcher, destructive command in the hook).
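The three reasons in the demo suggest an additive rubric. Here is a minimal sketch of that idea, not Sigil's actual scoring code. The weights are invented so that the example lands on the same number as the demo, the sandboxed parameter is a hypothetical stand-in for however sandboxing is detected, and the hook layout (a PreToolUse list of entries, each with a matcher and a hooks list of commands) is an assumption about the settings format.

```python
import re

# Invented weights for illustration; the real rubric may weigh these differently.
WEIGHTS = {"no_sandbox": 2.0, "broad_matcher": 2.5, "destructive_command": 3.0}

# Assumption: these matcher values apply the hook to every tool call.
BROAD_MATCHERS = {"", "*", ".*"}

# Deliberately narrow pattern: only catches "rm -rf" / "rm -fr".
DESTRUCTIVE = re.compile(r"\brm\s+-(rf|fr)\b")


def score_settings(settings: dict, sandboxed: bool) -> tuple[float, list[str]]:
    """Return (score, reasons) for the PreToolUse hooks in a parsed settings dict."""
    reasons = set()
    for entry in settings.get("hooks", {}).get("PreToolUse", []):
        commands = [hook.get("command", "") for hook in entry.get("hooks", [])]
        if not commands:
            continue
        if entry.get("matcher", "") in BROAD_MATCHERS:
            reasons.add("broad_matcher")
        if any(DESTRUCTIVE.search(command) for command in commands):
            reasons.add("destructive_command")
    # A missing sandbox only matters once there is something risky to contain.
    if reasons and not sandboxed:
        reasons.add("no_sandbox")
    ordered = sorted(reasons)
    return sum(WEIGHTS[reason] for reason in ordered), ordered
```

Returning the reasons alongside the number matters more than the exact weights: a reviewer can act on "broad matcher plus destructive command" in a way they cannot act on a bare score.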
Tech notes
- A single static binary (x86_64 musl, plus macOS arm64 and Windows)
- File watching with tokio and notify, no polling
- One-line install, Apache-2.0
For the record: most of the implementation was vibe-coded with Claude Code. I drove the threat model, the scoring rubric, and the architecture, and let the AI write a lot of the code. Building a tool that watches what coding agents are allowed to do, with a coding agent, was a little funny.
Closing
When an AI coding agent gets attacked, the target isn't the model. It's a config file nobody reviewed. TrustFall, Kiro, and CVE-2025-59536 all hit the same spot.
How are you handling untrusted repository configs today? Sandbox everything, review configs by hand, or just open them and hope?
Repo, demo, and the config-watching details: https://github.com/Ju571nK/sigil
