Claude code

Posted on Jun 25

Vibe Coding Is Not the Risk — Unreviewed Agent Autonomy Is

#securevibecoding

Secure vibe coding is the practice of maintaining rigorous security controls over AI coding agents — governing what Skills they execute, what files they access, and what data they transmit — so that development velocity does not come at the cost of credential theft, supply chain compromise, or unaudited autonomous behavior.

The Risk Is Not the Developer. It Is the Agent.

The term "vibe coding" has attracted a certain amount of skepticism in security circles, most of it misdirected. The concern is not that developers are coding by instinct or moving fast. The concern is that AI coding agents — Cursor, Claude Code, Kiro, and others — now execute instructions autonomously, and almost nobody is governing what those instructions actually do.

When a developer clones a repository, opens it in their AI coding environment, and the agent begins executing Skills from that repository, something significant has happened: untrusted code has been granted access to the developer's local environment. It can read files. It can run shell commands. It can make network requests. And in the default configuration of every major AI coding platform today, it does all of this with no security review, no policy enforcement, and no audit trail.

That is the attack surface. Not the speed of development. Not the informality of the workflow. The attack surface is the gap between what developers think their agent is doing and what it is actually permitted to do.

Skills Are Executable Code, Not Documentation

Most developers treat Skills — the markdown files found in directories like .cursor/skills/ or .claude/skills/ — as configuration or documentation. They are not. They are executable instructions that the AI agent reads and follows, with no installation prompt, no permission dialog, and no review step between the file existing on disk and the agent acting on it.

A malicious Skill does not need to look malicious. It can instruct the agent to read ~/.ssh/id_rsa as part of an ostensibly helpful "set up your development environment" workflow. It can chain a file read with an API call. It can exfiltrate cloud credentials stored in ~/.aws/credentials or a local .env file by embedding the exfiltration step inside a multi-step tool chain where each individual action appears routine.

This is not a theoretical attack. The mechanics are straightforward, and the opportunity is large. Every developer who clones a public repository containing Skills and opens it in an AI coding environment is a potential target. The repository does not need to contain traditional malware. It needs only a carefully written markdown file.

Why Existing Scanners Miss the Attack

Security teams who recognize this risk sometimes reach for existing scanning tools. The problem is that most scanners are not built for Skill files. They truncate file contents at approximately 3,000 characters — a limit that matters enormously here, because attacks can be embedded deep within a markdown Skill file, well past the point where a truncated scan would terminate.

A Skill file that begins with ten paragraphs of legitimate, helpful instructions and embeds a credential-harvesting directive at line 200 will pass a truncated scan cleanly. The scanner reports no threat. The developer sees a clean result. The agent executes the full file.

This is not an edge case. It is a deliberate evasion technique, and it works against the truncation limits baked into general-purpose scanning tools that were never designed for this file format or this attack vector.

The Two Gaps That Leave Teams Exposed

There are two distinct failure modes in how organizations are currently approaching AI coding agent security, and both leave meaningful gaps.

The first gap is the absence of pre-execution scanning. Teams that have not implemented any Skill scanning before agent execution are fully exposed to supply chain attacks delivered through malicious Skills. Any repository a developer touches can serve as a delivery vehicle.

The second gap is subtler and more dangerous: the assumption that scanning alone is sufficient. A Skill file can pass every scan cleanly and still enable harmful behavior at runtime. An agent that reads ~/.ssh/ because a legitimate-looking Skill asks it to as part of an SSH configuration workflow is behaving exactly as instructed — and the Skill itself contains no malicious content. The harm is in the runtime behavior, not the file content.

Runtime guardrails catch what scanning cannot see: autonomous reads of sensitive file paths, unexpected outbound network requests, command sequences that match known exfiltration patterns, and tool chains that combine innocent individual steps into a harmful aggregate action.

Multi-Step Tool Chains and the Audit Trail Problem

AI coding agents are not just executing single commands. They are orchestrating sequences of tool calls — reading a file, then writing output to another location, then making an API call, then reading another file — in chains that can span dozens of steps. Each individual step, viewed in isolation, may appear entirely routine. The harm emerges from the sequence.

Consider a Skill that instructs an agent to: read the project's .env file to "validate environment variable names," then call an external API to "check documentation," passing environment variable values as query parameters. No individual step is obviously malicious. Together, they constitute credential exfiltration.

Without an audit trail, this sequence is invisible. A developer who never reviews agent logs — and most developers do not, because most AI coding platforms do not surface them in a useful format — has no way to know this happened. There is no alert, no warning, no record. The credentials are gone.

At Enkrypt AI, we built our runtime governance layer specifically to address this. We track what commands agents run, what files they read, what network requests they make, and what data leaves the local environment — and we surface this as structured, reviewable audit output, not raw logs that require interpretation.

Skill Sentinel: Pre-Execution Scanning Built for This File Format

Skill Sentinel is Enkrypt AI's open source scanner designed specifically for AI agent Skill files. It scans the full content of Skill files — not truncated at 3,000 characters — before those Skills execute. It is built to handle the markdown format that Skill files use, understand the instruction patterns that agentic execution follows, and identify directives that pose supply chain risk.

It works with the AI coding platforms your team is already using: Cursor, Claude Code, Kiro, CrewAI, LangGraph, OpenAI SDK, and Vercel AI. The integration is designed to fit into existing development workflows without requiring developers to change how they work — the scan happens before execution, and it happens automatically.

Skill Sentinel catches attacks that general-purpose scanners miss, including content embedded beyond the truncation threshold of competing tools. It provides a clean signal: either the Skill is safe to execute, or it is not, with enough specificity to explain why.

Why You Need Both Layers

Skill Sentinel is necessary. It is not sufficient.

The runtime environment is where agents actually cause harm, and a Skill that scans clean can still instruct an agent to do something dangerous. An agent operating autonomously — reading files, executing commands, making network calls — without runtime policy enforcement is ungoverned regardless of what its Skill files contain.

Runtime guardrails define what agents are permitted to do during execution: which file paths they may read, which network destinations they may reach, which command patterns are allowed, and which sequences trigger an alert or a block. This is policy enforcement at the agent level, applied dynamically as the agent operates.

The combination is what secure vibe coding actually requires:

Pre-execution Skill scanning catches supply chain attacks before they run
Runtime guardrails catch misuse that clean Skills can still enable
Audit trails provide visibility into what agents actually did, regardless of whether an alert fired

Organizations that implement only scanning are exposed to runtime exploitation. Organizations that implement only runtime guardrails are exposed to supply chain attacks that could have been stopped before execution. Neither layer alone closes the gap.

What Secure Vibe Coding Looks Like in Practice

A development team operating under secure vibe coding practices does not slow down their AI-assisted development workflow. They add a pre-execution gate that most developers never see unless it fires. They define runtime policy that governs agent behavior without requiring developers to review every tool call. And they have access to audit output that answers the question "what did the agent actually do?" whenever they need to ask it.

The developer experience is largely unchanged. The security posture is fundamentally different.

For security engineers and engineering leaders, the value is in the visibility and the policy control. For the first time, there is a structured answer to "what are our AI coding agents doing with access to our development environment?" — not a log dump, not a vendor promise, but auditable, policy-governed output that can be reviewed, retained, and acted on.

The Window to Address This Is Now

AI coding agent adoption is accelerating. Every engineering organization that is moving toward AI-assisted development workflows is expanding the attack surface described in this article. The Skills that agents execute will multiply. The repositories that contain them will proliferate. The attack techniques targeting this surface will become more sophisticated.

The cost of implementing two-layer protection now — before an incident — is low. The cost of investigating and recovering from a credential exfiltration event that moved through an AI coding agent, with no audit trail and no forensic record, is high.

At Enkrypt AI, we built the Secure Vibe Coding solution because we saw this gap forming and believed it required a purpose-built response: not adapted general-purpose tooling, but security infrastructure designed specifically for the way AI coding agents work, the file formats they consume, and the attack patterns that target them.

Vibe coding is not the risk. Unreviewed agent autonomy is. And there is a practical, deployable answer to it available today.

DEV Community