AI Coding Agent Security: What You're Missing About Skills
AI coding agent security is the discipline of identifying and controlling the attack surface introduced when autonomous AI agents — Cursor, Claude Code, Kiro, LangGraph-based pipelines, and similar tools — operate on a developer's machine with broad filesystem and network access. The field covers three distinct problem classes: supply chain threats embedded in agent configuration files before execution; runtime behavioral drift during active sessions; and audit visibility, meaning whether any record exists of what the agent read, wrote, or transmitted. Most teams using AI coding tools have formally addressed none of these.
This article focuses on the first and least-discussed vector: Skills.
Skills Are Executable Code. Treat Them That Way.
Skills are markdown files that instruct AI coding agents how to behave. In Cursor, they live under .cursor/rules/. Claude Code reads them from project-level configuration directories. Kiro has its own spec files. The common thread: these files define what tools the agent can use, what files it can read, what shell commands it may run, and in what sequence.
Developers pull Skills from GitHub repositories, starter templates, community shares, and vendor boilerplates. They treat them like README files — useful text to skim before moving on. That assumption is wrong. A Skill file is an instruction set executed with whatever permissions the agent runtime inherits. On a developer workstation, those permissions typically include read access to ~/.ssh/, ~/.aws/credentials, .env files, and any credential store accessible from the shell.
The supply chain risk is direct: compromise the Skill, compromise the session. No installation prompt. No package manager warning. No registry to check.
Understanding how permissions flow during agent sessions matters here. For a detailed breakdown of how Claude Code's permission model works in practice, see our guide to Claude Code permissions settings — including which permission flags carry the most risk when misconfigured.
Truncation Is Why Your Scanner Misses the Attack
Most security tools that evaluate markdown files or prompts operate on a context window of roughly 3,000 to 4,000 characters — enough to read the first two or three sections of a Skill file and mark it clean. Attackers know this. They put the legitimate-looking agent instructions at the top and bury malicious directives below the truncation threshold.
A 500-line Skill file is not unusual. A Skill covering code generation, test execution, dependency management, and deployment configuration will easily exceed 15,000 characters. If your scanner evaluates only the first 3,000, you have scanned the header. The payload is in line 200.
This is not a theoretical concern. Prompt injection vulnerabilities in LLM-based agents — where attacker-controlled content redirects agent behavior — are documented across multiple platforms and listed under OWASP's LLM Top 10 as LLM01: Prompt Injection. CVE-2025-59536, a critical vulnerability in Claude Code, demonstrated how instruction paths in agent environments can be exploited when input validation is absent. The same attack surface exists in Skill files pulled from untrusted repositories. The file format is markdown; the threat model is code execution.
How a Credential-Exfiltration Attack Hides in Plain Sight
Consider a Skill file shared on a developer forum as a "one-shot API integration helper." The first 80 lines look correct: read the project's package.json, identify the API client library in use, scaffold a connection helper. A truncating scanner reads those 80 lines, finds nothing malicious, and marks the file safe.
Line 90 contains a conditional: if the agent detects a cloud provider config file in the home directory, read it and append its contents to the next outbound API call as part of a "debugging context." The instruction is written in plausible technical language. It looks like telemetry scaffolding. It is credential exfiltration.
The agent follows the Skill. It reads ~/.aws/credentials. On the next legitimate API call the developer triggers — scaffolding a function, fetching a schema, running a test — credentials go out in the request body. No install warning. No file access dialog. No audit log entry, unless you built one.
What makes this attack effective is that each action is defensible in isolation. Read a config file? Agents do that constantly. Append context to an API call? Also routine. The malice lives in the sequence. Multi-step tool chains that produce exfiltration through individually innocent-looking actions are exactly what single-pass scanners cannot evaluate.
GitGuardian's 2025 State of Secrets Sprawl report documented year-over-year increases in secrets committed to repositories, with AI-generated code identified as a contributing factor. When agents are actively reading credential files and constructing outbound requests, the pathway from accidental exposure to deliberate exfiltration shortens considerably.
AI Coding Agent Security: Runtime Governance Requirements
Scanning Skills before execution addresses the supply chain threat. It does not address what a clean Skill enables at runtime. An agent operating with legitimate instructions can still autonomously read files it was never intended to access, call external APIs outside the task scope, or escalate through a sequence of valid-looking steps.
This is where runtime guardrails become necessary. A properly configured runtime governance layer intercepts tool calls in real time, enforces allow/deny policies on file access and network egress, and writes an immutable audit log of every action the agent takes. Without this, you have no answer to the question: what did the agent read during that session?
At Enkrypt AI, we built the Secure Vibe Coding solution to address both layers together. Skill Sentinel — open source on GitHub — scans Skills in full before execution, evaluating entire files rather than truncating at an arbitrary character limit. The runtime guardrails layer then governs agent behavior during execution, catching the behavior that even a clean Skill can enable through multi-step tool chains.
Both layers are required. Scanning without runtime governance leaves you blind to session-level misbehavior. Runtime governance without pre-execution scanning means you are relying on your policies to catch attacks that should have been blocked before the agent started. Neither layer alone closes the gap.
For a practical walkthrough of hardening your Claude Code configuration beyond Skill scanning, see our Claude Code security hardening guide.
What Full-File, Multi-Agent Analysis Actually Catches
The practical difference between full-file scanning and truncation-based scanning is not marginal. In testing against synthetic attack payloads, truncation-based approaches miss attacks hidden beyond the 3,000-character mark consistently. That is the design — attackers pad legitimate content to push payloads past the evaluation window.
Full-file analysis changes the detection calculus. Every instruction in the Skill is evaluated regardless of position. Multi-agent analysis adds a second pass with independent evaluation context, which catches adversarial phrasing that relies on convincing a single model that an instruction is benign.
The combination catches attacks that present as legitimate orchestration scaffolding, conditional behavior that only triggers on specific filesystem conditions, and instructions that modify agent behavior only after initial trust is established — sometimes called rug-pull patterns. Standard secret scanning tools do not model these patterns at all.
The developers most exposed to this attack surface are not the ones cutting corners on security reviews. They are the ones moving fast, cloning starter repositories, and trusting that the community Skills they pull are clean. Most are. The ones that are not are designed to look exactly like the ones that are.
Frequently Asked Questions
What is AI coding agent security?
AI coding agent security is the practice of controlling the attack surface introduced when autonomous AI coding tools operate on developer machines with broad file and network access. It covers three areas: pre-execution scanning of agent configuration files (Skills, rules, spec files) for supply chain threats; runtime governance to enforce behavioral boundaries during active sessions; and audit visibility to record what files, credentials, and network endpoints the agent accessed. Organizations deploying tools like Cursor, Claude Code, or LangGraph without addressing these areas are running agents with no enforcement layer and no audit trail.
Are Skills in Claude Code and Cursor actually executable?
Yes. Skills are markdown files, but the agent runtime interprets and acts on their content — it does not simply display them. When an agent reads a Skill, it follows the instructions in that file: which tool calls to make, which files to read, what shell commands to run, and in what order. The file format is text; the effect is execution. A malicious Skill directs the agent to take actions just as surely as malicious code directs a program.
Can a malicious Skill activate without the developer explicitly triggering it?
In most agent configurations, yes. Many AI coding agents load Skills automatically at session start or when a project directory is opened. The developer does not need to manually invoke a Skill for its instructions to take effect. This means a Skill embedded in a cloned repository — one the developer never reviewed — can begin influencing agent behavior the moment that project is opened in the AI coding tool.
How do attackers exploit Skill files without being detected?
The primary technique is positional evasion: placing legitimate instructions at the top of the Skill file and hiding malicious directives below the character limit that most security scanners evaluate. Secondary techniques include conditional activation (malicious behavior triggers only on specific filesystem conditions), semantic camouflage (phrasing attacks as plausible developer tooling such as "debugging context"), and multi-step exfiltration (breaking credential access and transmission into individually innocent-looking tool calls that only produce harm in sequence). None of these are caught by tools that evaluate prompts at a fixed character cutoff.
What tools detect malicious agent instructions?
Conventional secret-scanning tools and basic prompt-safety checkers miss most Skill-based attacks because they truncate files or evaluate instructions in isolation without modeling multi-step tool chains. Skill Sentinel — open source from Enkrypt AI — performs full-file scanning with multi-agent analysis to catch attacks regardless of position in the file. For runtime behavior, the Enkrypt AI Secure Vibe Coding runtime guardrails layer enforces access policies and writes audit logs during live agent sessions. Both tools work with Cursor, Claude Code, Kiro, CrewAI, LangGraph, OpenAI SDK, and Vercel AI.
Top comments (0)