DEV Community

Armor1
How to Audit Your AI Agent Skills for Credential Exposure and Malicious Instructions

Two independent security research groups published findings this week that land on the same problem from different angles: AI agent skill files are a serious and underaudited supply chain surface, and the attack techniques targeting them are already in active use.

The Scale Finding

Capsule Security's analysis covered more than 200,000 agent skill files and 160,000 code files. The result that stands out: 2,909 of 19,618 distinct skill files carry hardcoded credentials alongside direct database write access. That is roughly 15% of distinct skill files in active use. No additional exploit is required: install the skill, the agent reads the skill configuration, and the credentials are there.

The same analysis found that AI workloads present a supply chain attack surface six times larger than traditional software. It also observed that malicious skills continue to persist and propagate even after the campaigns that distributed them have been shut down.

The Active Campaign

A separate disclosure published the same week documents a March 2026 campaign targeting a popular AI coding agent framework. Attackers published deceptive community skills that appeared legitimate at a glance. The payload delivery mechanism was not a traditional malware dropper. It was the installation instruction inside the skill file itself.

The skill's installation instructions directed the agent to perform operations that installed Remcos RAT and GhostLoader. The agent followed those instructions because that is exactly what installation instructions are for. No user interaction beyond installing the skill was required.

This is a distinct campaign from the January 2026 supply chain attack covered in prior security reporting. Different delivery mechanism. Different payloads. The point of connection: both used the skill ecosystem as the distribution channel.

What the Attack Surface Looks Like

An AI agent skill typically consists of a few components:

  1. A metadata file (often named SKILL.md or similar) containing the skill's name, description, and installation instructions
  2. Configuration specifying what tools, permissions, and external resources the skill uses
  3. Optionally, code files the skill executes

The attack surface is broader than the code. The metadata file, particularly its installation instructions, is executed by the agent as part of skill setup. An agent that reads and follows installation instructions is executing arbitrary instructions from whoever wrote that file. If the file was tampered with or written by a threat actor, those instructions become arbitrary command execution.

The credential exposure problem is a separate issue: skill files that embed API keys, database connection strings, or other credentials expose those values to every developer who installs the skill, to the agent that reads the configuration, and to anything else in the agent's context window.

How to Audit Your Skills

Step 1: Inventory what you have. List every skill file currently active in your agent environment. For community-sourced skills, note the source and whether the version has changed since you installed it.
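The inventory step can be scripted. A minimal sketch, assuming your skills live under one directory and use the SKILL.md naming convention mentioned earlier (adjust the glob for your agent framework):

```python
import hashlib
from pathlib import Path

def inventory_skills(root):
    """Find every skill metadata file under `root` and record a content
    hash, so later audits can detect files that changed after install."""
    records = []
    for path in sorted(Path(root).rglob("SKILL.md")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        records.append({"path": str(path), "sha256": digest})
    return records
```

Recording a hash at inventory time is what makes Step 4's drift check possible later.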

Step 2: Check skill metadata for credentials. Search skill configuration files for patterns that suggest embedded credentials: connection strings, API key patterns, private key markers. A regex scan for common credential patterns across skill metadata is a reasonable first pass.
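A minimal version of that regex pass might look like the following. The patterns are illustrative, not exhaustive; dedicated secret scanners ship far larger rule sets, and the pattern names here are my own:

```python
import re

# A handful of common credential shapes; tune and extend for your stack.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s'\"]+:[^\s'\"]+@"),
    "generic_secret": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
}

def scan_for_credentials(text):
    """Return (pattern_name, matched_text) for each suspected credential."""
    return [(name, m.group(0))
            for name, pat in CREDENTIAL_PATTERNS.items()
            for m in pat.finditer(text)]
```

Run it over every file the inventory found and triage the hits; expect some false positives, which is why a second verification pass helps.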

Step 3: Review installation instructions for anomalies. Read the installation instruction sections of skill files, particularly community-sourced ones. Installation instructions that invoke shell commands, download additional packages from unverified sources, or reference external URLs outside the skill's stated purpose are worth investigating.
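The manual review can be backed by a heuristic scan for exactly the anomalies described above: shelling out, fetching remote code, piping downloads into an interpreter. These patterns are examples, not a complete detector:

```python
import re

SUSPICIOUS_INSTALL_PATTERNS = {
    # Download piped straight into a shell
    "pipe_to_shell": re.compile(r"(?:curl|wget)[^\n|]*\|\s*(?:ba|z)?sh\b"),
    # Fetching additional code from the network during install
    "remote_fetch": re.compile(r"(?:curl|wget|Invoke-WebRequest)\s+['\"]?https?://"),
    # Decoding an embedded payload
    "base64_decode": re.compile(r"\bbase64\s+(?:-d|--decode)\b"),
    # Hidden PowerShell window, a common loader trick on Windows
    "hidden_powershell": re.compile(r"(?i)powershell[^\n]*-w(?:indowstyle)?\s+hidden"),
}

def flag_install_instructions(text):
    """Return names of suspicious patterns found in install instructions."""
    return [name for name, pat in SUSPICIOUS_INSTALL_PATTERNS.items()
            if pat.search(text)]
```

A hit is not proof of malice; a legitimate skill may fetch a package from a known registry. A hit on a community skill whose stated purpose has nothing to do with the flagged command is what warrants investigation.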

Step 4: Check skill versions and provenance. Skills that have changed since their last verified install are a flag. Skills from sources without a clear maintainer are a flag. If a skill you installed months ago now behaves differently, that is worth examining.
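Version drift can be caught mechanically if a hash was recorded at install time. A sketch, assuming a simple JSON lockfile mapping skill paths to sha256 hashes (the lockfile format here is my own, not a standard):

```python
import hashlib
import json
from pathlib import Path

def check_drift(skill_path, lockfile_path):
    """Compare a skill file's current hash to the hash recorded at
    install. Returns 'unchanged', 'changed', or 'untracked'."""
    current = hashlib.sha256(Path(skill_path).read_bytes()).hexdigest()
    recorded = json.loads(Path(lockfile_path).read_text()).get(str(skill_path))
    if recorded is None:
        return "untracked"
    return "unchanged" if recorded == current else "changed"
```

"Changed" does not automatically mean compromised, but it means the skill you are running is not the skill you reviewed.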

Step 5: Treat skill installs as supply chain events. The same controls that apply to adding a dependency to package.json should apply to adding a skill to an agent environment. Review what it does, check the source, pin to a specific version.
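Pinning a skill can be as simple as recording its content hash at install time, analogous to a lockfile entry for a package.json dependency. A sketch, using a hypothetical JSON lockfile for illustration:

```python
import hashlib
import json
from pathlib import Path

def pin_skill(skill_path, lockfile_path="skills.lock.json"):
    """Record the skill file's sha256 in a JSON lockfile so later
    audits can detect any change to the file since install."""
    lockfile = Path(lockfile_path)
    lock = json.loads(lockfile.read_text()) if lockfile.exists() else {}
    lock[str(skill_path)] = hashlib.sha256(
        Path(skill_path).read_bytes()).hexdigest()
    lockfile.write_text(json.dumps(lock, indent=2, sort_keys=True))
```

Committing the lockfile alongside your agent configuration gives the install the same review trail as any other dependency change.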

How Armor1 Approaches This

Armor1's skill security scanner evaluates every skill file before execution. The scanner checks for hardcoded credentials and credential misuse patterns, malicious installation instructions, data exfiltration patterns embedded in skill configuration, and supply chain risks such as references to unverified external packages or remote code in skill definitions. The scanner runs two passes: an initial analysis and a verification pass to reduce false positives.

The credential exposure Capsule Security found at scale and the installation instruction attack vector documented in the March 2026 campaign both fall inside the categories the scanner evaluates.

Check the risk of any MCP server in your environment with Armor1's free public catalog.

To cover every agentic app, MCP server, tool, skill, and plugin across your stack, sign up for free here.
