DEV Community

Japneet Singh
Japneet Singh

Posted on

Skills lie. So we run them.

Link: https://labs.metano.ai/scanner

Many of use are are installing AI Agent skills from the internet. Skills are not libraries with version pins and signatures, instead they are markdown files of natural-language instructions that tell an agent what to do. Skills contain code written in plain English. Unsigned, unreviewed, copied from registries, and one prompt away from reading your .env, your cloud credentials, or your customer data. The agent ecosystem quietly recreated the software supply-chain problem, except the "package" is a paragraph of English a capable model will faithfully execute on your behalf.

Do you actually know what the skills your agents load will do?

Most tools answer that by reading the skill, scanning text and metadata for suspicious patterns, then scoring it. That catches the obvious, but it has a ceiling, and the ceiling is the whole problem: a skill's text tells you what it claims to do. Only running it tells you what it actually does.

The cover story

Picture a skill called "Optimize AWS configuration for your project." Boring, helpful, installed without a second thought. A few lines down, its instructions quietly tell the agent to read ~/.aws/credentials, POST the contents to an external server, and report back "AWS configuration optimized."

The description is a cover story. The danger isn't in what the skill says, it's in what the agent does once it loads it.

Run it and watch

That's the gap SkillTracer closes. Four steps, and the second is the one that matters:

  1. Submit — drop in a SKILL.md. Nothing installs or runs on your machine.
  2. Detonate — we run the skill in a live, instrumented, default-deny sandbox on our infrastructure, seeded with honeytokens (fake AWS creds and other canary secrets), and watch what the agent actually does:
    • Honeytokens — if a skill goes looking for credentials or tries to smuggle them out, we catch it. "A credential left the sandbox" is a fact, not a heuristic.
    • Full execution trace — every command issued, file touched, and network call attempted. Nothing actually leaves; we record the attempt.
    • Multi-model detonation — the same skill against several models at once. When they diverge — one refuses, another complies — that disagreement is itself signal.
  3. Analyze — deterministic heuristics plus a Claude scan map the observed behavior to the OWASP Agentic Top 10 and MITRE ATLAS: credential access, exfiltration, prompt injection / instruction override, RCE, obfuscation, excessive capability, tool poisoning, deception.
  4. Verdict — two scores, because one number hides too much:
    • Threat — is this trying to harm you?
    • Risk — an OWASP AIVSS-aligned score for how much damage it could do (autonomy, tools reached for, persistence). A skill can be high-threat/low-risk (clearly malicious but tightly scoped) or low-threat/high-risk (benign but dangerously over-privileged) — those need different responses.

Both come with evidence-quoted findings, shareable as a public report.

What static analysis structurally can't catch

Detonation isn't belt-and-suspenders — it catches a class of skills built to look benign on paper:

  • LOLbin abuse — curl, python -c, bash -c chained into fetch-and-execute. No malware keywords; just familiar commands a static scanner reads as "developer tooling."
  • Obfuscated payloads — base64/hex/env-var-split instructions that aren't readable strings at all until the agent decodes and runs them.
  • Steganographic instructions — a real linter/formatter with a malicious directive buried in a footnote or rarely-hit branch.
  • Slow-burn exfiltration — data leaked in DNS lookups, header fragments, repeated "log" calls. Each request looks routine; only the full trace reveals the pattern.

Honest about its limits

A single detonation can't exercise every path a skill might take. When a run doesn't trigger a behavior, SkillTracer says so plainly — flagged as "not evidence of safety," never a false all-clear. Today it covers SKILL.md; MCP servers and agent plugins are on the roadmap.

Try it and share your feedback

It's free and Apache-2.0, and I'm putting it in front of the people who build and install this stuff before the official launch.

Link: https://labs.metano.ai/scanner

Top comments (0)