DEV Community

Rotifer Protocol

Posted on • Originally published at rotifer.dev

From ClawHavoc to Trust Shield: How a Security Incident Inspired Trust Infrastructure for AI Agents

In February 2026, the Claw ecosystem experienced its worst security incident: ClawHavoc. 1,184 malicious Skills were discovered on ClawHub — credential theft, reverse shells, prompt injection — affecting over 300,000 users at a peak infection rate of 12%.

The community's response was swift: VirusTotal scanning, manual audits, emergency takedowns. But once the dust settled, an uncomfortable question remained:

How do you know a Skill is good — not just "not a virus"?

VirusTotal tells you whether code contains known malware signatures. It doesn't tell you whether the code is well-structured, whether it requests more permissions than it needs, or whether it does what it claims to do. The gap between "not malicious" and "actually trustworthy" is where Trust Shield lives.


The Trust Gap

ClawHub hosts over 13,000 public Skills. Before ClawHavoc, the quality signals available to developers were:

  1. Download count — popularity, not quality
  2. Star ratings — subjective, gameable
  3. "Verified" badge — means the author is real, not that the code is safe

None of these answer the question a developer actually asks before installing a Skill: "Will this code do something I don't expect?"


V(g): Static Analysis for Agent Capabilities

Trust Shield introduces V(g) safety scanning: a lightweight AST-based static analyzer that reads Skill source code and reports objective findings. No AI, no heuristics, no opinion, just pattern matching against 7 rules. The findings roll up into a letter grade, shown as a colored badge:

| Grade | Meaning | Badge |
|-------|---------|-------|
| A | Zero critical + zero high-risk patterns | Green |
| B | Zero critical, ≤2 high-risk with justified usage | Light green |
| C | Zero critical, >2 high-risk patterns | Yellow |
| D | ≥1 critical pattern (eval, command injection, obfuscation) | Red |
| ? | Prompt-only Skill (no source code to scan) | Grey |
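
The grade table can be read as a small decision function. The sketch below is one way to express it in TypeScript, not the scanner's actual code: the `ScanSummary` shape and the `highRiskJustified` flag are hypothetical names, and the table does not say how 1-2 high-risk patterns without justification are graded, so this sketch assumes they fall to C.

```typescript
// Hypothetical input shape; the article does not define the scanner's output format.
interface ScanSummary {
  hasSourceCode: boolean;     // false for prompt-only Skills
  criticalCount: number;      // eval, command injection, obfuscation, ...
  highRiskCount: number;      // number of high-risk patterns found
  highRiskJustified: boolean; // assumed flag: every high-risk use is justified
}

type Grade = "A" | "B" | "C" | "D" | "?";

function gradeSkill(s: ScanSummary): Grade {
  if (!s.hasSourceCode) return "?";      // nothing to scan
  if (s.criticalCount >= 1) return "D";  // any critical pattern
  if (s.highRiskCount === 0) return "A"; // clean scan
  if (s.highRiskCount <= 2 && s.highRiskJustified) return "B";
  // More than 2 high-risk patterns, plus the case the table leaves open
  // (1-2 high-risk patterns without justification), assumed here to be C.
  return "C";
}

// Example: one critical pattern outweighs everything else.
const example: Grade = gradeSkill({
  hasSourceCode: true,
  criticalCount: 1,
  highRiskCount: 0,
  highRiskJustified: false,
});
```

Checking the critical count before anything else mirrors the table: a single critical pattern results in a D regardless of how the rest of the code looks.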

The scanner detects patterns like eval(), child_process.exec(), base64-decode-then-execute chains, undeclared network calls, and environment variable harvesting. Each finding includes the file, line number, and code snippet — not a judgment, just a fact.
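
To make the shape of a finding concrete, here is a deliberately simplified sketch. It is a line-by-line regex matcher, not the AST-based analyzer described above, and the rule ids, severities, and the `Finding` type are assumptions made for illustration rather than the real V(g) rule set.

```typescript
type Severity = "critical" | "high";

// Assumed shape: the article says each finding carries file, line, and snippet.
interface Finding {
  rule: string;
  severity: Severity;
  file: string;
  line: number; // 1-based
  snippet: string;
}

interface Rule {
  id: string;
  severity: Severity;
  pattern: RegExp;
}

// Illustrative rules only; ids and severities are guesses, not the 7 real rules.
const RULES: Rule[] = [
  { id: "eval-call", severity: "critical", pattern: /\beval\s*\(/ },
  { id: "child-process-exec", severity: "critical", pattern: /\bchild_process\s*\.\s*exec\s*\(/ },
  { id: "env-harvest", severity: "high", pattern: /\bprocess\.env\b/ },
];

function scanSource(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({
          rule: rule.id,
          severity: rule.severity,
          file: file,
          line: index + 1,
          snippet: text.trim(),
        });
      }
    }
  });
  return findings;
}

// Usage: scan a three-line sample and collect the findings.
const sample = [
  'const cp = require("child_process");',
  "const token = process.env.API_TOKEN;",
  "eval(userInput);",
].join("\n");
const report = scanSource("index.js", sample);
```

A regex pass like this also matches text inside comments and string literals, which is one reason a real scanner would work on the AST instead.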

What V(g) is not: It's not a replacement for VirusTotal. It's not a guarantee of safety. It's a complementary signal that fills the gap between "not a known virus" and "trustworthy enough to install."


Trust Badges: One Line of Markdown

Every scanned Skill gets a badge powered by badge.rotifer.dev — a Cloudflare Worker that serves shields.io-compatible JSON endpoints:

![Rotifer Safety](https://img.shields.io/endpoint?url=https://badge.rotifer.dev/safety/@author/skill-name)

Skill authors can embed this in their README with zero setup. The badge updates automatically when the Skill code changes and gets re-scanned.
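
For readers curious what sits behind the badge, the sketch below shows the two pieces involved: the README line, and the JSON a shields.io endpoint is expected to return (`schemaVersion`, `label`, `message`, `color`). The helper names, the `"Rotifer Safety"` label, and the grade-to-color mapping are assumptions; the actual Worker's response is not shown in this post.

```typescript
type Grade = "A" | "B" | "C" | "D" | "?";

// Fields from the shields.io endpoint badge schema.
interface ShieldsEndpointResponse {
  schemaVersion: 1;
  label: string;
  message: string;
  color: string;
}

// Assumed mapping from the grade table's badge colors to shields.io color names.
const GRADE_COLORS: Record<Grade, string> = {
  A: "green",
  B: "yellowgreen",
  C: "yellow",
  D: "red",
  "?": "lightgrey",
};

// Hypothetical helper: the JSON an endpoint could return for a given grade.
function badgePayload(grade: Grade): ShieldsEndpointResponse {
  return {
    schemaVersion: 1,
    label: "Rotifer Safety", // assumed to match the badge alt text
    message: grade,
    color: GRADE_COLORS[grade],
  };
}

// Hypothetical helper: the Markdown line for a README, in the URL format shown above.
function badgeMarkdown(author: string, skill: string): string {
  const endpoint = `https://badge.rotifer.dev/safety/@${author}/${skill}`;
  return `![Rotifer Safety](https://img.shields.io/endpoint?url=${endpoint})`;
}

const readmeLine = badgeMarkdown("author", "skill-name");
```

Because shields.io fetches the endpoint when it renders the image, the README line never changes; only the JSON behind it does when a Skill is re-scanned.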

For Rotifer Genes (not just ClawHub Skills), additional badges are available:

  • Reputation score — R(g) from the Gene Registry
  • Fitness score — F(g) from Arena competition
  • Developer reputation — aggregate score across all published Genes

Why This Matters Beyond Security

Trust Shield is the first layer of what we call Trust Infrastructure for the Claw ecosystem. The scanning rules today are intentionally conservative — they report objective patterns without making intent judgments. But the architecture is designed to evolve:

Today (v0.7.9): Static AST scanning. Binary safe/unsafe patterns. Badge generation.

Next: Quality metrics. Does the Skill handle errors? Does it clean up resources? Does it do what its description claims?

Eventually: The same fitness function F(g) that evaluates Rotifer Genes — measuring actual runtime behavior, not just code patterns — applied to the broader Claw Skill ecosystem.

The path from "not a virus" to "actually good" is long. Trust Shield is the first step.


Try It

Scan any ClawHub Skill:

npm install -g @rotifer/playground
rotifer vg scan ./path-to-skill

Or generate a badge at rotifer.dev/badge.

The scanner, badge service, and CLI are all open source. We built Trust Shield because the Claw ecosystem needed it — and because building trust infrastructure for AI agents is exactly what Rotifer Protocol was designed to do.
