This is a submission for the OpenClaw Challenge.
## What I Built
ClawForge is a curated-from-scratch skill hub for the OpenClaw ecosystem — 30 hand-engineered skills, 6 variant installers, 3 persona "souls", a security trio, and a local cross-architecture evidence harness, all behind a single SKILL.md standard.
It exists because the obvious place to get OpenClaw skills today — ClawHub, with its ~13,729 community uploads — has a trust problem. Roughly 1 in 5 of those skills ship patterns I wouldn't run on a borrowed laptop, and the ClawHavoc campaign already published ~300 outright trojan skills before takedown. "Just `claw install` whatever" is the new `curl | sudo bash`.
ClawForge is the boring, opinionated alternative:
- **30 skills, not 30,000.** Every skill is engineered from scratch against one schema — no re-uploads, no forks of forks, no anonymous maintainers.
- **6 variants, one installer.** OpenClaw, ZeroClaw, PicoClaw, NullClaw, NanoBot, IronClaw — each with different runtimes (Node, Rust, Go, Zig, Python, WASM) and different blast radii. One `install.sh` figures out what your target actually supports.
- **Static security tiers, not vibes.** Every skill declares an `L0`/`L1`/`L2`/`L3` tier in frontmatter — read-only, scoped-write, network, or privileged. Scanners reject anything that lies about its tier.
- **A local QEMU evidence ladder.** I don't claim "works everywhere." I claim "x86_64 = emulated guest pass, arm64/armv7/riscv64 = usermode smoke pass, i386 = metadata only," with JSON + Markdown receipts.
Repo: github.com/divyaprakash0426/clawForge
Release: v1.0.0
If ClawHub is npm circa 2018 — open, sprawling, and quietly malicious — ClawForge is trying to be the Debian stable of skills: smaller, slower, signed for what it claims to do.
## How I Used OpenClaw
OpenClaw's whole pitch is "AgentSkills are portable." The interface really is portable — every variant speaks the same skill protocol. But execution isn't. A skill that works on OpenClaw (1.5GB Node toolchain) will silently die on NullClaw (1MB Zig binary on a RISC-V board). The blueprint for ClawForge fell out of trying to make that real instead of marketing.
Here's how OpenClaw's design actually shaped the build, and where I had to push back on it.
### The `SKILL.md` standard — front-loading every honest constraint
Every skill in ClawForge is a directory with a single SKILL.md. The frontmatter is the contract:
```yaml
---
name: skill-sentinel
version: 1.2.0
tier: L1                # read-only audit, no network
variants:
  openclaw: full
  zeroclaw: full
  picoclaw: full
  nullclaw: partial     # no JSON schema validator on Zig build
  nanobot: full
  ironclaw: unsupported # WASM sandbox blocks subprocess scan
entrypoint: ./run.sh
permissions:
  fs: [read]
  net: []
  exec: [grep, awk, jq]
---
```
Two design decisions worth defending:
- **Per-variant compatibility is `full | partial | unsupported`, not a boolean.** Half the pain in the ecosystem comes from skills that pretend to be universal. `partial` forces the author to write a `COMPAT.md` explaining what's degraded and why.
- **Tiers are static, not dynamic.** I prototyped a "risk score" early on. It was useless — every author rationalizes their own skill into the green zone. `L0`–`L3` are coarse, declarative, and trivially auditable: if you ask for `net` permissions in an `L1` skill, the scanner just rejects you.
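A static tier gate really can be this small. The mapping below is my reading of the four tiers (the permission-class names are illustrative, not skill-sentinel's actual vocabulary):

```python
# Sketch of a flat tier whitelist: anything a skill requests outside its
# tier's allowance is a hard reject — no scoring, no heuristics.
# Class names here are illustrative, not the real scanner's vocabulary.
TIER_ALLOWS = {
    "L0": set(),                                   # metadata only
    "L1": {"fs:read", "exec"},                     # read-only audit tools
    "L2": {"fs:read", "fs:write", "net", "exec"},  # scoped write + network
    "L3": {"fs:read", "fs:write", "net", "exec", "privileged"},
}

def tier_violations(tier: str, permissions: dict) -> list[str]:
    requested = {f"fs:{mode}" for mode in permissions.get("fs", [])}
    if permissions.get("net"):
        requested.add("net")
    if permissions.get("exec"):
        requested.add("exec")
    return sorted(requested - TIER_ALLOWS[tier])
```

With the frontmatter above, `tier_violations("L1", {"fs": ["read"], "net": [], "exec": ["grep"]})` comes back empty; add a `net` entry and the skill is mechanically rejected.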
### The variant-aware installer
The installer is plain POSIX shell, but it does the one thing OpenClaw itself can't: it refuses to install skills that won't run.
```console
$ ./install.sh --variant nullclaw
[forge] target: nullclaw (zig, riscv64, ~1MB)
[forge] resolving 30 skills against compatibility matrix…
[forge] ✓ 24 full
[forge] ⚠ 4 partial (degrade to L1, see COMPAT.md)
[forge] ✗ 2 unsupported skipped: bedrock-rag, kube-scout
[forge] staging to ~/.nullclaw/skills/ (atomic)
[forge] running skill-sentinel on staged tree…
[forge] OK — 28 skills installed, 2 skipped, 0 quarantined
```
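The resolve step amounts to filtering the skill set through the per-variant compatibility matrix before anything touches disk. A Python re-sketch of that shell logic (skill names and matrix entries invented for the example):

```python
# Re-sketch of the installer's resolve step: classify every skill against
# the target variant before staging. An unknown variant counts as
# unsupported — the safe default.
def resolve(skills: dict[str, dict[str, str]], variant: str) -> dict[str, list[str]]:
    """skills maps skill name -> {variant: "full"|"partial"|"unsupported"}."""
    plan = {"full": [], "partial": [], "skipped": []}
    for name, matrix in sorted(skills.items()):
        level = matrix.get(variant, "unsupported")
        key = level if level in ("full", "partial") else "skipped"
        plan[key].append(name)
    return plan
```

Everything in `plan["skipped"]` is never staged at all, which is the whole trick: the installer can't install a lie the frontmatter already refuses to tell.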
No "install everything and hope." If a skill's frontmatter lies about its variant support, the matching scanner test fails in CI before the skill is ever published.
### The security trio
Three skills act as the distribution's immune system, and they all run on every PR:
| Skill | Job | Tier |
|---|---|---|
| `skill-sentinel` | Static analysis of every SKILL.md + entrypoint — tier/permission lies, shell injection patterns, suspicious network calls | L1 |
| `prompt-fence` | Scans souls/personas for prompt-injection payloads and hidden instructions | L1 |
| `secret-guard` | Pre-commit + CI scanner for committed secrets, mirrored config across local + CI | L1 |
Sentinel output is intentionally loud:
```text
[skill-sentinel] scanning ./skills/arxiv-scout
  ✗ FAILED  declares tier=L1 but entrypoint contains: curl https://...
    → tier L1 forbids network egress. Either:
      (a) bump declared tier to L2 and document it in COMPAT.md, or
      (b) move the fetch into a separate L2 helper skill.
exit 1
```
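The check behind that failure can be approximated in a few lines. This is my reconstruction of the idea only — the pattern list is deliberately tiny and is not skill-sentinel's actual rule set:

```python
import re

# Approximation of the tier-vs-entrypoint check: grep the entrypoint for
# obvious egress tools and reject when the declared tier forbids network.
EGRESS = re.compile(r"\b(curl|wget|nc)\b")
NET_TIERS = {"L2", "L3"}   # tiers where network egress is allowed

def scan_entrypoint(tier: str, source: str) -> list[str]:
    if tier in NET_TIERS:
        return []          # network is allowed at this tier; nothing to flag
    return [
        f"line {n}: tier {tier} forbids network egress: {line.strip()}"
        for n, line in enumerate(source.splitlines(), start=1)
        if EGRESS.search(line)
    ]
```

A real scanner needs far more patterns (raw sockets, `python -c`, DNS tricks), but the shape is the same: declared tier in, findings out, non-empty findings mean a failed build.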
This is the part OpenClaw doesn't give you out of the box — and the part I'd argue every claw distribution needs before it goes near a production box.
### The scaffolder — keeping the schema honest
The whole standard collapses if writing a compliant skill is harder than writing a non-compliant one. So:
```console
$ ./tools/forge-skill.sh my-new-skill --tier L1 --variants full,full,partial,unsupported,full,full
[forge-skill] generated skills/my-new-skill/
[forge-skill]   SKILL.md        (frontmatter + checklist)
[forge-skill]   run.sh          (entrypoint stub with safe defaults)
[forge-skill]   COMPAT.md       (one section per non-full variant)
[forge-skill]   tests/smoke.sh  (skill-sentinel + prompt-fence dry run)
[forge-skill] new skill ready in 27s. Run: ./tools/validate_skills.py my-new-skill
```
A new compliant skill, end-to-end, in under 30 seconds. The validators (`tools/validate_skills.py`, `tools/validate_souls.py`) are stricter than the scaffolder — so the only path that passes CI is the one that passes both the scaffolder and the validators.
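The file layout the scaffolder emits is small enough to sketch as a toy Python equivalent. The template bodies below are placeholders of my own, not the real templates from `forge-skill.sh`:

```python
from pathlib import Path

# Toy equivalent of forge-skill.sh: same file layout as the transcript
# above, placeholder template bodies (the real ones live in the repo).
def scaffold(root: Path, name: str, tier: str = "L1") -> list[str]:
    skill = root / "skills" / name
    (skill / "tests").mkdir(parents=True)
    files = {
        "SKILL.md": f"---\nname: {name}\nversion: 0.1.0\ntier: {tier}\n---\n",
        "run.sh": "#!/bin/sh\nset -eu\n",
        "COMPAT.md": f"# {name} — per-variant notes\n",
        "tests/smoke.sh": "#!/bin/sh\nexit 0\n",
    }
    for rel, body in files.items():
        (skill / rel).write_text(body)
    return sorted(files)
```

The design choice worth copying: the scaffolder writes every file the validators will later demand, so "compliant" is the default state, not an achievement.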
### The cross-architecture evidence harness
This is the part I'm proudest of, and the part that's deliberately not in the public repo.
OpenClaw runs on a deeply heterogeneous fleet — laptops, Pi-class boards, RISC-V dev kits, sandboxed WASM. "It works on my Arch x86_64 box" is a useless claim for a skill hub. So I built a local-only harness under `.local/clawforge-vm/` (gitignored) that produces honest receipts:

```text
.local/clawforge-vm/
├── run-host-lane.sh            # 21 deterministic checks on the host
├── bootstrap-e2e-vm.sh         # disposable Ubuntu Jammy KVM guest, reruns the host lane inside
├── run-usermode-arch-lane.sh   # qemu-*-static smoke for arm64, armv7, riscv64
└── report-arch-evidence.sh     # emits JSON + Markdown summary
```
The host lane runs all 6 variant installers, every validator, every scanner, the soul installer, scaffolder round-trip, and mock flows for kube-scout, permission-lens, docker-hygiene, arxiv-scout, bedrock-rag, and tf-copilot — 21 checks, deterministic, on every host change.
The arch lane does what it can do honestly:
```text
[arch-evidence] summary
  x86_64  : emulated-guest ✓ 21/21 checks (Ubuntu Jammy KVM)
  arm64   : usermode-smoke ✓ installers + sentinel (Ubuntu Jammy rootfs)
  armv7   : usermode-smoke ✓ installers + sentinel (Alpine 3.20 rootfs)
  riscv64 : usermode-smoke ✓ installers + sentinel (Ubuntu Jammy rootfs)
  i386    : metadata-only — no rootfs staged, frontmatter-only
```
That's it. No "fully supported" claims I can't back up. The same script writes a JSON receipt I can check into the next release notes.
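For a sense of shape, a receipt like that might serialize as follows. This is a guess based only on the summary table — the real `report-arch-evidence.sh` schema is local-only and not published:

```python
import json

# Guessed receipt shape, derived only from the summary above — not the
# actual report-arch-evidence.sh output schema.
def receipt(lanes: dict) -> str:
    return json.dumps({"schema": "clawforge-arch-evidence/v1", "lanes": lanes},
                      indent=2, sort_keys=True)

evidence = {
    "x86_64":  {"mode": "emulated-guest", "checks": "21/21", "rootfs": "Ubuntu Jammy KVM"},
    "arm64":   {"mode": "usermode-smoke", "checks": "installers+sentinel", "rootfs": "Ubuntu Jammy"},
    "riscv64": {"mode": "usermode-smoke", "checks": "installers+sentinel", "rootfs": "Ubuntu Jammy"},
    "i386":    {"mode": "metadata-only", "checks": None, "rootfs": None},
}
```

The useful property of any schema in this shape: a release note can embed the JSON verbatim, and a reader can diff receipts between releases instead of trusting prose.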
## Demo
### End-to-end proof — running inside a disposable QEMU VM
The GIF below is an unedited asciinema recording of the full ClawForge pipeline running inside a fresh Ubuntu Jammy KVM guest — not on my host machine. 21 deterministic checks. Zero edits.
In the meantime, the project is fully runnable today:
- Repo: github.com/divyaprakash0426/clawForge
- Release: v1.0.0
- Topics: `openclaw` · `claw-ecosystem` · `ai-tools` · `skill-hub` · `security` · `shell` · `open-source`
### Try it in 60 seconds
```bash
# Clone
git clone https://github.com/divyaprakash0426/clawForge.git
cd clawForge

# Install the full skill set against your detected variant
./install.sh --variant openclaw   # or zeroclaw, picoclaw, nullclaw, nanobot, ironclaw

# Or pull a single skill into an existing claw install
./install-skill.sh skill-sentinel --variant zeroclaw

# Scaffold your own compliant skill in under 30 seconds
./tools/forge-skill.sh my-skill --tier L1

# Validate the entire repo (what CI runs on every PR)
python3 tools/validate_skills.py
python3 tools/validate_souls.py
```
## What I Learned
### 1. AgentSkills are portable. Skill execution isn't.
The single biggest mismatch between OpenClaw's marketing and reality: every variant agrees on the skill interface, but the skills themselves silently assume Node, or bash, or curl, or 200MB of RAM. The `variants: full | partial | unsupported` field only exists because I tried to claim "universal" once and got humiliated by NullClaw.
If you're building anything for a multi-variant ecosystem: front-load the compatibility table in machine-readable frontmatter, then make your installer refuse to lie.
### 2. Static tiers beat dynamic risk scores every time.
I genuinely tried the dynamic-scoring approach first — heuristics over the entrypoint, weighted permission analysis, "danger floats." It was a beautiful dashboard producing useless numbers. Authors gamed it within an afternoon.
Four flat tiers (L0 read-only metadata, L1 read-only fs, L2 scoped write + network, L3 privileged) are coarse enough that arguing about them is pointless and fine-grained enough that the scanner can mechanically reject lies. Boring, declarative, correct.
### 3. The harness finds the bugs that demos hide.
Three real bugs the local lane caught that I would 100% have shipped otherwise:
- `bedrock-rag` mock path bug — a relative-path assumption that worked from the repo root broke instantly when the installer staged it under `~/.openclaw/skills/`.
- `arxiv-scout` non-determinism — the model-eval cache used a hash of the wall-clock minute as a salt. Passed once locally, failed every CI re-run inside the QEMU guest.
- Alpine busybox absolute-symlink trap on armv7 — busybox's `ln -s` resolves absolute symlinks against the host root, not the rootfs. The arm64 Ubuntu lane was happy; the armv7 Alpine lane was a pile of dangling links until I switched to relative symlinks everywhere.
None of these would have been caught by "looks fine on my machine." Build the harness early. Even a usermode-smoke lane is worth ten "trust me" tweets.
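The symlink fix generalizes to any staging step: rewrite absolute symlink targets as paths relative to the link itself, so resolution inside a foreign rootfs can't escape to the host. A minimal sketch (my code, not the harness's):

```python
import os
from pathlib import Path

# Rewrite an absolute symlink target as a path relative to the link's own
# directory, so resolving it inside a staged rootfs stays inside the rootfs.
def relativize_symlink(link: Path) -> str:
    target = os.readlink(link)
    if not os.path.isabs(target):
        return target                  # already relative, already safe
    rel = os.path.relpath(target, start=link.parent)
    link.unlink()
    link.symlink_to(rel)
    return rel
```

Run over a staged tree, this turns `/home/me/repo/skills/x` into `../../skills/x`, which busybox resolves correctly no matter where the rootfs lands.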
### 4. "30 engineered" beats "300 curated" for a security story.
The first plan was bigger — pull in the most-starred ClawHub skills, audit them, re-ship. I killed that within a week. Auditing someone else's skill is roughly the same effort as writing one from scratch, and you inherit their bus factor and their git history. 30 skills I wrote, against one schema, with one threat model, is a story I can defend. 300 curated ones is a story I'd have to keep apologizing for.
### 5. The scaffolder is the standard.
A schema is just a wishlist until the easiest path to a new skill is also the only compliant path. `forge-skill.sh` exists because every skill I added to the repo by hand had at least one frontmatter typo. Now the scaffolder produces compliant skills, and the validators reject anything that drifts. The standard enforces itself.
## ClawCon Michigan
I didn't make it to ClawCon Michigan in person this round — wrong continent, wrong calendar week — but the energy coming out of it (the talks, the trip reports, the ClawHavoc post-mortems, the variant maintainers comparing notes in public) is most of what made me stop hand-wringing and actually ship ClawForge. If you were there: the unsigned-skill-distribution debate from day two is the reason skill-sentinel is opinionated instead of advisory. Thank you. I owe you all a drink at the next one.
*Built with stubbornness, a lot of `qemu-system-`, and a deep suspicion of any skill hub that has more skills than it has reviewers.*
Repo: github.com/divyaprakash0426/clawForge · Release: v1.0.0
Comments, PRs, and especially "your tier system is wrong because…" arguments are very welcome.
