The email
1:39 AM, April 21. Cerebral Valley:
Your status for "Built with Opus 4.7: a Claude Code hackathon" has been updated to APPROVED.
500 builders. $100K pool. Seven days with Opus 4.7 and the Claude Code team watching live.
The product I'm pushing further this week
I've been shipping Verified Skill for months. This week, with Opus 4.7, I'm compressing the next quarter of the roadmap into seven days.
Verified Skill sits at the intersection of three gaps every AI agent ecosystem shares:
- Security: skills execute code, access tools, read files. No industry-wide scanning.
- Quality: no eval framework, no benchmarks, no way to grade skills.
- Distribution: no semver, no signed releases, no audit trail.
Think npm + Snyk + Jest — but for AI agent skills, across every agent platform.
Why model-agnostic and agent-agnostic matter
SKILL.md isn't a Claude-only format. It's the de facto standard across 49 agent platforms:
- CLI: Claude Code, Cursor, Copilot, Windsurf, Codex, Gemini CLI, Amp, Cline, Roo Code, Goose, Aider, Kilo, Devin, OpenHands, Qwen Code, Trae
- IDE: VS Code, JetBrains, Zed, Neovim, Emacs, Sublime, Xcode
- Cloud: Replit, Bolt, v0, GPT Pilot, Plandex, Sweep
One skill, 49 places it can run. But only if the infrastructure exists.
What already ships
vskill — the universal CLI
vskill install security-scanner
vskill audit
vskill update --all
vskill scan ./my-skill
vskill blocklist
49 agent platforms. Model-agnostic. Deduplication (install once, works across every agent you have). vskill.lock for reproducibility.
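To make the install-once deduplication concrete, here is a minimal sketch of the idea: a skill lives once in a shared store, and each detected agent directory gets a link into it. The store layout and agent directory names below are illustrative assumptions, not vskill's actual implementation.

```typescript
import * as path from "node:path";

// Hypothetical shared store location; one copy per skill@version.
const SHARED_STORE = path.join(".vskill", "store");

interface InstalledSkill {
  name: string;
  version: string;
}

// Resolve the single on-disk location for a skill@version.
function storePath(skill: InstalledSkill): string {
  return path.join(SHARED_STORE, `${skill.name}@${skill.version}`);
}

// Compute [source, linkTarget] pairs for every agent platform found.
function linkPlan(
  skill: InstalledSkill,
  agentDirs: string[]
): Array<[string, string]> {
  const src = storePath(skill);
  return agentDirs.map(
    (dir) => [src, path.join(dir, "skills", skill.name)] as [string, string]
  );
}

const plan = linkPlan({ name: "security-scanner", version: "1.2.0" }, [
  ".claude",
  ".cursor",
]);
```

One store path, two links: the skill installs once and appears in both agents. A vskill.lock would then pin the exact versions behind those links for reproducibility.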
Skill Studio — npx vskill studio
The part nobody else is building. A 100% local eval framework. No cloud. No telemetry.
npx vskill init
npx vskill eval init my-skill # auto-generate tests from SKILL.md
npx vskill eval serve # visual dashboard
npx vskill eval run my-skill # run benchmarks
npx vskill eval sweep # cross-model testing
Three eval modes:
- Benchmark: tests WITH the skill, grades assertions
- A/B comparison: blind side-by-side, skill vs. baseline, semantic grading
- Activation test: does the skill trigger when it should?
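A benchmark-mode run can be sketched roughly like this: execute a prompt with the skill loaded, then grade assertions against the transcript. The case and assertion shapes here are assumptions for illustration, not vskill's real eval API.

```typescript
interface Assertion {
  kind: "contains" | "matches"; // literal substring vs. regex match
  value: string;
}

interface EvalCase {
  prompt: string;
  assertions: Assertion[];
}

// Grade a model transcript against the case's assertions.
function grade(
  transcript: string,
  evalCase: EvalCase
): { passed: number; total: number } {
  let passed = 0;
  for (const a of evalCase.assertions) {
    const ok =
      a.kind === "contains"
        ? transcript.includes(a.value)
        : new RegExp(a.value).test(transcript);
    if (ok) passed++;
  }
  return { passed, total: evalCase.assertions.length };
}

const result = grade("Scan complete: 0 critical findings.", {
  prompt: "Scan this repo for vulnerabilities",
  assertions: [
    { kind: "contains", value: "Scan complete" },
    { kind: "matches", value: "\\d+ critical" },
  ],
});
```

A/B comparison swaps the assertion grader for a blind semantic judge over two transcripts; activation testing checks only whether the skill fired at all.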
Multi-model by default: Claude, GPT, Gemini, Llama, Ollama. Bring your own adapter. MCP-referencing skills get simulated tool responses automatically — no live API calls needed.
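Simulated tool responses for MCP-referencing skills could look something like the sketch below: canned fixtures stand in for live MCP servers during an eval run. The tool name and fixture shape are made up for illustration.

```typescript
type ToolCall = { tool: string; args: Record<string, unknown> };

// Fixtures keyed by tool name; each returns a canned JSON payload
// so the eval never hits a live API.
const fixtures: Record<string, (args: Record<string, unknown>) => string> = {
  "github.search_issues": (args) =>
    JSON.stringify({ items: [], query: args.q }),
};

// Answer a tool call from fixtures, or report the missing stub.
function simulateToolCall(call: ToolCall): string {
  const handler = fixtures[call.tool];
  if (!handler) {
    return JSON.stringify({ error: `no fixture for ${call.tool}` });
  }
  return handler(call.args);
}

const reply = simulateToolCall({
  tool: "github.search_issues",
  args: { q: "bug" },
});
```

The model still sees a plausible tool result, so the skill's tool-use path gets exercised end to end, locally.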
verified-skill.com — the registry
3-tier trust: Scanned → Verified → Certified. 52 security scan patterns. Discovery, eval result display, audit trail. Publishing gated on security scan.
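Pattern-based scanning of the kind that gates publishing can be sketched as regexes over a skill's files. The registry's actual 52 patterns aren't listed in this post; the two below are invented examples of the category.

```typescript
interface ScanPattern {
  id: string;
  severity: "low" | "high";
  regex: RegExp;
}

// Illustrative patterns only — stand-ins for the real rule set.
const patterns: ScanPattern[] = [
  // Piping a remote script straight into a shell.
  { id: "curl-pipe-sh", severity: "high", regex: /curl[^\n]*\|\s*(ba)?sh/ },
  // Reading environment variables near an outbound request.
  { id: "env-exfil", severity: "high", regex: /process\.env[^\n]*fetch\(/ },
];

// Return the ids of every pattern that matches the skill source.
function scan(source: string): string[] {
  return patterns.filter((p) => p.regex.test(source)).map((p) => p.id);
}

const findings = scan("RUN curl https://example.com/install.sh | sh");
```

A clean scan earns the Scanned tier; Verified and Certified would layer review on top.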
What I'm pushing this week with Opus 4.7
- Agent-aware generation: same skill source, tailored output per target agent. Strip Claude-specific fields for Cursor. Add Codex-specific guidance for Codex. One author, every platform.
- Smart routing: feature filtering based on target-agent capabilities.
- Deeper eval loops: regression detection hooked into CI.
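Agent-aware generation, as described above, can be sketched as a per-target transform over one skill source. The field names (`claudeOnly`, `guidance`) and targets are assumptions for illustration, not the real SKILL.md schema.

```typescript
interface SkillSource {
  name: string;
  description: string;
  claudeOnly?: Record<string, unknown>; // hypothetical Claude-specific fields
}

type Target = "claude" | "cursor" | "codex";

// One author, one source; output tailored per target agent.
function generateFor(skill: SkillSource, target: Target): Record<string, unknown> {
  const out: Record<string, unknown> = {
    name: skill.name,
    description: skill.description,
  };
  if (target === "claude" && skill.claudeOnly) {
    Object.assign(out, skill.claudeOnly); // keep Claude fields for Claude
  }
  if (target === "codex") {
    out.guidance = `Codex-tailored notes for ${skill.name}`; // add Codex guidance
  }
  return out; // Cursor gets the stripped-down base
}

const forCursor = generateFor(
  {
    name: "security-scanner",
    description: "Scans skills for risky patterns",
    claudeOnly: { allowedTools: ["Bash"] },
  },
  "cursor"
);
```

The same source emitted for Claude would keep `allowedTools`; for Cursor it is stripped.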
Stack
CLI: Node.js 20 + ESM
Registry API: Cloudflare Workers + D1 + Prisma
Dashboard: Next.js 15 (App Router)
Build loop: Opus 4.7, orchestrated through SpecWeave — my open-source spec-driven dev framework for Claude Code. Every feature starts as a spec, generates a plan and tasks, and closes through automated quality gates (code review, simplify, grill, judge-LLM). It's what makes 7-day ships possible.
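The quality-gate loop described above can be sketched as a simple pipeline. The gate names come from the post; the pipeline shape and checks are assumptions standing in for SpecWeave's real gates.

```typescript
type Gate = { name: string; check: (artifact: string) => boolean };

// Illustrative gates — each real gate would be an LLM-driven pass.
const gates: Gate[] = [
  { name: "code-review", check: (a) => a.length > 0 },
  { name: "simplify", check: (a) => !a.includes("TODO") },
  { name: "grill", check: (a) => a.includes("test") },
  { name: "judge-LLM", check: () => true }, // stand-in for a judge verdict
];

// Run every gate and partition results; a feature closes only when
// nothing lands in `failed`.
function runGates(artifact: string): { passed: string[]; failed: string[] } {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const g of gates) {
    (g.check(artifact) ? passed : failed).push(g.name);
  }
  return { passed, failed };
}

const verdict = runGates("feature with test coverage");
```

Every feature flows spec → plan → tasks → gates; a red gate blocks the close, which is what keeps a seven-day pace honest.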
The stakes
$100K prize pool. Judges: Boris Cherny, Cat Wu, Thariq Shihipar, Lydia Hallie, Ado Kukic, Jason Bigman from the Claude Code team.
Seven days. The hackathon runs on Claude. The product runs everywhere.
Daily build logs here. Follow along.