DEV Community

Wei Zhang


I Reverse-Engineered 4 Top Video Skills on ClawHub — Here's What Actually Drives Installs

We shipped a video editing skill on ClawHub earlier this month. Downloads ticked up to about 200. But installs? Zero.

That number bugged me. So I did what any obsessive developer would do: I downloaded the SKILL.md files from every video-related skill I could find and started reading them line by line.

Four stood out. Here's what I learned.

## The lineup

| Skill | Lines | API key needed? | What it does |
|---|---|---|---|
| pexoai-agent | 300 | Yes (`PEXO_API_KEY`) | AI video production, 5–60s clips |
| ffmpeg-video-editor | 393 | No | Natural language → FFmpeg commands |
| video-subtitles | 67 | No | SRT generation + burn-in |
| video-frames | 29 | No | Extract frames from video |

Already, something jumps out. The two skills with the most real-world traction — video-subtitles and ffmpeg-video-editor — need zero external API keys. You install them and they just work.

## Lesson 1: Downloads aren't installs

Pexo has a polished SKILL.md. Good structure, clear workflow, even a clever "delivery worker" metaphor for how the AI should behave. But it requires PEXO_API_KEY and PEXO_BASE_URL before anything happens. That's a signup, a dashboard visit, and a copy-paste before your first video.

Meanwhile video-frames is 29 lines long. It needs ffmpeg (which most dev machines already have) and nothing else. First frame extraction works in one command.
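That one command is essentially a thin wrapper over ffmpeg. Here's a sketch of what the skill boils down to — the exact flags are my reconstruction, not copied from its SKILL.md:

```python
import subprocess

def frame_extract_cmd(video: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second
    from `video` into numbered PNGs (frame_0001.png, frame_0002.png, ...)."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps={fps}",
        f"{out_dir}/frame_%04d.png",
    ]

cmd = frame_extract_cmd("input.mp4", "frames")
# subprocess.run(cmd, check=True)  # runs only if ffmpeg is on PATH
```

No auth, no upload, no config file. The entire "setup" is having ffmpeg installed.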

The friction difference is enormous. Every step between "install" and "first result" costs you users. We had the same problem — our skill needed a token setup flow that, while automatic, still felt like a gate.

## Lesson 2: 67 lines beats 300

video-subtitles does one thing well: transcribe audio, generate SRT, optionally burn subtitles into the video. The entire SKILL.md is 67 lines. There's a Quick Start section with five copy-paste examples right at the top — here are two:

```shell
# Plain transcript
./scripts/generate_srt.py video.mp4

# Burn subtitles into video
./scripts/generate_srt.py video.mp4 --srt --burn
```

Compare that to our skill at the time — over 200 lines of API documentation, session management flows, token refresh logic. All necessary for our architecture, but the AI agent reading that file has to parse through a lot before it knows what to do.

The lesson isn't "write less." It's that the first 20 lines matter more than the remaining 180. If your Quick Start doesn't give the agent a working command in under 10 lines, you've already lost.
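A Quick Start that passes that bar can be as small as this — a hypothetical skeleton (the script name and flags are invented for illustration), not any real skill's file:

```markdown
# my-video-skill

Edit videos from natural language requests.

## Quick Start

# Trim a clip (no setup required)
./scripts/edit.py input.mp4 --trim 00:00:05 00:00:20

# Convert to a web-friendly format
./scripts/edit.py input.mp4 --convert webm
```

Everything else — auth details, advanced options, error handling — can live below the fold. The agent finds it when it needs it.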

## Lesson 3: The language rule nobody thinks about

Pexo's SKILL.md has a section I'd never seen before:

```markdown
## ⚠️ LANGUAGE RULE (highest priority)

You MUST reply to the user in the SAME language they use.
This is non-negotiable.
```

Simple. Obvious in hindsight. If your skill works globally — and ClawHub skills do — the AI should respond in whatever language the user speaks. We never specified this. Our skill defaulted to English regardless of input, which probably confused every non-English user who tried it.

One line in your SKILL.md fixes this. Pexo marks it as "highest priority," above the actual workflow. That tells me they learned this the hard way.

## Lesson 4: FFmpeg skills win because FFmpeg is already there

ffmpeg-video-editor is basically a prompt template. It doesn't call any API. It doesn't upload anything. It translates "trim this video from 1:21 to 1:35" into an ffmpeg command and runs it locally.

That's it. And it works because ffmpeg is already installed on most machines that would run OpenClaw. Zero network latency, zero API costs, zero auth.
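The translation step is essentially string-to-argv mapping. A minimal sketch of what "trim this video from 1:21 to 1:35" becomes — my reconstruction, not the skill's actual template:

```python
def trim_cmd(video: str, start: str, end: str, out: str) -> list[str]:
    # -c copy stream-copies instead of re-encoding, so the cut is
    # near-instant but only accurate to the nearest keyframe.
    return ["ffmpeg", "-ss", start, "-to", end, "-i", video, "-c", "copy", out]

cmd = trim_cmd("input.mp4", "00:01:21", "00:01:35", "trimmed.mp4")
print(" ".join(cmd))
# ffmpeg -ss 00:01:21 -to 00:01:35 -i input.mp4 -c copy trimmed.mp4
```

The AI agent's whole job is picking the right flags; everything after that is ffmpeg's problem.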

There's a ceiling to this approach — you can't do AI-generated scenes or text-to-video with local ffmpeg. But for the 80% of editing tasks that are just "cut, crop, convert," it's hard to beat.

## What we changed

After this analysis, we rewrote our SKILL.md with three things in mind:

  1. Quick Start first. The agent should know how to make a basic edit within the first 10 lines.
  2. Reduce the auth wall. Anonymous tokens that auto-generate on first use — no signup required for basic edits.
  3. Add the language rule. One paragraph, borrowed directly from Pexo's approach.

We also split our monolithic skill into focused ones — a subtitle tool, a shorts maker, a color grading tool — each with a tight description that matches how people actually search.

Still early. Still zero installs. But the SKILL.md reads like something an AI agent can actually follow now, and that feels like the right foundation.

If you're building OpenClaw skills, go read the SKILL.md files of what's already working. The patterns are right there. You can find our video editing skills on ClawHub by searching "video editing" or "subtitles."

This is part of a series on building AI video tools with OpenClaw. Previous posts: How I Built an AI Video Editor as an OpenClaw Skill | I Wrapped a Video Editing API for AI - Here is What Broke | I use OpenClaw to automate my entire TikTok and Reels workflow
