We shipped a video editing skill on ClawHub earlier this month. Downloads ticked up to about 200. But installs? Zero.
That number bugged me. So I did what any obsessive developer would do: I downloaded the SKILL.md files from every video-related skill I could find and started reading them line by line.
Four stood out. Here's what I learned.
The lineup
| Skill | Lines | API Key needed? | What it does |
|---|---|---|---|
| pexoai-agent | 300 | Yes (PEXO_API_KEY) | AI video production, 5-60s clips |
| ffmpeg-video-editor | 393 | No | Natural language → FFmpeg commands |
| video-subtitles | 67 | No | SRT generation + burn-in |
| video-frames | 29 | No | Extract frames from video |
Already, something jumps out. The two skills with the most real-world traction — video-subtitles and ffmpeg-video-editor — need zero external API keys. You install them and they just work.
Lesson 1: Downloads aren't installs
Pexo has a polished SKILL.md. Good structure, clear workflow, even a clever "delivery worker" metaphor for how the AI should behave. But it requires PEXO_API_KEY and PEXO_BASE_URL before anything happens. That's a signup, a dashboard visit, and a copy-paste before your first video.
Meanwhile video-frames is 29 lines long. It needs ffmpeg (which most dev machines already have) and nothing else. First frame extraction works in one command.
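To make that "one command" concrete, here is a generic ffmpeg sketch of the same capability. This is not video-frames' actual script, just plain ffmpeg doing what the skill describes; the filenames and the synthetic `testsrc` input (used so the example is self-contained) are illustrative.

```bash
# Generate a 3-second synthetic clip so the example needs no source footage
ffmpeg -y -f lavfi -i testsrc=duration=3:size=320x240:rate=24 \
  -pix_fmt yuv420p -loglevel error input.mp4

# Extract one frame per second: frame_0001.png, frame_0002.png, frame_0003.png
ffmpeg -y -i input.mp4 -vf fps=1 -loglevel error frame_%04d.png
```

No API, no auth, no upload: the entire dependency surface is ffmpeg itself, which is exactly why a 29-line SKILL.md is enough.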
The friction difference is enormous. Every step between "install" and "first result" costs you users. We had the same problem — our skill needed a token setup flow that, while automatic, still felt like a gate.
Lesson 2: 67 lines beats 300
video-subtitles does one thing well: transcribe audio, generate SRT, optionally burn subtitles into the video. The entire SKILL.md is 67 lines. There's a Quick Start section with five copy-paste examples right at the top:
```bash
# Plain transcript
./scripts/generate_srt.py video.mp4

# Burn subtitles into video
./scripts/generate_srt.py video.mp4 --srt --burn
```
Compare that to our skill at the time — over 200 lines of API documentation, session management flows, token refresh logic. All necessary for our architecture, but the AI agent reading that file has to parse through a lot before it knows what to do.
The lesson isn't "write less." It's that the first 20 lines matter more than the remaining 180. If your Quick Start doesn't give the agent a working command in under 10 lines, you've already lost.
Lesson 3: The language rule nobody thinks about
Pexo's SKILL.md has a section I'd never seen before:
```markdown
## ⚠️ LANGUAGE RULE (highest priority)
You MUST reply to the user in the SAME language they use.
This is non-negotiable.
```
Simple. Obvious in hindsight. If your skill works globally — and ClawHub skills do — the AI should respond in whatever language the user speaks. We never specified this. Our skill defaulted to English regardless of input, which probably confused every non-English user who tried it.
One line in your SKILL.md fixes this. Pexo marks it as "highest priority," above the actual workflow. That tells me they learned this the hard way.
Lesson 4: FFmpeg skills win because FFmpeg is already there
ffmpeg-video-editor is basically a prompt template. It doesn't call any API. It doesn't upload anything. It translates "trim this video from 1:21 to 1:35" into an ffmpeg command and runs it locally.
That's it. And it works because ffmpeg is already installed on most machines that would run OpenClaw. Zero network latency, zero API costs, zero auth.
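The trim example above maps to a single ffmpeg invocation. The sketch below shows one plausible translation; the synthetic `testsrc` input and filenames are mine, not the skill's, and a real skill might choose stream copy over re-encoding for speed.

```bash
# Synthetic 100-second clip so the 1:21-1:35 trim has footage to cut
ffmpeg -y -f lavfi -i testsrc=duration=100:size=320x240:rate=24 \
  -pix_fmt yuv420p -loglevel error long.mp4

# "Trim this video from 1:21 to 1:35" -> seek to 1:21, keep 14 seconds.
# Re-encoding gives frame-accurate cuts; -c copy would be faster but
# snaps to keyframes.
ffmpeg -y -ss 00:01:21 -i long.mp4 -t 14 -loglevel error trimmed.mp4
```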
There's a ceiling to this approach — you can't do AI-generated scenes or text-to-video with local ffmpeg. But for the 80% of editing tasks that are just "cut, crop, convert," it's hard to beat.
What we changed
After this analysis, we rewrote our SKILL.md with three things in mind:
- Quick Start first. The agent should know how to make a basic edit within the first 10 lines.
- Reduce the auth wall. Anonymous tokens that auto-generate on first use — no signup required for basic edits.
- Add the language rule. One paragraph, borrowed directly from Pexo's approach.
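The "auth wall" change is the least obvious of the three, so here is a hypothetical first-run sketch of what "anonymous tokens that auto-generate on first use" can look like. The cache path, the local stub in place of a real auth endpoint, and every name here are illustrative assumptions, not our actual implementation.

```bash
# Hypothetical first-run flow: create an anonymous token if none is cached,
# so a basic edit never requires a signup. In a real skill the token would
# come from a POST to an auth endpoint; here it is stubbed locally.
TOKEN_FILE="${HOME}/.config/video-skill/token"   # illustrative path
if [ ! -f "$TOKEN_FILE" ]; then
  mkdir -p "$(dirname "$TOKEN_FILE")"
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$TOKEN_FILE"
fi
TOKEN="$(cat "$TOKEN_FILE")"
echo "token length: ${#TOKEN}"
```

The point is the shape, not the crypto: the agent's first command succeeds without a dashboard visit, and the gate only appears for features that genuinely need an account.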
We also split our monolithic skill into focused ones — a subtitle tool, a shorts maker, a color grading tool — each with a tight description that matches how people actually search.
Still early. Still zero installs. But the SKILL.md reads like something an AI agent can actually follow now, and that feels like the right foundation.
If you're building OpenClaw skills, go read the SKILL.md files of what's already working. The patterns are right there. You can find our video editing skills on ClawHub by searching "video editing" or "subtitles."
This is part of a series on building AI video tools with OpenClaw. Previous posts: How I Built an AI Video Editor as an OpenClaw Skill | I Wrapped a Video Editing API for AI - Here is What Broke | I use OpenClaw to automate my entire TikTok and Reels workflow