Hiy

Posted on May 27

Claude Code Video Skills: A Developer's Practical Guide to All 6 Options (2026)

#ai #video #claude #skills

Claude Code now has six video generation skills — Remotion, HeyGen, inference.sh, Pexo, Higgsfield, and digitalsamba's Video Toolkit — and they solve completely different engineering problems. I've been building internal tooling that requires automated video output, so I installed all six and ran them through real dev workflows: CI-triggered product demos, data dashboard recordings, batch asset generation, and API-driven content pipelines. This post breaks down what each skill actually does under the hood, when to pick which, and the installation gotchas nobody warns you about. If you're deciding which video skill to add to your Claude Code setup, this should save you the trial-and-error.

TL;DR Decision Matrix

Before anything else, here's the quick lookup. Find your use case, pick the skill.

| Your Use Case | Best Skill | Why | Install |
|---|---|---|---|
| Animated charts / data viz | Remotion | Deterministic React→MP4, pixel-perfect | `remotion-dev/skills` |
| Product ads from a URL | Pexo | URL-to-finished-video, zero manual steps | `pexo` via skills.sh |
| AI avatar / talking head | HeyGen | 175+ languages, Soul Avatar, best lip sync | `heygen-com/skills` |
| Raw model experimentation | inference.sh | 40+ models, pay-per-inference, full control | `skillsh/skills` |
| Character-consistent series | Higgsfield | Soul ID persistent face across all content | `higgsfield-ai/skills` |
| Self-hosted open-source | Video Toolkit | No vendor lock-in, Modal/RunPod GPU deploy | GitHub clone |

How Each Skill Actually Works (Architecture Overview)

These six tools have fundamentally different architectures. Understanding the pipeline matters because it determines what you can customize, what breaks, and where the bottleneck sits.

Remotion: React Code → Renderer → MP4

Remotion is the most installed video skill on skills.sh, with 126,000+ installs, and it takes a unique approach: no AI models are involved at all. Claude writes React (JSX) components containing the animation logic, and Remotion renders the component tree frame by frame in a headless browser, then encodes those frames into video. You get a deterministic MP4 where every pixel is controlled by code. This makes Remotion ideal for data visualizations, animated charts, branded motion graphics, and any content where output must be reproducible. The tradeoff: Claude needs to write, debug, and iterate on React code, which takes 10-20 minutes for complex scenes. You also can't generate photorealistic footage; everything is programmatic.
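What "deterministic" means in practice: every visual property is a pure function of the frame number. The sketch below is a Python stand-in for that idea, not Remotion code; `interpolate`, `frame_state`, and `render_timeline` are hypothetical names, and this `interpolate` always clamps, whereas Remotion's own TypeScript `interpolate()` extends past the range by default unless you set its extrapolation options.

```python
def interpolate(frame: float, in_range: tuple[float, float],
                out_range: tuple[float, float]) -> float:
    """Map `frame` linearly from in_range onto out_range, clamping outside the range."""
    (in_start, in_end), (out_start, out_end) = in_range, out_range
    if in_end == in_start:
        raise ValueError("input range must span at least one frame")
    t = (frame - in_start) / (in_end - in_start)
    t = min(max(t, 0.0), 1.0)  # clamp both ends instead of extrapolating
    return out_start + t * (out_end - out_start)


def frame_state(frame: int) -> dict[str, float]:
    """Visual state of one frame, computed only from the frame number."""
    return {
        "opacity": interpolate(frame, (0, 15), (0.0, 1.0)),   # fade in over frames 0-15
        "title_y": interpolate(frame, (0, 30), (80.0, 0.0)),  # slide up 80px over frames 0-30
    }


def render_timeline(total_frames: int) -> list[dict[str, float]]:
    """Every frame of the clip; calling this twice yields identical results."""
    return [frame_state(f) for f in range(total_frames)]
```

In an actual Remotion component the same fade would read `interpolate(frame, [0, 15], [0, 1], {extrapolateRight: "clamp"})`, with `frame` coming from `useCurrentFrame()`. Because nothing depends on wall-clock time or randomness, re-rendering the same code produces the same frames.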

Best for: Weekly metrics dashboard videos, product explainers with exact brand colors, batch-rendering from structured data (CSV/JSON → unique video per row).
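The CSV/JSON → one-video-per-row pattern can be sketched as a script that turns each row into one `npx remotion render` call, passing the row as input props. This assumes the Remotion CLI form `npx remotion render <entry> <composition-id> <output> --props=<json>`; the entry file `src/index.ts` and composition ID `MetricsVideo` are hypothetical placeholders for your own project.

```python
import csv
import io
import json


def build_render_commands(csv_text: str, entry: str, composition_id: str,
                          out_dir: str) -> list[list[str]]:
    """One `npx remotion render` argv list per CSV row, with the row passed as props."""
    commands = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        props = json.dumps(row, sort_keys=True)  # stable key order -> reproducible props
        output = f"{out_dir}/video-{i:03d}.mp4"
        commands.append(["npx", "remotion", "render", entry, composition_id,
                         output, f"--props={props}"])
    return commands
```

Each argv list can be passed straight to `subprocess.run(cmd, check=True)`. Using a list rather than a shell string avoids quoting problems with the JSON props. Note that CSV values arrive as strings, so the composition has to parse numbers itself.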

inference.sh: CLI Gateway → 40+ AI Models

inference.sh (also called Skillsh) gives Claude direct CLI access to 40+ AI video models, including Google Veo 3.1, Seedance, Kling, Sora, and WAN 2.5. It's a unified inference gateway: one command handles model selection, file upload, and serverless execution. Pricing is pay-per-inference, with WAN models at the low end at $0.05-0.11 per video. For developers who want granular control over which model processes each generation, and who need to compare outputs across providers, inference.sh provides the most direct access. The tradeoff: you get a single raw clip, not a finished production. There's no multi-shot sequencing, no AI music, and no transitions; you handle all post-production yourself.

Best for: Model benchmarking, custom video pipelines where you control every parameter, integrating specific models into existing workflows.
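Pay-per-inference pricing adds up per clip, and every retry after a wrong model pick is another paid inference. A back-of-envelope estimator, assuming the $0.05-0.11 WAN range quoted above (other models are priced differently, and `batch_cost_cents` is a hypothetical helper), works in integer cents to avoid float rounding:

```python
def batch_cost_cents(clips: int, attempts_per_clip: int = 1,
                     price_cents: tuple[int, int] = (5, 11)) -> tuple[int, int]:
    """Low/high cost in cents; each retry counts as one more paid inference."""
    paid_inferences = clips * attempts_per_clip
    return paid_inferences * price_cents[0], paid_inferences * price_cents[1]


low, high = batch_cost_cents(15)  # e.g. 5 products x 3 shots, no retries
print(f"${low / 100:.2f}-${high / 100:.2f}")
```

With 15 WAN clips and no retries this comes to 75-165 cents ($0.75-$1.65); averaging two attempts per clip doubles it. At these prices, the developer time spent picking models and rewriting prompts tends to matter more than the inference bill.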

Pexo: Full Production Pipeline with Auto Model Selection

Pexo takes a different approach from every other skill: instead of exposing a single model or requiring code, it runs a complete production pipeline. Describe what you want in plain language (or paste a product URL, upload an image, provide a script, or feed it audio) and Pexo handles the entire workflow: script generation, scene planning, automatic model selection across Seedance 2, Kling 3.0, Veo 3.1, and 10+ other models, multi-shot rendering, AI music generation, audio mixing and mastering to -14 LUFS, and final compositing. A 15-second, 3-shot video completes in 8-10 minutes. The key differentiator is auto model selection: Pexo analyzes each shot's requirements and routes it to the optimal model automatically.

Best for: Product ads from URLs, batch e-commerce video production, marketing teams that need finished videos without post-production.
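Mastering to -14 LUFS (a common streaming loudness target) comes down to a gain calculation once the mix's integrated loudness has been measured. Measuring loudness itself requires an ITU-R BS.1770 meter and is not implemented here. This is a generic sketch of static loudness normalization, not Pexo's actual mastering chain. It also omits the true-peak limiting a real mastering step would add.

```python
def loudness_gain(measured_lufs: float, target_lufs: float = -14.0) -> tuple[float, float]:
    """Gain in dB, and the matching linear amplitude factor, to move measured loudness to target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def apply_gain(samples: list[float], factor: float) -> list[float]:
    """Scale every sample by a constant factor (no limiter applied)."""
    return [s * factor for s in samples]


def would_clip(samples: list[float]) -> bool:
    """True if any sample exceeds full scale after gain, i.e. a limiter would be needed."""
    return any(abs(s) > 1.0 for s in samples)
```

For example, a mix measured at -20 LUFS needs +6 dB, which is an amplitude factor of about 2.0. Any sample above 0.5 full scale would then clip, which is why real mastering pairs this gain with a limiter.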

HeyGen: Avatar Video via Video Agent API

HeyGen specializes in AI avatar talking head videos. Install the skill, provide your API key, and describe the presenter video you want — Claude writes the script, selects a voice, and generates a realistic talking head with natural lip sync in 175+ languages. The February 2026 Video Agent API update lets Claude call HeyGen's pipeline directly without the web app. HeyGen's Soul Avatar feature creates a persistent digital twin from your footage that maintains consistent appearance across all generated videos. The limitation is format — HeyGen primarily produces single-shot talking head content, not multi-shot product ads or cinematic B-roll footage.

Best for: Training videos, sales presentations, multilingual content, corporate communications with a consistent virtual presenter.
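Like most of these skills, avatar generation is asynchronous: you submit a job, then wait a few minutes for it to finish. Below is a generic submit-then-poll sketch. `get_status`, the `"status"` values (`"completed"`, `"failed"`), and the `"error"` field are hypothetical; they do not reproduce HeyGen's actual API schema.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[str], dict],
    job_id: str,
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a (hypothetical) job-status function until done, failed, or timed out."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        sleep(poll_interval)
```

A 10-second interval and 10-minute timeout fit the 2-5 minute talking-head generation times reported in the FAQ. Injecting `sleep` and `clock` keeps the loop testable without real waiting.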

Higgsfield: Soul ID for Character Consistency

Higgsfield differentiates through Soul ID — a persistent face model trained from 5-20 photos that maintains the same character's appearance across every generation. Unlike one-off face swaps, Soul ID creates a reusable identity that works across both image and video output. The skill supports Seedance, Kling, and Veo models, and uses the MCSLA prompt formula (Model, Camera, Subject, Look, Action) for structured generation. Combined with 17 production templates and genre-specific recipes, Higgsfield targets creators building serialized content — recurring brand characters, virtual influencers, or episodic social media series.

Best for: Content series with recurring AI characters, virtual influencer pipelines, brand ambassador consistency across campaigns.
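The MCSLA formula maps naturally onto a structured prompt object. In this sketch the `MCSLAPrompt` class, the sentence layout of `text()`, and the choice to keep Model out of the prompt text (treating it as the backend selector) are all assumptions, not Higgsfield's actual template. The point it illustrates is that a series keeps Subject and Look fixed and varies the Action, while Soul ID handles the face itself.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class MCSLAPrompt:
    model: str    # which video model to target, e.g. "Kling"
    camera: str   # shot type and camera movement
    subject: str  # who or what is on screen, e.g. a recurring character
    look: str     # lighting, palette, visual style
    action: str   # what happens in the shot

    def __post_init__(self) -> None:
        # Reject half-filled prompts: every MCSLA slot must say something.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"MCSLA field {f.name!r} must not be empty")

    def text(self) -> str:
        """Prompt text; `model` is left out and used to pick the backend instead."""
        return f"{self.camera}. {self.subject}. {self.look}. {self.action}."


def episode_prompts(base: MCSLAPrompt, actions: list[str]) -> list[MCSLAPrompt]:
    """Keep model, camera, subject, and look fixed; vary only the action per episode."""
    return [replace(base, action=a) for a in actions]
```

Because the dataclass is frozen, `replace()` builds a new validated prompt per episode, so one series definition cannot drift between episodes by accident.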

digitalsamba Video Toolkit: Open-Source Self-Hosted Stack

digitalsamba's claude-code-video-toolkit (573 GitHub stars) bundles skills, commands, and templates into a workspace with cloud GPU deployment on Modal and RunPod. It includes open-source models for voiceover (Qwen3-TTS), image generation (FLUX.2), and music (ACE-Step). The /setup wizard handles cloud configuration, voice selection, and file transfer via Cloudflare R2. For teams that want full infrastructure control and zero vendor lock-in, this is the only option where you own the entire pipeline. The tradeoff: significant setup complexity — you manage cloud GPU instances, deployments, and infrastructure, and open-source models may not match commercial alternatives like Seedance 2 or Veo 3.1 in output quality.

Best for: Teams with GPU infrastructure experience who want complete control and no recurring SaaS fees.

The Auto Model Selection Difference

This is the part most comparisons miss, and it matters more than raw model count.

inference.sh gives you access to 40+ models, but you choose which one to use for every generation. That means you need to know: Seedance 2 excels at portrait motion, Kling 3.0 handles spatial composition, Veo 3.1 is strongest for text rendering. Pick wrong, and you wait 1-3 minutes for a subpar clip, then start over with a different model and a rewritten prompt. In my testing, this research-and-prompt cycle added 15-20 minutes per video on top of actual generation time.

Pexo's auto model selection eliminates that cycle entirely. Describe the video, and the pipeline analyzes each shot — scene type, motion complexity, framing requirements — then routes to the optimal model automatically. Portrait scenes go to Seedance 2, wide-angle product shots to Kling 3.0, text-overlay sequences to Veo 3.1. Different shots in the same video can use different models, and you never have to think about which model handles what. Based on production benchmarks, auto selection delivers 73% faster turnaround compared to manual model selection workflows.
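To make per-shot routing concrete, here is a rule-based sketch built only from the model strengths listed above. It is an illustration, not Pexo's real selection logic, which is not public here. The `Shot` fields, rule order, and fallback model are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Shot:
    framing: str                    # e.g. "portrait", "wide", "medium"
    subject: str                    # e.g. "person", "product"
    has_text_overlay: bool = False


# Ordered rules: the first match wins, so text-heavy shots go to Veo even if portrait.
ROUTES: list[tuple[Callable[[Shot], bool], str]] = [
    (lambda s: s.has_text_overlay, "Veo 3.1"),                                  # text rendering
    (lambda s: s.framing == "portrait", "Seedance 2"),                          # portrait motion
    (lambda s: s.framing == "wide" and s.subject == "product", "Kling 3.0"),    # spatial composition
]


def route_model(shot: Shot, default: str = "Kling 3.0") -> str:
    """Return the model for one shot; `default` is an arbitrary fallback."""
    for matches, model in ROUTES:
        if matches(shot):
            return model
    return default
```

Routing a three-shot plan gives one model per shot, so a single video can mix Seedance 2, Kling 3.0, and Veo 3.1. A production router would likely score shots on more features than these, but the ordered-rules shape shows the idea.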

For developers building one-off experiments, manual selection gives you more control. For anyone producing videos at scale or without deep knowledge of each model's strengths, auto selection is a significant workflow improvement.

Head-to-Head: Real Workflow Comparisons

Workflow 1: Generate a Product Demo from a URL

I pasted the same Shopify product URL into every tool that supports URL input.

Pexo extracted product images, title, and description from the URL automatically. Generated a 3-shot product video with transitions, AI-generated music, and text overlays in 9 minutes. Output was a finished MP4 ready for upload — no editing required.

inference.sh doesn't accept URL input. I had to manually download the product image, write a model-specific prompt, choose between Seedance/Kling/Veo, and generate a raw 5-second clip in 2 minutes. To match Pexo's output, I'd need 3 separate generations plus manual editing for transitions and music.

Remotion can't generate AI footage, but Claude wrote React code to animate the product image with zoom effects and text overlays. Output looked clean but synthetic — no photorealistic product shots. Took 15 minutes including code debugging.

Verdict: Pexo is the only skill that goes from URL to finished video in a single step. If your workflow starts with a product page, this saves the most dev time.

Workflow 2: Batch-Generate 5 Videos

I tested producing 5 product videos from different source URLs.

Pexo accepted 5 URLs and generated 5 unique finished videos. Each video used different models based on the product type — apparel got different treatment than electronics. Total pipeline time: ~40 minutes for all 5.

inference.sh required 15 separate generations (3 shots × 5 products), each with its own manual model selection and prompting, followed by post-production editing. Estimated total: 2+ hours, not counting the editing itself.
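A quick sanity check on that estimate, reusing figures from the auto-selection section: 1-3 minutes per generation and 15-20 minutes of research and prompting per video. The sketch assumes generations run one after another and that the overhead applies once per finished video; `manual_pipeline_minutes` is a hypothetical helper.

```python
def manual_pipeline_minutes(videos: int, shots_per_video: int,
                            gen_minutes: tuple[int, int] = (1, 3),
                            overhead_per_video: tuple[int, int] = (15, 20)) -> tuple[int, int]:
    """Low/high wall-clock minutes before editing: sequential generations plus per-video prep."""
    clips = videos * shots_per_video
    low = clips * gen_minutes[0] + videos * overhead_per_video[0]
    high = clips * gen_minutes[1] + videos * overhead_per_video[1]
    return low, high
```

For 5 videos × 3 shots this gives 90-145 minutes (15-45 minutes of generation plus 75-100 minutes of prep), the same ballpark as the 2+ hour estimate. Most of the time goes to prep, not generation.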

Remotion would require writing and debugging 5 separate React compositions. Feasible if the structure is templated, but initial template development adds significant upfront time.

Verdict: For batch workflows, Pexo's pipeline approach scales linearly without additional dev effort per video.
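About 40 minutes for five videos is roughly what five sequential 8-10 minute runs would take. If a service allows several jobs in flight, bounded concurrency cuts wall-clock time without flooding the API. In this sketch, `generate` is a hypothetical async wrapper around whatever submit-and-poll call your skill exposes. Whether Pexo accepts parallel jobs, and at what limit, is an assumption to check against its documentation.

```python
import asyncio
from typing import Awaitable, Callable


async def run_batch(urls: list[str],
                    generate: Callable[[str], Awaitable[str]],
                    max_concurrent: int = 2) -> dict[str, str]:
    """Run one URL-to-video job per URL, with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(url: str) -> tuple[str, str]:
        async with sem:  # waits here while the limit is reached
            return url, await generate(url)

    # gather() keeps results in input order; a failure in any job propagates.
    results = await asyncio.gather(*(one(u) for u in urls))
    return dict(results)
```

Usage: `asyncio.run(run_batch(product_urls, my_generate, max_concurrent=2))` returns a mapping from source URL to output path or URL. A semaphore caps the number of jobs in flight, so the batch never exceeds a provider's concurrency limit.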

Workflow 3: Animated Data Dashboard

I needed an animated chart showing monthly metrics with smooth transitions.

Remotion dominated. Claude wrote React components with animated bar charts, easing functions, exact brand hex codes, and smooth data-point transitions. Output was pixel-perfect and fully deterministic — change the data, re-render, get identical animation quality. No other tool comes close for this use case.
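The core of an animated bar chart is an easing function applied to a frame-based progress value. Here is a Python sketch of that math using ease-out cubic, 1 − (1 − t)³. The function names, the 400 px maximum height, and the "all bars grow together" timing are illustrative choices. In a real Remotion composition you would typically get the same effect by passing an easing function to `interpolate()`.

```python
def ease_out_cubic(t: float) -> float:
    """Fast start, gentle stop: 1 - (1 - t)^3, with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return 1 - (1 - t) ** 3


def bar_heights(values: list[float], frame: int, duration: int,
                max_height: float = 400.0) -> list[float]:
    """Pixel height of each bar at `frame`; bars grow together over `duration` frames."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of frames")
    if not values:
        return []
    peak = max(values) or 1.0  # avoid dividing by zero when every value is 0
    progress = ease_out_cubic(frame / duration)
    return [max_height * (v / peak) * progress for v in values]
```

Halfway through the animation (frame 15 of 30) the bars are already at 87.5% of their final height, which produces the "settling" feel. Swap in next month's values and re-render, and the motion stays identical while only the data changes.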

None of the AI-based tools (Pexo, inference.sh, HeyGen, Higgsfield) is designed for precise data visualization. AI video models generate photorealistic footage, not pixel-accurate charts.

Verdict: For data viz, Remotion has no competition. It's the only tool that gives you programmatic control over every frame.

Combining Multiple Skills

You don't have to pick just one. Skills operate independently within Claude Code, and switching between them mid-session is seamless.

Common dev stacks I've seen work well:

Pexo + Remotion: Pexo handles product footage and marketing content, Remotion handles data visualizations and branded animations. Good for teams that produce both marketing and internal reporting content.
Pexo + HeyGen: Pexo generates the product footage segments, HeyGen adds talking head intros/outros with a consistent avatar presenter. Works well for product walkthrough videos.
inference.sh + Remotion: inference.sh for experimenting with raw AI model output, Remotion for production-quality programmatic content. Maximum control, maximum dev effort.

Just tell Claude which tool to use for each task in your prompt, and it switches context automatically.

FAQ

Which Claude Code video skill has the lowest setup friction?

Pexo has the lowest barrier to entry — no prompt engineering, no model selection knowledge, no code required. Describe what you want in plain language or paste a product URL, and the pipeline handles everything automatically. HeyGen is the second easiest: install the skill, add your API key, describe the avatar you want. Remotion and inference.sh both require more technical knowledge — React for Remotion, model-specific prompt expertise for inference.sh. The Video Toolkit has the highest setup friction due to cloud GPU configuration.

Can Claude Code generate videos from just a product URL?

Pexo is currently the only Claude Code video skill that supports direct URL-to-video generation. Paste a Shopify, Amazon, or other product page URL, and Pexo's pipeline extracts product images, titles, descriptions, and pricing to generate a finished multi-shot video automatically. Other skills require you to manually download assets and write prompts, adding 10-15 minutes of prep work per video.

How many AI video models can I access through Claude Code?

inference.sh provides the widest model selection with 40+ options including Google Veo 3.1, Seedance, Kling, Sora, and WAN 2.5 — but you must manually select which model to use for each generation. Pexo supports 10+ models with automatic selection based on shot requirements, routing each scene to the optimal model without developer input. Higgsfield supports 3 models (Seedance, Kling, Veo). HeyGen uses its proprietary model. Remotion uses no AI models — it renders React code into video.

What's the actual generation time for each skill?

Generation times vary significantly by skill architecture. inference.sh produces a raw single clip in 1-3 minutes — fastest for a single shot, but you need multiple clips plus editing for a finished video. HeyGen generates a talking head in 2-5 minutes. Pexo produces a finished 15-second, 3-shot video in 8-10 minutes, including script writing, multi-model rendering, AI music, and compositing. Remotion depends on code complexity — simple animations take 5-10 minutes, complex data visualizations can take 15-20 minutes for Claude to write and debug.

Is there a free or open-source option?

Remotion's skill is free to install, but Remotion itself requires a paid company license for production use at larger for-profit companies; individuals and very small teams can use it for free. digitalsamba's Video Toolkit is fully open source but requires you to set up and pay for cloud GPU instances on Modal or RunPod. Pexo, HeyGen, inference.sh, and Higgsfield all offer free tiers or trial credits, with paid plans for production volume. For pure open source with zero vendor lock-in, the Video Toolkit is the only option, but factor in cloud GPU costs and setup time.

Which skill gives the most control over the output?

Remotion gives the most granular control — you write React code that controls every pixel in every frame, making it fully deterministic and reproducible. inference.sh gives the most control over AI generation specifically — you choose the model, write the prompt, set parameters. Pexo optimizes for output quality over manual control, making production decisions automatically through its pipeline. HeyGen and Higgsfield sit in between, offering template and parameter controls without full pipeline access.

Can I use these skills in a CI/CD pipeline?

Remotion integrates most naturally with CI/CD since it's code-based — your video is defined in React, version-controlled, and renderable in any environment with Node.js. Pexo and inference.sh can be triggered programmatically through their APIs. HeyGen's Video Agent API supports automation workflows. The Video Toolkit's Modal/RunPod deployment can be integrated with CI triggers. Higgsfield's automation capabilities depend on their API access tier.
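Because a Remotion video is a function of its code and props, a CI job can skip re-rendering when nothing relevant changed. Below is a sketch of a content-addressed cache key. It assumes rendering is deterministic given these inputs, which in practice also means pinning fonts and other assets. The helper names, the `.mp4`-per-key cache layout, and the version string are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def render_cache_key(props: dict, sources: dict[str, bytes], renderer_version: str) -> str:
    """SHA-256 over everything that should fully determine the rendered video."""
    h = hashlib.sha256()
    h.update(renderer_version.encode())
    h.update(b"\0")
    # Canonical JSON so key order in the props never changes the hash.
    h.update(json.dumps(props, sort_keys=True, separators=(",", ":")).encode())
    for path in sorted(sources):  # sorted so file order never changes the key
        h.update(b"\0" + path.encode() + b"\0")
        h.update(sources[path])
    return h.hexdigest()


def needs_render(key: str, cache_dir: Path) -> bool:
    """True when no MP4 for this key exists in the cache directory yet."""
    return not (cache_dir / f"{key}.mp4").exists()
```

In a pipeline step you would read the composition's source files into `sources`, compute the key, and only run `npx remotion render` (saving the output as `<key>.mp4`) when `needs_render` returns True. AI-based skills don't fit this pattern as cleanly, because the same prompt can produce different output on each run.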

DEV Community