TL;DR
HeyGen open-sourced HyperFrames under Apache 2.0 in 2026. Instead of programmable video via React components (like Remotion), you write plain HTML with data-* attributes and GSAP timelines. The design goal is explicit: AI coding agents are the primary users, not humans.
```
npx skills add heygen-com/hyperframes
```
This single command installs five slash commands into Claude Code / Cursor / Codex / Gemini CLI and turns your agent into a video editor.
Why Another Video Framework?
The homepage headline is the thesis statement: "Now Claude Code can edit videos."
Content automation pipelines have agent-friendly tools for research, writing, and image generation. Video was the missing piece. The question HyperFrames answers is: "What abstraction level do AI agents handle best?"
The answer, according to HeyGen: HTML. Not JSX, not imperative timeline APIs, just HTML.
The Core Primitive
```html
<div id="root" data-composition-id="root"
     data-start="0" data-width="1920" data-height="1080">
  <video id="clip-1" data-start="0" data-duration="5" data-track-index="0"
         src="intro.mp4" muted playsinline></video>
  <img id="overlay" class="clip" data-start="2" data-duration="3"
       data-track-index="1" src="logo.png" />
  <audio id="bg-music" data-start="0" data-duration="9" data-track-index="2"
         data-volume="0.5" src="music.wav"></audio>
</div>
```
That is the full mental model. Four clip types:
- `<video>` — must be `muted`
- `<img>` — static visuals
- `<audio>` — separated from video
- `<div data-composition-id>` — nested compositions
A handful of `data-*` attributes cover timing (`data-start`, `data-duration`), layering (`data-track-index`), and optional volume (`data-volume`). Adding `class="clip"` tells the framework to honor the `data-start`/`data-duration` window.
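As a rough mental model of those timing attributes, here is an illustrative sketch in plain JavaScript (not the HyperFrames runtime, just the idea) of how a clip's `data-start`/`data-duration` window could resolve to visibility and local playback time:

```javascript
// Illustrative only: resolve a clip's state at a global timeline time `t`
// (seconds), given its parsed data-start/data-duration attributes.
function clipState(clip, t) {
  const end = clip.start + clip.duration;
  const active = t >= clip.start && t < end;
  return {
    active,
    // time within the clip's own window, or null when it is not showing
    localTime: active ? t - clip.start : null,
  };
}

// Matches the <img id="overlay" data-start="2" data-duration="3"> above.
const overlay = { start: 2, duration: 3 };
console.log(clipState(overlay, 1)); // { active: false, localTime: null }
console.log(clipState(overlay, 3)); // { active: true, localTime: 1 }
```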
Determinism Is Non-Negotiable
One of the seven official "must follow" rules caught my eye:
> `Math.random()` is forbidden. If you need randomness, use a seeded PRNG like `mulberry32`.
That level of commitment to determinism is rare in video tooling. The reasoning is clear: agent-driven pipelines need the same input to produce identical bytes every time; otherwise you cannot put rendering in CI.
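For reference, `mulberry32` is a widely circulated public-domain seeded PRNG, small enough to inline — the kind of deterministic drop-in the rule asks for:

```javascript
// mulberry32: a tiny 32-bit seeded PRNG. Same seed in, same sequence out,
// which is exactly what deterministic rendering needs.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    // Returns a float in [0, 1), like Math.random()
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators with the same seed stay in lockstep.
const randA = mulberry32(1234);
const randB = mulberry32(1234);
console.log(randA() === randB()); // true
```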
Other non-negotiables:
- Every timeline must register into `window.__timelines`
- `<video>` elements must be `muted` (audio goes into `<audio>` tags)
- GSAP timeline construction must be synchronous (no `async`/`await`/`fetch`)
- Timed elements require `class="clip"`
- Never call `video.play()` or touch `audio.currentTime` from scripts — the framework owns media control
- Every scene needs an entrance animation
- Scenes need transitions between them
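The first three rules combine into a pattern like the sketch below. GSAP is stubbed here so the snippet is self-contained, and the array shape of `window.__timelines` is my assumption (check the docs for the exact registration shape); the real GSAP calls are `gsap.timeline()` and `tl.from()`:

```javascript
// Sketch of the synchronous-registration rule. `gsap` is a minimal stub
// standing in for the real library, and `window` is a plain object standing
// in for the browser global.
const gsap = {
  timeline: (opts) => ({
    opts,
    steps: [],
    from(target, vars) { this.steps.push([target, vars]); return this; },
  }),
};
const window = { __timelines: [] }; // assumed shape, not confirmed API

// Build the timeline synchronously (no async/await/fetch) ...
const tl = gsap.timeline({ paused: true });
tl.from('#title', { opacity: 0, y: 40, duration: 0.6, ease: 'power2.out' });

// ... then register it so the renderer can discover it.
window.__timelines.push(tl);

console.log(window.__timelines.length); // 1
```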
Natural Language → Technical Mapping
The prompting guide includes a mapping table that does most of the work:
| Natural Language | GSAP Easing |
|---|---|
| smooth | power2.out |
| snappy | power4.out |
| bouncy | back.out |
| springy | elastic.out |
| dramatic | expo.out |
| dreamy | sine.inOut |
The same approach for caption tones maps "Hype / Corporate / Tutorial / Storytelling / Social" to specific font weights, entrance animations, and size ranges. The user describes a feeling; the framework resolves to technique.
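The table reads naturally as a lookup; a tiny illustrative helper (my own, not part of the HyperFrames API) makes the resolution explicit:

```javascript
// Mood-word to GSAP easing lookup, straight from the prompting-guide table.
const EASING_MAP = {
  smooth: 'power2.out',
  snappy: 'power4.out',
  bouncy: 'back.out',
  springy: 'elastic.out',
  dramatic: 'expo.out',
  dreamy: 'sine.inOut',
};

// Unknown words fall back to a safe default easing.
function resolveEasing(word, fallback = 'power2.out') {
  return EASING_MAP[word.toLowerCase()] ?? fallback;
}

console.log(resolveEasing('Snappy')); // "power4.out"
```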
Two Prompt Modes
Cold Start
```
10-second product intro, fade-in title, dark background, BGM, corporate mood
```
Recommended structure:
- Duration
- Aspect ratio (16:9 / 9:16 / 1:1)
- Mood (energetic / calm / premium / playful)
- Key elements
Warm Start
This is where HyperFrames shines:
```
Turn this GitHub repo into a 45-second pitch video
Turn this PDF into a 30-second summary video
```
The agent handles both research and production in a single prompt. The `/website-to-hyperframes` slash command is a first-class pipeline for URL → video.
Common Mistakes (The Debugging Cheat Sheet)
From the official Common Mistakes doc, here are the failure modes I would not have guessed:
1. Animating video element dimensions
```js
// ❌ Freezes frame rendering
gsap.to('#video1', { width: 1920, duration: 1 });

// ✅ Animate a wrapper div
gsap.to('.video-wrapper', { width: 1920, duration: 1 });
```
2. Timeline shorter than video
```js
// Extend the timeline with a zero-duration set at the video's end time
tl.set({}, {}, 283);
```
3. Oversized images
A 7000×5000 PNG costs roughly 140MB of decoded pixel data per frame (7000 × 5000 px × 4 bytes of RGBA ≈ 140MB). Keep images at 2× canvas size max.
4. Backdrop-filter stacks
16 layers of `backdrop-filter: blur()` calculated every frame will kill render performance. Cap at 2-3 layers.
Architecture
Monorepo with clean separation:
| Package | Responsibility |
|---|---|
| `hyperframes` | CLI (create, preview, lint, render) |
| `@hyperframes/core` | Types, parser, linter, runtime, frame adapter |
| `@hyperframes/engine` | Page → video capture (Puppeteer + FFmpeg) |
| `@hyperframes/producer` | Full pipeline (capture + encode + audio mix) |
| `@hyperframes/studio` | Browser-based composition editor |
| `@hyperframes/player` | Embeddable `<hyperframes-player>` web component |
| `@hyperframes/shader-transitions` | WebGL shader transitions |
The Frame Adapter pattern is the extensibility story. Adapters can bring GSAP, Lottie, CSS animations, or Three.js into the render pipeline. First-mover adapters will probably shape the ecosystem.
TTS Is Built-In
Kokoro TTS runs locally, no API key required:
```
npx hyperframes tts --text "Hello world" --voice af_heart --output narration.wav
```
Recommended voices by use case:
- Product demos: `af_heart`, `af_nova`
- Tutorials: `am_adam`, `bf_emma`
- Marketing: `af_sky`, `am_michael`
The Component Registry
Over 50 blocks are registered and installable via CLI:
```
npx hyperframes add flash-through-white
npx hyperframes add instagram-follow
npx hyperframes add data-chart
```
Categories include social overlays, shader transitions, data visualizations, and cinematic effects.
Workflow I Would Adopt
1. `npx hyperframes init my-video` (installs the skill automatically)
2. Open in Claude Code / Cursor / Codex
3. `/hyperframes` with a warm-start prompt pointing to source material
4. `npx hyperframes preview` for browser live reload
5. Small, targeted follow-up prompts: "make the title 2x larger", "add a fade-out at the end"
6. `npx hyperframes lint` to catch structural issues
7. `npx hyperframes render --preset high --output final.mp4`
Anti-Patterns to Avoid
From the prompting guide:
- Asking for React/Vue components — adds a translation layer
- Requesting 4K/60fps — 1920×1080 30fps is the sweet spot for speed
- Skipping the slash command — the agent will fall back to generic HTML video conventions
- Giant monolithic prompts — targeted, iterative edits beat one-shot mega-prompts
Requirements
- Node.js 22+
- FFmpeg
That is the entire system requirement list.
Why This Matters
The design signals a specific bet: the future of content tooling is agent-primary, human-secondary. Most frameworks treat agent support as a retrofit. HyperFrames treats it as the foundational design constraint. Whether that bet pays off or not, the engineering choices (HTML-first, deterministic rendering, slash command integration) are worth studying regardless of which tool you end up using.
Links
- Homepage: https://hyperframes.heygen.com/
- Prompting guide: https://hyperframes.heygen.com/guides/prompting
- Compositions concept: https://hyperframes.heygen.com/concepts/compositions
- Common mistakes: https://hyperframes.heygen.com/guides/common-mistakes
- GitHub: https://github.com/heygen-com/hyperframes