정상록
HeyGen HyperFrames: An Open-Source Video Framework Built for AI Agents (Not Humans)

TL;DR

HeyGen open-sourced HyperFrames under Apache 2.0 in 2026. Instead of expressing programmable video as React components (as Remotion does), you write plain HTML with data-* attributes and GSAP timelines. The design goal is explicit: AI coding agents are the primary users, not humans.

```shell
npx skills add heygen-com/hyperframes
```

This single command installs five slash commands into Claude Code / Cursor / Codex / Gemini CLI and turns your agent into a video editor.

Why Another Video Framework?

The homepage headline is the thesis statement: "Now Claude Code can edit videos."

Content automation pipelines have agent-friendly tools for research, writing, and image generation. Video was the missing piece. The question HyperFrames answers is: "What abstraction level do AI agents handle best?"

The answer, according to HeyGen: HTML. Not JSX, not imperative timeline APIs, just HTML.

The Core Primitive

```html
<div id="root" data-composition-id="root"
     data-start="0" data-width="1920" data-height="1080">
  <video id="clip-1" data-start="0" data-duration="5" data-track-index="0"
         src="intro.mp4" muted playsinline></video>
  <img id="overlay" class="clip" data-start="2" data-duration="3"
       data-track-index="1" src="logo.png" />
  <audio id="bg-music" data-start="0" data-duration="9" data-track-index="2"
         data-volume="0.5" src="music.wav"></audio>
</div>
```

That is the full mental model. Four clip types:

  • <video> — must be muted
  • <img> — static visuals
  • <audio> — separated from video
  • <div data-composition-id> — nested compositions

Five attributes cover timing (data-start, data-duration), layering (data-track-index), composition identity (data-composition-id), and optional volume (data-volume). Adding class="clip" tells the framework to honor the data-start/data-duration window.

Determinism Is Non-Negotiable

One of the seven official "must follow" rules caught my eye:

Math.random() is forbidden. If you need randomness, use a seeded PRNG like mulberry32.

That level of commitment to determinism is rare in video tooling. The reasoning is clear: agent-driven pipelines need the same input to produce identical bytes every time, otherwise you cannot put rendering in CI.
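mulberry32 is a well-known public-domain 32-bit seeded PRNG, small enough to paste inline. A minimal version, to show why it satisfies the rule (the framework itself does not ship this exact code; it merely recommends the algorithm):

```javascript
// mulberry32: a tiny 32-bit seeded PRNG — same seed, same sequence, every run.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators seeded identically produce identical streams,
// so a render in CI reproduces exactly what was rendered locally.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
console.log(rngA() === rngB()); // true
```

Seed it from something stable (a scene ID, a clip index), never from Date.now(), and the "identical bytes every time" property holds.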

Other non-negotiables:

  1. Every timeline must register into window.__timelines
  2. <video> elements must be muted (audio goes into <audio> tags)
  3. GSAP timeline construction must be synchronous (no async/await/fetch)
  4. Timed elements require class="clip"
  5. Never call video.play() or audio.currentTime from scripts — the framework owns media control
  6. Every scene needs an entrance animation
  7. Scenes need transitions between them
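Putting several of these rules together, a minimal scene might look like the sketch below. Only window.__timelines, class="clip", the muted requirement, and synchronous construction come from the rules above; the IDs, selectors, and values are illustrative, and gsap is assumed to be loaded globally:

```html
<!-- Muted video clip (rule 2) with a timed window (rule 4). -->
<video id="clip-1" class="clip" data-start="0" data-duration="5"
       data-track-index="0" src="intro.mp4" muted playsinline></video>
<h1 id="title" class="clip" data-start="0" data-duration="5">Hello</h1>

<script>
  // Synchronous timeline construction (rule 3): no async, no fetch.
  const tl = gsap.timeline();
  tl.from('#title', { opacity: 0, y: 40, ease: 'power2.out', duration: 1 }); // entrance (rule 6)
  // Never call video.play() here — the framework owns media control (rule 5).
  window.__timelines = window.__timelines || [];
  window.__timelines.push(tl); // registration (rule 1)
</script>
```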

Natural Language → Technical Mapping

The prompting guide includes a mapping table that does most of the work:

| Natural language | GSAP easing |
| --- | --- |
| smooth | power2.out |
| snappy | power4.out |
| bouncy | back.out |
| springy | elastic.out |
| dramatic | expo.out |
| dreamy | sine.inOut |

The same approach for caption tones maps "Hype / Corporate / Tutorial / Storytelling / Social" to specific font weights, entrance animations, and size ranges. The user describes a feeling; the framework resolves to technique.
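In code, such a mapping is just a lookup with a sensible fallback. This is a hypothetical sketch of the idea, not HyperFrames' internal implementation; the ease strings themselves are standard GSAP names:

```javascript
// Mood adjective → GSAP ease string, per the prompting guide's table.
const MOOD_TO_EASE = {
  smooth: 'power2.out',
  snappy: 'power4.out',
  bouncy: 'back.out',
  springy: 'elastic.out',
  dramatic: 'expo.out',
  dreamy: 'sine.inOut',
};

// Unknown adjectives fall back to a neutral ease rather than erroring.
function resolveEase(word) {
  return MOOD_TO_EASE[word.toLowerCase()] ?? 'power1.out';
}

console.log(resolveEase('Bouncy')); // back.out
```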

Two Prompt Modes

Cold Start

```
10-second product intro, fade-in title, dark background, BGM, corporate mood
```

Recommended structure:

  • Duration
  • Aspect ratio (16:9 / 9:16 / 1:1)
  • Mood (energetic / calm / premium / playful)
  • Key elements

Warm Start

This is where HyperFrames shines:

```
Turn this GitHub repo into a 45-second pitch video
Turn this PDF into a 30-second summary video
```

The agent handles both research and production in a single prompt. The /website-to-hyperframes slash command is a first-class pipeline for URL → video.

Common Mistakes (The Debugging Cheat Sheet)

From the official Common Mistakes doc, here are the failure modes I would not have guessed:

1. Animating video element dimensions

```js
// ❌ Freezes frame rendering
gsap.to('#video1', { width: 1920, duration: 1 });

// ✅ Animate a wrapper div instead
gsap.to('.video-wrapper', { width: 1920, duration: 1 });
```

2. Timeline shorter than video

```js
// Extend the timeline with a zero-duration set at the target position
tl.set({}, {}, 283);
```

3. Oversized images

A 7000×5000 PNG decodes to ~140 MB of raw pixel data. Keep images at 2× canvas size max.
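The 140 MB figure follows directly from 4 bytes per RGBA pixel, regardless of how small the compressed file is on disk. A quick check:

```javascript
// Decoded bitmap cost: width × height × 4 bytes (RGBA).
const decodedBytes = (w, h) => w * h * 4;

console.log(decodedBytes(7000, 5000) / 1e6); // 140 (MB) — the oversized PNG
console.log(decodedBytes(3840, 2160) / 1e6); // ≈33 MB at 2× a 1080p canvas
```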

4. Backdrop-filter stacks

Sixteen stacked layers of backdrop-filter: blur(), recalculated every frame, will kill render performance. Cap it at 2-3 layers.

Architecture

Monorepo with clean separation:

| Package | Responsibility |
| --- | --- |
| hyperframes | CLI (create, preview, lint, render) |
| @hyperframes/core | Types, parser, linter, runtime, frame adapter |
| @hyperframes/engine | Page → video capture (Puppeteer + FFmpeg) |
| @hyperframes/producer | Full pipeline (capture + encode + audio mix) |
| @hyperframes/studio | Browser-based composition editor |
| @hyperframes/player | Embeddable <hyperframes-player> web component |
| @hyperframes/shader-transitions | WebGL shader transitions |

The Frame Adapter pattern is the extensibility story. Adapters can bring GSAP, Lottie, CSS animations, or Three.js into the render pipeline. First-mover adapters will probably shape the ecosystem.
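To make the pattern concrete: an adapter's core job is to put the page into the exact visual state for a given time, deterministically, so the engine can screenshot frame by frame. The sketch below is entirely hypothetical — registerAdapter and seekTo are names I invented for illustration, not HyperFrames' real API:

```javascript
// Hypothetical frame-adapter registry. The key contract: seekTo(t) must be
// a pure function of t, so capturing frame N always yields the same pixels.
const adapters = new Map();

function registerAdapter(name, adapter) {
  adapters.set(name, adapter);
}

registerAdapter('gsap-like', {
  seekTo(timeline, seconds) {
    timeline.time = seconds; // stand-in for a real tl.seek(seconds) call
    return timeline.time;
  },
});

const tl = { time: 0 };
console.log(adapters.get('gsap-like').seekTo(tl, 2.5)); // 2.5
```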

TTS Is Built-In

Kokoro TTS runs locally, no API key required:

```shell
npx hyperframes tts --text "Hello world" --voice af_heart --output narration.wav
```

Recommended voices by use case:

  • Product demos: af_heart, af_nova
  • Tutorials: am_adam, bf_emma
  • Marketing: af_sky, am_michael

The Component Registry

Over 50 blocks are registered and installable via CLI:

```shell
npx hyperframes add flash-through-white
npx hyperframes add instagram-follow
npx hyperframes add data-chart
```

Categories include social overlays, shader transitions, data visualizations, and cinematic effects.

Workflow I Would Adopt

  1. npx hyperframes init my-video (installs skill automatically)
  2. Open in Claude Code / Cursor / Codex
  3. /hyperframes with a warm start prompt pointing to source material
  4. npx hyperframes preview for browser live reload
  5. Small, targeted follow-up prompts: "make the title 2x larger", "add a fade-out at the end"
  6. npx hyperframes lint to catch structural issues
  7. npx hyperframes render --preset high --output final.mp4

Anti-Patterns to Avoid

From the prompting guide:

  • Asking for React/Vue components — adds a translation layer
  • Requesting 4K/60fps — 1920×1080 30fps is the sweet spot for speed
  • Skipping the slash command — the agent will fall back to generic HTML video conventions
  • Giant monolithic prompts — targeted, iterative edits beat one-shot mega-prompts

Requirements

  • Node.js 22+
  • FFmpeg

That is the entire system requirement list.

Why This Matters

The design signals a specific bet: the future of content tooling is agent-primary, human-secondary. Most frameworks treat agent support as a retrofit; HyperFrames treats it as the foundational design constraint. Whether or not that bet pays off, the engineering choices (HTML-first markup, deterministic rendering, slash-command integration) are worth studying whichever tool you end up using.
