## TL;DR
AI agents can now write code, call APIs, and run multi-step workflows—but video editing was out of reach. Professional video editors like After Effects and DaVinci Resolve use complex, non-standard formats that LLMs don’t understand. HeyGen’s open-source HyperFrames project closes this gap: it lets AI agents compose videos using HTML, CSS, and JavaScript, then renders output as MP4, MOV, or WebM. Install it as a Claude Code skill and your agent becomes a video editor.
## Introduction
Video is the most engaging format online, yet it has long lacked an accessible toolchain for AI agents. Sora, Veo, and Runway can generate an entire video from a prompt, but you can’t iterate on scenes, tweak motion graphics, or refine specific transitions. You’re stuck with a single, uneditable output.
HeyGen launched HyperFrames on April 17, 2026, to solve this. Instead of teaching agents legacy video tools, HyperFrames lets agents use HTML. This guide explains how HyperFrames works, why it makes sense, and how to let your own agent automate video editing.
If you’re building API-driven agent workflows that produce video, you’ll also want to test orchestration—see how Apidog fits in at the end.
## Why AI Agents Couldn’t Edit Video Before
Traditional video editing software is built for humans, not code. Here’s why:
- Timeline UIs don’t map to code. Tools like After Effects and DaVinci Resolve save projects as proprietary binaries or deeply nested JSON. LLMs have almost zero training data on these formats.
- Motion graphics require visual reasoning. Animating, layering, and compositing depends on visual intuition—agents need a text-based abstraction to reason about these.
- Human-centric tooling. Render pipelines and plugin systems hide behind GUIs. Automation is limited and fragile.
Result: agents could script FFmpeg for basic tasks, but anything beyond simple overlays required human input.
## The HTML-for-Video Approach
HeyGen’s team realized LLMs are fluent in HTML, CSS, and JavaScript: the models have seen countless GSAP animations, SVGs, and browser-based motion graphics in their training data.
When prompted for complex visuals, LLMs can generate HTML and CSS with animations, layering, and transitions. All the building blocks for motion graphics exist in the browser. The missing piece: converting a sequence of HTML scenes into a rendered video.
That’s what HyperFrames does: HTML becomes video frames.
## How HyperFrames Works
HyperFrames extends standard HTML with a handful of `data-` attributes that define the video timeline. The rest is just web code.
| Attribute | Purpose |
|---|---|
| `data-composition-id` | Unique ID for the video composition |
| `data-width` / `data-height` | Output resolution in pixels |
| `data-start` | Scene start time in seconds |
| `data-duration` | Scene duration in seconds |
| `data-track-index` | Layering order for overlapping scenes |
You write a normal HTML file. HyperFrames parses these attributes, runs the page in a headless browser, captures frames, and encodes them as video using FFmpeg. No new DSLs, scene graphs, or keyframe editors—just HTML, CSS, and JavaScript.
### Minimal Example
Here’s a 5-second video with two scenes: a title fading in, then a blur crossfade to a closing screen.
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
  body { margin:0; width:1920px; height:1080px; overflow:hidden; background:#0D1B2A; }
  .scene { position:absolute; inset:0; width:1920px; height:1080px; overflow:hidden; background:#0D1B2A; }
  #scene2 { z-index:2; opacity:0; }
  .s1 { display:flex; flex-direction:column; justify-content:center; padding:120px 160px; gap:20px; }
  .s2 { display:flex; flex-direction:column; justify-content:center; align-items:center; padding:100px 160px; gap:32px; }
</style>
</head>
<body>
<div id="root" data-composition-id="hyperframes-intro"
     data-width="1920" data-height="1080" data-start="0" data-duration="5">
  <div id="scene1" class="scene">
    <div class="s1">
      <div class="s1-title">HTML is Video</div>
      <div class="s1-sub">Compose. Animate. Render.</div>
    </div>
  </div>
  <div id="scene2" class="scene">
    <div class="s2">
      <div class="s2-title">Start composing.</div>
    </div>
  </div>
</div>
<script>
  window.__timelines = window.__timelines || {};
  const tl = gsap.timeline({ paused: true });

  // Scene 1: title entrance
  tl.from(".s1-title", { x:-40, opacity:0, duration:0.5, ease:"power3.out" }, 0.25);
  tl.from(".s1-sub", { y:15, opacity:0, duration:0.4, ease:"power2.out" }, 0.5);

  // Blur crossfade transition
  const T = 2.2;
  tl.to("#scene1", { filter:"blur(8px)", scale:1.03, opacity:0, duration:0.35, ease:"power2.inOut" }, T);
  tl.fromTo("#scene2",
    { filter:"blur(8px)", scale:0.97, opacity:0 },
    { filter:"blur(0px)", scale:1, opacity:1, duration:0.35, ease:"power2.inOut" }, T + 0.08);

  window.__timelines["hyperframes-intro"] = tl;
</script>
</body>
</html>
```
Key points:
- Animation is pure GSAP. Any agent trained on GSAP can build these timelines.
- HyperFrames overhead is minimal: only a few `data-` attributes on the root element.
Render this file and you get a 1920x1080 MP4. Updates are as simple as editing HTML.
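The timing attributes from the table can also describe multiple scenes and layers in one composition. The sketch below is hypothetical — it assumes per-scene `data-start`/`data-duration`/`data-track-index` placement, which may differ from the exact HyperFrames schema:

```html
<!-- Hypothetical sketch, not copied from the HyperFrames docs:
     two sequential scenes on track 0, with a lower-third overlay
     on track 1 layered above them for the full 8 seconds. -->
<div id="root" data-composition-id="two-scene-demo"
     data-width="1920" data-height="1080" data-start="0" data-duration="8">
  <div class="scene" data-start="0" data-duration="4" data-track-index="0">
    Scene one content
  </div>
  <div class="scene" data-start="4" data-duration="4" data-track-index="0">
    Scene two content
  </div>
  <div class="overlay" data-start="0" data-duration="8" data-track-index="1">
    Lower-third overlay
  </div>
</div>
```

Because `data-track-index` resolves layering, overlapping elements like the lower third don’t need manual z-index bookkeeping across scenes.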
## What the Agent Can Use
Because HyperFrames runs a real browser, agents can leverage:
- CSS animations and transitions for simple movement
- GSAP timelines for advanced choreography
- SVG for vector graphics and path animation
- Canvas for custom drawing or particles
- Three.js for 3D scenes
- D3.js for data visualizations
- Lottie for After Effects imports
- Web fonts from Google Fonts or custom sources
- Background images/video via `<img>` or `<video>`
No wrappers, no new frameworks—agents use what they already know.
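As a tiny illustration of the first bullet, a scene can move with no library at all — plain CSS keyframes suffice (the `.pulse` class and markup here are invented for the example):

```html
<!-- A self-contained scene animated purely with CSS keyframes:
     the .pulse element fades in and scales up over one second. -->
<div class="scene">
  <div class="pulse">Pure CSS motion</div>
</div>
<style>
  .pulse {
    animation: enter 1s ease-out both;
  }
  @keyframes enter {
    from { opacity: 0; transform: scale(0.8); }
    to   { opacity: 1; transform: scale(1); }
  }
</style>
```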
## Add Video Editing to Your Agent in One Command
If you use Claude Code, install HyperFrames with:
```shell
npx skills add heygen-com/hyperframes
```
This pulls the skill from GitHub, sets up dependencies, and registers video editing for your agent.
Example prompt:
```
Build me a 10-second product explainer video for a new API.
Start with a dark gradient background, animate the product name
sliding up from the bottom with a fade, then cut to three
bullet points with icons, end on a call-to-action card.
```
Your agent writes the HTML, previews it locally, and renders to MP4. No API keys or external services required.
## Setting Up Without Claude Code
HyperFrames is framework-agnostic. Any agent that can run shell commands and access files can use it.
Clone and install:
```shell
git clone https://github.com/heygen-com/hyperframes
cd hyperframes
npm install
```
Render a composition:
```shell
npx hyperframes render my-video.html --output my-video.mp4
```
Preview locally:
```shell
npx hyperframes preview my-video.html
```
The preview opens a browser window for scrubbing through the timeline and checking frame accuracy before rendering.
## What This Unlocks for Developers
Immediate use cases:
- Automated product marketing: Generate release videos from changelogs, render and upload—no human timeline editing.
- Personalized video responses: Trigger agents via webhooks to create custom clips for user events—welcome videos, receipts, milestones.
- Data storytelling: Feed metrics to agents, generate D3 visualizations wrapped in HyperFrames scenes, and output narrated dashboards.
- Dynamic B-roll: Generate motion graphics for podcasts or long-form content, layered over audio.
- API documentation videos: Parse OpenAPI specs, generate animated endpoint walkthroughs, and export as shareable videos.
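For use cases like data storytelling, the scene markup itself can be generated programmatically. A minimal sketch — the `metricsToScenes` helper and its timing scheme are invented for illustration, not a HyperFrames API:

```javascript
// Hypothetical helper (not a HyperFrames API): turns an array of
// metrics into sequential scene markup, one scene per data point.
// Timing follows the attribute table: each scene starts where the
// previous one ends and runs for `secondsPerScene` seconds.
function metricsToScenes(metrics, secondsPerScene = 2) {
  return metrics
    .map((m, i) => [
      `<div class="scene" data-start="${i * secondsPerScene}"`,
      `     data-duration="${secondsPerScene}" data-track-index="0">`,
      `  <h1>${m.label}</h1>`,
      `  <p class="value">${m.value}</p>`,
      `</div>`,
    ].join("\n"))
    .join("\n");
}

// Example: two metrics become two back-to-back 2-second scenes.
const html = metricsToScenes([
  { label: "Signups", value: "1,204" },
  { label: "API calls", value: "2.3M" },
]);
console.log(html);
```

An agent (or a plain cron job) can write this output into a full HTML page and hand it to the render command — no timeline editing involved.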
## Testing Agent Orchestration with Apidog
HyperFrames handles rendering. Upstream, you need reliable orchestration: agent loops, tool calls, LLM API requests, and logic for picking video content.
Here’s where things often break: malformed payloads, API timeouts, or schema mismatches can halt the video pipeline before rendering starts.
Apidog provides a robust testbed for these scenarios:
- Mock LLM endpoints: Build dummy Claude or OpenAI endpoints with exact schemas. Test your pipeline’s error handling before incurring real API costs.
- Validate tool-use payloads: If your agent calls external APIs, set up those endpoints in Apidog and chain them into test runs. Verify agent API calls match your requirements.
- Track token usage: Claude Opus 4.7 uses a new tokenizer and can generate up to 35% more tokens. Apidog’s usage tracking helps you optimize prompts before costs escalate.
- Debug multi-turn flows: Video generation usually takes 5–10 LLM turns. Use Apidog’s replay tools to track where your agent gets stuck or off-track.
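A cheap first line of defense against the malformed payloads mentioned above is a schema guard in front of the render step. This sketch invents a request shape for illustration — it is not an Apidog or HyperFrames schema:

```javascript
// Hypothetical guard (the request shape is invented for illustration,
// not an Apidog or HyperFrames schema): reject malformed render
// requests before they halt the pipeline mid-run.
function validateRenderRequest(payload) {
  const errors = [];
  if (typeof payload.compositionId !== "string" || payload.compositionId === "") {
    errors.push("compositionId must be a non-empty string");
  }
  if (!Number.isInteger(payload.width) || payload.width <= 0) {
    errors.push("width must be a positive integer");
  }
  if (!Number.isInteger(payload.height) || payload.height <= 0) {
    errors.push("height must be a positive integer");
  }
  if (!["mp4", "mov", "webm"].includes(payload.format)) {
    errors.push("format must be one of mp4, mov, webm");
  }
  return errors;
}

// Example: a well-formed request produces no errors.
console.log(validateRenderRequest({
  compositionId: "hyperframes-intro",
  width: 1920,
  height: 1080,
  format: "mp4",
})); // → []
```

Running the same check against mocked endpoints lets you exercise the failure paths without spending real LLM tokens.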
## The Philosophical Argument
HeyGen argues that HTML isn’t just convenient for agent-generated video—it’s the right format for the future.
Traditional video is locked in proprietary formats. HTML is open, versionable, searchable, and works with every text tool.
HTML-based video means:
- Diffable in git: See exactly what changed between versions.
- Componentizable: Title cards as React components, motion graphics as modules.
- Responsive: Render at any resolution or aspect ratio.
- Accessible: Screen readers can parse the source; alt text is built-in.
- Searchable: Text remains text, not pixels.
All of this works in browsers today. HyperFrames bridges browser-native content and high-quality video outputs.
## Limitations to Know About
HyperFrames is early-stage. Real-world limitations include:
- Render speed: Complex scenes (Three.js, Canvas shaders) take longer to encode. Plan accordingly.
- Live video input: Embedding `<video>` tags works, but live feeds or streams require additional code.
- Audio support: Basic audio tracks work; advanced mixing still needs FFmpeg post-processing.
- Agent creativity: Output quality still depends on the LLM. Opus 4.7 currently gives the best results.
Account for these in production workflows.
## Getting Started Checklist
Ready to try HyperFrames?
- [ ] Install Claude Code (or your preferred agent)
- [ ] Run `npx skills add heygen-com/hyperframes`
- [ ] Prompt your agent for a simple 5-second video
- [ ] Render and inspect the MP4 output
- [ ] Iterate: tweak style, timing, or scenes
- [ ] For API-driven workflows, set up LLM and tool endpoints in Apidog
- [ ] Build one real video (product teaser, data story, release summary)
- [ ] Star the repo: github.com/heygen-com/hyperframes
## Conclusion
AI agents have coded for years, but video editing was the last creative frontier needing humans in the loop. HyperFrames changes that, letting agents compose video with HTML, CSS, and JavaScript.
The approach is simple, flexible, and powerful enough for broadcast-quality graphics. If you need video output—marketing automation, personalized content, data storytelling, or agent-driven docs—HyperFrames should be in your stack.
For testing API and orchestration layers, use Apidog to catch issues before scaling. Failed API calls mean no MP4.
