TL;DR
AI agents can write code, call APIs, and run multi-step workflows. Until now, one capability kept eluding them: editing video. Professional tools like After Effects and DaVinci Resolve use layered timelines and JSON scene graphs that LLMs weren’t trained on. HeyGen’s open-source project, HyperFrames, flips the approach: agents compose video with HTML, CSS, and JavaScript, then render the result to MP4, MOV, or WebM. Install it as a Claude Code skill with one command, and your agent becomes a video editor.
Introduction
Video is one of the most engaging formats on the web, but it has been harder for AI agents to produce than text, code, images, or charts.
Prompt-to-video tools like Sora, Veo, and Runway can generate a full clip from text, but the output is usually monolithic. You can’t easily compose scenes, iterate on motion graphics, overlay precise brand animations, or ask the agent to “redo scene 3 with a slower fade.”
HeyGen shipped HyperFrames on April 17, 2026 to close this gap. Instead of teaching agents traditional video software, HyperFrames gives them a format they already know: HTML.
This guide covers:
- Why traditional video editing is hard for agents
- How HyperFrames maps HTML to video timelines
- How to install and render your first composition
- Where API testing and orchestration with Apidog fit into the workflow
Why AI agents couldn’t edit video before
Traditional video tools were built for people clicking on timelines, not agents generating source code.
The main blockers:
- Timeline UIs don’t map cleanly to code
Tools like After Effects, Premiere, and DaVinci Resolve use proprietary project formats or deeply nested scene graphs. Even if an agent can inspect those files, there is very little public training data for models to learn the structure.
- Motion graphics require visual composition
Keyframes, easing, layer blending, and transitions are usually tuned by eye. Agents need a text-first representation they can reason about, modify, diff, and regenerate.
- Automation APIs are limited
Scripting tools like ExtendScript can automate some editor workflows, but they are narrow and fragile compared with a normal web development stack.
The result: agents could call ffmpeg, stitch clips, and add basic overlays. Anything more advanced usually required a human editor.
The HTML-for-video insight
HeyGen’s core observation is practical: LLMs already know the web stack.
Models have seen massive amounts of HTML, CSS, JavaScript, SVG, Canvas, GSAP, Lottie, and animation examples. If you ask a strong model to create an animated landing page section, it can usually write working front-end code.
That means agents already know how to:
- Position elements with CSS
- Animate with CSS keyframes or GSAP
- Render SVG paths
- Layer scenes with `z-index` and opacity
- Tween between visual states
- Use Canvas, D3, Three.js, and Lottie
HyperFrames uses those browser primitives as the authoring format, then renders them into video frames.
How HyperFrames works
HyperFrames adds timeline metadata to normal HTML using data-* attributes.
| Attribute | Purpose |
|---|---|
| `data-composition-id` | Unique ID for the video composition |
| `data-width` / `data-height` | Output resolution in pixels |
| `data-start` | Scene start time in seconds |
| `data-duration` | Scene duration in seconds |
| `data-track-index` | Layering order for overlapping scenes |
The workflow is:
- The agent writes a normal HTML file.
- HyperFrames reads the timeline attributes.
- It runs the page in a headless browser.
- It captures frames at the target frame rate.
- It encodes the output with FFmpeg.
There is no new DSL, proprietary scene graph, or keyframe editor. Your animation logic can stay in GSAP, CSS animations, SVG, Canvas, or other browser-native tools.
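To make that pipeline concrete, here is a rough sketch of how a headless-browser renderer of this kind can work, using Puppeteer to capture frames and FFmpeg to encode them. This is an illustration of the technique, not HyperFrames' actual implementation: the file name, composition ID, and frame rate are placeholders, and it assumes the composition registers a paused GSAP timeline in `window.__timelines`, as the example in the next section does.

```javascript
// render-sketch.js — illustrative capture loop; HyperFrames' real renderer may differ.
const fs = require("fs");
const path = require("path");
const { execSync } = require("child_process");
const puppeteer = require("puppeteer");

const FPS = 30;                            // target frame rate (placeholder)
const DURATION = 5;                        // seconds, matching data-duration
const COMPOSITION_ID = "hyperframes-intro";

(async () => {
  fs.mkdirSync("frames", { recursive: true });

  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 }); // from data-width / data-height
  await page.goto("file://" + path.resolve("my-video.html"));

  // Step the paused GSAP timeline to each frame's timestamp and screenshot that state.
  for (let frame = 0; frame < DURATION * FPS; frame++) {
    const t = frame / FPS;
    await page.evaluate(
      (id, time) => window.__timelines[id].seek(time),
      COMPOSITION_ID,
      t
    );
    const name = String(frame).padStart(4, "0");
    await page.screenshot({ path: `frames/frame-${name}.png` });
  }
  await browser.close();

  // Encode the captured frames into an MP4 with FFmpeg.
  execSync(
    `ffmpeg -y -framerate ${FPS} -i frames/frame-%04d.png -c:v libx264 -pix_fmt yuv420p out.mp4`
  );
})();
```

In practice the hyperframes CLI handles all of this for you; the point is that everything the renderer needs is already in the HTML file and the registered timeline.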
A minimal HyperFrames composition
Here is a 5-second video composition with two scenes:
- Scene 1: title card fades in
- Scene 2: closing card appears through a blur crossfade
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
<style>
body {
margin: 0;
width: 1920px;
height: 1080px;
overflow: hidden;
background: #0D1B2A;
font-family: system-ui, sans-serif;
color: white;
}
.scene {
position: absolute;
inset: 0;
width: 1920px;
height: 1080px;
overflow: hidden;
background: #0D1B2A;
}
#scene2 {
z-index: 2;
opacity: 0;
}
.s1 {
display: flex;
flex-direction: column;
justify-content: center;
padding: 120px 160px;
gap: 20px;
}
.s2 {
display: flex;
flex-direction: column;
justify-content: center;
align-items: center;
padding: 100px 160px;
gap: 32px;
}
.s1-title,
.s2-title {
font-size: 96px;
font-weight: 800;
letter-spacing: -0.04em;
}
.s1-sub {
font-size: 40px;
opacity: 0.8;
}
</style>
</head>
<body>
<div
id="root"
data-composition-id="hyperframes-intro"
data-width="1920"
data-height="1080"
data-start="0"
data-duration="5"
>
<div id="scene1" class="scene">
<div class="s1">
<div class="s1-title">HTML is Video</div>
<div class="s1-sub">Compose. Animate. Render.</div>
</div>
</div>
<div id="scene2" class="scene">
<div class="s2">
<div class="s2-title">Start composing.</div>
</div>
</div>
</div>
<script>
window.__timelines = window.__timelines || {};
const tl = gsap.timeline({ paused: true });
// Scene 1: title entrance
tl.from(".s1-title", {
x: -40,
opacity: 0,
duration: 0.5,
ease: "power3.out"
}, 0.25);
tl.from(".s1-sub", {
y: 15,
opacity: 0,
duration: 0.4,
ease: "power2.out"
}, 0.5);
// Blur crossfade transition
const T = 2.2;
tl.to("#scene1", {
filter: "blur(8px)",
scale: 1.03,
opacity: 0,
duration: 0.35,
ease: "power2.inOut"
}, T);
tl.fromTo("#scene2",
{
filter: "blur(8px)",
scale: 0.97,
opacity: 0
},
{
filter: "blur(0px)",
scale: 1,
opacity: 1,
duration: 0.35,
ease: "power2.inOut"
},
T + 0.08
);
window.__timelines["hyperframes-intro"] = tl;
</script>
</body>
</html>
Two implementation details matter:
- The animation logic is plain GSAP.
- The HyperFrames-specific part is only the `data-*` metadata on the root element.
To customize it, edit the HTML like any other front-end file:
- Change copy
- Swap colors
- Add a logo
- Use a web font
- Add SVG illustrations
- Add more scenes
- Adjust animation timing
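For example, swapping in a web font (one of the edits listed above) is an ordinary change to the `<head>`; the font choice here is purely illustrative:

```html
<!-- In <head>: load a Google Font (the specific font is just an example) -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;800&display=swap" rel="stylesheet">

<style>
  body {
    /* replaces the system-ui stack used in the example above */
    font-family: "Inter", system-ui, sans-serif;
  }
</style>
```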
What agents can use inside a composition
Because HyperFrames renders through a browser, your agent can use standard web technologies:
- CSS animations and transitions
- GSAP timelines
- SVG shapes and path animations
- Canvas animations
- Three.js scenes
- D3.js visualizations
- Lottie animations
- Google Fonts or custom web fonts
- Background videos with `<video>`
- Images with `<img>`
There is no separate plugin architecture for the agent to learn. It writes browser code and HyperFrames renders it.
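As a small illustration, a scene does not even need GSAP; browser-native CSS keyframes can animate an element inside a scene container too. The class names and timing below are made up for this sketch, and exact synchronization with the capture loop depends on how the renderer drives animations:

```html
<div id="scene3" class="scene">
  <div class="badge">New in v2.0</div>
</div>

<style>
  /* CSS-only entrance; no JavaScript timeline needed for this element */
  .badge {
    font-size: 64px;
    animation: badge-pop 0.6s ease-out both;
  }
  @keyframes badge-pop {
    from { transform: scale(0.8); opacity: 0; }
    to   { transform: scale(1);   opacity: 1; }
  }
</style>
```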
Install HyperFrames with Claude Code
HyperFrames ships as a Claude Code skill.
If you use Claude Code, install it with:
npx skills add heygen-com/hyperframes
This fetches the skill from HeyGen’s GitHub repository, installs the toolchain, and registers video editing as an agent capability.
Then prompt your agent like this:
Build me a 10-second product explainer video for a new API.
Use a dark gradient background. Start with the product name sliding up
from the bottom with a fade. Then show three bullet points with icons.
End on a call-to-action card.
The agent can then:
- Generate the HTML composition
- Preview it locally
- Revise timing and styling
- Render the final MP4
According to the original article, this workflow runs locally without API keys or external rendering services.
Set up HyperFrames without Claude Code
HyperFrames is framework-agnostic. Any agent that can run shell commands and edit files can use it.
Clone and install:
git clone https://github.com/heygen-com/hyperframes
cd hyperframes
npm install
Render a composition:
npx hyperframes render my-video.html --output my-video.mp4
Preview a composition:
npx hyperframes preview my-video.html
Use preview before rendering final output. It lets you inspect timing, scene transitions, and frame-level behavior before encoding the full video.
Practical agent workflow
A useful development loop looks like this:
- Generate a storyboard
Ask the agent for a scene-by-scene plan:
Create a 15-second video storyboard for a product release note.
Include scene duration, visual layout, motion, and copy.
- Generate the first HTML composition
Ask for one HTML file using HyperFrames metadata and GSAP timelines.
- Preview locally
npx hyperframes preview release-video.html
- Iterate on specific problems
Instead of asking for broad changes, give targeted instructions:
Make scene 2 last 1 second longer.
Slow the bullet point entrance animation.
Keep the same colors and layout.
- Render the final output
npx hyperframes render release-video.html --output release-video.mp4
- Post-process if needed
Use FFmpeg for advanced audio mixing, compression, or format conversion.
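For instance, muxing a voiceover track into the rendered file and re-encoding it is a standard FFmpeg invocation; the file names and bitrate here are placeholders:

```bash
ffmpeg -i release-video.mp4 -i voiceover.mp3 \
  -c:v libx264 -crf 23 -preset medium \
  -c:a aac -b:a 192k -shortest \
  release-video-final.mp4
```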
Developer use cases
HyperFrames is most useful when video generation is part of an automated workflow.
Automated product marketing
An agent can read release notes, generate a short product teaser, render it, and ship the MP4 to your CDN.
Personalized video responses
A webhook can trigger an agent to generate a video for a user-specific event, such as:
- Welcome messages
- Receipts
- Milestone celebrations
- Renewal reminders
Data storytelling
An agent can take metrics, generate D3 visualizations, wrap them in HyperFrames scenes, and render a narrated dashboard summary.
Podcast and long-form content graphics
An agent can read a transcript, identify key points, and generate animated B-roll or motion graphics to layer over audio.
API documentation videos
An agent can parse an OpenAPI spec and generate endpoint walkthroughs with animated request/response diagrams.
Testing the orchestration layer with Apidog
HyperFrames handles rendering. The rest of the system is orchestration:
- Agent loops
- LLM API requests
- Tool calls
- Asset retrieval
- Webhooks
- Brand kit lookups
- Render job triggers
- Error handling
That is where production failures usually happen. A malformed tool payload, timed-out LLM request, incorrect tool_use_id, or mismatched message schema can break the pipeline before HyperFrames renders a single frame.
Apidog helps test those API-driven parts.
Mock LLM endpoints
Create dummy Claude or OpenAI-compatible endpoints in Apidog with the schema your agent expects.
Use mocks to test:
- Valid responses
- Malformed responses
- Delayed responses
- Empty tool calls
- Unexpected message structures
This lets you validate pipeline behavior before spending real LLM tokens.
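For a Claude-style mock, the response body you configure would follow the Anthropic Messages API shape; all values below are placeholders:

```json
{
  "id": "msg_mock_001",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "<!DOCTYPE html> ...generated composition..." }
  ],
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 1200, "output_tokens": 3400 }
}
```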
Validate tool-use payloads
If your agent calls APIs for assets, stock footage, user data, or brand kits, define those endpoints in Apidog and test the request/response contracts.
Check that the agent sends:
- Required fields
- Correct JSON structure
- Valid IDs
- Expected headers
- Proper authentication format
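As a concrete check, agents built on the Anthropic Messages API emit tool calls as tool_use content blocks, and the follow-up message must echo the same id in tool_use_id. The tool name, id, and input below are hypothetical:

```
// Assistant turn: the model requests a tool call
{ "type": "tool_use", "id": "toolu_01A", "name": "get_brand_kit", "input": { "org_id": "acme" } }

// Next user turn: the result block must reference the same id
{ "type": "tool_result", "tool_use_id": "toolu_01A", "content": "{\"primary_color\": \"#0D1B2A\"}" }
```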
Track token usage
Large video compositions can include long prompts, CSS, scene descriptions, and JavaScript timelines.
The original article notes that Claude Opus 4.7 uses a tokenizer that can produce up to 35% more tokens than Opus 4.6. For agent-generated video, token sizing matters because HTML and animation code can grow quickly.
Use Apidog’s usage tracking to estimate prompt size and avoid cost surprises.
Replay multi-turn agent flows
A full video generation flow often takes several turns:
- Plan the video
- Generate scenes
- Write HTML
- Preview
- Fix layout
- Adjust animation timing
- Render
- Finalize output
Apidog can help replay and inspect those API conversations when the agent goes off track.
Why HTML is a strong video format for agents
HeyGen’s broader argument is that HTML is not just convenient for agent-generated video. It may be a better interchange format for many video workflows.
Compared with proprietary video project files, HTML is:
- Open
- Versionable
- Searchable
- Editable with standard tools
- Familiar to LLMs
- Compatible with the browser runtime
That means HTML-based video compositions can be:
- Diffable in Git: You can review exactly what changed between revisions.
- Componentized: A title card can become a reusable component. A motion graphic can become an importable module.
- Responsive: The same composition can be adapted for 1080p, 4K, or vertical 9:16 layouts (see the sketch below).
- Accessible: Text and semantic structure exist in source form instead of being flattened into pixels.
- Searchable: Text inside the video starts as actual text, not OCR output.
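As a sketch of that responsive point, retargeting the earlier composition for a vertical 9:16 render mostly means changing the declared resolution and the layout CSS; the values below are illustrative:

```html
<div
  id="root"
  data-composition-id="hyperframes-intro"
  data-width="1080"
  data-height="1920"
  data-start="0"
  data-duration="5"
>
  <!-- same scenes as before -->
</div>

<style>
  /* match the body and scene containers to the new canvas */
  body, .scene { width: 1080px; height: 1920px; }
  .s1 { padding: 160px 80px; }  /* tighter horizontal padding for the narrow frame */
</style>
```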
HyperFrames bridges browser-native content and rendered video output.
Limitations to plan for
HyperFrames is version 1, so plan around these constraints:
- Render speed depends on complexity: A simple GSAP text animation renders faster than a composition with Three.js particles or Canvas shaders.
- Live video input is limited: You can embed `<video>` tags, but real-time camera feeds or streaming sources need additional glue code.
- Audio support is basic: You can add audio tracks, but advanced mixing, ducking, EQ, or noise reduction still requires FFmpeg post-processing.
- Output quality depends on the model: The original article notes that Opus 4.6 and Gemini 3 were the first models to produce consistent, aesthetically strong output from plain prompts, and that Opus 4.7 is currently best for this workflow.
These are not blockers, but they matter if you are building a production pipeline.
Getting started checklist
Use this checklist to try HyperFrames:
- [ ] Install Claude Code or prepare another agent that can run shell commands
- [ ] Run `npx skills add heygen-com/hyperframes`
- [ ] Ask the agent to create a simple 5-second video
- [ ] Preview the generated HTML
- [ ] Render the MP4
- [ ] Iterate on layout, copy, timing, and scene count
- [ ] For API-driven workflows, define LLM and tool endpoints in Apidog
- [ ] Build one real video, such as a product teaser, data story, or release note summary
- [ ] Star the GitHub repo at github.com/heygen-com/hyperframes
Conclusion
HyperFrames gives AI agents a practical way to edit video by using the stack they already understand: HTML, CSS, and JavaScript.
Instead of forcing agents into timeline-based desktop tools, it turns browser-native compositions into rendered video files. That makes video generation easier to automate, inspect, version, and integrate into developer workflows.
If your system generates marketing videos, personalized clips, documentation walkthroughs, or data stories, HyperFrames can handle the composition and render step. For the surrounding API workflow, test your agent conversations, tool calls, and LLM requests with Apidog before scaling.
