WonderLab

Posted on Jun 22

Open Source Project of the Day (#102): OpenMontage — Turn Your AI Coding Assistant Into a Full Video Production Studio

#ai #opensource #claude #video

Introduction

"12 production pipelines, 52 tools, 500+ agent skills — turn your AI coding assistant into a full video production studio."

This is article #102 in the Open Source Project of the Day series. Today's project is OpenMontage — an open-source agentic video production system that uses Claude Code, Cursor, or Codex as its execution engine, turning natural language descriptions into fully produced videos.

Most AI video tools produce a single clip: enter a prompt, get five seconds of generated footage. OpenMontage's scope is different. It models a complete production team: researcher, scriptwriter, storyboard artist, asset creator, editor, compositor, quality reviewer — each stage has a corresponding agent skill, executed in pipeline order by your AI coding assistant.

The starting point is a sentence in plain language. The ending point is a quality-validated video file. The whole process runs inside your AI coding assistant with no tool switching.

What You'll Learn

OpenMontage's three-layer knowledge architecture: how Tools, Skills, and Pipeline Defs work together
All 12 production pipelines and what they cover
The zero-cost path: what you can produce without spending a dollar
Quality governance design: pre-compose validation, slideshow risk scoring, budget controls
The 7-dimension provider scoring system: how the AI selects which video generation service to use
Reference video analysis: what happens when you paste a YouTube URL

Prerequisites

Experience with Claude Code, Cursor, or a similar AI coding tool
Familiarity with basic video production concepts (script, shots, voiceover, editing)
Basic Python experience

Project Background

What Is OpenMontage?

OpenMontage is an agentic video production system — "turn your AI coding assistant into a complete video production studio."

The problem it addresses isn't "generate a video clip with AI." It's the full end-to-end production pipeline from scratch to delivery. Research, scripting, storyboarding, asset generation, editing, compositing, quality review — in traditional video production these are separate roles; in OpenMontage they're separate agent skills, executed sequentially by your AI assistant.

A second design focus is the "animated stills" problem: most AI video tools produce the visual effect of motion by animating static image frames. OpenMontage can source real motion footage from free archives — Pexels, Pixabay, Archive.org, NASA, Wikimedia Commons — and cut a proper montage from it.

Author

Author: calesthio
Community: GitHub Discussions (Show and Tell, Ideas, Q&A)
License: AGPL-3.0
Video channel: @OpenMontage on YouTube

Project Stats

⭐ GitHub Stars: 9,000+
🍴 Forks: 1,300+
🛠 Production tools: 52
🎬 Pipelines: 12
📚 Agent Skills: 500+
📄 License: AGPL-3.0

Core Features

What It Does

Typical AI video tool:
Prompt → generates a single video clip → user assembles manually

OpenMontage:
"Make a 3-minute explainer about quantum computing"
           ↓
   [Research] → gather background information and facts
           ↓
   [Proposal] → generate production plan with cost estimate
           ↓
   [Script] → complete narration text
           ↓
   [Scene Plan] → visual planning for each scene
           ↓
   [Assets] → generate/source video, images, voiceover, music
           ↓
   [Edit] → assemble timeline
           ↓
   [Compose] → render final video
           ↓
   [Quality Review] → frame validation + audio analysis + delivery check
           ↓
   Complete video file
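
The stage order in the diagram can be pictured as a simple sequential runner. The following is a minimal Python sketch, not OpenMontage's actual code: the stage names come from the diagram, while `run_pipeline`, `make_stub`, and the shared `state` dict are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

# Stage names taken from the diagram above; everything else is illustrative.
STAGES: List[str] = [
    "research", "proposal", "script", "scene_plan",
    "assets", "edit", "compose", "quality_review",
]

Handler = Callable[[dict], dict]


def run_pipeline(brief: str, handlers: Dict[str, Handler]) -> dict:
    """Run each stage in order, passing a shared state dict along."""
    state = {"brief": brief, "completed": []}
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            raise KeyError(f"no handler registered for stage {stage!r}")
        state = handler(state)
        state["completed"].append(stage)
    return state


def make_stub(stage: str) -> Handler:
    """Hypothetical stand-in for a real stage: records a placeholder output."""
    def stub(state: dict) -> dict:
        return {**state, stage: f"{stage} output"}
    return stub


handlers = {stage: make_stub(stage) for stage in STAGES}
result = run_pipeline("Make a 3-minute explainer about quantum computing", handlers)
print(result["completed"])
```

In the real system each stage is carried out by the AI coding assistant following a skill, so the stubs here only stand in for that work.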

12 Production Pipelines

Pipeline	Output
Animated Explainer	Research-backed educational animation
Documentary Montage	Real-footage montage in documentary style
Cinematic	Trailers, teasers, mood pieces
Clip Factory	Batch short-form clips from long content
Talking Head	Speaker/presenter video
Avatar Spokesperson	Digital avatar presentation
Localization & Dub	Multi-language translation and dubbing
Screen Demo	Software walkthrough recording
Podcast Repurpose	Audio podcast → video highlight clips
Hybrid	Existing footage + AI-generated content
Animation	Motion graphics, kinetic typography

Zero-Cost Path

No paid APIs are required to produce a complete video end to end:

Component	Zero-cost option
Voiceover / TTS	Piper (offline, free)
Video footage	Pexels, Pixabay, Archive.org, NASA, Wikimedia Commons
Image generation	Stable Diffusion (local)
Video generation	WAN 2.1, Hunyuan, LTX-Video (local GPU)
Compositing	Remotion (React), HyperFrames (HTML/GSAP)
Post-production	FFmpeg

Cost benchmarks when using paid APIs (from the README):

Ghibli-style animation (12 FLUX images + music): $0.15
Pixar-style animated short (6 Kling clips + narration): $1.33
Product ad (OpenAI only): $0.69

Quick Start

Install:

git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup

Use in Claude Code:

cd OpenMontage
claude  # open Claude Code

Then describe what you want in plain language:

Make a 2-minute video explaining the early warning signs of Alzheimer's disease.
Target audience: 40–60 year olds with no medical background.
Scientifically accurate but approachable.
Use real medical footage, no AI-generated faces.

The agent generates a production proposal with cost estimate and waits for confirmation before executing anything.

Deep Dive

Three-Layer Knowledge Architecture

OpenMontage separates capabilities and knowledge into three layers:

Layer 1: Execution layer
  tools/          ← 52 Python tools
  pipeline_defs/  ← 12 YAML pipeline definitions
  schemas/        ← 15 JSON Schemas (input/output validation)
  └── Defines "what can be done" and "in what order"

Layer 2: Usage convention layer
  skills/         ← OpenMontage's own operational conventions
  └── Tells the agent how to use this toolset correctly

Layer 3: External technology knowledge layer
  .agents/skills/ ← Deep external technology knowledge
  └── Expert knowledge about FFmpeg, Remotion, provider APIs

The 500+ agent skills are distributed across layers 2 and 3, which in effect packages video production expertise into the AI coding assistant. Each skill is a Markdown file containing the professional knowledge, common failure modes, and quality criteria for one specific step.
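
Since each skill is a Markdown file, one quick way to see how skills are split between the two layers is to count the files in each directory. This is a small sketch that assumes the `skills/` and `.agents/skills/` paths shown in the layout above; `count_skills` and the layer labels are hypothetical.

```python
from pathlib import Path


def count_skills(repo_root: str) -> dict:
    """Count Markdown files under the two skill directories named above."""
    root = Path(repo_root)
    layers = {
        "layer2_conventions": root / "skills",
        "layer3_external": root / ".agents" / "skills",
    }
    counts = {}
    for name, path in layers.items():
        # A missing directory simply counts as zero skills.
        counts[name] = sum(1 for _ in path.rglob("*.md")) if path.is_dir() else 0
    return counts


# Run from the repository root; prints zeros if the directories are absent.
print(count_skills("."))
```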

Rendering Engines: Remotion vs. HyperFrames

OpenMontage supports two compositing engines, each suited to different content types:

Remotion (React-based rendering):

Describes video frames as React components
Best for precision-timed content: subtitles, titles, data visualizations
Stable, predictable output; developers can customize with React syntax

HyperFrames (HTML/GSAP rendering):

Describes video using HTML + GSAP animations
Best for kinetic typography, brand content, web-style visual design
Higher customization flexibility

Both render locally through Node.js, with no external service dependency.

Quality Governance

This is the most heavily engineered part of OpenMontage:

Pre-compose validation gate: Before rendering begins, the system checks whether production promises are met. Execution is blocked if:

Planned output doesn't match the script content
Scene coverage falls below threshold
Asset quality doesn't meet target specifications
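
The three blocking conditions can be sketched as a gate function that returns a list of problems. This is an illustration under stated assumptions rather than the project's implementation: `ComposePlan`, its fields, the 90% coverage threshold, and the 720-pixel quality target are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComposePlan:
    # All fields and thresholds here are hypothetical, chosen for illustration.
    script_scene_ids: List[str]      # scenes the script calls for
    planned_scene_ids: List[str]     # scenes present in the edit plan
    asset_heights: List[int]         # vertical resolution of each asset, in pixels
    min_coverage: float = 0.9        # assumed coverage threshold
    min_height: int = 720            # assumed asset quality target


def pre_compose_check(plan: ComposePlan) -> List[str]:
    """Return blocking problems; an empty list means rendering may start."""
    problems = []
    script, planned = set(plan.script_scene_ids), set(plan.planned_scene_ids)

    # Condition 1: the plan contains scenes the script never asked for.
    extra = planned - script
    if extra:
        problems.append(f"planned scenes not in script: {sorted(extra)}")

    # Condition 2: too few scripted scenes are covered by the plan.
    coverage = len(script & planned) / len(script) if script else 0.0
    if coverage < plan.min_coverage:
        problems.append(f"scene coverage {coverage:.0%} below {plan.min_coverage:.0%}")

    # Condition 3: some assets fall short of the quality target.
    low = [h for h in plan.asset_heights if h < plan.min_height]
    if low:
        problems.append(f"{len(low)} asset(s) below {plan.min_height}p")
    return problems


plan = ComposePlan(
    script_scene_ids=["s1", "s2", "s3", "s4"],
    planned_scene_ids=["s1", "s2", "s3"],
    asset_heights=[1080, 480, 1080],
)
for problem in pre_compose_check(plan):
    print("BLOCKED:", problem)
```

Returning every problem at once, instead of stopping at the first, lets the agent fix the plan in a single pass before retrying.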

Slideshow Risk Score: Six dimensions evaluate whether a video is too "static" — a pile of images pretending to be video:

Average scene duration
Ratio of motion footage to static content
Camera motion detection
Scene cut frequency
Audio dynamic range
Visual change density

If the score exceeds the threshold, the agent actively sources more motion footage or restructures the scene plan rather than delivering a slideshow.
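
One way to picture the score is as an average of per-dimension risk values. The sketch below keeps the six dimension names from the list above, but the 0-to-1 risk scale, the equal weighting, and the 0.5 threshold are assumptions; the article does not say how OpenMontage actually combines the dimensions.

```python
from statistics import mean

# Dimension names follow the list above. The 0-to-1 risk scale, the equal
# weighting, and the 0.5 threshold are assumptions made for illustration.
DIMENSIONS = (
    "avg_scene_duration",
    "motion_ratio",
    "camera_motion",
    "cut_frequency",
    "audio_dynamic_range",
    "visual_change_density",
)
RISK_THRESHOLD = 0.5


def slideshow_risk(risks: dict) -> float:
    """Average per-dimension risk, where 1.0 means 'looks like a slideshow'."""
    missing = [d for d in DIMENSIONS if d not in risks]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    # Clamp each value into [0, 1] before averaging.
    return mean([min(1.0, max(0.0, risks[d])) for d in DIMENSIONS])


draft = {
    "avg_scene_duration": 0.8,     # long, unchanging scenes
    "motion_ratio": 0.9,           # mostly still images
    "camera_motion": 0.7,
    "cut_frequency": 0.6,
    "audio_dynamic_range": 0.2,
    "visual_change_density": 0.8,
}
score = slideshow_risk(draft)
if score > RISK_THRESHOLD:
    print(f"risk {score:.2f}: source more motion footage or restructure the scene plan")
```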

Budget controls:

Default configuration:
  - Operations over $0.50 require confirmation
  - Total cap: $10
  - Cost estimate provided before any execution

Adjusting:
  Say "set budget cap to $5" in conversation
  or modify the config file
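
The default rules translate into a small amount of logic. The sketch below uses the $0.50 confirmation threshold and $10 cap quoted above; the `Budget` class and its method names are hypothetical and do not reflect the project's config format.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    # Defaults mirror the figures quoted above; the class itself is hypothetical.
    confirm_over: float = 0.50   # single operations above this need confirmation
    total_cap: float = 10.00     # hard cap for the whole production
    spent: float = 0.0

    def check(self, cost: float) -> str:
        """Classify a proposed operation as 'ok', 'confirm', or 'blocked'."""
        if self.spent + cost > self.total_cap:
            return "blocked"
        if cost > self.confirm_over:
            return "confirm"
        return "ok"

    def record(self, cost: float) -> None:
        """Add a completed operation's cost to the running total."""
        self.spent += cost


budget = Budget()
print(budget.check(0.15))   # ok
print(budget.check(1.33))   # confirm
budget.record(9.00)
print(budget.check(1.33))   # blocked
```

Saying "set budget cap to $5" would correspond to constructing `Budget(total_cap=5.00)` in this sketch.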

Post-render self-review:

FFprobe validation: resolution, frame rate, and bitrate are checked against the spec
Key frame extraction: visual quality spot-check
Audio analysis: volume levels, silence detection, sync verification
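
The FFprobe step can be approximated with a short script. This sketch calls `ffprobe` with standard flags and compares the result with a target spec; the `probe_video` and `check_spec` helpers, the spec values, and the choice to treat bitrate as a minimum are assumptions, not the project's code.

```python
import json
import subprocess
from fractions import Fraction


def probe_video(path: str) -> dict:
    """Ask ffprobe for the first video stream's properties as JSON."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate,bit_rate",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def check_spec(stream: dict, spec: dict) -> list:
    """Compare probed values with a target spec; return mismatch messages."""
    issues = []
    if (stream["width"], stream["height"]) != (spec["width"], spec["height"]):
        issues.append(f"resolution {stream['width']}x{stream['height']}")
    # ffprobe reports frame rate as a fraction string such as "30000/1001".
    fps = float(Fraction(stream["r_frame_rate"]))
    if abs(fps - spec["fps"]) > 0.01:
        issues.append(f"frame rate {fps:.2f}")
    # Treating the spec's bitrate as a minimum is an assumption of this sketch.
    if int(stream.get("bit_rate", 0)) < spec["min_bitrate"]:
        issues.append(f"bitrate {stream.get('bit_rate', 'unknown')}")
    return issues


# Hypothetical target spec and a sample of the JSON shape ffprobe returns.
spec = {"width": 1920, "height": 1080, "fps": 30, "min_bitrate": 4_000_000}
sample = {"width": 1920, "height": 1080, "r_frame_rate": "30/1", "bit_rate": "2500000"}
print(check_spec(sample, spec))
```

On a real file you would call `check_spec(probe_video("output.mp4"), spec)`, where `output.mp4` is a placeholder path and `ffprobe` must be installed and on the PATH.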

7-Dimension Provider Scoring

When multiple video or image generation providers are available for a task, the system scores all options across seven dimensions:

Dimension	Weight	Description
Task fit	30%	Provider's specialization for this content type
Output quality	20%	Historical benchmark scores
Controllability	15%	How many parameters allow fine-tuning
Reliability	15%	API stability and success rate
Cost efficiency	10%	Cost per unit of output
Latency	5%	Generation speed
Continuity	5%	Cross-scene style consistency capability

Every choice is written to a decision audit log with reasoning. If something goes wrong, you can trace back exactly why the AI selected a particular provider.
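
The scoring reduces to a weighted sum. The sketch below uses the weights from the table; the rating values, the placeholder provider names, and the shape of the audit record are made up for illustration and are not taken from the project.

```python
import json

# Weights taken from the table above (they sum to 1.0).
WEIGHTS = {
    "task_fit": 0.30,
    "output_quality": 0.20,
    "controllability": 0.15,
    "reliability": 0.15,
    "cost_efficiency": 0.10,
    "latency": 0.05,
    "continuity": 0.05,
}


def score_provider(ratings: dict) -> float:
    """Weighted sum of per-dimension ratings, each assumed to be in [0, 1]."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)


def choose_provider(candidates: dict) -> dict:
    """Pick the top-scoring provider and return a record explaining why."""
    scores = {name: round(score_provider(r), 4) for name, r in candidates.items()}
    winner = max(scores, key=scores.get)
    return {"selected": winner, "scores": scores, "weights": WEIGHTS}


# Made-up ratings for two placeholder providers, purely for illustration.
candidates = {
    "provider_a": dict(task_fit=0.9, output_quality=0.8, controllability=0.6,
                       reliability=0.7, cost_efficiency=0.4, latency=0.5,
                       continuity=0.6),
    "provider_b": dict(task_fit=0.5, output_quality=0.7, controllability=0.7,
                       reliability=0.9, cost_efficiency=0.9, latency=0.9,
                       continuity=0.5),
}
record = choose_provider(candidates)
print(json.dumps(record, indent=2))
```

Because task fit carries 30% of the weight, `provider_a` wins here despite being slower and more expensive in this made-up data. Writing the full score table into the record is what makes the choice traceable afterwards.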

Reference Video Analysis

A particularly useful feature is style matching. Provide a YouTube, TikTok, or Reels URL as a reference:

Make a video in the style of this: https://youtube.com/watch?v=xxx
Topic: quantum computing, 2 minutes, for a general US audience

The agent analyzes the reference video for:

Narration text and pacing rhythm
Scene cut frequency and beat alignment
Visual style classification
Hook structure (how the first 5 seconds are designed)

It then generates a differentiated production plan — learning the style, not copying the content — with a cost estimate attached. Execution only begins after explicit confirmation.

Provider Ecosystem

Video generation (14 providers, including):

Cloud: Kling, Runway Gen-4, Google Veo 3, MiniMax, HeyGen, Grok
Local GPU: WAN 2.1, Hunyuan, CogVideo, LTX-Video

Image generation (10 providers, including):

Cloud: DALL-E 3, Google Imagen, Recraft
Local: FLUX, Stable Diffusion

TTS (4 providers):

Cloud: ElevenLabs, Google TTS (700+ voices), OpenAI TTS
Offline: Piper (free, no API required)

Music: Suno AI, ElevenLabs Music

Links and Resources

Official Resources

🌟 GitHub: calesthio/OpenMontage
📺 YouTube: @OpenMontage (sample output videos)
💬 GitHub Discussions: Share work, ask questions, propose ideas

Tech Stack References

Remotion: remotion.dev
GSAP: greensock.com/gsap
Piper TTS: Open-source offline TTS

Conclusion

OpenMontage shifts video production from "requires mastery of a dozen professional tools" to "describe what you want in your AI coding assistant."

The 12 pipelines cover the main video types from educational animation to product advertising. The 52 tools connect the full supply chain of video, image, TTS, music, and footage sources. The quality governance mechanisms prevent the AI from delivering a low-effort slideshow. Budget controls make costs predictable before a single API call is made.

The zero-cost path matters: even with no API budget, you can run the complete workflow to understand how the system operates, then connect paid services as needed.

More than 9,000 GitHub stars for a system this complex suggests real demand for AI-assisted video production at the pipeline level, not just the single-clip level.

DEV Community