Introduction
"12 production pipelines, 52 tools, 500+ agent skills — turn your AI coding assistant into a full video production studio."
This is article #102 in the Open Source Project of the Day series. Today's project is OpenMontage — an open-source agentic video production system that uses Claude Code, Cursor, or Codex as its execution engine, turning natural language descriptions into fully produced videos.
Most AI video tools produce a single clip: enter a prompt, get five seconds of generated footage. OpenMontage's scope is different. It models a complete production team: researcher, scriptwriter, storyboard artist, asset creator, editor, compositor, quality reviewer — each stage has a corresponding agent skill, executed in pipeline order by your AI coding assistant.
The starting point is a sentence in plain language. The ending point is a quality-validated video file. The whole process runs inside your AI coding assistant with no tool switching.
What You'll Learn
- OpenMontage's three-layer knowledge architecture: how Tools, Skills, and Pipeline Defs work together
- All 12 production pipelines and what they cover
- The zero-cost path: what you can produce without spending a dollar
- Quality governance design: pre-compose validation, slideshow risk scoring, budget controls
- The 7-dimension provider scoring system: how the AI selects which video generation service to use
- Reference video analysis: what happens when you paste a YouTube URL
Prerequisites
- Experience with Claude Code, Cursor, or a similar AI coding tool
- Familiarity with basic video production concepts (script, shots, voiceover, editing)
- Basic Python experience
Project Background
What Is OpenMontage?
OpenMontage is an agentic video production system — "turn your AI coding assistant into a complete video production studio."
The problem it addresses isn't "generate a video clip with AI." It's the full end-to-end production pipeline from scratch to delivery. Research, scripting, storyboarding, asset generation, editing, compositing, quality review — in traditional video production these are separate roles; in OpenMontage they're separate agent skills, executed sequentially by your AI assistant.
A second design focus is the "animated stills" problem: many AI video tools only simulate motion by animating static images. OpenMontage can instead source real motion footage from free archives (Pexels, Pixabay, Archive.org, NASA, Wikimedia Commons) and cut a proper montage from it.
Author
- Author: calesthio
- Community: GitHub Discussions (Show and Tell, Ideas, Q&A)
- License: AGPL-3.0
- Video channel: @OpenMontage on YouTube
Project Stats
- ⭐ GitHub Stars: 9,000+
- 🍴 Forks: 1,300+
- 🛠 Production tools: 52
- 🎬 Pipelines: 12
- 📚 Agent Skills: 500+
- 📄 License: AGPL-3.0
Core Features
What It Does
Typical AI video tool:
Prompt → generates a single video clip → user assembles manually
OpenMontage:
"Make a 3-minute explainer about quantum computing"
↓
[Research] → gather background information and facts
↓
[Proposal] → generate production plan with cost estimate
↓
[Script] → complete narration text
↓
[Scene Plan] → visual planning for each scene
↓
[Assets] → generate/source video, images, voiceover, music
↓
[Edit] → assemble timeline
↓
[Compose] → render final video
↓
[Quality Review] → frame validation + audio analysis + delivery check
↓
Complete video file
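The stage order above can be sketched as a plain sequential runner. This is only an illustration of the "executed in pipeline order" idea, not OpenMontage's actual code: `Stage`, `run_pipeline`, and `make_stub` are hypothetical names, and the stub handlers stand in for real agent skills.

```python
from dataclasses import dataclass
from typing import Callable

# Stage names follow the flow described above; everything else is hypothetical.
STAGE_ORDER = [
    "research", "proposal", "script", "scene_plan",
    "assets", "edit", "compose", "quality_review",
]


@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]  # receives the shared state, returns keys to merge into it


def run_pipeline(stages, brief: str) -> dict:
    """Run stages strictly in order, passing the accumulated state forward."""
    state = {"brief": brief, "completed": []}
    for stage in stages:
        state.update(stage.run(state))
        state["completed"].append(stage.name)
    return state


def make_stub(name: str) -> Stage:
    # Stub handler: pretends the stage produced one artifact.
    return Stage(name=name, run=lambda state, n=name: {f"{n}_output": f"<{n} artifact>"})
```

Calling `run_pipeline([make_stub(n) for n in STAGE_ORDER], "Make a 3-minute explainer about quantum computing")` returns a state dict whose `completed` list matches `STAGE_ORDER`. A real runner would also pause after the proposal stage until the user confirms.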
12 Production Pipelines
| Pipeline | Output |
|---|---|
| Animated Explainer | Research-backed educational animation |
| Documentary Montage | Real-footage montage in documentary style |
| Cinematic | Trailers, teasers, mood pieces |
| Clip Factory | Batch short-form clips from long content |
| Talking Head | Speaker/presenter video |
| Avatar Spokesperson | Digital avatar presentation |
| Localization & Dub | Multi-language translation and dubbing |
| Screen Demo | Software walkthrough recording |
| Podcast Repurpose | Audio podcast → video highlight clips |
| Hybrid | Existing footage + AI-generated content |
| Animation | Motion graphics, kinetic typography |
Zero-Cost Path
No paid APIs are required to produce a complete video end to end:
| Component | Zero-cost option |
|---|---|
| Voiceover / TTS | Piper (offline, free) |
| Video footage | Pexels, Pixabay, Archive.org, NASA, Wikimedia Commons |
| Image generation | Stable Diffusion (local) |
| Video generation | WAN 2.1, Hunyuan, LTX-Video (local GPU) |
| Compositing | Remotion (React), HyperFrames (HTML/GSAP) |
| Post-production | FFmpeg |
Cost benchmarks when using paid APIs (from the README):
- Ghibli-style animation (12 FLUX images + music): $0.15
- Pixar-style animated short (6 Kling clips + narration): $1.33
- Product ad (OpenAI only): $0.69
Quick Start
Install:
git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup
Use in Claude Code:
cd OpenMontage
claude # open Claude Code
Then describe what you want in plain language:
Make a 2-minute video explaining the early warning signs of Alzheimer's disease.
Target audience: 40–60 year olds with no medical background.
Scientifically accurate but approachable.
Use real medical footage, no AI-generated faces.
The agent generates a production proposal with cost estimate and waits for confirmation before executing anything.
Deep Dive
Three-Layer Knowledge Architecture
OpenMontage separates capabilities and knowledge into three layers:
Layer 1: Execution layer
tools/ ← 52 Python tools
pipeline_defs/ ← 12 YAML pipeline definitions
schemas/ ← 15 JSON Schemas (input/output validation)
└── Defines "what can be done" and "in what order"
Layer 2: Usage convention layer
skills/ ← OpenMontage's own operational conventions
└── Tells the agent how to use this toolset correctly
Layer 3: External technology knowledge layer
.agents/skills/ ← Deep external technology knowledge
└── Expert knowledge about FFmpeg, Remotion, provider APIs
500+ agent skills are distributed across layers 2 and 3 — essentially packaging domain expertise in video production into the AI coding assistant. Each skill is a Markdown file containing the professional knowledge, common failure modes, and quality criteria for that specific step.
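To see how the skills are split between the two knowledge layers in a local checkout, a small script along these lines could count them. The directory names come from the layout above. The assumption that every skill is exactly one `*.md` file somewhere under those directories is mine, and `count_skills` is a hypothetical helper.

```python
from pathlib import Path

# Directory names are taken from the layer description above.
# Assumption: one Markdown file per skill, at any depth under each root.
SKILL_ROOTS = {
    "usage_conventions": "skills",
    "external_knowledge": ".agents/skills",
}


def count_skills(repo_root: Path) -> dict:
    """Count Markdown files under each skill layer of a checkout."""
    counts = {}
    for layer, rel in SKILL_ROOTS.items():
        root = repo_root / rel
        # A missing directory counts as zero rather than raising.
        counts[layer] = sum(1 for _ in root.rglob("*.md")) if root.is_dir() else 0
    return counts
```

Running `count_skills(Path("OpenMontage"))` after cloning would give a per-layer count to compare with the 500+ figure.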
Rendering Engines: Remotion vs. HyperFrames
OpenMontage supports two compositing engines, each suited to different content types:
Remotion (React-based rendering):
- Describes video frames as React components
- Best for precision-timed content: subtitles, titles, data visualizations
- Stable, predictable output; developers can customize with React syntax
HyperFrames (HTML/GSAP rendering):
- Describes video using HTML + GSAP animations
- Best for kinetic typography, brand content, web-style visual design
- Higher customization flexibility
Both render locally through Node.js, with no external service dependency.
Quality Governance
Quality governance is the most heavily engineered part of OpenMontage:
Pre-compose validation gate: Before rendering begins, the system checks whether production promises are met. Execution is blocked if:
- Planned output doesn't match the script content
- Scene coverage falls below threshold
- Asset quality doesn't meet target specifications
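A gate of this kind might look roughly like the sketch below. It illustrates the three blocking conditions and is not the project's implementation: `Scene`, `GateResult`, `pre_compose_gate`, the 90% coverage threshold, and the 720-pixel minimum height are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scene:
    scene_id: str
    script_segment: str          # narration segment this scene is meant to cover
    asset_path: Optional[str]    # None while the asset is still missing
    asset_height: int = 0        # vertical resolution in pixels


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def pre_compose_gate(scenes, script_segments, min_coverage=0.9, min_height=720):
    """Return a GateResult; rendering should proceed only if passed is True."""
    reasons = []

    # 1. Planned output must match the script: every segment needs a scene.
    planned = {s.script_segment for s in scenes}
    missing = [seg for seg in script_segments if seg not in planned]
    if missing:
        reasons.append(f"script segments without a scene: {missing}")

    # 2. Scene coverage: share of scenes that already have an asset.
    covered = [s for s in scenes if s.asset_path]
    coverage = len(covered) / len(scenes) if scenes else 0.0
    if coverage < min_coverage:
        reasons.append(f"scene coverage {coverage:.0%} below {min_coverage:.0%}")

    # 3. Asset quality, reduced here to a minimum vertical resolution.
    low_res = [s.scene_id for s in covered if s.asset_height < min_height]
    if low_res:
        reasons.append(f"assets below {min_height}p: {low_res}")

    return GateResult(passed=not reasons, reasons=reasons)
```

Returning the list of reasons, not just a boolean, lets the agent decide which stage to revisit when the gate fails.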
Slideshow Risk Score: Six dimensions evaluate whether a video is too "static" — a pile of images pretending to be video:
- Average scene duration
- Ratio of motion footage to static content
- Camera motion detection
- Scene cut frequency
- Audio dynamic range
- Visual change density
If the score exceeds the threshold, the agent actively sources more motion footage or restructures the scene plan rather than delivering a slideshow.
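The scoring idea can be illustrated with a simple average over the six dimensions. The source does not publish the weights, normalization, or threshold, so the equal weighting, the 0-to-1 risk scale, and the 0.5 cutoff below are assumptions, and the key names are hypothetical.

```python
# Keys mirror the six dimensions listed above, each expressed as a risk in
# [0, 1] where 1 means "looks like a slideshow". Equal weights and the 0.5
# threshold are assumptions; the source states neither.
DIMENSIONS = (
    "avg_scene_duration",
    "static_content_ratio",
    "camera_motion_absence",
    "low_cut_frequency",
    "flat_audio_dynamics",
    "low_visual_change",
)


def slideshow_risk(risks: dict, threshold: float = 0.5):
    """Return (score, flagged) for a dict of per-dimension risks."""
    missing = [d for d in DIMENSIONS if d not in risks]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= risks[d] <= 1.0:
            raise ValueError(f"{d} must be in [0, 1]")
    score = sum(risks[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return score, score > threshold
```

When `flagged` is true, the agent would go back to sourcing motion footage or restructuring the scene plan, as described above.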
Budget controls:
Default configuration:
- Operations over $0.50 require confirmation
- Total cap: $10
- Cost estimate provided before any execution
Adjusting:
Say "set budget cap to $5" in conversation
or modify the config file
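The default rules can be modeled in a few lines. This is a sketch of the behavior described above, not the project's code: the `Budget` class, its method names, and the `confirm` callback are hypothetical, while the $0.50 and $10 values are the defaults quoted above.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when an operation would push spending past the total cap."""


@dataclass
class Budget:
    confirm_above: float = 0.50   # per-operation confirmation threshold (default above)
    total_cap: float = 10.00      # total spending cap (default above)
    spent: float = 0.0

    def authorize(self, cost: float, confirm=lambda cost: False) -> bool:
        """Record the spend and return True if the operation may run."""
        if self.spent + cost > self.total_cap:
            raise BudgetExceeded(
                f"${cost:.2f} would exceed the ${self.total_cap:.2f} cap "
                f"(already spent ${self.spent:.2f})"
            )
        # Expensive operations need an explicit yes from the user.
        if cost > self.confirm_above and not confirm(cost):
            return False
        self.spent += cost
        return True
```

With the defaults, `Budget().authorize(0.15)` passes silently, while a $1.33 operation returns `False` unless the `confirm` callback approves it.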
Post-render self-review:
- FFprobe validation: resolution, frame rate, bitrate match the spec
- Key frame extraction: visual quality spot-check
- Audio analysis: volume levels, silence detection, sync verification
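The FFprobe step can be approximated with the standard `ffprobe` JSON output. The sketch below covers only the resolution, frame rate, and bitrate checks. The function names and the shape of the `spec` dict are assumptions for illustration, and `probe` needs FFmpeg installed and on the PATH.

```python
import json
import subprocess
from fractions import Fraction


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON report (requires FFmpeg on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def check_against_spec(report: dict, spec: dict) -> list:
    """Compare the first video stream with a target spec; return mismatches."""
    video = next(
        (s for s in report.get("streams", []) if s.get("codec_type") == "video"),
        None,
    )
    if video is None:
        return ["no video stream found"]

    problems = []
    if (video["width"], video["height"]) != (spec["width"], spec["height"]):
        problems.append(f"resolution {video['width']}x{video['height']}")

    # ffprobe reports frame rate as a fraction string such as "30000/1001".
    fps = float(Fraction(video["avg_frame_rate"]))
    if abs(fps - spec["fps"]) > 0.01:
        problems.append(f"frame rate {fps:.3f}")

    # Container-level bitrate is reported as a string of bits per second.
    bitrate = int(report.get("format", {}).get("bit_rate", 0))
    if bitrate < spec["min_bitrate"]:
        problems.append(f"bitrate {bitrate}")
    return problems
```

Typical use would be `check_against_spec(probe("final.mp4"), {"width": 1920, "height": 1080, "fps": 30, "min_bitrate": 2_000_000})`, where an empty list means the render matches the spec.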
7-Dimension Provider Scoring
When multiple video or image generation providers are available for a task, the system scores all options across seven dimensions:
| Dimension | Weight | Description |
|---|---|---|
| Task fit | 30% | Provider's specialization for this content type |
| Output quality | 20% | Historical benchmark scores |
| Controllability | 15% | How many parameters allow fine-tuning |
| Reliability | 15% | API stability and success rate |
| Cost efficiency | 10% | Cost per unit of output |
| Latency | 5% | Generation speed |
| Continuity | 5% | Cross-scene style consistency capability |
Every choice is written to a decision audit log with reasoning. If something goes wrong, you can trace back exactly why the AI selected a particular provider.
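Using the weights from the table, the selection reduces to a weighted sum per provider. The sketch below assumes each dimension is rated on a 0-to-1 scale and that the audit log is a list of JSON lines. Both are assumptions, as are the function names. The provider names in the usage note are placeholders, not real benchmark data.

```python
import json

# Weights are taken from the table above and sum to 1.0.
WEIGHTS = {
    "task_fit": 0.30,
    "output_quality": 0.20,
    "controllability": 0.15,
    "reliability": 0.15,
    "cost_efficiency": 0.10,
    "latency": 0.05,
    "continuity": 0.05,
}


def score_provider(ratings: dict) -> float:
    """Weighted sum of per-dimension ratings, each assumed to be in [0, 1]."""
    return sum(WEIGHTS[d] * ratings[d] for d in WEIGHTS)


def choose_provider(candidates: dict, audit_log: list) -> str:
    """Pick the highest-scoring provider and append the reasoning to the log."""
    scores = {name: round(score_provider(r), 4) for name, r in candidates.items()}
    winner = max(scores, key=scores.get)
    audit_log.append(json.dumps({"chosen": winner, "scores": scores, "weights": WEIGHTS}))
    return winner
```

Given two placeholder candidates `provider_a` and `provider_b`, `choose_provider` returns the name with the larger weighted score and leaves one JSON line in `audit_log` holding every candidate's score, which is the information needed to trace a choice later.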
Reference Video Analysis
A particularly useful feature: provide a YouTube, TikTok, or Reels URL as a reference:
Make a video in the style of this: https://youtube.com/watch?v=xxx
Topic: quantum computing, 2 minutes, for a general US audience
The agent analyzes the reference video for:
- Narration text and pacing rhythm
- Scene cut frequency and beat alignment
- Visual style classification
- Hook structure (how the first 5 seconds are designed)
It then generates a differentiated production plan — learning the style, not copying the content — with a cost estimate attached. Execution only begins after explicit confirmation.
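As a rough illustration of the pacing part of that analysis, the function below turns a list of detected cut timestamps into a few summary numbers. How OpenMontage actually detects cuts is not described in the source. `pacing_summary` is a hypothetical helper that assumes the timestamps are already available.

```python
def pacing_summary(cut_times, duration, hook_seconds=5.0):
    """Summarize editing pace from cut timestamps (in seconds)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Ignore timestamps outside the video; the start and end are not cuts.
    cuts = sorted(t for t in cut_times if 0 < t < duration)
    shots = len(cuts) + 1
    return {
        "cuts_per_minute": len(cuts) * 60 / duration,
        "avg_shot_seconds": duration / shots,
        # Cuts inside the hook window (first 5 seconds by default).
        "cuts_in_hook": sum(1 for t in cuts if t <= hook_seconds),
    }
```

A reference with many cuts in the hook window and a short average shot length would push the production plan toward faster pacing for the new video.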
Provider Ecosystem
Video generation (14 providers), including:
- Cloud: Kling, Runway Gen-4, Google Veo 3, MiniMax, HeyGen, Grok
- Local GPU: WAN 2.1, Hunyuan, CogVideo, LTX-Video
Image generation (10 providers), including:
- Cloud: DALL-E 3, Google Imagen, Recraft
- Local: FLUX, Stable Diffusion
TTS (4 providers):
- Cloud: ElevenLabs, Google TTS (700+ voices), OpenAI TTS
- Offline: Piper (free, no API required)
Music: Suno AI, ElevenLabs Music
Links and Resources
Official Resources
- 🌟 GitHub: calesthio/OpenMontage
- 📺 YouTube: @OpenMontage (sample output videos)
- 💬 GitHub Discussions: Share work, ask questions, propose ideas
Tech Stack References
- Remotion: remotion.dev
- GSAP: greensock.com/gsap
- Piper TTS: Open-source offline TTS
Conclusion
OpenMontage shifts video production from "requires mastery of a dozen professional tools" to "describe what you want in your AI coding assistant."
The 12 pipelines cover the main video types from educational animation to product advertising. The 52 tools connect the full supply chain of video, image, TTS, music, and footage sources. The quality governance mechanisms prevent the AI from delivering a low-effort slideshow. Budget controls make costs predictable before a single API call is made.
The zero-cost path matters: even with no API budget, you can run the complete workflow to understand how the system operates, then connect paid services as needed.
More than 9,000 GitHub stars for a system this complex suggest real demand for AI-assisted video production at the pipeline level, not just the single-clip level.