DEV Community

Andrew
Andrew

Posted on • Originally published at andrew.ooo

OpenMontage Review: Open-Source Agentic Video Production

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.

TL;DR

OpenMontage is an open-source, agentic video production system that turns any AI coding assistant — Claude Code, Cursor, Copilot, Windsurf, or Codex — into a full video production studio. You describe what you want in plain language ("Make a 60-second animated explainer about how neural networks learn"), and the agent handles research, scripting, asset generation, voice direction, editing, subtitles, music, and final composition.

Key facts:

  • #1 GitHub Trending this week, currently sitting at 13.7K stars with 6,089 new stars in the past 7 days
  • 12 pipelines, 52 tools, 500+ agent skills — covers explainers, ads, anime shorts, documentary montages, product promos, and trailers
  • Works with zero paid API keys — out of the box you get Piper TTS, Archive.org/NASA/Wikimedia footage, Pexels/Pixabay/Unsplash, Remotion, FFmpeg, and built-in subtitles
  • Adds premium providers as you have them: FAL (FLUX, Veo, Kling), ElevenLabs, OpenAI, Google Imagen, Runway Gen-4, HeyGen, xAI Grok
  • Real cost examples from the README: $0.15 for a 12-image Ghibli-style animation, $0.69 for a single-API-key product ad, $1.33 for a 60-second Pixar-style animated short with Kling motion clips
  • Two real composition engines: Remotion (React-based) for data-driven explainers, HyperFrames (HTML/CSS/GSAP) for motion-graphics-heavy briefs
  • "Real video video" path — not just animating stills. The agent can build a corpus from free stock footage, retrieve actual motion clips, edit a timeline, and render
  • Multi-point self-review before render: ffprobe validation, frame sampling, audio level analysis, delivery-promise verification, subtitle checks
  • Provider scoring across 7 dimensions with an auditable decision log per project
  • AGPLv3 license — strong copyleft, plan accordingly if you want to wrap it in a commercial SaaS

If you've watched ChatGPT-class assistants struggle to turn a prompt into anything resembling a real video, OpenMontage is the first credible open-source answer.

Why Agentic Video Production Matters Now

For most of 2025, "AI video" meant Sora, Veo, Runway, and Kling — closed APIs that generate clips from prompts, leaving the actual production work (story, structure, voice, music, subtitles, composition) to humans or a tangle of bespoke scripts.

OpenMontage flips the abstraction. Instead of "give me a clip," you give the agent a brief and it owns the full pipeline: research, script, asset generation/retrieval, TTS narration, auto-sourced music, word-level subtitles via WhisperX, timeline composition in Remotion or HyperFrames, self-review with ffprobe and frame sampling, and render.

That's the entire workflow video editors used to charge $500-$5,000 per minute for, executed by a coding agent for a fraction of the cost — and crucially, with an audit trail you can inspect.

Installation: Five Minutes Including FFmpeg

You need Python 3.10+, Node.js 18+, FFmpeg, and an AI coding assistant.

# Install prerequisites (macOS)
brew install ffmpeg python node

# Clone and set up OpenMontage
git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup
Enter fullscreen mode Exit fullscreen mode

make setup runs pip install -r requirements.txt, installs Remotion's npm dependencies, pulls down Piper TTS, and copies .env.example to .env. Windows users hitting ERR_INVALID_ARG_TYPE should swap npm install for npx --yes npm install.

First Video: Plain Language to Final Cut

Open the OpenMontage folder in Claude Code, Cursor, or Copilot, and just tell it what you want:

Make a 60-second animated explainer about how neural networks learn.
Enter fullscreen mode Exit fullscreen mode

The agent picks a pipeline (likely the FLUX-image animation pipeline for explainers without motion clips), proposes 2-3 concepts with cost estimates, waits for your approval, then executes the full production. You can also start from a reference video:

Here's a YouTube Short I love.
Make me something like this, but about quantum computing.
Enter fullscreen mode Exit fullscreen mode

The agent will analyze the reference's transcript, pacing, scenes, keyframes, and style, then return:

  • What it keeps from the reference: pacing, hook style, structure, tone
  • What it changes: topic, visual treatment, angle, narration approach
  • What it will cost at your target duration, before any asset generation
  • What it will look like with your currently available tools

That last point is the killer feature. The agent introspects which providers are configured and tells you what's actually achievable before charging your card. If you only have OpenAI configured, it will plan around DALL-E + TTS instead of pretending it can call Kling.

The Pipeline System: 12 Recipes, Auditable Decisions

OpenMontage is pipeline-driven by design. The maintainers are explicit in AGENT_GUIDE.md: do not improvise. Pick a pipeline, read the manifest, read the stage skill, then use tools. The 12 pipelines cover animated explainers, Ghibli-style anime (FLUX images + parallax crossfade), Pixar-style shorts (Kling motion clips), product ads (single-API-key path), documentary montages (Archive.org/NASA/Wikimedia footage), Veo-backed sci-fi trailers, TikTok-style shorts, GSAP kinetic typography, product promos and launch reels, website-to-video, rigged SVG character animation, and reference-grounded productions (paste a YouTube Short, get a plan).

Each pipeline has a manifest declaring its capability envelope. The tool registry is queryable:

python -c "from tools.tool_registry import registry; \
  import json; registry.discover(); \
  print(json.dumps(registry.support_envelope(), indent=2))"
Enter fullscreen mode Exit fullscreen mode

And the available provider menu:

python -c "from tools.tool_registry import registry; \
  import json; registry.discover(); \
  print(json.dumps(registry.provider_menu(), indent=2))"
Enter fullscreen mode Exit fullscreen mode

This is what the README means by "engineering quality gates" — every provider selection is scored across 7 dimensions (cost, latency, quality, capability match, license, region, reliability) with an auditable decision log per project.

What You Get With Zero API Keys

This is the part the marketing pages bury and the engineering audience cares about most:

Capability Free Tool What It Does
Narration Piper TTS Offline neural TTS, human-sounding
Open footage Archive.org + NASA + Wikimedia Commons Free archival video
Extra stock Pexels + Unsplash + Pixabay Free stock (developer keys are free)
Composition (React) Remotion Spring animations, charts, stat cards, captions, TalkingHead
Composition (HTML/GSAP) HyperFrames Kinetic typography, product promos, SVG character rigging
Post-production FFmpeg Encoding, subtitle burn-in, audio mix, color grading
Subtitles Built-in Auto-generated word-level timing

Yes, you can produce a publishable documentary montage with $0.00 in API spend. The cost goes up the moment you ask for AI-generated stills or motion clips. That's the right tradeoff.

Add API Keys As You Have Them

The .env is opt-in, additive, and the agent adapts to what's available:

# Image + video gateway:
FAL_KEY=...               # FLUX images + Google Veo, Kling, MiniMax video + Recraft

# Music:
SUNO_API_KEY=...          # Full songs, instrumentals, any genre

# Voice & images:
ELEVENLABS_API_KEY=...    # Premium TTS, AI music, sound effects
OPENAI_API_KEY=...        # OpenAI TTS, DALL-E 3 images, gpt-image-1
XAI_API_KEY=...           # xAI Grok image gen + Grok video
GOOGLE_API_KEY=...        # Google Imagen, Google TTS (700+ voices)

# More video providers:
HEYGEN_API_KEY=...        # HeyGen gateway → VEO, Sora, Runway, Kling
RUNWAY_API_KEY=...        # Runway Gen-4 direct
Enter fullscreen mode Exit fullscreen mode

If you have a GPU, you can also unlock local video generation:

make install-gpu

# Then in .env:
VIDEO_GEN_LOCAL_ENABLED=true
VIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b  # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b
Enter fullscreen mode Exit fullscreen mode

Wan 2.1, Hunyuan 1.5, LTX2, and CogVideo 5B all run locally with no per-render cost — the fixed-cost ceiling local generation gives you is exactly the open-source pitch.

Real Cost Examples

The README ships with cost-stamped reference videos. These aren't aspirational — they're reproducible:

  • "Afternoon in Candyland" — Ghibli-style anime with 12 FLUX images, multi-image crossfade, Ken Burns camera, particle overlays, ambient music — $0.15
  • "Mori no Seishin" — forest spirit anime, 12 FLUX images with parallax + firefly particles + vignette lighting + ambient soundtrack — $0.15
  • "Into the Abyss" — deep ocean anime, 12 FLUX images, sparkle/mist overlays, light-ray effects — $0.15
  • "VOID — Neural Interface" — product ad with gpt-image-1 + OpenAI TTS + auto-music + WhisperX subtitles + Remotion data viz — $0.69
  • "THE LAST BANANA" — 60-second Pixar-style short with 6 Kling v3 motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano, TikTok captions — $1.33
  • "SIGNAL FROM TOMORROW" — sci-fi trailer with Veo-generated motion clips and Remotion composition (cost not listed; estimate $5-$15 based on Veo pricing)

A 60-second Pixar-style short for $1.33 is the price point that breaks the current freelancer-marketplace model for short-form ad creative. The Candyland-class output at $0.15 is essentially free.

Community Reaction

OpenMontage hit #1 on GitHub Trending the day it shipped publicly and has held a top-5 slot for the better part of a week — 13,749 stars and counting, with 6,089 added in the last 7 days alone. That's roughly the velocity LMCache, codebase-memory-mcp, and Microsoft Fara-7B saw at their respective launches.

The sentiment in the discussions tab clusters around four themes:

  1. "Is the open-source path really viable?" — Yes, with caveats. Piper TTS quality is good but not ElevenLabs-good. Archive.org footage is great for documentary tone but limits you to public-domain aesthetics. Free stock images are abundant but generic. The system is honest about this in the proposal step.
  2. "What about the AI coding assistant overhead?" — The agent loop costs whatever your model costs. For a 60-second video, expect 200-500K tokens of agent reasoning across research, scripting, asset selection, and review. At Claude Opus pricing that's $3-$7 in agent fees on top of the asset costs.
  3. "AGPLv3 — really?" — Yes, and intentionally. The maintainers explicitly want self-hosters and contributors, not SaaS wrappers. If you're planning to wrap OpenMontage in a hosted product, you'll need to release your modifications.
  4. "Does the self-review actually catch problems?" — Multiple early users have screenshot-confirmed it blocking renders when the audio levels were wrong, when subtitle timing drifted, and when a "motion-led" delivery promise was violated by 80%+ still imagery. The pre-compose validation is real.

The aitoolly.com review puts it well: "The sheer volume of tools suggests that OpenMontage is not merely a wrapper for existing models, but a comprehensive environment where an AI can perform diverse technical tasks."

Honest Limitations

A few things to know before you commit:

  • AGPLv3 license — strong copyleft. Plan accordingly. Internal use and consulting work are fine; a hosted SaaS based on OpenMontage requires releasing your modifications.
  • Agent quality is upstream of output quality — Claude Opus or GPT-5-class reasoning produces meaningfully better pipeline selection and script writing than smaller models. Don't expect Sonnet-3.5-level outputs from a 7B local model driving the pipeline.
  • Remotion and HyperFrames are real engines with real learning curves. The agent abstracts most of it, but if you want to customize scene templates, you're writing React or HTML/GSAP.
  • No GUI — this is a CLI + agent system. If you want a timeline editor, look elsewhere.
  • Cost can creep — the proposal step gives you an estimate, but a 5-minute video with Kling motion clips can easily cross $20-$40 in asset costs, plus agent fees.
  • Latency is real — a 60-second video takes 10-30 minutes end-to-end depending on which generation APIs are in the path. Veo and Runway are the slowest stages.
  • Anti-bot defenses can break the research stage on aggressive sites. The agent handles this gracefully (skips and notes), but the script suffers when key sources are unreachable.

Where It Fits

OpenMontage is the right tool when you want:

  • Reproducible, auditable AI video — you can ship the prompt, pipeline, and tool path so others can recreate
  • Cost-controlled short-form video — TikTok/Reels/Shorts at $0.15-$2 per piece, ad creative at $0.50-$5
  • Self-hosted, no-vendor-lock video production — you own the prompts, the pipeline, the outputs
  • Documentary tone with free archival footage and no music licensing risk
  • Multi-provider abstraction — try different image/video/voice providers without rewriting your pipeline

It is not the right tool when you want:

  • Frame-perfect cinematic control — you want DaVinci Resolve, Premiere, or After Effects with a human editor
  • Long-form narrative (10+ minutes) — pipelines target short and mid-form
  • A SaaS you can wrap and resell — see AGPLv3
  • Real-time previews during the agent's work — the system is batch-oriented

FAQ

What's the minimum cost to produce a usable video with OpenMontage?

$0.00 in API spend if you stick to the free path (Piper TTS, Archive.org footage, free stock from Pexels/Pixabay/Unsplash, Remotion composition, FFmpeg post-production, built-in subtitles). The only floor is your agent fees, which depend on which model is driving the pipeline. Plan for $1-$5 in agent reasoning costs even for a $0-asset-cost video.

Does OpenMontage work with Claude Code specifically, or do I need a different agent?

It works with any AI coding assistant that can read files and run code: Claude Code, Cursor, Copilot, Windsurf, and Codex are all explicitly supported. The maintainers test against Claude Code most aggressively because the pipeline contracts in AGENT_GUIDE.md align well with Claude Code's tool-use loop. Cursor and Windsurf work well. Copilot's tool execution is more limited so expect to do more manual handoff there.

How does OpenMontage compare to Runway, Sora, or Veo directly?

Runway, Sora, and Veo are clip generators — you prompt, they return 5-10 seconds of motion. OpenMontage is the layer above them: it composes a full production, picks which generator to call (or whether to call one at all), writes the script, narrates, scores the music, burns subtitles, and renders the final timeline. You can configure OpenMontage to use Veo, Runway, Kling, or MiniMax as the motion-generation backend via FAL_KEY, HEYGEN_API_KEY, or RUNWAY_API_KEY. It's complementary, not competitive.

Can I run OpenMontage entirely offline?

Mostly, yes. With local GPU video generation (make install-gpu + Wan 2.1 / Hunyuan 1.5 / LTX2 / CogVideo 5B), Piper TTS for narration, a local LLM driving the agent loop (Ollama, llama.cpp, vLLM), and pre-downloaded stock libraries, the entire pipeline can run airgapped. Research stages that require live web search will obviously not work. Real archival footage from Archive.org/NASA/Wikimedia is a download step you can pre-cache.

Is the AGPLv3 license a deal-breaker for commercial use?

It depends on what you're building. Internal use, consulting deliverables, and ad creative for clients are fine — you're not distributing modifications. Wrapping OpenMontage in a hosted SaaS where users interact with your product over the network does trigger AGPLv3's network use clause, meaning you'd need to release your modifications. If that's a problem, talk to the maintainers about a commercial license, or look at alternatives with permissive licensing.

The Bottom Line

OpenMontage is the most credible open-source answer to "what does AI-native video production actually look like" we've seen so far. The pipeline-driven architecture, the seven-dimension provider scoring, the pre-compose validation, and the zero-API-key free path are all signs of a system designed by someone who has actually shipped video and gotten burned by the alternatives.

For solo creators, indie agencies, and engineering teams that need a steady stream of short-form video without paying $500/month per editor, this is now the obvious place to start. Star it, clone it, and ship a $0.15 Ghibli-style short before lunch.

Repository: github.com/calesthio/OpenMontage

Top comments (0)