What is OpenReels?
OpenReels takes a topic and produces a YouTube Short. It handles the research, script, voiceover, visuals, music, captions, and assembly. You get a vertical MP4 at the end.
It's MIT licensed, runs via Docker Compose, and costs about $0.68 per video. You bring your own API keys.
GitHub: github.com/tsensei/OpenReels
How it works
Give it a topic. Six stages run automatically:
| Stage | What happens |
|---|---|
| Research | Web search grounds the script in real facts |
| Script | AI creative director writes a "DirectorScore" — a per-scene production plan |
| Voiceover | TTS with word-level timestamps for karaoke-style captions |
| Visuals | AI images (Gemini, DALL-E), AI video (Veo, Kling), vision-verified stock |
| Music | AI-generated via Lyria 3 Pro, synced to the video's emotional arc |
| Assembly | Remotion composites everything with transitions and animated captions |
Every stage streams progress to a web UI. You can watch it work in real time.
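The six stages can be pictured as a simple sequential pipeline where each stage reads the accumulated context and adds its own output. This is an illustrative sketch, not OpenReels' actual code: the stage names come from the table above, but the function signatures, the `emitProgress`-style callback, and the placeholder outputs are all assumptions.

```typescript
// Hypothetical sketch of the six-stage pipeline; all names are illustrative.
type Stage = "research" | "script" | "voiceover" | "visuals" | "music" | "assembly";
type StageFn = (ctx: Record<string, unknown>) => Record<string, unknown>;

// Each stage reads the accumulated context and merges in its own result.
const stages: Array<[Stage, StageFn]> = [
  ["research", (ctx) => ({ ...ctx, facts: ["grounded fact"] })],
  ["script", (ctx) => ({ ...ctx, directorScore: { scenes: [] } })],
  ["voiceover", (ctx) => ({ ...ctx, audio: "voice.wav" })],
  ["visuals", (ctx) => ({ ...ctx, clips: ["scene1.mp4"] })],
  ["music", (ctx) => ({ ...ctx, music: "track.mp3" })],
  ["assembly", (ctx) => ({ ...ctx, output: "final.mp4" })],
];

function runPipeline(topic: string, onProgress: (s: Stage) => void) {
  let ctx: Record<string, unknown> = { topic };
  for (const [name, fn] of stages) {
    onProgress(name); // in the real system, streamed to the web UI
    ctx = fn(ctx);
  }
  return ctx;
}

const result = runPipeline("the apollo 13 disaster", (s) => console.log(`stage: ${s}`));
console.log(String(result.output)); // prints final.mp4
```

The ordering matters: each stage's output feeds the next, which is what makes the DirectorScore (produced in the script stage) the contract everything downstream reads.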
The DirectorScore
This is the design choice that made everything else click. Early versions generated assets independently and the results felt disconnected. The fix: make the AI write a structured plan before generating anything.
```json
{
  "scenes": [
    {
      "sceneNumber": 1,
      "visual": {
        "type": "ai_image",
        "description": "Close-up of astronaut's visor reflecting Earth",
        "motion": { "type": "zoom_in", "intensity": "subtle" }
      },
      "voiceover": "On April 13, 1970, three men heard a bang that changed space history.",
      "transition": { "type": "crossfade", "durationMs": 500 }
    }
  ]
}
```
Every downstream stage reads from this score. The image generator follows the visual description, the music prompter maps the emotional arc, the caption renderer syncs to word timestamps. Same idea as film production: director writes the vision, departments execute against it.
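Inferred from the JSON example above, the score's shape could be typed roughly like this. This is a sketch, not the project's actual schema; any field or union member beyond the example is an assumption.

```typescript
// Type sketch inferred from the example DirectorScore; not the real schema.
interface Motion { type: string; intensity: "subtle" | "moderate" | "strong"; }
interface Visual { type: "ai_image" | "ai_video" | "stock"; description: string; motion?: Motion; }
interface Transition { type: string; durationMs: number; }
interface Scene {
  sceneNumber: number;
  visual: Visual;
  voiceover: string;
  transition: Transition;
}
interface DirectorScore { scenes: Scene[]; }

const score: DirectorScore = {
  scenes: [{
    sceneNumber: 1,
    visual: {
      type: "ai_image",
      description: "Close-up of astronaut's visor reflecting Earth",
      motion: { type: "zoom_in", intensity: "subtle" },
    },
    voiceover: "On April 13, 1970, three men heard a bang that changed space history.",
    transition: { type: "crossfade", durationMs: 500 },
  }],
};

// Every downstream stage reads from the same object:
console.log(score.scenes[0].visual.description);
```

Having one typed object as the single source of truth is what lets independently generated assets still feel like one production.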
Stock footage verification
Stock footage search is bad. Like, really bad. Search "astronaut's visor reflecting Earth" on Pexels and you get generic space B-roll.
So there's a VLM (vision-language model) that reviews each stock result and checks if it actually matches what the scene needs. Mismatch? The pipeline rewrites the search query and tries again. If stock is totally exhausted, it falls back to AI image generation.
The query reformulation step is where most of the improvement comes from. The initial search terms are rarely what stock APIs want to hear.
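The verify-and-reformulate loop can be sketched as follows. `searchStock`, `vlmMatches`, and `reformulateQuery` are hypothetical stand-ins for the real stock API, VLM, and LLM calls; the retry count is an assumption.

```typescript
// Sketch of the stock-verification loop. All three helpers are hypothetical stand-ins.
type StockClip = { url: string; tags: string[] };

async function findVerifiedStock(
  sceneDescription: string,
  searchStock: (q: string) => Promise<StockClip | null>,
  vlmMatches: (clip: StockClip, need: string) => Promise<boolean>,
  reformulateQuery: (prev: string, need: string) => Promise<string>,
  maxAttempts = 3,
): Promise<StockClip | null> {
  let query = sceneDescription;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const clip = await searchStock(query);
    // Accept only if the VLM confirms the clip matches the scene's needs.
    if (clip && (await vlmMatches(clip, sceneDescription))) return clip;
    // Mismatch or no result: rewrite the search query and try again.
    query = await reformulateQuery(query, sceneDescription);
  }
  return null; // caller falls back to AI image generation
}
```

The key design point is that the reformulated query feeds the next search attempt, so each retry is informed by what just failed rather than repeating the same terms.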
Music
I didn't want random background tracks. The music prompter writes a Lyria 3 Pro prompt with:
- Per-scene timestamp sections
- Intensity ratings (1-10)
- Instrument specs
- Dynamics ("sparse piano at 0:00, build strings at 0:15, full orchestra at 0:30, settle to solo cello at 0:45")
The track ducks under voiceover automatically.
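Put together, a per-scene music spec might look like the structure below, rendered into a single text prompt for the music model. The field names and the rendering step are illustrative assumptions, not Lyria's API or OpenReels' actual prompt format.

```typescript
// Illustrative music-prompt structure; field names are assumptions, not Lyria's API.
interface MusicSection {
  startSec: number;      // timestamp section boundary
  intensity: number;     // 1-10, per the post
  instruments: string[];
  dynamics: string;
}

const musicSections: MusicSection[] = [
  { startSec: 0,  intensity: 2, instruments: ["piano"],          dynamics: "sparse piano" },
  { startSec: 15, intensity: 5, instruments: ["strings"],        dynamics: "building strings" },
  { startSec: 30, intensity: 9, instruments: ["full orchestra"], dynamics: "full orchestra" },
  { startSec: 45, intensity: 3, instruments: ["cello"],          dynamics: "settle to solo cello" },
];

// Flatten the sections into one text prompt describing the emotional arc.
const promptText = musicSections
  .map((s) => `${s.dynamics} at 0:${String(s.startSec).padStart(2, "0")} (intensity ${s.intensity}/10)`)
  .join(", ");
console.log(promptText);
```

Mapping intensity per timestamp section is what lets the generated track follow the video's emotional arc instead of looping a flat bed.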
Archetypes
There are 14 visual archetypes. Each one is a config that controls the entire look and feel:
- anime_illustration: fast cuts, vibrant, cel-shaded
- moody_cinematic: dark, slow, atmospheric
- editorial_caricature: satirical, exaggerated
- infographic: clean, data-heavy, rapid
- pastoral_watercolor: soft, painterly
- surreal_dreamscape: ethereal, impossible geometry
- comic_book: bold outlines, halftone dots
They control pacing (fast: 8-12 scenes, moderate: 7-10, cinematic: 5-8), color palette, caption style, the image generation style bible, and transition defaults.
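Based on that list, a single archetype's config might look like the sketch below. The real schema isn't shown in this post, so every field name, the color values, and the style-bible text are assumptions; only the scene-count range comes from the pacing numbers above.

```typescript
// Hypothetical archetype config; the project's real schema is not shown in the post.
interface Archetype {
  name: string;
  pacing: "fast" | "moderate" | "cinematic";
  sceneRange: [number, number];   // min/max scene count for this pacing
  palette: string[];
  captionStyle: string;
  styleBible: string;             // style guidance applied to every image prompt
  defaultTransition: string;
}

const moodyCinematic: Archetype = {
  name: "moody_cinematic",
  pacing: "cinematic",
  sceneRange: [5, 8],             // cinematic pacing: 5-8 scenes, per the post
  palette: ["#0b0c10", "#1f2833", "#45a29e"],
  captionStyle: "serif, low-contrast, slow fade",
  styleBible: "dark, slow, atmospheric, volumetric light, film grain",
  defaultTransition: "crossfade",
};

console.log(`${moodyCinematic.name}: ${moodyCinematic.sceneRange.join("-")} scenes`);
```

Bundling pacing, palette, captions, and the style bible into one config is what keeps a whole video visually coherent from a single archetype choice.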
Try it
```shell
git clone https://github.com/tsensei/OpenReels.git
cd OpenReels
cp .env.example .env   # add your API keys
docker compose up      # starts Redis + API + Worker
# Open http://localhost:3000
```
Or as a single command:

```shell
docker run --env-file .env --shm-size=2gb -v ./output:/output \
  ghcr.io/tsensei/openreels "the apollo 13 disaster"
```
Cost
A typical video costs about $0.68:
- LLM calls: $0.003 (7 calls)
- TTS: $0.017
- AI images: $0.30 (3 images)
- AI video clip: $0.30 (1 clip)
- Music: $0.08 (Lyria generation)
- Stock footage: free
You can also go cheaper: --provider local uses Kokoro for voiceover (free, no API key), there's a bundled library of 25 music tracks, and you can restrict visuals to stock footage only.
Stack
TypeScript, Mastra, Vercel AI SDK 6, Fastify 5, BullMQ + Redis, Remotion 4, React 19, Tailwind, shadcn/ui.
Feedback welcome
I'm looking for thoughts on the architecture and the DirectorScore approach specifically. Contributors welcome too.