DEV Community

CounterIntEng
My Code Makes Videos While I Sleep

Ever tried producing a 10-minute video solo?

Script. Voiceover. Visuals. Editing. Color. Music. Subtitles. Export.

That's not a weekend project — that's a full-time team. Four people minimum. Eight thousand dollars a month, easy.

I refused to accept that. So I wrote a Python script that does all of it.


What This Thing Actually Does

You give it a topic. It gives you a finished MP4.

Topic → Script → Voice → Images → Video → BGM + Subtitles → MP4

No babysitting. No timeline dragging. No "just one more export." You run one command, walk away, and come back to a video.

python generate_plan.py "How quantum computing works" --produce

That's the whole interaction.


The Pipeline: 5 Stages, Zero Clicks

Production Pipeline

Stage 1 — Script
An LLM takes your topic and writes the full narration plus scene-by-scene image prompts. Plug in any OpenAI-compatible provider: Ollama (free, runs locally), DeepSeek, OpenAI, Gemini — your call.
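The script stage boils down to a single chat-completion call. Here's a minimal sketch of what that request payload might look like for any OpenAI-compatible endpoint — the model name, prompt wording, and JSON schema are my placeholders, not the repo's actual prompt:

```python
def build_script_request(topic: str, model: str = "llama3") -> dict:
    """Build a chat-completion payload for an OpenAI-compatible endpoint.

    The system prompt asks for narration plus per-scene image prompts as
    JSON, so later stages can parse the result mechanically.
    """
    return {
        "model": model,  # e.g. an Ollama model name; placeholder
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a video scriptwriter. Reply with JSON: "
                    '{"narration": str, "scenes": [{"image_prompt": str}]}'
                ),
            },
            {"role": "user", "content": f"Write a 10-minute script about: {topic}"},
        ],
        "response_format": {"type": "json_object"},
    }
```

The same payload works against Ollama, DeepSeek, OpenAI, or Gemini's compatibility endpoint — only the base URL and API key change.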

Stage 2 — Voice
Edge-TTS turns the script into speech. It's a free Python wrapper around the neural voices behind Microsoft Edge's read-aloud feature. Multi-language, decent quality, zero cost.
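Edge-TTS synthesizes one utterance at a time, so a helper that splits narration at sentence boundaries keeps each call short and lets audio line up with scenes. A rough sketch — the repo's actual splitting logic may differ:

```python
import re

def split_narration(text: str, max_chars: int = 500) -> list[str]:
    """Split narration into sentence-aligned chunks so each TTS call stays
    short and per-scene audio can be stitched back together in order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

# Feeding a chunk to edge-tts (requires `pip install edge-tts`):
#   import asyncio, edge_tts
#   asyncio.run(edge_tts.Communicate(chunk, "en-US-GuyNeural").save("scene1.mp3"))
```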

Stage 3 — Visuals
ComfyUI + Flux generates every scene image on your local GPU. No cloud calls. No API bills. No rate limits.
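ComfyUI exposes an HTTP API, and queueing a render is just a POST to its /prompt endpoint with a workflow graph. A sketch of how that request could be built — the workflow contents and host are assumptions, not the repo's actual graph:

```python
import json
import urllib.request

def queue_comfyui_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build the request that queues a workflow on ComfyUI's /prompt endpoint.

    `workflow` is an API-format graph exported from ComfyUI; the caller
    passes the returned Request to urllib.request.urlopen() to submit it.
    """
    body = json.dumps({"prompt": workflow}).encode()
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

Because it's plain HTTP, the image stage can poll for results and retry failed scenes without touching the ComfyUI process itself.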

Stage 4 — Motion (optional)
HunyuanVideo animates the static images into video clips. Requires 16GB VRAM. Don't have it? Skip this — static images still make a perfectly watchable video.
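Skipping the motion stage can be automated rather than manual. A sketch of how the pipeline might gate it on detected VRAM — the helper names are mine, not the repo's:

```python
import subprocess

def detect_vram_gb() -> float:
    """Query total VRAM via nvidia-smi; return 0.0 if no NVIDIA GPU is found."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        return float(out.splitlines()[0]) / 1024  # MiB -> GiB
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return 0.0

def enable_motion(vram_gb: float, threshold_gb: float = 16.0) -> bool:
    """Run HunyuanVideo only with enough VRAM; below the threshold the
    pipeline keeps the static images, which still make a watchable video."""
    return vram_gb >= threshold_gb
```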

Stage 5 — Assembly
BGM gets layered in. Subtitles get burned. Everything stitches together into a final MP4.
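The whole assembly stage is one ffmpeg invocation. A sketch of how that command might be built — filenames and the 0.15 BGM volume are illustrative, not the repo's defaults:

```python
def build_assembly_cmd(video: str, bgm: str, subs: str, out: str,
                       bgm_volume: float = 0.15) -> list[str]:
    """Build the ffmpeg command for final assembly: burn subtitles into
    the frames and mix background music under the narration."""
    graph = (
        f"[0:v]subtitles={subs}[v];"                 # burn the .srt into the video
        f"[1:a]volume={bgm_volume}[bg];"             # duck the BGM under the voice
        "[0:a][bg]amix=inputs=2:duration=first[a]"   # narration + BGM, voice-length
    )
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", bgm,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "[a]",
        "-c:v", "libx264", "-c:a", "aac",
        out,
    ]
```

Hand the list to subprocess.run() and the stage is done; no editor, no timeline.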

Each stage is independent. Kill the process halfway through? Re-run the same command — it picks up exactly where it stopped. Checkpoint resume, built in.
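Checkpoint resume is simple to sketch: record each finished stage in a small state file and skip anything already marked done. This is my minimal version of the idea, not the repo's actual implementation:

```python
import json
from pathlib import Path

STAGES = ["script", "voice", "visuals", "motion", "assembly"]

def run_pipeline(workdir: Path, runners: dict) -> None:
    """Run stages in order, recording each completion in checkpoint.json
    so a re-run of the same command skips everything already done."""
    ckpt = workdir / "checkpoint.json"
    done = set(json.loads(ckpt.read_text())) if ckpt.exists() else set()
    for stage in STAGES:
        if stage in done:
            continue  # resume: this stage finished in a previous run
        runners[stage]()          # do the actual work for this stage
        done.add(stage)
        ckpt.write_text(json.dumps(sorted(done)))  # persist after each stage
```

Writing the checkpoint after every stage, not at the end, is what makes a mid-run kill safe.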


Inside the Repo

File Tree

Under 20 files. Nothing hidden, nothing clever:

  • generate_plan.py — topic in, production plan out
  • produce_from_plan.py — plan in, video out
  • main.py — the pipeline core
  • modules/ — one file per stage (LLM, TTS, image gen, video assembly, BGM)
  • setup.py — interactive wizard, 3 questions, done

Hardware? Lower Than You Think


Part   Minimum                  Sweet Spot
GPU    8GB VRAM (images only)   16GB VRAM (images + motion)
RAM    16GB                     32GB
Disk   50GB free                100GB+

A used RTX 2070 handles it fine.


Getting Started

Three commands. That's the setup.

git clone https://github.com/counter-eng/ai-video-factory.git
cd ai-video-factory && pip install -r requirements.txt
python setup.py

The wizard asks three things: which LLM, where's ComfyUI, GPU or CPU encoding. It writes your config. You're done.
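A setup wizard like this is just three answers persisted to disk. A sketch of the config-writing step — the keys and filename are illustrative, not necessarily what setup.py writes:

```python
import json
from pathlib import Path

def write_config(llm_endpoint: str, comfyui_url: str, gpu_encoding: bool,
                 path: Path = Path("config.json")) -> dict:
    """Persist the three wizard answers so every later command reads the
    config file instead of asking again. Key names are illustrative."""
    cfg = {
        "llm_endpoint": llm_endpoint,    # any OpenAI-compatible base URL
        "comfyui_url": comfyui_url,      # where ComfyUI is listening
        "gpu_encoding": gpu_encoding,    # GPU vs CPU video encoding
    }
    path.write_text(json.dumps(cfg, indent=2))
    return cfg
```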

Then:

python generate_plan.py "How radar works" --produce

Go make coffee. Come back to a video.


Why I Open-Sourced This

I built it because I needed it. Running a content channel solo means choosing between quality and quantity — unless you automate.

After months of running this pipeline, my output as one person matched a three-person team. That felt too useful to keep private.

So here it is. MIT license. Fork it, break it, improve it, ship it. If you hit a bug, open an issue.

The entire source is yours.


Links

GitHub: https://github.com/counter-eng/ai-video-factory

YouTube: https://www.youtube.com/@CounterintuitiveEng

Star it if you find it useful. PRs welcome.


Your code makes videos. You make ideas.
