My end-to-end automated YouTube video factory

#automation #softwareengineering #api #python

I'm excited to share the MVP of my latest personal project: a fully automated, end-to-end YouTube video factory. The system transforms a simple text idea into a complete, published video, and supports both short-form (Shorts) and long-form content.

It orchestrates a suite of powerful APIs and tools to automate the entire workflow:

Scripting: OpenAI API (GPT)
Voiceover: ElevenLabs API for high-quality TTS
Visuals: Flexible image generation via Replicate (SDXL) or a cost-effective alternative using the Pixabay API
Subtitles: Precise, locally run transcription with OpenAI Whisper
Music: Royalty-free background music sourced from the Jamendo API
Assembly: A robust, two-step FFmpeg process that ensures smooth, "judder-free" animations
Publishing: Direct upload to YouTube via the YouTube Data API v3
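
To make the flow above concrete, here is a minimal sketch of how such stages could be chained in plain Python. It is not the project's actual code: `VideoJob`, `make_stub_stage`, `run_pipeline`, and the artifact file names are all hypothetical, and each stage is a stub that only records a placeholder output instead of calling the real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class VideoJob:
    """Carries the idea and every artifact produced along the way."""
    idea: str
    artifacts: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


Stage = Callable[[VideoJob], None]


def make_stub_stage(name: str, output: str) -> Stage:
    """Build a placeholder stage; a real one would call the API or tool."""
    def stage(job: VideoJob) -> None:
        job.artifacts[name] = output  # hypothetical artifact path
        job.log.append(name)
    return stage


# Stage order mirrors the list above; file names are assumptions.
STAGES: list[Stage] = [
    make_stub_stage("script", "script.txt"),
    make_stub_stage("voiceover", "voice.mp3"),
    make_stub_stage("visuals", "images/"),
    make_stub_stage("subtitles", "subs.srt"),
    make_stub_stage("music", "music.mp3"),
    make_stub_stage("assembly", "final.mp4"),
    make_stub_stage("publish", "uploaded"),
]


def run_pipeline(idea: str, stages: Sequence[Stage] = STAGES) -> VideoJob:
    """Run each stage in order, passing one shared job object through."""
    job = VideoJob(idea=idea)
    for stage in stages:
        stage(job)
    return job
```

Calling `run_pipeline("Five facts about octopuses")` returns a `VideoJob` whose `log` lists the stages in the order they ran. Passing one shared job object through every stage keeps each step independent and easy to swap, for example Replicate versus Pixabay for visuals.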

Beyond just connecting APIs, the focus was on building a resilient and developer-friendly system. This includes:

A comprehensive health_check.py script to validate the environment.
Automated pytest tests for key integrations like Pixabay and Jamendo, verifying that API errors are handled gracefully.
A full scheduling system for macOS using launchd to automate future uploads.
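
As an illustration of the environment validation mentioned above, here is a minimal sketch of what a health check could look like. It is not the contents of the real health_check.py: the environment variable names and the `check_environment` function are assumptions made for this example. Only `shutil.which` and `os.environ` come from the standard library.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Hypothetical variable names; the real project may use different ones.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "PIXABAY_API_KEY",
    "JAMENDO_CLIENT_ID",
)
REQUIRED_BINARIES = ("ffmpeg",)


def check_environment(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):  # missing or empty counts as a failure
            problems.append(f"missing environment variable: {name}")
    for binary in REQUIRED_BINARIES:
        if which(binary) is None:  # not found on PATH
            problems.append(f"binary not found on PATH: {binary}")
    return problems
```

A script could print each entry returned by `check_environment()` and exit with a non-zero status when the list is not empty. Taking `env` and `which` as parameters also makes the check easy to exercise from pytest without touching the real environment.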

This project was a deep dive into system architecture, complex automation pipelines, and solving low-level integration challenges.

You can see the system's architecture in these diagrams:

Application Flowchart: https://bit.ly/4kzoxby
Sequence Diagram: https://bit.ly/46EHd6h

I'm always open to feedback and connecting with fellow developers!

Top comments (4)

Luis Faria • Jul 14

The diagrams look very well designed and are self-explanatory. Thanks for sharing, and congrats on the work!

Could you share a bit more about the stack you used? Any frameworks on the Python side?

Luis Fernando Richter • Jul 15

Hi Luis, thank you so much for the kind words! I'm glad you found the diagrams clear.

That's an excellent question. The project is built with "vanilla" Python 3.12 and doesn't use a major framework like Django or FastAPI.

The architecture is designed as a modular orchestration system. The "stack" is essentially a collection of powerful, specialized tools and APIs connected by Python scripts:

APIs for Content: OpenAI (scripting), ElevenLabs (voice), Replicate/Pixabay (visuals), and Jamendo (music).

Local Processing: FFmpeg for all video assembly/animation and OpenAI Whisper for local subtitle generation.

Automation & Upload: The final step is handled by the YouTube Data API.

The core idea was to keep it a lightweight, command-line-driven pipeline rather than a web application.
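
For illustration only, a command-line entry point for a pipeline like this could be sketched with argparse from the standard library. The program name, the `idea` argument, and the `--format` and `--image-source` flags are hypothetical and chosen to mirror the options described in the post, not taken from the actual project.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI for the video pipeline."""
    parser = argparse.ArgumentParser(
        prog="video_factory",
        description="Turn a text idea into a published video.",
    )
    parser.add_argument("idea", help="the text idea to turn into a video")
    parser.add_argument(
        "--format",
        choices=("short", "long"),
        default="short",
        help="short-form (Shorts) or long-form output",
    )
    parser.add_argument(
        "--image-source",
        choices=("replicate", "pixabay"),
        default="pixabay",
        help="where the visuals come from",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["My idea", "--format", "long"])` yields a namespace with `idea`, `format`, and `image_source` attributes that a script could hand to the pipeline.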

I'm already working on another personal project that will turn this prototype into a Java Spring Boot API, strictly following Clean Architecture and Domain-Driven Design (DDD), so expect another post soon.

Thanks again for your interest!