DEV Community

Luis Fernando Richter

My end-to-end automated YouTube video factory

I'm excited to share the MVP of my latest personal project: a fully end-to-end automated YouTube video factory. This system is designed to transform a simple text idea into a complete, published video, supporting both short-form (Shorts) and long-form content.

It orchestrates a suite of powerful APIs and tools to automate the entire workflow:

  • Scripting: OpenAI API (GPT)
  • Voiceover: ElevenLabs API for high-quality TTS
  • Visuals: AI image generation via Replicate (SDXL), or stock images from the Pixabay API as a cost-effective alternative
  • Subtitles: Precise, locally-run transcription with OpenAI Whisper
  • Music: Royalty-free background music sourced from the Jamendo API
  • Assembly: A robust, two-step FFmpeg process that ensures smooth, "judder-free" animations
  • Publishing: Direct upload to YouTube via the YouTube Data API v3
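
The post doesn't say what the two FFmpeg steps are. The sketch below assumes a common approach to smooth zoom-and-pan motion over still images. In step one, each image is rendered into its own clip with FFmpeg's zoompan filter after the input has been upscaled. Because zoompan moves its crop window in whole pixels, working at a much higher resolution turns each step into a fraction of a pixel in the output. In step two, the clips are joined with the concat demuxer and the narration is mixed with quieter background music. The functions only build argument lists. The file names, 1920x1080 size, 30 fps, zoom rate, and music volume are illustrative assumptions, not the author's settings.

```python
from pathlib import Path


def build_clip_cmd(image: Path, out: Path, seconds: float,
                   fps: int = 30, size: str = "1920x1080") -> list[str]:
    """Step 1 (assumed): turn one still image into a slowly zooming clip."""
    frames = round(seconds * fps)
    vf = (
        # Upscale first: zoompan moves in whole pixels, so at 8000 px wide a
        # one-pixel step becomes a fraction of a pixel in the final frame.
        "scale=8000:-1,"
        # Slow centered zoom capped at 1.3x; one input frame -> `frames` frames.
        f"zoompan=z='min(zoom+0.0010,1.3)':d={frames}"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":s={size}:fps={fps},"
        "format=yuv420p"  # widely compatible pixel format for players
    )
    return ["ffmpeg", "-y", "-i", str(image), "-vf", vf,
            "-c:v", "libx264", str(out)]


def concat_list(clips: list[Path]) -> str:
    """Text for the concat demuxer's list file (paths must not contain ')."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def build_concat_cmd(list_file: Path, narration: Path, music: Path,
                     out: Path, music_volume: float = 0.15) -> list[str]:
    """Step 2 (assumed): join the clips and mix narration with quieter music."""
    # Inputs: 0 = concat list (video), 1 = narration, 2 = background music.
    fc = (f"[2:a]volume={music_volume}[bg];"
          "[1:a][bg]amix=inputs=2:duration=first[aout]")
    return ["ffmpeg", "-y",
            "-f", "concat", "-safe", "0", "-i", str(list_file),
            "-i", str(narration), "-i", str(music),
            "-filter_complex", fc,
            "-map", "0:v", "-map", "[aout]",
            "-c:v", "copy", "-c:a", "aac", "-shortest", str(out)]
```

Each list can be run with subprocess.run(cmd, check=True). The concat list text would first be written to a file such as clips.txt. Using -c:v copy in step two works only because every clip from step one shares the same codec, size, and frame rate.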

Beyond just connecting APIs, the focus was on building a resilient and developer-friendly system. This includes:

  • A comprehensive health_check.py script to validate the environment.
  • Automated pytest tests for key integrations such as Pixabay and Jamendo, verifying that API errors are handled gracefully.
  • A full scheduling system for macOS using launchd to automate future uploads.
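
The post doesn't show the tests themselves. Here is a minimal sketch of one way to make an integration degrade gracefully and test it with pytest without network access. The fetch function receives the HTTP call as a parameter, so tests can pass in fakes. search_pixabay, the http_get callable, and the choice to return an empty list on failure are hypothetical. The request parameters (key, q, image_type) and the hits/largeImageURL response fields follow Pixabay's public API documentation, which is worth re-checking.

```python
from typing import Any, Callable

PIXABAY_URL = "https://pixabay.com/api/"

# Hypothetical HTTP helper: (url, params) -> parsed JSON; raises on failure.
HttpGet = Callable[[str, dict[str, Any]], dict[str, Any]]


def search_pixabay(query: str, api_key: str, http_get: HttpGet,
                   limit: int = 5) -> list[str]:
    """Return up to `limit` image URLs, or [] if the API call fails."""
    params = {"key": api_key, "q": query, "image_type": "photo"}
    try:
        data = http_get(PIXABAY_URL, params)
    except Exception:  # network error, HTTP error status, bad JSON, ...
        return []
    hits = data.get("hits") or []
    return [h["largeImageURL"] for h in hits[:limit] if "largeImageURL" in h]


# --- pytest tests: plain test_* functions, no network access needed ---

def test_returns_urls_on_success():
    fake = lambda url, params: {"hits": [{"largeImageURL": "https://x/1.jpg"}]}
    assert search_pixabay("ocean", "KEY", fake) == ["https://x/1.jpg"]


def test_returns_empty_list_on_error():
    def failing(url, params):
        raise ConnectionError("API down")
    assert search_pixabay("ocean", "KEY", failing) == []


def test_handles_missing_hits_key():
    assert search_pixabay("ocean", "KEY", lambda u, p: {}) == []
```

pytest collects the test_* functions automatically. In the real pipeline, http_get could wrap requests.get(url, params=params, timeout=10), call raise_for_status(), and return .json(). An outage would then surface as an exception that search_pixabay turns into an empty result.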

This project was a deep dive into system architecture, complex automation pipelines, and solving low-level integration challenges.

You can see the system's architecture in these diagrams:

I'm always open to feedback and connecting with fellow developers!

Top comments (3)

Luis Faria

The diagrams look very well designed and are self-explanatory. Thanks for sharing, and congrats on the work!!!

Could you share a bit more about the stack you used? Did you use any frameworks on the Python side?

Luis Fernando Richter

Hi Luis, thank you so much for the kind words! I'm glad you found the diagrams clear.

That's an excellent question. The project is built with "vanilla" Python 3.12 and doesn't use a major framework like Django or FastAPI.

The architecture is designed as a modular orchestration system. The "stack" is essentially a collection of powerful, specialized tools and APIs connected by Python scripts:

APIs for Content: OpenAI (scripting), ElevenLabs (voice), Replicate/Pixabay (visuals), and Jamendo (music).

Local Processing: FFmpeg for all video assembly/animation and OpenAI Whisper for local subtitle generation.

Automation & Upload: The final step is handled by the YouTube Data API.
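
Whisper's Python package (openai-whisper) returns a result whose segments list holds start and end times in seconds, plus the recognized text. Converting those segments into an SRT file that FFmpeg can burn into the video is a small formatting step. The sketch below assumes only that segment shape and is not taken from the project's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from Whisper-style segments (start, end, text)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Whisper segment text usually starts with a space; strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)
```

For example, whisper.load_model("base").transcribe("narration.mp3")["segments"] could be passed straight to segments_to_srt. The saved file could then be burned in with FFmpeg's subtitles=subtitles.srt video filter. The model size and file names here are assumptions.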

The core idea was to keep it a lightweight, command-line-driven pipeline rather than a web application.

I'm already working on another personal project that will transform this prototype into a Java Spring Boot API project, strictly following Clean Architecture and DDD, so I'll have another post soon.

Thanks again for your interest!

Luis Faria

Thanks for the follow-up and for sharing the details, Luis! It's well thought out. Cool job, man!

I'm looking forward to the Java details once you share them here. Have a great one!! Warm regards!