How I Built a Python Pipeline That Publishes YouTube Videos Automatically (108 Videos Proven)

#python #automation #youtube #ffmpeg

I run a faceless YouTube channel and I got tired of the manual grind. So I built a Python pipeline that takes a text script and produces a published YouTube video — mostly hands-off.

Here's the full technical breakdown.

What the pipeline does

Input:  scripts/01-top-10-anime.txt   (a plain text script file)
Output: Published YouTube video + YouTube Short
Time:   ~45-60 minutes, unattended

One command:

python video_generator.py scripts/01-top-10-anime.txt

What runs automatically:

Parse script file into sections
Generate TTS voiceover per section (Microsoft Edge Neural — free, no API key)
Download trailer/background video (yt-dlp, YouTube search)
Fetch show metadata from MyAnimeList API (Jikan) or TMDB
Build title overlay + rank badge (PIL)
Render info card (episodes, score, studio, synopsis)
Assemble all sections into final MP4 (ffmpeg, CRF 18, fast preset)
Extract 45-second vertical Short (9:16)
Upload Short to YouTube via Data API v3
Schedule main video 1/day via batch uploader

I've used it to publish 108 videos. Here's exactly how each piece works.

The script format

Scripts are plain .txt files with a simple structure:

VIDEO #1: Top 10 Anime on Crunchyroll Right Now

====================
HOOK
====================
"Most people are sleeping on half of what Crunchyroll has right now..."

====================
#10 — Spy x Family
====================
"Spy x Family is the rare anime that works for absolutely everyone..."

====================
#9 — Demon Slayer
====================
"If you haven't finished Demon Slayer yet, clear your weekend..."

The parser splits on === dividers, extracts section titles and quoted voiceover text.

TTS — Microsoft Edge Neural (free)

import edge_tts
import asyncio

async def tts(text, output_path):
    await edge_tts.Communicate(
        text,
        voice="en-US-ChristopherNeural",
        rate="+20%",
        pitch="+2Hz"
    ).save(output_path)

asyncio.run(tts("Hello world", "output.mp3"))

edge-tts is a Python wrapper around Microsoft's free TTS service. ChristopherNeural sounds genuinely good — deep, authoritative. No API key, no cost.

Fallback: ElevenLabs (paid, better) or gTTS (free, robotic).

Background video — yt-dlp trailer download

import subprocess

def download_trailer(show_title, output_path, max_seconds=55):
    subprocess.run([
        "yt-dlp",
        f"ytsearch1:{show_title} official trailer",
        "--format", "bestvideo[height<=720][ext=mp4]+bestaudio[ext=m4a]/best[height<=720]",
        "--download-sections", f"*0-{max_seconds}",
        "--merge-output-format", "mp4",
        "--output", output_path,
        "--quiet"
    ])

Search YouTube for the official trailer, download the first 55 seconds, use as background. Results are cached so re-renders are instant.

Anime metadata — Jikan API (MAL, free)

import requests, time

def get_anime_info(title):
    time.sleep(0.5)  # Jikan rate limit
    r = requests.get(
        "https://api.jikan.moe/v4/anime",
        params={"q": title, "limit": 1}
    )
    data = r.json().get("data", [])
    if not data:
        return None
    anime = data[0]
    return {
        "title":    anime.get("title_english") or anime.get("title"),
        "episodes": anime.get("episodes"),
        "score":    anime.get("score"),
        "year":     anime.get("year"),
        "studios":  [s["name"] for s in anime.get("studios", [])],
        "genres":   [g["name"] for g in anime.get("genres", [])][:3],
        "teaser":   anime.get("synopsis", "")[:120],
    }

Free, no API key, returns everything: episodes, score, studios, genres, synopsis, poster URL.

For KDramas I fall back to TMDB (also free with a key).

Rendering the info card (PIL)

Each anime section gets a 5.5-second info card showing the stats:

from PIL import Image, ImageDraw, ImageFont

def build_info_card(info):
    img = Image.new("RGB", (1920, 1080), (8, 8, 20))
    draw = ImageDraw.Draw(img)

    # Red accent bar
    draw.rectangle([0, 0, 1920, 8], fill=(200, 30, 30))

    # Title
    font_title = ImageFont.truetype("arialbd.ttf", 58)
    draw.text((960, 30), info["title"], font=font_title,
              fill=(255,255,255), anchor="mt")

    # Genre tags
    x = 60
    for genre in info["genres"]:
        font = ImageFont.truetype("arial.ttf", 30)
        w = draw.textlength(genre, font=font) + 24
        draw.rectangle([x, 115, x+w, 153], fill=(200, 30, 30))
        draw.text((x+12, 119), genre, fill=(255,255,255), font=font)
        x += w + 12

    # Stats row
    stats = [
        ("EPISODES", str(info["episodes"])),
        ("YEAR", str(info["year"])),
        ("MAL SCORE", f"{info['score']} / 10"),
    ]
    col_w = (1920 - 120) // len(stats)
    for i, (label, value) in enumerate(stats):
        cx = 60 + col_w * i + col_w // 2
        draw.text((cx, 190), label, fill=(140,140,160),
                  font=ImageFont.truetype("arial.ttf", 34), anchor="mt")
        draw.text((cx, 238), value, fill=(255,255,255),
                  font=ImageFont.truetype("arialbd.ttf", 50), anchor="mt")

    return img

ffmpeg assembly

The core render: static image + Ken Burns zoom + RGBA overlay + TTS audio → MP4 section:

def build_section(bg_image, overlay_png, audio_mp3, output_mp4, duration):
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-framerate", "24", "-i", bg_image,
        "-loop", "1", "-framerate", "24", "-i", overlay_png,
        "-i", audio_mp3,
        "-filter_complex",
        f"[0:v]scale=1920:1080,setsar=1,"
        f"zoompan=z='min(zoom+0.0015,1.06)':x='iw/2-(iw/zoom/2)':"
        f"y='ih/2-(ih/zoom/2)':d={int(duration*24)}:s=1920x1080:fps=24[kb];"
        f"[kb][1:v]overlay=0:0[out]",
        "-map", "[out]", "-map", "2:a",
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-t", str(duration),
        output_mp4
    ])

Then concatenate all sections into the final video:

def concat_sections(section_files, output):
    with open("list.txt", "w") as f:
        for s in section_files:
            f.write(f"file '{s}'\n")
    subprocess.run([
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", "list.txt",
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-c:a", "aac", "-ar", "44100",
        output
    ])

Why not MoviePy? MoviePy was 8 hours per video on my machine. Switching to raw ffmpeg subprocess calls dropped it to 30-60 minutes. The trick is CRF 18 + fast preset and avoiding frame-by-frame Python loops entirely.

Auto-upload to YouTube

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def upload_video(yt_client, video_path, title, description, tags, publish_at=None):
    status = {"privacyStatus": "private", "publishAt": publish_at} \
             if publish_at else {"privacyStatus": "public"}

    body = {
        "snippet": {
            "title": title,
            "description": description,
            "tags": tags,
            "categoryId": "24",
        },
        "status": status,
    }
    media = MediaFileUpload(video_path, mimetype="video/mp4",
                            resumable=True, chunksize=10*1024*1024)
    req = yt_client.videos().insert(part="snippet,status",
                                     body=body, media_body=media)
    response = None
    while response is None:
        status_obj, response = req.next_chunk()
    return response["id"]

Schedule with publishAt to drip 1 video/day without touching it.

Results

108 videos published using this pipeline
Channel: anime/KDrama niche
Revenue path: Crunchyroll affiliate ($5-10/signup) before YouTube monetization kicks in
Render time: 30-60 min/video on a standard Windows laptop

The whole thing is ~3,000 lines of Python across 9 scripts.

What's next

I'm building a Streamlit frontend so this becomes a proper web app instead of a CLI tool. If you want the full script pack (all 9 scripts + documentation), it's available here:

YouTube Automation Script Pack — Gumroad

Questions? Drop them in the comments — happy to go deeper on any part of the pipeline.

Built on Windows 10, Python 3.11, ffmpeg 6.0, edge-tts 6.1, yt-dlp 2024.x