I run a faceless YouTube channel and I got tired of the manual grind. So I built a Python pipeline that takes a text script and produces a published YouTube video — mostly hands-off.
Here's the full technical breakdown.
What the pipeline does
Input: scripts/01-top-10-anime.txt (a plain text script file)
Output: Published YouTube video + YouTube Short
Time: ~45-60 minutes, unattended
One command:
python video_generator.py scripts/01-top-10-anime.txt
What runs automatically:
- Parse script file into sections
- Generate TTS voiceover per section (Microsoft Edge Neural — free, no API key)
- Download trailer/background video (yt-dlp, YouTube search)
- Fetch show metadata from MyAnimeList API (Jikan) or TMDB
- Build title overlay + rank badge (PIL)
- Render info card (episodes, score, studio, synopsis)
- Assemble all sections into final MP4 (ffmpeg, CRF 18, fast preset)
- Extract 45-second vertical Short (9:16)
- Upload Short to YouTube via Data API v3
- Schedule main video 1/day via batch uploader
I've used it to publish 108 videos. Here's exactly how each piece works.
The script format
Scripts are plain .txt files with a simple structure:
VIDEO #1: Top 10 Anime on Crunchyroll Right Now
====================
HOOK
====================
"Most people are sleeping on half of what Crunchyroll has right now..."
====================
#10 — Spy x Family
====================
"Spy x Family is the rare anime that works for absolutely everyone..."
====================
#9 — Demon Slayer
====================
"If you haven't finished Demon Slayer yet, clear your weekend..."
The parser splits on === dividers, extracts section titles and quoted voiceover text.
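Here is a minimal sketch of that kind of parser. It is not the pipeline's actual code: the `Section` dataclass, the `parse_script` name, and the rule that the first chunk is the video title are assumptions based on the sample above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A divider is a line made only of "=" characters (three or more).
DIVIDER = re.compile(r"^={3,}[ \t]*$", re.MULTILINE)
# Ranked headings look like "#10 — Spy x Family"; accept em dash, en dash or hyphen.
RANKED = re.compile(r"^#(\d+)\s*[—–-]\s*(.+)$")


@dataclass
class Section:
    heading: str            # raw heading, e.g. "HOOK" or "#10 — Spy x Family"
    rank: Optional[int]     # 10 for "#10 — ...", None for unranked sections
    show: Optional[str]     # show title for ranked sections
    voiceover: str          # text handed to TTS


def parse_script(text: str) -> Tuple[str, List[Section]]:
    """Split a script into (video_title, sections)."""
    chunks = [c.strip() for c in DIVIDER.split(text)]
    chunks = [c for c in chunks if c]
    video_title, rest = chunks[0], chunks[1:]

    sections = []
    # After the title, chunks alternate: heading, body, heading, body, ...
    for heading, body in zip(rest[0::2], rest[1::2]):
        quoted = re.findall(r'"([^"]+)"', body)
        voiceover = " ".join(quoted) if quoted else body
        match = RANKED.match(heading)
        sections.append(Section(
            heading=heading,
            rank=int(match.group(1)) if match else None,
            show=match.group(2).strip() if match else None,
            voiceover=voiceover,
        ))
    return video_title, sections
```

Ranked sections come back with `rank` and `show` filled in, which is what the metadata lookup and rank badge need. Unranked sections like HOOK only carry voiceover text.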
TTS — Microsoft Edge Neural (free)
import asyncio
import edge_tts

async def tts(text, output_path):
    await edge_tts.Communicate(
        text,
        voice="en-US-ChristopherNeural",
        rate="+20%",
        pitch="+2Hz"
    ).save(output_path)

asyncio.run(tts("Hello world", "output.mp3"))
edge-tts is a Python wrapper around Microsoft's free TTS service. ChristopherNeural sounds genuinely good — deep, authoritative. No API key, no cost.
Fallback: ElevenLabs (paid, better) or gTTS (free, robotic).
Background video — yt-dlp trailer download
import subprocess

def download_trailer(show_title, output_path, max_seconds=55):
    subprocess.run([
        "yt-dlp",
        # ytsearch1: takes the top YouTube search hit for the query
        f"ytsearch1:{show_title} official trailer",
        "--format", "bestvideo[height<=720][ext=mp4]+bestaudio[ext=m4a]/best[height<=720]",
        # Only fetch the first max_seconds of the video
        "--download-sections", f"*0-{max_seconds}",
        "--merge-output-format", "mp4",
        "--output", output_path,
        "--quiet"
    ], check=True)  # raise if yt-dlp exits with an error
Search YouTube for the official trailer, download the first 55 seconds, use as background. Results are cached so re-renders are instant.
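The caching can be as simple as a deterministic file name per show. The sketch below shows one way to do it. The `cache/trailers` directory, the helper names, and the slug-plus-hash naming are assumptions for illustration, not the pipeline's actual layout. The downloader is passed in as a callable so the cache logic stays independent of yt-dlp.

```python
import hashlib
import re
from pathlib import Path


def cached_trailer_path(show_title, cache_dir="cache/trailers"):
    """Map a show title to a stable, filesystem-safe cache path."""
    slug = re.sub(r"[^a-z0-9]+", "-", show_title.lower()).strip("-")
    # A short hash keeps titles that slugify identically from colliding.
    digest = hashlib.sha1(show_title.encode("utf-8")).hexdigest()[:8]
    return Path(cache_dir) / f"{slug}-{digest}.mp4"


def get_trailer(show_title, download, cache_dir="cache/trailers"):
    """Return the cached trailer, calling download(title, path) only on a miss."""
    path = cached_trailer_path(show_title, cache_dir)
    if path.exists() and path.stat().st_size > 0:
        return path  # cache hit: skip the network entirely
    path.parent.mkdir(parents=True, exist_ok=True)
    download(show_title, str(path))
    return path
```

Calling `get_trailer("Spy x Family", download_trailer)` would download once and reuse the file on every later render.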
Anime metadata — Jikan API (MAL, free)
import time
import requests

def get_anime_info(title):
    time.sleep(0.5)  # Jikan rate limit
    r = requests.get(
        "https://api.jikan.moe/v4/anime",
        params={"q": title, "limit": 1},
        timeout=10,
    )
    r.raise_for_status()
    data = r.json().get("data", [])
    if not data:
        return None
    anime = data[0]
    return {
        "title": anime.get("title_english") or anime.get("title"),
        "episodes": anime.get("episodes"),
        "score": anime.get("score"),
        "year": anime.get("year"),
        "studios": [s["name"] for s in anime.get("studios", [])],
        "genres": [g["name"] for g in anime.get("genres", [])][:3],
        # synopsis can be null, so fall back to "" before slicing
        "teaser": (anime.get("synopsis") or "")[:120],
    }
Free, no API key, returns everything: episodes, score, studios, genres, synopsis, poster URL.
For KDramas I fall back to TMDB (also free with a key).
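The TMDB fallback isn't shown in this post, so treat the following as a rough sketch rather than the real implementation. It assumes TMDB's v3 `search/tv` endpoint with an `api_key` query parameter, and maps the top result onto the same dict shape as `get_anime_info`. The function names are made up. Search results don't carry episode counts, studios, or genre names, so those fields stay empty here.

```python
import json
import urllib.parse
import urllib.request

TMDB_SEARCH_TV = "https://api.themoviedb.org/3/search/tv"


def normalize_tmdb_show(result):
    """Map one TMDB TV search result onto the get_anime_info dict shape."""
    first_air = result.get("first_air_date") or ""
    return {
        "title": result.get("name") or result.get("original_name"),
        "episodes": None,   # not part of search results
        "score": result.get("vote_average"),
        "year": int(first_air[:4]) if first_air[:4].isdigit() else None,
        "studios": [],      # not part of search results
        "genres": [],       # search results only carry numeric genre_ids
        "teaser": (result.get("overview") or "")[:120],
    }


def get_kdrama_info(title, api_key):
    """Search TMDB for a TV show and return normalized info, or None."""
    query = urllib.parse.urlencode({"api_key": api_key, "query": title})
    with urllib.request.urlopen(f"{TMDB_SEARCH_TV}?{query}", timeout=10) as resp:
        results = json.load(resp).get("results", [])
    return normalize_tmdb_show(results[0]) if results else None
```

Keeping the two sources behind the same dict shape means the info card code doesn't need to know which API the data came from.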
Rendering the info card (PIL)
Each anime section gets a 5.5-second info card showing the stats:
from PIL import Image, ImageDraw, ImageFont

def build_info_card(info):
    img = Image.new("RGB", (1920, 1080), (8, 8, 20))
    draw = ImageDraw.Draw(img)
    # Load each font once instead of inside the loops
    font_title = ImageFont.truetype("arialbd.ttf", 58)
    font_tag = ImageFont.truetype("arial.ttf", 30)
    font_label = ImageFont.truetype("arial.ttf", 34)
    font_value = ImageFont.truetype("arialbd.ttf", 50)
    # Red accent bar
    draw.rectangle([0, 0, 1920, 8], fill=(200, 30, 30))
    # Title
    draw.text((960, 30), info["title"], font=font_title,
              fill=(255, 255, 255), anchor="mt")
    # Genre tags
    x = 60
    for genre in info["genres"]:
        w = draw.textlength(genre, font=font_tag) + 24
        draw.rectangle([x, 115, x + w, 153], fill=(200, 30, 30))
        draw.text((x + 12, 119), genre, fill=(255, 255, 255), font=font_tag)
        x += w + 12
    # Stats row
    stats = [
        ("EPISODES", str(info["episodes"])),
        ("YEAR", str(info["year"])),
        ("MAL SCORE", f"{info['score']} / 10"),
    ]
    col_w = (1920 - 120) // len(stats)
    for i, (label, value) in enumerate(stats):
        cx = 60 + col_w * i + col_w // 2
        draw.text((cx, 190), label, fill=(140, 140, 160),
                  font=font_label, anchor="mt")
        draw.text((cx, 238), value, fill=(255, 255, 255),
                  font=font_value, anchor="mt")
    return img
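One gap in the stats row: Jikan can return null for `episodes`, `year`, or `score` (a show that is still airing often has no episode count yet), and `str(None)` would put a literal "None" on the card. Here is a small sketch of a guard. The `format_stats` helper and the "?" placeholder are hypothetical, not part of the original script.

```python
def format_stats(info, placeholder="?"):
    """Build the (label, value) pairs for the stats row, tolerating missing data."""
    def show(value):
        return placeholder if value is None else str(value)

    score = info.get("score")
    return [
        ("EPISODES", show(info.get("episodes"))),
        ("YEAR", show(info.get("year"))),
        # Two decimals matches how MAL displays scores; skip the "/ 10" if unrated.
        ("MAL SCORE", f"{score:.2f} / 10" if score is not None else placeholder),
    ]
```

Swapping the inline `stats = [...]` list for `stats = format_stats(info)` would leave the rest of the drawing code unchanged.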
ffmpeg assembly
The core render: static image + Ken Burns zoom + RGBA overlay + TTS audio → MP4 section:
def build_section(bg_image, overlay_png, audio_mp3, output_mp4, duration):
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-framerate", "24", "-i", bg_image,
        "-loop", "1", "-framerate", "24", "-i", overlay_png,
        "-i", audio_mp3,
        "-filter_complex",
        f"[0:v]scale=1920:1080,setsar=1,"
        f"zoompan=z='min(zoom+0.0015,1.06)':x='iw/2-(iw/zoom/2)':"
        f"y='ih/2-(ih/zoom/2)':d={int(duration*24)}:s=1920x1080:fps=24[kb];"
        f"[kb][1:v]overlay=0:0[out]",
        "-map", "[out]", "-map", "2:a",
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-t", str(duration),
        output_mp4
    ])
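`build_section` takes a `duration`, but the snippet doesn't show where that number comes from. One plausible source is the length of the TTS file, read with ffprobe (which ships with ffmpeg). The sketch below assumes that approach. The helper names and the 0.3-second padding are illustrative choices, not values from the pipeline.

```python
import subprocess


def ffprobe_duration_cmd(path):
    """ffprobe invocation that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(ffprobe_stdout, padding=0.3):
    """Turn ffprobe's output into a section length, plus a little breathing room."""
    return round(float(ffprobe_stdout.strip()) + padding, 3)


def audio_duration(path, padding=0.3):
    """Run ffprobe on an audio file and return the padded duration in seconds."""
    result = subprocess.run(
        ffprobe_duration_cmd(path),
        capture_output=True, text=True, check=True,
    )
    return parse_duration(result.stdout, padding)
```

With that in place, each section could be rendered as `build_section(bg, overlay, mp3, out, audio_duration(mp3))`.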
Then concatenate all sections into the final video:
def concat_sections(section_files, output):
    with open("list.txt", "w") as f:
        for s in section_files:
            f.write(f"file '{s}'\n")
    subprocess.run([
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", "list.txt",
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-c:a", "aac", "-ar", "44100",
        output
    ])
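Why not MoviePy? MoviePy took about 8 hours per video on my machine. Switching to raw ffmpeg subprocess calls dropped it to 30-60 minutes. Most of the speedup comes from letting ffmpeg handle scaling, zoom, and overlay in its own filter graph instead of frame-by-frame Python loops. The fast preset at CRF 18 keeps the encode quick while still looking clean.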
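The 45-second vertical Short from the overview isn't shown above. Here is a hedged sketch of how that step could look with the same raw-ffmpeg approach: center-crop the 16:9 frame to 9:16, then scale to 1080x1920. The function names, the centered crop, and the default start offset are assumptions. The real pipeline may pick its window differently.

```python
import subprocess


def short_cmd(src, dst, start=0.0, length=45.0):
    """Build an ffmpeg command that cuts a vertical 9:16 clip from a 16:9 video."""
    # crop=w:h is centered by default; keep full height, take a 9:16 slice of width.
    vf = "crop=ih*9/16:ih,scale=1080:1920,setsar=1"
    return [
        "ffmpeg", "-y",
        "-ss", str(start), "-t", str(length),  # input-side seek and duration limit
        "-i", src,
        "-vf", vf,
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        dst,
    ]


def make_short(src, dst, start=0.0, length=45.0):
    """Render the Short; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(short_cmd(src, dst, start, length), check=True)
```

A centered crop throws away the left and right thirds of the frame, so overlays placed near the edges of the 16:9 render won't survive into the Short.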
Auto-upload to YouTube
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def upload_video(yt_client, video_path, title, description, tags, publish_at=None):
    # Scheduled videos must be uploaded as private; YouTube flips them
    # to public at publish_at (an ISO 8601 timestamp).
    status = {"privacyStatus": "private", "publishAt": publish_at} \
        if publish_at else {"privacyStatus": "public"}
    body = {
        "snippet": {
            "title": title,
            "description": description,
            "tags": tags,
            "categoryId": "24",  # Entertainment
        },
        "status": status,
    }
    media = MediaFileUpload(video_path, mimetype="video/mp4",
                            resumable=True, chunksize=10*1024*1024)
    req = yt_client.videos().insert(part="snippet,status",
                                    body=body, media_body=media)
    # Resumable upload: keep sending 10 MB chunks until the API returns a response
    response = None
    while response is None:
        status_obj, response = req.next_chunk()
    return response["id"]
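Pass a publishAt timestamp (ISO 8601) for each upload to drip out 1 video/day with no manual step. The video has to go up as private for the schedule to take effect, which is what the status dict above handles.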
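The batch uploader needs one timestamp per video. Here is a minimal sketch of generating them, assuming a fixed daily slot. The `daily_publish_times` name is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def daily_publish_times(first_slot, count):
    """Return `count` ISO 8601 UTC timestamps, one per day starting at first_slot."""
    if first_slot.tzinfo is None:
        raise ValueError("first_slot must be timezone-aware")
    start = first_slot.astimezone(timezone.utc)
    return [
        (start + timedelta(days=i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        for i in range(count)
    ]
```

Each string can be passed straight to `upload_video(..., publish_at=ts)`. Requiring a timezone-aware datetime avoids quietly scheduling in the wrong zone.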
Results
- 108 videos published using this pipeline
- Channel: anime/KDrama niche
- Revenue path: Crunchyroll affiliate ($5-10/signup) before YouTube monetization kicks in
- Render time: 30-60 min/video on a standard Windows laptop
The whole thing is ~3,000 lines of Python across 9 scripts.
What's next
I'm building a Streamlit frontend so this becomes a proper web app instead of a CLI tool. If you want the full script pack (all 9 scripts + documentation), it's available here:
YouTube Automation Script Pack — Gumroad
Questions? Drop them in the comments — happy to go deeper on any part of the pipeline.
Built on Windows 10, Python 3.11, ffmpeg 6.0, edge-tts 6.1, yt-dlp 2024.x