Auto-generating YouTube thumbnails with ffmpeg inside a CI pipeline

#typescript #webdev #showdev #githubactions

When I started auto-publishing YouTube videos from GitHub Actions, the default thumbnails were whatever frame YouTube chose to freeze on. Usually a half-rendered slide or a moment of black. They looked unprofessional enough that I fixed it before worrying about anything else.

The result is thumbnail.sh — 51 lines of bash that run as step 4a in my publish pipeline, generate a 1280×720 JPEG from the finished mp4, and hand it to upload.py for thumbnails.set. Here's how it works and where it's still rough.

What the pipeline looks like before the thumbnail step

My full pipeline is orchestrated by main.sh:

TTS — tts.sh generates voice.wav from a script using edge-tts
Visuals — visuals.sh writes slide_*.txt files, one per sentence
Background — bg.sh pulls a Pexels stock video or falls back to a solid color
Compose — compose.sh assembles everything into output.mp4 using ffmpeg
Thumbnail — thumbnail.sh reads output.mp4, writes thumbnail.jpg ← new
Upload — upload.py uploads the mp4 and optionally calls thumbnails.set

Thumbnail generation runs after compose because it needs the finished video as input. The upload step receives a --thumbnail arg only if the file exists — if thumbnail.sh fails, the video still uploads without a custom thumbnail instead of aborting the whole run.

The ffmpeg filter chain

The core of thumbnail.sh is a single ffmpeg invocation. It does four things in one pass:

ffmpeg -y -loglevel error \
  -ss "$SEEK" -i "$VIDEO" \
  -frames:v 1 \
  -vf "scale=1280:720:force_original_aspect_ratio=increase,crop=1280:720,\
eq=brightness=-0.18:saturation=0.85,\
vignette=PI/4.5,\
drawtext=fontfile='${FONT}':textfile='${TITLE_FILE}':fontcolor=white:fontsize=72:\
x=(w-text_w)/2:y=(h-text_h)/2:line_spacing=14:\
shadowcolor=black@0.9:shadowx=6:shadowy=6:\
box=1:boxcolor=black@0.45:boxborderw=24" \
  -q:v 3 \
  "$OUTPUT"

-ss "$SEEK" — seeks to 40% of total duration before decoding a single frame. I picked 40% empirically: the first 20% of my videos is usually a title card, and the last 10% fades out. Somewhere in the middle is almost always a content-heavy slide that reads well as a still.

scale=1280:720:force_original_aspect_ratio=increase,crop=1280:720 — my source videos are 1080×1920 (9:16 Shorts). This filter scales to fill 16:9, crops to center. YouTube's thumbnail spec is 1280×720 maximum, 2MB maximum.

eq=brightness=-0.18:saturation=0.85 — darkens the frame slightly and desaturates a little. Title text needs contrast to be readable. I tried several values; -0.18 brightness is about as far as you can go before the background looks obviously crushed.

vignette=PI/4.5 — adds edge darkening. Combined with the brightness reduction, this draws the eye toward center where the title sits.

drawtext — overlays the wrapped title. The textfile= approach rather than text= is intentional: ffmpeg's text= parameter has escaping requirements that break on apostrophes, colons, and commas that appear regularly in video titles. Writing to a temp file and pointing textfile= at it sidesteps all of that. shadowcolor=black@0.9:shadowx=6:shadowy=6 plus box=1:boxcolor=black@0.45:boxborderw=24 adds both a drop shadow and a semi-transparent text box. Either alone isn't enough when the background frame is complicated.

-q:v 3 — JPEG quality scale. ffmpeg's JPEG quality flag is inverse: 2-3 is high quality, 31 is terrible. I settled on 3 because the output is typically 200-400KB, well inside the 2MB YouTube limit. If it does exceed 2MB, the script recompresses at -q:v 6.

Title wrapping

YouTube titles can be 100 characters. At fontsize 72 on a 1280px canvas, about 24 characters fit per line. I wrap with Python's textwrap.fill:

WRAPPED_TITLE=$(python3 -c "
import textwrap, sys
title = '''$TITLE'''.strip()
print(textwrap.fill(title, width=24))
")

The triple-quote protects against titles with single quotes. It still breaks on titles with three consecutive single quotes (''') — I haven't seen one in practice but it's a known hole.

Font discovery

The script checks three hardcoded paths:

for f in \
  "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf" \
  "/usr/share/fonts/dejavu/DejaVuSans-Bold.ttf" \
  "/System/Library/Fonts/Helvetica.ttc"; do
  [ -f "$f" ] && FONT="$f" && break
done

The first two paths cover Ubuntu (GitHub Actions default runner). The third covers macOS for local testing. If none are found the script exits non-zero, which main.sh catches and swallows — the upload continues without a custom thumbnail.

I should probably install a specific font in the CI runner explicitly rather than hoping the path is stable. That's on my list.

Wiring up the YouTube thumbnails API

upload.py already handled the video upload via the YouTube Data API v3 resumable upload flow. Thumbnail upload is a separate endpoint — thumbnails.set — and it's straightforward:

def upload_thumbnail(access_token, video_id, thumb_path):
    file_size = os.path.getsize(thumb_path)
    with open(thumb_path, "rb") as f:
        data = f.read()
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "image/jpeg",
        "Content-Length": str(file_size),
    }
    url = f"https://www.googleapis.com/upload/youtube/v3/thumbnails/set?videoId={video_id}&uploadType=media"
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    ...

One catch: thumbnails.set requires the YouTube OAuth scope youtube.upload to be enabled on the same token. If you set up your OAuth credentials without that scope, this call returns 403. I hit that on the first test run and had to regenerate the refresh token.

The upload_thumbnail call is wrapped in a try/except HTTPError with a WARN: print and a None return rather than raising. Thumbnail failure should never block a published video.

What I'd do differently

Frame selection is too simple. Picking 40% of duration works reasonably often but sometimes lands on a text-only slide with a plain background that reads fine in video but looks bland as a static thumbnail. A smarter approach would score candidate frames by visual complexity — edge density or color variance — and pick the highest-scoring one.

Text placement is center-center. That's fine for short titles. Longer titles that wrap to three or four lines push into the lower third where a progress bar or face would normally go. Adding padding-from-bottom logic would help.

The font size is fixed at 72. A 20-character title and a 90-character title both render at the same size. The 90-character version looks cramped. Dynamic fontsize based on character count would be cleaner.

I don't verify the thumbnail is actually live. After thumbnails.set returns 200, there's a delay before it shows in the YouTube Studio UI, and sometimes custom thumbnails revert if the channel doesn't have thumbnail upload permissions verified. I'm watching whether this happens on the live channel, but I don't have enough uploads yet to know the reliability.

Whether custom thumbnails move the CTR needle at all is the open question. I'll have that data in 30 days.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.