MORINAGA

Posted on Jun 25

How I built a YouTube performance classifier that adjusts tomorrow's video script bias

#ai #githubactions #programming #showdev

I've been running an automated YouTube channel alongside three programmatic directory sites since April. The video side uses a two-host VTuber pipeline that generates daily scripts and renders them overnight. What I didn't have until last week was any feedback mechanism — the script generator just produced content in a vacuum, with no idea which videos were actually landing.

The fix is scripts/yt-analytics/run.py, a 330-line Python script that runs daily, reads the last 30 videos via the YouTube Data API v3, classifies them as high or low performers, and writes bias hints back to docs/yt-knowhow-bank-en.md — the same file the script generator reads before each session.

This is a closed loop, not magic. But closing the loop is the entire point.

Fetching the Channel Without a Stable Channel ID

The first problem was channel resolution. Most YouTube Data API v3 endpoints take a channel ID, but I didn't want to hardcode an ID that would break if the channel was ever recreated. The script tries up to four handle lookups in order, then falls back to search:

forHandle with the value of a YT_CHANNEL_HANDLE environment variable
forHandle=claudeautomate
forHandle=claude_automate
forHandle=claude-automate
If all fail: a search API call for "claude automate" with a loop over returned channel IDs

for handle in handles:
    body = http_get(
        f"...channels?part=contentDetails,statistics,snippet&forHandle={handle}&key={api_key}"
    )
    items = body.get("items") or []
    if items:
        return items[0]

The search fallback is slower and burns more quota but fires only when every direct handle attempt fails. In practice, claudeautomate matches on the first try.
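
The handle order can be sketched as a small helper. This is a minimal illustration, not the script's actual code: `candidate_handles` and `DEFAULT_HANDLES` are hypothetical names, and skipping a duplicate handle is an assumption on my part.

```python
import os

# Fallback handles from the list above, tried in this order.
DEFAULT_HANDLES = ["claudeautomate", "claude_automate", "claude-automate"]


def candidate_handles(env=None):
    """Return the handles to try, with YT_CHANNEL_HANDLE first when it is set."""
    env = os.environ if env is None else env
    configured = (env.get("YT_CHANNEL_HANDLE") or "").strip()
    handles = []
    for handle in [configured, *DEFAULT_HANDLES]:
        # Skip an unset handle and avoid spending a request on a repeat.
        if handle and handle not in handles:
            handles.append(handle)
    return handles
```

Each handle lookup costs one request, so keeping the list short and free of duplicates keeps the common path cheap.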

Once the channel resolves, relatedPlaylists.uploads gives the uploads playlist ID. From there, playlistItems returns up to 30 recent videos with their IDs, which feeds a second videos.list request for statistics.
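
The chain from channel to statistics can be sketched as follows. The endpoints and response fields are the public Data API v3 ones; the helper names and `API_BASE` constant are hypothetical, and the actual HTTP call is left out so the sketch stays self-contained.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def uploads_playlist_id(channel):
    """Pull the uploads playlist ID out of a channels.list item."""
    return channel["contentDetails"]["relatedPlaylists"]["uploads"]


def playlist_items_url(playlist_id, api_key, limit=30):
    """Build the playlistItems request for the most recent uploads."""
    query = urlencode({
        "part": "contentDetails",
        "playlistId": playlist_id,
        "maxResults": limit,  # the API allows at most 50 per page
        "key": api_key,
    })
    return f"{API_BASE}/playlistItems?{query}"


def video_ids(playlist_items_body):
    """Collect video IDs from a playlistItems response body."""
    items = playlist_items_body.get("items") or []
    return [item["contentDetails"]["videoId"] for item in items]


def videos_url(ids, api_key):
    """Build one videos.list request covering all IDs (up to 50 per call)."""
    query = urlencode({
        "part": "snippet,statistics",
        "id": ",".join(ids),
        "key": api_key,
    })
    return f"{API_BASE}/videos?{query}"
```

Because 30 IDs fit in a single videos.list call, the statistics fetch stays at one request per run.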

Classifying Videos as High or Low

The classifier is deliberately simple: median-based thresholds, no machine learning.

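import statistics
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
high, low = [], []

# videos: items from the videos.list response (must be non-empty; statistics.median raises on an empty list)
views = [int(v["statistics"].get("viewCount", 0)) for v in videos]
median = statistics.median(views)

for v, view in zip(videos, views):
    # publishedAt ends in "Z"; swap it for an explicit UTC offset so fromisoformat accepts it
    published = datetime.fromisoformat(v["snippet"]["publishedAt"].replace("Z", "+00:00"))
    age_h = (now - published).total_seconds() / 3600

    if view >= median * 1.5:
        high.append(v)
    elif view <= median * 0.6 and age_h >= 72:
        low.append(v)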

Videos at or above 1.5× median views are HIGH. Videos at or below 0.6× median are LOW, but only once they're at least 72 hours old. The 72-hour grace period matters: a video posted yesterday with 40% of median views might just be young. Flagging it as a dud immediately would be noise.

Everything between 0.6× and 1.5× is neither, and so is any young video still inside the grace period. That's not actionable signal, so I ignore it.

The choice of median over mean is deliberate. If one video goes viral, the mean view count distorts every other video's classification. Median is resistant to outliers. This is a lesson I learned from the three-tier content quality approach on the directory side: simple bucketing beats trying to optimize a single number.
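
A small worked example shows how far one outlier moves the thresholds. The view counts are made up for illustration, and the 72-hour age gate is ignored here to keep the arithmetic visible.

```python
import statistics

# Hypothetical view counts: nine ordinary videos and one viral outlier.
views = [100, 110, 120, 130, 140, 150, 160, 170, 180, 20000]

mean = statistics.mean(views)      # 2126, dragged upward by the outlier
median = statistics.median(views)  # 145.0, still close to a typical video


def count_low(center):
    """Count videos at or below 0.6x the chosen center."""
    return sum(1 for v in views if v <= center * 0.6)


low_by_mean = count_low(mean)      # 9: every ordinary video looks like a dud
low_by_median = count_low(median)  # 0: none of them do
```

With the mean as the center, the LOW cutoff lands at about 1,276 views and all nine ordinary videos fall under it. With the median, the cutoff is 87 views and none do.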

Matching Archetypes via Title Overlap

The script generator assigns each produced video an archetype label ("tutorial", "recap", "comparison", "technical") and saves it in the uploaded queue under content/yt-queue/uploaded/. But the YouTube Data API knows nothing about those labels, so I need to reconnect the archetype to the performance stats.

The reconnection happens via title overlap:

def title_overlap(a: str, b: str) -> int:
    aw = {w.lower().strip(",.!?:;\"'") for w in a.split() if len(w) > 2}
    bw = {w.lower().strip(",.!?:;\"'") for w in b.split() if len(w) > 2}
    return len(aw & bw)

For each video in the API response, I compare its title against every uploaded queue file and take the best match — but only if word overlap is ≥4. Titles with fewer than 4 matching significant words get labeled "unknown."

This is imperfect. Titles drift during publishing. But a ≥4-word match is strict enough that false positives are rare. In testing on a 25-video set, 21 matched correctly, 4 came back as "unknown." Not great, not unusable — good enough for aggregate pattern analysis.
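
The best-match step can be sketched like this. It reuses the same overlap scoring as above; `match_archetype` is a hypothetical name, and representing the queue as (title, archetype) pairs is an assumption, since the real queue files live on disk.

```python
def title_overlap(a, b):
    """Count shared words longer than two characters, ignoring case and edge punctuation."""
    aw = {w.lower().strip(",.!?:;\"'") for w in a.split() if len(w) > 2}
    bw = {w.lower().strip(",.!?:;\"'") for w in b.split() if len(w) > 2}
    return len(aw & bw)


def match_archetype(video_title, queue, min_overlap=4):
    """Return the archetype of the best-overlapping queued title, or "unknown"."""
    best_label, best_score = "unknown", 0
    for queued_title, archetype in queue:
        score = title_overlap(video_title, queued_title)
        if score > best_score:
            best_label, best_score = archetype, score
    # Below the threshold, a weak best match is treated as no match at all.
    return best_label if best_score >= min_overlap else "unknown"


# Hypothetical queue entries for illustration.
queue = [
    ("Five GitHub Actions tricks for faster builds", "tutorial"),
    ("Weekly recap of directory site traffic", "recap"),
]
label = match_archetype("Five GitHub Actions tricks that make builds faster", queue)
```

Here the first queued title shares six significant words with the published title, so the video inherits the "tutorial" label. A short title like "Weekly recap" overlaps on only two words and stays "unknown".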

Inferring Hook Patterns from the First Word

Beyond archetype, I wanted to know whether certain opening patterns in video scripts correlated with performance. The hook pattern inference is a single-function lookup on the first word of the script's opening line:

def hook_pattern(text: str) -> str:
    words = text.strip().lower().split()
    if not words:
        # Empty opening line: nothing to classify, and words[0] would raise IndexError
        return "other"
    first_word = words[0]
    if first_word in {"why", "how", "what", "when", "who"}:
        return "question"
    if first_word in {"three", "four", "five"} or any(c.isdigit() for c in first_word):
        return "numeric"
    if first_word in {"i", "i'm", "i've"}:
        return "first-person"
    if first_word in {"stop", "never", "don't", "do"}:
        return "imperative-contrarian"
    return "other"

It's a blunt heuristic. "How" and "Why" as first words don't automatically make a video good. But across the 30 videos classified per run, the distribution across HIGH and LOW buckets can still show a usable pattern. If "question" hooks consistently cluster in LOW and "numeric" hooks cluster in HIGH, that's worth feeding back into the script generator's prompt context.
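
One way to compare the two buckets is a per-pattern tally. This is a sketch under assumptions: `hook_distribution` is a hypothetical name, and the net score (HIGH count minus LOW count) is one possible summary, not necessarily what the script computes.

```python
from collections import Counter


def hook_distribution(high_hooks, low_hooks):
    """Return HIGH-minus-LOW counts per hook pattern, sorted by pattern name."""
    high, low = Counter(high_hooks), Counter(low_hooks)
    patterns = sorted(set(high) | set(low))
    # Counter returns 0 for a missing key, so one-sided patterns still work.
    return {p: high[p] - low[p] for p in patterns}


# Hypothetical labels for the videos in each bucket.
net = hook_distribution(
    ["numeric", "numeric", "question"],
    ["question", "question", "other"],
)
```

A positive score means the pattern shows up more often among high performers. With only a handful of videos per bucket, a score of plus or minus one is a hint, not a finding.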

This is also the part I'd replace first if I were scaling this beyond 50 videos. First-word classification misses everything after the opener. A title starting with "I" could be "I ditched X after 3 months and here's why" or a boring "I made another video today." I'll eventually pass the full opening sentence through a small LLM call for categorization.

Writing Bias Hints Back to the Knowledge Bank

The output of the classifier isn't a dashboard — it's a section in docs/yt-knowhow-bank-en.md that the script generator reads at the start of each session. The update_kb function finds the ## Routine Auto-Tuner Notes header and replaces everything up to the next ##:

marker = "## Routine Auto-Tuner Notes"
idx = text.find(marker)
if idx == -1:
    new = text.rstrip() + "\n\n" + kb_section + "\n"
else:
    next_h2 = text.find("\n## ", idx + len(marker))
    if next_h2 == -1:
        new = text[:idx] + kb_section + "\n"
    else:
        new = text[:idx] + kb_section + "\n" + text[next_h2 + 1:]
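
Wrapping that logic in a pure function makes it easy to check that a second run with the same section changes nothing. This is a sketch: `replace_section` is a hypothetical name, and it assumes the section text starts with the marker heading and contains no other level-2 heading.

```python
MARKER = "## Routine Auto-Tuner Notes"


def replace_section(text, kb_section):
    """Insert or replace the auto-tuner section, leaving other sections untouched."""
    idx = text.find(MARKER)
    if idx == -1:
        # No section yet: append it at the end of the file.
        return text.rstrip() + "\n\n" + kb_section + "\n"
    next_h2 = text.find("\n## ", idx + len(MARKER))
    if next_h2 == -1:
        # Section is last in the file: replace through to the end.
        return text[:idx] + kb_section + "\n"
    # Keep everything from the next level-2 heading onward.
    return text[:idx] + kb_section + "\n" + text[next_h2 + 1:]


doc = "# Bank\n\n## Style\nBe brief.\n"
section = "## Routine Auto-Tuner Notes\nPrefer numeric hooks."
once = replace_section(doc, section)
twice = replace_section(once, section)
```

Since `once` and `twice` are identical, a daily run with unchanged results leaves the file as it was and produces no diff to commit.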

The written section includes what's working (high performer archetypes and hook patterns), what's not (low performer patterns), and a "Tomorrow's bias" paragraph naming the preferred archetype and hook style for the next day. The script generator reads this in its system prompt context before writing each video.

It doesn't blindly follow the bias — it uses the information to make a more informed choice. This is analogous to what prompt caching buys at the content ETL level: inject the right context at session start rather than regenerating decisions from scratch each time.

GitHub Actions Integration

The whole thing runs as a daily cron job inside the single CI workflow that also drives the two YouTube channels and three directory sites. Required env vars are YT_API_KEY (YouTube Data API v3 key — the free tier provides 10,000 units/day, more than enough) and an optional DISCORD_WEBHOOK_URL for a daily summary push.

The script handles a missing API key gracefully: if YT_API_KEY isn't set, it prints a warning and exits 0. The CI job doesn't fail. This is the same pattern I used for the Bluesky post queue and for generating YouTube thumbnails inside CI: tools that are optional shouldn't break the build when their credentials aren't present in an environment that hasn't configured them.
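
The skip-when-unconfigured pattern looks roughly like this. It is a sketch, not the script's code: the `main` signature and the `run` callback are assumptions made so the example can be exercised without network access.

```python
import sys


def main(env, run):
    """Run the analytics step if a key is present; otherwise warn and succeed."""
    api_key = env.get("YT_API_KEY")
    if not api_key:
        # Optional tool with no credentials: warn, but do not fail the CI job.
        print("YT_API_KEY not set; skipping YouTube analytics", file=sys.stderr)
        return 0
    run(api_key)
    return 0


# Usage in a script entry point would be along the lines of:
#   sys.exit(main(os.environ, run_analytics))  # run_analytics is hypothetical
```

Returning the exit code instead of calling `sys.exit` inside the function keeps the behavior easy to test.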

What I'd Do Differently

Watch time over view count. Raw views are a noisy proxy for engagement. A video with 200 views where viewers watch 90% of it on average is better than one with 500 views and 20% retention. The YouTube Analytics API (separate from the Data API, requires OAuth) exposes averageViewDuration and averageViewPercentage. I didn't wire that up because OAuth from GitHub Actions is annoying: storing refresh tokens as secrets and handling rotation adds meaningful maintenance surface. But it's the right metric.

LLM hook categorization. The first-word heuristic is too coarse. Passing each title through a single Claude Haiku call to categorize the hook type and topic would cost a few cents per month and produce substantially better signal.

Bayesian smoothing for low video counts. With fewer than 20 videos, the median threshold is unstable. A prior from a reference population would give more reliable signal early on.
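
One simple form this could take is shrinking the observed median toward a prior, with the prior's influence fading as videos accumulate. This is a sketch of a shrinkage estimate in that spirit, not a full Bayesian model: `smoothed_median`, `prior_median`, and `prior_weight` are all hypothetical, and the reference value would have to come from somewhere outside this channel.

```python
import statistics


def smoothed_median(views, prior_median, prior_weight=10):
    """Blend the observed median with a prior, weighted by how many videos exist."""
    if not views:
        return float(prior_median)
    n = len(views)
    observed = statistics.median(views)
    # prior_weight acts like that many imaginary videos sitting at the prior median.
    return (n * observed + prior_weight * prior_median) / (n + prior_weight)
```

With 10 videos and a prior weight of 10, the result sits halfway between the observed median and the prior. At 100 videos the prior contributes under a tenth of the estimate.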

The current version closes a feedback loop that didn't exist before. I know exactly where to improve it — and I'll do so when the channel has enough data to make the refinements worth the work.

FAQ

Does this require a paid YouTube API quota?

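No. The YouTube Data API v3 default quota is 10,000 units/day at no cost. The channels.list lookup, each playlistItems request, and each videos.list call (up to 50 IDs per batch) cost 1 unit apiece, so a normal daily run costs roughly 3–5 units. The search fallback is the exception: a search.list call costs 100 units, which is still far below the daily quota.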
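
The arithmetic can be written out as a rough budget. The per-call costs are the published Data API v3 quota costs; `daily_units` is a hypothetical helper, and it does not count any extra channel lookups the search fallback might make.

```python
# Quota cost per call, in units.
COST = {
    "channels.list": 1,
    "playlistItems.list": 1,
    "videos.list": 1,
    "search.list": 100,
}

DAILY_QUOTA = 10_000


def daily_units(handle_attempts=1, used_search=False):
    """Estimate units for one run: channel resolution, one playlist page, one stats batch."""
    units = handle_attempts * COST["channels.list"]
    if used_search:
        units += COST["search.list"]
    units += COST["playlistItems.list"] + COST["videos.list"]
    return units


typical = daily_units()                                      # first handle resolves
worst = daily_units(handle_attempts=4, used_search=True)     # every handle fails
```

The typical run comes to 3 units. Even the worst case here, four failed handle lookups plus one search call, is 106 units, around 1% of the daily quota.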

What happens when a video doesn't match any uploaded queue file?

It gets labeled "unknown" archetype. The pattern analysis still runs; unknowns are counted separately. Over time, a high unknown rate signals that title-overlap matching needs recalibration — perhaps the ≥4-word threshold is too strict.

Why not use the YouTube Analytics API for richer metrics?

The Analytics API requires OAuth 2.0, not a simple API key. OAuth from GitHub Actions means storing refresh tokens as secrets and handling rotation. View counts from the Data API are sufficient at this stage.

Will this ever auto-modify the script generator's prompts?

Not automatically. The knowledge bank is read as context, but the decision to change archetype remains in the script generator's judgment, not in this classifier.

Does the 72-hour grace period affect HIGH classification too?

No — only LOW uses the age gate. A video that goes viral in its first 24 hours should be HIGH immediately. The grace period only protects young videos from false LOW classification.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.
