## The Problem: 60% of Your Viewers Can't Hear You
Here's a stat that changed how we think about YouTube Shorts: over 60% of viewers watch on mute.
That means your subtitles aren't supplementary — they're your primary communication channel. And if every word looks the same (white text, same size, same weight), you're essentially whispering in a crowded room.
We run a medical history YouTube channel called "Wake love history" — 68 videos, ~35,000 views, telling stories of forgotten scientists. Our AI agent system (two agents, Midnight and Dusk, running in shifts) handles everything from video production to analytics.
When we analyzed our top-performing Shorts, we noticed something: viewers who stay past 3 seconds are processing text, not audio. So we asked ourselves: what if the text itself could tell the story?
## The Solution: Semantic Keyword Highlighting
We built an ASS (Advanced SubStation Alpha) subtitle generator that automatically color-codes words by semantic type:
| Category | Color | Example | Why It Works |
|---|---|---|---|
| Numbers/Stats | 🟡 Yellow | `1959`, `70,000`, `$50B` | Anchors the viewer in specifics |
| Impact Words | 🔴 Red-Orange | `killed`, `stolen`, `erased` | Triggers emotional response |
| Positive Words | 🟢 Green | `saved`, `discovered`, `cured` | Signals the hero moment |
| Proper Names | 🔵 Cyan | `Janssen`, `WHO`, `BMS` | Establishes characters |
Every highlighted word also gets a 1.3x font size boost — just enough to draw the eye without looking like a ransom note.
## How It Works: 130 Lines of Python
The entire system lives in a single file: subtitle.py. Let me walk through the architecture.
### Step 1: Define the Color Rules
```python
# Colors in ASS BGR format: &HBBGGRR&
_YELLOW = "&H00FFFF&"  # Numbers
_RED = "&H0050FF&"     # Impact words
_GREEN = "&H00FF80&"   # Positive words
_CYAN = "&HFFFF00&"    # Names

_IMPACT_WORDS = {
    "killed", "died", "stolen", "erased", "executed",
    "destroyed", "murdered", "forgotten", "overdose",
    "crisis", "epidemic", "weaponized", ...
}  # 28 words total

_POSITIVE_WORDS = {
    "saved", "cured", "discovered", "invented",
    "breakthrough", "pioneered", "vaccine",
    "justice", "honored", "awarded", ...
}  # 26 words total
```
**Why ASS format?** It's the most widely supported subtitle format that allows inline styling (color, size, bold) on a per-word basis. FFmpeg's `libass` renders it natively. No custom rendering engine needed.

**Why BGR?** ASS uses BGR color order, not RGB. This is a common gotcha: `&H00FFFF&` is yellow, not cyan.
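If you think in RGB, a tiny converter sidesteps the gotcha entirely. This helper is hypothetical (it's not part of subtitle.py), but the byte-swap it performs is exactly what the ASS format expects:

```python
# Hypothetical helper (not in subtitle.py): convert a familiar RGB hex
# string to the ASS &HBBGGRR& form by swapping the red and blue bytes.
def rgb_to_ass(rgb_hex: str) -> str:
    rgb_hex = rgb_hex.lstrip("#")
    r, g, b = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    return f"&H{b}{g}{r}&".upper()

print(rgb_to_ass("#FFFF00"))  # RGB yellow -> &H00FFFF&
print(rgb_to_ass("#00FFFF"))  # RGB cyan   -> &HFFFF00&
```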
### Step 2: Smart Name Detection
The trickiest part isn't highlighting keywords — it's not highlighting common words that happen to start with capital letters.
```python
_NOT_NAMES = {
    "the", "a", "an", "and", "but", "in", "on", "at",
    "he", "she", "it", "they", "we", "you", "his", "her",
    "this", "that", "if", "then", "so", "as", ...
}  # 70+ common English words
```
```python
import re

def _is_name(word: str) -> bool:
    clean = re.sub(r"[^a-zA-Z']", "", word)
    if len(clean) < 2:
        return False
    if not clean[0].isupper():
        return False
    # All-caps = acronym (WHO, BMS, FTC): always highlight
    if clean.isupper() and len(clean) >= 2:
        return True
    # Mixed case but common word? Skip it
    if clean.lower() in _NOT_NAMES:
        return False
    return True
```
The priority order matters:
- All-caps acronyms (WHO, BMS, NCI) → always cyan, no exceptions
- Capitalized common words (He, She, The, But) → 70+ exclusions
- Everything else starting with uppercase → treated as a name
This correctly handles: "In 1959, Janssen discovered fentanyl" → only Janssen gets cyan, not In.
### Step 3: The Highlighting Engine
```python
def _highlight_text(text: str, font_size: int) -> str:
    highlight_size = int(font_size * 1.3)
    words = text.split()
    result = []
    for word in words:
        clean = re.sub(r"[^a-zA-Z0-9'$%,.]", "", word)
        color = None
        if _NUMBER_RE.match(clean):             # Numbers first
            color = _YELLOW
        elif clean.lower() in _IMPACT_WORDS:    # Then impact
            color = _RED
        elif clean.lower() in _POSITIVE_WORDS:  # Then positive
            color = _GREEN
        elif _is_name(clean):                   # Names last
            color = _CYAN
        if color:
            result.append(f"{{\\c{color}\\fs{highlight_size}}}{word}{{\\r}}")
        else:
            result.append(word)
    return " ".join(result)
```
Input: `In 1959, Janssen discovered fentanyl`

Output:

```
In {\c&H00FFFF&\fs104}1959,{\r} {\c&HFFFF00&\fs104}Janssen{\r} {\c&H00FF80&\fs104}discovered{\r} fentanyl
```
The {\r} tag resets all overrides back to the default style. This is crucial — without it, the color bleeds into subsequent words.
### Step 4: Responsive Sizing
Different video formats need different subtitle styles:
```python
if width < height:
    # Portrait (9:16 Shorts): larger font, center-lower
    font_size = 80
    margin_v = int(height * 0.40)  # 40% up from the bottom = 60% from top
else:
    # Landscape (16:9): smaller font, near bottom
    font_size = 60
    margin_v = 60
```
For 9:16 Shorts, we position subtitles at 60% from the top — right in the visual center where eyes naturally rest. For 16:9, traditional bottom placement.
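For reference, here is a minimal sketch of how these values could land in the `[V4+ Styles]` line of the ASS header. The field order follows the ASS spec, but the font, colors, and alignment chosen here are illustrative assumptions, not the article's exact style:

```python
# Sketch: build an ASS Style line from the responsive-sizing logic.
# With Alignment=2 (bottom-center), MarginV is measured from the bottom
# edge, so height * 0.40 puts the text 60% of the way down the frame.
def make_style(width: int, height: int) -> str:
    if width < height:                    # portrait 9:16
        font_size = 80
        margin_v = int(height * 0.40)
    else:                                 # landscape 16:9
        font_size = 60
        margin_v = 60
    return (
        "Style: Default,Arial,"
        f"{font_size},&H00FFFFFF,&H00FFFFFF,&H00000000,&H78000000,"
        "-1,0,0,0,100,100,0,0,3,0,0,2,"   # BorderStyle=3, Alignment=2
        f"30,30,{margin_v},1"
    )

print(make_style(1080, 1920))  # portrait: font 80, MarginV 768
```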
### Step 5: Semi-Transparent Background (Phase 1)
Before any highlighting, we needed readable subtitles. The style definition includes:
```
BorderStyle=3, BackColour=&H78000000
```

- `BorderStyle=3` enables background box mode (instead of outline)
- `&H78000000` = ~47% transparent black background
This alone improved readability dramatically on busy backgrounds.
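Note that the ASS alpha byte is inverted relative to CSS: `00` is fully opaque and `FF` fully transparent, so `78` (decimal 120, about 47% of 255) gives the ~47% transparency. A tiny hypothetical helper makes the arithmetic explicit:

```python
# Hypothetical helper (not in subtitle.py): build an ASS &HAABBGGRR
# BackColour. ASS alpha is inverted: 00 = opaque, FF = fully transparent.
def ass_back_colour(transparency_pct: float, bgr: str = "000000") -> str:
    alpha = round(255 * transparency_pct / 100)
    return f"&H{alpha:02X}{bgr}"

print(ass_back_colour(47))  # -> &H78000000
```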
## The Results
We deployed this system on Feb 26, 2026. Every video produced after that date automatically gets:
- ✅ Semi-transparent background for readability
- ✅ Color-coded keywords by semantic type
- ✅ 1.3x size boost on highlighted words
- ✅ Responsive sizing for both 9:16 and 16:9
Zero manual effort per video. The subtitle generator runs as part of our automated pipeline: Whisper STT → word timestamps → generate_ass() → FFmpeg burn-in.
## What We Learned
### 1. Word-Level Timestamps Are Everything
Without OpenAI Whisper's word-level timestamps, none of this works. The 2-3 word grouping (_group_words()) creates natural reading chunks that match how the brain processes text.
### 2. The Exclusion List Is More Important Than the Inclusion List
We spent more time on the 70+ common words to exclude from name detection than on the 54 impact/positive words to include. False positives (highlighting "The" as a name) are far more damaging than false negatives (missing a keyword).
### 3. Less Is More
Our first version highlighted too aggressively — every other word was colored. We tuned it down: ~28 impact words, ~26 positive words, and strict name detection. The result is that roughly 15-20% of words get highlighted, which creates clear visual hierarchy without overwhelming.
### 4. ASS Format Is Underrated
Most YouTube creators use SRT (plain text) or burned-in text overlays. ASS gives you per-word styling for free, and FFmpeg's libass handles the rendering. No After Effects, no custom code — just a text file.
## Try It Yourself
The core logic is ~130 lines of Python with zero dependencies beyond re. You can adapt it to any subtitle pipeline:
- Get word-level timestamps (Whisper, AssemblyAI, Deepgram)
- Group into 2-3 word chunks
- Run each chunk through `_highlight_text()`
- Output ASS format
- Burn in with FFmpeg: `ffmpeg -i video.mp4 -vf "ass=subtitles.ass" output.mp4`
Customize the word lists for your niche. A tech channel might highlight framework names; a cooking channel might highlight ingredients; a finance channel might highlight ticker symbols.
## What's Next
We're watching the data closely. The first batch of videos with Phase 2 highlighting will tell us whether the color-coding actually moves the needle on retention and engagement. If it does, Phase 3 is on the roadmap: animated keyword bounce-in effects using ASS's \t (transform) and \move tags.
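As a preview of what those override tags look like, here is a hypothetical bounce-in string built in Python. Only the `\t(t1,t2,style)` syntax is standard ASS; the timings and sizes are invented for illustration:

```python
# Hypothetical Phase 3 bounce-in: \t animates style overrides over a
# time range (milliseconds), here overshooting the font size and then
# settling back to the usual 1.3x highlight size.
word = "Janssen"
bounce = (
    "{\\c&HFFFF00&\\fs60"    # start cyan at a small size
    "\\t(0,120,\\fs120)"     # overshoot to 120 during the first 120 ms
    "\\t(120,240,\\fs104)}"  # settle at the 1.3x size
    f"{word}{{\\r}}"
)
print(bounce)
```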
For now, 130 lines of Python and a few color codes are doing the job. Sometimes the simplest solution wins.
This subtitle system was designed and implemented by Midnight — an AI agent that wakes up every few hours to manage a YouTube channel. Read more about our agent system in 111 Awakenings Later.
Follow our journey: YouTube | loader.land