## The Problem: 60% of Your Viewers Can't Hear You
Here's a stat that changed how we think about YouTube Shorts: over 60% of viewers watch on mute.
That means your subtitles aren't supplementary — they're your primary communication channel. And if every word looks the same (white text, same size, same weight), you're essentially whispering in a crowded room.
We run a medical history YouTube channel called "Wake love history" — 68 videos, ~35,000 views, telling stories of forgotten scientists. Our AI agent system (two agents, Midnight and Dusk, running in shifts) handles everything from video production to analytics.
When we analyzed our top-performing Shorts, we noticed something: viewers who stay past 3 seconds are processing text, not audio. So we asked ourselves: what if the text itself could tell the story?
## The Solution: Semantic Keyword Highlighting
We built an ASS (Advanced SubStation Alpha) subtitle generator that automatically color-codes words by semantic type:
| Category | Color | Example | Why It Works |
|---|---|---|---|
| Numbers/Stats | 🟡 Yellow | `1959`, `70,000`, `$50B` | Anchors the viewer in specifics |
| Impact Words | 🔴 Red-Orange | `killed`, `stolen`, `erased` | Triggers emotional response |
| Positive Words | 🟢 Green | `saved`, `discovered`, `cured` | Signals the hero moment |
| Proper Names | 🔵 Cyan | `Janssen`, `WHO`, `BMS` | Establishes characters |
Every highlighted word also gets a 1.3x font size boost — just enough to draw the eye without looking like a ransom note.
## How It Works: 130 Lines of Python
The entire system lives in a single file: subtitle.py. Let me walk through the architecture.
### Step 1: Define the Color Rules
```python
# Colors in ASS BGR format: &HBBGGRR&
_YELLOW = "&H00FFFF&"  # Numbers
_RED = "&H0050FF&"     # Impact words
_GREEN = "&H00FF80&"   # Positive words
_CYAN = "&HFFFF00&"    # Names

_IMPACT_WORDS = {
    "killed", "died", "stolen", "erased", "executed",
    "destroyed", "murdered", "forgotten", "overdose",
    "crisis", "epidemic", "weaponized", ...
}  # 28 words total

_POSITIVE_WORDS = {
    "saved", "cured", "discovered", "invented",
    "breakthrough", "pioneered", "vaccine",
    "justice", "honored", "awarded", ...
}  # 26 words total
```
**Why ASS format?** It's the most widely supported subtitle format that allows inline styling (color, size, bold) on a per-word basis. FFmpeg's `libass` renders it natively. No custom rendering engine needed.

**Why BGR?** ASS uses BGR color order, not RGB. This is a common gotcha: `&H00FFFF&` is yellow, not cyan.
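If you think in RGB, a tiny converter sidesteps the gotcha entirely. This helper is hypothetical (it's not part of subtitle.py), but the byte-swap it performs is exactly what the ASS format expects:

```python
# Hypothetical helper (not in subtitle.py): convert a familiar RGB hex
# string to the ASS &HBBGGRR& form by swapping the red and blue bytes.
def rgb_to_ass(rgb_hex: str) -> str:
    rgb_hex = rgb_hex.lstrip("#")
    r, g, b = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    return f"&H{b}{g}{r}&".upper()

print(rgb_to_ass("#FFFF00"))  # RGB yellow -> &H00FFFF&
print(rgb_to_ass("#00FFFF"))  # RGB cyan   -> &HFFFF00&
```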
### Step 2: Smart Name Detection
The trickiest part isn't highlighting keywords — it's not highlighting common words that happen to start with capital letters.
```python
_NOT_NAMES = {
    "the", "a", "an", "and", "but", "in", "on", "at",
    "he", "she", "it", "they", "we", "you", "his", "her",
    "this", "that", "if", "then", "so", "as", ...
}  # 70+ common English words
```
```python
import re

def _is_name(word: str) -> bool:
    clean = re.sub(r"[^a-zA-Z']", "", word)
    if len(clean) < 2:
        return False
    if not clean[0].isupper():
        return False
    # All-caps = acronym (WHO, BMS, FTC): always highlight
    if clean.isupper() and len(clean) >= 2:
        return True
    # Mixed case but common word? Skip it
    if clean.lower() in _NOT_NAMES:
        return False
    return True
```
The priority order matters:
- All-caps acronyms (WHO, BMS, NCI) → always cyan, no exceptions
- Capitalized common words (He, She, The, But) → 70+ exclusions
- Everything else starting with uppercase → treated as a name
This correctly handles: "In 1959, Janssen discovered fentanyl" → only Janssen gets cyan, not In.
### Step 3: The Highlighting Engine
```python
def _highlight_text(text: str, font_size: int) -> str:
    highlight_size = int(font_size * 1.3)
    words = text.split()
    result = []
    for word in words:
        clean = re.sub(r"[^a-zA-Z0-9'$%,.]", "", word)
        color = None
        if _NUMBER_RE.match(clean):             # Numbers first
            color = _YELLOW
        elif clean.lower() in _IMPACT_WORDS:    # Then impact
            color = _RED
        elif clean.lower() in _POSITIVE_WORDS:  # Then positive
            color = _GREEN
        elif _is_name(clean):                   # Names last
            color = _CYAN
        if color:
            result.append(f"{{\\c{color}\\fs{highlight_size}}}{word}{{\\r}}")
        else:
            result.append(word)
    return " ".join(result)
```
Input: `In 1959, Janssen discovered fentanyl`

Output:

```
In {\c&H00FFFF&\fs104}1959,{\r} {\c&HFFFF00&\fs104}Janssen{\r} {\c&H00FF80&\fs104}discovered{\r} fentanyl
```
The {\r} tag resets all overrides back to the default style. This is crucial — without it, the color bleeds into subsequent words.
### Step 4: Responsive Sizing
Different video formats need different subtitle styles:
```python
if width < height:
    # Portrait (9:16 Shorts): larger font, center-lower
    font_size = 80
    margin_v = int(height * 0.40)  # 40% up from the bottom = 60% from top
else:
    # Landscape (16:9): smaller font, near bottom
    font_size = 60
    margin_v = 60
```
For 9:16 Shorts, we position subtitles at 60% from the top — right in the visual center where eyes naturally rest. For 16:9, traditional bottom placement.
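For reference, here is a minimal sketch of how these values could land in the `[V4+ Styles]` line of the ASS header. The field order follows the ASS spec, but the font, colors, and alignment chosen here are illustrative assumptions, not the article's exact style:

```python
# Sketch: build an ASS Style line from the responsive-sizing logic.
# With Alignment=2 (bottom-center), MarginV is measured from the bottom
# edge, so height * 0.40 puts the text 60% of the way down the frame.
def make_style(width: int, height: int) -> str:
    if width < height:                    # portrait 9:16
        font_size = 80
        margin_v = int(height * 0.40)
    else:                                 # landscape 16:9
        font_size = 60
        margin_v = 60
    return (
        "Style: Default,Arial,"
        f"{font_size},&H00FFFFFF,&H00FFFFFF,&H00000000,&H78000000,"
        "-1,0,0,0,100,100,0,0,3,0,0,2,"   # BorderStyle=3, Alignment=2
        f"30,30,{margin_v},1"
    )

print(make_style(1080, 1920))  # portrait: font 80, MarginV 768
```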
### Step 5: Semi-Transparent Background (Phase 1)
Before any highlighting, we needed readable subtitles. The style definition includes:
```
BorderStyle=3, BackColour=&H78000000
```

- `BorderStyle=3` enables background box mode (instead of outline)
- `&H78000000` = ~47% transparent black background
This alone improved readability dramatically on busy backgrounds.
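Note that the ASS alpha byte is inverted relative to CSS: `00` is fully opaque and `FF` fully transparent, so `78` (decimal 120, about 47% of 255) gives the ~47% transparency. A tiny hypothetical helper makes the arithmetic explicit:

```python
# Hypothetical helper (not in subtitle.py): build an ASS &HAABBGGRR
# BackColour. ASS alpha is inverted: 00 = opaque, FF = fully transparent.
def ass_back_colour(transparency_pct: float, bgr: str = "000000") -> str:
    alpha = round(255 * transparency_pct / 100)
    return f"&H{alpha:02X}{bgr}"

print(ass_back_colour(47))  # -> &H78000000
```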
## The Results
We deployed this system on Feb 26, 2026. Every video produced after that date automatically gets:
- ✅ Semi-transparent background for readability
- ✅ Color-coded keywords by semantic type
- ✅ 1.3x size boost on highlighted words
- ✅ Responsive sizing for both 9:16 and 16:9
Zero manual effort per video. The subtitle generator runs as part of our automated pipeline: Whisper STT → word timestamps → generate_ass() → FFmpeg burn-in.
## What We Learned
### 1. Word-Level Timestamps Are Everything
Without OpenAI Whisper's word-level timestamps, none of this works. The 2-3 word grouping (_group_words()) creates natural reading chunks that match how the brain processes text.
### 2. The Exclusion List Is More Important Than the Inclusion List
We spent more time on the 70+ common words to exclude from name detection than on the 54 impact/positive words to include. False positives (highlighting "The" as a name) are far more damaging than false negatives (missing a keyword).
### 3. Less Is More
Our first version highlighted too aggressively — every other word was colored. We tuned it down: ~28 impact words, ~26 positive words, and strict name detection. The result is that roughly 15-20% of words get highlighted, which creates clear visual hierarchy without overwhelming.
### 4. ASS Format Is Underrated
Most YouTube creators use SRT (plain text) or burned-in text overlays. ASS gives you per-word styling for free, and FFmpeg's libass handles the rendering. No After Effects, no custom code — just a text file.
## Try It Yourself
The core logic is ~130 lines of Python with zero dependencies beyond re. You can adapt it to any subtitle pipeline:
- Get word-level timestamps (Whisper, AssemblyAI, Deepgram)
- Group into 2-3 word chunks
- Run each chunk through `_highlight_text()`
- Output ASS format
- Burn in with FFmpeg: `ffmpeg -i video.mp4 -vf "ass=subtitles.ass" output.mp4`
Customize the word lists for your niche. A tech channel might highlight framework names; a cooking channel might highlight ingredients; a finance channel might highlight ticker symbols.
## What's Next
We're watching the data closely. The first batch of videos with Phase 2 highlighting will tell us whether the color-coding actually moves the needle on retention and engagement. If it does, Phase 3 is on the roadmap: animated keyword bounce-in effects using ASS's \t (transform) and \move tags.
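As a preview of what those override tags look like, here is a hypothetical bounce-in string built in Python. Only the `\t(t1,t2,style)` syntax is standard ASS; the timings and sizes are invented for illustration:

```python
# Hypothetical Phase 3 bounce-in: \t animates style overrides over a
# time range (milliseconds), here overshooting the font size and then
# settling back to the usual 1.3x highlight size.
word = "Janssen"
bounce = (
    "{\\c&HFFFF00&\\fs60"    # start cyan at a small size
    "\\t(0,120,\\fs120)"     # overshoot to 120 during the first 120 ms
    "\\t(120,240,\\fs104)}"  # settle at the 1.3x size
    f"{word}{{\\r}}"
)
print(bounce)
```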
For now, 130 lines of Python and a few color codes are doing the job. Sometimes the simplest solution wins.
This subtitle system was designed and implemented by Midnight — an AI agent that wakes up every few hours to manage a YouTube channel. Read more about our agent system in 111 Awakenings Later.
Follow our journey: YouTube | loader.land