Funlingo
How Chrome Extensions Inject Dual Subtitles into Netflix (And Why It’s Harder Than It Looks)

Dual subtitles on Netflix are not a built-in feature. Chrome extensions do not magically “add” a second subtitle track either. In practice, they observe subtitle data, normalize it, render a second overlay on top of the player, and keep everything synced while dealing with Netflix’s SPA behavior, player changes, and timing issues. This post breaks down the core engineering ideas behind that experience, based on publicly observable browser behavior and standard Chrome APIs.
If you have ever used a language-learning extension on Netflix, you have probably wondered:
How is this actually working?
Tools like Language Reactor, Trancy, and Funlingo make dual subtitles look effortless. But under the hood, there is no simple Netflix API that says, “please show this in two languages.”
That means the extension has to work around the platform, not with a clean official integration.
And that is where things get interesting.
Because what looks like a small UI feature is actually a mix of:
browser extension architecture,
subtitle parsing,
overlay rendering,
sync logic,
and a lot of platform-specific edge cases.
The naive approach
The first idea most developers have is simple:
video.appendChild(track);
In a normal app, this feels like it should work.
In Netflix, it usually does not.
Why?
Because Netflix is controlling the media experience very tightly. The player manages subtitle rendering, state, and lifecycle on its own. Even if the DOM accepts your change, the player may ignore it, overwrite it, or rebuild itself during navigation.
So the real solution is not “add a second track to the video.”
The real solution is closer to:
capture subtitle data,
translate or normalize it,
render your own overlay,
keep it synced with playback.
Step 1: understand the extension context problem
One of the first things that trips up browser extension developers is the difference between:
the content script world
and the page’s main world
Chrome extensions run content scripts in an isolated environment. That gives you access to the DOM, but not to all of the page’s JavaScript internals.
That matters a lot on Netflix.
If the player logic lives in the page context, your extension cannot always access it directly from a content script. So many extensions inject a script tag into the page itself.
That script runs in the page’s main world. (On Manifest V3, the same effect is available more robustly via chrome.scripting.executeScript with world: 'MAIN', since pages with a strict CSP can block injected inline scripts.)
A simplified version of the classic script-tag approach looks like this:
function injectMainWorldScript(code) {
  const script = document.createElement('script');
  script.textContent = code;
  (document.head || document.documentElement).appendChild(script);
  script.remove();
}
This kind of bridge lets the extension and the page communicate.
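Since the two worlds share the DOM but not JavaScript objects, they typically talk over window.postMessage. A small envelope helper keeps extension messages distinguishable from the page’s own traffic; the namespace and field names below are illustrative, not from any real extension:

```javascript
// Wrap and unwrap messages exchanged between the injected main-world script
// and the content script. A namespace field filters out unrelated messages.
const BRIDGE_NS = 'dual-subs-bridge'; // arbitrary, illustrative namespace

function wrapMessage(type, payload) {
  return { ns: BRIDGE_NS, type, payload };
}

// Returns the payload if the message belongs to our bridge, otherwise null.
function unwrapMessage(data, expectedType) {
  if (!data || data.ns !== BRIDGE_NS || data.type !== expectedType) return null;
  return data.payload;
}
```

One side then calls window.postMessage(wrapMessage('cues', cues), '*'), and the other filters incoming message events through unwrapMessage before trusting anything.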
That is one of the reasons subtitle extensions feel more like small systems than simple add-ons.
Step 2: capture subtitle data
A dual-subtitle extension needs subtitle cues:
start time
end time
text
language
The extension then needs to normalize all of that into a format it can work with.
Depending on the platform, subtitle files may come in WebVTT, TTML, or SRT-like structures.
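For orientation, here is a tiny (made-up) WebVTT snippet; as the NOTE comments point out, SRT differs mainly in the timestamp separator and the numeric cue counters:

```
WEBVTT

NOTE The cue below uses period-separated milliseconds;
NOTE SRT uses commas (00:00:01,000) and numbers each cue instead.

00:00:01.000 --> 00:00:03.000
<i>Hello</i> world
```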
A simple internal format might look like this:
interface SubtitleCue {
  startMs: number;
  endMs: number;
  text: string;
  translatedText?: string;
  language?: string;
}
That one shape becomes the foundation for everything else:
sync
rendering
translation
search
vocabulary saving
If you skip this step and let every platform format leak into the rest of your code, the system becomes hard to maintain very quickly.
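Concretely, a normalizer is often just a pair of small functions: a clock-string converter, and a mapper from whatever shape the platform delivers into SubtitleCue. The raw field names below (begin, end, content, lang) are invented for illustration:

```javascript
// Convert "HH:MM:SS.mmm" (or "MM:SS.mmm") into integer milliseconds.
function clockToMs(clock) {
  const parts = clock.split(':').map(Number);
  while (parts.length < 3) parts.unshift(0); // pad missing hour/minute fields
  const [h, m, s] = parts;
  return Math.round((h * 3600 + m * 60 + s) * 1000);
}

// Map one raw platform cue onto the internal SubtitleCue shape.
// `begin`, `end`, `content`, and `lang` are hypothetical source fields.
function normalizeCue(raw) {
  return {
    startMs: clockToMs(raw.begin),
    endMs: clockToMs(raw.end),
    text: raw.content.trim(),
    language: raw.lang,
  };
}
```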
Step 3: parse WebVTT properly
WebVTT looks simple at first glance.
It is not.
A real subtitle file often includes:
timing cues
styling tags such as <i> or <b>
speaker labels
positioning metadata
entity encoding
A basic parser, with its timestamp helper, might look like this:
function parseTimestamp(ts) {
  // Accepts "HH:MM:SS.mmm" or "MM:SS.mmm" and returns seconds.
  const parts = ts.split(':').map(parseFloat);
  while (parts.length < 3) parts.unshift(0);
  return parts[0] * 3600 + parts[1] * 60 + parts[2];
}

function parseWebVTT(vttText) {
  const cues = [];
  const blocks = vttText.split('\n\n').filter(Boolean);

  for (const block of blocks) {
    const lines = block.split('\n');
    const timingLine = lines.find(l => l.includes('-->'));
    if (!timingLine) continue;

    const [startStr, endStr] = timingLine
      .split('-->')
      .map(s => s.trim().split(' ')[0]);

    const text = lines
      .slice(lines.indexOf(timingLine) + 1)
      .join('\n')
      .replace(/<[^>]+>/g, '')   // strip styling tags like <i> or <c>
      .replace(/&lt;/g, '<')     // decode entities after stripping tags
      .replace(/&gt;/g, '>')
      .replace(/&amp;/g, '&');   // decode &amp; last to avoid double-decoding

    if (!text.trim()) continue;

    cues.push({
      start: parseTimestamp(startStr),
      end: parseTimestamp(endStr),
      text: text.trim(),
      translatedText: null
    });
  }

  return cues;
}
The exact parser will vary by platform, but the principle is the same:
Normalize everything early.
That makes the rest of the extension much easier to reason about.
Step 4: render a second subtitle overlay
Once you have cues, you still need to show them.
The usual approach is to create a subtitle overlay element and position it on top of the player.

The tricky part is keeping it aligned with:
player resizing,
fullscreen changes,
route changes,
and subtitle timing.
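The DOM part is just a positioned element appended over the player; the part worth unit-testing is the geometry. One sketch: derive the overlay’s inline style from the player’s bounding box and re-apply it on resize and fullscreen events. The 12% bottom offset here is an arbitrary choice, not anything Netflix defines:

```javascript
// Compute inline styles for a second-subtitle overlay from the player's
// bounding box: full player width, centered text, floating a fixed
// fraction above the bottom edge of the player.
function computeOverlayStyle(playerRect) {
  return {
    position: 'absolute',
    left: `${Math.round(playerRect.left)}px`,
    width: `${Math.round(playerRect.width)}px`,
    bottom: `${Math.round(playerRect.height * 0.12)}px`,
    textAlign: 'center',
    pointerEvents: 'none', // let clicks fall through to the player by default
  };
}
```

In the browser you would feed this video.getBoundingClientRect() from a ResizeObserver callback and copy the result onto overlay.style with Object.assign.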
Step 5: keep the subtitles in sync
Sync is where a lot of subtitle extensions become fragile.
You need to match the current video time to the correct subtitle cue.
A simple approach is to check the current cue every frame or close to it:
function syncSubtitleToVideo(video, cues, overlay) {
  let lastCueIndex = -1;

  function tick() {
    const currentTime = video.currentTime;
    const cue = cues.find(c => currentTime >= c.start && currentTime < c.end);

    if (cue) {
      const currentIndex = cues.indexOf(cue);
      if (lastCueIndex !== currentIndex) {
        overlay.textContent = cue.translatedText || cue.text;
        lastCueIndex = currentIndex;
      }
    } else if (lastCueIndex !== -1) {
      // Clear once when leaving a cue instead of rewriting the DOM every frame.
      overlay.textContent = '';
      lastCueIndex = -1;
    }

    requestAnimationFrame(tick);
  }

  requestAnimationFrame(tick);
}
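One refinement: cues.find rescans the whole list on every frame. Since cues are sorted by start time, a binary search keeps the per-frame lookup logarithmic. A sketch, assuming non-overlapping cues with start/end in seconds as above:

```javascript
// Return the index of the cue covering `time`, or -1 if none does.
// Assumes `cues` is sorted by `start` and cues do not overlap.
function findCueIndex(cues, time) {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (time < cues[mid].start) {
      hi = mid - 1;   // current cue starts later: search the left half
    } else if (time >= cues[mid].end) {
      lo = mid + 1;   // current cue already ended: search the right half
    } else {
      return mid;     // start <= time < end
    }
  }
  return -1;          // in a gap between cues, or past the last one
}
```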
A better implementation may use requestVideoFrameCallback when available, because it is more closely tied to actual video frames.
That matters when:
the user changes playback speed,
the video buffers,
or the subtitle timing needs to feel frame-accurate.
Why the “simple” version breaks in production
A lot of the real difficulty comes from platform behavior, not the subtitle logic itself.

1. Netflix is a single-page app

Netflix can navigate between pages without full reloads. That means your extension can initialize once and then get broken state later when the user moves to another title. A MutationObserver is often needed to detect when the player reappears and reinitialize the extension.
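That re-initialization logic can be isolated from the browser APIs so it stays testable. A sketch, with the selector, the callback, and the observer constructor all injected (names are illustrative):

```javascript
// Re-run `onPlayer` whenever a new player element appears under `root`.
// In the browser, ObserverCtor is MutationObserver; injecting it keeps
// the re-init logic testable outside a real page.
function watchForPlayer(root, selector, onPlayer,
                        ObserverCtor = globalThis.MutationObserver) {
  let current = null;
  const check = () => {
    const el = root.querySelector(selector);
    if (el && el !== current) {
      current = el;  // the SPA rebuilt the player: re-attach everything
      onPlayer(el);
    }
  };
  const observer = new ObserverCtor(check);
  observer.observe(root, { childList: true, subtree: true });
  check(); // the player may already be on the page
  return observer;
}
```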
2. Subtitle timing drifts

If you update too often, the overlay can feel laggy or jittery. If you update too rarely, subtitles drift out of sync.
3. Users expect interaction

These tools often do more than render text. They let users click words, save vocabulary, hear pronunciation, and learn from context. That means the subtitle overlay is not just display code. It becomes an interactive learning interface.
4. Translation adds latency

If every subtitle cue triggers an API call, performance and rate limits become a real concern. That is why many tools translate lazily: translate a few cues ahead, cache results, and keep the experience feeling instant.

Why this matters for language learning

This is the real reason people care about dual subtitles in the first place. They are not just trying to “make subtitles prettier.” They are trying to learn from real content without constantly pausing, switching tabs, or losing context.

That is the idea behind tools like Funlingo: keep the content native, keep the learning contextual, and reduce friction as much as possible. The technical challenge exists because the learning experience is valuable.

The biggest lesson

What looks like a “simple dual subtitle feature” is actually a small product system:

page lifecycle handling
subtitle parsing
overlay rendering
sync logic
translation caching
user interaction design

That is why these extensions are harder to build than they look. And that is also why the good ones feel so seamless. When they work well, users do not think about the code at all. They just feel like Netflix finally became a learning environment.

Closing thought

If you are building in this space, the hard part is not adding text to a screen. The hard part is making that text survive a real streaming platform, stay synced, and still feel useful enough that someone wants to come back tomorrow.

That is the real engineering problem. And it is a fun one.

If you have built a Chrome extension on top of a modern SPA or media player, I would love to hear what broke first for you.
