A Chrome extension that captures video audio, transcribes it with Groq Whisper, translates it with Lingo.dev, and overlays subtitles on any video in under 2 seconds, completely free.
Why I Built This
1. Streaming Platform Auto-Subtitles Are Unreliable
YouTube offers auto-generated subtitles for many languages, but they are often inaccurate, mistimed, and unnatural to read. For languages like Japanese, Bengali, or Arabic, the quality drops significantly. LingoTitles uses Groq Whisper + Lingo.dev to generate subtitles that are more accurate, context-aware, and actually readable, on YouTube and everywhere else.
2. Breaking News From Conflict Zones
We are living in a world of active conflicts and natural disasters. The first footage from any crisis (a war zone, a tsunami warning, a flood) is almost always filmed by someone on the ground, a local victim or witness, speaking their native language. By the time that video reaches social media, it has no subtitles, no translation, nothing.
If you don't speak that language, you have no idea what they are warning about, what risks are approaching, or what is actually happening on the ground. This is not just inconvenient; it can be dangerous. LingoTitles solves this directly: any video, any language, real-time subtitles in your language, so critical information reaches you regardless of the language barrier.
What Is LingoTitles?
LingoTitles is a Chrome extension that generates real-time subtitles for any video on the internet. It captures the video audio, transcribes it using Groq's Whisper model, translates it using Lingo.dev, and overlays the subtitles directly on the video, all in under 2 seconds, completely free.
Tech Stack
- Node.js + Express: REST API server
- Groq Whisper Large V3 Turbo: Free, ultra-fast speech-to-text (~300ms)
- Lingo.dev: Real-time translation tool
- Multer: Audio file handling
How It Works
Project Structure
lingodev2/
├── backend/
│ ├── server.js # Express server handles Groq + Lingo.dev
│ ├── package.json
│ ├── .env # Contains API keys
│ └── uploads/ # Temporary audio storage (auto-cleaned)
└── extension/
├── manifest.json # Chrome manifest V3
├── background.js # Proxies fetch to backend
├── content.js # Audio capture + subtitle overlay
├── subtitles.css # Subtitle overlay styles
├── popup.html # Extension popup UI
├── popup.js # Popup logic
└── icons/
├── icon16.png
├── icon48.png
└── icon128.png
Chapter 1 | Building the Backend
1.1 Install Dependencies
npm install express cors dotenv multer groq-sdk lingo.dev
package.json
{
"name": "lingotitles-backend",
"version": "1.0.0",
"description": "LingoTitles backend: Groq Whisper speech-to-text + Lingo.dev translation",
"main": "server.js",
"scripts": {
"start": "node server.js",
"dev": "nodemon server.js"
},
"dependencies": {
"cors": "^2.8.5",
"dotenv": "^16.3.1",
"express": "^4.18.2",
"groq-sdk": "latest",
"lingo.dev": "latest",
"multer": "^1.4.5-lts.1"
},
"devDependencies": {
"nodemon": "^3.0.1"
}
}
1.2 Create package-lock.json
Run npm install inside backend/; this will create the package-lock.json file.
1.3 Create .env
Create a .env file inside backend/.
GROQ_API_KEY=your_groq_api_key
LINGODOTDEV_API_KEY=your_lingo_api_key
Put your keys in this .env file; you can also set PORT=3000 (optional, defaults to 3000).
- Get your free Groq API key at console.groq.com
- Get your free Lingo.dev API key at lingo.dev
1.4 Create server.js
This is the heart of the backend. It sets up Express, connects Groq and Lingo.dev, and exposes two endpoints: /health for status checks and /transcribe for audio processing. We also handle the locale mapping (e.g. ja → ja-JP) that Lingo.dev requires.
'use strict';
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const cors = require('cors');
const fs = require('fs');
const path = require('path');
const Groq = require('groq-sdk');
const { LingoDotDevEngine } = require('lingo.dev/sdk');
const groq = new Groq({
apiKey: process.env.GROQ_API_KEY
});
const lingo = new LingoDotDevEngine({
apiKey: process.env.LINGODOTDEV_API_KEY
});
const app = express();
app.use(cors({ origin: '*' }));
app.use(express.json());
const upload = multer({
dest: path.join(__dirname, 'uploads'),
limits: { fileSize: 25 * 1024 * 1024 }
});
fs.mkdirSync(path.join(__dirname, 'uploads'), { recursive: true });
app.get('/health', (_req, res) => {
res.json({
status: 'ok',
groq: !!process.env.GROQ_API_KEY,
lingo: !!process.env.LINGODOTDEV_API_KEY,
time: new Date().toISOString()
});
});
app.post('/transcribe', upload.single('audio'), async (req, res) => {
const tempPath = req.file?.path;
try {
if (!req.file) {
return res.status(400).json({ error: 'No audio file provided' });
}
const sourceLang = (req.body.sourceLang || 'auto').trim();
const targetLang = (req.body.targetLang || 'en').trim();
console.log(`▶ /transcribe src=${sourceLang} → tgt=${targetLang} size=${req.file.size}B`);
const original = await groqTranscribe(tempPath, sourceLang);
if (!original || !original.trim()) {
return res.json({ original: '', translated: '' });
}
console.log(` Groq Whisper → "${original}"`);
let translated = original;
if (sourceLang !== targetLang) {
translated = await lingoTranslate(original, sourceLang, targetLang);
console.log(` Lingo.dev → "${translated}"`);
}
res.json({
original: original.trim(),
translated: translated.trim()
});
} catch (err) {
console.error(' ✗ Error:', err.message);
res.status(500).json({ error: err.message });
} finally {
if (tempPath) {
fs.unlink(tempPath, () => {});
}
}
});
async function groqTranscribe(filePath, sourceLang) {
const webmPath = filePath + '.webm';
try {
fs.copyFileSync(filePath, webmPath);
const response = await groq.audio.transcriptions.create({
file: fs.createReadStream(webmPath),
model: 'whisper-large-v3-turbo',
language: sourceLang === 'auto' ? undefined : sourceLang,
response_format: 'text',
temperature: 0
});
return (typeof response === 'string' ? response : response?.text ?? '').trim();
} finally {
try { fs.unlinkSync(webmPath); } catch(_) {}
}
}
const LOCALE_MAP = {
ja: 'ja-JP', zh: 'zh-CN', ko: 'ko-KR', es: 'es-ES', fr: 'fr-FR',
de: 'de-DE', ar: 'ar-SA', fa: 'fa-IR', vi: 'vi-VN', tr: 'tr-TR',
pt: 'pt-BR', ru: 'ru-RU', hi: 'hi-IN', id: 'id-ID', th: 'th-TH',
en: 'en-US', it: 'it-IT', nl: 'nl-NL', pl: 'pl-PL', sv: 'sv-SE',
uk: 'uk-UA', cs: 'cs-CZ', ro: 'ro-RO', hu: 'hu-HU', el: 'el-GR',
he: 'he-IL', bn: 'bn-BD', mr: 'mr-IN', ta: 'ta-IN', te: 'te-IN',
kn: 'kn-IN', gu: 'gu-IN', pa: 'pa-IN', ur: 'ur-PK', ms: 'ms-MY',
tl: 'tl-PH', sw: 'sw-KE'
};
function toLocale(code) {
if (!code || code === 'auto') return null;
if (code.includes('-')) return code;
return LOCALE_MAP[code.toLowerCase()] || code;
}
async function lingoTranslate(text, sourceLang, targetLang) {
const src = toLocale(sourceLang);
const tgt = toLocale(targetLang);
const options = { targetLocale: tgt };
if (src) options.sourceLocale = src;
const result = await lingo.localizeText(text, options);
return result ?? text;
}
const PORT = parseInt(process.env.PORT || '3000', 10);
app.listen(PORT, () => {
console.log(`\n🌍 LingoTitles Backend — http://localhost:${PORT}`);
console.log(` Groq Whisper: ${process.env.GROQ_API_KEY ? '✓ free & ready' : '✗ missing GROQ_API_KEY'}`);
console.log(` Lingo.dev: ${process.env.LINGODOTDEV_API_KEY ? '✓ free & ready' : '✗ missing LINGODOTDEV_API_KEY'}`);
console.log(`\n 💰 Cost: $0.00 — both APIs are free`);
console.log(` 🌐 Languages: 99 supported + auto-detect`);
});
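Note the locale mapping at the heart of this file: Whisper works with bare ISO 639-1 codes, while Lingo.dev expects full locales. The helper can be exercised in isolation; this sketch copies toLocale with a trimmed-down map, purely for illustration:

```javascript
// Minimal copy of the server's locale helper (trimmed map, illustration only)
const LOCALE_MAP = { ja: 'ja-JP', zh: 'zh-CN', bn: 'bn-BD', en: 'en-US' };

function toLocale(code) {
  if (!code || code === 'auto') return null;      // null lets Lingo.dev auto-detect
  if (code.includes('-')) return code;            // already a full locale
  return LOCALE_MAP[code.toLowerCase()] || code;  // map bare codes, pass unknowns through
}

console.log(toLocale('ja'));    // → ja-JP
console.log(toLocale('pt-BR')); // → pt-BR (already qualified)
console.log(toLocale('auto'));  // → null (sourceLocale omitted from the request)
```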
Chapter 2 | Building the Chrome Extension
2.1 Create manifest.json
The manifest declares permissions, registers the service worker, and injects the content script + CSS into every page. (Microphone access for the fallback capture path is requested at runtime via getUserMedia, so it does not need a manifest permission.)
{
"manifest_version": 3,
"name": "lingotitle",
"version": "1.0.0",
"description": "Real-time subtitles for any video. Anime, news, reels — auto-detects language and translates instantly.",
"permissions": [
"activeTab",
"scripting",
"storage",
"tabs"
],
"host_permissions": [
"<all_urls>"
],
"background": {
"service_worker": "background.js"
},
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["content.js"],
"css": ["subtitles.css"],
"run_at": "document_idle"
}
],
"action": {
"default_popup": "popup.html",
"default_icon": {
"16": "icons/icon16.png",
"48": "icons/icon48.png",
"128": "icons/icon128.png"
}
},
"icons": {
"16": "icons/icon16.png",
"48": "icons/icon48.png",
"128": "icons/icon128.png"
}
}
Also create an icons/ folder with icon16.png, icon48.png, and icon128.png.
2.2 Create background.js | The Service Worker
Content scripts cannot make direct fetch calls to localhost due to Chrome's CORS restrictions. The background service worker acts as a proxy — it receives the base64-encoded audio from the content script, converts it back to a Blob, and POSTs it to the backend.
chrome.runtime.onInstalled.addListener(() => {
chrome.storage.local.set({
enabled: false,
sourceLang: 'auto',
targetLang: 'en',
displayMode: 'translated',
chunkInterval: 3000,
backendUrl: 'http://localhost:3000'
});
console.log('[LingoTitles] Installed ✓');
});
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
if (msg.type === 'TRANSCRIBE') {
handleTranscribe(msg).then(sendResponse).catch(err => {
sendResponse({ error: err.message });
});
return true;
}
});
async function handleTranscribe({ audioBase64, mimeType, sourceLang, targetLang, backendUrl }) {
try {
const byteChars = atob(audioBase64);
const byteArr = new Uint8Array(byteChars.length);
for (let i = 0; i < byteChars.length; i++) {
byteArr[i] = byteChars.charCodeAt(i);
}
const blob = new Blob([byteArr], { type: mimeType || 'audio/webm' });
const form = new FormData();
form.append('audio', blob, 'chunk.webm');
form.append('sourceLang', sourceLang);
form.append('targetLang', targetLang);
const res = await fetch(`${backendUrl}/transcribe`, {
method: 'POST',
body: form
});
if (!res.ok) {
const txt = await res.text();
return { error: `Server error ${res.status}: ${txt}` };
}
return await res.json();
} catch (err) {
return { error: err.message };
}
}
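The decode loop above mirrors the encoding that content.js performs before calling chrome.runtime.sendMessage (message payloads must be JSON-serializable, which is why raw audio bytes travel as base64). A quick round-trip in plain JavaScript, with no Chrome APIs, confirms the bytes survive the trip:

```javascript
// Encode bytes → base64 (what content.js does before sendMessage)...
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// ...and decode base64 → bytes (what background.js does on receipt).
function base64ToBytes(b64) {
  const byteChars = atob(b64);
  const bytes = new Uint8Array(byteChars.length);
  for (let i = 0; i < byteChars.length; i++) bytes[i] = byteChars.charCodeAt(i);
  return bytes;
}

const original = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3]); // WebM magic bytes
const roundTripped = base64ToBytes(bytesToBase64(original));
console.log([...roundTripped]); // → [26, 69, 223, 163]
```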
2.3 Create content.js | Audio Capture & Subtitle Overlay
This is the most complex file. It finds the best video element on the page, taps the audio stream using the Web Audio API, records chunks with MediaRecorder, and sends them to the background worker. A minimum size guard (1000 bytes) filters out silent or empty chunks before they are sent to the API.
(function () {
'use strict';
console.log('[LingoTitles] content.js loaded ✓');
let cfg = {
enabled: false,
sourceLang: 'auto',
targetLang: 'en',
displayMode: 'translated',
chunkInterval: 3000,
backendUrl: 'http://localhost:3000'
};
let audioCtx = null;
let recorder = null;
let isCapturing = false;
let currentVideo = null;
let subtitleEl = null;
let chunkTimer = null;
let audioChunks = [];
chrome.storage.local.get(
['enabled','sourceLang','targetLang','displayMode','chunkInterval','backendUrl'],
data => {
Object.assign(cfg, data);
if (cfg.enabled) init();
}
);
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
if (msg.type === 'UPDATE_SETTINGS') {
const wasEnabled = cfg.enabled;
Object.assign(cfg, msg.settings);
if (cfg.enabled && !isCapturing) init();
if (!cfg.enabled && wasEnabled) teardown();
}
if (msg.type === 'GET_STATUS') {
sendResponse({
hasVideo: !!document.querySelector('video'),
capturing: isCapturing,
sourceLang: cfg.sourceLang,
targetLang: cfg.targetLang
});
}
});
function bestVideo() {
return [...document.querySelectorAll('video')]
.filter(v => v.duration > 0 || v.readyState >= 2)
.sort((a, b) => (b.videoWidth * b.videoHeight) - (a.videoWidth * a.videoHeight))[0] || null;
}
function init() {
const v = bestVideo();
if (v) { attach(v); return; }
const poll = setInterval(() => {
const v = bestVideo();
if (v) { clearInterval(poll); attach(v); }
}, 1000);
setTimeout(() => clearInterval(poll), 60000);
}
function attach(video) {
if (isCapturing) return;
currentVideo = video;
buildOverlay();
startCapture(video);
}
function buildOverlay() {
if (subtitleEl) subtitleEl.remove();
subtitleEl = document.createElement('div');
// Class names must match subtitles.css exactly; CSS class matching is case-sensitive
subtitleEl.className = 'lingotitles-overlay';
document.body.appendChild(subtitleEl);
}
function showSubtitles(original, translated) {
if (!subtitleEl) return;
subtitleEl.innerHTML = '';
if (cfg.displayMode === 'dual' && original) {
const o = document.createElement('div');
o.className = 'lingotitles-original';
o.textContent = original;
subtitleEl.appendChild(o);
}
const text = cfg.displayMode === 'original' ? original : (translated || original);
if (text) {
const t = document.createElement('div');
t.className = 'lingotitles-text';
t.textContent = text;
subtitleEl.appendChild(t);
}
subtitleEl.classList.add('lingotitles-visible');
clearTimeout(subtitleEl._timer);
subtitleEl._timer = setTimeout(
() => subtitleEl && subtitleEl.classList.remove('lingotitles-visible'),
Math.max(cfg.chunkInterval * 1.5, 3500)
);
}
function startCapture(video) {
try {
audioCtx = new (window.AudioContext || window.webkitAudioContext)();
const src = audioCtx.createMediaElementSource(video);
src.connect(audioCtx.destination);
const dest = audioCtx.createMediaStreamDestination();
src.connect(dest);
beginRecording(dest.stream);
} catch (err) {
console.warn('[LingoTitles] Direct capture failed:', err.message, '— trying mic');
micFallback();
}
}
async function micFallback() {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
beginRecording(stream);
} catch (e) {
console.error('[LingoTitles] All capture methods failed:', e.message);
}
}
function beginRecording(stream) {
const mimeType = ['audio/webm;codecs=opus','audio/webm','audio/ogg']
.find(m => MediaRecorder.isTypeSupported(m)) || '';
recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
audioChunks = [];
recorder.ondataavailable = e => {
if (e.data && e.data.size > 0) audioChunks.push(e.data);
};
recorder.onstop = () => {
if (!audioChunks.length) return;
const totalSize = audioChunks.reduce((s, c) => s + c.size, 0);
if (totalSize < 1000) {
audioChunks = [];
if (isCapturing) setTimeout(() => { try { recorder.start(); } catch(_){} }, 100);
return;
}
const blob = new Blob(audioChunks, { type: mimeType || 'audio/webm' });
audioChunks = [];
sendChunk(blob);
if (isCapturing) {
setTimeout(() => { try { recorder.start(); } catch(_){} }, 100);
}
};
recorder.start();
isCapturing = true;
chunkTimer = setInterval(() => {
if (isCapturing && recorder && recorder.state === 'recording') {
try { recorder.stop(); } catch(_) {}
}
}, cfg.chunkInterval);
}
async function sendChunk(blob) {
if (currentVideo && currentVideo.paused) return;
try {
const arrayBuffer = await blob.arrayBuffer();
const bytes = new Uint8Array(arrayBuffer);
let binary = '';
for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
const audioBase64 = btoa(binary);
chrome.runtime.sendMessage({
type: 'TRANSCRIBE',
audioBase64,
mimeType: blob.type,
sourceLang: cfg.sourceLang,
targetLang: cfg.targetLang,
backendUrl: cfg.backendUrl
}, response => {
if (chrome.runtime.lastError || !response || response.error) return;
if (response.original && response.original.trim()) {
showSubtitles(response.original.trim(), response.translated?.trim());
}
});
} catch (err) {
console.error('[LingoTitles] sendChunk error:', err.message);
}
}
function teardown() {
isCapturing = false;
clearInterval(chunkTimer);
if (recorder && recorder.state !== 'inactive') try { recorder.stop(); } catch(_) {}
if (audioCtx) try { audioCtx.close(); } catch(_) {}
recorder = null; audioCtx = null;
if (subtitleEl) subtitleEl.classList.remove('lingotitles-visible');
}
})();
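The 1000-byte guard in recorder.onstop can be factored into a pure function, which makes the silence-filtering rule easy to reason about on its own (a sketch for illustration, not part of the shipped extension):

```javascript
// Decide whether a batch of recorded chunks is worth sending to the backend.
// Batches below ~1000 bytes total are almost always silence or bare container
// headers with no speech, so we drop them instead of spending an API call.
const MIN_CHUNK_BYTES = 1000;

function shouldSendChunks(chunks, minBytes = MIN_CHUNK_BYTES) {
  const totalSize = chunks.reduce((sum, c) => sum + c.size, 0);
  return totalSize >= minBytes;
}

console.log(shouldSendChunks([{ size: 300 }, { size: 400 }])); // → false (700 B, likely silence)
console.log(shouldSendChunks([{ size: 1500 }]));               // → true
```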
2.4 Create subtitles.css
I used position: fixed and z-index: 2147483647 (the maximum) to ensure subtitles always render on top of the video player's own UI, regardless of the site. The fade-in animation uses CSS transitions on opacity and transform.
.lingotitles-overlay {
position: fixed;
bottom: 80px;
left: 50%;
transform: translateX(-50%) translateY(8px);
z-index: 2147483647;
max-width: 80%;
width: max-content;
text-align: center;
pointer-events: none;
display: flex;
flex-direction: column;
align-items: center;
gap: 5px;
opacity: 0;
transition: opacity 0.25s ease, transform 0.25s ease;
}
.lingotitles-overlay.lingotitles-visible {
opacity: 1;
transform: translateX(-50%) translateY(0);
}
.lingotitles-original {
font-family: 'Noto Sans', sans-serif;
font-size: 16px;
color: rgba(255, 255, 255, 0.85);
background: rgba(0, 0, 0, 0.65);
padding: 3px 14px;
border-radius: 4px;
}
.lingotitles-text {
font-family: 'Segoe UI', Arial, sans-serif;
font-size: 24px;
font-weight: 700;
color: #ffffff;
background: rgba(0, 0, 0, 0.80);
padding: 6px 20px;
border-radius: 6px;
border-bottom: 2.5px solid rgba(99, 220, 255, 0.6);
}
2.5 Create popup.html and popup.js
The popup provides language dropdowns for source and target, a display mode selector (translated only / dual / original), a chunk interval slider, a save button, and a connection test button.
The popup stores settings in chrome.storage.local and broadcasts UPDATE_SETTINGS to the active tab's content script on save.
popup.html contains the popup UI markup. popup.js holds the key logic; the excerpt below shows the save and connection-test handlers.
// Shorthand for document.getElementById
const $ = (id) => document.getElementById(id);
// Use http://localhost:3000 while developing locally
const BACKEND_URL = 'https://your-deployed-backend.onrender.com';
$('saveBtn').addEventListener('click', async () => {
const settings = {
enabled: $('enabledToggle').checked,
sourceLang: $('sourceLang').value,
targetLang: $('targetLang').value,
displayMode: $('displayMode').value,
chunkInterval: parseInt($('chunkRange').value, 10),
backendUrl: BACKEND_URL
};
await chrome.storage.local.set(settings);
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
chrome.tabs.sendMessage(tab.id, { type: 'UPDATE_SETTINGS', settings }).catch(() => {});
});
$('testBtn').addEventListener('click', async () => {
const res = await fetch(`${BACKEND_URL}/health`, { signal: AbortSignal.timeout(4000) });
const data = await res.json();
// Show connection status with Groq and Lingo.dev checks
});
Chapter 3 | Deploying the Backend
Going from localhost to a live URL so the extension works without running a local server.
- Add a `.gitignore` that excludes `.env` and `node_modules/`
- Push your project to GitHub
- Create a new project on Railway or Render
- Set the root directory to `backend/`
- Add environment variables: `GROQ_API_KEY` and `LINGODOTDEV_API_KEY`
- Once deployed, copy your live URL and update `BACKEND_URL` in `popup.js`
- Reload the extension in Chrome (`chrome://extensions` → reload)
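For the first step, a minimal `backend/.gitignore` might look like this (adjust to your repo layout; ignoring `uploads/` is my addition, since it only holds temporary audio and is recreated at startup):

```
# secrets: never commit API keys
.env

# dependencies
node_modules/

# temporary audio chunks (recreated at startup)
uploads/
```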
How the Pieces Connect
| Step | What happens |
|---|---|
| 1 | User enables the extension and selects languages in the popup |
| 2 | `content.js` finds the video element and taps its audio via the Web Audio API |
| 3 | `MediaRecorder` records 3-second chunks (configurable) |
| 4 | Each chunk is base64-encoded and sent to `background.js` via `chrome.runtime.sendMessage` |
| 5 | `background.js` converts it back to a Blob and POSTs it to `/transcribe` |
| 6 | The Express server sends the audio to Groq Whisper (~300ms) |
| 7 | Transcribed text goes to Lingo.dev for translation |
| 8 | The result returns to `content.js`, which renders it in the subtitle overlay |
Cost
$0.00. Groq Whisper Large V3 Turbo and Lingo.dev both have free tiers. 99 languages are supported, with auto-detection.
Acknowledgements
Lingo.dev handles all the translation features here, so adding powerful language detection and translation was a breeze; there was no need to set up anything complicated behind the scenes. You can check out the full source code on GitHub.
Keep Building. Keep Shipping.