A Chrome extension that captures video audio, transcribes it with Groq Whisper, translates it with Lingo.dev, and overlays subtitles on any video in under 2 seconds, completely free.
Why I Built This
1. Streaming Platform Auto-Subtitles Are Unreliable
YouTube offers auto-generated subtitles for many languages, but they are often inaccurate, mistimed, and unnatural to read. For languages like Japanese, Bengali, or Arabic, the quality drops significantly. LingoTitles uses Groq Whisper + Lingo.dev to generate subtitles that are more accurate, context-aware, and actually readable, on YouTube and everywhere else.
2. Breaking News From Conflict Zones
We are living in a world of active conflicts and natural disasters. The first footage from any crisis (a war zone, a tsunami warning, a flood) is almost always filmed by someone on the ground, a local victim or witness, speaking their native language. By the time that video reaches social media, it has no subtitles, no translation, nothing.
If you don't speak that language, you have no idea what they are warning about, what risks are approaching, or what is actually happening on the ground. This is not just inconvenient; it can be dangerous. LingoTitles solves this directly: any video, any language, real-time subtitles in your language, so critical information reaches you regardless of the language barrier.
What Is LingoTitles?
LingoTitles is a Chrome extension that generates real-time subtitles for any video on the internet. It captures the video audio, transcribes it using Groq's Whisper model, translates it using Lingo.dev, and overlays the subtitles directly on the video, all in under 2 seconds, completely free.
Tech Stack
- Node.js + Express: REST API server
- Groq Whisper Large V3 Turbo: Free, ultra-fast speech-to-text (~300ms)
- Lingo.dev: Real-time translation tool
- Multer: Audio file handling
How It Works
Project Structure
lingodev2/
├── backend/
│ ├── server.js # Express server handles Groq + Lingo.dev
│ ├── package.json
│ ├── .env # Contains API keys
│ └── uploads/ # Temporary audio storage (auto-cleaned)
└── extension/
├── manifest.json # Chrome manifest V3
├── background.js # Proxies fetch to backend
├── content.js # Audio capture + subtitle overlay
├── subtitles.css # Subtitle overlay styles
├── popup.html # Extension popup UI
├── popup.js # Popup logic
└── icons/
├── icon16.png
├── icon48.png
└── icon128.png
Chapter 1 | Building the Backend
1.1 Install Dependencies
npm install express cors dotenv multer groq-sdk lingo.dev
package.json
{
"name": "lingotitles-backend",
"version": "1.0.0",
"description": "LingoTitles backend: Groq Whisper speech-to-text + Lingo.dev translation",
"main": "server.js",
"scripts": {
"start": "node server.js",
"dev": "nodemon server.js"
},
"dependencies": {
"cors": "^2.8.5",
"dotenv": "^16.3.1",
"express": "^4.18.2",
"groq-sdk": "latest",
"lingo.dev": "latest",
"multer": "^1.4.5-lts.1"
},
"devDependencies": {
"nodemon": "^3.0.1"
}
}
1.2 Create package-lock.json
Run npm install inside backend/; this will create the package-lock.json file.
1.3 Create .env
Create a .env file inside backend/.
GROQ_API_KEY=your_groq_api_key
LINGODOTDEV_API_KEY=your_lingo_api_key
Put your keys in this .env file; you can also set PORT=3000 (optional, defaults to 3000).
- Get your free Groq API key at console.groq.com
- Get your free Lingo.dev API key at lingo.dev
1.4 Create server.js
This is the heart of the backend. It sets up Express, connects Groq and Lingo.dev, and exposes two endpoints: /health for status checks and /transcribe for audio processing. We also handle the locale mapping (e.g. ja → ja-JP) that Lingo.dev requires.
'use strict';
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const cors = require('cors');
const fs = require('fs');
const path = require('path');
const Groq = require('groq-sdk');
const { LingoDotDevEngine } = require('lingo.dev/sdk');
const groq = new Groq({
apiKey: process.env.GROQ_API_KEY
});
const lingo = new LingoDotDevEngine({
apiKey: process.env.LINGODOTDEV_API_KEY
});
const app = express();
app.use(cors({ origin: '*' }));
app.use(express.json());
const upload = multer({
dest: path.join(__dirname, 'uploads'),
limits: { fileSize: 25 * 1024 * 1024 }
});
fs.mkdirSync(path.join(__dirname, 'uploads'), { recursive: true });
app.get('/health', (_req, res) => {
res.json({
status: 'ok',
groq: !!process.env.GROQ_API_KEY,
lingo: !!process.env.LINGODOTDEV_API_KEY,
time: new Date().toISOString()
});
});
app.post('/transcribe', upload.single('audio'), async (req, res) => {
const tempPath = req.file?.path;
try {
if (!req.file) {
return res.status(400).json({ error: 'No audio file provided' });
}
const sourceLang = (req.body.sourceLang || 'auto').trim();
const targetLang = (req.body.targetLang || 'en').trim();
console.log(`▶ /transcribe src=${sourceLang} → tgt=${targetLang} size=${req.file.size}B`);
const original = await groqTranscribe(tempPath, sourceLang);
if (!original || !original.trim()) {
return res.json({ original: '', translated: '' });
}
console.log(` Groq Whisper → "${original}"`);
let translated = original;
if (sourceLang !== targetLang) {
translated = await lingoTranslate(original, sourceLang, targetLang);
console.log(` Lingo.dev → "${translated}"`);
}
res.json({
original: original.trim(),
translated: translated.trim()
});
} catch (err) {
console.error(' ✗ Error:', err.message);
res.status(500).json({ error: err.message });
} finally {
if (tempPath) {
fs.unlink(tempPath, () => {});
}
}
});
async function groqTranscribe(filePath, sourceLang) {
const webmPath = filePath + '.webm';
try {
fs.copyFileSync(filePath, webmPath);
const response = await groq.audio.transcriptions.create({
file: fs.createReadStream(webmPath),
model: 'whisper-large-v3-turbo',
language: sourceLang === 'auto' ? undefined : sourceLang,
response_format: 'text',
temperature: 0
});
return (typeof response === 'string' ? response : response?.text ?? '').trim();
} finally {
try { fs.unlinkSync(webmPath); } catch(_) {}
}
}
const LOCALE_MAP = {
ja: 'ja-JP', zh: 'zh-CN', ko: 'ko-KR', es: 'es-ES', fr: 'fr-FR',
de: 'de-DE', ar: 'ar-SA', fa: 'fa-IR', vi: 'vi-VN', tr: 'tr-TR',
pt: 'pt-BR', ru: 'ru-RU', hi: 'hi-IN', id: 'id-ID', th: 'th-TH',
en: 'en-US', it: 'it-IT', nl: 'nl-NL', pl: 'pl-PL', sv: 'sv-SE',
uk: 'uk-UA', cs: 'cs-CZ', ro: 'ro-RO', hu: 'hu-HU', el: 'el-GR',
he: 'he-IL', bn: 'bn-BD', mr: 'mr-IN', ta: 'ta-IN', te: 'te-IN',
kn: 'kn-IN', gu: 'gu-IN', pa: 'pa-IN', ur: 'ur-PK', ms: 'ms-MY',
tl: 'tl-PH', sw: 'sw-KE'
};
function toLocale(code) {
if (!code || code === 'auto') return null;
if (code.includes('-')) return code;
return LOCALE_MAP[code.toLowerCase()] || code;
}
async function lingoTranslate(text, sourceLang, targetLang) {
const src = toLocale(sourceLang);
const tgt = toLocale(targetLang);
const options = { targetLocale: tgt };
if (src) options.sourceLocale = src;
const result = await lingo.localizeText(text, options);
return result ?? text;
}
const PORT = parseInt(process.env.PORT || '3000', 10);
app.listen(PORT, () => {
console.log(`\n🌍 LingoTitles Backend — http://localhost:${PORT}`);
console.log(` Groq Whisper: ${process.env.GROQ_API_KEY ? '✓ free & ready' : '✗ missing GROQ_API_KEY'}`);
console.log(` Lingo.dev: ${process.env.LINGODOTDEV_API_KEY ? '✓ free & ready' : '✗ missing LINGODOTDEV_API_KEY'}`);
console.log(`\n 💰 Cost: $0.00 — both APIs are free`);
console.log(` 🌐 Languages: 99 supported + auto-detect`);
});
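Note the locale mapping at the heart of this file: Whisper works with bare ISO 639-1 codes, while Lingo.dev expects full locales. The helper can be exercised in isolation; this sketch copies toLocale with a trimmed-down map, purely for illustration:

```javascript
// Minimal copy of the server's locale helper (trimmed map, illustration only)
const LOCALE_MAP = { ja: 'ja-JP', zh: 'zh-CN', bn: 'bn-BD', en: 'en-US' };

function toLocale(code) {
  if (!code || code === 'auto') return null;      // null lets Lingo.dev auto-detect
  if (code.includes('-')) return code;            // already a full locale
  return LOCALE_MAP[code.toLowerCase()] || code;  // map bare codes, pass unknowns through
}

console.log(toLocale('ja'));    // → ja-JP
console.log(toLocale('pt-BR')); // → pt-BR (already qualified)
console.log(toLocale('auto'));  // → null (sourceLocale omitted from the request)
```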
Chapter 2 | Building the Chrome Extension
2.1 Create manifest.json
The manifest declares permissions, registers the service worker, and injects the content script + CSS into every page. (Microphone access for the fallback capture path is requested at runtime via getUserMedia, so it does not need a manifest permission.)
{
"manifest_version": 3,
"name": "lingotitle",
"version": "1.0.0",
"description": "Real-time subtitles for any video. Anime, news, reels — auto-detects language and translates instantly.",
"permissions": [
"activeTab",
"scripting",
"storage",
"tabs"
],
"host_permissions": [
"<all_urls>"
],
"background": {
"service_worker": "background.js"
},
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["content.js"],
"css": ["subtitles.css"],
"run_at": "document_idle"
}
],
"action": {
"default_popup": "popup.html",
"default_icon": {
"16": "icons/icon16.png",
"48": "icons/icon48.png",
"128": "icons/icon128.png"
}
},
"icons": {
"16": "icons/icon16.png",
"48": "icons/icon48.png",
"128": "icons/icon128.png"
}
}
Also create an icons/ folder with icon16.png, icon48.png, and icon128.png.
2.2 Create background.js | The Service Worker
Content scripts cannot make direct fetch calls to localhost due to Chrome's CORS restrictions. The background service worker acts as a proxy — it receives the base64-encoded audio from the content script, converts it back to a Blob, and POSTs it to the backend.
chrome.runtime.onInstalled.addListener(() => {
chrome.storage.local.set({
enabled: false,
sourceLang: 'auto',
targetLang: 'en',
displayMode: 'translated',
chunkInterval: 3000,
backendUrl: 'http://localhost:3000'
});
console.log('[LingoTitles] Installed ✓');
});
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
if (msg.type === 'TRANSCRIBE') {
handleTranscribe(msg).then(sendResponse).catch(err => {
sendResponse({ error: err.message });
});
return true;
}
});
async function handleTranscribe({ audioBase64, mimeType, sourceLang, targetLang, backendUrl }) {
try {
const byteChars = atob(audioBase64);
const byteArr = new Uint8Array(byteChars.length);
for (let i = 0; i < byteChars.length; i++) {
byteArr[i] = byteChars.charCodeAt(i);
}
const blob = new Blob([byteArr], { type: mimeType || 'audio/webm' });
const form = new FormData();
form.append('audio', blob, 'chunk.webm');
form.append('sourceLang', sourceLang);
form.append('targetLang', targetLang);
const res = await fetch(`${backendUrl}/transcribe`, {
method: 'POST',
body: form
});
if (!res.ok) {
const txt = await res.text();
return { error: `Server error ${res.status}: ${txt}` };
}
return await res.json();
} catch (err) {
return { error: err.message };
}
}
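The decode loop above mirrors the encoding that content.js performs before calling chrome.runtime.sendMessage (message payloads must be JSON-serializable, which is why raw audio bytes travel as base64). A quick round-trip in plain JavaScript, with no Chrome APIs, confirms the bytes survive the trip:

```javascript
// Encode bytes → base64 (what content.js does before sendMessage)...
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// ...and decode base64 → bytes (what background.js does on receipt).
function base64ToBytes(b64) {
  const byteChars = atob(b64);
  const bytes = new Uint8Array(byteChars.length);
  for (let i = 0; i < byteChars.length; i++) bytes[i] = byteChars.charCodeAt(i);
  return bytes;
}

const original = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3]); // WebM magic bytes
const roundTripped = base64ToBytes(bytesToBase64(original));
console.log([...roundTripped]); // → [26, 69, 223, 163]
```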
2.3 Create content.js | Audio Capture & Subtitle Overlay
This is the most complex file. It finds the best video element on the page, taps the audio stream using the Web Audio API, records chunks with MediaRecorder, and sends them to the background worker. A minimum size guard (1000 bytes) filters out silent or empty chunks before they are sent to the API.
(function () {
'use strict';
console.log('[LingoTitles] content.js loaded ✓');
let cfg = {
enabled: false,
sourceLang: 'auto',
targetLang: 'en',
displayMode: 'translated',
chunkInterval: 3000,
backendUrl: 'http://localhost:3000'
};
let audioCtx = null;
let recorder = null;
let isCapturing = false;
let currentVideo = null;
let subtitleEl = null;
let chunkTimer = null;
let audioChunks = [];
chrome.storage.local.get(
['enabled','sourceLang','targetLang','displayMode','chunkInterval','backendUrl'],
data => {
Object.assign(cfg, data);
if (cfg.enabled) init();
}
);
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
if (msg.type === 'UPDATE_SETTINGS') {
const wasEnabled = cfg.enabled;
Object.assign(cfg, msg.settings);
if (cfg.enabled && !isCapturing) init();
if (!cfg.enabled && wasEnabled) teardown();
}
if (msg.type === 'GET_STATUS') {
sendResponse({
hasVideo: !!document.querySelector('video'),
capturing: isCapturing,
sourceLang: cfg.sourceLang,
targetLang: cfg.targetLang
});
}
});
function bestVideo() {
return [...document.querySelectorAll('video')]
.filter(v => v.duration > 0 || v.readyState >= 2)
.sort((a, b) => (b.videoWidth * b.videoHeight) - (a.videoWidth * a.videoHeight))[0] || null;
}
function init() {
const v = bestVideo();
if (v) { attach(v); return; }
const poll = setInterval(() => {
const v = bestVideo();
if (v) { clearInterval(poll); attach(v); }
}, 1000);
setTimeout(() => clearInterval(poll), 60000);
}
function attach(video) {
if (isCapturing) return;
currentVideo = video;
buildOverlay();
startCapture(video);
}
function buildOverlay() {
if (subtitleEl) subtitleEl.remove();
subtitleEl = document.createElement('div');
// Class names must match subtitles.css exactly; CSS class matching is case-sensitive
subtitleEl.className = 'lingotitles-overlay';
document.body.appendChild(subtitleEl);
}
function showSubtitles(original, translated) {
if (!subtitleEl) return;
subtitleEl.innerHTML = '';
if (cfg.displayMode === 'dual' && original) {
const o = document.createElement('div');
o.className = 'lingotitles-original';
o.textContent = original;
subtitleEl.appendChild(o);
}
const text = cfg.displayMode === 'original' ? original : (translated || original);
if (text) {
const t = document.createElement('div');
t.className = 'lingotitles-text';
t.textContent = text;
subtitleEl.appendChild(t);
}
subtitleEl.classList.add('lingotitles-visible');
clearTimeout(subtitleEl._timer);
subtitleEl._timer = setTimeout(
() => subtitleEl && subtitleEl.classList.remove('lingotitles-visible'),
Math.max(cfg.chunkInterval * 1.5, 3500)
);
}
function startCapture(video) {
try {
audioCtx = new (window.AudioContext || window.webkitAudioContext)();
const src = audioCtx.createMediaElementSource(video);
src.connect(audioCtx.destination);
const dest = audioCtx.createMediaStreamDestination();
src.connect(dest);
beginRecording(dest.stream);
} catch (err) {
console.warn('[LingoTitles] Direct capture failed:', err.message, '— trying mic');
micFallback();
}
}
async function micFallback() {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
beginRecording(stream);
} catch (e) {
console.error('[LingoTitles] All capture methods failed:', e.message);
}
}
function beginRecording(stream) {
const mimeType = ['audio/webm;codecs=opus','audio/webm','audio/ogg']
.find(m => MediaRecorder.isTypeSupported(m)) || '';
recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
audioChunks = [];
recorder.ondataavailable = e => {
if (e.data && e.data.size > 0) audioChunks.push(e.data);
};
recorder.onstop = () => {
if (!audioChunks.length) return;
const totalSize = audioChunks.reduce((s, c) => s + c.size, 0);
if (totalSize < 1000) {
audioChunks = [];
if (isCapturing) setTimeout(() => { try { recorder.start(); } catch(_){} }, 100);
return;
}
const blob = new Blob(audioChunks, { type: mimeType || 'audio/webm' });
audioChunks = [];
sendChunk(blob);
if (isCapturing) {
setTimeout(() => { try { recorder.start(); } catch(_){} }, 100);
}
};
recorder.start();
isCapturing = true;
chunkTimer = setInterval(() => {
if (isCapturing && recorder && recorder.state === 'recording') {
try { recorder.stop(); } catch(_) {}
}
}, cfg.chunkInterval);
}
async function sendChunk(blob) {
if (currentVideo && currentVideo.paused) return;
try {
const arrayBuffer = await blob.arrayBuffer();
const bytes = new Uint8Array(arrayBuffer);
let binary = '';
for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
const audioBase64 = btoa(binary);
chrome.runtime.sendMessage({
type: 'TRANSCRIBE',
audioBase64,
mimeType: blob.type,
sourceLang: cfg.sourceLang,
targetLang: cfg.targetLang,
backendUrl: cfg.backendUrl
}, response => {
if (chrome.runtime.lastError || !response || response.error) return;
if (response.original && response.original.trim()) {
showSubtitles(response.original.trim(), response.translated?.trim());
}
});
} catch (err) {
console.error('[LingoTitles] sendChunk error:', err.message);
}
}
function teardown() {
isCapturing = false;
clearInterval(chunkTimer);
if (recorder && recorder.state !== 'inactive') try { recorder.stop(); } catch(_) {}
if (audioCtx) try { audioCtx.close(); } catch(_) {}
recorder = null; audioCtx = null;
if (subtitleEl) subtitleEl.classList.remove('lingotitles-visible');
}
})();
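The 1000-byte guard in recorder.onstop can be factored into a pure function, which makes the silence-filtering rule easy to reason about on its own (a sketch for illustration, not part of the shipped extension):

```javascript
// Decide whether a batch of recorded chunks is worth sending to the backend.
// Batches below ~1000 bytes total are almost always silence or bare container
// headers with no speech, so we drop them instead of spending an API call.
const MIN_CHUNK_BYTES = 1000;

function shouldSendChunks(chunks, minBytes = MIN_CHUNK_BYTES) {
  const totalSize = chunks.reduce((sum, c) => sum + c.size, 0);
  return totalSize >= minBytes;
}

console.log(shouldSendChunks([{ size: 300 }, { size: 400 }])); // → false (700 B, likely silence)
console.log(shouldSendChunks([{ size: 1500 }]));               // → true
```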
2.4 Create subtitles.css
I used position: fixed and z-index: 2147483647 (the maximum) to ensure subtitles always render on top of the video player's own UI, regardless of the site. The fade-in animation uses CSS transitions on opacity and transform.
.lingotitles-overlay {
position: fixed;
bottom: 80px;
left: 50%;
transform: translateX(-50%) translateY(8px);
z-index: 2147483647;
max-width: 80%;
width: max-content;
text-align: center;
pointer-events: none;
display: flex;
flex-direction: column;
align-items: center;
gap: 5px;
opacity: 0;
transition: opacity 0.25s ease, transform 0.25s ease;
}
.lingotitles-overlay.lingotitles-visible {
opacity: 1;
transform: translateX(-50%) translateY(0);
}
.lingotitles-original {
font-family: 'Noto Sans', sans-serif;
font-size: 16px;
color: rgba(255, 255, 255, 0.85);
background: rgba(0, 0, 0, 0.65);
padding: 3px 14px;
border-radius: 4px;
}
.lingotitles-text {
font-family: 'Segoe UI', Arial, sans-serif;
font-size: 24px;
font-weight: 700;
color: #ffffff;
background: rgba(0, 0, 0, 0.80);
padding: 6px 20px;
border-radius: 6px;
border-bottom: 2.5px solid rgba(99, 220, 255, 0.6);
}
2.5 Create popup.html and popup.js
The popup provides language dropdowns for source and target, a display mode selector (translated only / dual / original), a chunk interval slider, a save button, and a connection test button.
The popup stores settings in chrome.storage.local and broadcasts UPDATE_SETTINGS to the active tab's content script on save.
popup.html contains the popup UI markup. popup.js holds the key logic; the excerpt below shows the save and connection-test handlers.
// Shorthand for document.getElementById
const $ = (id) => document.getElementById(id);
// Use http://localhost:3000 while developing locally
const BACKEND_URL = 'https://your-deployed-backend.onrender.com';
$('saveBtn').addEventListener('click', async () => {
const settings = {
enabled: $('enabledToggle').checked,
sourceLang: $('sourceLang').value,
targetLang: $('targetLang').value,
displayMode: $('displayMode').value,
chunkInterval: parseInt($('chunkRange').value, 10),
backendUrl: BACKEND_URL
};
await chrome.storage.local.set(settings);
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
chrome.tabs.sendMessage(tab.id, { type: 'UPDATE_SETTINGS', settings }).catch(() => {});
});
$('testBtn').addEventListener('click', async () => {
const res = await fetch(`${BACKEND_URL}/health`, { signal: AbortSignal.timeout(4000) });
const data = await res.json();
// Show connection status with Groq and Lingo.dev checks
});
Chapter 3 | Deploying the Backend
Going from localhost to a live URL so the extension works without running a local server.
- Add a `.gitignore` that excludes `.env` and `node_modules/`
- Push your project to GitHub
- Create a new project on Railway or Render
- Set the root directory to `backend/`
- Add environment variables: `GROQ_API_KEY` and `LINGODOTDEV_API_KEY`
- Once deployed, copy your live URL and update `BACKEND_URL` in `popup.js`
- Reload the extension in Chrome (`chrome://extensions` → reload)
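For the first step, a minimal `backend/.gitignore` might look like this (adjust to your repo layout; ignoring `uploads/` is my addition, since it only holds temporary audio and is recreated at startup):

```
# secrets: never commit API keys
.env

# dependencies
node_modules/

# temporary audio chunks (recreated at startup)
uploads/
```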
How the Pieces Connect
| Step | What happens |
|---|---|
| 1 | User enables the extension and selects languages in the popup |
| 2 | `content.js` finds the video element and taps its audio via the Web Audio API |
| 3 | `MediaRecorder` records 3-second chunks (configurable) |
| 4 | Each chunk is base64-encoded and sent to `background.js` via `chrome.runtime.sendMessage` |
| 5 | `background.js` converts it back to a Blob and POSTs it to `/transcribe` |
| 6 | The Express server sends the audio to Groq Whisper (~300ms) |
| 7 | Transcribed text goes to Lingo.dev for translation |
| 8 | The result returns to `content.js`, which renders it in the subtitle overlay |
Cost
$0.00. Groq Whisper Large V3 Turbo and Lingo.dev both have free tiers. 99 languages are supported, with auto-detection.
Acknowledgements
Lingo.dev handles all the translation features here, so adding powerful language detection and translation was a breeze; there was no need to set up anything complicated behind the scenes. You can check out the full source code on GitHub.
Keep Building. Keep Shipping.