Rahul
I Built a Real-Time Multilingual Dubbing Platform and Used TestSprite MCP to Test It

What if you could speak, and everyone listening heard you in their own language, with no noticeable delay?

That question turned into PolyDub.


What It Does

Three modes:

  • Live Broadcast: one speaker, listeners worldwide, each hearing a dubbed stream in their language
  • Multilingual Rooms: everyone speaks their own language, everyone hears everyone else in theirs
  • VOD Dubbing: upload a video, download a dubbed MP4 with SRT subtitles

The real-time pipeline:

```
Mic -> WebSocket -> Deepgram Nova-2 (STT) -> Google Translate (~300ms) -> Deepgram Aura-2 (TTS) -> Speaker
```

Perceived latency is around 1.2 to 1.5 seconds. Fast enough for a real conversation.
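The flow above can be sketched as a small orchestrator that composes the three stages. The stage signatures and names here (`stt`, `translate`, `tts`) are illustrative stand-ins for the Deepgram and Google calls, not the project's actual functions:

```typescript
// Sketch of the dubbing pipeline as a composition of async stages.
// Stage implementations are assumptions; only the shape of the flow
// mirrors the diagram above.
type Stage<I, O> = (input: I) => Promise<O>;

async function dubChunk(
  audio: Uint8Array,
  stt: Stage<Uint8Array, string>,    // Deepgram Nova-2: speech -> text
  translate: Stage<string, string>,  // Google Translate: source -> target text
  tts: Stage<string, Uint8Array>,    // Deepgram Aura-2: text -> dubbed audio
): Promise<Uint8Array> {
  const transcript = await stt(audio);
  const translated = await translate(transcript);
  return tts(translated);
}
```

Each stage runs per utterance chunk, so the end-to-end delay is roughly the sum of the three hops plus network time.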

Landing Page


A Few Decisions Worth Explaining

Why Google Translate instead of Lingo.dev for real-time? Lingo.dev is LLM-based, which means 5 to 8 seconds of latency. Fine for batch work, not for live speech. Google's gtx endpoint runs at 250 to 350ms warm. Lingo.dev is still in the project, compiling UI strings at build time across 15 locales.
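For context, a call to the gtx endpoint looks roughly like the sketch below. The endpoint is unofficial and undocumented, so both the URL parameters and the response shape (nested arrays, with each translated segment at `data[0][i][0]`) are assumptions based on observed behavior, not a stable API contract:

```typescript
// Pull the translated text out of the gtx response. The nested-array
// shape handled here is an assumption, not a documented contract.
function parseGtxResponse(data: unknown): string {
  const segments = (data as [string, ...unknown[]][][])[0] ?? [];
  return segments.map((seg) => seg[0]).join("");
}

// Hypothetical caller; real code would add error handling and retries.
async function gtxTranslate(text: string, from: string, to: string): Promise<string> {
  const url =
    "https://translate.googleapis.com/translate_a/single" +
    `?client=gtx&sl=${from}&tl=${to}&dt=t&q=${encodeURIComponent(text)}`;
  const res = await fetch(url);
  return parseGtxResponse(await res.json());
}
```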

Why Deepgram Aura-2? Aura v1 only shipped English voices regardless of the language param. Aura-2 ships genuinely native-accent voices: Japanese prosody, Spanish regional variation, German intonation. Using an English voice mispronouncing another language defeats the entire product.

Why a per-listener TTS queue? In a room with multiple speakers, audio chunks from different people arrive at the same socket in parallel. Without serialization they interleave into noise. A per-socket promise chain fixes this, and the queue depth is capped at 1 so stale utterances get dropped rather than building an 8-second backlog.
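A minimal sketch of that per-socket queue, assuming "depth capped at 1" means one utterance plays while at most one waits, and a newer arrival replaces the waiting one (the class and field names are illustrative, not the repo's):

```typescript
// One instance per listener socket: serializes TTS playback and drops
// stale utterances instead of letting a backlog build up.
class TtsQueue {
  private running = false;
  private next: (() => Promise<void>) | null = null;
  dropped = 0;

  enqueue(task: () => Promise<void>): void {
    if (this.running) {
      if (this.next) this.dropped++; // stale utterance replaced, not queued
      this.next = task;
      return;
    }
    void this.run(task);
  }

  private async run(task: () => Promise<void>): Promise<void> {
    this.running = true;
    try {
      await task();
    } finally {
      this.running = false;
      const pending = this.next;
      this.next = null;
      if (pending) void this.run(pending); // chain into the waiting utterance
    }
  }
}
```

Because JavaScript is single-threaded, the `running` flag and the `next` slot never race; the chain advances only between awaits.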


Screenshots

Broadcast setup

Broadcast mode: pick source and target languages, hit Start, share the listener link.

Room view

Rooms: each participant sets their own language and voice. The server handles translation per-person.

VOD studio

VOD: upload a video, pick a language, get a dubbed MP4 and SRT file back.


Testing With TestSprite MCP

The project was built under hackathon pressure. Third-party APIs can fail in specific ways. Frontend validation is easy to break quietly. Writing full test coverage by hand would have eaten most of the remaining build time.

TestSprite MCP plugs into Claude Code as an MCP server. It reads the codebase, generates a test plan, and writes runnable test code. I ran it twice: once for a baseline, and again after a round of fixes.

Backend tests generated (5/5 passing):

| Test  | What it checks |
| ----- | -------------- |
| TC001 | `POST /api/dub` with a valid file returns `{ srt, mp3 }` |
| TC002 | `POST /api/dub` with missing params returns 400 |
| TC003 | `POST /api/dub` with a broken third-party API returns 500 |
| TC004 | `POST /api/mux` with valid inputs returns a `video/mp4` stream |
| TC005 | `POST /api/mux` with missing inputs returns 400 |

The generated code is more thorough than what you'd write in a hurry. TC001 builds a minimal valid WAV file inline, validates the base64 response actually decodes, and checks the SRT string is non-empty:

```python
mp3_bytes = base64.b64decode(json_data["mp3"], validate=True)
assert len(mp3_bytes) > 0
assert "srt" in json_data and len(json_data["srt"].strip()) > 0
```

Frontend tests generated (12 cases): broadcast start and validation, room create/join/leave/rejoin, language and voice change in-session, VOD upload validation, and landing-to-mode navigation flows.

What the first run caught:

  1. /api/dub was returning a plain string in some error paths instead of a consistent JSON shape. TC003 found it.
  2. The room ID field was letting through malformed IDs before hitting the server. TC009 found it.
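Both fixes reduce to a few lines. These are illustrative sketches, not the repo's exact code: the error shape and the 6-to-12-alphanumeric room-ID rule are assumptions made for the example.

```typescript
// Fix 1 (sketch): route every /api/dub error path through one helper so
// the response is always the same JSON shape, never a bare string.
function apiError(code: string, message: string) {
  return { error: { code, message } };
}

// Fix 2 (sketch): reject malformed room IDs client-side before they
// reach the server. The exact format rule here is an assumption.
function isValidRoomId(id: string): boolean {
  return /^[A-Za-z0-9]{6,12}$/.test(id);
}
```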

Fixed both, reran, all clean. The dashboard keeps a full run history so you can diff before and after. That is the actual useful part: not a single passing run, but a record of what broke, what changed, and whether the fix held.


Running It

```shell
git clone https://github.com/crypticsaiyan/PolyDub
cd PolyDub
pnpm install
cp .env.example .env
# set DEEPGRAM_API_KEY and LINGO_API_KEY in .env

pnpm dev     # terminal 1: Next.js on :3000
pnpm server  # terminal 2: WebSocket server on :8080
```

GitHub: https://github.com/crypticsaiyan/PolyDub
