DEV Community

Rahul

A practical walkthrough of building a real-time multilingual experience in a Next.js app

TL;DR: I built PolyDub, a real‑time multilingual video dubbing app. This post shows the practical pieces that make it work: UX, WebSockets, streaming STT → translate → TTS, and automated UI i18n. You can reuse the same approach for any React app.

The use case (why this matters)

Ever tried hosting a live webinar for a global audience? You’re speaking English, half your audience prefers Spanish, and everyone else is quietly struggling.

PolyDub turns that into: speak once, listeners hear you in their language. It works for:

  • Live broadcasts (one‑to‑many)
  • Multilingual meetings (many‑to‑many)
  • Demos, classes, and community events

The core idea

At a high level, PolyDub is just a fast loop:

Audio In → Speech‑to‑Text → Translate → Text‑to‑Speech → Audio Out

Step 1: Build a UI that’s ready for multiple languages

I started with a Next.js app and a UI built from small components (buttons, selects, transcript panels). The key is keeping copy centralized so it can be extracted and translated.

Tip: Avoid hard‑coding strings deep in components. Make them easy to collect.
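One simple pattern for this (a minimal sketch — the names here are illustrative, not PolyDub's actual code) is a single strings module with a lookup helper that falls back to English:

```typescript
// strings.ts — all UI copy lives in one place so a tool can extract it.
type Locale = "en" | "es";

const strings: Record<Locale, Record<string, string>> = {
  en: {
    startBroadcast: "Start broadcast",
    selectLanguage: "Select your language",
  },
  es: {
    startBroadcast: "Iniciar transmisión",
    selectLanguage: "Selecciona tu idioma",
  },
};

// Look up a key for a locale, falling back to English, then to the key itself.
function t(locale: Locale, key: string): string {
  return strings[locale][key] ?? strings.en[key] ?? key;
}
```

Components then call `t(locale, "startBroadcast")` instead of embedding copy, so every string is in one extractable place.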

[Screenshot: landing page]

Step 2: Automate UI translation

Manually translating UI strings is a time sink. I used Lingo.dev to automate extraction and generation of locale files.

What that gets you:

  • Automatic string extraction from React components
  • Versionable JSON locale files
  • One build step to update all languages

Example flow

  • Write UI in English
  • Run build
  • Locale files are generated
  • UI is instantly multilingual
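Because the generated locale files are plain JSON, they are easy to sanity-check in CI. A minimal sketch (the file shape here is an assumption — adapt it to whatever your tool emits) that flags keys present in English but missing from another locale:

```typescript
// Flag keys that exist in the source locale but not in a target locale.
type LocaleFile = Record<string, string>;

function missingKeys(source: LocaleFile, target: LocaleFile): string[] {
  return Object.keys(source).filter((key) => !(key in target));
}

const en: LocaleFile = { title: "PolyDub", join: "Join room", leave: "Leave" };
const es: LocaleFile = { title: "PolyDub", join: "Unirse a la sala" };

console.log(missingKeys(en, es)); // → ["leave"]
```

Running a check like this on every build catches untranslated strings before they ship.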

Step 3: Stream audio and translate in real time

For live audio, I used WebSockets and a Node server to keep latency low. The server:

  1. Receives audio chunks from the speaker
  2. Runs speech‑to‑text (Deepgram STT)
  3. Translates text (Lingo.dev SDK)
  4. Generates speech (Deepgram TTS)
  5. Streams the audio to listeners
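Glossing over the provider SDKs, the per-chunk loop can be sketched like this. The `speechToText`, `translateText`, and `textToSpeech` stubs are hypothetical placeholders for the Deepgram and Lingo.dev calls (the real APIs are streaming and event-driven); only the composition is the point:

```typescript
// Hypothetical provider stubs — in PolyDub these wrap Deepgram STT/TTS
// and the Lingo.dev SDK; here they are fakes for illustration.
async function speechToText(chunk: Buffer): Promise<string> {
  return "hello everyone"; // fake transcript
}
async function translateText(text: string, targetLang: string): Promise<string> {
  return targetLang === "es" ? "hola a todos" : text; // fake translation
}
async function textToSpeech(text: string, lang: string): Promise<Buffer> {
  return Buffer.from(text); // fake audio bytes
}

// One pass through the loop: audio in → STT → translate → TTS → audio out.
async function handleChunk(chunk: Buffer, targetLang: string): Promise<Buffer> {
  const transcript = await speechToText(chunk);
  const translated = await translateText(transcript, targetLang);
  return textToSpeech(translated, targetLang);
}
```

In the real server, each listener's WebSocket receives the output of this loop for their chosen language, so one incoming stream fans out into many translated ones.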

Diagram (simple + memorable)

Browser → WS Server → STT → Translate → TTS → WS → Browser

Step 4: Keep it human‑sounding

Synthetic voices can feel robotic, so I used Deepgram Aura voices for more natural delivery. This makes a huge difference for engagement.

Tip: Let users pick voices per language. It adds personality and makes the app feel premium.
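Per-language voice choice can be as simple as a lookup with a default. A minimal sketch — the voice IDs below are placeholders, not real Deepgram voice names, so check the Aura docs for the current list:

```typescript
// Placeholder voice IDs per language, with a user-selectable override.
const defaultVoices: Record<string, string> = {
  en: "aura-english-voice", // placeholder ID
  es: "aura-spanish-voice", // placeholder ID
};

// Use the listener's explicit choice if set, else the language default,
// else fall back to the English default.
function pickVoice(lang: string, userChoice?: string): string {
  return userChoice ?? defaultVoices[lang] ?? defaultVoices.en;
}
```

The chosen voice ID is then passed along with the translated text in the TTS step.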

Step 5: Add transcripts for trust

People trust systems more when they can see what they’re hearing. I show:

  • Source transcript (what was said)
  • Target transcript (what was translated)

This doubles as accessibility and debugging during live sessions.
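Showing both sides mostly comes down to keeping source and translated segments keyed by the same ID, so the UI can render them as aligned rows. A minimal sketch (the segment shape is an assumption):

```typescript
interface Segment {
  id: number;
  text: string;
}

// Pair source and translated segments by ID for side-by-side display.
// Segments whose translation hasn't arrived yet show a pending marker.
function pairTranscripts(
  source: Segment[],
  target: Segment[],
): Array<[string, string]> {
  const byId = new Map(target.map((s) => [s.id, s.text] as [number, string]));
  return source.map((s) => [s.text, byId.get(s.id) ?? "…"]);
}

const rows = pairTranscripts(
  [{ id: 1, text: "Welcome!" }, { id: 2, text: "Let's begin." }],
  [{ id: 1, text: "¡Bienvenidos!" }],
);
// rows[1][1] stays "…" until the translation for segment 2 arrives
```

Because translation lags the source by a beat, the pending marker also gives listeners a visible cue that more text is coming.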

The architecture

  • Next.js frontend for the UI
  • WebSocket server for streaming audio
  • Deepgram for STT + TTS
  • Lingo.dev for translations + UI i18n

[Screenshot: rooms view]

What you can copy into your own app

You don’t need to build a full dubbing platform. You can still:

  • Add multilingual UI in one build step
  • Show real‑time translated captions
  • Offer a translated audio track for live events

Wrap‑up

The goal isn’t to impress with AI — it’s to remove friction. When people can understand you instantly, you unlock a much bigger audience.

GitHub: https://github.com/crypticsaiyan/Polydub
