I Built an AI Tool That Dubs Any YouTube Video Into 50+ Languages

Tomáš Dobrý — Sat, 28 Mar 2026 18:04:16 +0000

There are millions of great YouTube videos out there: tutorials, podcasts, documentaries, lectures. But most of them are locked behind a language barrier.

I built TubeVoice to fix that.

What it does

Paste any YouTube URL, pick a target language, and TubeVoice generates a fully dubbed version in minutes. The original background music and ambient sounds stay intact — only the voice changes.
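The post doesn't say how a pasted link is handled. One plausible first step is normalizing the many YouTube URL shapes (watch links, youtu.be short links, Shorts, embeds) to a single video ID. The sketch below uses only the standard library. `extract_video_id` and the set of accepted hosts and paths are my assumptions, not TubeVoice's code.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a pasted YouTube URL, or None if it isn't recognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>?si=...
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            # Regular links: https://www.youtube.com/watch?v=<id>&t=42
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/", "/live/")):
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            candidate = parsed.path.split("/")[2]

    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```

Rejecting anything that isn't a recognized host keeps arbitrary URLs from ever reaching the download step.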

Try it free: tubevoice.io

How it works

Transcription: Whisper transcribes the original audio.
Translation: GPT-4o translates the transcript into your chosen language.
Voice Synthesis: Natural-sounding TTS (Google Chirp3-HD or ElevenLabs, depending on the tier) generates the dubbed voice.
Audio Mixing: Demucs separates the vocals from the background (source separation), then the new voice is mixed with the original background audio.
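The post doesn't show how the mixing step is implemented. As a rough illustration, here is how the dubbed segments could be overlaid on the background stem at their original timestamps. It assumes the Demucs background stem and each synthesized segment are already decoded to mono float samples at the same sample rate. `DubSegment`, `mix_dub` and the default gains are hypothetical. Plain lists keep it dependency-free, although a real pipeline would more likely use NumPy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DubSegment:
    start_s: float        # start time of the original spoken line, in seconds
    samples: List[float]  # synthesized speech as mono floats in [-1.0, 1.0]


def mix_dub(background: List[float], segments: List[DubSegment],
            sample_rate: int, voice_gain: float = 1.0,
            background_gain: float = 0.8) -> List[float]:
    """Overlay each dubbed segment onto the background stem at its original start time."""
    # Start from the (slightly attenuated) music/ambience stem produced by source separation.
    out = [s * background_gain for s in background]
    for seg in segments:
        offset = int(round(seg.start_s * sample_rate))
        end = offset + len(seg.samples)
        if end > len(out):
            # Pad with silence if a dubbed line runs past the end of the background.
            out.extend([0.0] * (end - len(out)))
        for i, s in enumerate(seg.samples):
            out[offset + i] += s * voice_gain
    # Hard-clip so the sum stays in range when converted back to 16-bit PCM.
    return [max(-1.0, min(1.0, s)) for s in out]
```

Hard clipping is the crudest option here. A limiter, or lowering the background while someone is speaking (ducking), would usually sound better.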

The result sounds surprisingly natural: the timing is largely preserved, the background music plays through, and the dubbed voice follows the original pacing.

Tech Stack

Frontend: Next.js + Tailwind CSS (deployed on Vercel)
Backend: Python + Celery workers (Railway)
Transcription: OpenAI Whisper
Translation: GPT-4o
TTS: Google Chirp3-HD (Basic), ElevenLabs (Standard/Premium)
Audio Separation: Demucs (htdemucs model)
Database: Supabase (PostgreSQL)
Payments: Stripe
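The stack lists Demucs with the htdemucs model, and the challenges section below mentions running it locally on Apple Silicon via MPS. Here is a sketch of how a worker might pick a device and build the Demucs command line. The device preference order and both function names are my assumptions, not TubeVoice's code. The flags (`-n`, `--two-stems`, `-d`, `-o`) are standard Demucs CLI options.

```python
import sys
from typing import List


def pick_device() -> str:
    """Prefer Apple Silicon (MPS), then CUDA, then CPU; use CPU if PyTorch isn't installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def demucs_command(audio_path: str, out_dir: str, device: str) -> List[str]:
    """Build a Demucs call that splits a track into vocals and everything else."""
    return [
        sys.executable, "-m", "demucs",
        "-n", "htdemucs",         # the model named in the tech stack
        "--two-stems", "vocals",  # only two outputs: vocals and no_vocals
        "-d", device,
        "-o", out_dir,
        audio_path,
    ]
```

Running it is then `subprocess.run(demucs_command("input.wav", "separated", pick_device()), check=True)`. With `--two-stems vocals`, Demucs writes `vocals.wav` and `no_vocals.wav` under `separated/htdemucs/<track name>/`, and the `no_vocals` stem becomes the background for mixing.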

Three quality tiers

Tier	Voice Engine	Credits/min	Best for
Basic	Google Chirp3-HD	1	Casual listening
Standard	ElevenLabs Flash	3	Good quality dubbing
Premium	ElevenLabs Dubbing API	6	Professional results
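The per-minute rates in the table make a job's cost easy to estimate. Here is a small sketch, assuming credits are charged per started minute of source video. The post doesn't say how partial minutes are rounded, so that rounding rule and the `credits_needed` helper are hypothetical.

```python
import math

# Credits per minute, taken from the tier table above.
CREDITS_PER_MINUTE = {"basic": 1, "standard": 3, "premium": 6}


def credits_needed(duration_s: float, tier: str) -> int:
    """Estimate the credits needed to dub a video of the given length on a tier."""
    # Assumption: a partial minute is billed as a full minute.
    minutes = math.ceil(duration_s / 60)
    return minutes * CREDITS_PER_MINUTE[tier.lower()]
```

Under that assumption, the 3 free credits cover a 3-minute video on Basic or a 1-minute video on Standard. Premium needs 6 credits for even a single minute.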

Challenges I faced

Audio sync: Matching the timing of dubbed speech to the original is hard, because different languages need more or fewer words (and syllables) to say the same thing. The TTS engine handles this reasonably well, but it's not perfect.
Background preservation: Using Demucs for source separation was a game-changer. It cleanly separates vocals from music and ambience, so the dubbed version retains the original feel.
Cost optimization: Running Whisper + GPT-4o + TTS + Demucs adds up. I cut costs by running Demucs locally on Apple Silicon (MPS) instead of on a cloud GPU.
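The post leaves sync to the TTS engine. A common post-processing approach compares each synthesized line's length with the time slot of the original line, then speeds it up just enough to fit. The speed-up is capped so the speech stays intelligible. This is a hypothetical sketch, not necessarily what TubeVoice does. `Slot`, `fit_tempo` and the 1.25× cap are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    start_s: float  # when the original speaker starts this line
    end_s: float    # when the original speaker finishes it


def fit_tempo(tts_duration_s: float, slot: Slot, max_speedup: float = 1.25) -> float:
    """Return a tempo factor (>1.0 means faster) that squeezes a dubbed line into its slot."""
    available = slot.end_s - slot.start_s
    if available <= 0:
        raise ValueError("slot must have a positive duration")
    ratio = tts_duration_s / available
    if ratio <= 1.0:
        # The dubbed line already fits; keep natural speed rather than slowing it down.
        return 1.0
    # Speed up just enough to fit, but never beyond the intelligibility cap.
    return min(ratio, max_speedup)
```

The factor could then be applied with a time-stretching filter such as ffmpeg's `atempo`, which changes speed without shifting pitch. A line that is still too long after hitting the cap would need a shorter translation or would have to spill into the following pause.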

What's next

More voice options and voice cloning
Subtitle generation alongside dubbing
API for developers
Mobile app (PWA)

I'd love to hear your feedback. What features would make this useful for you?

🔗 tubevoice.io — Free tier available (3 free credits to try it out)