DEV Community

Tomáš Dobrý


I Built an AI Tool That Dubs Any YouTube Video Into 50+ Languages

There are millions of great YouTube videos out there. Tutorials, podcasts, documentaries, lectures. But most of them are locked behind a language barrier.

I built TubeVoice to fix that.

What it does

Paste any YouTube URL, pick a target language, and TubeVoice generates a fully dubbed version in minutes. The original background music and ambient sounds stay intact — only the voice changes.
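The post doesn't say how TubeVoice reads the pasted link. The sketch below pulls the video ID out of a pasted URL using only the standard library. It assumes the common URL shapes (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`). `extract_video_id` is a hypothetical helper, not TubeVoice's code.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical helper: TubeVoice's real URL handling is not described in the post.
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")  # YouTube IDs are 11 URL-safe characters


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a pasted YouTube URL, or None if it isn't one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in WATCH_HOSTS:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&t=42s
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # Shorts and embeds: /shorts/<id>, /embed/<id>
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```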

Try it free: tubevoice.io

How it works

  1. Transcription: AI transcribes the original audio (Whisper).
  2. Translation: the transcript is translated into your chosen language (GPT-4o).
  3. Voice Synthesis: natural-sounding TTS generates the dubbed voice.
  4. Audio Mixing: source separation (Demucs) splits the original vocals from the background, then the new voice is mixed with the original background audio.
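The four steps above can be sketched as a small orchestration function. This is a hypothetical outline, not TubeVoice's backend. `Segment`, `DubbedSegment`, `dub`, and the stage signatures are illustrative names. Each stage is passed in as a plain callable, so the real engines (Whisper, GPT-4o, the TTS provider, Demucs) could sit behind it.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the video
    end: float    # seconds
    text: str     # source transcript, later replaced by the translation


@dataclass(frozen=True)
class DubbedSegment:
    start: float
    end: float
    audio: List[float]  # synthesized speech samples (mono, -1.0..1.0)


def dub(
    audio_path: str,
    target_lang: str,
    transcribe: Callable[[str], List[Segment]],         # e.g. a Whisper wrapper
    translate: Callable[[str, str], str],               # e.g. a GPT-4o prompt
    synthesize: Callable[[str, str], List[float]],      # e.g. a TTS client
    separate: Callable[[str], str],                     # returns the background-only stem
    mix: Callable[[List[DubbedSegment], str], object],  # lays new speech over the stem
):
    """Run the four stages in order, carrying start/end times through every step."""
    segments = transcribe(audio_path)
    translated = [replace(s, text=translate(s.text, target_lang)) for s in segments]
    dubbed = [DubbedSegment(s.start, s.end, synthesize(s.text, target_lang)) for s in translated]
    background = separate(audio_path)
    return mix(dubbed, background)
```

Every segment keeps its `start` and `end` times. That lets the final mix put each translated line back where the original was spoken.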

The result sounds surprisingly natural: timing is largely preserved, the background music plays through, and the dubbed voice follows the original pacing.
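One way to preserve timing is to anchor each dubbed clip at its original start time. The sketch below is pure Python and rests on two assumptions: mono float samples at a shared sample rate, and a background stem already produced by the separation step. It adds each clip to the background at the clip's start time, then clips the sum to [-1.0, 1.0]. `mix_onto_background` is hypothetical. A production mixer would more likely use NumPy or ffmpeg and apply gain or ducking instead of hard clipping.

```python
from typing import List, Sequence, Tuple


def mix_onto_background(
    background: Sequence[float],
    clips: Sequence[Tuple[float, Sequence[float]]],
    sample_rate: int,
    voice_gain: float = 1.0,
) -> List[float]:
    """Overlay dubbed clips, given as (start_seconds, samples), onto the background stem."""
    out = list(background)  # output keeps the length of the original audio
    for start_s, samples in clips:
        # Anchor each clip at the sample where the original line started.
        offset = max(0, round(start_s * sample_rate))
        for i, x in enumerate(samples):
            j = offset + i
            if j >= len(out):
                break  # drop speech that would run past the end of the video
            out[j] += voice_gain * x
    # Hard-clip to the valid float range so overlaps can't exceed full scale.
    return [max(-1.0, min(1.0, y)) for y in out]
```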

Tech Stack

  • Frontend: Next.js + Tailwind CSS (deployed on Vercel)
  • Backend: Python + Celery workers (Railway)
  • Transcription: OpenAI Whisper
  • Translation: GPT-4o
  • TTS: Google Chirp3-HD (Basic), ElevenLabs (Standard/Premium)
  • Audio Separation: Demucs (htdemucs model)
  • Database: Supabase (PostgreSQL)
  • Payments: Stripe

Three quality tiers

Tier     | Voice engine           | Credits/min | Best for
Basic    | Google Chirp3-HD       | 1           | Casual listening
Standard | ElevenLabs Flash       | 3           | Good-quality dubbing
Premium  | ElevenLabs Dubbing API | 6           | Professional results
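The credit cost of a video follows from the table. The sketch below assumes partial minutes are rounded up, because the post doesn't state the rounding rule. `credits_for` and `minutes_affordable` are hypothetical helpers.

```python
import math

# Credits per minute of video, per tier (from the table above).
CREDITS_PER_MINUTE = {"basic": 1, "standard": 3, "premium": 6}


def credits_for(duration_seconds: float, tier: str) -> int:
    """Credits needed to dub a video of the given length on the given tier."""
    # Assumption: partial minutes round up; the actual billing rule isn't stated.
    minutes = math.ceil(duration_seconds / 60)
    return minutes * CREDITS_PER_MINUTE[tier.lower()]


def minutes_affordable(credits: int, tier: str) -> int:
    """Whole minutes of video a credit balance covers on the given tier."""
    return credits // CREDITS_PER_MINUTE[tier.lower()]


print(credits_for(10 * 60, "Standard"))  # 30
print(minutes_affordable(3, "basic"))    # 3
```

Under that assumption, a 10-minute video costs 10, 30, or 60 credits on Basic, Standard, or Premium. The 3 free credits cover 3 minutes on Basic or 1 minute on Standard, but not a full minute on Premium.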

Challenges I faced

  • Audio sync: Matching the timing of dubbed speech to the original is hard, because the same sentence can be noticeably longer or shorter in another language. The TTS engine handles this reasonably well, but it's not perfect.
  • Background preservation: Using Demucs for source separation was a game-changer. It cleanly separates vocals from music and ambience, so the dubbed version retains the original feel.
  • Cost optimization: Running Whisper, GPT-4o, TTS, and Demucs adds up. I cut costs by running Demucs locally on Apple Silicon (MPS) instead of on a cloud GPU.
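The post leaves sync to the TTS engine. A common complementary technique is to time-stretch each synthesized clip toward the length of the original segment. The stretch stays within limits so the speech doesn't sound rushed. This is an assumption, not necessarily TubeVoice's method. The sketch only chooses the playback rate; an audio library would do the actual pitch-preserving stretch. `fit_to_slot` and its default bounds are hypothetical: it never slows speech down and speeds it up by at most 25%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    rate: float        # playback-rate multiplier for the dubbed clip (>1.0 = faster)
    overflow_s: float  # seconds that still spill past the original slot after clamping


def fit_to_slot(clip_s: float, slot_s: float,
                min_rate: float = 1.0, max_rate: float = 1.25) -> FitResult:
    """Choose a time-stretch rate so a dubbed clip fits its original segment."""
    if clip_s <= 0 or slot_s <= 0:
        raise ValueError("durations must be positive")
    ideal = clip_s / slot_s  # rate at which the clip would exactly fill the slot
    # Clamp: min_rate=1.0 never slows speech down (short clips are padded with silence);
    # max_rate caps the speed-up so the voice doesn't sound rushed.
    rate = min(max(ideal, min_rate), max_rate)
    stretched_s = clip_s / rate
    return FitResult(rate=rate, overflow_s=max(0.0, stretched_s - slot_s))


print(fit_to_slot(6.0, 4.0))  # rate capped at 1.25; about 0.8 s still overflows
```

A clip that still overflows after clamping could be sent back for a shorter translation instead of being sped up further.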

What's next

  • More voice options and voice cloning
  • Subtitle generation alongside dubbing
  • API for developers
  • Mobile app (PWA)

Would love to hear your feedback. What features would make this useful for you?

🔗 tubevoice.io — Free tier available (3 free credits to try it out)
