<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tomáš Dobrý</title>
    <description>The latest articles on DEV Community by Tomáš Dobrý (@tubevoice).</description>
    <link>https://dev.to/tubevoice</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3848095%2F233e8039-cc0c-4493-9866-13804e1bc480.png</url>
      <title>DEV Community: Tomáš Dobrý</title>
      <link>https://dev.to/tubevoice</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tubevoice"/>
    <language>en</language>
    <item>
      <title>I Built an AI Tool That Dubs Any YouTube Video Into 50+ Languages</title>
      <dc:creator>Tomáš Dobrý</dc:creator>
      <pubDate>Sat, 28 Mar 2026 18:04:16 +0000</pubDate>
      <link>https://dev.to/tubevoice/i-built-an-ai-tool-that-dubs-any-youtube-video-into-50-languages-3bd1</link>
      <guid>https://dev.to/tubevoice/i-built-an-ai-tool-that-dubs-any-youtube-video-into-50-languages-3bd1</guid>
      <description>&lt;p&gt;There are millions of great YouTube videos out there. Tutorials, podcasts, documentaries, lectures. But most of them are locked behind a language barrier.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;TubeVoice&lt;/strong&gt; to fix that.&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;Paste any YouTube URL, pick a target language, and TubeVoice generates a fully dubbed version in minutes. The original background music and ambient sounds stay intact — only the voice changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it free:&lt;/strong&gt; &lt;a href="https://tubevoice.io" rel="noopener noreferrer"&gt;tubevoice.io&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Transcription&lt;/strong&gt; — AI transcribes the original audio (Whisper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation&lt;/strong&gt; — Neural machine translation to your chosen language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Synthesis&lt;/strong&gt; — Natural-sounding TTS generates the dubbed voice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio Mixing&lt;/strong&gt; — AI separates vocals from background using source separation (Demucs), then mixes the new voice with the original background audio&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result sounds surprisingly natural: segment timing is kept, the background music plays through, and the dubbed voice closely follows the original pacing.&lt;/p&gt;
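
&lt;p&gt;As a rough illustration of the final mixing step, here is a minimal Python sketch that sums a dubbed voice track with a background track, sample by sample. It assumes both tracks are already decoded to mono floats between -1.0 and 1.0 at the same sample rate. The function name &lt;code&gt;mix_tracks&lt;/code&gt; and the gain values are made up for this example and are not TubeVoice's actual code.&lt;/p&gt;

```python
from itertools import zip_longest


def mix_tracks(voice, background, voice_gain=1.0, background_gain=0.8):
    """Mix two mono tracks given as lists of floats in [-1.0, 1.0].

    The shorter track is padded with silence so nothing is cut off.
    Gains are illustrative defaults, not tuned values.
    """
    mixed = []
    for v, b in zip_longest(voice, background, fillvalue=0.0):
        sample = v * voice_gain + b * background_gain
        # Hard-clip to the valid range in case both tracks peak together.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


if __name__ == "__main__":
    print(mix_tracks([0.5, 0.5, 0.9], [0.25, -0.25, 0.5, 0.5]))
```

&lt;p&gt;A real mixer would more likely work on NumPy arrays and use a limiter rather than hard clipping, but the idea is the same: scale each track, add them, keep the result in range.&lt;/p&gt;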

&lt;h2&gt;Tech stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js + Tailwind CSS (deployed on Vercel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python + Celery workers (Railway)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transcription:&lt;/strong&gt; OpenAI Whisper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation:&lt;/strong&gt; GPT-4o&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TTS:&lt;/strong&gt; Google Chirp3-HD (Basic), ElevenLabs (Standard/Premium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio Separation:&lt;/strong&gt; Demucs (htdemucs model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; Supabase (PostgreSQL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payments:&lt;/strong&gt; Stripe&lt;/li&gt;
&lt;/ul&gt;
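
&lt;p&gt;For the separation piece of the stack, a minimal Python sketch of building a Demucs command line might look like this. The flags (&lt;code&gt;-n&lt;/code&gt;, &lt;code&gt;--two-stems&lt;/code&gt;, &lt;code&gt;-d&lt;/code&gt;, &lt;code&gt;-o&lt;/code&gt;) are from the Demucs CLI. The helper name, file paths, and default device are assumptions for illustration, not the actual worker code.&lt;/p&gt;

```python
import shlex


def build_demucs_command(audio_path, out_dir="separated", device="cpu"):
    """Return the argument list for a two-stem (vocals vs. rest) Demucs run.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "demucs",
        "-n", "htdemucs",         # model named in the stack above
        "--two-stems", "vocals",  # split into vocals and everything else
        "-d", device,             # e.g. "cpu", "cuda", or "mps"
        "-o", out_dir,
        audio_path,
    ]


if __name__ == "__main__":
    cmd = build_demucs_command("input.wav", device="mps")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

&lt;p&gt;Keeping the command as a list (rather than one string) avoids shell quoting problems when a Celery worker passes it to &lt;code&gt;subprocess.run&lt;/code&gt;.&lt;/p&gt;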

&lt;h2&gt;Three quality tiers&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Voice Engine&lt;/th&gt;
&lt;th&gt;Credits/min&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Google Chirp3-HD&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Casual listening&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;ElevenLabs Flash&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Good quality dubbing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium&lt;/td&gt;
&lt;td&gt;ElevenLabs Dubbing API&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Professional results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
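
&lt;p&gt;To make the pricing concrete, here is a small Python sketch that estimates the credits for a video from the table above. The per-minute rates come from the table. Billing each started minute (rounding up) is an assumption made for this example, and &lt;code&gt;credits_needed&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import math

# Credits per minute, taken from the table above.
CREDITS_PER_MINUTE = {"basic": 1, "standard": 3, "premium": 6}


def credits_needed(tier, duration_seconds):
    """Estimate credits for a video, billing each started minute.

    Rounding up to whole minutes is an assumption made for this example.
    """
    minutes = math.ceil(duration_seconds / 60)
    return CREDITS_PER_MINUTE[tier] * minutes


if __name__ == "__main__":
    # A 2.5-minute video, priced on each tier.
    for tier in CREDITS_PER_MINUTE:
        print(tier, credits_needed(tier, 150))
```

&lt;p&gt;Under that rounding assumption, a 2.5-minute video counts as 3 minutes, so it would cost 3, 9, or 18 credits depending on the tier.&lt;/p&gt;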

&lt;h2&gt;Challenges I faced&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audio sync&lt;/strong&gt; — Matching dubbed speech timing to the original is hard. Different languages have different word lengths. The TTS engine handles this reasonably well, but it's not perfect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background preservation&lt;/strong&gt; — Using Demucs for source separation was a game-changer. It cleanly separates vocals from music/ambience, so the dubbed version retains the original feel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization&lt;/strong&gt; — Running Whisper + GPT-4o + TTS + Demucs adds up. I optimized by running Demucs locally on Apple Silicon (MPS) instead of cloud GPU.&lt;/li&gt;
&lt;/ul&gt;
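
&lt;p&gt;One common way to approach the sync problem is to compute a playback speed factor per segment, so that a dubbed clip fits the time slot of the original speech. The Python sketch below shows that idea only. It is not necessarily how TubeVoice or its TTS engines handle timing, and the name &lt;code&gt;fit_speed&lt;/code&gt; and the 1.25 cap are illustrative guesses.&lt;/p&gt;

```python
def fit_speed(dubbed_seconds, slot_seconds, max_speedup=1.25):
    """Return a playback speed factor so dubbed speech fits its slot.

    Assumes both durations are positive. 1.0 means unchanged; values
    above 1.0 mean the clip has to play faster. The factor is capped so
    speech stays intelligible; the cap is an illustrative guess.
    """
    factor = dubbed_seconds / slot_seconds
    # Never slow speech down; a shorter clip simply leaves a pause.
    return min(max(factor, 1.0), max_speedup)


if __name__ == "__main__":
    print(fit_speed(3.6, 3.0))  # slightly too long: speed up a little
    print(fit_speed(2.0, 3.0))  # shorter than the slot: leave as is
    print(fit_speed(6.0, 3.0))  # far too long: capped at max_speedup
```

&lt;p&gt;The resulting factor could then be passed to a tempo-changing tool such as ffmpeg's &lt;code&gt;atempo&lt;/code&gt; filter. When the cap is hit, the clip still overruns its slot, which is one reason sync is never perfect.&lt;/p&gt;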

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More voice options and voice cloning&lt;/li&gt;
&lt;li&gt;Subtitle generation alongside dubbing&lt;/li&gt;
&lt;li&gt;API for developers&lt;/li&gt;
&lt;li&gt;Mobile app (PWA)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would love to hear your feedback. What features would make this useful for you?&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://tubevoice.io" rel="noopener noreferrer"&gt;tubevoice.io&lt;/a&gt;&lt;/strong&gt; — Free tier available (3 free credits to try it out)&lt;/p&gt;

</description>
      <category>showdev</category>
    </item>
  </channel>
</rss>
