DEV Community

Shawn


VoxTube – Convert YouTube videos to audio with local TTS

I built a tool to convert YouTube videos into podcasts


Problem: I kept queuing YouTube tutorials and talks but never watching them. Video demands attention in a way that audio doesn't.

Solution: VoxTube extracts the transcript from a YouTube video and converts it to audio using high-quality text-to-speech (TTS).

Now I "watch" YouTube during my commute, while cooking, and during workouts.

Technical details:

  • Built with Bun + Hono (~300 lines)
  • Uses Kokoro TTS (runs locally via Docker)
  • Caches generated audio
  • No cloud dependencies
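
To make the caching bullet concrete, here is a minimal sketch of a get-or-generate audio cache. It is not the actual VoxTube code: the cache directory, the `cacheKey` and `getOrCreateAudio` helpers, and the `Synthesize` callback (standing in for the call to the local Kokoro container) are all hypothetical names chosen for illustration. It uses only `node:` built-ins, which Bun also supports.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical cache location; the real project may store audio elsewhere.
const CACHE_DIR = join(tmpdir(), "voxtube-cache-sketch");

// Derive a stable file name from everything that affects the audio output.
function cacheKey(videoId: string, voice: string): string {
  return createHash("sha256")
    .update(`${videoId}:${voice}`)
    .digest("hex")
    .slice(0, 16);
}

// Hypothetical signature for whatever turns text into audio bytes
// (for example, a request to a locally running TTS container).
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>;

async function getOrCreateAudio(
  videoId: string,
  voice: string,
  text: string,
  synthesize: Synthesize,
): Promise<Uint8Array> {
  mkdirSync(CACHE_DIR, { recursive: true });
  const path = join(CACHE_DIR, `${cacheKey(videoId, voice)}.mp3`);

  // Cache hit: return the stored bytes without calling the TTS engine.
  if (existsSync(path)) return readFileSync(path);

  // Cache miss: synthesize once, store the result, then return it.
  const audio = await synthesize(text, voice);
  writeFileSync(path, audio);
  return audio;
}
```

Keying on both the video ID and the voice means switching voices produces a fresh file instead of replaying stale audio, while repeat requests for the same pair skip synthesis entirely.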

What I learned:

  • Bun's file APIs are really nice for streaming audio
  • Modern TTS (Kokoro) sounds surprisingly natural
  • Most YouTube videos have transcripts available
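
Transcripts usually arrive as short timed segments rather than clean prose, and auto-generated captions often include cues like `[Music]`. Below is a rough sketch of flattening segments into text suitable for a TTS engine. The `TranscriptSegment` shape and the `transcriptToSpeechText` helper are assumptions for illustration, not VoxTube's actual types.

```typescript
// Hypothetical shape for one caption segment; real transcript sources differ.
interface TranscriptSegment {
  text: string;
  start: number; // seconds from the start of the video
  duration: number; // seconds
}

// Join segments into one string, dropping bracketed cues and extra whitespace.
function transcriptToSpeechText(segments: TranscriptSegment[]): string {
  return segments
    .map((segment) => segment.text.replace(/\[[^\]]*\]/g, " ")) // remove cues like [Music]
    .join(" ")
    .replace(/\s+/g, " ") // collapse newlines and repeated spaces
    .trim();
}

const sample: TranscriptSegment[] = [
  { text: "hello\nworld", start: 0, duration: 2 },
  { text: "[Music]", start: 2, duration: 3 },
  { text: " this is  a test ", start: 5, duration: 2 },
];

console.log(transcriptToSpeechText(sample)); // prints "hello world this is a test"
```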

Stats:

  • 2 weeks to MVP
  • ~300 lines of code
  • $0 monthly cost (runs locally)

GitHub: https://github.com/shawn-dsz/voxtube

Happy to answer questions about the build!
