I built a tool to convert YouTube videos into podcasts
Problem: I kept queuing YouTube tutorials and talks but never watching them. Video demands attention in a way that audio doesn't.
Solution: VoxTube extracts transcripts from YouTube videos and converts them to audio using high-quality TTS.
Now I "watch" YouTube during my commute, while cooking, and during workouts.
Technical details:
- Built with Bun + Hono (~300 lines)
- Uses Kokoro TTS (runs locally via Docker)
- Caches generated audio
- No cloud dependencies
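To illustrate the caching point, here is a minimal sketch of a disk cache keyed by video and voice. It is not taken from the VoxTube source: `cacheKey`, `getOrCreateAudio`, the `.mp3` extension, and the `synthesize` callback (standing in for the call to the local TTS server) are all hypothetical names chosen for the example. It uses only `node:` built-ins, which Bun also supports.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical cache key: one entry per (video, voice) pair.
function cacheKey(videoId: string, voice: string): string {
  return createHash("sha256")
    .update(`${videoId}:${voice}`)
    .digest("hex")
    .slice(0, 16);
}

// Return cached audio if it exists; otherwise call `synthesize` once,
// write the result to disk, and return it.
async function getOrCreateAudio(
  cacheDir: string,
  videoId: string,
  voice: string,
  synthesize: () => Promise<Uint8Array>, // assumption: wraps the TTS request
): Promise<Uint8Array> {
  mkdirSync(cacheDir, { recursive: true });
  const path = join(cacheDir, `${cacheKey(videoId, voice)}.mp3`);
  if (existsSync(path)) {
    return new Uint8Array(readFileSync(path));
  }
  const audio = await synthesize();
  writeFileSync(path, audio);
  return audio;
}
```

With something like this, a second request for the same video and voice skips synthesis entirely, which matters because TTS over a long transcript is the slow step.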
What I learned:
- Bun's file APIs are really nice for streaming audio
- Modern TTS (Kokoro) sounds surprisingly natural
- Most YouTube videos have transcripts available
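Transcripts usually arrive as short timed caption fragments, so they need some cleanup before going to TTS. The following is a sketch under assumptions, not VoxTube's actual code: the `TranscriptSegment` shape, both function names, and the 400-character default are made up for illustration.

```typescript
// Hypothetical shape of one caption entry; real transcript sources may differ.
interface TranscriptSegment {
  text: string;
  start: number; // offset in seconds
}

// Join caption fragments into plain text, dropping bracketed cues like "[Music]".
function transcriptToText(segments: TranscriptSegment[]): string {
  return segments
    .map((s) => s.text.replace(/\[[^\]]*\]/g, " "))
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Group sentences into chunks that aim to stay under `maxChars`.
// A single sentence longer than `maxChars` is kept whole rather than cut mid-word.
function chunkForTts(text: string, maxChars = 400): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

One caveat: auto-generated captions often have no punctuation, in which case this sentence-based split returns a single large chunk and a word-count fallback would be needed.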
Stats:
- 2 weeks to MVP
- ~300 lines of code
- $0 monthly cost (runs locally)
GitHub: https://github.com/shawn-dsz/voxtube
Happy to answer questions about the build!