Building an AI Dubbing Service: My Development Journey
The Idea
"What if users could upload audio/video files and have AI automatically dub them into different languages?"
This simple concept evolved into a fully functional service deployed in just one day.
What I Built
AI Dubbing Service — A web app that converts audio/video files into multiple languages automatically.
Pipeline:
- Upload audio/video file
- Extract speech (STT) using ElevenLabs Scribe v1
- Translate text using Google Cloud Translation API
- Generate dubbed audio (TTS) using ElevenLabs Multilingual v2
- For videos: Merge original video + dubbed audio using ffmpeg.wasm
- Download the result
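The steps above form a straight pipeline. Here is a minimal TypeScript sketch of the orchestration, with the three API calls injected as functions so the flow is testable without live keys (the `Stage` type and function names are illustrative, not the actual repo's code):

```typescript
// Each external call is injected so the pipeline stays testable without API keys.
type Stage = {
  transcribe: (audio: Blob) => Promise<string>;                  // ElevenLabs Scribe v1 (STT)
  translate: (text: string, target: string) => Promise<string>;  // Google Cloud Translation
  synthesize: (text: string) => Promise<Blob>;                   // ElevenLabs Multilingual v2 (TTS)
};

// Runs STT → translation → TTS and returns the dubbed audio.
async function dub(audio: Blob, targetLang: string, s: Stage): Promise<Blob> {
  const text = await s.transcribe(audio);
  const translated = await s.translate(text, targetLang);
  return s.synthesize(translated);
}
```

For video inputs, the ffmpeg.wasm merge step runs after this pipeline, in the browser.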
Supported Languages: Korean, English, Japanese, Spanish
File Support: MP3, WAV, MP4, WebM (up to 100MB)
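The format list and the 100MB cap translate into a small validation step before anything reaches the APIs. A sketch (the helper name and messages are mine, not from the repo):

```typescript
const ALLOWED_EXTENSIONS = ["mp3", "wav", "mp4", "webm"];
const MAX_BYTES = 100 * 1024 * 1024; // 100MB cap

// Returns null when the file is acceptable, or a human-readable reason when not.
function rejectUpload(name: string, sizeBytes: number): string | null {
  const ext = name.split(".").pop()?.toLowerCase() ?? "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) return `Unsupported format: .${ext}`;
  if (sizeBytes > MAX_BYTES) return "File too large (max 100MB)";
  return null;
}
```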
The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 15 (App Router) |
| Language | TypeScript |
| Styling | Tailwind CSS |
| Auth | NextAuth.js + Google OAuth |
| Database | Turso (libSQL) |
| File Storage | Vercel Blob |
| Video Processing | ffmpeg.wasm |
| Deployment | Vercel |
The Development Process
Why One Day Was Possible
The traditional approach would take 2-3 weeks:
- Week 1: Design architecture, set up services
- Week 2: Implement each API integration separately
- Week 3: Build UI, test, debug
What changed: I used Claude Code (AI coding agent) to compress this timeline.
Hour-by-Hour Breakdown
Hour 1: Architecture Design
- Described all requirements
- Claude Code designed the entire Next.js structure
- File routes, API endpoints, component hierarchy — all at once
Hours 2-3: API Integration
- ElevenLabs STT setup
- Google Translate API connection
- ElevenLabs TTS implementation
- All with consistent error handling patterns
Hours 4-5: UI Implementation
- Dark glassmorphism design
- Real-time soundwave animation
- Progress indicators (STT → Translation → TTS)
- Drag-and-drop file upload
- Side-by-side video comparison player
Hours 6-7: Video Processing
- ffmpeg.wasm integration
- Video + audio merging in the browser
- Auto-sync adjustment using playbackRate
- MP4 download functionality
Hour 8: Auth & Deployment
- Google OAuth configuration
- Whitelist system with Turso
- Vercel deployment
- Custom domain setup
Key Challenges & Solutions
Challenge 1: Sync Audio to Video
Problem: Dubbed audio might be slightly longer/shorter than original video.
Solution: Calculate a playbackRate ratio

```typescript
// Playing at rate r makes audio last (duration / r) seconds,
// so to fit the video we need r = audioDuration / videoDuration.
const playbackRate = dubbedAudioDuration / originalVideoDuration;
audioElement.playbackRate = playbackRate;
```

This automatically speeds up or slows down the dubbed audio so it finishes in step with the video.
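One caveat: since `playbackRate > 1` means faster playback, the ratio that makes the audio fit is audio duration over video duration, and extreme values make speech hard to follow. In practice it is worth clamping the ratio; the bounds below are my suggestion, not necessarily what the app uses:

```typescript
// Keep the rate within a range where dubbed speech stays intelligible.
function syncRate(audioSeconds: number, videoSeconds: number, min = 0.75, max = 1.5): number {
  if (videoSeconds <= 0) return 1; // guard against division by zero
  const rate = audioSeconds / videoSeconds;
  return Math.min(max, Math.max(min, rate));
}
```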
Challenge 2: Browser-Side Video Processing
Problem: Video merging usually requires a server with FFmpeg installed.
Solution: Use ffmpeg.wasm to run FFmpeg directly in the browser
- No server needed
- No file uploads to slow servers
- Instant local processing
- Privacy-friendlier merging (the merge step itself never sends the video anywhere)
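With ffmpeg.wasm, the merge comes down to running an ordinary FFmpeg command against its in-browser virtual filesystem, and the argument list is the part that is easy to get wrong. Here is a pure helper that builds it (the helper name is illustrative; the repo may structure this differently):

```typescript
// Builds FFmpeg arguments that keep the video stream untouched
// and replace the audio track with the dubbed one.
function buildMergeArgs(videoFile: string, audioFile: string, outFile: string): string[] {
  return [
    "-i", videoFile,   // input 0: original video
    "-i", audioFile,   // input 1: dubbed audio
    "-map", "0:v:0",   // take the video stream from input 0
    "-map", "1:a:0",   // take the audio stream from input 1
    "-c:v", "copy",    // no re-encode: fast and lossless for video
    "-shortest",       // stop at the shorter of the two streams
    outFile,
  ];
}
```

In the browser, something like `ffmpeg.exec(buildMergeArgs("in.mp4", "dub.mp3", "out.mp4"))` would run after writing both files into ffmpeg.wasm's virtual filesystem.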
Challenge 3: Consistent API Integration
Problem: Multiple external APIs with different error patterns.
Solution: Create unified error handling
- Centralized error types
- Retry logic with exponential backoff
- Consistent response formats
- Better debugging
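The retry logic is the core of that unified layer, and it can be expressed as one small generic wrapper. A sketch of the pattern (attempt counts and delays are illustrative):

```typescript
// Retries an async operation with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Double the wait on each failure to back off from a struggling API.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```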
Challenge 4: Claude Code Communication
Problem: Generic prompts led to mediocre results.
Solution: Be extremely specific
- "Make a file upload" → "Create a /api/upload route that accepts FormData with MP4 files, validates size <100MB, uploads to Vercel Blob, returns fileUrl"
- "Add error handling" → "Catch these specific errors: 'Invalid API key', 'File too large', 'Unsupported format' — return custom messages for each"
What Claude Code Did Right
✅ Architectural coherence — All components fit together perfectly without refactoring
✅ Convention-following — Respected Next.js 15 App Router best practices automatically
✅ Error handling — Implemented defensive code without explicit instruction
✅ Type safety — Generated proper TypeScript interfaces from the start
✅ Debugging speed — When I copy-pasted error messages, Claude Code fixed them instantly
The Real Lesson
The bottleneck wasn't coding. It was communication.
What Didn't Work
"Make it faster"
"Better UI design"
"Handle errors properly"
What Worked
"Create a /api/dub route that:
- Accepts FormData with audio file
- Sends to ElevenLabs Scribe v1 with API key in headers
- Returns JSON: { text, duration, language }
- On error, catch these specific codes: 401, 413, 422
- Return status-appropriate HTTP responses"
Specificity = Speed.
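That 401/413/422 clause in the prompt maps naturally onto a small lookup. A sketch of what a spec like that might produce (the messages and names are illustrative, not the repo's actual code):

```typescript
// Maps upstream API status codes to user-facing messages for the /api/dub route.
const DUB_ERRORS: Record<number, string> = {
  401: "Invalid API key",
  413: "File too large",
  422: "Unsupported format",
};

// Produces a status-appropriate response body, with a fallback for unmapped codes.
function dubErrorResponse(status: number): { status: number; error: string } {
  return { status, error: DUB_ERRORS[status] ?? "Unexpected upstream error" };
}
```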
Project Metrics
| Metric | Value |
|---|---|
| Development Time | 8 hours |
| Lines of Code | ~2000 |
| API Integrations | 3 (ElevenLabs, Google, NextAuth) |
| Supported Languages | 4 (Korean, English, Japanese, Spanish) |
| Max File Size | 100MB |
| Deployment Platform | Vercel |
| Time to First Deployment | 8 hours |
What's Next: Lip-Sync
The current version syncs audio length to video. But true cinematic dubbing requires lip-sync — matching mouth movements to audio.
The Challenge:
- Extract phonemes from audio
- Detect facial landmarks in video
- Regenerate mouth shapes to match phonemes
- Merge back into original video
Current Options:
- Wav2Lip (Open-source, free, but slow)
- HeyGen API (Fast, proprietary, paid)
- Sync Labs (REST API, pricing TBD)
This is the next mountain to climb.
Key Takeaways
1. AI agents excel at iteration, not inspiration
- Claude Code couldn't dream up the idea
- But it implemented the dream faster than humanly possible
2. Specificity beats vagueness
- Vague prompts = vague results
- Detailed specs = precise implementation
3. Error-driven development works
- When something breaks, copy-paste the error
- Claude Code diagnoses and fixes instantly
4. One feature at a time
- Requesting 5 features simultaneously causes conflicts
- Sequential requests = cleaner code
5. Always review security code
- AI-generated auth/security code should always be reviewed
- Never deploy untested auth systems
Try It
Live Demo: https://ai-voice-bot-rose.vercel.app
GitHub: https://github.com/hyosikkk/ai-voice-bot
Tech Stack:
- Next.js 15
- ElevenLabs APIs
- Google Cloud Translation
- Turso Database
- Vercel Deployment
The service works. The code is open-source. The journey was unforgettable.
From zero to deployed in one day. With AI as my co-developer.
That's the future of software development.