Building an AI Dubbing Service: My Development Journey
The Idea
"What if users could upload audio/video files and have AI automatically dub them into different languages?"
This simple concept evolved into a fully functional service deployed in just one day.
What I Built
AI Dubbing Service — A web app that converts audio/video files into multiple languages automatically.
Pipeline:
- Upload audio/video file
- Extract speech (STT) using ElevenLabs Scribe v1
- Translate text using Google Cloud Translation API
- Generate dubbed audio (TTS) using ElevenLabs Multilingual v2
- For videos: Merge original video + dubbed audio using ffmpeg.wasm
- Download the result
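The steps above form a straight pipeline. Here is a minimal TypeScript sketch of the orchestration, with the three API calls injected as functions so the flow is testable without live keys (the `Stage` type and function names are illustrative, not the actual repo's code):

```typescript
// Each external call is injected so the pipeline stays testable without API keys.
type Stage = {
  transcribe: (audio: Blob) => Promise<string>;                  // ElevenLabs Scribe v1 (STT)
  translate: (text: string, target: string) => Promise<string>;  // Google Cloud Translation
  synthesize: (text: string) => Promise<Blob>;                   // ElevenLabs Multilingual v2 (TTS)
};

// Runs STT → translation → TTS and returns the dubbed audio.
async function dub(audio: Blob, targetLang: string, s: Stage): Promise<Blob> {
  const text = await s.transcribe(audio);
  const translated = await s.translate(text, targetLang);
  return s.synthesize(translated);
}
```

For video inputs, the ffmpeg.wasm merge step runs after this pipeline, in the browser.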
Supported Languages: Korean, English, Japanese, Spanish
File Support: MP3, WAV, MP4, WebM (up to 100MB)
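The format list and the 100MB cap translate into a small validation step before anything reaches the APIs. A sketch (the helper name and messages are mine, not from the repo):

```typescript
const ALLOWED_EXTENSIONS = ["mp3", "wav", "mp4", "webm"];
const MAX_BYTES = 100 * 1024 * 1024; // 100MB cap

// Returns null when the file is acceptable, or a human-readable reason when not.
function rejectUpload(name: string, sizeBytes: number): string | null {
  const ext = name.split(".").pop()?.toLowerCase() ?? "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) return `Unsupported format: .${ext}`;
  if (sizeBytes > MAX_BYTES) return "File too large (max 100MB)";
  return null;
}
```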
The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 15 (App Router) |
| Language | TypeScript |
| Styling | Tailwind CSS |
| Auth | NextAuth.js + Google OAuth |
| Database | Turso (libSQL) |
| File Storage | Vercel Blob |
| Video Processing | ffmpeg.wasm |
| Deployment | Vercel |
The Development Process
Why One Day Was Possible
The traditional approach would take 2-3 weeks:
- Week 1: Design architecture, set up services
- Week 2: Implement each API integration separately
- Week 3: Build UI, test, debug
What changed: I used Claude Code (AI coding agent) to compress this timeline.
Hour-by-Hour Breakdown
Hour 1: Architecture Design
- Described all requirements
- Claude Code designed the entire Next.js structure
- File routes, API endpoints, component hierarchy — all at once
Hours 2-3: API Integration
- ElevenLabs STT setup
- Google Translate API connection
- ElevenLabs TTS implementation
- All with consistent error handling patterns
Hours 4-5: UI Implementation
- Dark glassmorphism design
- Real-time soundwave animation
- Progress indicators (STT → Translation → TTS)
- Drag-and-drop file upload
- Side-by-side video comparison player
Hours 6-7: Video Processing
- ffmpeg.wasm integration
- Video + audio merging in the browser
- Auto-sync adjustment using playbackRate
- MP4 download functionality
Hour 8: Auth & Deployment
- Google OAuth configuration
- Whitelist system with Turso
- Vercel deployment
- Custom domain setup
Key Challenges & Solutions
Challenge 1: Sync Audio to Video
Problem: Dubbed audio might be slightly longer/shorter than original video.
Solution: Calculate a playbackRate ratio

```typescript
// Playing at rate r makes audio last (duration / r) seconds,
// so to fit the video we need r = audioDuration / videoDuration.
const playbackRate = dubbedAudioDuration / originalVideoDuration;
audioElement.playbackRate = playbackRate;
```

This automatically speeds up or slows down the dubbed audio so it finishes in step with the video.
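One caveat: since `playbackRate > 1` means faster playback, the ratio that makes the audio fit is audio duration over video duration, and extreme values make speech hard to follow. In practice it is worth clamping the ratio; the bounds below are my suggestion, not necessarily what the app uses:

```typescript
// Keep the rate within a range where dubbed speech stays intelligible.
function syncRate(audioSeconds: number, videoSeconds: number, min = 0.75, max = 1.5): number {
  if (videoSeconds <= 0) return 1; // guard against division by zero
  const rate = audioSeconds / videoSeconds;
  return Math.min(max, Math.max(min, rate));
}
```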
Challenge 2: Browser-Side Video Processing
Problem: Video merging usually requires a server with FFmpeg installed.
Solution: Use ffmpeg.wasm to run FFmpeg directly in the browser
- No server needed
- No file uploads to slow servers
- Instant local processing
- Privacy-friendlier merging (the merge step itself never sends the video anywhere)
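With ffmpeg.wasm, the merge comes down to running an ordinary FFmpeg command against its in-browser virtual filesystem, and the argument list is the part that is easy to get wrong. Here is a pure helper that builds it (the helper name is illustrative; the repo may structure this differently):

```typescript
// Builds FFmpeg arguments that keep the video stream untouched
// and replace the audio track with the dubbed one.
function buildMergeArgs(videoFile: string, audioFile: string, outFile: string): string[] {
  return [
    "-i", videoFile,   // input 0: original video
    "-i", audioFile,   // input 1: dubbed audio
    "-map", "0:v:0",   // take the video stream from input 0
    "-map", "1:a:0",   // take the audio stream from input 1
    "-c:v", "copy",    // no re-encode: fast and lossless for video
    "-shortest",       // stop at the shorter of the two streams
    outFile,
  ];
}
```

In the browser, something like `ffmpeg.exec(buildMergeArgs("in.mp4", "dub.mp3", "out.mp4"))` would run after writing both files into ffmpeg.wasm's virtual filesystem.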
Challenge 3: Consistent API Integration
Problem: Multiple external APIs with different error patterns.
Solution: Create unified error handling
- Centralized error types
- Retry logic with exponential backoff
- Consistent response formats
- Better debugging
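The retry logic is the core of that unified layer, and it can be expressed as one small generic wrapper. A sketch of the pattern (attempt counts and delays are illustrative):

```typescript
// Retries an async operation with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Double the wait on each failure to back off from a struggling API.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```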
Challenge 4: Claude Code Communication
Problem: Generic prompts led to mediocre results.
Solution: Be extremely specific
- "Make a file upload" → "Create a /api/upload route that accepts FormData with MP4 files, validates size <100MB, uploads to Vercel Blob, returns fileUrl"
- "Add error handling" → "Catch these specific errors: 'Invalid API key', 'File too large', 'Unsupported format' — return custom messages for each"
What Claude Code Did Right
✅ Architectural coherence — All components fit together perfectly without refactoring
✅ Convention-following — Respected Next.js 15 App Router best practices automatically
✅ Error handling — Implemented defensive code without explicit instruction
✅ Type safety — Generated proper TypeScript interfaces from the start
✅ Debugging speed — When I copy-pasted error messages, Claude Code fixed them instantly
The Real Lesson
The bottleneck wasn't coding. It was communication.
What Didn't Work
"Make it faster"
"Better UI design"
"Handle errors properly"
What Worked
"Create a /api/dub route that:
- Accepts FormData with audio file
- Sends to ElevenLabs Scribe v1 with API key in headers
- Returns JSON: { text, duration, language }
- On error, catch these specific codes: 401, 413, 422
- Return status-appropriate HTTP responses"
Specificity = Speed.
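That 401/413/422 clause in the prompt maps naturally onto a small lookup. A sketch of what a spec like that might produce (the messages and names are illustrative, not the repo's actual code):

```typescript
// Maps upstream API status codes to user-facing messages for the /api/dub route.
const DUB_ERRORS: Record<number, string> = {
  401: "Invalid API key",
  413: "File too large",
  422: "Unsupported format",
};

// Produces a status-appropriate response body, with a fallback for unmapped codes.
function dubErrorResponse(status: number): { status: number; error: string } {
  return { status, error: DUB_ERRORS[status] ?? "Unexpected upstream error" };
}
```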
Project Metrics
| Metric | Value |
|---|---|
| Development Time | 8 hours |
| Lines of Code | ~2000 |
| API Integrations | 3 (ElevenLabs, Google, NextAuth) |
| Supported Languages | 4 (Korean, English, Japanese, Spanish) |
| Max File Size | 100MB |
| Deployment Platform | Vercel |
| Time to First Deployment | 8 hours |
What's Next: Lip-Sync
The current version syncs audio length to video. But true cinematic dubbing requires lip-sync — matching mouth movements to audio.
The Challenge:
- Extract phonemes from audio
- Detect facial landmarks in video
- Regenerate mouth shapes to match phonemes
- Merge back into original video
Current Options:
- Wav2Lip (Open-source, free, but slow)
- HeyGen API (Fast, proprietary, paid)
- Sync Labs (REST API, pricing TBD)
This is the next mountain to climb.
Key Takeaways
1. AI agents excel at iteration, not inspiration
- Claude Code couldn't dream up the idea
- But it implemented the dream faster than humanly possible
2. Specificity beats vagueness
- Vague prompts = vague results
- Detailed specs = precise implementation
3. Error-driven development works
- When something breaks, copy-paste the error
- Claude Code diagnoses and fixes instantly
4. One feature at a time
- Requesting 5 features simultaneously causes conflicts
- Sequential requests = cleaner code
5. Always review security code
- AI-generated auth/security code should always be reviewed
- Never deploy untested auth systems
Try It
Live Demo: https://ai-voice-bot-rose.vercel.app
GitHub: https://github.com/hyosikkk/ai-voice-bot
Tech Stack:
- Next.js 15
- ElevenLabs APIs
- Google Cloud Translation
- Turso Database
- Vercel Deployment
The service works. The code is open-source. The journey was unforgettable.
From zero to deployed in one day. With AI as my co-developer.
That's the future of software development.