최효식
Development of a dubbing service using Claude

Building an AI Dubbing Service: My Development Journey

The Idea

"What if users could upload audio/video files and have AI automatically dub them into different languages?"

This simple concept evolved into a fully functional service deployed in just one day.


What I Built

AI Dubbing Service — A web app that converts audio/video files into multiple languages automatically.

Pipeline:

  1. Upload audio/video file
  2. Extract speech (STT) using ElevenLabs Scribe v1
  3. Translate text using Google Cloud Translation API
  4. Generate dubbed audio (TTS) using ElevenLabs Multilingual v2
  5. For videos: Merge original video + dubbed audio using ffmpeg.wasm
  6. Download the result

Supported Languages: Korean, English, Japanese, Spanish

File Support: MP3, WAV, MP4, WebM (up to 100MB)
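The file constraints above can be sketched as a small validation helper. This is a hypothetical illustration, not the app's actual code; the `validateUpload` name and the MIME-type list are my assumptions:

```typescript
// Hypothetical upload gate: enforce the format and size limits described above.
const MAX_BYTES = 100 * 1024 * 1024; // 100MB cap
const ALLOWED = new Set(['audio/mpeg', 'audio/wav', 'video/mp4', 'video/webm']);

type UploadCheck = { ok: true } | { ok: false; reason: string };

function validateUpload(mimeType: string, sizeBytes: number): UploadCheck {
  if (!ALLOWED.has(mimeType)) {
    return { ok: false, reason: `Unsupported format: ${mimeType}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: 'File too large (max 100MB)' };
  }
  return { ok: true };
}
```

Running this check both client-side (for fast feedback) and server-side (for safety) is the usual pattern.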


The Tech Stack

Layer             Technology
----------------  --------------------------------
Frontend          Next.js 15 (App Router)
Language          TypeScript
Styling           Tailwind CSS
Auth              NextAuth.js + Google OAuth
Database          Turso (libSQL)
File Storage      Vercel Blob
Video Processing  ffmpeg.wasm
Deployment        Vercel
The Development Process

Why One Day Was Possible

The traditional approach would take 2-3 weeks:

  • Week 1: Design architecture, set up services
  • Week 2: Implement each API integration separately
  • Week 3: Build UI, test, debug

What changed: I used Claude Code (AI coding agent) to compress this timeline.

Hour-by-Hour Breakdown

Hour 1: Architecture Design

  • Described all requirements
  • Claude Code designed the entire Next.js structure
  • File routes, API endpoints, component hierarchy — all at once

Hours 2-3: API Integration

  • ElevenLabs STT setup
  • Google Translate API connection
  • ElevenLabs TTS implementation
  • All with consistent error handling patterns

Hours 4-5: UI Implementation

  • Dark glassmorphism design
  • Real-time soundwave animation
  • Progress indicators (STT → Translation → TTS)
  • Drag-and-drop file upload
  • Side-by-side video comparison player

Hours 6-7: Video Processing

  • ffmpeg.wasm integration
  • Video + audio merging in the browser
  • Auto-sync adjustment using playbackRate
  • MP4 download functionality

Hours 7-8: Auth & Deployment

  • Google OAuth configuration
  • Whitelist system with Turso
  • Vercel deployment
  • Custom domain setup
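The whitelist gate might look roughly like this. It's a hypothetical sketch: the table name and the injected query function are my assumptions, and with Turso the query itself would go through `@libsql/client`'s `client.execute`, shown in the comment:

```typescript
// Hypothetical whitelist check for a NextAuth signIn callback.
// With @libsql/client, the real query would look roughly like:
//   const rs = await client.execute({
//     sql: 'SELECT 1 FROM whitelist WHERE email = ?', // table name assumed
//     args: [email],
//   });
//   return rs.rows.length;
// Injecting the query keeps the gate logic testable without a database.
type CountByEmail = (email: string) => Promise<number>;

async function isWhitelisted(
  email: string | null | undefined,
  countByEmail: CountByEmail
): Promise<boolean> {
  if (!email) return false; // no email claim in the token: reject sign-in
  return (await countByEmail(email.toLowerCase())) > 0;
}
```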

Key Challenges & Solutions

Challenge 1: Sync Audio to Video

Problem: The dubbed audio may be slightly longer or shorter than the original video.

Solution: Calculate playbackRate ratio

```typescript
// Rate > 1 speeds playback up; elapsed time = duration / rate,
// so for the dub to finish in videoDuration seconds, rate = audio / video.
const playbackRate = dubbedAudioDuration / originalVideoDuration;
audioElement.playbackRate = playbackRate;
```

This automatically adjusts the dubbed audio's speed so it ends together with the video.
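One caveat with a raw ratio: if the dub's length differs a lot from the video's, the adjusted speech sounds unnatural. A clamped version of the calculation could look like this (the helper name and the ±bounds are my own assumptions, not the original code):

```typescript
// Hypothetical refinement: the rate that makes the dub end with the video,
// clamped so speech stays intelligible (bounds assumed, roughly ±25%).
function syncPlaybackRate(audioSec: number, videoSec: number): number {
  if (videoSec <= 0 || audioSec <= 0) return 1; // degenerate input: leave rate alone
  const ratio = audioSec / videoSec; // elapsed = duration / rate, so rate = audio / video
  return Math.min(1.25, Math.max(0.8, ratio));
}
```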

Challenge 2: Browser-Side Video Processing

Problem: Video merging usually requires a server with FFmpeg installed.

Solution: Use ffmpeg.wasm to run FFmpeg directly in the browser

  • No server needed
  • No file uploads to slow servers
  • Instant local processing
  • Privacy-friendly (files never leave user's device)
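To show the shape of the merge, here is a rough sketch. The ffmpeg.wasm calls are given as comments (assuming the `@ffmpeg/ffmpeg` v0.11 `createFFmpeg`/`fetchFile` API); the argument list itself is a plain function, and the flag choices are my assumptions about a reasonable merge, not the app's actual command:

```typescript
// Sketch of an in-browser video + dubbed-audio merge.
// The surrounding ffmpeg.wasm calls (assumed v0.11 API) would look roughly like:
//   const ffmpeg = createFFmpeg({ log: true });
//   await ffmpeg.load();
//   ffmpeg.FS('writeFile', 'in.mp4', await fetchFile(videoFile));
//   ffmpeg.FS('writeFile', 'dub.mp3', await fetchFile(dubbedAudio));
//   await ffmpeg.run(...buildMergeArgs('in.mp4', 'dub.mp3', 'out.mp4'));
//   const data = ffmpeg.FS('readFile', 'out.mp4');
function buildMergeArgs(video: string, audio: string, out: string): string[] {
  return [
    '-i', video,    // input 0: original video
    '-i', audio,    // input 1: dubbed audio
    '-c:v', 'copy', // keep the video stream as-is (no re-encode)
    '-map', '0:v:0', // take video from input 0
    '-map', '1:a:0', // take audio from input 1, replacing the original track
    '-shortest',    // stop at the shorter of the two streams
    out,
  ];
}
```

Copying the video stream instead of re-encoding it is what keeps the in-browser merge fast.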

Challenge 3: Consistent API Integration

Problem: Multiple external APIs with different error patterns.

Solution: Create unified error handling

  • Centralized error types
  • Retry logic with exponential backoff
  • Consistent response formats
  • Better debugging
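The retry part of that list can be sketched as a small wrapper. This is a minimal illustration, not the app's actual code; the base delay and attempt count are assumptions:

```typescript
// Hypothetical retry wrapper with exponential backoff (delays assumed).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt; // attempt 0 -> 500ms, 1 -> 1s, 2 -> 2s, ...
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}
```

Wrapping each external call (STT, translation, TTS) in the same helper is what makes the error behavior consistent across APIs.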

Challenge 4: Claude Code Communication

Problem: Generic prompts led to mediocre results.

Solution: Be extremely specific

  • "Make a file upload" → "Create a /api/upload route that accepts FormData with MP4 files, validates size <100MB, uploads to Vercel Blob, returns fileUrl"
  • "Add error handling" → "Catch these specific errors: 'Invalid API key', 'File too large', 'Unsupported format' — return custom messages for each"

What Claude Code Did Right

Architectural coherence — All components fit together perfectly without refactoring

Convention-following — Respected Next.js 15 App Router best practices automatically

Error handling — Implemented defensive code without explicit instruction

Type safety — Generated proper TypeScript interfaces from the start

Debugging speed — When I copy-pasted error messages, Claude Code fixed them instantly


The Real Lesson

The bottleneck wasn't coding. It was communication.

What Didn't Work

"Make it faster"
"Better UI design"
"Handle errors properly"
Enter fullscreen mode Exit fullscreen mode

What Worked

"Create a /api/dub route that:
- Accepts FormData with audio file
- Sends to ElevenLabs Scribe v1 with API key in headers
- Returns JSON: { text, duration, language }
- On error, catch these specific codes: 401, 413, 422
- Return status-appropriate HTTP responses"
Enter fullscreen mode Exit fullscreen mode
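As a concrete instance of the last two bullet points, the 401/413/422 handling could be a small lookup table. This is a hypothetical sketch of what such a prompt might produce, with the specific messages and status choices as my assumptions:

```typescript
// Hypothetical mapping from upstream error codes to user-facing responses.
const ERROR_RESPONSES: Record<number, { status: number; message: string }> = {
  401: { status: 500, message: 'Server misconfiguration: invalid API key' }, // not the user's fault
  413: { status: 413, message: 'File too large (max 100MB)' },
  422: { status: 422, message: 'Unsupported or unreadable audio format' },
};

function responseFor(upstreamCode: number): { status: number; message: string } {
  return ERROR_RESPONSES[upstreamCode] ?? { status: 502, message: 'Upstream service error' };
}
```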

Specificity = Speed.


Project Metrics

Metric                    Value
------------------------  --------------------------------------
Development Time          8 hours
Lines of Code             ~2,000
API Integrations          3 (ElevenLabs, Google, NextAuth)
Supported Languages       4 (Korean, English, Japanese, Spanish)
Max File Size             100MB
Deployment Platform       Vercel
Time to First Deployment  8 hours

What's Next: Lip-Sync

The current version syncs audio length to video. But true cinematic dubbing requires lip-sync — matching mouth movements to audio.

The Challenge:

  • Extract phonemes from audio
  • Detect facial landmarks in video
  • Regenerate mouth shapes to match phonemes
  • Merge back into original video

Current Options:

  1. Wav2Lip (Open-source, free, but slow)
  2. HeyGen API (Fast, proprietary, paid)
  3. Sync Labs (REST API, pricing TBD)

This is the next mountain to climb.


Key Takeaways

1. AI agents excel at iteration, not inspiration

  • Claude Code couldn't dream up the idea
  • But it implemented the dream faster than humanly possible

2. Specificity beats vagueness

  • Vague prompts = vague results
  • Detailed specs = precise implementation

3. Error-driven development works

  • When something breaks, copy-paste the error
  • Claude Code diagnoses and fixes instantly

4. One feature at a time

  • Requesting 5 features simultaneously causes conflicts
  • Sequential requests = cleaner code

5. Always review security code

  • AI-generated auth/security code should always be reviewed
  • Never deploy untested auth systems

Try It

Live Demo: https://ai-voice-bot-rose.vercel.app

GitHub: https://github.com/hyosikkk/ai-voice-bot

Tech Stack:

  • Next.js 15
  • ElevenLabs APIs
  • Google Cloud Translation
  • Turso Database
  • Vercel Deployment

The service works. The code is open-source. The journey was unforgettable.

From zero to deployed in one day. With AI as my co-developer.

That's the future of software development.
