DEV Community

yongha

Posted on

I Built an AI Audio Dubbing Service Using Claude

"What if I let an AI coding agent build an entire production app from scratch?" — So I tried it.

Instead of building something trivial like a to-do list, I wanted to create something actually useful.

The idea: upload any audio or video file, pick a target language, and get back a fully dubbed MP3.

I called it AgentDub 🎙️


How It Works

The core is a 3-step AI pipeline:

Uploaded File
     │
     ▼
① Speech-to-Text
   ElevenLabs Scribe API
   → Extracts speech from the uploaded file as text
     │
     ▼
② Translation
   Google Gemini 2.5 Flash
   → Translates the text into the target language
     │
     ▼
③ Text-to-Speech
   ElevenLabs TTS (Multilingual v2)
   → Converts translated text back into natural audio
     │
     ▼
Download dubbed MP3

For example: upload a 30-second English voice memo, select Korean, and within a couple of minutes you get a Korean-dubbed MP3 back.
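The three stages are independent API calls chained in order, so the orchestration itself stays small. The sketch below assumes each stage is wrapped in its own async function. The DubSteps type, the dub function and the empty-transcript guard are hypothetical names and choices, not AgentDub's actual code. Because the steps are passed in, the flow can be tested without calling ElevenLabs or Gemini.

```typescript
// Hypothetical interface: each pipeline stage wrapped as an async function.
interface DubSteps {
  transcribe: (audio: Blob) => Promise<string>; // ① speech-to-text
  translate: (text: string, targetLang: string) => Promise<string>; // ② translation
  synthesize: (text: string) => Promise<ArrayBuffer>; // ③ text-to-speech (MP3 bytes)
}

// Runs STT, then translation, then TTS, and returns the dubbed audio bytes.
async function dub(
  audio: Blob,
  targetLang: string,
  steps: DubSteps,
): Promise<ArrayBuffer> {
  const transcript = await steps.transcribe(audio);
  // Hypothetical guard: stop early rather than paying for translation + TTS of nothing.
  if (transcript.trim() === "") {
    throw new Error("No speech detected in the uploaded file.");
  }
  const translated = await steps.translate(transcript, targetLang);
  return steps.synthesize(translated);
}
```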


Screenshots


Dashboard — File Upload & Language Selection

Drag and drop your file, pick from 18 languages, hit Generate. The dubbed audio plays directly in the browser. One click to download as MP3.


Docs - Website Description

The Docs page, next to the Dashboard, gives a brief overview of the site.


Access Denied Page

Only whitelisted emails can sign in. Non-approved accounts see a friendly error with their blocked email clearly shown and a prompt to switch accounts.
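The denied page only needs the values that the sign-in callback (covered later in this post) places in the query string: a reason, plus the email for rejected accounts. The sketch below shows one way to turn those parameters into the on-screen message. The deniedMessage function and all message wording are hypothetical. Only the three reason codes come from the post.

```typescript
// Reason codes the signIn callback can send (taken from the post).
type DenyReason = "not_whitelisted" | "no_email" | "db_error";

// Builds the user-facing text for /denied from its query string.
function deniedMessage(search: string): string {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const reason = params.get("reason") as DenyReason | null;
  const email = params.get("email"); // already percent-decoded by URLSearchParams

  switch (reason) {
    case "not_whitelisted":
      return email
        ? `${email} is not on the access list. Try signing in with a different account.`
        : "This account is not on the access list. Try signing in with a different account.";
    case "no_email":
      return "Your Google account did not share an email address, so access can't be checked.";
    case "db_error":
      return "We couldn't verify your access right now. Please try again in a moment.";
    default:
      return "Access denied.";
  }
}
```

The callback's encodeURIComponent and URLSearchParams are designed to work together. An address like a+b@example.com survives the round trip instead of having its "+" decoded as a space.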


Tech Stack

| Area | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| Styling | Tailwind CSS |
| Auth | NextAuth.js + Google OAuth |
| Database | Turso (libSQL / SQLite) |
| Translation | Google Gemini 2.5 Flash |
| Voice Processing | ElevenLabs Scribe + TTS |
| Deployment | Vercel |

How I Used Claude as a Coding Agent

This is the part I actually want to talk about.

I know Next.js reasonably well, but ElevenLabs, Turso, and advanced NextAuth patterns were all new to me. Without the agent, this would have taken days. With it, I shipped in a day.

Here's what that actually looked like in practice.

1. Bootstrapping the Entire Project in One Shot

My first message to Claude was roughly:

"Set up a new Next.js project with App Router, TypeScript, and Tailwind. Install next-auth for Google OAuth and @libsql/client for Turso. Set up a clean navbar with a dark design."

What came back: a complete project scaffold with a navbar, global CSS with design tokens, a SessionProvider wrapper, auth configuration, and a Turso client — all wired together. That's 45 minutes of setup, done instantly.

2. Handling multipart/form-data Correctly

ElevenLabs' STT API requires files as multipart/form-data. The tricky part is receiving a file on the Next.js server side and re-forwarding it to ElevenLabs without corrupting it.

The pattern Claude gave me:

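// `file` is the File taken from the incoming upload via `await req.formData()`
const sttForm = new FormData();
sttForm.append("file", file, file.name); // keep the original filename
sttForm.append("model_id", "scribe_v1");

// No Content-Type header: fetch sets multipart/form-data with the correct boundary itself
const sttRes = await fetch(`https://api.elevenlabs.io/v1/speech-to-text`, {
  method: "POST",
  headers: { "xi-api-key": ELEVENLABS_API_KEY },
  body: sttForm,
});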

Simple in hindsight, but the details are easy to get wrong. You have to append the file together with its filename, and you must not set Content-Type yourself, because fetch needs to generate the multipart boundary. Claude got both right on the first try.
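On the receiving side, the route handler has to extract the upload from the incoming request before forwarding it. The sketch below uses the web-standard Request.formData() that Next.js route handlers expose. The field names "file" and "targetLanguage" and the helpers parseUpload and toSttForm are hypothetical, chosen for illustration rather than taken from AgentDub.

```typescript
interface ParsedUpload {
  file: Blob;
  filename: string;
  targetLang: string;
}

// Reads the multipart upload sent by the dashboard (field names are hypothetical).
async function parseUpload(req: Request): Promise<ParsedUpload> {
  const form = await req.formData();
  const file = form.get("file");
  const targetLang = form.get("targetLanguage");

  // form.get() returns string | File | null; a file field arrives as a File (a Blob subclass).
  if (!(file instanceof Blob) || file.size === 0) {
    throw new Error("Expected a non-empty file in the 'file' field.");
  }
  if (typeof targetLang !== "string" || targetLang === "") {
    throw new Error("Expected a target language.");
  }
  return { file, filename: file.name || "upload", targetLang };
}

// Re-packs the upload for ElevenLabs Scribe, mirroring the snippet above.
function toSttForm({ file, filename }: ParsedUpload): FormData {
  const sttForm = new FormData();
  sttForm.append("file", file, filename);
  sttForm.append("model_id", "scribe_v1");
  return sttForm; // use as the fetch body; fetch adds the multipart boundary header
}
```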

3. Debugging a Vercel Build Failure

After my first deployment, I got this:

Error: Missing env variable: TURSO_AUTH_TOKEN
Build error occurred

The root cause: I was instantiating the Turso client at module load time. During next build, Next.js imports route modules to collect page data, so any top-level code runs at build time. The variable wasn't available to that build, so the import threw immediately.

Claude's fix was a Lazy Singleton pattern:

import { createClient, type Client } from "@libsql/client";

let _db: Client | null = null;

// Create the client on first use, not at import time, so the build never needs the env vars
function getDb(): Client {
  if (_db) return _db;
  const url = process.env.TURSO_DATABASE_URL;
  const authToken = process.env.TURSO_AUTH_TOKEN;
  if (!url) throw new Error("Missing TURSO_DATABASE_URL");
  if (!authToken) throw new Error("Missing TURSO_AUTH_TOKEN");
  _db = createClient({ url, authToken });
  return _db;
}

// Proxy keeps all db.execute() calls working unchanged
export const db = new Proxy({} as Client, {
  get(_, prop) {
    const client = getDb();
    const value = Reflect.get(client, prop, client);
    // Bind methods so `this` inside them is the real client, not the proxy
    return typeof value === "function" ? value.bind(client) : value;
  },
});

The Proxy wrapper meant zero changes to existing db.execute() calls elsewhere in the codebase. Clean fix, no refactoring needed.
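The same trick generalizes to any client that reads its configuration when constructed. The sketch below is a library-free lazy proxy. The lazy helper and the ExampleClient class are hypothetical, not part of @libsql/client. Binding methods to the real instance is a precaution. Without it, a call like db.execute() runs with this set to the proxy, which breaks classes that keep state in private #fields.

```typescript
// Defers construction until the first property access, then reuses the instance.
function lazy<T extends object>(factory: () => T): T {
  let instance: T | undefined;
  const resolve = (): T => (instance ??= factory());

  return new Proxy({} as T, {
    get(_target, prop) {
      const real = resolve();
      const value = Reflect.get(real, prop, real);
      // Bind methods so `this` inside them is the real instance, not the proxy.
      return typeof value === "function" ? value.bind(real) : value;
    },
  });
}

// Hypothetical client whose constructor validates config, like the Turso setup.
class ExampleClient {
  #url: string;
  constructor(url: string | undefined) {
    if (!url) throw new Error("Missing DATABASE_URL");
    this.#url = url;
  }
  execute(sql: string): string {
    return `${this.#url}: ${sql}`;
  }
}
```

Defining the proxy never calls the factory. Only the first real use does, and that happens at request time rather than during the build.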

4. Navigating Deprecated API Versions

I hit three consecutive model errors with Gemini:

  • gemini-1.5-flash → 404 (retired)
  • gemini-2.0-flash → 429 (free-tier quota is 0)
  • gemini-1.5-flash-latest → 404 (also retired)

Instead of spending time trawling through Google's changelog, I just described the error to Claude and got the correct current free-tier model name immediately: gemini-2.5-flash.
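Keeping the model name in a single constant turns that kind of swap into a one-line change. The sketch below implements the translation step against Gemini's REST generateContent endpoint. The prompt wording, the GEMINI_MODEL constant and the translate helper are assumptions, so check the current Gemini API reference for the exact request shape. fetchImpl is injectable so the function can be exercised without a network call.

```typescript
// Single source of truth for the model name (value from the post).
const GEMINI_MODEL = "gemini-2.5-flash";

type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

async function translate(
  text: string,
  targetLang: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${GEMINI_MODEL}:generateContent`;
  // Hypothetical prompt: ask for the translation only, so the output can go straight to TTS.
  const prompt = `Translate the following text into ${targetLang}. Reply with the translation only.\n\n${text}`;

  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  // Report 404 (retired model) and 429 (quota) with the status instead of failing later.
  if (!res.ok) {
    throw new Error(`Gemini ${GEMINI_MODEL} returned ${res.status}: ${await res.text()}`);
  }
  const data = await res.json();
  const parts: Array<{ text?: string }> = data?.candidates?.[0]?.content?.parts ?? [];
  const translated = parts.map((p) => p.text ?? "").join("").trim();
  if (!translated) throw new Error("Gemini returned no text.");
  return translated;
}
```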

Same thing happened with ElevenLabs voice IDs — a hardcoded voice ID started returning 402 because it became a paid-only voice. Claude refactored the TTS step to dynamically call /v1/voices and pick the first available voice from my account:

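// Look up voices on this account instead of hardcoding an ID that may become paid-only
async function getVoiceId(apiKey: string): Promise<string> {
  const res = await fetch(`https://api.elevenlabs.io/v1/voices`, {
    headers: { "xi-api-key": apiKey },
  });
  // Fail loudly on auth or quota errors instead of parsing an error body as voices
  if (!res.ok) throw new Error(`ElevenLabs /v1/voices returned ${res.status}`);
  const data = await res.json();
  const voices: { voice_id: string }[] = data?.voices ?? [];
  if (voices.length === 0) throw new Error("No voices in your ElevenLabs account.");
  return voices[0].voice_id; // first available voice
}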

No more hardcoded IDs that silently break.
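Once a voice ID is resolved, the final step posts the translated text to ElevenLabs' text-to-speech endpoint and receives MP3 bytes. The sketch below assumes the /v1/text-to-speech/{voice_id} endpoint with a JSON body, plus eleven_multilingual_v2 as the model ID for Multilingual v2. The synthesize helper and its injectable fetchImpl are hypothetical, so verify the parameters against the ElevenLabs API reference.

```typescript
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Converts translated text to speech and returns the raw MP3 bytes.
async function synthesize(
  text: string,
  voiceId: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<ArrayBuffer> {
  const url = `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg", // request MP3 so the result can be downloaded as-is
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  });
  // A 402 here is the paid-only-voice case described above; keep the status visible.
  if (!res.ok) throw new Error(`ElevenLabs TTS returned ${res.status}: ${await res.text()}`);
  return res.arrayBuffer();
}
```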

5. Security Design I Wouldn't Have Thought Of

I needed email whitelist access control. My initial idea was simple: check if the email is in the DB, return false from the NextAuth signIn callback if not.

Claude pushed further:

  • Instead of return false (which shows a generic NextAuth error page), redirect to a custom /denied page
  • Pass the blocked email as a URL parameter so the user sees exactly which account was rejected
  • Add a reason param (not_whitelisted, no_email, db_error) for different error states
  • Wrap the DB call in try/catch so a database failure doesn't crash the entire auth flow
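// NextAuth callbacks.signIn: return true to allow sign-in, or a URL string to redirect there
async signIn({ user }) {
  try {
    if (!user.email) return `/denied?reason=no_email`;
    const allowed = await isEmailWhitelisted(user.email);
    // encodeURIComponent keeps characters like "+" and "@" intact in the query string
    if (!allowed) return `/denied?reason=not_whitelisted&email=${encodeURIComponent(user.email)}`;
    return true;
  } catch (err) {
    // A database failure redirects to /denied instead of crashing the auth flow
    console.error("[auth] signIn error:", err);
    return `/denied?reason=db_error`;
  }
}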

That's a level of defensive design I would have skipped if building alone.
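The callback depends on an isEmailWhitelisted helper that the post doesn't show. The sketch below assumes a hypothetical whitelist table with an email column. The parameterized execute({ sql, args }) call follows the statement form that @libsql/client accepts. The Executor interface is a hypothetical narrow stand-in for the client so the example runs without a database. Unlike the post's one-argument call, this version also takes the client as a parameter.

```typescript
// The slice of the database client this helper needs (hypothetical narrow type).
interface Executor {
  execute(stmt: { sql: string; args: (string | number)[] }): Promise<{ rows: unknown[] }>;
}

// Returns true when the email appears in the (hypothetical) whitelist table.
async function isEmailWhitelisted(db: Executor, email: string): Promise<boolean> {
  // Normalize so "Alice@Example.com" and "alice@example.com" match the same row.
  const normalized = email.trim().toLowerCase();
  const result = await db.execute({
    // A placeholder plus args keeps user input out of the SQL string.
    sql: "SELECT 1 FROM whitelist WHERE lower(email) = ? LIMIT 1",
    args: [normalized],
  });
  return result.rows.length > 0;
}
```

Any exception thrown here, such as a connection failure, propagates to the callback's catch block and ends up as reason=db_error.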


What I Learned About Working with AI Coding Agents

What works really well:

  • Paste the full error message. Don't summarise it — paste it verbatim. The agent can pinpoint exact causes from stack traces and status codes.
  • One concern per request. "Build the UI first, then wire the API in the next message" produces cleaner code than "build everything at once."
  • Let it handle boilerplate. Auth setup, DB schema, environment variable validation — the stuff that's tedious but well-documented is where agents shine.
  • Ask only for changed files. Requesting the full file every time causes unnecessary rewrites. "Only show me what changed" keeps things clean.

Where to stay sharp:

  • Agents can reference deprecated API versions — always verify model names and endpoint paths against the live documentation when you hit a 404.
  • Test everything yourself. The code is usually correct but assumptions about your specific plan or account setup need manual verification.
  • For genuinely novel logic (your actual business logic, edge cases unique to your domain), be more hands-on and review carefully.

Links


If you've been using AI coding agents in your workflow, I'd love to hear how in the comments — especially any tips for keeping the output quality high on larger projects.
