yongha

Posted on Mar 17

I Built an AI Audio Dubbing Service Using Claude

#agents #ai #showdev #sideprojects

"What if I let an AI coding agent build an entire production app from scratch?" — So I tried it.

Instead of building something trivial like a to-do list, I wanted to create something actually useful.

The idea: upload any audio or video file, pick a target language, and get back a fully dubbed MP3.

I called it AgentDub 🎙️

How It Works

The core is a 3-step AI pipeline:

Uploaded File
     │
     ▼
① Speech-to-Text
   ElevenLabs Scribe API
   → Extracts speech from the uploaded file as text
     │
     ▼
② Translation
   Google Gemini 2.5 Flash
   → Translates the text into the target language
     │
     ▼
③ Text-to-Speech
   ElevenLabs TTS (Multilingual v2)
   → Converts translated text back into natural audio
     │
     ▼
Download dubbed MP3

For example: upload a 30-second English voice memo, select Korean, and within a couple of minutes you get a Korean-dubbed MP3 back.
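The three steps compose naturally into one function. The sketch below is only an illustration of that flow, assuming each step is wrapped in its own async helper; the names `DubSteps`, `transcribe`, `translate`, `synthesize`, and `dubAudio` are hypothetical and not taken from the AgentDub codebase.

```typescript
// Hypothetical step signatures; each one would wrap a single external API call.
interface DubSteps {
  transcribe: (file: Blob) => Promise<string>; // step 1: speech-to-text
  translate: (text: string, targetLang: string) => Promise<string>; // step 2: translation
  synthesize: (text: string) => Promise<ArrayBuffer>; // step 3: text-to-speech
}

// Runs the steps in order and fails fast when the transcript is empty.
async function dubAudio(
  file: Blob,
  targetLang: string,
  steps: DubSteps,
): Promise<ArrayBuffer> {
  const transcript = await steps.transcribe(file);
  if (transcript.trim() === "") {
    throw new Error("No speech detected in the uploaded file.");
  }
  const translated = await steps.translate(transcript, targetLang);
  return steps.synthesize(translated);
}
```

Passing the steps in as functions keeps the orchestration testable without touching ElevenLabs or Gemini.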

Screenshots

Dashboard — File Upload & Language Selection

Drag and drop your file, pick from 18 languages, hit Generate. The dubbed audio plays directly in the browser. One click to download as MP3.

Docs - Website Description

You can see a brief description of the website on the Docs page next to the Dashboard.

Access Denied Page

Only whitelisted emails can sign in. Non-approved accounts see a friendly error with their blocked email clearly shown and a prompt to switch accounts.

Tech Stack

Area	Technology
Framework	Next.js 15 (App Router)
Styling	Tailwind CSS
Auth	NextAuth.js + Google OAuth
Database	Turso (libSQL / SQLite)
Translation	Google Gemini 2.5 Flash
Voice Processing	ElevenLabs Scribe + TTS
Deployment	Vercel

How I Used Claude as a Coding Agent

This is the part I actually want to talk about.

I know Next.js reasonably well, but ElevenLabs, Turso, and advanced NextAuth patterns were all new to me. Without the agent, this would have taken days. With it, I shipped in a day.

Here's what that actually looked like in practice.

1. Bootstrapping the Entire Project in One Shot

My first message to Claude was roughly:

"Set up a new Next.js project with App Router, TypeScript, and Tailwind. Install next-auth for Google OAuth and @libsql/client for Turso. Set up a clean navbar with a dark design."

What came back: a complete project scaffold with a navbar, global CSS with design tokens, a SessionProvider wrapper, auth configuration, and a Turso client — all wired together. That's 45 minutes of setup, done instantly.

2. Handling multipart/form-data Correctly

ElevenLabs' STT API requires files as multipart/form-data. The tricky part is receiving a file on the Next.js server side and re-forwarding it to ElevenLabs without corrupting it.

The pattern Claude gave me:

const sttForm = new FormData();
sttForm.append("file", file, file.name);
sttForm.append("model_id", "scribe_v1");

const sttRes = await fetch(`https://api.elevenlabs.io/v1/speech-to-text`, {
  method: "POST",
  headers: { "xi-api-key": ELEVENLABS_API_KEY },
  body: sttForm,
});

Simple in hindsight, but the details matter: append the file together with its name, and don't set Content-Type yourself, so that fetch can generate the multipart boundary. Claude got this right on the first try.
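For context, the receiving side might look like the sketch below. It assumes the browser posts the upload under a form field named `file` (an assumption, not something the snippet above confirms) and relies on the standard `Request.formData()` Web API that Next.js route handlers expose. `forwardToStt` and the injectable `fetchImpl` parameter are hypothetical names, added so the function can be exercised without a network call.

```typescript
const STT_URL = "https://api.elevenlabs.io/v1/speech-to-text";

async function forwardToStt(
  req: Request,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const incoming = await req.formData();
  const file = incoming.get("file"); // assumed field name
  if (file === null || typeof file === "string") {
    // Reject requests where the field is missing or is plain text.
    return new Response(JSON.stringify({ error: "No file uploaded." }), { status: 400 });
  }

  const sttForm = new FormData();
  sttForm.append("file", file, file.name);
  sttForm.append("model_id", "scribe_v1");

  // No Content-Type header: fetch sets the multipart boundary itself.
  return fetchImpl(STT_URL, {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: sttForm,
  });
}
```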

3. Debugging a Vercel Build Failure

After my first deployment, I got this:

Error: Missing env variable: TURSO_AUTH_TOKEN
Build error occurred

The root cause: I was instantiating the Turso client at module load time. Next.js imports server modules during the build, and TURSO_AUTH_TOKEN wasn't available in that environment, so the import threw immediately.

Claude's fix was a Lazy Singleton pattern:

import { createClient } from "@libsql/client";

type Client = ReturnType<typeof createClient>;

let _db: Client | null = null;

// Create the client on first use so importing this module never reads env vars.
function getDb(): Client {
  if (_db) return _db;
  if (!process.env.TURSO_DATABASE_URL) throw new Error("Missing TURSO_DATABASE_URL");
  if (!process.env.TURSO_AUTH_TOKEN) throw new Error("Missing TURSO_AUTH_TOKEN");
  _db = createClient({ url: process.env.TURSO_DATABASE_URL, authToken: process.env.TURSO_AUTH_TOKEN });
  return _db;
}

// Proxy keeps all db.execute() calls working unchanged
export const db = new Proxy({} as Client, {
  get(_, prop) {
    const client = getDb();
    const value = Reflect.get(client, prop);
    // Bind methods so `this` is the real client rather than the proxy.
    return typeof value === "function" ? value.bind(client) : value;
  },
});

The Proxy wrapper meant zero changes to existing db.execute() calls elsewhere in the codebase. Clean fix, no refactoring needed.
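Stripped of the Turso specifics, the lazy singleton idea fits in a few lines. The helper below is a generic sketch; `lazy` and `getConfig` are hypothetical names used for illustration only.

```typescript
// Wraps a factory so it runs at most once, on first use rather than at import time.
function lazy<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return () => {
    if (!created) {
      instance = factory(); // if this throws, the next call tries again
      created = true;
    }
    return instance as T;
  };
}

// Hypothetical usage: process.env is not read until getConfig() is called.
const getConfig = lazy(() => {
  const url = process.env.TURSO_DATABASE_URL;
  if (!url) throw new Error("Missing TURSO_DATABASE_URL");
  return { url };
});
```

Because the factory only runs inside a request handler, a build step that merely imports the module never touches the environment check.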

4. Navigating Deprecated API Versions

I hit three consecutive model errors with Gemini:

gemini-1.5-flash → 404 (retired)
gemini-2.0-flash → 429 (free quota is 0)
gemini-1.5-flash-latest → 404 (also retired)

Instead of spending time trawling through Google's changelog, I just described the error to Claude and got the correct current free-tier model name immediately: gemini-2.5-flash.
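One way to make this kind of breakage less abrupt is to try a short list of model names in order and move on when one returns the status codes above. This is a sketch of that idea, not something AgentDub is stated to do: `translateWithFallback`, `ModelCall`, and `statusOf` are hypothetical, and the actual Gemini request is left to the injected `call` function, which is assumed to throw an error carrying a numeric `status`.

```typescript
type ModelCall = (model: string, prompt: string) => Promise<string>;

// Reads a numeric `status` property off a thrown value, if there is one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Tries each model in order; 404 (retired) and 429 (no quota) move on to the next.
async function translateWithFallback(
  models: string[],
  prompt: string,
  call: ModelCall,
): Promise<string> {
  let lastError: unknown = new Error("No models configured.");
  for (const model of models) {
    try {
      return await call(model, prompt);
    } catch (err) {
      const status = statusOf(err);
      if (status !== 404 && status !== 429) throw err; // unrelated failure
      lastError = err;
    }
  }
  throw lastError;
}
```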

Same thing happened with ElevenLabs voice IDs — a hardcoded voice ID started returning 402 because it became a paid-only voice. Claude refactored the TTS step to dynamically call /v1/voices and pick the first available voice from my account:

async function getVoiceId(apiKey: string): Promise<string> {
  const res = await fetch(`https://api.elevenlabs.io/v1/voices`, {
    headers: { "xi-api-key": apiKey },
  });
  // Fail loudly on auth or quota errors instead of reporting "no voices".
  if (!res.ok) throw new Error(`ElevenLabs /v1/voices failed with status ${res.status}`);
  const data = await res.json();
  const voices = data?.voices ?? [];
  if (voices.length === 0) throw new Error("No voices in your ElevenLabs account.");
  return voices[0].voice_id;
}

No more hardcoded IDs that silently break.
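With a voice ID in hand, the TTS request itself is small. This is a minimal sketch, assuming `eleven_multilingual_v2` as the model ID for the Multilingual v2 model named earlier; `synthesizeSpeech` and the injectable `fetchImpl` parameter are hypothetical names.

```typescript
async function synthesizeSpeech(
  text: string,
  voiceId: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<ArrayBuffer> {
  const res = await fetchImpl(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  });
  // Surface the status code so a 402 (paid-only voice) is obvious in the logs.
  if (!res.ok) throw new Error(`ElevenLabs TTS failed with status ${res.status}`);
  return res.arrayBuffer(); // raw audio bytes for the response body
}
```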

5. Security Design I Wouldn't Have Thought Of

I needed email whitelist access control. My initial idea was simple: check if the email is in the DB, return false from the NextAuth signIn callback if not.

Claude pushed further:

Instead of return false (which shows a generic NextAuth error page), redirect to a custom /denied page
Pass the blocked email as a URL parameter so the user sees exactly which account was rejected
Add a reason param (not_whitelisted, no_email, db_error) for different error states
Wrap the DB call in try/catch so a database failure doesn't crash the entire auth flow

async signIn({ user }) {
  try {
    if (!user.email) return `/denied?reason=no_email`;
    const allowed = await isEmailWhitelisted(user.email);
    if (!allowed) return `/denied?reason=not_whitelisted&email=${encodeURIComponent(user.email)}`;
    return true;
  } catch (err) {
    console.error("[auth] signIn error:", err);
    return `/denied?reason=db_error`;
  }
}

That's a level of defensive design I would have skipped if building alone.
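On the other end, the /denied page has to turn those query parameters back into a message. A possible sketch follows; `describeDenial`, the `DenyReason` type, and the message wording are all hypothetical, and only the three reason codes come from the callback above.

```typescript
type DenyReason = "not_whitelisted" | "no_email" | "db_error";

// Maps the query string produced by the signIn callback to a user-facing message.
function describeDenial(params: URLSearchParams): string {
  const reason = params.get("reason") as DenyReason | null;
  const email = params.get("email"); // URLSearchParams percent-decodes values
  switch (reason) {
    case "not_whitelisted":
      return email
        ? `${email} is not on the access list. Try a different account.`
        : "This account is not on the access list.";
    case "no_email":
      return "Your Google account did not share an email address.";
    case "db_error":
      return "Access could not be verified right now. Please try again later.";
    default:
      return "Access denied."; // unknown or missing reason
  }
}
```

Falling back to a generic message for unknown reasons means a hand-edited URL cannot produce a blank or misleading page.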

What I Learned About Working with AI Coding Agents

What works really well:

Paste the full error message. Don't summarise it — paste it verbatim. The agent can pinpoint exact causes from stack traces and status codes.
One concern per request. "Build the UI first, then wire the API in the next message" produces cleaner code than "build everything at once."
Let it handle boilerplate. Auth setup, DB schema, environment variable validation — the stuff that's tedious but well-documented is where agents shine.
Ask only for what changed. Requesting the full file every time invites unnecessary rewrites. "Only show me what changed" keeps things clean.

Where to stay sharp:

Agents can reference deprecated API versions — always verify model names and endpoint paths against the live documentation when you hit a 404.
Test everything yourself. The code is usually correct, but its assumptions about your specific plan or account setup need manual verification.
For genuinely novel logic (your actual business logic, edge cases unique to your domain), be more hands-on and review carefully.

DEV Community