How to Generate Podcast Audio Programmatically (Multi-Speaker Dialogue API)

LeanVox has a dialogue endpoint that renders a full multi-speaker conversation to a single MP3 in one API call. Here's how to use it.

I've been building a tool that generates short audio summaries of articles. Single-voice narration works fine, but when I added an interview format, I ran into the obvious problem: how do you make two characters sound like two different people?

The naive approach is to call the TTS API once per line, stitch the clips together with FFmpeg, and insert silence between them. It works, but it's a massive pain to maintain.

LeanVox has a /dialogue endpoint that handles all of this in one call.

The dialogue endpoint

POST /v1/tts/dialogue:

{
  "model": "pro",
  "lines": [
    {
      "text": "Welcome back to the show. Today we're talking about API pricing.",
      "voice": "podcast_conversational_female",
      "language": "en"
    },
    {
      "text": "Thanks for having me. I have a lot of opinions about this.",
      "voice": "podcast_casual_male",
      "language": "en",
      "exaggeration": 0.6
    }
  ],
  "gap_ms": 600
}

Each line gets its own voice. gap_ms controls the silence between speakers, in milliseconds (400–700 ms feels natural). The rendered conversation comes back as a single MP3.
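If the script already lives in code as (speaker, text) pairs, the request body can be assembled with a small helper. This is a minimal sketch: build_dialogue and the voices mapping are hypothetical names, and it only assumes the fields shown in the request above (model, gap_ms, lines, text, voice, language, exaggeration).

```python
def build_dialogue(turns, voices, gap_ms=500, model="pro"):
    """Turn (speaker, text) pairs into a /v1/tts/dialogue request body.

    `voices` maps a speaker name to that speaker's per-line settings,
    e.g. {"voice": "...", "language": "en"} plus an optional "exaggeration".
    """
    lines = []
    for speaker, text in turns:
        if speaker not in voices:
            raise KeyError(f"no voice configured for speaker {speaker!r}")
        line = {"text": text}
        line.update(voices[speaker])  # copy voice, language, exaggeration
        lines.append(line)
    return {"model": model, "gap_ms": gap_ms, "lines": lines}


# Hypothetical speaker names; the voice IDs are the ones used in this post.
voices = {
    "host": {"voice": "podcast_conversational_female", "language": "en"},
    "guest": {"voice": "podcast_casual_male", "language": "en", "exaggeration": 0.6},
}
turns = [
    ("host", "Welcome back to the show. Today we're talking about API pricing."),
    ("guest", "Thanks for having me. I have a lot of opinions about this."),
]
payload = build_dialogue(turns, voices, gap_ms=600)
```

Keeping the per-speaker settings in one mapping means a voice change is a one-line edit rather than a find-and-replace across every line of the script.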

A minimal working example

import requests

API_KEY = "lv_live_your_key_here"

dialogue = {
    "model": "pro",
    "gap_ms": 500,
    "lines": [
        {"text": "Hey everyone, welcome to Developer Office Hours.", "voice": "podcast_conversational_female", "language": "en"},
        {"text": "And I'm the guest who actually knows what they're talking about.", "voice": "podcast_casual_male", "language": "en", "exaggeration": 0.65},
        {"text": "Bold claim. We'll see about that. [laugh]", "voice": "podcast_conversational_female", "language": "en", "exaggeration": 0.5},
        {"text": "Today we're covering rate limiting.", "voice": "podcast_casual_male", "language": "en"}
    ]
}

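resp = requests.post(
    "https://api.leanvox.com/v1/tts/dialogue",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=dialogue, timeout=60
)
resp.raise_for_status()  # fail on a 4xx/5xx instead of parsing an error body
data = resp.json()

# The response carries a URL to the rendered file; download it separately.
audio_resp = requests.get(data["audio_url"], timeout=60)
audio_resp.raise_for_status()

with open("podcast_intro.mp3", "wb") as f:
    f.write(audio_resp.content)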

One render call, one MP3.

Voice cloning for consistent characters

Want consistent, recognizable voices across episodes? Upload a 10–30 second voice sample and use the returned voice ID as the voice on each line:

resp = requests.post(
    "https://api.leanvox.com/v1/tts/dialogue",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "pro", "gap_ms": 550,
        "lines": [
            {"text": "So what actually broke in production?", "voice": "my_host_alice", "language": "en"},
            {"text": "Honestly? A missing semicolon in a config file.", "voice": "guest_bob", "language": "en", "exaggeration": 0.7},
            {"text": "No.", "voice": "my_host_alice", "language": "en", "exaggeration": 0.8},
            {"text": "I wish I was joking.", "voice": "guest_bob", "language": "en"}
        ]
    },
    timeout=60
)
resp.raise_for_status()

Multilingual dialogue

Each line can have a different language:

{
  "lines": [
    {"text": "What do you think about the new feature?", "voice": "af_heart", "language": "en"},
    {"text": "Me parece muy bien. Es mucho más rápido.", "voice": "ef_dora", "language": "es"},
    {"text": "Glad to hear it.", "voice": "af_heart", "language": "en"}
  ]
}

What this costs

Billed by total character count across all lines:

Standard: $0.005/1K chars, or ~$0.004 for an episode intro of about 800 characters
Pro (with cloning): $0.01/1K chars, or ~$0.008 for the same intro
1,000 episodes/month at 5,000 chars each (5M chars total): $25–$50/month
Same on ElevenLabs: $825–$1,100/month
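The LeanVox figures above are plain arithmetic on the per-1K rates, so they are easy to check or rerun for your own volumes. A small sketch, assuming billing counts the raw length of each text field; whether tags like [laugh] count toward the total is not something this post confirms. The function names are mine.

```python
# USD per 1,000 characters, taken from the price list above.
RATE_PER_1K = {"standard": 0.005, "pro": 0.01}


def count_chars(lines):
    """Total characters across all lines (assumes billing uses len(text))."""
    return sum(len(line["text"]) for line in lines)


def cost_usd(chars, tier):
    """Cost in USD for a given character count at the given tier."""
    return chars / 1000 * RATE_PER_1K[tier]


monthly_chars = 1000 * 5000  # 1,000 episodes at 5,000 characters each
for tier in RATE_PER_1K:
    print(f"{tier}: ${cost_usd(monthly_chars, tier):.2f}/month")
# prints:
# standard: $25.00/month
# pro: $50.00/month
```

Passing count_chars(dialogue["lines"]) into cost_usd before sending a request gives a per-episode estimate under the same assumption.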

Where this fits

AI-generated podcasts: LLM writes script → dialogue endpoint → publish MP3
Audio docs: Changelogs in host+engineer format
Language learning: Multi-language conversation practice
Interactive audiobooks: Characters with distinct voices
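For the LLM-to-podcast case, the glue step is turning the model's script into the lines array. Here is a rough sketch that assumes the LLM was prompted to emit one "NAME: text" turn per line. The script_to_lines helper, the speaker labels, and the voice mapping are all hypothetical; only the line fields (text, voice, language) come from the examples in this post.

```python
import re

# A turn looks like "NAME: text". The name starts with a letter and may
# contain letters, digits, spaces, underscores, or hyphens.
TURN = re.compile(r"^\s*([A-Za-z][\w \-]*?)\s*:\s*(.+?)\s*$")


def script_to_lines(script, voices, language="en"):
    """Convert a 'NAME: text' script into a list of dialogue line dicts."""
    lines = []
    for raw in script.splitlines():
        match = TURN.match(raw)
        if not match:
            continue  # blank line or text without a "NAME:" label
        speaker, text = match.group(1).lower(), match.group(2)
        if speaker not in voices:
            # Fail loudly so an unexpected label does not silently drop a turn.
            raise ValueError(f"unknown speaker label: {speaker!r}")
        lines.append({"text": text, "voice": voices[speaker], "language": language})
    return lines


script = """
HOST: So what actually broke in production?
GUEST: Honestly? A missing semicolon in a config file.

HOST: No.
"""
voices = {"host": "my_host_alice", "guest": "guest_bob"}
lines = script_to_lines(script, voices)
payload = {"model": "pro", "gap_ms": 500, "lines": lines}
```

The resulting payload has the same shape as the request bodies shown earlier, so it can be posted to the dialogue endpoint unchanged. Note that any unlabeled line containing a colon near its start would be read as a speaker label, which is why unknown labels raise instead of being skipped.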

→ Sign up at leanvox.com — $0.50 free credits, no card required

Originally published at leanvox.com/blog
