LeanVox has a dialogue endpoint that renders a full multi-speaker conversation to a single MP3 in one API call. Here's how to use it.
I've been building a tool that generates short audio summaries of articles. Single-voice narration works fine, but when I added an interview format, I ran into the obvious problem: how do you make two characters sound like two different people?
The naive approach is to call the TTS API once per voice, stitch the clips together with FFmpeg, and add silence between lines. It works, but it's a massive pain to maintain.
LeanVox has a /dialogue endpoint that handles all of this in one call.
The dialogue endpoint
POST /v1/tts/dialogue:
{
  "model": "pro",
  "lines": [
    {
      "text": "Welcome back to the show. Today we're talking about API pricing.",
      "voice": "emma",
      "language": "en"
    },
    {
      "text": "Thanks for having me. I have a lot of opinions about this.",
      "voice": "james",
      "language": "en",
      "exaggeration": 0.6
    }
  ],
  "gap_ms": 600
}
Each line gets its own voice. gap_ms sets the silence between speakers in milliseconds (400–700 ms feels natural). The whole conversation comes back as one MP3.
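When the script comes from another program, it can be easier to assemble the request body from a plain list of turns than to write the JSON by hand. The sketch below is one way to do that. `build_dialogue` is a hypothetical helper, not part of any LeanVox SDK, and it uses only the field names that appear in the example request above.

```python
def build_dialogue(turns, model="pro", gap_ms=600, language="en"):
    """Build a dialogue request body from (voice, text[, exaggeration]) tuples.

    Hypothetical helper: the field names are copied from the example request,
    nothing else about the API is assumed.
    """
    if gap_ms < 0:
        raise ValueError("gap_ms must be non-negative")
    lines = []
    for voice, text, *rest in turns:
        line = {"text": text, "voice": voice, "language": language}
        if rest:  # optional third item: per-line exaggeration
            line["exaggeration"] = rest[0]
        lines.append(line)
    return {"model": model, "lines": lines, "gap_ms": gap_ms}


payload = build_dialogue([
    ("emma", "Welcome back to the show."),
    ("james", "Thanks for having me.", 0.6),
])
```

The resulting `payload` has the same shape as the JSON above, so it can be passed as the `json=` argument of a `requests.post` call.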
A minimal working example
import requests

API_KEY = "lv_live_your_key_here"

# Four lines alternating between two voices, with a 500 ms gap
dialogue = {
    "model": "pro",
    "gap_ms": 500,
    "lines": [
        {"text": "Hey everyone, welcome to Developer Office Hours.", "voice": "emma", "language": "en"},
        {"text": "And I'm the guest who actually knows what they're talking about.", "voice": "james", "language": "en", "exaggeration": 0.65},
        {"text": "Bold claim. We'll see about that. [laugh]", "voice": "emma", "language": "en", "exaggeration": 0.5},
        {"text": "Today we're covering rate limiting.", "voice": "james", "language": "en"},
    ],
}

# Render the whole conversation in one request
resp = requests.post(
    "https://api.leanvox.com/v1/tts/dialogue",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=dialogue,
    timeout=60,
)
resp.raise_for_status()  # stop on an HTTP error instead of parsing an error body
data = resp.json()

# The response carries a URL to the rendered MP3; download it and save it
audio_resp = requests.get(data["audio_url"], timeout=60)
audio_resp.raise_for_status()
with open("podcast_intro.mp3", "wb") as f:
    f.write(audio_resp.content)
One HTTP call, one MP3.
Voice cloning for consistent characters
Want consistent, recognizable voices across episodes? Upload a 10–30 second voice sample and use the returned voice ID:
# Reuses requests and API_KEY from the previous example;
# "my_host_alice" and "guest_bob" are cloned voice IDs
resp = requests.post(
    "https://api.leanvox.com/v1/tts/dialogue",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "pro",
        "gap_ms": 550,
        "lines": [
            {"text": "So what actually broke in production?", "voice": "my_host_alice", "language": "en"},
            {"text": "Honestly? A missing semicolon in a config file.", "voice": "guest_bob", "language": "en", "exaggeration": 0.7},
            {"text": "No.", "voice": "my_host_alice", "language": "en", "exaggeration": 0.8},
            {"text": "I wish I was joking.", "voice": "guest_bob", "language": "en"},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
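Once you have voice IDs, keeping characters consistent across episodes mostly means never typing the IDs by hand. A small cast table is one way to do that. This is only a sketch: `CAST` and `say` are hypothetical names, the default `exaggeration` for Bob is an illustrative choice, and the upload step that produces the voice IDs is not shown because this post does not document it.

```python
# Hypothetical cast table: character name -> cloned voice ID and default delivery.
CAST = {
    "alice": {"voice": "my_host_alice", "language": "en"},
    "bob": {"voice": "guest_bob", "language": "en", "exaggeration": 0.7},
}


def say(character, text, **overrides):
    """Return one dialogue line for a named character.

    Keyword overrides (for example exaggeration=0.8) replace the cast defaults
    for this line only.
    """
    return {"text": text, **CAST[character], **overrides}


lines = [
    say("alice", "So what actually broke in production?"),
    say("bob", "Honestly? A missing semicolon in a config file."),
    say("alice", "No.", exaggeration=0.8),
]
```

Every episode script then refers to "alice" and "bob", and swapping in a re-recorded voice is a one-line change in `CAST`.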
Multilingual dialogue
Each line can have a different language:
{
  "lines": [
    {"text": "What do you think about the new feature?", "voice": "af_heart", "language": "en"},
    {"text": "Me parece muy bien. Es mucho más rápido.", "voice": "ef_dora", "language": "es"},
    {"text": "Glad to hear it.", "voice": "af_heart", "language": "en"}
  ]
}
What this costs
Billed by total character count across all lines:
- Standard: $0.005/1K chars, about $0.004 for an 800-character episode intro
- Pro (with cloning): $0.01/1K chars, about $0.008 for the same intro
- 1,000 episodes/month at 5,000 chars each: $25–$50/month
- Same on ElevenLabs: $825–$1,100/month
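The LeanVox figures are simple arithmetic on the per-1K-character rates, so you can estimate your own volume the same way. This is a rough sketch: it assumes every character of every line's `text` is billed, including tags like `[laugh]`, which is not confirmed here. The function names are made up for illustration.

```python
# Per-1K-character rates quoted in the list above (USD).
RATES_PER_1K = {"standard": 0.005, "pro": 0.01}


def request_cost(lines, tier):
    """Estimate one request's cost from the total characters across all lines."""
    total_chars = sum(len(line["text"]) for line in lines)
    return total_chars / 1000 * RATES_PER_1K[tier]


def monthly_cost(episodes, chars_per_episode, tier):
    """Estimate a monthly bill for a fixed number of equally sized episodes."""
    return episodes * chars_per_episode / 1000 * RATES_PER_1K[tier]


for tier in RATES_PER_1K:
    print(f"{tier}: ${monthly_cost(1000, 5000, tier):.2f}/month")
# standard: $25.00/month
# pro: $50.00/month
```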
Where this fits
- AI-generated podcasts: LLM writes script → dialogue endpoint → publish MP3
- Audio docs: Changelogs in host+engineer format
- Language learning: Multi-language conversation practice
- Interactive audiobooks: Characters with distinct voices
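For the podcast pipeline, the missing piece is usually turning the LLM's script into the `lines` array. Here is a minimal sketch, assuming the LLM is prompted to write one `SPEAKER: text` turn per line. The `VOICES` mapping and the `script_to_lines` function are hypothetical, and the speaker labels are whatever you tell the model to use.

```python
import re

# Hypothetical mapping from speaker labels in the script to voice IDs.
VOICES = {"HOST": "emma", "GUEST": "james"}

# One turn per line, e.g. "HOST: Welcome back to the show."
TURN = re.compile(r"^\s*([A-Za-z_]+)\s*:\s*(.+?)\s*$")


def script_to_lines(script, voices, language="en"):
    """Convert 'SPEAKER: text' lines into dialogue line objects."""
    lines = []
    for raw in script.splitlines():
        match = TURN.match(raw)
        if not match:
            continue  # skip blank lines and anything that is not a turn
        speaker, text = match.groups()
        speaker = speaker.upper()
        if speaker not in voices:
            raise ValueError(f"no voice configured for speaker {speaker!r}")
        lines.append({"text": text, "voice": voices[speaker], "language": language})
    return lines


script = """
HOST: Welcome back to the show.
GUEST: Thanks for having me.
"""
lines = script_to_lines(script, VOICES)
```

Raising on an unknown speaker is deliberate: an LLM that invents a third character should fail the build rather than get silently dropped from the episode.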
→ Sign up at leanvox.com — $0.50 free credits, no card required
Originally published at leanvox.com/blog