
If you're using OpenAI's text-to-speech API, you're paying $15 per million characters. Kokoro generates speech that sounds just as natural — for $0.77 per million characters. That's roughly 20x less.
You don't need to rewrite anything. Kokoro runs through an OpenAI-compatible endpoint on deAPI. Change base_url and api_key, swap in a Kokoro model and voice name, and keep the rest of your code identical.
This tutorial walks through the switch, covers voice options, and shows what else opens up once you're on deAPI — including voice cloning and voice design from a text description.
What you'll need
- Python 3.8+ or Node.js 18+
- The OpenAI SDK (pip install openai or npm install openai)
- A deAPI account: sign up at app.deapi.ai/dashboard, grab your API key from Settings → API Keys. You get $5 in free credits, no credit card required.
The two-line migration
Here's your existing OpenAI TTS code:
from openai import OpenAI

client = OpenAI(api_key="sk-...")

audio = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)
Now here's the deAPI version:
from openai import OpenAI

client = OpenAI(
    api_key="dpn-sk-your-key-here",      # <- line 1
    base_url="https://oai.deapi.ai/v1"   # <- line 2
)

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)
Two client parameters changed, along with the model and voice names. The rest (client.audio.speech.create(), the response object, writing to file) stays identical. If you have a wrapper function around OpenAI TTS, you're looking at a 30-second migration.
The same works in Node.js:
import OpenAI from "openai";
import { writeFileSync } from "fs";

const client = new OpenAI({
  apiKey: "dpn-sk-your-key-here",
  baseURL: "https://oai.deapi.ai/v1",
});

const audio = await client.audio.speech.create({
  model: "Kokoro",
  voice: "af_nova",
  input: "Welcome back to the show. Today we're covering something interesting.",
});

const buffer = Buffer.from(await audio.arrayBuffer());
writeFileSync("output.mp3", buffer);
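If your code already wraps OpenAI TTS in a helper, one way to make the switch reversible is to pick the provider from configuration. The sketch below is only an illustration: the environment variable names (TTS_PROVIDER, OPENAI_API_KEY, DEAPI_API_KEY) and the helper names tts_settings and speak are assumptions made up for this example, not part of either API. The base URL, model names, and voice names are the ones used in this tutorial.

```python
import os

# Settings taken from the examples above. The "key_env" variable names are
# hypothetical and exist only for this sketch.
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY",
               "model": "tts-1", "voice": "nova"},
    "deapi": {"base_url": "https://oai.deapi.ai/v1", "key_env": "DEAPI_API_KEY",
              "model": "Kokoro", "voice": "af_nova"},
}


def tts_settings(env=os.environ):
    """Return (client kwargs, request defaults) for the selected provider."""
    name = env.get("TTS_PROVIDER", "openai")
    if name not in PROVIDERS:
        raise ValueError(f"unknown TTS provider: {name!r}")
    provider = PROVIDERS[name]
    client_kwargs = {"api_key": env.get(provider["key_env"], "")}
    if provider["base_url"]:
        # Only set base_url when the provider is not the SDK default.
        client_kwargs["base_url"] = provider["base_url"]
    defaults = {"model": provider["model"], "voice": provider["voice"]}
    return client_kwargs, defaults


def speak(text, path="output.mp3", env=os.environ):
    """Generate speech for `text` and write the audio bytes to `path`."""
    # Imported here so tts_settings() can be used without the SDK installed.
    from openai import OpenAI

    client_kwargs, defaults = tts_settings(env)
    client = OpenAI(**client_kwargs)
    audio = client.audio.speech.create(input=text, **defaults)
    with open(path, "wb") as f:
        f.write(audio.content)
    return path
```

With this in place, setting TTS_PROVIDER=deapi switches the endpoint, model, and voice together, and unsetting it falls back to OpenAI.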
Picking a voice
Kokoro ships with 40+ voices across seven languages. Here are the ones most OpenAI users will reach for first:
| OpenAI voice | Kokoro equivalent | Gender | Accent |
|---|---|---|---|
| alloy | af_alloy | Female | US English |
| echo | am_echo | Male | US English |
| fable | bm_fable | Male | British |
| nova | af_nova | Female | US English |
| onyx | am_onyx | Male | US English |
| shimmer | bf_lily | Female | British |
Beyond these, Kokoro has voices OpenAI doesn't offer. af_heart is a warm female voice that works well for meditation apps. am_fenrir has a deeper tone suited for trailers and dramatic narration. Spanish, French, Hindi, Italian, and Brazilian Portuguese each have dedicated native speakers — not the same English voice attempting another language.
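If existing code passes OpenAI voice names around, the table above can be turned into a small lookup. This is a minimal sketch: the function name to_kokoro_voice and the fallback to af_nova are choices made for this example, and the pass-through check simply assumes Kokoro names follow the two-letter prefix pattern seen in this tutorial.

```python
# OpenAI voice name -> Kokoro voice name, copied from the table above.
OPENAI_TO_KOKORO = {
    "alloy": "af_alloy",
    "echo": "am_echo",
    "fable": "bm_fable",
    "nova": "af_nova",
    "onyx": "am_onyx",
    "shimmer": "bf_lily",
}


def to_kokoro_voice(voice, default="af_nova"):
    """Translate an OpenAI voice name; pass Kokoro-style names through."""
    if voice in OPENAI_TO_KOKORO:
        return OPENAI_TO_KOKORO[voice]
    # Assumption: Kokoro names look like "af_heart" (two letters, underscore, name).
    if len(voice) > 3 and voice[2] == "_":
        return voice
    return default


print(to_kokoro_voice("shimmer"))    # bf_lily
print(to_kokoro_voice("am_fenrir"))  # am_fenrir
```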
Output formats
Kokoro supports mp3, wav, flac, and opus. Pass the format through the OpenAI SDK's response_format parameter:
audio = client.audio.speech.create(
    model="Kokoro",
    voice="am_adam",
    input="Testing FLAC output at 24kHz sample rate.",
    response_format="flac"
)
All output runs at 24kHz sample rate — production quality for podcasts and audiobooks.
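A small guard can catch a mistyped format before the request goes out and keep the file extension in sync with response_format. This is a sketch only: the helper name output_path is made up for this example, and the list of formats is just the four named above.

```python
from pathlib import Path

# The four formats this tutorial lists for Kokoro.
SUPPORTED_FORMATS = ("mp3", "wav", "flac", "opus")


def output_path(stem, response_format="mp3"):
    """Return a file path whose extension matches the requested format."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; "
            f"expected one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return Path(f"{stem}.{fmt}")


print(output_path("episode_01", "flac"))  # episode_01.flac
```

The returned path can be passed straight to write_bytes(audio.content) once the request completes.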
Going further: Qwen3 TTS
Kokoro handles most developer use cases out of the box. But deAPI also runs Qwen3 TTS, which adds voice cloning and voice design on top.
Voice cloning takes a 10-second audio sample and reproduces that speaker on any text you send. Record once in English, and the cloned voice carries over to nine other languages with the same timbre and pacing.
Voice design works the other way around. Describe what you want in plain text — "British male, 70s, warm baritone, documentary narrator energy" — and the model builds a matching speaker from scratch. Useful for prototyping characters or brand voices before hiring talent.
Standard synthesis, voice cloning, and voice design all run through deAPI's native TTS endpoint. Here's a voice design example:
import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v1/client"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

response = requests.post(f"{BASE}/txt2audio", headers=HEADERS, data={
    "text": "And here, deep in the rainforest, we find a creature unlike any other.",
    "model": "Qwen3_TTS_12Hz_1_7B_VoiceDesign",
    "mode": "voice_design",
    "lang": "English",
    "speed": 1.0,
    "format": "mp3",
    "sample_rate": 24000,
    "instruct": "British male, 70s, warm baritone, measured pace, documentary narrator."
})
request_id = response.json()["data"]["request_id"]

while True:
    result = requests.get(
        f"{BASE}/request-status/{request_id}", headers=HEADERS
    ).json()
    if result["data"]["status"] == "done":
        print(f"Audio: {result['data']['result_url']}")
        break
    if result["data"]["status"] == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(1)
Qwen3 TTS costs $12.86 per million characters — still cheaper than OpenAI, and it gives you voice cloning that OpenAI doesn't offer at any price.
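The polling loop above runs until the job reports "done" or "error", so a stuck job would keep it spinning. One possible hardening is to add a deadline. The sketch below assumes only what the example shows (a status field that becomes "done" or "error"); the helper name wait_for_result, the 120-second default, and the "pending" status used in the demo are illustrative, not documented deAPI behavior.

```python
import time


def wait_for_result(fetch_status, timeout=120.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job finishes, fails, or times out.

    fetch_status must return the "data" dict of a status response, e.g.
    lambda: requests.get(f"{BASE}/request-status/{request_id}",
                         headers=HEADERS).json()["data"]
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data
        if status == "error":
            raise RuntimeError(f"TTS job failed: {data}")
        if clock() >= deadline:
            raise TimeoutError(f"TTS job still {status!r} after {timeout}s")
        sleep(interval)


# Offline demo with canned responses instead of real HTTP calls.
canned = iter([{"status": "pending"}, {"status": "done", "result_url": "https://example.invalid/a.mp3"}])
print(wait_for_result(lambda: next(canned), sleep=lambda s: None)["result_url"])
```

Passing the fetch function in keeps the helper independent of requests, which also makes it easy to test without network access.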
What it costs
| Provider | Price per 1M chars | Voice cloning | Voice design |
|---|---|---|---|
| OpenAI tts-1 | $15.00 | No | No |
| OpenAI tts-1-hd | $30.00 | No | No |
| deAPI Kokoro | $0.77 | No | No |
| deAPI Qwen3 TTS | $12.86 | Yes | Yes |
The $5 free credit on deAPI signup covers about 6.5 million characters through Kokoro. For context, the entire Harry Potter series is roughly 6.1 million characters. You could generate the full audiobook before spending a cent of your own money.
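To check these numbers against your own volume, the arithmetic is easy to script. This sketch uses only the prices from the table above; the function names cost_usd and chars_for_budget are made up for the example.

```python
# USD per one million characters, from the table above.
PRICE_PER_MILLION = {
    "OpenAI tts-1": 15.00,
    "OpenAI tts-1-hd": 30.00,
    "deAPI Kokoro": 0.77,
    "deAPI Qwen3 TTS": 12.86,
}


def cost_usd(chars, provider):
    """Cost in USD of synthesizing `chars` characters with `provider`."""
    return chars * PRICE_PER_MILLION[provider] / 1_000_000


def chars_for_budget(budget_usd, provider):
    """Whole number of characters that `budget_usd` pays for."""
    return int(budget_usd / PRICE_PER_MILLION[provider] * 1_000_000)


for name in PRICE_PER_MILLION:
    print(f"{name}: ${cost_usd(6_100_000, name):.2f} for 6.1M characters, "
          f"{chars_for_budget(5, name):,} characters for $5")
```

At $0.77 per million characters, 6.1 million characters comes to about $4.70, which is why the $5 credit is enough.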
Full working script
Here's a complete, copy-paste script that generates speech and saves it to disk:
from openai import OpenAI
from pathlib import Path

client = OpenAI(
    api_key="dpn-sk-your-key-here",
    base_url="https://oai.deapi.ai/v1"
)

text = """
Three things happened this week that changed how I think about voice AI.
First, the cost dropped. Again. Second, the quality caught up with the
premium providers. And third, I realized I'd been overthinking this.
"""

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input=text.strip(),
    response_format="mp3"
)

output = Path("speech_output.mp3")
output.write_bytes(audio.content)

chars = len(text.strip())
cost = chars * 0.77 / 1_000_000
print(f"Saved to {output}")
print(f"Characters: {chars}")
print(f"Estimated cost: ${cost:.6f}")
Save it as tts_demo.py and run it:
pip install openai
python tts_demo.py
What's next
- Full API docs — docs.deapi.ai — all TTS endpoints, parameters, and language codes
- Playground — app.deapi.ai/dashboard — test voices in the browser before writing code
- OpenAI compatibility guide — docs.deapi.ai/openai-compatibility — migration details for image gen, transcription, and embeddings alongside TTS
The OpenAI SDK compatibility means you can swap one endpoint at a time. TTS first, then transcription or image generation whenever you're ready — same api_key, same base_url.
Built with deAPI — open-source AI models, decentralized GPUs, $5 free credits to start.