deAPI

Posted on May 29

OpenAI TTS Too Expensive? Switch to Kokoro in 2 Lines of Code

#ai #openai #deapi #api

If you're using OpenAI's text-to-speech API, you're paying $15 per million characters on tts-1 (and $30 on tts-1-hd). Kokoro, served through deAPI, costs $0.77 per million characters and sounds comparably natural for typical narration. That's roughly 20x less than tts-1.

You don't need to rewrite anything. Kokoro runs through an OpenAI-compatible endpoint on deAPI. Change base_url and api_key in the client setup, swap the model and voice names for their Kokoro equivalents, and keep the rest of your code identical.

This tutorial walks through the switch, covers voice options, and shows what else opens up once you're on deAPI — including voice cloning and voice design from a text description.

What you'll need

Python 3.8+ or Node.js 18+
The OpenAI SDK (pip install openai or npm install openai)
A deAPI account — sign up at app.deapi.ai/dashboard, grab your API key from Settings → API Keys. You get $5 in free credits, no credit card required.

The two-line migration

Here's your existing OpenAI TTS code:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

audio = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)

Now here's the deAPI version:

from openai import OpenAI

client = OpenAI(
    api_key="dpn-sk-your-key-here",       # <- line 1
    base_url="https://oai.deapi.ai/v1"    # <- line 2
)

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)

Two lines changed in the client setup (api_key and base_url), and the model and voice arguments now carry Kokoro names. The rest (client.audio.speech.create(), the response object, writing to file) stays identical. If you have a wrapper function around OpenAI TTS, you're looking at a 30-second migration.

The same works in Node.js:

import OpenAI from "openai";
import { writeFileSync } from "fs";

const client = new OpenAI({
  apiKey: "dpn-sk-your-key-here",
  baseURL: "https://oai.deapi.ai/v1",
});

const audio = await client.audio.speech.create({
  model: "Kokoro",
  voice: "af_nova",
  input: "Welcome back to the show. Today we're covering something interesting.",
});

const buffer = Buffer.from(await audio.arrayBuffer());
writeFileSync("output.mp3", buffer);
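If the provider details live in configuration rather than in code, switching back and forth becomes a one-word change. The sketch below is one way to do that in Python. The environment variable names (OPENAI_API_KEY, DEAPI_API_KEY) and the tts_settings helper are illustrative assumptions, not part of either SDK; the endpoint, model, and voice values are the ones used above.

```python
import os

# Per-provider settings. The env var names are hypothetical; use whatever
# your deployment already defines.
PROVIDERS = {
    "openai": {
        "key_env": "OPENAI_API_KEY",
        "base_url": None,  # None means: use the SDK's default endpoint
        "model": "tts-1",
        "voice": "nova",
    },
    "deapi": {
        "key_env": "DEAPI_API_KEY",
        "base_url": "https://oai.deapi.ai/v1",
        "model": "Kokoro",
        "voice": "af_nova",
    },
}


def tts_settings(provider, env=None):
    """Return (client_kwargs, speech_kwargs) for the chosen provider."""
    env = os.environ if env is None else env
    cfg = PROVIDERS[provider]
    api_key = env.get(cfg["key_env"])
    if not api_key:
        raise RuntimeError(f"Set {cfg['key_env']} before using provider {provider!r}")
    client_kwargs = {"api_key": api_key}
    if cfg["base_url"]:
        client_kwargs["base_url"] = cfg["base_url"]
    speech_kwargs = {"model": cfg["model"], "voice": cfg["voice"]}
    return client_kwargs, speech_kwargs
```

With this in place, the calling code would look like OpenAI(**client_kwargs) followed by client.audio.speech.create(input=text, **speech_kwargs), and the API key stays out of source control.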

Picking a voice

Kokoro ships with 40+ voices across seven languages. Here are the ones most OpenAI users will reach for first:

OpenAI voice	Kokoro equivalent	Gender	Accent
alloy	af_alloy	Female	US English
echo	am_echo	Male	US English
fable	bm_fable	Male	British
nova	af_nova	Female	US English
onyx	am_onyx	Male	US English
shimmer	bf_lily	Female	British
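For code that still receives OpenAI voice names from callers, the table can be expressed as a lookup. This is a minimal sketch: the helper names are made up for illustration, and the prefix decoding assumes Kokoro's usual naming scheme, where the first letter marks the accent (a for American, b for British) and the second marks the speaker's gender (f or m).

```python
# Mapping taken from the table above.
OPENAI_TO_KOKORO = {
    "alloy": "af_alloy",
    "echo": "am_echo",
    "fable": "bm_fable",
    "nova": "af_nova",
    "onyx": "am_onyx",
    "shimmer": "bf_lily",
}

# Assumed prefix convention: <accent letter><gender letter>_<name>.
ACCENTS = {"a": "US English", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def to_kokoro_voice(openai_voice, default="af_nova"):
    """Translate an OpenAI voice name, falling back to a default voice."""
    return OPENAI_TO_KOKORO.get(openai_voice.lower(), default)


def describe_voice(voice_id):
    """Decode the two-letter prefix of a Kokoro voice ID into (accent, gender)."""
    prefix = voice_id.split("_", 1)[0]
    if len(prefix) != 2:
        return ("unknown", "unknown")
    return (ACCENTS.get(prefix[0], "unknown"), GENDERS.get(prefix[1], "unknown"))
```

Voices for other languages use different prefix letters, which this sketch reports as "unknown" rather than guessing.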

Beyond these, Kokoro has voices OpenAI doesn't offer. af_heart is a warm female voice that works well for meditation apps. am_fenrir has a deeper tone suited for trailers and dramatic narration. Spanish, French, Hindi, Italian, and Brazilian Portuguese each have dedicated native speakers — not the same English voice attempting another language.

Output formats

Kokoro supports mp3, wav, flac, and opus. Pass the format through the OpenAI SDK's response_format parameter:

audio = client.audio.speech.create(
    model="Kokoro",
    voice="am_adam",
    input="Testing FLAC output at 24kHz sample rate.",
    response_format="flac"
)

All output runs at 24kHz sample rate — production quality for podcasts and audiobooks.
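Since the format is just a string, a typo only surfaces when the request fails. A small guard can catch it earlier and keep the file extension in step with the format. The sketch below assumes the four formats listed above are the complete set; output_path is a hypothetical helper, not an SDK function.

```python
from pathlib import Path

# Formats named in this section; treated here as the full supported set.
SUPPORTED_FORMATS = {"mp3", "wav", "flac", "opus"}


def output_path(stem, response_format="mp3", directory="."):
    """Validate the format and return a matching output file path."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {response_format!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return Path(directory) / f"{stem}.{fmt}"
```

Pass the same format string to response_format in the request and to output_path when saving, so a FLAC response never lands in a file named .mp3.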

Going further: Qwen3 TTS

Kokoro handles most developer use cases out of the box. But deAPI also runs Qwen3 TTS, which adds voice cloning and voice design on top.

Voice cloning takes a 10-second audio sample and reproduces that speaker on any text you send. Record once in English, and the cloned voice carries over to nine other languages with the same timbre and pacing.

Voice design works the other way around. Describe what you want in plain text — "British male, 70s, warm baritone, documentary narrator energy" — and the model builds a matching speaker from scratch. Useful for prototyping characters or brand voices before hiring talent.

Voice cloning and voice design both run through deAPI's native TTS endpoint rather than the OpenAI-compatible one. Here's a voice design example:

import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v1/client"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

response = requests.post(f"{BASE}/txt2audio", headers=HEADERS, data={
    "text": "And here, deep in the rainforest, we find a creature unlike any other.",
    "model": "Qwen3_TTS_12Hz_1_7B_VoiceDesign",
    "mode": "voice_design",
    "lang": "English",
    "speed": 1.0,
    "format": "mp3",
    "sample_rate": 24000,
    "instruct": "British male, 70s, warm baritone, measured pace, documentary narrator."
})

response.raise_for_status()  # fail early on an HTTP error instead of a KeyError below
request_id = response.json()["data"]["request_id"]

# Generation is asynchronous: poll the status endpoint until the job finishes.
while True:
    result = requests.get(
        f"{BASE}/request-status/{request_id}", headers=HEADERS
    ).json()
    status = result["data"]["status"]
    if status == "done":
        print(f"Audio: {result['data']['result_url']}")
        break
    if status == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(1)
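The loop above polls until the job reports done or error, so a job that never finishes would keep it running indefinitely. One way to bound the wait is a small helper with a deadline. This is a sketch, not part of deAPI's client: wait_for_result is a hypothetical name, fetch_status is any callable that returns the "data" object from the status endpoint, and the "pending" status in the test data is a placeholder for whatever non-final states the API reports.

```python
import time


def wait_for_result(fetch_status, timeout_s=120.0, interval_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job is done, fails, or the deadline passes.

    fetch_status must return a dict with a "status" key, plus "result_url"
    once the status is "done" (the field names used in the example above).
    """
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data["result_url"]
        if status == "error":
            raise RuntimeError(f"Generation failed: {data}")
        if clock() >= deadline:
            raise TimeoutError(
                f"No result after {timeout_s} seconds (last status: {status!r})"
            )
        sleep(interval_s)
```

With the variables from the example, the call would be wait_for_result(lambda: requests.get(f"{BASE}/request-status/{request_id}", headers=HEADERS, timeout=10).json()["data"]). Injecting sleep and clock keeps the helper testable without real waiting.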

Qwen3 TTS costs $12.86 per million characters — still cheaper than OpenAI, and it gives you voice cloning that OpenAI doesn't offer at any price.

What it costs

Provider	Price per 1M chars	Voice cloning	Voice design
OpenAI tts-1	$15.00	No	No
OpenAI tts-1-hd	$30.00	No	No
deAPI Kokoro	$0.77	No	No
deAPI Qwen3 TTS	$12.86	Yes	Yes

The $5 free credit on deAPI signup covers about 6.5 million characters through Kokoro. For context, the entire Harry Potter series is roughly 6.1 million characters. You could generate the full audiobook before spending a cent of your own money.
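The arithmetic behind these figures is easy to check. The sketch below uses only the prices from the table; the function names are illustrative.

```python
# USD per one million characters, copied from the table above.
PRICE_PER_MILLION = {
    "OpenAI tts-1": 15.00,
    "OpenAI tts-1-hd": 30.00,
    "deAPI Kokoro": 0.77,
    "deAPI Qwen3 TTS": 12.86,
}


def cost_usd(chars, provider):
    """Cost in USD of synthesizing `chars` characters with a provider."""
    return chars * PRICE_PER_MILLION[provider] / 1_000_000


def chars_for_budget(budget_usd, provider):
    """Whole number of characters a budget covers at a provider's rate."""
    return int(budget_usd / PRICE_PER_MILLION[provider] * 1_000_000)


if __name__ == "__main__":
    print("Characters covered by $5 on Kokoro:", chars_for_budget(5.00, "deAPI Kokoro"))
    for name in PRICE_PER_MILLION:
        print(f"{name}: ${cost_usd(6_100_000, name):.2f} for 6.1M characters")
```

Five dollars at $0.77 per million works out to just under 6.5 million characters, and 6.1 million characters on Kokoro comes to about $4.70, which is why the series-length estimate fits inside the free credit.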

Full working script

Here's a complete, copy-paste script that generates speech and saves it to disk:

from openai import OpenAI
from pathlib import Path

client = OpenAI(
    api_key="dpn-sk-your-key-here",
    base_url="https://oai.deapi.ai/v1"
)

text = """
Three things happened this week that changed how I think about voice AI.
First, the cost dropped. Again. Second, the quality caught up with the
premium providers. And third, I realized I'd been overthinking this.
"""

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input=text.strip(),
    response_format="mp3"
)

output = Path("speech_output.mp3")
output.write_bytes(audio.content)

chars = len(text.strip())
cost = chars * 0.77 / 1_000_000
print(f"Saved to {output}")
print(f"Characters: {chars}")
print(f"Estimated cost: ${cost:.6f}")

Save it as tts_demo.py, then run it:

pip install openai
python tts_demo.py
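The script sends one short passage. For book-length text you would normally split the input into several requests. OpenAI's speech endpoint caps input at 4096 characters per request; deAPI's limit isn't stated here, so the default below is an assumption to adjust against the docs. The chunk_text helper is a sketch that splits on sentence boundaries so no request cuts a sentence in half.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 4096) -> List[str]:
    """Split text into chunks of at most max_chars, breaking between sentences.

    max_chars defaults to OpenAI's documented limit; the right value for
    another provider is an assumption until checked against its docs.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go through client.audio.speech.create() in turn, with the resulting audio files joined afterwards in an audio tool of your choice.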

What's next

Full API docs — docs.deapi.ai — all TTS endpoints, parameters, and language codes
Playground — app.deapi.ai/dashboard — test voices in the browser before writing code
OpenAI compatibility guide — docs.deapi.ai/openai-compatibility — migration details for image gen, transcription, and embeddings alongside TTS

The OpenAI SDK compatibility means you can swap one endpoint at a time. TTS first, then transcription or image generation whenever you're ready — same api_key, same base_url.

Built with deAPI — open-source AI models, decentralized GPUs, $5 free credits to start.
