DEV Community

deAPI


OpenAI TTS Too Expensive? Switch to Kokoro in 2 Lines of Code


If you're using OpenAI's text-to-speech API, you're paying $15 per million characters. Kokoro generates speech that sounds just as natural — for $0.77 per million characters. That's roughly 20x less.

You don't need to rewrite anything. Kokoro runs through an OpenAI-compatible endpoint on deAPI. Change base_url and api_key, swap in Kokoro's model and voice names, and keep the rest of your code identical.

This tutorial walks through the switch, covers voice options, and shows what else opens up once you're on deAPI — including voice cloning and voice design from a text description.

What you'll need

  • Python 3.8+ or Node.js 18+
  • The OpenAI SDK (pip install openai or npm install openai)
  • A deAPI account — sign up at app.deapi.ai/dashboard, grab your API key from Settings → API Keys. You get $5 in free credits, no credit card required.
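Rather than pasting the key into source files, you may prefer to read it from the environment. A minimal sketch, assuming a variable named DEAPI_API_KEY (a hypothetical name chosen for this example; deAPI does not require it):

```python
import os


def load_api_key(env=os.environ, name="DEAPI_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error.

    DEAPI_API_KEY is a hypothetical variable name used for illustration.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} to your deAPI API key first")
    return key
```

You could then pass api_key=load_api_key() wherever the snippets in this post use a literal key.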

The two-line migration

Here's your existing OpenAI TTS code:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

audio = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)

Now here's the deAPI version:

from openai import OpenAI

client = OpenAI(
    api_key="dpn-sk-your-key-here",       # <- line 1
    base_url="https://oai.deapi.ai/v1"    # <- line 2
)

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input="Welcome back to the show. Today we're covering something interesting."
)

with open("output.mp3", "wb") as f:
    f.write(audio.content)

Two client parameters changed, and the model and voice names were swapped for their Kokoro equivalents. The rest (client.audio.speech.create(), the response object, writing to file) stays identical. If you have a wrapper function around OpenAI TTS, you're looking at a 30-second migration.
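If your code already goes through a wrapper, one way to keep both providers switchable is to hold their settings in a single table. The sketch below is only an illustration: the endpoint, model, and voice values come from the snippets above, while the PROVIDERS layout and the two helper names are assumptions of this example, not part of either SDK.

```python
# Connection and model settings per provider. The values come from the
# snippets above; the dictionary layout and names are assumptions of this sketch.
PROVIDERS = {
    "openai": {"base_url": None, "model": "tts-1", "voice": "nova"},
    "deapi": {
        "base_url": "https://oai.deapi.ai/v1",
        "model": "Kokoro",
        "voice": "af_nova",
    },
}


def client_kwargs(provider, api_key):
    """Build keyword arguments for OpenAI(...) for the chosen provider."""
    kwargs = {"api_key": api_key}
    base_url = PROVIDERS[provider]["base_url"]
    if base_url is not None:  # None means the SDK's default OpenAI endpoint
        kwargs["base_url"] = base_url
    return kwargs


def speech_kwargs(provider, text):
    """Build keyword arguments for client.audio.speech.create(...)."""
    settings = PROVIDERS[provider]
    return {"model": settings["model"], "voice": settings["voice"], "input": text}
```

With these in place, the call sites become OpenAI(**client_kwargs("deapi", key)) and client.audio.speech.create(**speech_kwargs("deapi", text)), so switching back is a one-word change.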

The same works in Node.js:

import OpenAI from "openai";
import { writeFileSync } from "fs";

const client = new OpenAI({
  apiKey: "dpn-sk-your-key-here",
  baseURL: "https://oai.deapi.ai/v1",
});

const audio = await client.audio.speech.create({
  model: "Kokoro",
  voice: "af_nova",
  input: "Welcome back to the show. Today we're covering something interesting.",
});

const buffer = Buffer.from(await audio.arrayBuffer());
writeFileSync("output.mp3", buffer);

Picking a voice

Kokoro ships with 40+ voices across seven languages. Here are the ones most OpenAI users will reach for first:

| OpenAI voice | Kokoro equivalent | Gender | Accent |
|---|---|---|---|
| alloy | af_alloy | Female | US English |
| echo | am_echo | Male | US English |
| fable | bm_fable | Male | British |
| nova | af_nova | Female | US English |
| onyx | am_onyx | Male | US English |
| shimmer | bf_lily | Female | British |

Beyond these, Kokoro has voices OpenAI doesn't offer. af_heart is a warm female voice that works well for meditation apps. am_fenrir has a deeper tone suited for trailers and dramatic narration. Spanish, French, Hindi, Italian, and Brazilian Portuguese each have dedicated native speakers — not the same English voice attempting another language.
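If existing code passes OpenAI voice names around, the table above can be captured as a small lookup so call sites don't have to change. This is a sketch; the helper name is made up for this example, and letting unknown names pass through unchanged is a design choice (it lets Kokoro-only voices such as af_heart work, but it also lets typos through).

```python
# OpenAI voice name -> Kokoro voice name, copied from the table above.
OPENAI_TO_KOKORO = {
    "alloy": "af_alloy",
    "echo": "am_echo",
    "fable": "bm_fable",
    "nova": "af_nova",
    "onyx": "am_onyx",
    "shimmer": "bf_lily",
}


def to_kokoro_voice(voice):
    """Translate an OpenAI voice name; other names are returned unchanged."""
    return OPENAI_TO_KOKORO.get(voice, voice)
```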

Output formats

Kokoro supports mp3, wav, flac, and opus. Pass the format through the OpenAI SDK's response_format parameter:

audio = client.audio.speech.create(
    model="Kokoro",
    voice="am_adam",
    input="Testing FLAC output at 24kHz sample rate.",
    response_format="flac"
)

All output runs at 24kHz sample rate — production quality for podcasts and audiobooks.
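When the format is configurable, it helps to keep the file extension in step with response_format and to reject unsupported values before making a request. A minimal sketch, assuming the four formats listed above are the complete set; the helper name is hypothetical:

```python
from pathlib import Path

# Formats listed in this section as supported by Kokoro on deAPI.
SUPPORTED_FORMATS = ("mp3", "wav", "flac", "opus")


def output_path(stem, response_format="mp3"):
    """Return a Path whose extension matches the requested format."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {response_format!r}; use one of {SUPPORTED_FORMATS}"
        )
    return Path(f"{stem}.{fmt}")
```

You might call output_path("episode", "flac").write_bytes(audio.content) after requesting the same format from the API.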

Going further: Qwen3 TTS

Kokoro handles most developer use cases out of the box. But deAPI also runs Qwen3 TTS, which adds voice cloning and voice design on top.

Voice cloning takes a 10-second audio sample and reproduces that speaker on any text you send. Record once in English, and the cloned voice carries over to nine other languages with the same timbre and pacing.

Voice design works the other way around. Describe what you want in plain text — "British male, 70s, warm baritone, documentary narrator energy" — and the model builds a matching speaker from scratch. Useful for prototyping characters or brand voices before hiring talent.

All three modes (standard synthesis, voice cloning, and voice design) run through deAPI's native TTS endpoint. Here's a voice design example:

import requests
import time

API_KEY = "dpn-sk-your-key-here"
BASE = "https://api.deapi.ai/api/v1/client"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

response = requests.post(f"{BASE}/txt2audio", headers=HEADERS, data={
    "text": "And here, deep in the rainforest, we find a creature unlike any other.",
    "model": "Qwen3_TTS_12Hz_1_7B_VoiceDesign",
    "mode": "voice_design",
    "lang": "English",
    "speed": 1.0,
    "format": "mp3",
    "sample_rate": 24000,
    "instruct": "British male, 70s, warm baritone, measured pace, documentary narrator."
})

response.raise_for_status()  # fail early on HTTP errors such as a rejected API key
request_id = response.json()["data"]["request_id"]

while True:
    result = requests.get(
        f"{BASE}/request-status/{request_id}", headers=HEADERS
    ).json()
    if result["data"]["status"] == "done":
        print(f"Audio: {result['data']['result_url']}")
        break
    if result["data"]["status"] == "error":
        print(f"Error: {result['data']}")
        break
    time.sleep(1)
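The polling loop above runs until the job reports done or error. In a long-running service you may want an upper bound on the wait. Here is one possible sketch that takes the status call as a function so it can be tested without the network. The response shape mirrors the loop above; the function name, the 120-second default, and the injectable sleep are assumptions of this example.

```python
import time


def wait_for_result(fetch_status, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll fetch_status() until the job finishes, fails, or the timeout passes.

    fetch_status is any callable returning the parsed JSON of the
    request-status call, shaped like {"data": {"status": ..., "result_url": ...}}.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()["data"]
        if data["status"] == "done":
            return data["result_url"]
        if data["status"] == "error":
            raise RuntimeError(f"Generation failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        sleep(interval)
```

With the variables from the example above, the call would look like wait_for_result(lambda: requests.get(f"{BASE}/request-status/{request_id}", headers=HEADERS).json()).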

Qwen3 TTS costs $12.86 per million characters, which is still below OpenAI's $15.00 for tts-1, and it gives you voice cloning that OpenAI doesn't offer at any price.

What it costs

| Provider | Price per 1M chars | Voice cloning | Voice design |
|---|---|---|---|
| OpenAI tts-1 | $15.00 | No | No |
| OpenAI tts-1-hd | $30.00 | No | No |
| deAPI Kokoro | $0.77 | No | No |
| deAPI Qwen3 TTS | $12.86 | Yes | Yes |

The $5 free credit on deAPI signup covers about 6.5 million characters through Kokoro. For context, the entire Harry Potter series is roughly 6.1 million characters. You could generate the full audiobook before spending a cent of your own money.
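The arithmetic behind those figures is easy to check. The sketch below uses only the prices from the table above; the function names are made up for this example, and actual billing may round differently.

```python
# Prices in dollars per million characters, taken from the table above.
PRICE_PER_MILLION = {
    "OpenAI tts-1": 15.00,
    "OpenAI tts-1-hd": 30.00,
    "deAPI Kokoro": 0.77,
    "deAPI Qwen3 TTS": 12.86,
}


def cost_usd(chars, provider):
    """Estimated cost in dollars for synthesizing `chars` characters."""
    return chars * PRICE_PER_MILLION[provider] / 1_000_000


def chars_for_budget(budget_usd, provider):
    """How many whole characters a budget buys at the listed price."""
    return int(budget_usd / PRICE_PER_MILLION[provider] * 1_000_000)


print(chars_for_budget(5, "deAPI Kokoro"))           # characters covered by $5
print(round(cost_usd(6_100_000, "deAPI Kokoro"), 2))  # 6.1M characters on Kokoro
print(round(cost_usd(6_100_000, "OpenAI tts-1"), 2))  # the same text on tts-1
```

At these prices, $5 buys just under 6.5 million Kokoro characters, and 6.1 million characters come to about $4.70 on Kokoro versus $91.50 on tts-1.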

Full working script

Here's a complete, copy-paste script that generates speech and saves it to disk:

from openai import OpenAI
from pathlib import Path

client = OpenAI(
    api_key="dpn-sk-your-key-here",
    base_url="https://oai.deapi.ai/v1"
)

text = """
Three things happened this week that changed how I think about voice AI.
First, the cost dropped. Again. Second, the quality caught up with the
premium providers. And third, I realized I'd been overthinking this.
"""

audio = client.audio.speech.create(
    model="Kokoro",
    voice="af_nova",
    input=text.strip(),
    response_format="mp3"
)

output = Path("speech_output.mp3")
output.write_bytes(audio.content)

chars = len(text.strip())
cost = chars * 0.77 / 1_000_000
print(f"Saved to {output}")
print(f"Characters: {chars}")
print(f"Estimated cost: ${cost:.6f}")

Save the script as tts_demo.py, then run it:

pip install openai
python tts_demo.py

What's next

The OpenAI SDK compatibility means you can swap one endpoint at a time. TTS first, then transcription or image generation whenever you're ready — same api_key, same base_url.


Built with deAPI — open-source AI models, decentralized GPUs, $5 free credits to start.
