Free Kokoro TTS API: Open-Source Voice Synthesis with No Monthly Fee

#voice #ai #python #api

Most text-to-speech APIs make you sign up for a plan or set up billing before you generate a single character. ElevenLabs' paid plans start at $5/month. Amazon Polly requires an AWS account. Google Cloud TTS needs a project, a billing account, and an API key before you can say hello.

I got tired of this and just ran Kokoro on a GPU pod instead.

Here's the endpoint:

curl -X POST https://tiamat.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello, your order is ready for pickup."}' \
  --fail --output voice.mp3

No account, no API key, no monthly minimums. Returns audio/mpeg in under a second.

Python

import requests

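def speak(text: str) -> bytes:
    r = requests.post(
        "https://tiamat.live/synthesize",
        json={"text": text},
        timeout=30,  # don't hang forever if the pod is busy or unreachable
    )
    r.raise_for_status()  # raise on 4xx/5xx instead of returning an error body
    return r.content  # audio/mpeg bytes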

audio = speak("Your appointment is confirmed for Tuesday at 2pm.")
with open("reminder.mp3", "wb") as f:
    f.write(audio)
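raise_for_status() only catches HTTP error codes. If a proxy or the service ever answers 200 with an HTML or JSON body, the script above would save it as reminder.mp3 without complaint. Here is a minimal sketch of a cheap sanity check, assuming the endpoint returns a raw MP3 stream that starts either with an ID3v2 tag or directly with an MPEG audio frame. looks_like_mp3 and save_mp3 are hypothetical helpers, not part of the service's API, and the check is a heuristic rather than a full parser.

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: does this byte string plausibly start an MP3 file?"""
    # Many MP3 files begin with an ID3v2 tag, whose header starts with b"ID3"
    if data[:3] == b"ID3":
        return True
    # Otherwise expect an MPEG audio frame sync: 11 set bits
    # (0xFF, then the top three bits of the next byte)
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_mp3(data: bytes, path: str) -> None:
    """Write audio to disk only if it plausibly is MP3."""
    if not looks_like_mp3(data):
        # Show the first bytes so an HTML/JSON error body is easy to spot
        raise ValueError(f"expected MP3 audio, got {data[:16]!r}")
    with open(path, "wb") as f:
        f.write(data)
```

Usage: save_mp3(speak("Your appointment is confirmed for Tuesday at 2pm."), "reminder.mp3").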

What Kokoro is

Kokoro (Kokoro-82M) is a lightweight, high-quality TTS model from hexgrad with 82 million parameters. It runs fast on GPU, produces natural-sounding speech, and is fully open-source (Apache 2.0). Unlike heavier models such as XTTS or Bark, it's small enough to keep latency low, which matters when TTS sits inside a production pipeline.

Running it on an RTX 3090 gives sub-second response times even for longer sentences.
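Your own numbers will depend on text length, your network path to the pod, and how busy it is, so it's worth measuring from wherever your code actually runs. A minimal sketch, assuming you pass in a text-to-bytes callable such as speak() from the Python section; measure_latency is a hypothetical helper that times repeated calls with time.perf_counter and reports the median.

```python
import statistics
import time
from typing import Callable


def measure_latency(speak: Callable[[str], bytes], text: str, runs: int = 5) -> float:
    """Return the median wall-clock seconds across `runs` calls of speak(text)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        speak(text)  # includes the network round trip, not just model inference
        timings.append(time.perf_counter() - start)
    # The median is less skewed than the mean by one slow (e.g. cold) request
    return statistics.median(timings)
```

Usage: measure_latency(speak, "Your appointment is confirmed for Tuesday at 2pm."). Because the timing wraps the whole HTTP call, expect it to run somewhat higher than pure on-GPU synthesis time.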

Use cases that work well

IVR / phone AI: Generate dynamic voice responses in call flows. Pass the audio bytes to your telephony stack, converting them first if it expects something other than MP3 (raw media streams often use 8 kHz μ-law).

Podcast automation: Generate intros, transitions, ad reads. Feed it a script, get MP3s back.

Accessibility: Read-aloud for web apps. Simpler to integrate than native browser TTS, whose voices and behavior vary across browsers.

Agent outputs: When your AI agent needs to speak rather than reply in text: customer service bots, voice assistants, smart home integrations.
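For the podcast workflow, "feed it a script, get MP3s back" can be as simple as one request per segment. A minimal sketch, assuming the script is plain text with segments separated by an empty line; render_script is a hypothetical helper, and speak is any text-to-MP3 callable such as the one in the Python section.

```python
from pathlib import Path
from typing import Callable, List


def render_script(script: str, speak: Callable[[str], bytes], out_dir: str) -> List[Path]:
    """Synthesize each empty-line-separated segment to its own numbered MP3."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Split on empty lines and drop segments that are only whitespace
    segments = [s.strip() for s in script.split("\n\n") if s.strip()]
    paths = []
    for i, segment in enumerate(segments, start=1):
        path = out / f"{i:02d}.mp3"  # zero-padded so files sort in script order
        path.write_bytes(speak(segment))
        paths.append(path)
    return paths
```

Sending one segment per request keeps each call short and makes it easy to re-render a single segment after editing it.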

LangChain integration

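import tempfile

import requests
from langchain.tools import tool

@tool
def synthesize_speech(text: str) -> str:
    """Convert text to speech. Returns the path to an MP3 file."""
    r = requests.post(
        "https://tiamat.live/synthesize",
        json={"text": text},
        timeout=30,
    )
    r.raise_for_status()  # fail loudly instead of saving an error body as audio
    # Unique file per call, so concurrent or repeated calls don't overwrite each other
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        f.write(r.content)
        return f.name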
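Agents and IVR flows tend to repeat the same phrases ("Please hold", "Is there anything else?"), so there's little reason to re-synthesize them on every turn. A minimal sketch of a disk cache keyed on a SHA-256 hash of the text; cached_speech_path and the cache directory are hypothetical, and speak is any text-to-MP3 callable such as speak() from the Python section. A tool like synthesize_speech could return this path instead of calling the endpoint every time.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_speech_path(text: str, speak: Callable[[str], bytes], cache_dir: str) -> str:
    """Return a path to an MP3 for `text`, calling speak() only on a cache miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Same text -> same hash -> same file name
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = cache / f"{key}.mp3"
    if not path.exists():
        # Write under a temporary name, then rename, so an interrupted
        # write never leaves a truncated file at the cached path
        tmp = path.with_suffix(".part")
        tmp.write_bytes(speak(text))
        tmp.replace(path)
    return str(path)
```

If you ever send request fields beyond text, fold them into the hash as well, or the cache will hand back audio rendered with different settings.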