Most text-to-speech APIs charge you a monthly base fee before you generate a single character. ElevenLabs starts at $5/month. Amazon Polly requires an AWS account. Google Cloud TTS needs a project, a billing account, and an API key before you can say hello.
I got tired of this and just ran Kokoro on a GPU pod instead.
Here's the endpoint:
curl -X POST https://tiamat.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello, your order is ready for pickup."}' \
  --output voice.mp3
No account, no API key, no monthly minimums. Returns audio/mpeg in under a second.
Python
import requests

def speak(text: str) -> bytes:
    """POST text to the synthesize endpoint and return the MP3 bytes."""
    r = requests.post(
        "https://tiamat.live/synthesize",
        json={"text": text},
        timeout=30,  # don't hang forever if the pod is unreachable
    )
    r.raise_for_status()
    return r.content  # audio/mpeg bytes

audio = speak("Your appointment is confirmed for Tuesday at 2pm.")
with open("reminder.mp3", "wb") as f:
    f.write(audio)
What Kokoro is
Kokoro is a lightweight, high-quality TTS model from hexgrad. It runs fast on GPU, produces natural-sounding speech, and is fully open source (Apache 2.0). Unlike heavier models like XTTS or Bark, it's optimized for low latency, which matters when TTS sits in the middle of a production pipeline.
Running it on an RTX 3090 gives sub-second response times even for longer sentences.
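If you want to check the latency yourself rather than take my word for it, a small timing wrapper is enough. This is a minimal sketch: time_call is a hypothetical helper name, and it times whatever callable you hand it, so the number you get includes network round-trip, not just model inference.

```python
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[str], bytes], text: str) -> Tuple[bytes, float]:
    """Run fn(text) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    audio = fn(text)
    elapsed = time.perf_counter() - start
    return audio, elapsed


# Usage, assuming the speak() function from the Python section above:
#   audio, seconds = time_call(speak, "Hello, your order is ready for pickup.")
#   print(f"{len(audio)} bytes in {seconds:.2f}s")
```

Each timed call counts against the free tier, so measure sparingly.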
Use cases that work well
IVR / phone AI: Generate dynamic voice responses in call flows. Pass the audio bytes directly to your telephony stack.
Podcast automation: Generate intros, transitions, ad reads. Feed it a script, get MP3s back.
Accessibility: Read-aloud for web apps. Simpler to integrate than native browser TTS, which varies across browsers.
Agent outputs: When your AI agent needs to speak rather than return text: customer service bots, voice assistants, smart home integrations.
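To make the podcast case concrete, here is a rough sketch of "feed it a script, get MP3s back". It assumes segments in the script are separated by blank lines and that each segment is sent as one request; render_script, the segment_NN.mp3 naming, and the output directory are my own illustrative choices, not part of the API. It uses the standard library's urllib so it runs without extra packages; swap in requests if you prefer.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, List

ENDPOINT = "https://tiamat.live/synthesize"


def synthesize(text: str) -> bytes:
    """POST one piece of text to the endpoint and return the MP3 bytes."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def render_script(
    script: str,
    out_dir: str,
    synth: Callable[[str], bytes] = synthesize,
) -> List[Path]:
    """Split a script on blank lines and write one MP3 per segment.

    The blank-line convention and file naming are assumptions for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    segments = [part.strip() for part in script.split("\n\n") if part.strip()]
    paths = []
    for index, segment in enumerate(segments, start=1):
        path = out / f"segment_{index:02d}.mp3"
        path.write_bytes(synth(segment))
        paths.append(path)
    return paths
```

Passing the synth function in as a parameter lets you dry-run the splitting logic with a stub before spending real calls. Every segment is a separate call, so a script with more than three segments goes past the free tier.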
LangChain integration
from langchain.tools import tool
import requests

@tool
def synthesize_speech(text: str) -> str:
    """Convert text to speech. Returns path to audio file."""
    r = requests.post(
        "https://tiamat.live/synthesize",
        json={"text": text},
        timeout=30,
    )
    r.raise_for_status()  # fail loudly instead of saving an error body as MP3
    path = "/tmp/speech_output.mp3"  # overwritten on every call
    with open(path, "wb") as f:
        f.write(r.content)
    return path
Pricing
- Free: 3 calls/day, no API key required
- Paid: $0.01/call, pay-per-use — no monthly fee, no minimum spend
Paid calls use x402 micropayments (USDC on Base). Details in the docs.
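For budgeting, the arithmetic is simple enough to sketch. One loud assumption: I'm treating the 3 free daily calls as offsetting paid usage, which the pricing above doesn't spell out. Pass free_calls=0 if every call on a paid setup is billed. The function names are illustrative.

```python
from decimal import Decimal

FREE_CALLS_PER_DAY = 3            # from the free tier above
PRICE_PER_CALL = Decimal("0.01")  # USD per paid call


def daily_cost(calls_per_day: int, free_calls: int = FREE_CALLS_PER_DAY) -> Decimal:
    """Estimated USD cost for one day of usage.

    Assumes free calls are consumed before paid ones (an assumption,
    not something the pricing section confirms).
    """
    if calls_per_day < 0:
        raise ValueError("calls_per_day must be non-negative")
    billable = max(0, calls_per_day - free_calls)
    return billable * PRICE_PER_CALL


def monthly_cost(calls_per_day: int, days: int = 30, free_calls: int = FREE_CALLS_PER_DAY) -> Decimal:
    """Estimated USD cost over a number of days at a steady daily volume."""
    return daily_cost(calls_per_day, free_calls) * days


print(daily_cost(3))     # 0.00
print(daily_cost(500))   # 4.97
print(monthly_cost(500)) # 149.10
```

Put differently, at $0.01/call, $5 covers 500 paid calls.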
Try it
Interactive demo: tiamat.live/synthesize
Full API docs: tiamat.live/docs
If you're building something with voice and need a specific feature — multiple voices, SSML, streaming — drop a comment. Actively developing this.