Self-hosting Kokoro is annoying. Here's what it actually takes:
- Provision a GPU instance with NVIDIA drivers + CUDA
- Set up a Python venv with specific torch versions
- Download the 82M parameter model weights
- Write an inference server that handles concurrent requests without VRAM explosions
- Configure a reverse proxy for SSL
- Pay ~$0.44/hr for the GPU whether you're using it or not
For a prototype or small app, that's 3-6 hours of setup plus ongoing infrastructure cost you pay even when idle.
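The concurrency point in that list is the fiddly part: one model on one GPU means requests have to be serialized (or batched) or VRAM use grows with every in-flight request. A minimal sketch of the serialization pattern, with a stub standing in for the real Kokoro call (the actual model API will differ):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real Kokoro pipeline call.
def synthesize(text: str) -> bytes:
    return b"WAV:" + text.encode()

# One model on one GPU: a lock (or a work queue) serializes inference so
# concurrent requests can't each allocate activation memory at once.
_gpu_lock = threading.Lock()

def handle_request(text: str) -> bytes:
    with _gpu_lock:
        return synthesize(text)

# Simulate a few concurrent HTTP requests hitting the handler.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, ["hello", "world"]))
```

A real server would put this behind an HTTP framework and bound the queue depth, but the lock is the part that stops the VRAM explosions.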
## What I built instead

Kokoro running on an RTX 3090 GPU pod, behind a simple HTTP endpoint:

```shell
curl -X POST https://the-service.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello, this is Kokoro."}' \
  --output speech.wav
```

Returns WAV audio. That's the whole API.
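The same call from Python, using only the standard library (endpoint URL and JSON shape taken from the curl example above):

```python
import json
import urllib.request

def build_tts_request(text: str) -> urllib.request.Request:
    """Build the POST request for the synthesize endpoint."""
    return urllib.request.Request(
        "https://the-service.live/synthesize",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (needs network access):
# with urllib.request.urlopen(build_tts_request("Hello, this is Kokoro.")) as resp:
#     with open("speech.wav", "wb") as f:
#         f.write(resp.read())
```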
## Pricing comparison
| Option | Cost | Notes |
|---|---|---|
| ElevenLabs | $0.015/character | 1,000-char paragraph = $15 |
| Self-hosting Kokoro | ~$0.44/hr (GPU idle) | Plus 3-6hr setup |
| xpay.tools hosted Kokoro | $0.02/call | x402 micropayment |
| the-service.live | $0.01/call | 3 free/day, then $0.01 |
Same Kokoro-82M model quality, at half the price of the nearest hosted competitor. You maintain nothing.
## When self-hosting actually makes sense
- You have a dedicated GPU server already sitting idle
- You need more than ~10,000 calls/day (at that volume, own your infra)
- You need custom voices or fine-tuning on your data
- You need air-gapped / on-premise deployment
For most indie projects and prototypes, none of those apply.
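The volume threshold can be sanity-checked with quick arithmetic (prices taken from the table above; this is back-of-envelope and ignores setup and ops time):

```python
# Rough cost comparison: always-on GPU vs. per-call hosted pricing.
GPU_HOURLY = 0.44                    # self-hosted RTX 3090, $/hr, paid even idle
HOSTED_PER_CALL = 0.01               # hosted endpoint, $/call

gpu_daily = GPU_HOURLY * 24          # ~$10.56/day regardless of traffic
break_even_calls = gpu_daily / HOSTED_PER_CALL   # ~1,056 calls/day
hosted_at_10k = 10_000 * HOSTED_PER_CALL         # ~$100/day at 10k calls/day
```

The raw break-even lands near 1,100 calls/day; the higher ~10,000 figure in the list leaves room for the setup and maintenance overhead this arithmetic ignores.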
## How the endpoint works
Kokoro-82M runs on a RunPod GPU pod (RTX 3090). Requests come in over HTTPS, the model generates speech, WAV comes back. Simple queue.
Payment is an x402 micropayment: $0.01 in USDC on the Base chain, sent alongside the request. No account, no API key, no subscription. Pay per call.
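From the client's side, the x402 flow as I understand the spec is: make the request unpaid, get HTTP 402 back with a price quote, attach a signed payment in the `X-PAYMENT` header, and retry. A sketch with the wallet-signing step stubbed out (a real client needs a wallet library for that part):

```python
# Sketch of the x402 request flow; header name and retry shape follow
# the x402 protocol, the payment signing is a hypothetical stub.
def call_with_x402(send, sign_payment):
    """send(headers) -> (status, body); sign_payment(quote) -> header value."""
    status, body = send({})                      # 1. unauthenticated attempt
    if status == 402:                            # 2. server quotes a price
        payment = sign_payment(body)             # 3. sign $0.01 USDC on Base
        status, body = send({"X-PAYMENT": payment})  # 4. retry with payment
    return status, body

# Demo with a stub transport standing in for the real endpoint.
def fake_send(headers):
    if "X-PAYMENT" in headers:
        return 200, b"RIFF....WAVE"              # WAV bytes
    return 402, {"price": "0.01", "asset": "USDC", "network": "base"}

status, body = call_with_x402(fake_send, lambda quote: "signed-usdc-payload")
```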
## Free tier
3 calls/day, no payment required. Enough to evaluate quality before you commit to anything.
```shell
# Try it right now ('open' is macOS; use xdg-open on Linux)
curl -X POST https://the-service.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Testing Kokoro TTS via hosted API."}' \
  --output test.wav && open test.wav
```
Full docs: the-service.live/docs
What are you building with TTS? Curious whether podcast generation, accessibility features, or AI agent voice output is the common use case here.