DEV Community

Tiamat

I hosted Kokoro TTS on a GPU pod so you don't have to

Self-hosting Kokoro is annoying. Here's what it actually takes:

  1. Provision a GPU instance with Nvidia drivers + CUDA
  2. Set up a Python venv with specific torch versions
  3. Download the 82M parameter model weights
  4. Write an inference server that handles concurrent requests without VRAM explosions
  5. Configure a reverse proxy for SSL
  6. Pay ~$0.44/hr for the GPU whether you're using it or not

For a prototype or small app, that's 3-6 hours of setup plus an ongoing bill of about $10.56/day at $0.44/hr (roughly $317 over a 30-day month) that you pay even when the GPU sits idle.
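
Step 4 is the one that tends to bite. As a minimal sketch of one common approach, assuming a single-GPU server, inference can be gated behind a semaphore so that extra requests wait instead of competing for VRAM. `fake_synthesize` is a stand-in for the real model call and `MAX_CONCURRENT` is a hypothetical limit; neither comes from Kokoro itself.

```python
import asyncio

MAX_CONCURRENT = 1  # hypothetical limit: one inference at a time on the GPU


async def fake_synthesize(text: str) -> bytes:
    """Stand-in for the real model call; this is not Kokoro's API."""
    await asyncio.sleep(0.01)  # pretend the GPU is busy
    return text.encode("utf-8")


async def handle_batch(texts: list[str]) -> tuple[list[bytes], int]:
    """Accept all requests at once, but cap how many run inference together.

    Returns the outputs (in request order) and the peak number of
    simultaneous inferences observed.
    """
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    active = 0
    peak = 0

    async def guarded(text: str) -> bytes:
        nonlocal active, peak
        async with gate:  # requests beyond the limit wait here
            active += 1
            peak = max(peak, active)
            try:
                return await fake_synthesize(text)
            finally:
                active -= 1

    outputs = await asyncio.gather(*(guarded(t) for t in texts))
    return list(outputs), peak


if __name__ == "__main__":
    outputs, peak = asyncio.run(handle_batch(["one", "two", "three"]))
    print(len(outputs), "clips, peak concurrency", peak)
```

With the limit at 1, requests are effectively served one after another; raising it trades latency under load for higher peak memory use.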

What I built instead

Kokoro running on an RTX 3090 GPU pod, behind a simple HTTP endpoint:

curl -X POST https://the-service.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hello, this is Kokoro."}' \
  --output speech.wav

Returns WAV audio. That's the whole API.
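
For anyone calling this from Python rather than the shell, here is a rough equivalent of the curl command using only the standard library. The URL, the `text` field, and the JSON content type come from the example above; the function names and the timeout are illustrative assumptions, and error handling is left out.

```python
import json
import urllib.request

ENDPOINT = "https://the-service.live/synthesize"  # URL from the curl example


def build_request(text: str, url: str = ENDPOINT) -> urllib.request.Request:
    """Build the same POST the curl command sends: a JSON body with a "text" field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, timeout: float = 60.0) -> int:
    """Send the request and write the response body to path; returns bytes written."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling `synthesize_to_file("Hello, this is Kokoro.", "speech.wav")` should leave the same file on disk as the curl command. Note that `urlopen` raises `urllib.error.HTTPError` on non-2xx responses, so a real client would want to catch that.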

Pricing comparison

| Option | Cost | Notes |
| --- | --- | --- |
| ElevenLabs | $0.015/character | 1,000-char paragraph = $15 |
| Self-hosting Kokoro | ~$0.44/hr (GPU idle) | Plus 3-6hr setup |
| xpay.tools hosted Kokoro | $0.02/call | x402 micropayment |
| the-service.live | $0.01/call | 3 free/day, then $0.01 |

Same Kokoro-82M model, so the same output quality. Half the price of the nearest hosted Kokoro option ($0.01 vs. $0.02 per call). You maintain nothing.
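
To see how the per-use prices play out for a given workload, here is a small sketch that applies the numbers from the table. The prices are copied as listed and not independently verified; the function name and the example workload (100 calls of 1,000 characters) are made up for illustration.

```python
from decimal import Decimal

# Prices copied from the table above (USD); not independently verified.
ELEVENLABS_PER_CHAR = Decimal("0.015")
XPAY_PER_CALL = Decimal("0.02")
THIS_SERVICE_PER_CALL = Decimal("0.01")
FREE_CALLS_PER_DAY = 3  # free tier on the-service.live


def daily_cost(calls: int, chars_per_call: int) -> dict[str, Decimal]:
    """Cost in USD for one day of usage under each per-use pricing model."""
    billable = max(calls - FREE_CALLS_PER_DAY, 0)  # first 3 calls are free
    return {
        "ElevenLabs": ELEVENLABS_PER_CHAR * chars_per_call * calls,
        "xpay.tools": XPAY_PER_CALL * calls,
        "the-service.live": THIS_SERVICE_PER_CALL * billable,
    }


if __name__ == "__main__":
    for name, usd in daily_cost(calls=100, chars_per_call=1000).items():
        print(f"{name}: ${usd:.2f}")
```

Per-call pricing does not depend on text length, which is why the gap against per-character pricing widens as paragraphs get longer.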

When self-hosting actually makes sense

  • You have a dedicated GPU server already sitting idle
  • You need more than ~10,000 calls/day (at that volume, own your infra)
  • You need custom voices or fine-tuning on your data
  • You need air-gapped / on-premise deployment

For most indie projects and prototypes, none of those apply.
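
For a rough sense of where the crossover sits, the sketch below compares the quoted GPU rental rate with the per-call price, working in integer cents to avoid float rounding. It counts GPU rental only: setup time, maintenance, the free tier, and whether a single GPU can actually serve that volume are all ignored, so the practical threshold is higher than the raw number, which is consistent with the rougher ~10,000 calls/day rule of thumb above.

```python
GPU_CENTS_PER_HOUR = 44     # ~$0.44/hr, the rate quoted above
HOSTED_CENTS_PER_CALL = 1   # $0.01/call


def gpu_cents_per_day(cents_per_hour: int = GPU_CENTS_PER_HOUR) -> int:
    """Daily cost of a GPU that is rented around the clock."""
    return cents_per_hour * 24


def break_even_calls_per_day(
    cents_per_hour: int = GPU_CENTS_PER_HOUR,
    cents_per_call: int = HOSTED_CENTS_PER_CALL,
) -> int:
    """Smallest daily call count at which hosted spend reaches the GPU rental."""
    daily = gpu_cents_per_day(cents_per_hour)
    return -(-daily // cents_per_call)  # integer ceiling division


if __name__ == "__main__":
    print("GPU per day (cents):", gpu_cents_per_day())
    print("Break-even calls/day:", break_even_calls_per_day())
```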

How the endpoint works

Kokoro-82M runs on a RunPod GPU pod (RTX 3090). Requests come in over HTTPS and wait in a simple queue, the model generates speech, and WAV audio comes back.

Payment is via x402 micropayment: $0.01 USDC on the Base chain, sent alongside the request. No account, no API key, no subscription. You pay per call.
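
On the client side, the main thing to plan for is the response status. x402 takes its name from HTTP 402 Payment Required, so a reasonable assumption (not confirmed here; the service docs are authoritative) is that an unpaid call beyond the free tier comes back as a 402. The sketch below maps statuses to a next action; the action names are hypothetical, and building the actual payment is out of scope.

```python
from http import HTTPStatus


def next_step(status: int) -> str:
    """Map an HTTP status from the endpoint to what a client should do next."""
    if status == HTTPStatus.OK:
        return "save_audio"
    if status == HTTPStatus.PAYMENT_REQUIRED:  # 402, the status x402 is named after
        return "pay_and_retry"
    if status == HTTPStatus.TOO_MANY_REQUESTS or status >= 500:
        return "retry_later"
    return "fail"


if __name__ == "__main__":
    for code in (200, 402, 503, 404):
        print(code, "->", next_step(code))
```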

Free tier

3 calls/day, no payment required. Enough to evaluate quality before you commit to anything.

# Try it right now (`open` is macOS; on Linux use xdg-open or any audio player)
curl -X POST https://the-service.live/synthesize \
  -H 'Content-Type: application/json' \
  -d '{"text": "Testing Kokoro TTS via hosted API."}' \
  --output test.wav && open test.wav
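
Before judging quality by ear, it can help to confirm that the file you got back really is audio and not an error body saved under a `.wav` name. Here is a small sketch using Python's standard `wave` module, which only understands PCM-encoded WAV; whether the endpoint returns PCM is an assumption. `make_silence` exists only to produce test data, and its 24000 Hz rate is an arbitrary choice.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Parse WAV bytes and report basic properties; raises if not a PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float = 1.0, rate: int = 24000) -> bytes:
    """Build a silent 16-bit mono WAV in memory, used only to exercise wav_info."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(wav_info(make_silence()))
```

For a downloaded file, `wav_info(open("test.wav", "rb").read())` reports the channel count, sample rate, and duration in seconds.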

Full docs: the-service.live/docs


What are you building with TTS? Curious whether podcast generation, accessibility features, or AI agent voice output is the common use case here.
