Om Prakash

We Built a Voice Design API That's 6x Cheaper Than ElevenLabs

The Problem with TTS Today

Most text-to-speech APIs fall into two camps:

  1. Expensive but good: ElevenLabs, Azure Neural TTS - great quality but $0.15-0.30/min
  2. Cheap but limited: edge-tts, Google Cloud TTS - free/cheap but no voice customization

What if you could generate any voice you wanted from a text description alone - at 1/6th the cost of ElevenLabs?

That's what we built with VoxCPM2 on PixelAPI.


What We Built

A self-hosted VoxCPM2 TTS API with two modes:

Voice Design: Generate speech using only a text description, with no reference audio needed. The description goes in parentheses at the start of the input text:

(A warm elderly man, gentle voice) Hello, welcome to our store.
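The parenthesized-description format shown above can be assembled with a small helper. This is a minimal sketch: `voice_design_text` is a hypothetical name of our own, not part of any PixelAPI client, and the only thing it assumes is the "(description) text" layout used in this post's examples.

```python
def voice_design_text(description: str, text: str) -> str:
    """Prefix the spoken text with a parenthesized voice description.

    Hypothetical helper: it only mirrors the "(description) text"
    layout used in this post's examples.
    """
    description = description.strip()
    text = text.strip()
    if not description or not text:
        raise ValueError("both description and text must be non-empty")
    return f"({description}) {text}"


prompt = voice_design_text(
    "A warm elderly man, gentle voice",
    "Hello, welcome to our store.",
)
print(prompt)
```

Keeping the description and the spoken text as separate arguments makes it easy to reuse one voice description across many lines of script.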

Voice Cloning: Upload a reference audio clip and clone the voice for consistent brand voices.


Voice Design Quality - Real Benchmarks

We tested 8 different voice profiles, scoring each out of 10 with a vision-based quality check (QC) on its spectrogram. Four of the results:

| Voice | Score | Notes |
|---|---|---|
| Young Woman (gentle) | 8/10 | Clear formants, good dynamics |
| Child (curious) | 8/10 | Good high-end presence, no artifacts |
| Villain (dark) | 7/10 | Minor HF noise, good menace |
| Elderly Man (wise) | 7/10 | Slightly weak volume but clean |

Speed on Real Hardware

Benchmarked on RTX 6000 Ada:

| Audio Duration | Generation Time | RTF |
|---|---|---|
| 10 seconds | ~4 seconds | 0.38 |
| 25 seconds | ~9 seconds | 0.38 |

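RTF (real-time factor) is generation time divided by audio duration, so anything below 1.0 is faster than real time. An RTF of 0.38 means audio is generated about 2.6x faster than it plays back.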
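For readers who want to check the numbers, here is a small sketch of the RTF calculation. The function name is our own, and the inputs are the rounded figures from the table, so the results land near the reported 0.38 rather than exactly on it.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# (generation time, audio duration) pairs, rounded as in the table above
for gen, audio in [(4, 10), (9, 25)]:
    rtf = real_time_factor(gen, audio)
    print(f"{audio}s of audio in ~{gen}s: RTF {rtf:.2f}, {1 / rtf:.1f}x real time")
```

With the rounded table values this gives RTFs of 0.40 and 0.36, which bracket the measured 0.38.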


Pricing - The 2x Cheaper Rule

| Provider | Voice Design | Voice Cloning |
|---|---|---|
| ElevenLabs | $0.30/min | $0.30/min + subscription |
| Play.ht | $0.036/min | $0.10/min |
| PixelAPI | $0.050/min | $0.100/min |

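At these rates, voice design on PixelAPI is 6x cheaper than ElevenLabs ($0.050/min vs. $0.30/min) and voice cloning is 3x cheaper ($0.100/min vs. $0.30/min, before ElevenLabs' subscription fee), and we're self-hosted.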
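To see how the per-minute rates translate into a bill, here is a rough sketch using the prices from the table. The function names and the 1,000-minute volume are illustrative assumptions, and ElevenLabs' subscription fee is left out because the table does not give its amount.

```python
from decimal import Decimal

# Per-minute rates in USD, copied from the pricing table.
# ElevenLabs' subscription fee for cloning is not included.
RATES = {
    "ElevenLabs": {"design": Decimal("0.30"), "cloning": Decimal("0.30")},
    "Play.ht": {"design": Decimal("0.036"), "cloning": Decimal("0.10")},
    "PixelAPI": {"design": Decimal("0.050"), "cloning": Decimal("0.100")},
}


def price_ratio(provider: str, mode: str, baseline: str = "PixelAPI") -> Decimal:
    """How many times more expensive `provider` is than `baseline`."""
    return RATES[provider][mode] / RATES[baseline][mode]


def usage_cost(provider: str, mode: str, minutes: int) -> Decimal:
    """Cost in USD for `minutes` of generated audio at the listed rate."""
    return RATES[provider][mode] * minutes


minutes = 1000  # illustrative monthly volume, not a figure from this post
for provider in RATES:
    cost = usage_cost(provider, "design", minutes)
    print(f"{provider}: ${cost:.2f} for {minutes} min of voice design")
print(f"ElevenLabs / PixelAPI (design): {price_ratio('ElevenLabs', 'design')}x")
```

`Decimal` is used instead of floats so that the ratio comes out as exactly 6 rather than a value like 5.999999999999999.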


API Example - Python

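import requests

# Submit a Voice Design request; the parenthesized prefix describes the voice.
resp = requests.post(
    "https://api.pixelapi.dev/v1/tts/generate",
    headers={"Authorization": "Bearer YOUR_KEY"},  # replace with your API key
    data={"text": "(A warm elderly man, gentle) Hello everyone.", "language": "en"},
    timeout=30,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # fail loudly on HTTP errors (4xx/5xx)

# The response describes the generation job; print its ID.
job = resp.json()
print(f"Job: {job['id']}")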
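Because the response carries a job ID, generation is presumably asynchronous, and a client would need to wait for the job to finish. The sketch below shows one generic way to do that. Everything about the status check is an assumption: this post does not document a status endpoint, so the caller supplies a `fetch_status` function, and the `"status"`, `"completed"`, and `"failed"` names are hypothetical placeholders to be replaced with whatever the real API returns.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll a job until it reaches a terminal state or attempts run out.

    `fetch_status` is supplied by the caller and should return the job
    record as a dict. The field and state names checked here are
    assumptions, not documented PixelAPI values.
    """
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Usage with a stand-in status function (no network involved):
fake_responses = iter([{"status": "processing"}, {"status": "completed"}])
result = wait_for_job(lambda _job_id: next(fake_responses), "demo-job", interval=0)
print(result["status"])
```

Passing the status lookup in as a function keeps the polling logic independent of the HTTP details and makes it easy to test offline.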

30 Languages Supported

VoxCPM2 supports 30 languages including English, Hindi, Chinese, Spanish, French, German, Japanese, Korean, Russian, Arabic, and more.

The API is live at api.pixelapi.dev with 100 free credits on signup.