If you are tired of paying per-character fees for TTS services, you can now run OpenBMB’s VoxCPM2 locally. This 2-billion-parameter model drops traditional audio tokenization in favor of a diffusion-autoregressive architecture that produces 48 kHz, studio-quality audio. It supports 30 languages and offers features such as zero-shot voice design.
The Technical Advantage
Unlike two-stage pipelines that first encode speech into discrete audio tokens and then decode them, VoxCPM2 stays within the continuous latent space of a learned audio VAE. Skipping quantization helps preserve nuances like breath patterns and mid-sentence emotional shifts that tend to be lost when audio is discretized.
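To make the quantization point concrete, here is a toy sketch in plain Python. It is not VoxCPM2's code and does not use its VAE: the 4-dimensional "latents" and the 16-entry codebook are made-up stand-ins. It only shows that snapping a continuous vector to its nearest codebook entry leaves a residual error, which is the information a token-based pipeline discards.

```python
import math
import random

def nearest_code(vec, codebook):
    """Return the codebook entry closest to vec (Euclidean distance)."""
    return min(codebook, key=lambda code: math.dist(vec, code))

def quantization_error(latents, codebook):
    """Mean distance between each latent and its nearest codebook entry."""
    total = sum(math.dist(v, nearest_code(v, codebook)) for v in latents)
    return total / len(latents)

rng = random.Random(0)
# Hypothetical "latent frames": 200 continuous 4-dimensional vectors.
latents = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
# Hypothetical codebook: 16 entries drawn from the same distribution.
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]

# A model that keeps the latents continuous has no such residual.
print(f"mean error after quantization: {quantization_error(latents, codebook):.3f}")
```

Real codecs use far larger and often stacked codebooks, so the error is much smaller than in this toy, but it never reaches zero.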
Prerequisites
To run this efficiently, ensure your environment meets these requirements:
- Python 3.10 or 3.11
- CUDA 12.0+
- PyTorch 2.5.0+
- 8 GB VRAM (bfloat16 precision)
- FFmpeg installed and available on your PATH
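Before installing anything, a quick script can confirm most of the list above. This is a minimal sketch, not part of VoxCPM2: the function name `check_environment` is made up for illustration, and it only reports what it finds rather than enforcing the version numbers. It tolerates PyTorch not being installed yet.

```python
import shutil
import sys

def check_environment():
    """Collect a few facts about the local setup; nothing here is VoxCPM2-specific."""
    report = {
        "python_ok": sys.version_info[:2] in ((3, 10), (3, 11)),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_version": None,
        "cuda_available": False,
        "gpu_vram_gb": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed yet
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Total memory of the first GPU, converted from bytes to GiB.
        props = torch.cuda.get_device_properties(0)
        report["gpu_vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Compare the printed `torch_version` and `gpu_vram_gb` against the requirements by eye; the CUDA toolkit version itself is not checked here.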
Setup and Inference
Start by creating a virtual environment, then install the package:
python3 -m venv voxcpm-env
source voxcpm-env/bin/activate
pip install voxcpm
To test your installation and pull the model weights, run:
python -c "from voxcpm import VoxCPM2; m = VoxCPM2(); print('ready')"
Serving via OpenAI-Compatible API
VoxCPM2 supports vLLM-Omni, which exposes a /v1/audio/speech endpoint, so you can point an existing OpenAI SDK client at your local GPU by changing only its base URL:
pip install "vllm==0.19.0" vllm-omni
vllm serve openbmb/VoxCPM2 --omni --port 8000
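Loading the weights can take a while, so it helps to wait until the server answers before sending requests. The sketch below polls the model-listing route with the standard library only. It assumes the vLLM-Omni server exposes the same `/v1/models` route as a regular vLLM OpenAI-compatible server; the function name `wait_for_server` and the timeout values are illustrative.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url="http://localhost:8000", timeout_s=300.0, poll_s=2.0):
    """Poll GET {base_url}/v1/models until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet, or returned an error
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling `wait_for_server()` right after starting `vllm serve` returns `True` once the route responds and `False` if the timeout passes first.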
Exposing via Pinggy
Since your local port 8000 isn't public, use Pinggy to create a secure tunnel without complex firewall configuration:
ssh -p 443 -R0:localhost:8000 free.pinggy.io
This command returns a public HTTPS URL. You can even secure this endpoint with basic auth if you are sharing it with a small team:
ssh -p 443 -R0:localhost:8000 a.pinggy.io -t "b:myuser:mypassword"
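Once basic auth is on, every client has to send the credentials. The sketch below assumes the tunnel enforces standard HTTP Basic authentication (RFC 7617); the helper names are made up for illustration, and the URL is a placeholder for whatever Pinggy prints. It builds the request without sending it.

```python
import base64
import urllib.request

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def build_request(url, username, password):
    """Return a urllib Request with the credentials attached (not sent yet)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request

# Placeholder URL and the example credentials from the command above.
req = build_request("https://your-pinggy-url/v1/models", "myuser", "mypassword")
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would then send it through the tunnel. Keep in mind that a password typed on the ssh command line ends up in your shell history.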
Integration
Once the tunnel is active, call your local instance from any environment just as you would with a cloud provider:
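from openai import OpenAI

# Point the SDK at the tunnel URL printed by Pinggy (placeholder below).
client = OpenAI(
    base_url="https://your-pinggy-url/v1",
    api_key="not-needed",  # the SDK requires some value here
)

# Request speech from the locally served model.
response = client.audio.speech.create(
    model="openbmb/VoxCPM2",
    voice="default",
    input="Local GPU deployment is live!",
)

# Write the returned audio bytes to disk.
response.write_to_file("output.mp3")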
