Self-Hosting High-Fidelity TTS: Deploying VoxCPM2 with a Public Pinggy Tunnel

#voiceai #python #vllm #selfhosting

If you are tired of paying per-character fees for TTS services, you can now run OpenBMB’s VoxCPM2 locally. This 2-billion-parameter model drops traditional audio tokenization, opting instead for a diffusion autoregressive architecture that produces 48 kHz, studio-quality audio. It supports 30 languages and offers advanced features like zero-shot voice design.

The Technical Advantage

Two-stage, codec-based TTS systems first quantize speech into discrete audio tokens and then decode those tokens back into a waveform. VoxCPM2 skips the discretization step and models speech directly in the continuous latent space of a learned audio VAE. Avoiding quantization helps preserve nuances such as breath patterns and mid-sentence emotional shifts, which a finite token vocabulary tends to flatten.
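To make the quantization argument concrete, here is a toy illustration in plain Python. It is not a model of VoxCPM2 or of any real codec. It snaps a signal that carries a small detail onto a 16-level grid, a stand-in for a discrete token vocabulary, and measures the error against keeping the values continuous. Real codecs use learned vector quantizers with far larger codebooks, so their loss is much smaller than in this sketch.

```python
import math


def quantize(x: float, levels: int) -> float:
    """Snap x in [-1, 1] to the nearest of `levels` evenly spaced values."""
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0


def rms_error(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# A slow oscillation with a small, fast "nuance" riding on top of it.
signal = [
    0.8 * math.sin(2 * math.pi * t / 32) + 0.03 * math.sin(2 * math.pi * t / 3)
    for t in range(64)
]

continuous = list(signal)                   # kept in continuous space: nothing discarded
coarse = [quantize(x, 16) for x in signal]  # 16-entry toy "codebook"

print(f"continuous RMS error: {rms_error(signal, continuous):.4f}")
print(f"16-level RMS error:   {rms_error(signal, coarse):.4f}")
```

The small detail has an amplitude below half a grid step, so the quantized version often erases it entirely, while the continuous copy keeps it intact.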

Prerequisites

To run this efficiently, ensure your environment meets these requirements:

Python 3.10 or 3.11
CUDA 12.0 or newer
PyTorch 2.5.0 or newer
An NVIDIA GPU with at least 8 GB of VRAM (for bfloat16 inference)
FFmpeg available on your PATH
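Before installing anything, you can check these requirements with a short script. This is a sketch, not part of the voxcpm package: python_ok, ffmpeg_ok and gpu_report are hypothetical helper names, and the thresholds simply mirror the list above. It uses only standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_properties) and skips the GPU check if PyTorch is not installed yet.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 11)  # upper bound per the article's prerequisites
MIN_VRAM_GB = 8


def python_ok(version=sys.version_info) -> bool:
    """True if the major.minor version falls inside the supported range."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def ffmpeg_ok() -> bool:
    """True if an ffmpeg executable is reachable through PATH."""
    return shutil.which("ffmpeg") is not None


def gpu_report() -> str:
    """Describe the PyTorch/CUDA setup and whether the first GPU has enough VRAM."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA not available"
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1e9  # decimal GB, as on spec sheets
    status = "OK" if vram_gb >= MIN_VRAM_GB else f"below {MIN_VRAM_GB} GB"
    return (
        f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"{props.name} with {vram_gb:.1f} GB VRAM ({status})"
    )


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "unsupported")
    print("FFmpeg", "found" if ffmpeg_ok() else "missing from PATH")
    print(gpu_report())
```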

Setup and Inference

Start by creating a virtual environment, then install the package:

python3 -m venv voxcpm-env
source voxcpm-env/bin/activate
pip install voxcpm

To test your installation and pull the model weights, run:

python -c "from voxcpm import VoxCPM2; m = VoxCPM2(); print('ready')"

Serving via OpenAI-Compatible API

VoxCPM2 can be served with vLLM-Omni, which exposes an OpenAI-compatible /v1/audio/speech endpoint. Existing code that uses the OpenAI SDK can then target your local GPU by changing only its base URL:

pip install "vllm==0.19.0" vllm-omni
vllm serve openbmb/VoxCPM2 --omni --port 8000
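The first launch may spend a while downloading weights and loading the model before the endpoint answers. A small stdlib-only readiness check can poll the server before you open a tunnel to it. This sketch assumes vLLM-Omni keeps vLLM's standard GET /v1/models route; models_url and wait_until_ready are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a server root such as http://localhost:8000."""
    return base_url.rstrip("/") + "/v1/models"


def wait_until_ready(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET /v1/models until it answers 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(models_url(base_url), timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server still starting, or not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready("http://localhost:8000") returns True once the model list responds. The 600-second default is an arbitrary allowance for the initial download, not a documented startup time.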

Exposing via Pinggy

Since your local port 8000 isn't public, use Pinggy to create a secure tunnel without complex firewall configuration:

ssh -p 443 -R0:localhost:8000 free.pinggy.io

The command prints a public HTTPS URL that forwards to port 8000. If you are sharing the endpoint with a small team, you can protect it with HTTP basic authentication:

ssh -p 443 -t -R0:localhost:8000 a.pinggy.io "b:myuser:mypassword"
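With basic auth enabled, every client must send an "Authorization: Basic ..." header, while the OpenAI SDK sends "Authorization: Bearer <api_key>" by default. One way to reconcile the two, offered as an assumption rather than a documented recipe, is to override that header through the SDK's default_headers option; recent openai-python releases let custom headers take precedence over the Bearer header, but verify this against your installed version. It also assumes vLLM was started without --api-key, since no Bearer token reaches it anymore. basic_auth_header and make_client are hypothetical helpers.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def make_client(base_url: str, user: str, password: str):
    """Hypothetical helper: an OpenAI client that sends Basic credentials to the tunnel."""
    from openai import OpenAI  # imported lazily so the header helper works without the SDK

    return OpenAI(
        base_url=base_url,
        api_key="not-needed",
        # Assumed to replace the SDK's default "Bearer <api_key>" header.
        default_headers={"Authorization": basic_auth_header(user, password)},
    )
```

With the credentials from the command above, basic_auth_header("myuser", "mypassword") yields "Basic bXl1c2VyOm15cGFzc3dvcmQ=".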

Integration

Once the tunnel is active, call your local instance from any environment just as you would with a cloud provider:

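from openai import OpenAI

# Point the standard OpenAI client at the tunneled vLLM-Omni server.
client = OpenAI(
    base_url="https://your-pinggy-url/v1",
    api_key="not-needed",  # any placeholder works unless the server was started with --api-key
)

# Stream the synthesized audio straight to disk.
with client.audio.speech.with_streaming_response.create(
    model="openbmb/VoxCPM2",
    voice="default",
    input="Local GPU deployment is live!",
) as response:
    response.stream_to_file("output.mp3")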
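Under the hood, the SDK call is a JSON POST to /v1/audio/speech, so environments without the SDK can make the same request with the standard library. The sketch below assumes vLLM-Omni accepts the same JSON fields as OpenAI's speech endpoint; build_speech_request and synthesize are hypothetical helper names, and response_format is omitted unless you pass one, leaving the choice to the server.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    base_url: str,
    text: str,
    model: str = "openbmb/VoxCPM2",
    voice: str = "default",
    response_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST equivalent to client.audio.speech.create(...)."""
    payload = {"model": model, "input": text, "voice": voice}
    if response_format is not None:
        payload["response_format"] = response_format
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",  # base_url already ends in /v1
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer not-needed"},
        method="POST",
    )


def synthesize(base_url: str, text: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(build_speech_request(base_url, text), timeout=300) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```

For example, synthesize("https://your-pinggy-url/v1", "Hello from my GPU", "hello.mp3") writes whatever audio bytes the server returns, so check which format your server actually produces before relying on the file extension.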