I just open-sourced VoxCast, a lightweight app for generating synthetic multi-turn podcast episodes from:
- a reference voice sample
- a persona prompt
- a topic prompt
The core thing I think is worth sharing: it runs locally on CPU only, with relatively low memory usage.
A lot of voice and speech demos assume access to a GPU, which is fine for labs and hosted products but less useful for local experimentation. I wanted something that could run on inexpensive hardware, stay local, and still be good enough for fast prototyping.
What VoxCast does
The workflow is straightforward:
- Upload a short reference voice sample
- Set a persona and topic
- Generate a back-and-forth podcast episode
- Download the result
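Under the hood, the key idea is that an episode is just a turn-structured script before any audio is synthesized. The sketch below is illustrative only, not VoxCast's actual API: the `Turn` and `build_script` names are hypothetical, and it shows only the round-robin turn structure, not the voice-cloning step.

```python
# Hypothetical sketch of a two-host episode script. These names are
# illustrative, not VoxCast's real interface.
from dataclasses import dataclass

@dataclass
class Turn:
    host: str   # which synthetic host speaks this line
    text: str   # text to synthesize in that host's cloned voice

def build_script(hosts: list[str], lines: list[str]) -> list[Turn]:
    """Alternate lines between hosts in round-robin order."""
    return [Turn(host=hosts[i % len(hosts)], text=line)
            for i, line in enumerate(lines)]

script = build_script(
    ["Ada", "Lin"],
    ["Welcome to the show.",
     "Thanks! Today: running voice models on CPU.",
     "So why does CPU-only matter?"],
)
for turn in script:
    print(f"{turn.host}: {turn.text}")
```

Once the script exists as data, each turn can be fed to whatever local TTS backend you like and the clips concatenated into the final episode.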
Why I built it
I wanted to test a simple idea: can you turn voice cloning into a usable content primitive instead of a one-off demo?
Not just “generate audio from text,” but:
generate a structured conversation with synthetic hosts, locally, without needing a GPU box.
Why CPU-only matters
For me, this is the interesting part:
- lower infrastructure cost
- easier local development
- easier demos on cheap devices
- fewer deployment constraints
- better fit for tinkering and rapid iteration
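One practical note on CPU-only inference in general (not VoxCast-specific): many inference runtimes, including PyTorch and ONNX Runtime, respect the `OMP_NUM_THREADS` environment variable, so pinning a thread budget before loading a model keeps a laptop responsive during generation. The helper name below is mine, for illustration.

```python
# Generic CPU-inference sketch: reserve a core for the rest of the
# system and export the budget before any model library is imported.
import os

def cpu_thread_budget(reserve: int = 1) -> int:
    """Return a thread count that leaves `reserve` cores free."""
    total = os.cpu_count() or 1
    return max(1, total - reserve)

# Only sets the variable if the user hasn't already chosen a value.
os.environ.setdefault("OMP_NUM_THREADS", str(cpu_thread_budget()))
print(os.environ["OMP_NUM_THREADS"])
```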
Repo
GitHub: https://github.com/chrisk60331/VoxCast
I posted a demo below.
Interested in feedback, especially from people building local-first voice apps or lightweight inference workflows.