DEV Community

Chris King

Open-Sourcing VoxCast: CPU-Only Multi-Turn Podcast Generation With Low Memory Usage

I just open-sourced VoxCast, a lightweight app for generating synthetic multi-turn podcast episodes from:

  1. a reference voice sample
  2. a persona prompt
  3. a topic prompt

The core thing I think is worth sharing: it runs locally on CPU only, with relatively low memory usage.

A lot of voice and speech demos assume access to a GPU, which is fine for labs and hosted products but less useful for local experimentation. I wanted something that could run on inexpensive hardware, stay local, and still be good enough for fast prototyping.

What VoxCast does

The workflow is straightforward:

  1. Upload a short reference voice sample
  2. Set a persona and topic
  3. Generate a back-and-forth podcast episode
  4. Download the result
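To make the shape of that workflow concrete, here is a minimal sketch of how the three inputs and a back-and-forth episode could be modeled. It is an illustration only, not VoxCast's actual code or API: `EpisodeRequest`, `Turn`, `assemble_turns`, the speaker labels, and the file name are all hypothetical, and the script lines are hard-coded where a real pipeline would generate them from the persona and topic before synthesizing audio.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class EpisodeRequest:
    """Hypothetical container for the three inputs (not VoxCast's real API)."""
    reference_voice_path: str  # short reference voice sample
    persona: str               # persona prompt
    topic: str                 # topic prompt


@dataclass
class Turn:
    """One spoken turn in the episode."""
    speaker: str
    text: str


def assemble_turns(lines, speakers=("Host A", "Host B")):
    """Assign script lines to alternating speakers, preserving order."""
    return [Turn(speaker, line) for speaker, line in zip(cycle(speakers), lines)]


# Placeholder inputs; the path and prompts are made up for illustration.
request = EpisodeRequest(
    reference_voice_path="voice_sample.wav",
    persona="a curious science journalist",
    topic="CPU-only speech synthesis",
)

# Stand-in script; a real pipeline would derive this from persona and topic.
script = [
    f"Welcome back. Today's topic is {request.topic}.",
    "Glad to be here. Let's start with why running locally matters.",
    "Good place to begin.",
]

turns = assemble_turns(script)
for turn in turns:
    print(f"{turn.speaker}: {turn.text}")
```

Each `Turn` would then be handed to whatever text-to-speech step clones the reference voice, and the resulting clips concatenated into the downloadable episode.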

Why I built it

I wanted to test a simple idea: can you turn voice cloning into a usable content primitive instead of a one-off demo?

Not just “generate audio from text,” but generate a structured conversation with synthetic hosts, locally, without needing a GPU box.

Why CPU-only matters

For me, this is the interesting part:

  - lower infrastructure cost
  - easier local development
  - easier demos on cheap devices
  - fewer deployment constraints
  - better fit for tinkering and rapid iteration

Repo

GitHub: https://github.com/chrisk60331/VoxCast

I posted a demo below.

Interested in feedback, especially from people building local-first voice apps or lightweight inference workflows.
