I just open-sourced VoxCast, a lightweight app for generating synthetic multi-turn podcast episodes from:
- a reference voice sample
- a persona prompt
- a topic prompt
The core thing I think is worth sharing: it runs locally on CPU only, with relatively low memory usage.
A lot of voice and speech demos assume access to a GPU, which is fine for labs and hosted products but less useful for local experimentation. I wanted something that could run on inexpensive hardware, stay local, and still be good enough for fast prototyping.
What VoxCast does
The workflow is straightforward:
- Upload a short reference voice sample
- Set a persona and topic
- Generate a back-and-forth podcast episode
- Download the result
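Under the hood, the key idea is that an episode is just a turn-structured script before any audio is synthesized. The sketch below is illustrative only, not VoxCast's actual API: the `Turn` and `build_script` names are hypothetical, and it shows only the round-robin turn structure, not the voice-cloning step.

```python
# Hypothetical sketch of a two-host episode script. These names are
# illustrative, not VoxCast's real interface.
from dataclasses import dataclass

@dataclass
class Turn:
    host: str   # which synthetic host speaks this line
    text: str   # text to synthesize in that host's cloned voice

def build_script(hosts: list[str], lines: list[str]) -> list[Turn]:
    """Alternate lines between hosts in round-robin order."""
    return [Turn(host=hosts[i % len(hosts)], text=line)
            for i, line in enumerate(lines)]

script = build_script(
    ["Ada", "Lin"],
    ["Welcome to the show.",
     "Thanks! Today: running voice models on CPU.",
     "So why does CPU-only matter?"],
)
for turn in script:
    print(f"{turn.host}: {turn.text}")
```

Once the script exists as data, each turn can be fed to whatever local TTS backend you like and the clips concatenated into the final episode.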
Why I built it
I wanted to test a simple idea: can you turn voice cloning into a usable content primitive instead of a one-off demo?
Not just “generate audio from text,” but:
generate a structured conversation with synthetic hosts, locally, without needing a GPU box.
Why CPU-only matters
For me, this is the interesting part:
- lower infrastructure cost
- easier local development
- easier demos on cheap devices
- fewer deployment constraints
- better fit for tinkering and rapid iteration
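One practical note on CPU-only inference in general (not VoxCast-specific): many inference runtimes, including PyTorch and ONNX Runtime, respect the `OMP_NUM_THREADS` environment variable, so pinning a thread budget before loading a model keeps a laptop responsive during generation. The helper name below is mine, for illustration.

```python
# Generic CPU-inference sketch: reserve a core for the rest of the
# system and export the budget before any model library is imported.
import os

def cpu_thread_budget(reserve: int = 1) -> int:
    """Return a thread count that leaves `reserve` cores free."""
    total = os.cpu_count() or 1
    return max(1, total - reserve)

# Only sets the variable if the user hasn't already chosen a value.
os.environ.setdefault("OMP_NUM_THREADS", str(cpu_thread_budget()))
print(os.environ["OMP_NUM_THREADS"])
```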
Repo
GitHub: https://github.com/chrisk60331/VoxCast
I posted a demo below.
Interested in feedback, especially from people building local-first voice apps or lightweight inference workflows.