Every AI coding assistant assumes you'll type. GitHub Copilot, Continue, Kiro — they all wait for a typed prompt. But what if you could just talk?
That's why I built VoxPilot.
The Problem
I spend a lot of time typing prompts like "refactor this function to use async/await with proper error handling and add unit tests." That's 15 seconds of typing for something I could say in 3 seconds.
For developers with RSI or carpal tunnel, the problem is worse. Typing isn't just slow — it's painful.
The Solution
VoxPilot is a VS Code extension that captures your voice, transcribes it locally using Moonshine ASR, and sends the text to your coding assistant.
The key word is "locally." Your audio never leaves your machine. There are no API keys, no cloud calls, no telemetry. The ASR model is 27MB and runs via ONNX Runtime.
How It Works
Microphone → PCM Audio → Voice Activity Detection → Moonshine ASR → Text → VS Code Chat
Audio Capture: Native CLI tools (arecord on Linux, sox on macOS, ffmpeg on Windows) capture raw PCM audio at 16kHz.
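Each of those tools can stream 16 kHz, 16-bit, mono PCM straight to stdout. Here's a rough sketch of how the platform dispatch might look (a hypothetical helper, not VoxPilot's actual code — and the Windows device name is a placeholder that varies per machine):

```typescript
interface CaptureCommand {
  cmd: string;
  args: string[];
}

// Pick a capture command per platform. Every variant emits raw
// 16 kHz / 16-bit signed / mono PCM on stdout, ready for the VAD stage.
function captureCommand(platform: string): CaptureCommand {
  switch (platform) {
    case "linux":
      // arecord writes to stdout when no output file is given.
      return { cmd: "arecord", args: ["-t", "raw", "-f", "S16_LE", "-r", "16000", "-c", "1"] };
    case "darwin":
      // -d records from the default input device; "-" streams to stdout.
      return { cmd: "sox", args: ["-d", "-t", "raw", "-r", "16000", "-e", "signed", "-b", "16", "-c", "1", "-"] };
    case "win32":
      // "Microphone" is a placeholder; real DirectShow device names differ per machine.
      return { cmd: "ffmpeg", args: ["-f", "dshow", "-i", "audio=Microphone", "-f", "s16le", "-ar", "16000", "-ac", "1", "-"] };
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

The extension would then spawn the chosen command as a child process and read PCM frames off its stdout stream.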
Voice Activity Detection: An energy-based VAD detects when you start and stop speaking. No need to press a button — just talk.
Transcription: Moonshine's encoder-decoder architecture processes the audio through ONNX Runtime. The Tiny model (27MB) handles quick commands; the Base model (65MB) is better for longer dictation.
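Before the audio reaches the model, the raw 16-bit PCM has to be converted to normalized float samples, the input format ONNX speech models typically expect. A minimal sketch (the extension's actual preprocessing may differ):

```typescript
// Convert 16-bit signed PCM to Float32 normalized into [-1, 1),
// the usual input range for neural ASR models.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    // Dividing by 32768 maps the full Int16 range onto [-1, 1).
    out[i] = pcm[i] / 32768;
  }
  return out;
}
```

The resulting `Float32Array` is what gets fed to the ONNX Runtime session as the encoder's input tensor.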
Delivery: The transcript goes to VS Code's Chat API, targeting whatever participant you've configured (Copilot, Continue, etc.).
Privacy
This was non-negotiable. Voice data is sensitive. VoxPilot processes everything in-memory and never writes audio to disk or sends it over the network.
Try It
- Open VSX: https://open-vsx.org/extension/natearcher-ai/voxpilot
- GitHub: https://github.com/natearcher-ai/voxpilot
MIT licensed. PRs welcome. Star the repo if it's useful.