Voice Commander is an AI-powered voice transcription tool designed for developers. It combines GPU-accelerated whisper.cpp for local transcription with the Gemini API for intelligent text refinement. The result: clean, structured prompts from natural speech.
Key Innovation: Post-processing with Gemini removes filler words, fixes grammar, and structures the output as XML or JSON, ready to feed into LLMs or documentation.
Key Features
- F8/F9 Hotkeys - Quick recording with keyboard shortcuts
- GPU Acceleration - CUDA-powered Whisper transcription (GPU-only mode, no CPU fallback)
- AI Refinement - Gemini API removes filler words, fixes grammar, structures output
- Structured Output - XML/JSON/plain text formats for LLM consumption
- Auto-paste - Transcribed text automatically inserted at cursor
- VS Code Extension - Seamless integration with your editor
- Privacy-First - Audio is transcribed locally; only the transcript text is sent to the Gemini API for refinement
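To make the structured-output feature concrete, here is a minimal sketch of wrapping already-refined text as plain text, JSON, or XML. The function name `format_output` and the `prompt` field/tag are assumptions for illustration; the article does not show the schema Voice Commander actually uses.

```python
import json
import xml.etree.ElementTree as ET


def format_output(text: str, fmt: str = "plain") -> str:
    """Wrap refined text in the requested format (hypothetical schema)."""
    if fmt == "plain":
        return text
    if fmt == "json":
        # "prompt" is an illustrative key, not the tool's documented schema.
        return json.dumps({"prompt": text}, ensure_ascii=False)
    if fmt == "xml":
        # ElementTree escapes characters such as "<" and "&" in the text.
        root = ET.Element("prompt")
        root.text = text
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt!r}")


print(format_output("Create a function that calculates the Fibonacci sequence", "json"))
```

Building the XML through a library rather than string concatenation keeps the output well-formed even when the spoken text contains characters like `<` or `&`.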
How It Works
1. Press F8 to start recording
2. Speak naturally: "um, so like, I need a function that uh calculates fibonacci"
3. Press F9 to stop
4. Whisper transcribes locally with GPU acceleration
5. Gemini refines the transcript: removes fillers, fixes grammar, structures output
6. The clean text is auto-pasted at the cursor
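The flow above can be sketched as a small state machine. This is an illustration under stated assumptions, not the project's code: `Session`, `on_key`, and `on_audio` are hypothetical names, and the transcribe, refine, and paste steps are passed in as plain callables so the sketch runs without a microphone, GPU, or API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Hypothetical F8/F9 recording session; all names are illustrative."""
    transcribe: Callable[[bytes], str]   # e.g. a wrapper around whisper.cpp
    refine: Callable[[str], str]         # e.g. a call to the Gemini API
    paste: Callable[[str], None]         # e.g. clipboard copy plus paste keystroke
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def on_key(self, key: str) -> None:
        if key == "f8" and not self.recording:
            # Start a fresh recording.
            self.chunks.clear()
            self.recording = True
        elif key == "f9" and self.recording:
            # Stop, then run transcribe -> refine -> paste in order.
            self.recording = False
            raw = self.transcribe(b"".join(self.chunks))
            self.paste(self.refine(raw))

    def on_audio(self, chunk: bytes) -> None:
        # Audio arriving while idle is dropped.
        if self.recording:
            self.chunks.append(chunk)
```

With pynput, a `keyboard.Listener(on_press=...)` callback could translate `keyboard.Key.f8` and `keyboard.Key.f9` into `on_key("f8")` and `on_key("f9")`. Keeping the three stages as injected callables also makes each one easy to swap or test in isolation.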
Example:
Input: "um so like I want to [NOISE] create a function that uh calculates fibonacci"
Output: "Create a function that calculates the Fibonacci sequence"
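For intuition about what refinement does to this example, here is a purely rule-based stand-in. It is not how Voice Commander works (the article attributes refinement to Gemini), and the word lists and the `rough_clean` name are assumptions. It only strips noise tags and obvious fillers; rephrasing "I want to create" as an imperative is the kind of change a rule set like this does not attempt.

```python
import re

# Bracketed tags such as [NOISE] that a transcriber may emit.
NOISE_TAG = re.compile(r"\[[^\]]*\]")
# Hesitation sounds, with an optional trailing comma.
FILLER = re.compile(r"\b(?:um+|uh+|er+m?)\b,?", re.IGNORECASE)
# Discourse markers, removed only at the very start of the utterance.
LEADING_MARKERS = re.compile(
    r"^(?:so|like|well|okay)(?:[ ,]+(?:so|like|well|okay))*[ ,]+",
    re.IGNORECASE,
)


def rough_clean(transcript: str) -> str:
    """Crude, rule-based cleanup (illustrative stand-in for LLM refinement)."""
    text = NOISE_TAG.sub(" ", transcript)
    text = FILLER.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = LEADING_MARKERS.sub("", text)
    return text[:1].upper() + text[1:]


raw = "um so like I want to [NOISE] create a function that uh calculates fibonacci"
print(rough_clean(raw))
# -> I want to create a function that calculates fibonacci
```

The gap between this output and the refined sentence in the example shows why a language model is used for the final step: it can restructure the sentence, not just delete tokens.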
Setup
1. Install whisper.cpp
    git clone https://github.com/ggerganov/whisper.cpp.git
    cd whisper.cpp
    make
2. Download Model
    bash ./models/download-ggml-model.sh medium.en
3. Install Python Dependencies
    pip install sounddevice scipy numpy pyperclip pynput
4. Run Voice Commander
    python portable_commander.py
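whisper.cpp is driven through a command-line program, so one plausible way to call it from Python is via `subprocess`. The following is a sketch under assumptions: the binary name (`main` in older Makefile builds; newer builds produce `whisper-cli` under `build/bin/`), the `../whisper.cpp` location, and the function names are illustrative and may not match what `portable_commander.py` does. A 16-bit, 16 kHz WAV file is the safe input format for the whisper.cpp CLI.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed location of the compiled whisper.cpp checkout.
WHISPER_DIR = Path("../whisper.cpp")


def build_whisper_command(
    wav_path: str,
    whisper_dir: Path = WHISPER_DIR,
    binary: str = "main",        # assumption: adjust to your build's binary
    model: str = "medium.en",
) -> List[str]:
    """Build the argument list for a whisper.cpp transcription run."""
    return [
        str(whisper_dir / binary),
        "-m", str(whisper_dir / "models" / f"ggml-{model}.bin"),
        "-f", wav_path,
        "-nt",  # --no-timestamps: print plain text only
    ]


def transcribe(wav_path: str) -> str:
    """Run whisper.cpp on a WAV file and return the text it prints."""
    result = subprocess.run(
        build_whisper_command(wav_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if whisper.cpp fails
    )
    return result.stdout.strip()
```

Passing the command as a list (rather than a shell string) avoids quoting problems when the audio path contains spaces.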
VS Code Extension
The project includes a VS Code extension for seamless integration with your coding environment. See the VScode_extension/ folder for installation instructions.
Requirements
- whisper.cpp compiled in parent directory
- Python 3.7+
- Microphone access
- Optional: GPU for faster transcription
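Before the first run, a quick preflight check can confirm the Python-side requirements. This is a hedged sketch: `missing_requirements` is a hypothetical helper, and the module list simply mirrors the `pip install` line from the setup steps. It does not check the microphone, the GPU, or the whisper.cpp build.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Mirrors the packages named in the setup steps.
REQUIRED_MODULES = ("sounddevice", "scipy", "numpy", "pyperclip", "pynput")


def missing_requirements(
    modules: Sequence[str] = REQUIRED_MODULES,
    min_python: Tuple[int, int] = (3, 7),
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


for problem in missing_requirements():
    print(problem)
```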
Links & Resources
- GitHub Repository
- Full Article
Originally published at masihmoafi.com