🎙️ Voice Commander: AI-Powered Voice Transcription for Developers

#ai #whisper #voiceai #vscode

Voice Commander is an AI-powered voice transcription tool designed for developers. It combines GPU-accelerated whisper.cpp for local transcription with the Gemini API for text refinement. The result: clean, structured prompts from natural speech.

Key Innovation: Post-processing with Gemini removes filler words, fixes grammar, and structures the output as XML or JSON, ready to feed into LLMs or paste into documentation.

✨ Key Features

🎤 F8/F9 Hotkeys - Quick recording with keyboard shortcuts
🚀 GPU Acceleration - CUDA-powered Whisper transcription (GPU-only mode, no CPU fallback)
🤖 AI Refinement - Gemini API removes filler words, fixes grammar, structures output
📝 Structured Output - XML/JSON/plain text formats for LLM consumption
📋 Auto-paste - Transcribed text automatically inserted at cursor
🔌 VS Code Extension - Seamless integration with your editor
🔒 Privacy-First - Audio is transcribed locally; only the transcript text is sent to the Gemini API for refinement
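
To make the Structured Output feature concrete, here is a minimal sketch of how refined text could be wrapped for LLM consumption. The `format_output` function and the `prompt` / `instruction` field names are hypothetical, chosen for illustration; the actual schema Voice Commander emits may differ.

```python
import json
import xml.etree.ElementTree as ET


def format_output(text: str, fmt: str = "plain") -> str:
    """Wrap refined text as plain text, JSON, or XML.

    The "prompt" and "instruction" names are illustrative assumptions,
    not the project's real schema.
    """
    if fmt == "plain":
        return text
    if fmt == "json":
        return json.dumps({"instruction": text}, ensure_ascii=False)
    if fmt == "xml":
        root = ET.Element("prompt")
        ET.SubElement(root, "instruction").text = text
        # ElementTree escapes &, < and > in element text.
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    for fmt in ("plain", "json", "xml"):
        print(format_output("Create a function that calculates the Fibonacci sequence", fmt))
```

Using the standard library serializers rather than string concatenation keeps the output well-formed even when the dictated text contains quotes or angle brackets.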

🛠️ How It Works

1. Press F8 to start recording.
2. Speak naturally: "um, so like, I need a function that uh calculates fibonacci"
3. Press F9 to stop.
4. Whisper transcribes the audio locally with GPU acceleration.
5. Gemini refines the transcript: removes fillers, fixes grammar, structures output.
6. The clean text is auto-pasted at the cursor.
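
The flow above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code in portable_commander.py: the `VoicePipeline` class and its injected callables are hypothetical names, and each stage (recording, whisper.cpp, Gemini, paste) is passed in as a plain function so the control flow can be read and tested on its own.

```python
from typing import Callable, Optional


class VoicePipeline:
    """Hypothetical F8/F9 state machine; every stage is an injected callable."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_recording: Callable[[], str],  # returns the path of the recorded WAV
        transcribe: Callable[[str], str],   # e.g. a whisper.cpp wrapper
        refine: Callable[[str], str],       # e.g. a Gemini API wrapper
        paste: Callable[[str], None],       # e.g. clipboard copy + paste keystroke
    ) -> None:
        self._start_recording = start_recording
        self._stop_recording = stop_recording
        self._transcribe = transcribe
        self._refine = refine
        self._paste = paste
        self.recording = False

    def on_key(self, key_name: str) -> Optional[str]:
        """Handle one key press; return the pasted text when a cycle completes."""
        if key_name == "f8" and not self.recording:
            self._start_recording()
            self.recording = True
            return None
        if key_name == "f9" and self.recording:
            self.recording = False
            wav_path = self._stop_recording()
            raw_text = self._transcribe(wav_path)
            clean_text = self._refine(raw_text)
            self._paste(clean_text)
            return clean_text
        # Ignore unrelated keys and out-of-order presses (F9 before F8, repeated F8).
        return None
```

In a real listener, `on_key` would be fed from a keyboard hook such as a pynput `keyboard.Listener` callback; how the key object is mapped to the strings "f8" and "f9" is left out here as an implementation detail.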

Example:

Input: "um so like I want to [NOISE] create a function that uh calculates fibonacci"

Output: "Create a function that calculates the Fibonacci sequence"
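
As a rough sketch of the refinement step, the snippet below strips bracketed non-speech tags such as [NOISE] locally and then builds the text that would be sent to Gemini. Both function names and the prompt wording are assumptions made for illustration; the project's actual prompt is not shown in this post, and the API call itself is omitted.

```python
import re

# Square-bracketed tags such as [NOISE] or [BLANK_AUDIO] mark non-speech audio.
_NOISE_TAG = re.compile(r"\[[^\]]*\]")


def strip_noise_tags(transcript: str) -> str:
    """Remove bracketed tags and collapse the whitespace left behind."""
    without_tags = _NOISE_TAG.sub(" ", transcript)
    return " ".join(without_tags.split())


def build_refinement_prompt(transcript: str) -> str:
    """Build a (hypothetical) instruction asking the model to clean up dictation."""
    cleaned = strip_noise_tags(transcript)
    return (
        "Rewrite the following dictated text as a clear instruction. "
        "Remove filler words, fix grammar, and keep the original meaning. "
        "Return only the rewritten text.\n\n"
        f"Dictation: {cleaned}"
    )


if __name__ == "__main__":
    raw = "um so like I want to [NOISE] create a function that uh calculates fibonacci"
    print(build_refinement_prompt(raw))
```

Filler words like "um" and "uh" are deliberately left for the model to handle, since removing them with a fixed word list risks deleting legitimate words.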

🚀 Setup

1. Install whisper.cpp

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
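make
# Note: a plain make does not enable CUDA. For GPU acceleration, use the CUDA build option described in the whisper.cpp README (GGML_CUDA=1 in recent releases, WHISPER_CUBLAS=1 in older ones).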

2. Download Model

bash ./models/download-ggml-model.sh medium.en
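
Once the model is downloaded (the script saves it as models/ggml-medium.en.bin), a Python wrapper might invoke the whisper.cpp command-line binary roughly as follows. This is a sketch under assumptions: the binary path is left as a parameter because it depends on the whisper.cpp version and build method, and `whisper_command` and `transcribe` are hypothetical helper names rather than functions from this project. whisper.cpp expects 16 kHz WAV input.

```python
import subprocess
from typing import List

DEFAULT_MODEL = "models/ggml-medium.en.bin"


def whisper_command(binary: str, wav_path: str, model: str = DEFAULT_MODEL) -> List[str]:
    """Build the argument list for the whisper.cpp CLI.

    -m selects the model, -f the input WAV, -nt suppresses timestamps.
    """
    return [binary, "-m", model, "-f", wav_path, "-nt"]


def transcribe(binary: str, wav_path: str, model: str = DEFAULT_MODEL) -> str:
    """Run whisper.cpp and return the transcript as a single line of text."""
    result = subprocess.run(
        whisper_command(binary, wav_path, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if whisper.cpp exits with an error
    )
    return " ".join(result.stdout.split())
```

Passing the arguments as a list avoids shell quoting problems when the WAV path contains spaces.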

3. Install Python Dependencies

pip install sounddevice scipy numpy pyperclip pynput

4. Run Voice Commander

python portable_commander.py

💻 VS Code Extension

The project includes a VS Code extension for seamless integration with your coding environment. See the VScode_extension/ folder for installation instructions.