DEV Community

Masih Maafi
Masih Maafi

Posted on • Originally published at masihmoafi.com

πŸŽ™οΈ Voice Commander: AI-Powered Voice Transcription for Developers

Voice Commander is an AI-powered voice transcription tool designed for developers. It combines GPU-accelerated Whisper.cpp for local transcription with Gemini API for intelligent text refinement. The result: clean, structured prompts from natural speech.

Key Innovation: Post-processing with Gemini removes filler words, fixes grammar, and structures output into XML/JSON formats - perfect for feeding into LLMs or documentation.

✨ Key Features

  • 🎀 F8/F9 Hotkeys - Quick recording with keyboard shortcuts
  • πŸš€ GPU Acceleration - CUDA-powered Whisper transcription (GPU-only mode, no CPU fallback)
  • πŸ€– AI Refinement - Gemini API removes filler words, fixes grammar, structures output
  • πŸ“ Structured Output - XML/JSON/plain text formats for LLM consumption
  • πŸ“‹ Auto-paste - Transcribed text automatically inserted at cursor
  • πŸ”Œ VS Code Extension - Seamless integration with your editor
  • πŸ”’ Privacy-First - Transcription runs locally, only refined text hits API

πŸ› οΈ How It Works

  1. Press F8 to start recording
  2. Speak naturally: "um, so like, I need a function that uh calculates fibonacci"
  3. Press F9 to stop
  4. Whisper transcribes locally with GPU acceleration
  5. Gemini refines: Removes fillers, fixes grammar, structures output
  6. Auto-pastes: Clean text appears at cursor

Example:

Input: "um so like I want to [NOISE] create a function that uh calculates fibonacci"

Output: "Create a function that calculates the Fibonacci sequence"

πŸš€ Setup

1. Install whisper.cpp

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make
Enter fullscreen mode Exit fullscreen mode

2. Download Model

bash ./models/download-ggml-model.sh medium.en
Enter fullscreen mode Exit fullscreen mode

3. Install Python Dependencies

pip install sounddevice scipy numpy pyperclip pynput
Enter fullscreen mode Exit fullscreen mode

4. Run Voice Commander

python portable_commander.py
Enter fullscreen mode Exit fullscreen mode

πŸ’» VS Code Extension

The project includes a VS Code extension for seamless integration with your coding environment. See the VScode_extension/ folder for installation instructions.

πŸ“‹ Requirements

  • whisper.cpp compiled in parent directory
  • Python 3.7+
  • Microphone access
  • Optional: GPU for faster transcription

πŸ”— Links & Resources

Originally published at masihmoafi.com

Top comments (0)