
Anthony Erazo
Click, Speak, Type: A Windows Dictation MVP (Offline, Vosk)

This is a submission for the GitHub Copilot CLI Challenge.

What I Built

I built Windows Voice Dictation Tool 🎙️🖱️⌨️ — a lightweight dictation app for Windows that converts speech to text and types it directly into any active application.

The idea is simple: instead of being limited to one editor, you can click anywhere (Notepad, Word, browser text boxes, chats, IDEs) and dictate as if your voice were the keyboard.

It runs offline using Vosk, so after the first setup there are no accounts, no API keys, and no data sent to servers. Privacy-friendly by design.
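The offline recognition boils down to a capture-and-recognize loop. This is a minimal sketch, not the exact code from the repo, assuming the `vosk` and `sounddevice` packages and a hypothetical model path:

```python
import json
import queue

SAMPLE_RATE = 16000
MODEL_PATH = "models/vosk-model-small-en-us-0.15"  # assumed install location

def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result payload."""
    return json.loads(result_json).get("text", "")

def dictate() -> None:
    # Heavy dependencies are imported here so the module stays
    # importable even when vosk/sounddevice are not installed.
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio_q: "queue.Queue[bytes]" = queue.Queue()

    def on_audio(indata, frames, time_info, status):
        audio_q.put(bytes(indata))  # push each captured block of raw PCM

    rec = KaldiRecognizer(Model(MODEL_PATH), SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            if rec.AcceptWaveform(audio_q.get()):
                text = extract_text(rec.Result())
                if text:
                    print(text)  # the real app types this into the focused window
```

Everything here runs locally: the model lives on disk and the audio never leaves the machine.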

Key features I implemented:

  • Universal text input: types into the focused app (Notepad, Word, web, chat, etc.)
  • Global hotkey: CTRL + ALT + D to start/stop dictation from anywhere
  • Real-time transcription
  • GUI settings (Tkinter): enable/disable, choose language/model
  • Model manager: detects installed models and can auto-download a default model
  • Multi-language (English + Spanish) when models are installed
  • System tray for quick status/controls
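The global hotkey and universal typing rely on OS-level keyboard hooks. A minimal sketch of that part, assuming the `keyboard` package (the repo may use a different library):

```python
import threading

class DictationToggle:
    """Tracks the on/off state flipped by the global hotkey."""

    def __init__(self) -> None:
        self._active = threading.Event()

    def toggle(self) -> bool:
        # Flip the state and report the new value.
        if self._active.is_set():
            self._active.clear()
        else:
            self._active.set()
        return self._active.is_set()

    @property
    def active(self) -> bool:
        return self._active.is_set()

def type_into_focus(text: str) -> None:
    # keyboard.write sends keystrokes to whichever window has focus,
    # which is what lets the tool type into any app.
    import keyboard
    keyboard.write(text + " ")

def install_hotkey(state: DictationToggle) -> None:
    # Ctrl+Alt+D toggles dictation from anywhere via a global hook.
    import keyboard
    keyboard.add_hotkey("ctrl+alt+d", state.toggle)
```

Because the hook is global and the keystrokes go to the focused window, dictation works the same in Notepad, a browser text box, or an IDE.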

Repo: AnthonyErazo/Voice-Dictation


Demo

Short demo (3 minutes)

This is the fast walkthrough showing the app working end-to-end.

Watch the 3-min demo


Full walkthrough (28 minutes)

If you want the complete build process (planning, prompts, iterations, debugging, and final run), here is the full recording.

Watch the full 28-min walkthrough

Screenshot

Here’s the app running: GUI settings + live dictation into a document ✨

(If you open the video you’ll see the full flow end-to-end.)


My Experience with GitHub Copilot CLI

I used GitHub Copilot CLI as my “terminal teammate” to build this MVP quickly and iteratively.

What Copilot CLI helped me with:

  • Scaffolding the Python project structure (modules, responsibilities, clean separation)
  • Implementing the dictation loop (audio capture, recognition, text injection)
  • Designing the GUI logic (Tkinter settings + model/language selection)
  • Adding the model detection/download behavior so the app is usable on first run
  • Debugging issues and refining edge cases (hotkey behavior, freezing UI, model switching)
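The model detection/download behavior from the list above can be sketched like this. The model names, the `models` directory, and the default-model table are assumptions for illustration; the download URL follows the official Vosk model host:

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical defaults; names follow the official Vosk model naming.
DEFAULT_MODELS = {
    "en": "vosk-model-small-en-us-0.15",
    "es": "vosk-model-small-es-0.42",
}
MODELS_DIR = Path("models")
BASE_URL = "https://alphacephei.com/vosk/models"  # official Vosk model host

def installed_models(models_dir: Path = MODELS_DIR) -> list:
    """List the model folders already present on disk."""
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_dir())

def ensure_model(lang: str, models_dir: Path = MODELS_DIR) -> Path:
    """Return the model path for lang, downloading the default if missing."""
    name = DEFAULT_MODELS[lang]
    target = models_dir / name
    if target.is_dir():
        return target
    models_dir.mkdir(parents=True, exist_ok=True)
    archive = models_dir / f"{name}.zip"
    urlretrieve(f"{BASE_URL}/{name}.zip", archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(models_dir)  # the zip contains a top-level folder `name`
    archive.unlink()
    return target
```

This is what makes the app usable on first run: if no model is installed, it fetches a small default instead of failing.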

I also kept the development prompts and notes in the repository.


Quick note on accuracy

Vosk accuracy depends heavily on the chosen model size (small vs. medium vs. large) and on your microphone and background noise.

This MVP is a solid baseline and can later be improved by swapping engines or integrating more advanced local models.

