
Anthony Erazo
Click, Speak, Type: A Windows Dictation MVP (Offline, Vosk)

This is a submission for the GitHub Copilot CLI Challenge.

What I Built

I built Windows Voice Dictation Tool 🎙️🖱️⌨️ — a lightweight dictation app for Windows that converts speech to text and types it directly into any active application.

The idea is simple: instead of being limited to one editor, you can click anywhere (Notepad, Word, browser text boxes, chats, IDEs) and dictate as if your voice were the keyboard.

It runs offline using Vosk, so after the first setup there are no accounts, no API keys, and no data sent to servers. Privacy-friendly by design.
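The offline recognition boils down to a capture-and-recognize loop. This is a minimal sketch, not the exact code from the repo, assuming the `vosk` and `sounddevice` packages and a hypothetical model path:

```python
import json
import queue

SAMPLE_RATE = 16000
MODEL_PATH = "models/vosk-model-small-en-us-0.15"  # assumed install location

def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result payload."""
    return json.loads(result_json).get("text", "")

def dictate() -> None:
    # Heavy dependencies are imported here so the module stays
    # importable even when vosk/sounddevice are not installed.
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio_q: "queue.Queue[bytes]" = queue.Queue()

    def on_audio(indata, frames, time_info, status):
        audio_q.put(bytes(indata))  # push each captured block of raw PCM

    rec = KaldiRecognizer(Model(MODEL_PATH), SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            if rec.AcceptWaveform(audio_q.get()):
                text = extract_text(rec.Result())
                if text:
                    print(text)  # the real app types this into the focused window
```

Everything here runs locally: the model lives on disk and the audio never leaves the machine.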

Key features I implemented:

  • Universal text input: types into the focused app (Notepad, Word, web, chat, etc.)
  • Global hotkey: CTRL + ALT + D to start/stop dictation from anywhere
  • Real-time transcription
  • GUI settings (Tkinter): enable/disable, choose language/model
  • Model manager: detects installed models and can auto-download a default model
  • Multi-language (English + Spanish) when models are installed
  • System tray for quick status/controls
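The global hotkey and universal typing rely on OS-level keyboard hooks. A minimal sketch of that part, assuming the `keyboard` package (the repo may use a different library):

```python
import threading

class DictationToggle:
    """Tracks the on/off state flipped by the global hotkey."""

    def __init__(self) -> None:
        self._active = threading.Event()

    def toggle(self) -> bool:
        # Flip the state and report the new value.
        if self._active.is_set():
            self._active.clear()
        else:
            self._active.set()
        return self._active.is_set()

    @property
    def active(self) -> bool:
        return self._active.is_set()

def type_into_focus(text: str) -> None:
    # keyboard.write sends keystrokes to whichever window has focus,
    # which is what lets the tool type into any app.
    import keyboard
    keyboard.write(text + " ")

def install_hotkey(state: DictationToggle) -> None:
    # Ctrl+Alt+D toggles dictation from anywhere via a global hook.
    import keyboard
    keyboard.add_hotkey("ctrl+alt+d", state.toggle)
```

Because the hook is global and the keystrokes go to the focused window, dictation works the same in Notepad, a browser text box, or an IDE.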

Repo: AnthonyErazo/Voice-Dictation


Demo

Short demo (3 minutes)

This is the fast walkthrough showing the app working end-to-end.

Watch the 3-min demo


Full walkthrough (28 minutes)

If you want the complete build process (planning, prompts, iterations, debugging, and final run), here is the full recording.

Watch the full 28-min walkthrough

Screenshot

Here’s the app running: GUI settings + live dictation into a document ✨

(If you open the video you’ll see the full flow end-to-end.)


My Experience with GitHub Copilot CLI

I used GitHub Copilot CLI as my “terminal teammate” to build this MVP quickly and iteratively.

What Copilot CLI helped me with:

  • Scaffolding the Python project structure (modules, responsibilities, clean separation)
  • Implementing the dictation loop (audio capture, recognition, text injection)
  • Designing the GUI logic (Tkinter settings + model/language selection)
  • Adding the model detection/download behavior so the app is usable on first run
  • Debugging issues and refining edge cases (hotkey behavior, freezing UI, model switching)
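The model detection/download behavior from the list above can be sketched like this. The model names, the `models` directory, and the default-model table are assumptions for illustration; the download URL follows the official Vosk model host:

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical defaults; names follow the official Vosk model naming.
DEFAULT_MODELS = {
    "en": "vosk-model-small-en-us-0.15",
    "es": "vosk-model-small-es-0.42",
}
MODELS_DIR = Path("models")
BASE_URL = "https://alphacephei.com/vosk/models"  # official Vosk model host

def installed_models(models_dir: Path = MODELS_DIR) -> list:
    """List the model folders already present on disk."""
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_dir())

def ensure_model(lang: str, models_dir: Path = MODELS_DIR) -> Path:
    """Return the model path for lang, downloading the default if missing."""
    name = DEFAULT_MODELS[lang]
    target = models_dir / name
    if target.is_dir():
        return target
    models_dir.mkdir(parents=True, exist_ok=True)
    archive = models_dir / f"{name}.zip"
    urlretrieve(f"{BASE_URL}/{name}.zip", archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(models_dir)  # the zip contains a top-level folder `name`
    archive.unlink()
    return target
```

This is what makes the app usable on first run: if no model is installed, it fetches a small default instead of failing.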

I also kept the development prompts and notes in the repository.


Quick note on accuracy

Vosk accuracy depends heavily on the chosen model size (small vs. medium vs. large) and on your microphone and background noise.

This MVP is a solid baseline and can later be improved by swapping engines or integrating more advanced local models.

