DEV Community

Asher Buk

I Built an Offline Voice Typing App for Linux - Speak to AI

https://github.com/AshBuk/speak-to-ai

Despite the many voice-to-text applications available, I couldn't find one that suited my daily use on Linux, so I built my own and am sharing it with the community as an open-source project. Speak to AI is:

  • 100% offline — uses Whisper locally, no cloud
  • Works everywhere — editors, browsers, terminals, AI chats
  • Global hotkeys — press, speak, release
  • Native Linux — supports X11 and Wayland
  • AppImage — download and run, no installation

Tech Stack

  • Go for the core app (fast, small binary)
  • whisper.cpp for speech recognition
  • xdotool/ydotool for keyboard simulation
  • PulseAudio/PipeWire for audio capture

The Hard Parts

X11 vs Wayland: Different typing mechanisms. Solution: detect environment and use appropriate method.
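As a rough illustration of the detection step, here is a minimal Go sketch. It is not the project's actual code: the names `SessionType`, `detectSession`, and `typingTool` are hypothetical, and the mapping of X11 to xdotool and Wayland to ydotool is an assumption based on the tools listed in the tech stack. It relies only on the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables.

```go
package main

import (
	"fmt"
	"os"
)

// SessionType is a hypothetical name for the detected display server.
type SessionType string

const (
	SessionX11     SessionType = "x11"
	SessionWayland SessionType = "wayland"
	SessionUnknown SessionType = "unknown"
)

// detectSession inspects common environment variables. It takes a lookup
// function instead of calling os.Getenv directly so it can be tested
// without modifying the real environment.
func detectSession(getenv func(string) string) SessionType {
	// XDG_SESSION_TYPE is normally set by the login manager.
	switch getenv("XDG_SESSION_TYPE") {
	case "wayland":
		return SessionWayland
	case "x11":
		return SessionX11
	}
	// Fallbacks for sessions where XDG_SESSION_TYPE is missing or unhelpful.
	if getenv("WAYLAND_DISPLAY") != "" {
		return SessionWayland
	}
	if getenv("DISPLAY") != "" {
		return SessionX11
	}
	return SessionUnknown
}

// typingTool maps a session type to a keyboard simulation tool.
// Assumption: xdotool on X11, ydotool on Wayland.
func typingTool(s SessionType) string {
	switch s {
	case SessionWayland:
		return "ydotool"
	case SessionX11:
		return "xdotool"
	default:
		return ""
	}
}

func main() {
	s := detectSession(os.Getenv)
	fmt.Printf("session=%s tool=%q\n", s, typingTool(s))
}
```

Checking `XDG_SESSION_TYPE` first and falling back to the display variables keeps the sketch usable in sessions where only one of them is set. An unknown session returns an empty tool name so the caller can report the problem instead of guessing.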

Permissions: Global hotkeys need the user to be a member of the input group. Clear docs help users set this up.
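The input group requirement could also be checked at startup, so the app can point users to the docs when membership is missing. The following Go sketch shows one way to do that with the standard `os/user` package. The helper names `hasGroup` and `inInputGroup` are hypothetical and this is not taken from the project's source.

```go
package main

import (
	"fmt"
	"os/user"
)

// hasGroup reports whether gid appears in the list of group IDs.
func hasGroup(gids []string, gid string) bool {
	for _, g := range gids {
		if g == gid {
			return true
		}
	}
	return false
}

// inInputGroup reports whether the current user is listed as a member of
// the "input" group in the system's user database.
func inInputGroup() (bool, error) {
	u, err := user.Current()
	if err != nil {
		return false, fmt.Errorf("looking up current user: %w", err)
	}
	g, err := user.LookupGroup("input")
	if err != nil {
		return false, fmt.Errorf("looking up input group: %w", err)
	}
	gids, err := u.GroupIds()
	if err != nil {
		return false, fmt.Errorf("listing group IDs: %w", err)
	}
	return hasGroup(gids, g.Gid), nil
}

func main() {
	ok, err := inInputGroup()
	if err != nil {
		fmt.Println("could not check group membership:", err)
		return
	}
	if ok {
		fmt.Println("user is in the input group")
	} else {
		fmt.Println("user is not in the input group; global hotkeys may not work")
	}
}
```

Note that this check reads the user database, not the groups of the running session, so a user who was just added to the group will typically still need to log out and back in before the change takes effect.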

Model size: Whisper models are big. Using a quantized small model balances speed and accuracy.

Results

  • Storage: 277.2 MB (Whisper small q5 model, dependencies, Go binary)
  • Memory: ~300MB RAM during operation
  • <1s latency for short phrases
  • 90%+ accuracy for clear speech
  • Works on Fedora and Ubuntu

I would be grateful if you tested it in your own Linux environment! See the desktop environment (DE) support documentation:
https://github.com/AshBuk/speak-to-ai/blob/master/docs/Desktop_Environment_Support.md

Try It

GitHub: https://github.com/AshBuk/speak-to-ai
