I Built an Offline Voice Typing App for Linux - Speak to AI

#linux #opensource #privacy #go

There are plenty of voice-to-text applications, but I couldn't find one that suited my daily use on Linux. So I built my own and released it as an open-source project. Speak to AI is:

100% offline — uses Whisper locally, no cloud
Works everywhere — editors, browsers, terminals, AI chats
Global hotkeys — press, speak, release
Native Linux — supports X11 and Wayland
AppImage — download and run, no installation

Tech Stack

Go for the core app (fast, small binary)
whisper.cpp for speech recognition (via CGO)
evdev/D-Bus for global hotkeys
xdotool/ydotool for keyboard simulation
PulseAudio/PipeWire for audio capture
WebSocket API for integrations

The Hard Parts

X11 vs Wayland: Different typing mechanisms. Solution: detect environment and use appropriate method.
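Here is a minimal sketch of what that detection can look like. It is not the project's actual code: the function name `typingTool` and the fallback order are my assumptions for illustration. It reads the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables and picks between the two tools from the tech stack.

```go
package main

import (
	"fmt"
	"os"
)

// typingTool picks a keyboard-simulation command from the session
// environment. lookup has the same shape as os.Getenv so it can be
// replaced in tests. (Hypothetical helper, not taken from the project.)
func typingTool(lookup func(string) string) string {
	switch lookup("XDG_SESSION_TYPE") {
	case "wayland":
		return "ydotool"
	case "x11":
		return "xdotool"
	}
	// XDG_SESSION_TYPE may be unset or set to something else (e.g. "tty"),
	// so fall back to the display variables.
	if lookup("WAYLAND_DISPLAY") != "" {
		return "ydotool"
	}
	if lookup("DISPLAY") != "" {
		return "xdotool"
	}
	return "" // no graphical session detected
}

func main() {
	tool := typingTool(os.Getenv)
	if tool == "" {
		fmt.Println("no X11 or Wayland session detected")
		return
	}
	fmt.Println("typing backend:", tool)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the decision logic testable without touching the real environment.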

Input permissions: Global hotkeys need input group membership. Clear docs help users set this up.
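A startup check can catch this before the hotkeys silently fail. The sketch below is an illustration under my own assumptions, not the app's real code: `hasGroup` and `inInputGroup` are hypothetical names, and it relies only on Go's standard `os/user` package. Note that it reads the configured group membership, so a user who was just added to the group still has to log out and back in for the running session to pick it up.

```go
package main

import (
	"fmt"
	"os/user"
)

// hasGroup reports whether the target group ID appears in gids.
func hasGroup(gids []string, target string) bool {
	for _, g := range gids {
		if g == target {
			return true
		}
	}
	return false
}

// inInputGroup checks whether the current user is a member of the
// "input" group. (Hypothetical helper, not taken from the project.)
func inInputGroup() (bool, error) {
	u, err := user.Current()
	if err != nil {
		return false, err
	}
	grp, err := user.LookupGroup("input")
	if err != nil {
		return false, err
	}
	gids, err := u.GroupIds()
	if err != nil {
		return false, err
	}
	return hasGroup(gids, grp.Gid), nil
}

func main() {
	ok, err := inInputGroup()
	switch {
	case err != nil:
		fmt.Println("could not check group membership:", err)
	case ok:
		fmt.Println("user is in the input group")
	default:
		fmt.Println("not in the input group; try: sudo usermod -aG input $USER, then log out and back in")
	}
}
```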

Model size: Whisper models are big. Using a quantized small model (q5) balances speed and accuracy.

Results

Storage: 277.2 MB (Whisper small q5 model, dependencies, Go binary)
Memory: ~300 MB RAM during operation
Latency: <1s for short phrases
Accuracy: 90%+ for clear speech
Platforms: works on Fedora and Ubuntu

In-depth look at system design and technical hurdles: AshBuk.hashnode.dev

Try It

GitHub: https://github.com/AshBuk/speak-to-ai
