This is a submission for the DEV Weekend Challenge: Community
The Community
I built this for Linux users who do a lot of typing and want fast transcription directly in their existing workflow.
That includes:
- developers writing docs, comments, and messages
- students taking quick notes
- writers drafting ideas
- multilingual users who need reliable transcription across languages
The goal was simple: no heavy app, no complicated flow, just speak and continue typing.
What I Built
I built Shruti, a minimal system-wide speech-to-text utility for Linux X11.
Workflow:
- Place the cursor in any text field
- Press the hotkey once to start recording
- Press the same hotkey again to stop
- Shruti transcribes the speech and types the text at the cursor
- Press Esc at any time to cancel the current recording
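The last step of this workflow, typing the transcript at the cursor, relies on xdotool on X11 (listed under How I Built It). Here is a minimal sketch of how that call could look from Python. It is not Shruti's actual code. The helper names `build_type_command` and `type_at_cursor` are hypothetical. Passing `--clearmodifiers` is also an assumption: it stops a still-held Ctrl from the hotkey from turning typed letters into shortcuts.

```python
import shutil
import subprocess


def build_type_command(text: str, delay_ms: int = 12) -> list[str]:
    """Hypothetical helper: argv for typing `text` into the focused X11 window."""
    return [
        "xdotool", "type",
        "--clearmodifiers",        # temporarily release held modifiers such as Ctrl
        "--delay", str(delay_ms),  # milliseconds between keystrokes (xdotool's default is 12)
        "--",                      # end of options, so text starting with "-" is typed literally
        text,
    ]


def type_at_cursor(text: str) -> None:
    """Hypothetical helper: type `text` at the cursor; requires X11 and xdotool."""
    if not text:
        return
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool is not installed or not on PATH")
    subprocess.run(build_type_command(text), check=True)
```

Calling `type_at_cursor("hello world")` from a terminal would type into whichever window has focus. Passing an argument list to `subprocess.run` avoids shell quoting problems with arbitrary transcribed text.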
Why it’s useful
- Works in any app where you can type
- Uses the user’s own Gemini API key
- High-quality transcription across languages
- Minimal visual HUD while recording/transcribing
- No always-on background process while idle
Demo
- GitHub: https://github.com/creasac/shruti
- Demo video: https://youtu.be/UYwDzhuUQPQ
Code
shruti
Minimal desktop speech-to-text using Gemini.
Install
curl -fsSL https://raw.githubusercontent.com/creasac/shruti/main/bootstrap.sh | bash
What setup asks for:
- Gemini API key (hidden input)
- Preferred hotkey (default: Ctrl+Space)
Hotkey behavior after setup:
- Press hotkey once: start recording
- Press hotkey again: stop and transcribe
- Press Esc: cancel the current recording
Nothing runs in the background while idle.
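The hotkey behavior above amounts to a small state machine. The Python sketch below is for illustration only and is not taken from Shruti's source. The class, the state names, and the choice to ignore hotkey presses while a transcription is in flight are all assumptions. Because nothing runs while idle, IDLE here would correspond to no Shruti process existing at all. The other states would live only for the duration of one recording.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


class HotkeyToggle:
    """Hypothetical model of the hotkey/Esc behavior described above."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.actions: list[str] = []  # records side effects, for illustration

    def on_hotkey(self) -> None:
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self.actions.append("start_recording")
        elif self.state is State.RECORDING:
            self.state = State.TRANSCRIBING
            self.actions.append("stop_and_transcribe")
        # Assumption: presses during TRANSCRIBING are ignored.

    def on_escape(self) -> None:
        # Esc only matters while recording; otherwise it is a no-op.
        if self.state is State.RECORDING:
            self.state = State.IDLE
            self.actions.append("cancel_recording")

    def on_transcription_done(self) -> None:
        if self.state is State.TRANSCRIBING:
            self.state = State.IDLE
            self.actions.append("type_text")
```

Keeping the transitions in one place makes the "same key starts and stops" rule easy to reason about. For example, Esc is a no-op unless a recording is actually in progress.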
Configuration
Files:
- ~/.config/shruti/config.toml
- ~/.config/shruti/credentials.toml
API key location:
- Stored only in ~/.config/shruti/credentials.toml
To remove your key:
rm -f ~/.config/shruti/credentials.toml
To remove all Shruti config data:
rm -rf ~/.config/shruti
Editable config fields:
- model
- hotkey
- max_record_seconds
- sample_rate
- channels
- prompt
Commands
shruti setup
shruti doctor --verbose
shruti oneshot
Limitations
- Linux X11 only (for security reasons, Wayland does not let arbitrary apps register global hotkeys or inject keyboard input)
License
MIT. See LICENSE.
How I Built It
- Python 3.11+ for fast iteration and portability
- Gemini API (gemini-2.5-flash-lite) for transcription
- sounddevice + numpy for microphone capture and waveform data
- xdotool for text insertion at the active cursor location (X11)
- Tkinter for a compact overlay HUD
- TOML config in ~/.config/shruti/ for user-controlled settings and API key storage
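To illustrate the capture step, here is a minimal sketch using sounddevice's `InputStream` callback API plus the standard-library `wave` module. Several details are assumptions: that Shruti buffers float32 chunks and uploads 16-bit PCM WAV, the function names, the 16 kHz mono defaults, and the stop-event mechanism. The defaults stand in for the `sample_rate`, `channels`, and `max_record_seconds` settings.

```python
import io
import threading
import wave

import numpy as np


def record_until(stop: threading.Event, sample_rate: int = 16000,
                 channels: int = 1, max_seconds: float = 120.0) -> np.ndarray:
    """Hypothetical helper: capture audio until `stop` is set or `max_seconds` elapses."""
    import sounddevice as sd  # lazy import: the encoder below works without audio hardware

    chunks: list[np.ndarray] = []

    def on_audio(indata, frames, time_info, status):
        # PortAudio reuses the `indata` buffer, so keep a copy of each block.
        chunks.append(indata.copy())

    with sd.InputStream(samplerate=sample_rate, channels=channels,
                        dtype="float32", callback=on_audio):
        stop.wait(timeout=max_seconds)  # returns early when another thread sets `stop`
    if not chunks:
        return np.zeros((0, channels), dtype=np.float32)
    return np.concatenate(chunks)


def to_wav_bytes(audio: np.ndarray, sample_rate: int) -> bytes:
    """Encode float samples in [-1, 1] as an in-memory 16-bit PCM WAV file."""
    if audio.ndim == 1:
        audio = audio[:, np.newaxis]  # treat 1-D input as mono: (frames, 1)
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(pcm.shape[1])
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())  # row-major, so channels are interleaved per frame
    return buf.getvalue()
```

In this design, the hotkey handler would set `stop` on the second press. The same float32 chunks could also feed the waveform in the HUD.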
Design choices
- Keep the product intentionally narrow: one job, done well
- Hotkey-first interaction for speed
- Minimal UI and minimal config surface
- Users bring their own Gemini API key, so usage and billing stay fully under their control.
- Cloud-based transcription gives wider language coverage, keeps local compute light, and allows easy model upgrades.
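Because each user supplies their own key, transcription comes down to one authenticated request to the Gemini API. The sketch below uses only the standard library to call the public REST `generateContent` endpoint, sending a text prompt plus the recording as base64 `inline_data`. The post does not say whether Shruti uses this endpoint or Google's Python SDK, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(wav_bytes: bytes, prompt: str, model: str,
                  api_key: str) -> urllib.request.Request:
    """Hypothetical helper: a generateContent request with a prompt and inline WAV audio."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "audio/wav",
                    "data": base64.b64encode(wav_bytes).decode("ascii"),
                }},
            ],
        }],
    }
    return urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts).strip()


def transcribe(wav_bytes: bytes, prompt: str, api_key: str,
               model: str = "gemini-2.5-flash-lite", timeout: float = 60.0) -> str:
    request = build_request(wav_bytes, prompt, model, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Sending audio inline keeps each transcription to a single round trip. Inline data is subject to the API's request-size limit, however, which is one reason a cap like `max_record_seconds` is useful. Swapping in a newer model only requires changing the `model` string.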
What’s Next
- Better packaging/distribution options
- More desktop-environment setup helpers
- Explore a Wayland-compatible path where feasible
Built by @creasac.