The Community

I built this for Linux users who do a lot of typing and want fast transcription directly in their existing workflow.

That includes:

developers writing docs, comments, and messages

students taking quick notes

writers drafting ideas

multilingual users who need reliable transcription across languages

The goal was simple: no heavy app, no complicated flow, just speak and continue typing.

What I Built

I built Shruti, a minimal system-wide speech-to-text utility for Linux X11.

Workflow:

Place cursor in any text field

Press hotkey once to start recording

Press the same hotkey again to stop

Shruti transcribes and types text at the cursor

Press Esc at any time to cancel the current recording
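The one-hotkey workflow behaves like a small state machine: idle, recording, transcribing. Below is a minimal sketch of that cycle. The class and method names are hypothetical and do not come from Shruti's code. Two further assumptions: Esc also abandons an in-flight transcription (the workflow says Esc works at any time), and each recording gets a cycle number so a late result from a cancelled cycle is never typed.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


class HotkeyToggle:
    """Hypothetical tracker for the start/stop hotkey cycle plus Esc to cancel."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.cycle = 0  # incremented each time a new recording starts

    def on_hotkey(self) -> State:
        # First press starts recording. The second press stops it and hands the
        # audio off for transcription. Presses while transcribing are ignored.
        if self.state is State.IDLE:
            self.cycle += 1
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.state = State.TRANSCRIBING
        return self.state

    def on_escape(self) -> State:
        # Assumption: Esc abandons the current cycle, recording or transcribing.
        self.state = State.IDLE
        return self.state

    def on_transcription_done(self, cycle: int) -> bool:
        """Return True if this result belongs to the live cycle and should be typed."""
        if self.state is State.TRANSCRIBING and cycle == self.cycle:
            self.state = State.IDLE
            return True
        return False


toggle = HotkeyToggle()
toggle.on_hotkey()   # start recording
toggle.on_hotkey()   # stop and begin transcribing
job = toggle.cycle   # remember which cycle this transcription belongs to
print(toggle.on_transcription_done(job))  # True: type the text at the cursor
```

The cycle number is a simple guard against a race: if the user cancels and immediately starts a new recording, the stale transcript from the first cycle is dropped instead of being typed.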

Why it’s useful

Works in any app where you can type

Uses the user’s own Gemini API key

High-quality transcription across languages

Minimal visual HUD while recording/transcribing

No always-on background process while idle

Code

creasac / shruti

minimal linux (x11) speech-to-text utility

shruti

Minimal desktop speech-to-text using Gemini.

Install

curl -fsSL https://raw.githubusercontent.com/creasac/shruti/main/bootstrap.sh | bash

What setup asks for:

Gemini API key (hidden input)
Preferred hotkey (default Ctrl+Space)

Hotkey behavior after setup:

Press hotkey once: start recording
Press hotkey again: stop and transcribe
Press Esc: cancel current recording

Nothing runs in the background while idle.

Configuration

Files:

~/.config/shruti/config.toml
~/.config/shruti/credentials.toml

API key location:

Stored only in ~/.config/shruti/credentials.toml

To remove your key:

rm -f ~/.config/shruti/credentials.toml

To remove all Shruti config data:

rm -rf ~/.config/shruti

Editable config fields:

model
hotkey
max_record_seconds
sample_rate
channels
prompt

Commands

shruti setup
shruti doctor --verbose
shruti oneshot

Limitations

Linux X11 only (Wayland blocks unrestricted global hotkeys and input injection for security reasons)
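Because of this limitation, a tool like this benefits from checking the session type before it tries to grab hotkeys or inject input. The sketch below makes a best-effort guess from environment variables. The `detect_session` helper is hypothetical, not part of Shruti.

```python
import os
from collections.abc import Mapping


def detect_session(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: guess the display server, returning 'x11', 'wayland', or 'unknown'."""
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("x11", "wayland"):
        return session
    # Fall back to display variables when XDG_SESSION_TYPE is unset or unhelpful.
    # WAYLAND_DISPLAY is checked first because XWayland also sets DISPLAY
    # inside Wayland sessions.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

The order of the fallback checks matters. Testing `DISPLAY` first would misreport most Wayland desktops as X11, since they run XWayland for legacy apps.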

License

MIT. See LICENSE.


How I Built It

Python 3.11+ for fast iteration and portability

Gemini API (gemini-2.5-flash-lite) for transcription

sounddevice + numpy for microphone capture and waveform data

xdotool for text insertion at the active cursor location (X11)

Tkinter for a compact overlay HUD

TOML config in ~/.config/shruti/ for user-controlled settings and API key storage
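The two ends of this pipeline, audio in and text out, can be sketched with the standard library plus the `xdotool` binary. This is not Shruti's code, and every function name is hypothetical. Assumptions: captured audio arrives as interleaved float samples in [-1.0, 1.0] (sounddevice's default float32 data, flattened; the code only iterates, so a 1-D numpy array works too), it is packaged as 16-bit PCM WAV before upload, and text is typed with `xdotool type`.

```python
import array
import io
import math
import shutil
import subprocess
import wave
from collections.abc import Sequence


def to_wav_bytes(samples: Sequence[float], sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Encode interleaved float samples in [-1.0, 1.0] as an in-memory 16-bit PCM WAV."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


def rms_level(block: Sequence[float]) -> float:
    """Root-mean-square loudness of one audio block, e.g. to size a HUD waveform bar."""
    if len(block) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def xdotool_type_command(text: str, delay_ms: int = 0) -> list[str]:
    """Build the argv that types `text` into the currently focused X11 window."""
    return [
        "xdotool", "type",
        "--clearmodifiers",        # temporarily clear held modifiers (e.g. Ctrl from the hotkey)
        "--delay", str(delay_ms),  # milliseconds between keystrokes
        "--",                      # treat what follows as text, even if it starts with "-"
        text,
    ]


def type_at_cursor(text: str) -> None:
    """Type `text` at the cursor; requires the xdotool binary on PATH."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found on PATH")
    # An argv list (no shell) means quotes or $ in the transcript need no escaping.
    subprocess.run(xdotool_type_command(text), check=True)
```

Clearing modifiers matters for a hotkey-triggered tool. If Ctrl is still physically held when typing starts, the focused app may receive Ctrl+letter shortcuts instead of plain text.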

Design choices

Keep the product intentionally narrow: one job, done well

Hotkey-first interaction for speed

Minimal UI and minimal config surface

Users bring their own Gemini API key, so usage and billing stay fully under their control.

Cloud-based transcription gives wider language coverage, keeps local compute light, and allows easy model upgrades.

DEV Community

Built Shruti: A Minimal System-Wide Speech-to-Text Tool for Linux