DEV Community

creasac

Built Shruti: A Minimal System-Wide Speech-to-Text Tool for Linux

This is a submission for the DEV Weekend Challenge: Community

The Community

I built this for Linux users who do a lot of typing and want fast transcription directly in their existing workflow.

That includes:

  • developers writing docs, comments, and messages
  • students taking quick notes
  • writers drafting ideas
  • multilingual users who need reliable transcription across languages

The goal was simple: no heavy app, no complicated flow, just speak and continue typing.

What I Built

I built Shruti, a minimal system-wide speech-to-text utility for Linux desktops running X11.

Workflow:

  1. Place the cursor in any text field
  2. Press the hotkey once to start recording
  3. Press the same hotkey again to stop
  4. Shruti transcribes the audio and types the text at the cursor
  5. Press Esc at any time to cancel the current recording
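
The workflow above behaves like a small state machine. The sketch below is only an illustration of that toggle logic, not Shruti's actual code: the `State` enum, the event names (`"hotkey"`, `"esc"`, `"done"`), and both functions are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


def next_state(state: State, event: str) -> State:
    """Return the state that follows `event` ("hotkey", "esc", or "done")."""
    if state is State.IDLE and event == "hotkey":
        return State.RECORDING        # first press starts recording
    if state is State.RECORDING and event == "hotkey":
        return State.TRANSCRIBING     # second press stops and transcribes
    if state is State.RECORDING and event == "esc":
        return State.IDLE             # Esc discards the current recording
    if state is State.TRANSCRIBING and event == "done":
        return State.IDLE             # text has been typed at the cursor
    return state                      # any other event is ignored


def run(events: list[str]) -> State:
    """Feed a sequence of events through the machine, starting from IDLE."""
    state = State.IDLE
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in one pure function makes the hotkey behavior easy to test without a microphone or an X11 session.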

Why it’s useful

  • Works across apps where typing is possible
  • Uses the user’s own Gemini API key
  • High-quality transcription across languages
  • Minimal visual HUD while recording/transcribing
  • No always-on background process while idle

Demo

Code

shruti

Minimal desktop speech-to-text using Gemini.

Install

curl -fsSL https://raw.githubusercontent.com/creasac/shruti/main/bootstrap.sh | bash

What setup asks:

  • Gemini API key (hidden input)
  • Preferred hotkey (default Ctrl+Space)

Hotkey behavior after setup:

  • Press hotkey once: start recording
  • Press hotkey again: stop and transcribe
  • Press Esc: cancel current recording

Nothing runs in the background while idle.

Configuration

Files:

  • ~/.config/shruti/config.toml
  • ~/.config/shruti/credentials.toml

API key location:

  • Stored only in ~/.config/shruti/credentials.toml

To remove your key:

rm -f ~/.config/shruti/credentials.toml

To remove all Shruti config data:

rm -rf ~/.config/shruti

Editable config fields:

  • model
  • hotkey
  • max_record_seconds
  • sample_rate
  • channels
  • prompt

Commands

shruti setup
shruti doctor --verbose
shruti oneshot

Limitations

  • Linux X11 only (Wayland restricts global hotkeys and synthetic input injection for security reasons)
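
A tool with this limitation may want to check the session type before doing anything else. The sketch below shows one possible check based on the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. The `session_type` function is hypothetical and is not necessarily what `shruti doctor` does.

```python
import os
from collections.abc import Mapping


def session_type(env: Mapping[str, str] = os.environ) -> str:
    """Classify the desktop session as "x11", "wayland", or "unknown"."""
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("x11", "wayland"):
        return declared
    # Fall back to display variables when the session type is not declared.
    # WAYLAND_DISPLAY is checked first because XWayland also sets DISPLAY.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

Passing the environment as a parameter keeps the check testable; calling `session_type()` with no arguments inspects the real environment.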

License

MIT. See LICENSE.

How I Built It

  • Python 3.11+ for fast iteration and portability
  • Gemini API (gemini-2.5-flash-lite) for transcription
  • sounddevice + numpy for microphone capture and waveform data
  • xdotool for text insertion at the active cursor location (X11)
  • Tkinter for a compact overlay HUD
  • TOML config in ~/.config/shruti/ for user-controlled settings and API key storage
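
To make two of these pieces concrete, here is a minimal sketch of packaging captured audio as WAV bytes and building an `xdotool type` command. It uses only the standard library so it runs anywhere: plain integers stand in for the samples that `sounddevice` would capture, and both function names are hypothetical rather than taken from Shruti's source.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(samples: list[int], sample_rate: int = 16000,
                     channels: int = 1) -> bytes:
    """Wrap signed 16-bit PCM samples in a WAV container, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


def xdotool_type_command(text: str) -> list[str]:
    """Build the argv for typing `text` into the focused X11 window."""
    # "--" ends option parsing so text starting with "-" is not read as a flag.
    return ["xdotool", "type", "--clear-modifiers", "--delay", "0", "--", text]
```

On an X11 session with xdotool installed, the command could be run with `subprocess.run(xdotool_type_command("hello"), check=True)`. Passing an argument list instead of a shell string avoids quoting problems with transcribed text.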

Design choices

  • Keep the product intentionally narrow: one job, done well
  • Hotkey-first interaction for speed
  • Minimal UI and minimal config surface
  • Users bring their own Gemini API key, so usage and billing stay fully under their control
  • Cloud-based transcription gives wider language coverage, keeps local compute light, and allows easy model upgrades

What’s Next

  • Better packaging/distribution options
  • More desktop-environment setup helpers
  • Explore a Wayland-compatible path where feasible

Built by @creasac.
