TL;DR: Open-source macOS menubar app. Hold Fn → speak → release → text appears. Uses OpenAI Whisper API. No subscriptions. Your API key. ~$0.006 per minute. Built with Hammerspoon + 400 lines of Lua.
The Problem: Why I Built This
Apple's built-in dictation is either limited (offline) or requires full Siri integration (cloud). Third-party dictation apps want $10-30/month subscriptions, run heavy local models that eat RAM, or lock you into their ecosystem.
I wanted something dead simple:
- Push-to-talk (hold key, speak, release)
- Instant paste into any active field
- My own API key (full cost control)
- Zero bloat (no Electron, no heavy models)
- Hackable (plain Lua config, open source)
So I built it in a weekend using Hammerspoon.
What It Does
Dictator-Speech-to-Text is a lightweight menubar app that:
- Records audio while you hold the Fn key (configurable)
- Compresses to FLAC (~50% smaller than WAV → faster upload)
- Sends to OpenAI Whisper API (or any compatible endpoint)
- Auto-pastes transcribed text into your active application
That's it. No UI wizards, no account signup, no analytics.
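To see why the FLAC step matters, here is a quick back-of-the-envelope calculation of the upload size. It is a minimal sketch that assumes 16 kHz, mono, 16-bit PCM (a common choice for speech). The article doesn't state Dictator's actual recording format, and the 0.5 ratio simply mirrors the "~50% smaller" figure above. Real FLAC ratios depend on the audio.

```python
WAV_HEADER_BYTES = 44  # canonical RIFF/WAVE header for plain PCM


def wav_size_bytes(seconds: float, sample_rate: int = 16_000,
                   channels: int = 1, bits_per_sample: int = 16) -> int:
    """Size of an uncompressed PCM WAV file of the given duration."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return WAV_HEADER_BYTES + int(bytes_per_second * seconds)


def estimated_flac_size_bytes(wav_bytes: int, ratio: float = 0.5) -> int:
    """Rough FLAC estimate; ratio=0.5 is the article's ~50% figure, not a constant."""
    return int(wav_bytes * ratio)


wav = wav_size_bytes(10)                # a 10-second dictation
flac = estimated_flac_size_bytes(wav)
print(f"WAV:  {wav:,} bytes")           # WAV:  320,044 bytes
print(f"FLAC: ~{flac:,} bytes")         # FLAC: ~160,022 bytes
```

At these settings, a 10-second clip is roughly 320 KB as WAV, so halving it saves a noticeable chunk of upload time on a slow connection.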
Why It's Fast
Most dictation tools waste time on:
- Loading heavy local models (Whisper Large = 3GB+ RAM)
- Inefficient audio encoding (WAV uploads are ~2x the size of FLAC)
- UI overhead (Electron apps, system dialogs)
Dictator avoids all of this:
- SoX for instant FLAC conversion (< 0.4s for typical recordings)
- HTTP/2 streaming to Whisper API
- Debounced release detection (no accidental double-triggers)
- Exponential backoff on rate limits (the recording is kept while it retries, so audio isn't lost)
Typical workflow: Hold Fn → speak 10 seconds → release → text appears ~1.5-2 seconds after release.
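"Debounced release detection" is easiest to understand as a tiny state machine. The sketch below is an illustration, not Dictator's recorder.lua: the class name, the 150 ms window, and the `tick` timer hook are all hypothetical. A release only ends the recording once the key has stayed up for the whole window. A re-press inside the window counts as bounce, so the same recording continues instead of firing a second transcription.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PushToTalk:
    """Debounced push-to-talk state machine (a sketch, not Dictator's code)."""
    debounce_s: float = 0.15  # hypothetical window length
    recording: bool = False
    released_at: Optional[float] = None
    events: List[str] = field(default_factory=list)

    def on_press(self, t: float) -> None:
        self.tick(t)  # first commit any release whose window already expired
        if self.recording:
            self.released_at = None  # bounce: cancel the pending stop
            return
        self.recording = True
        self.events.append("start")

    def on_release(self, t: float) -> None:
        if self.recording:
            self.released_at = t  # stop is pending, not final yet

    def tick(self, t: float) -> None:
        """Call from a periodic timer to finalize pending releases."""
        if self.recording and self.released_at is not None:
            if t - self.released_at >= self.debounce_s:
                self.recording = False
                self.released_at = None
                self.events.append("stop")


ptt = PushToTalk()
ptt.on_press(0.0)
ptt.on_release(2.00)
ptt.on_press(2.05)   # bounce 50 ms after release: ignored
ptt.on_release(3.00)
ptt.tick(3.20)       # key has been up for 200 ms: recording stops
print(ptt.events)    # ['start', 'stop']
```

In Hammerspoon the same idea would hang off the key event tap plus a timer; the window length is a trade-off between responsiveness and tolerance for sloppy key presses.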
Cost: Literally Pennies
OpenAI Whisper API pricing: $0.006 per minute of audio.
Let's say you dictate 30 minutes per day:
- Daily cost: $0.18
- Monthly cost: ~$5.40
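The arithmetic above is easy to check for your own usage. This sketch uses the $0.006/minute price quoted above and, like the example, assumes a 30-day month. `Decimal` keeps the cents exact instead of producing float noise like `0.18000000000000002`.

```python
from decimal import Decimal
from typing import Tuple

USD_PER_MINUTE = Decimal("0.006")  # Whisper API price quoted above


def dictation_cost(minutes_per_day: int, days: int = 30) -> Tuple[Decimal, Decimal]:
    """Return (daily, monthly) cost in USD."""
    daily = USD_PER_MINUTE * minutes_per_day
    return daily, daily * days


daily, monthly = dictation_cost(30)
print(f"daily ${daily:.2f}, monthly ${monthly:.2f}")  # daily $0.18, monthly $5.40
```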
Compare that to:
- Dragon Anywhere: $15/month
- Otter.ai Pro: $16.99/month
- Most "AI dictation" apps: $10-30/month
Plus, you can switch providers (Groq, Cloudflare Workers AI, local Whisper server) by changing one config line.
Technical Highlights (For Devs)
Architecture:
- ui.lua: Menubar icon + status indicator
- recorder.lua: Push-to-talk state machine (debounced release, audio capture via SoX)
- api.lua: HTTP client with retry logic, rate limiting, backoff
- config.lua: User-configurable hotkey, API endpoint, model, language
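Here is roughly what "configurable endpoint, model, language" buys you. This is a sketch: the `Config` field names are hypothetical stand-ins for whatever config.lua actually uses. The `/audio/transcriptions` path, the `Bearer` auth header, and the `model`/`language` form fields follow OpenAI's transcription API, where the audio itself goes in a multipart field named `file`. The local server URL is a made-up example, and how closely other providers mirror this API varies.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Config:
    # Hypothetical field names mirroring what config.lua is described as holding.
    hotkey: str = "fn"
    base_url: str = "https://api.openai.com/v1"
    model: str = "whisper-1"
    language: Optional[str] = "en"  # ISO-639-1 code, or None to let the API detect it


def build_transcription_request(
    cfg: Config, api_key: str
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, form_fields) for an OpenAI-style transcription call."""
    url = cfg.base_url.rstrip("/") + "/audio/transcriptions"
    headers = {"Authorization": f"Bearer {api_key}"}
    fields = {"model": cfg.model}
    if cfg.language:
        fields["language"] = cfg.language
    return url, headers, fields


default = Config()
# Switching provider means changing one value (hypothetical local server):
local = replace(default, base_url="http://localhost:8000/v1")
print(build_transcription_request(local, "sk-...")[0])
```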
Why Hammerspoon?
- Native macOS APIs (no Electron bloat)
- Lua is fast enough for this use case
- Entire codebase: ~400 lines
- Startup overhead: < 10ms
Audio Pipeline:
Fn key held
→ SoX records to /tmp/*.wav
→ Fn released
→ SoX converts to FLAC (~50% compression)
→ POST to Whisper API
→ Paste transcription via hs.eventtap
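The two SoX steps in that pipeline map to two ordinary command lines. The sketch below only builds them. The file name and the mono/16 kHz/16-bit settings are assumptions, not Dictator's actual values. `-d` tells SoX to use the default audio device as input, and the options placed before the output file (`-c`, `-r`, `-b`) set its channels, sample rate, and bit depth. The second command relies on SoX choosing the FLAC format from the `.flac` extension.

```python
import shlex
from pathlib import Path
from typing import List


def record_cmd(wav_path: Path, rate: int = 16_000) -> List[str]:
    """Record from the default input device to a mono 16-bit WAV (assumed settings)."""
    return ["sox", "-d", "-c", "1", "-r", str(rate), "-b", "16", str(wav_path)]


def to_flac_cmd(wav_path: Path) -> List[str]:
    """Convert WAV to FLAC; SoX infers the output format from the extension."""
    return ["sox", str(wav_path), str(wav_path.with_suffix(".flac"))]


clip = Path("/tmp/dictator-clip.wav")  # hypothetical file name
print(shlex.join(record_cmd(clip)))    # sox -d -c 1 -r 16000 -b 16 /tmp/dictator-clip.wav
print(shlex.join(to_flac_cmd(clip)))   # sox /tmp/dictator-clip.wav /tmp/dictator-clip.flac
```

In a real tool the first command runs as a long-lived process that is started on key-down and stopped on key-up. The conversion then runs once the WAV file is closed.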
Error Handling:
- Network timeouts → retry with exponential backoff
- Rate limits → auto-retry after delay
- Audio corruption → shows error notification, keeps recording buffer
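The retry behavior can be sketched generically. The sketch doesn't mirror api.lua: `RetryableError` is a hypothetical exception that the sender is assumed to raise for timeouts and HTTP 429/5xx responses, and the 0.5 s base and 8 s cap are made-up defaults. Delays double on each attempt up to the cap, with optional "full jitter" (a random delay between zero and the computed value) so that many clients don't retry in lockstep. Because the audio file is only read inside `send`, nothing is lost between attempts. When retries run out, the error propagates so the caller can show a notification.

```python
import random
import time
from typing import Callable, List


class RetryableError(Exception):
    """Hypothetical: raised by the sender for timeouts, HTTP 429, and 5xx."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def transcribe_with_retry(send: Callable[[], str], retries: int = 4,
                          sleep: Callable[[float], None] = time.sleep,
                          jitter: bool = True) -> str:
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return send()
        except RetryableError:
            if attempt == retries:
                raise  # out of retries: caller keeps the audio and notifies
            delay = delays[attempt]
            if jitter:
                delay = random.uniform(0, delay)  # full jitter
            sleep(delay)
    raise AssertionError("unreachable")
```

A common refinement, not shown here, is to honor a `Retry-After` header on 429 responses when the provider sends one.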
When to Use This vs. Alternatives
Use Dictator if:
- You want push-to-talk (not always-on listening)
- You're okay with API costs (pennies, but not free)
- You want full control (API key, provider, model)
- You're on macOS and already use/like Hammerspoon
Use something else if:
- You need 100% offline (use whisper.cpp locally)
- You want always-on voice commands (use Siri/Talon)
- You're on Linux/Windows (Hammerspoon is macOS-only)
Try It
GitHub: Dictator-Speech-to-Text
Cool Website: Dictators Website
MIT licensed. Issues/PRs welcome.
Questions? Drop a comment. Happy to explain any technical details or help with setup!