Simon Busshart

Posted on Jan 22

I Built a 2KB Dictation Tool That Costs Pennies and Runs Circles Around Apple's Built-in Dictation

#ai #hammerspoon #whisper #lua

TL;DR: Open-source macOS menubar app. Hold Fn → speak → release → text appears. Uses OpenAI Whisper API. No subscriptions. Your API key. ~$0.006 per minute. Built with Hammerspoon + 400 lines of Lua.

The Problem: Why I Built This

Apple's built-in dictation is either limited (offline) or requires full Siri integration (cloud). Third-party dictation apps want $10-30/month subscriptions, run heavy local models that eat RAM, or lock you into their ecosystem.

I wanted something dead simple:

Push-to-talk (hold key, speak, release)
Instant paste into any active field
My own API key (full cost control)
Zero bloat (no Electron, no heavy models)
Hackable (plain Lua config, open source)

So I built it in a weekend using Hammerspoon.

What It Does

Dictator-Speech-to-Text is a lightweight menubar app that:

Records audio while you hold the Fn key (configurable)
Compresses to FLAC (~50% smaller than WAV → faster upload)
Sends to OpenAI Whisper API (or any compatible endpoint)
Auto-pastes transcribed text into your active application

That's it. No UI wizards, no account signup, no analytics.

Why It's Fast

Most dictation tools waste time on:

Loading heavy local models (Whisper Large = 3GB+ RAM)
Inefficient audio encoding (WAV uploads are 2x bigger)
UI overhead (Electron apps, system dialogs)

Dictator avoids all of this:

SoX for instant FLAC conversion (< 0.4s for typical recordings)
A single HTTP/2 upload to the Whisper API once recording stops
Debounced release detection (no accidental double-triggers)
Exponential backoff on rate limits (a rate-limited request is retried instead of dropping your audio)

Typical workflow: Hold Fn → speak for 10 seconds → release → text appears ~1.5-2 seconds after release.

Cost: Literally Pennies

OpenAI Whisper API pricing: $0.006 per minute of audio.

Let's say you dictate 30 minutes per day:

Daily cost: $0.18
Monthly cost: ~$5.40
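
The arithmetic behind those numbers, as a tiny Lua sketch. The helper name `dictation_cost` and the 30-day month are assumptions for illustration; the rate is the $0.006 per minute quoted above.

```lua
-- Whisper API price quoted in this post, in USD per minute of audio.
local PRICE_PER_MINUTE = 0.006

-- Hypothetical helper: estimated daily and monthly spend for a given
-- number of dictated minutes per day (a 30-day month is assumed).
local function dictation_cost(minutes_per_day, days_per_month)
  days_per_month = days_per_month or 30
  local daily = minutes_per_day * PRICE_PER_MINUTE
  return daily, daily * days_per_month
end

local daily, monthly = dictation_cost(30)
print(string.format("daily: $%.2f, monthly: $%.2f", daily, monthly))
-- prints: daily: $0.18, monthly: $5.40
```

Plug in your own minutes per day to see where you land relative to a subscription.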

Compare that to:

Dragon Anywhere: $15/month
Otter.ai Pro: $16.99/month
Most "AI dictation" apps: $10-30/month

Plus, you can switch providers (Groq, Cloudflare Workers AI, local Whisper server) by changing one config line.
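
To give a feel for what "one config line" could look like, here is a hypothetical config layout. The table and field names (`providers`, `endpoint`, `model`, `provider`) are illustrative and may not match the project's actual config.lua; the OpenAI and Groq transcription URLs and model names are the ones those providers document, but check their docs before relying on them.

```lua
-- Hypothetical config layout; the field names are illustrative and may
-- not match the project's real config.lua.
local providers = {
  openai = {
    endpoint = "https://api.openai.com/v1/audio/transcriptions",
    model    = "whisper-1",
  },
  groq = {
    endpoint = "https://api.groq.com/openai/v1/audio/transcriptions",
    model    = "whisper-large-v3",
  },
}

local config = {
  hotkey   = "fn",
  language = "en",
  provider = providers.openai,  -- the one line to change when switching
}

print(config.provider.endpoint)
```

Swapping `providers.openai` for `providers.groq` is the whole migration in this sketch; the API key would change too.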

Technical Highlights (For Devs)

Architecture:

ui.lua: Menubar icon + status indicator
recorder.lua: Push-to-talk state machine (debounced release, audio capture via SoX)
api.lua: HTTP client with retry logic, rate limiting, backoff
config.lua: User-configurable hotkey, API endpoint, model, language
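
A minimal sketch of what a push-to-talk state machine with a debounced release can look like. This is not the code from recorder.lua: the function names, the three states, and the 0.15 second window are assumptions. Time is passed in explicitly so the logic runs in plain Lua, outside Hammerspoon.

```lua
-- Hypothetical push-to-talk state machine with a debounced release.
-- States: "idle" -> "recording" -> "releasing" -> "idle".
local function new_recorder(debounce, on_start, on_stop)
  local self = { state = "idle", released_at = nil }

  -- Key pressed: start recording, or cancel a pending release.
  function self.key_down()
    if self.state == "idle" then
      self.state = "recording"
      on_start()
    elseif self.state == "releasing" then
      self.state = "recording"   -- the release was a bounce; keep recording
      self.released_at = nil
    end
  end

  -- Key released: do not stop yet, only note when it happened.
  function self.key_up(now)
    if self.state == "recording" then
      self.state = "releasing"
      self.released_at = now
    end
  end

  -- Called from a timer: commit the release once the key has stayed up
  -- for at least the debounce window.
  function self.tick(now)
    if self.state == "releasing" and now - self.released_at >= debounce then
      self.state = "idle"
      self.released_at = nil
      on_stop()
    end
  end

  return self
end

local starts, stops = 0, 0
local rec = new_recorder(0.15,
  function() starts = starts + 1 end,
  function() stops = stops + 1 end)

rec.key_down()
rec.key_up(1.00)
rec.key_down()    -- bounce within the window: recording continues
rec.tick(1.20)    -- no effect, the key is held again
rec.key_up(3.00)
rec.tick(3.05)    -- too early, still inside the window
rec.tick(3.20)    -- window elapsed: recording stops
print(string.format("starts=%d stops=%d", starts, stops))
-- prints: starts=1 stops=1
```

Inside Hammerspoon, the key events would typically come from an `hs.eventtap` watching `flagsChanged` events, and `tick` could be driven by `hs.timer.doAfter`.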

Why Hammerspoon?

Native macOS APIs (no Electron bloat)
Lua is fast enough for this use case
Entire codebase: ~400 lines
Startup overhead: < 10ms

Audio Pipeline:

Fn key held
  → SoX records to /tmp/*.wav
  → Fn released
  → SoX converts to FLAC (~50% compression)
  → POST to Whisper API
  → Paste transcription via hs.eventtap
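
The two SoX steps in that pipeline could be driven by argument lists like the ones below. This is a sketch under assumptions: the helper names, the file name, and the 16 kHz mono settings are mine, not taken from the project. The SoX options themselves (`-d` for the default input device, `-r` for sample rate, `-c` for channels) are standard.

```lua
-- Hypothetical helpers that build SoX argument lists for the two audio steps.
-- The sample rate and channel count are assumptions, not the project's settings.
local function record_args(wav_path)
  -- "-d" reads from the default audio input device; "-r" and "-c" placed
  -- before the output file set its sample rate and channel count.
  return { "-d", "-r", "16000", "-c", "1", wav_path }
end

local function flac_args(wav_path)
  -- SoX infers the output format from the file extension.
  local flac_path = wav_path:gsub("%.wav$", ".flac")
  return { wav_path, flac_path }, flac_path
end

local wav = "/tmp/dictation.wav"   -- illustrative file name
print("sox " .. table.concat(record_args(wav), " "))
local args, flac = flac_args(wav)
print("sox " .. table.concat(args, " "))
print("upload: " .. flac)
```

In Hammerspoon, lists like these can be handed to `hs.task.new`, which takes the path to the binary, a completion callback, and an argument table. The completion callback of the FLAC step is a natural place to kick off the upload.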

Error Handling:

Network timeouts → retry with exponential backoff
Rate limits → auto-retry after delay
Audio corruption → shows error notification, keeps recording buffer
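
For the curious, retry with exponential backoff boils down to something like this. It is a generic sketch, not the code in api.lua: `with_backoff`, its parameters, and the delays are assumptions, and `sleep` is injected so the helper can run without actually waiting.

```lua
-- Hypothetical retry helper: calls `request` until it succeeds or the
-- attempts run out, doubling the wait after each failure.
-- `request(attempt)` must return ok (boolean) and a result or error.
local function with_backoff(request, sleep, max_attempts, base_delay)
  local delay = base_delay
  local err
  for attempt = 1, max_attempts do
    local ok, result = request(attempt)
    if ok then
      return true, result
    end
    err = result
    if attempt < max_attempts then
      sleep(delay)        -- wait before the next attempt
      delay = delay * 2   -- 0.5s, 1s, 2s, ...
    end
  end
  return false, err
end

-- Simulated API: rate-limited twice, then succeeds.
local waits = {}
local ok, text = with_backoff(
  function(attempt)
    if attempt < 3 then return false, "429 rate limited" end
    return true, "hello world"
  end,
  function(seconds) waits[#waits + 1] = seconds end,
  5, 0.5)

print(ok, text, #waits)
```

A blocking sleep would freeze Hammerspoon, so a real version would schedule the next attempt with a timer such as `hs.timer.doAfter` and keep the recorded file around until a request succeeds.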

When to Use This vs. Alternatives

Use Dictator if:

You want push-to-talk (not always-on listening)
You're okay with API costs (pennies, but not free)
You want full control (API key, provider, model)
You're on macOS and already use/like Hammerspoon

Use something else if:

You need 100% offline (run whisper.cpp locally)
You want always-on voice commands (use Siri/Talon)
You're on Linux/Windows (Hammerspoon is macOS-only)

Try It

GitHub: Dictator-Speech-to-Text
Cool Website: Dictators Website

MIT licensed. Issues/PRs welcome.

Questions? Drop a comment—happy to explain any technical details or help with setup!
