Simon Busshart

I Built a 2KB Dictation Tool That Costs Pennies and Runs Circles Around Apple's Built-in Dictation

TL;DR: Open-source macOS menubar app. Hold Fn → speak → release → text appears. Uses OpenAI Whisper API. No subscriptions. Your API key. ~$0.006 per minute. Built with Hammerspoon + 400 lines of Lua.

The Problem: Why I Built This

Apple's built-in dictation is either limited (offline) or requires full Siri integration (cloud). Third-party dictation apps want $10-30/month subscriptions, run heavy local models that eat RAM, or lock you into their ecosystem.

I wanted something dead simple:

  • Push-to-talk (hold key, speak, release)
  • Instant paste into any active field
  • My own API key (full cost control)
  • Zero bloat (no Electron, no heavy models)
  • Hackable (plain Lua config, open source)

So I built it in a weekend using Hammerspoon.

What It Does

Dictator-Speech-to-Text is a lightweight menubar app that:

  • Records audio while you hold the Fn key (configurable)
  • Compresses to FLAC (~50% smaller than WAV → faster upload)
  • Sends to OpenAI Whisper API (or any compatible endpoint)
  • Auto-pastes transcribed text into your active application

That's it. No UI wizards, no account signup, no analytics.

Why It's Fast

Most dictation tools waste time on:

  • Loading heavy local models (Whisper Large = 3GB+ RAM)
  • Inefficient audio encoding (a WAV upload is roughly 2x the size of the same audio as FLAC)
  • UI overhead (Electron apps, system dialogs)

Dictator avoids all of this:

  • SoX for instant FLAC conversion (< 0.4s for typical recordings)
  • HTTP/2 streaming to Whisper API
  • Debounced release detection (no accidental double-triggers)
  • Exponential backoff on rate limits (a rate-limited request is retried instead of dropped, so the audio isn't lost)

Typical workflow: Hold Fn → speak 10 seconds → release → text appears ~1.5-2 seconds after release.

Cost: Literally Pennies

OpenAI Whisper API pricing: $0.006 per minute of audio.

Let's say you dictate 30 minutes per day:

  • Daily cost: $0.18
  • Monthly cost (30 days): ~$5.40
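
The arithmetic behind those numbers, as a small Lua sketch. The per-minute rate is the one quoted above; the 30-day month and the helper name `dictation_cost` are assumptions for illustration only.

```lua
-- Whisper API rate quoted in this post, in USD per minute of audio.
local RATE_PER_MINUTE = 0.006

-- Hypothetical helper: estimated cost for a given dictation habit.
local function dictation_cost(minutes_per_day, days)
  return minutes_per_day * RATE_PER_MINUTE * days
end

print(string.format("daily:   $%.2f", dictation_cost(30, 1)))   -- daily:   $0.18
print(string.format("monthly: $%.2f", dictation_cost(30, 30)))  -- monthly: $5.40
```

Plug in your own minutes per day to see where you land relative to a flat subscription.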

Compare that to:

  • Dragon Anywhere: $15/month
  • Otter.ai Pro: $16.99/month
  • Most "AI dictation" apps: $10-30/month

Plus, you can switch providers (Groq, Cloudflare Workers AI, local Whisper server) by changing one config line.
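
To give a feel for what "one config line" might look like, here is a sketch of a Lua config table. The field names are hypothetical (check the repository for the real ones), and the local server URL is a placeholder. The OpenAI endpoint and the `whisper-1` model name are the publicly documented ones.

```lua
-- Sketch of a config table; field names are assumptions, not the project's.
local config = {
  hotkey   = "fn",
  -- Assumes the key is exported in the environment Hammerspoon sees.
  api_key  = os.getenv("OPENAI_API_KEY"),
  endpoint = "https://api.openai.com/v1/audio/transcriptions",
  -- Switching providers would mean pointing this at another
  -- OpenAI-compatible server, e.g. a local one (placeholder URL):
  -- endpoint = "http://localhost:8000/v1/audio/transcriptions",
  model    = "whisper-1",
  language = "en",
}
```

Other providers usually need a different `model` value as well, so in practice the switch may be two lines rather than one.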

Technical Highlights (For Devs)

Architecture:

  • ui.lua: Menubar icon + status indicator
  • recorder.lua: Push-to-talk state machine (debounced release, audio capture via SoX)
  • api.lua: HTTP client with retry logic, rate limiting, backoff
  • config.lua: User-configurable hotkey, API endpoint, model, language
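
The push-to-talk state machine with a debounced release can be sketched roughly like this. It is not the project's actual `recorder.lua`: the function names and the injected `deps` table are hypothetical, chosen so the logic runs in plain Lua without Hammerspoon.

```lua
-- Hypothetical push-to-talk state machine with a debounced release.
-- `deps` supplies the side effects so the logic is testable on its own:
--   deps.start()        begin recording
--   deps.stop()         stop recording and hand off the audio
--   deps.after(s, fn)   schedule fn after s seconds; returns a cancel function
local function new_recorder(deps, debounce_seconds)
  local state = "idle"          -- "idle" | "recording" | "releasing"
  local cancel_pending = nil
  local self = {}

  function self.key_down()
    if state == "idle" then
      state = "recording"
      deps.start()
    elseif state == "releasing" then
      -- The key bounced: cancel the pending stop and keep recording.
      cancel_pending()
      cancel_pending = nil
      state = "recording"
    end
  end

  function self.key_up()
    if state ~= "recording" then return end
    state = "releasing"
    -- Only stop if the key stays up for the whole debounce window.
    cancel_pending = deps.after(debounce_seconds, function()
      cancel_pending = nil
      state = "idle"
      deps.stop()
    end)
  end

  function self.state() return state end
  return self
end
```

Inside Hammerspoon, `deps.after` could wrap `hs.timer.doAfter` (returning a function that calls the timer's `stop` method), and the key events could come from an `hs.eventtap` watching `flagsChanged` events and reading the `fn` field of `event:getFlags()`.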

Why Hammerspoon?

  • Native macOS APIs (no Electron bloat)
  • Lua is fast enough for this use case
  • Entire codebase: ~400 lines
  • Startup overhead: < 10ms

Audio Pipeline:

Fn key held
  → SoX records to /tmp/*.wav
  → Fn released
  → SoX converts to FLAC (~50% compression)
  → POST to Whisper API
  → Paste transcription via hs.eventtap
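
The two SoX steps in that pipeline map onto argument lists like the ones below. This is a sketch under assumptions: the file path and the 16 kHz mono settings are illustrative, and the helper names are made up. The SoX usage itself (`-d` for the default input device, output format inferred from the `.flac` extension) is standard.

```lua
-- Build SoX argument lists for the record and convert steps.
-- Path and sample-rate/channel settings are assumptions, not the project's.
local function record_args(wav_path)
  -- `-d` reads from the default audio input device.
  return { "-d", "-c", "1", "-r", "16000", wav_path }
end

local function convert_args(wav_path)
  -- Swap a trailing ".wav" for ".flac"; SoX picks the codec from the extension.
  local flac_path = (wav_path:gsub("%.wav$", "")) .. ".flac"
  return { wav_path, flac_path }, flac_path
end

print("sox " .. table.concat(record_args("/tmp/dictation.wav"), " "))
print("sox " .. table.concat((convert_args("/tmp/dictation.wav")), " "))
```

In Hammerspoon, lists like these can be passed as the arguments to `hs.task.new` along with the path to the `sox` binary; the recording task is then ended when the key is released.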

Error Handling:

  • Network timeouts → retry with exponential backoff
  • Rate limits → auto-retry after delay
  • Audio corruption → shows error notification, keeps recording buffer
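
A minimal sketch of the retry-with-backoff idea, assuming a 0.5 s base delay, an 8 s cap, and a convention where `send` returns an HTTP status code or `nil` on a network timeout. None of these values or names come from the project; they are placeholders to show the shape of the logic.

```lua
-- Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ...
-- capped at max_delay.
local function backoff_delay(attempt, base, max_delay)
  return math.min(base * 2 ^ (attempt - 1), max_delay)
end

-- Hypothetical retry loop. `send` returns an HTTP status code (nil on a
-- network timeout); `sleep` is injected so the caller decides how to wait.
local function send_with_retry(send, sleep, max_attempts)
  for attempt = 1, max_attempts do
    local status = send()
    if status == 200 then return true, attempt end
    -- Retry on timeouts, rate limits (429), and server errors (5xx).
    local retryable = status == nil or status == 429 or status >= 500
    if not retryable or attempt == max_attempts then
      return false, attempt
    end
    sleep(backoff_delay(attempt, 0.5, 8))
  end
end
```

In a Hammerspoon app you would not block on a real sleep; scheduling the next attempt with `hs.timer.doAfter` keeps the menubar responsive while the audio file waits on disk.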

When to Use This vs. Alternatives

Use Dictator if:

  • You want push-to-talk (not always-on listening)
  • You're okay with API costs (pennies, but not free)
  • You want full control (API key, provider, model)
  • You're on macOS and already use/like Hammerspoon

Use something else if:

  • You need 100% offline dictation (run whisper.cpp locally)
  • You want always-on voice commands (use Siri/Talon)
  • You're on Linux/Windows (Hammerspoon is macOS-only)

Try It

GitHub: Dictator-Speech-to-Text
Cool Website: Dictators Website

MIT licensed. Issues/PRs welcome.


Questions? Drop a comment. I'm happy to explain any technical details or help with setup!
