<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon Busshart</title>
    <description>The latest articles on DEV Community by Simon Busshart (@simon_busshart_478f2366bc).</description>
    <link>https://dev.to/simon_busshart_478f2366bc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3418804%2F981d7f15-0fcf-41a7-962a-324a4b4d39d5.jpg</url>
      <title>DEV Community: Simon Busshart</title>
      <link>https://dev.to/simon_busshart_478f2366bc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simon_busshart_478f2366bc"/>
    <language>en</language>
    <item>
      <title>I Built a 2KB Dictation Tool That Costs Pennies and Runs Circles Around Apple's Built-in Dictation</title>
      <dc:creator>Simon Busshart</dc:creator>
      <pubDate>Thu, 22 Jan 2026 08:16:38 +0000</pubDate>
      <link>https://dev.to/simon_busshart_478f2366bc/i-built-a-2kb-dictation-tool-that-costs-pennies-and-runs-circles-around-apples-built-in-dictation-27d0</link>
      <guid>https://dev.to/simon_busshart_478f2366bc/i-built-a-2kb-dictation-tool-that-costs-pennies-and-runs-circles-around-apples-built-in-dictation-27d0</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Open-source macOS menubar app. Hold Fn → speak → release → text appears. Uses the OpenAI Whisper API with your own API key: no subscriptions, ~$0.006 per minute of audio. Built with Hammerspoon and ~400 lines of Lua.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Why I Built This
&lt;/h2&gt;

&lt;p&gt;Apple's built-in dictation is either limited (offline) or requires full Siri integration (cloud). Third-party dictation apps want $10-30/month subscriptions, run heavy local models that eat RAM, or lock you into their ecosystem.&lt;/p&gt;

&lt;p&gt;I wanted something dead simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push-to-talk&lt;/strong&gt; (hold key, speak, release)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant paste&lt;/strong&gt; into any active field&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My own API key&lt;/strong&gt; (full cost control)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero bloat&lt;/strong&gt; (no Electron, no heavy models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hackable&lt;/strong&gt; (plain Lua config, open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built it in a weekend using Hammerspoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dictator-Speech-to-Text&lt;/strong&gt; is a lightweight menubar app that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Records audio while you hold the Fn key (configurable)&lt;/li&gt;
&lt;li&gt;Compresses it to FLAC (~50% smaller than WAV → faster upload)&lt;/li&gt;
&lt;li&gt;Sends it to the OpenAI Whisper API (or any compatible endpoint)&lt;/li&gt;
&lt;li&gt;Auto-pastes the transcribed text into your active application&lt;/li&gt;
&lt;/ul&gt;
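
&lt;p&gt;Hammerspoon offers a couple of ways to handle the paste step. &lt;code&gt;hs.eventtap.keyStrokes(text)&lt;/code&gt; types the text character by character, or the text can be routed through the clipboard. Below is a minimal sketch of the clipboard route. It assumes this is roughly what happens here; the post only says the paste goes through &lt;code&gt;hs.eventtap&lt;/code&gt;. &lt;code&gt;hs.pasteboard&lt;/code&gt;, &lt;code&gt;hs.eventtap.keyStroke&lt;/code&gt; and &lt;code&gt;hs.timer.doAfter&lt;/code&gt; are real Hammerspoon functions. &lt;code&gt;paste_text&lt;/code&gt;, its &lt;code&gt;hs_api&lt;/code&gt; parameter and the 0.3 s delay are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Sketch (hypothetical helper): paste text into the frontmost app via the clipboard.
-- hs_api defaults to Hammerspoon's global hs; it is a parameter only so the
-- logic can be exercised with fakes outside Hammerspoon.
local function paste_text(text, hs_api)
  hs_api = hs_api or hs
  local previous = hs_api.pasteboard.getContents()  -- may be nil
  hs_api.pasteboard.setContents(text)
  hs_api.eventtap.keyStroke({ "cmd" }, "v")
  -- Restore the old clipboard after a short delay so the target app has
  -- read the pasteboard first (0.3 s is an assumed value).
  hs_api.timer.doAfter(0.3, function()
    if previous then hs_api.pasteboard.setContents(previous) end
  end)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This sketch restores only text clipboard contents. If the clipboard held an image or files, that content is overwritten. Typing via &lt;code&gt;keyStrokes&lt;/code&gt; avoids the clipboard entirely but can be slow for long transcripts.&lt;/p&gt;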

&lt;p&gt;&lt;strong&gt;That's it.&lt;/strong&gt; No UI wizards, no account signup, no analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Fast
&lt;/h2&gt;

&lt;p&gt;Most dictation tools waste time on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loading heavy local models (Whisper Large = 3GB+ RAM)&lt;/li&gt;
&lt;li&gt;Inefficient audio encoding (WAV uploads are 2x bigger)&lt;/li&gt;
&lt;li&gt;UI overhead (Electron apps, system dialogs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dictator avoids all of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SoX&lt;/strong&gt; for fast FLAC conversion (&amp;lt; 0.4s for typical recordings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP/2 streaming&lt;/strong&gt; to Whisper API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debounced release detection&lt;/strong&gt; (no accidental double-triggers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff&lt;/strong&gt; on rate limits (a throttled request is retried instead of dropped, so you don't lose the recording)&lt;/li&gt;
&lt;/ul&gt;
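
&lt;p&gt;Debounced release detection fits in a small state machine. Here is a minimal sketch. It assumes a release only counts once the key has stayed up for a short window (0.15 s here, an illustrative value). It also assumes &lt;code&gt;schedule(delay, fn)&lt;/code&gt; returns a handle with a &lt;code&gt;:stop()&lt;/code&gt; method, which matches the shape of &lt;code&gt;hs.timer.doAfter&lt;/code&gt;. All names are hypothetical; the repo's &lt;code&gt;recorder.lua&lt;/code&gt; may be structured differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Sketch: push-to-talk with a debounced release (hypothetical names).
-- schedule(delay, fn) must return a handle with :stop(); hs.timer.doAfter fits.
local function new_push_to_talk(schedule, on_start, on_stop, debounce)
  debounce = debounce or 0.15  -- seconds the key must stay up before we stop
  local ptt = { recording = false, pending_stop = nil }

  function ptt.key_down()
    if ptt.pending_stop then
      -- Key came back down inside the debounce window: same hold, keep recording.
      ptt.pending_stop:stop()
      ptt.pending_stop = nil
    elseif not ptt.recording then
      ptt.recording = true
      on_start()
    end
    -- Otherwise we are already recording (e.g. a repeated key-down): ignore it.
  end

  function ptt.key_up()
    if not ptt.recording or ptt.pending_stop then return end
    -- Defer the stop; a quick re-press cancels it in key_down.
    ptt.pending_stop = schedule(debounce, function()
      ptt.pending_stop = nil
      ptt.recording = false
      on_stop()
    end)
  end

  return ptt
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In Hammerspoon this could be wired as &lt;code&gt;new_push_to_talk(hs.timer.doAfter, start_recording, stop_and_transcribe)&lt;/code&gt;, where both callbacks are hypothetical. You would then call &lt;code&gt;key_down&lt;/code&gt; and &lt;code&gt;key_up&lt;/code&gt; from an &lt;code&gt;hs.eventtap&lt;/code&gt; that watches &lt;code&gt;flagsChanged&lt;/code&gt; events and checks &lt;code&gt;event:getFlags().fn&lt;/code&gt;.&lt;/p&gt;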

&lt;p&gt;Typical workflow: hold Fn → speak for 10 seconds → release → text appears roughly 1.5–2 seconds after release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost: Literally Pennies
&lt;/h2&gt;

&lt;p&gt;OpenAI Whisper API pricing: &lt;strong&gt;$0.006 per minute&lt;/strong&gt; of audio.&lt;/p&gt;

&lt;p&gt;Let's say you dictate 30 minutes per day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily cost: $0.18&lt;/li&gt;
&lt;li&gt;Monthly cost (30 days): ~$5.40&lt;/li&gt;
&lt;/ul&gt;
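
&lt;p&gt;The same arithmetic as a small Lua helper. It uses the $0.006/minute price above and a 30-day month. Plug in your own usage, and check the current price, since API pricing can change.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Whisper API price quoted above, in USD per minute of audio.
local PRICE_PER_MINUTE = 0.006

-- Returns the daily and monthly cost for a given amount of dictation.
local function dictation_cost(minutes_per_day, days_per_month)
  local daily = minutes_per_day * PRICE_PER_MINUTE
  return daily, daily * days_per_month
end

local daily, monthly = dictation_cost(30, 30)
print(string.format("daily $%.2f, monthly $%.2f", daily, monthly))
-- prints: daily $0.18, monthly $5.40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;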

&lt;p&gt;Compare that to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dragon Anywhere: $15/month&lt;/li&gt;
&lt;li&gt;Otter.ai Pro: $16.99/month&lt;/li&gt;
&lt;li&gt;Most "AI dictation" apps: $10-30/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus, you can switch providers (Groq, Cloudflare Workers AI, local Whisper server) by changing one config line.&lt;/p&gt;
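
&lt;p&gt;Here is what a provider switch might look like, as a hypothetical &lt;code&gt;config.lua&lt;/code&gt;. The key names are assumptions, not the repo's actual schema. The URLs are OpenAI's transcription endpoint and Groq's OpenAI-compatible one. In this sketch, switching means swapping the endpoint together with the matching model name and API key.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Hypothetical config.lua; key names are illustrative, not the repo's schema.
return {
  hotkey   = "fn",   -- push-to-talk key
  language = "en",   -- optional language hint sent with each request

  -- OpenAI
  endpoint = "https://api.openai.com/v1/audio/transcriptions",
  model    = "whisper-1",
  api_key  = "YOUR_OPENAI_KEY",

  -- Groq (OpenAI-compatible): use these three lines instead
  -- endpoint = "https://api.groq.com/openai/v1/audio/transcriptions",
  -- model    = "whisper-large-v3",
  -- api_key  = "YOUR_GROQ_KEY",
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;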

&lt;h2&gt;
  
  
  Technical Highlights (For Devs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ui.lua&lt;/strong&gt;: Menubar icon + status indicator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;recorder.lua&lt;/strong&gt;: Push-to-talk state machine (debounced release, audio capture via SoX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;api.lua&lt;/strong&gt;: HTTP client with retry logic, rate limiting, backoff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;config.lua&lt;/strong&gt;: User-configurable hotkey, API endpoint, model, language&lt;/li&gt;
&lt;/ul&gt;
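
&lt;p&gt;Most of the work behind &lt;code&gt;api.lua&lt;/code&gt; comes down to a single multipart/form-data POST. Below is a minimal sketch of building that request. It assumes OpenAI's transcription endpoint, which expects a &lt;code&gt;file&lt;/code&gt; part and a &lt;code&gt;model&lt;/code&gt; field (&lt;code&gt;language&lt;/code&gt; is optional). The function name and boundary format are hypothetical. In Hammerspoon you could send the result with &lt;code&gt;hs.http.asyncPost(url, body, headers, callback)&lt;/code&gt; and read the transcript from the &lt;code&gt;text&lt;/code&gt; field of the JSON response via &lt;code&gt;hs.json.decode&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Sketch: build a multipart/form-data body for a Whisper-style transcription API.
-- Assumes the boundary never appears inside the audio bytes (a time-based
-- boundary makes a collision unlikely, not impossible).
local function build_transcription_request(api_key, audio_bytes, filename, model, language)
  local boundary = "dictator-boundary-" .. tostring(os.time())
  local parts = {}

  -- Append one plain form field.
  local function add_field(name, value)
    parts[#parts + 1] = "--" .. boundary .. "\r\n"
      .. 'Content-Disposition: form-data; name="' .. name .. '"\r\n\r\n'
      .. value .. "\r\n"
  end

  add_field("model", model)
  if language then add_field("language", language) end

  -- The audio file part; FLAC matches the pipeline described in this post.
  parts[#parts + 1] = "--" .. boundary .. "\r\n"
    .. 'Content-Disposition: form-data; name="file"; filename="' .. filename .. '"\r\n'
    .. "Content-Type: audio/flac\r\n\r\n"
    .. audio_bytes .. "\r\n"

  -- Closing delimiter.
  parts[#parts + 1] = "--" .. boundary .. "--\r\n"

  local headers = {
    ["Authorization"] = "Bearer " .. api_key,
    ["Content-Type"] = "multipart/form-data; boundary=" .. boundary,
  }
  return table.concat(parts), headers
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;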

&lt;p&gt;&lt;strong&gt;Why Hammerspoon?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Native macOS APIs (no Electron bloat)&lt;/li&gt;
&lt;li&gt;Lua is fast enough for this use case&lt;/li&gt;
&lt;li&gt;Entire codebase: ~400 lines&lt;/li&gt;
&lt;li&gt;Startup overhead: &amp;lt; 10ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Audio Pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fn key held
  → SoX records to /tmp/*.wav
  → Fn released
  → SoX converts to FLAC (~50% compression)
  → POST to Whisper API
  → Paste transcription via hs.eventtap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
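
&lt;p&gt;The two SoX steps in that pipeline come down to argument lists. Below is a sketch that assumes mono 16 kHz capture. Whisper models operate on 16 kHz audio, so a higher capture rate mostly adds upload size. It also assumes &lt;code&gt;sox&lt;/code&gt; is launched through &lt;code&gt;hs.task&lt;/code&gt;; the helper names and paths are hypothetical. SoX picks the output format from the file extension, so the conversion is just &lt;code&gt;sox in.wav out.flac&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Sketch: argument lists for the two SoX invocations in the pipeline.
-- In Hammerspoon these could be passed to hs.task.new(sox_path, callback, args).

-- Record from the default input device (-d) to a mono, 16 kHz WAV.
local function record_args(wav_path)
  return { "-d", "-c", "1", "-r", "16000", wav_path }
end

-- Convert WAV to FLAC; SoX infers the output format from the .flac extension.
-- Expects a path ending in .wav.
local function flac_args(wav_path)
  local flac_path = wav_path:gsub("%.wav$", ".flac")
  return { wav_path, flac_path }, flac_path
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;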



&lt;p&gt;&lt;strong&gt;Error Handling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network timeouts → retry with exponential backoff&lt;/li&gt;
&lt;li&gt;Rate limits → auto-retry after delay&lt;/li&gt;
&lt;li&gt;Audio corruption → shows an error notification and keeps the recorded audio&lt;/li&gt;
&lt;/ul&gt;
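
&lt;p&gt;The retry behavior above can be sketched as a delay schedule plus a check for which HTTP statuses are worth retrying. This sketch assumes 429 and 5xx responses are retryable and that delays double from 0.5 s up to an 8 s cap. The numbers and function names are illustrative, not taken from &lt;code&gt;api.lua&lt;/code&gt;. How a network timeout is reported depends on the HTTP client, so it isn't modeled here. In Hammerspoon, each retry would be scheduled with &lt;code&gt;hs.timer.doAfter(delays[attempt], retry_fn)&lt;/code&gt;, where &lt;code&gt;retry_fn&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Sketch: exponential backoff schedule (base, 2x base, 4x base, ...) capped at cap.
local function backoff_delays(max_retries, base, cap)
  base = base or 0.5
  cap = cap or 8
  local delays = {}
  for attempt = 1, max_retries do
    delays[attempt] = math.min(cap, base * 2 ^ (attempt - 1))
  end
  return delays
end

-- 429 = rate limited, 5xx = server-side failure: both are usually transient.
-- Other 4xx codes (bad key, bad request) would fail the same way again.
local function is_retryable(status)
  return status == 429 or math.floor(status / 100) == 5
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Real clients often add random jitter to each delay so that many clients don't retry in lockstep. It is left out here to keep the schedule deterministic.&lt;/p&gt;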

&lt;h2&gt;
  
  
  When to Use This vs. Alternatives
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Dictator if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want push-to-talk (not always-on listening)&lt;/li&gt;
&lt;li&gt;You're okay with API costs (pennies, but not free)&lt;/li&gt;
&lt;li&gt;You want full control (API key, provider, model)&lt;/li&gt;
&lt;li&gt;You're on macOS and already use/like Hammerspoon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use something else if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need fully offline transcription (use whisper.cpp locally)&lt;/li&gt;
&lt;li&gt;You want always-on voice commands (use Siri/Talon)&lt;/li&gt;
&lt;li&gt;You're on Linux/Windows (Hammerspoon is macOS-only)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Glossardi/Dictator-Speech-to-Text" rel="noopener noreferrer"&gt;Dictator-Speech-to-Text&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://dictator.click" rel="noopener noreferrer"&gt;Dictator's website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. Issues/PRs welcome.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Questions?&lt;/strong&gt; Drop a comment—happy to explain any technical details or help with setup!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>hammerspoon</category>
      <category>whisper</category>
      <category>lua</category>
    </item>
  </channel>
</rss>
