<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ondrej Machala</title>
    <description>The latest articles on DEV Community by Ondrej Machala (@omachala).</description>
    <link>https://dev.to/omachala</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3724472%2F05d15299-914a-440a-ac2c-efa12c49e5da.jpeg</url>
      <title>DEV Community: Ondrej Machala</title>
      <link>https://dev.to/omachala</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/omachala"/>
    <language>en</language>
    <item>
      <title>How to Set Up Diction: The Self-Hosted Speech-to-Text Alternative to Wispr Flow</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Sat, 18 Apr 2026 10:44:08 +0000</pubDate>
      <link>https://dev.to/omachala/how-to-set-up-diction-the-self-hosted-speech-to-text-alternative-to-wispr-flow-20km</link>
      <guid>https://dev.to/omachala/how-to-set-up-diction-the-self-hosted-speech-to-text-alternative-to-wispr-flow-20km</guid>
      <description>&lt;p&gt;This article is about getting your own private speech-to-text on your iPhone. Tap a key, speak, watch the words land in whatever app you're in. No cloud in the middle, no subscription, no company on the other end reading what you said. The keyboard is &lt;a href="https://apps.apple.com/us/app/diction-ai-voice-keyboard/id6759807364" rel="noopener noreferrer"&gt;Diction&lt;/a&gt;. This post is the full setup, start to finish, blank machine to working dictation in under thirty minutes.&lt;/p&gt;

&lt;p&gt;I built the server side for myself:&lt;br&gt;
&lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;https://github.com/omachala/diction&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I talk to my AI agents all day. Claude in the terminal, my &lt;a href="https://dev.to/omachala/i-run-an-ai-agent-in-telegram-all-day-i-stopped-typing-to-it-3g7o"&gt;Telegram bot OpenClaw&lt;/a&gt;, a handful of others. Voice for everything. Long prompts, half-formed plans, emails I want rewritten, code I want reviewed. Every word used to pass through someone else's transcription cloud before my own agents ever heard it. Not anymore.&lt;/p&gt;

&lt;p&gt;A small Docker stack on a box at home now handles the transcription. An optional cleanup step scrubs filler words and fixes punctuation using any LLM you want: OpenAI, Groq, a local Ollama model, anything OpenAI-compatible.&lt;/p&gt;

&lt;p&gt;Every command is below.&lt;/p&gt;
&lt;h2&gt;
  
  
  What You'll End Up With
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A box at home running the speech model, 24/7&lt;/li&gt;
&lt;li&gt;Your iPhone sending audio to it over your home WiFi&lt;/li&gt;
&lt;li&gt;Optional: an LLM of your choice for cleaning up filler words and fixing punctuation (OpenAI, Groq, Anthropic, a local Ollama model, anything with an OpenAI-compatible API)&lt;/li&gt;
&lt;li&gt;Total running cost with cleanup on: depends on the LLM you pick. Roughly a cent per hour of dictation on &lt;code&gt;gpt-4o-mini&lt;/code&gt;, zero if you run a local model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The speech part is free forever. The cleanup part costs whatever your LLM provider charges. Use a local model and pay nothing. More on that at the end.&lt;/p&gt;
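&lt;p&gt;The "roughly a cent" figure is easy to sanity-check. Here's a back-of-envelope sketch; the speaking rate, tokens-per-word ratio, and per-token prices are rough assumptions (check your provider's current price list), so treat the result as an order of magnitude, not an invoice:&lt;/p&gt;

```python
# Back-of-envelope cleanup cost for one hour of dictation.
# Every input here is an assumption, not a measured value.
words_per_minute = 150        # typical conversational speaking rate
tokens_per_word = 1.33        # common rule of thumb for English text

tokens_in = 60 * words_per_minute * tokens_per_word  # transcript sent to the LLM
tokens_out = tokens_in                               # cleaned transcript comes back

# Assumed gpt-4o-mini prices, USD per million tokens
price_in, price_out = 0.15, 0.60

cost = tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
print(f"${cost:.3f} per hour of dictation")
```

&lt;p&gt;Under those assumptions the cleanup step lands just under a cent per dictated hour. With a local model it's zero, and the speech part never costs anything either way.&lt;/p&gt;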
&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Any machine that can run Docker: Mac mini, an old laptop, a home server in a closet, a NUC, a home lab box. Apple Silicon or any modern x86 works fine. A Raspberry Pi is a stretch for the speech part; anything more capable is comfortable.&lt;/li&gt;
&lt;li&gt;An iPhone running iOS 17 or newer&lt;/li&gt;
&lt;li&gt;Both on the same WiFi network&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Optional:&lt;/em&gt; an API key for any OpenAI-compatible LLM (OpenAI, Groq, Together, Anthropic via a proxy, Ollama running locally, etc.) if you want AI cleanup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll assume you know what Docker is and how to open a terminal. That's it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Install Docker
&lt;/h2&gt;

&lt;p&gt;You need Docker Engine plus Docker Compose. Both come bundled in Docker Desktop on Mac and Windows. On Linux you install them separately (they're both free and open source).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS (Intel or Apple Silicon):&lt;/strong&gt; Download &lt;a href="https://www.docker.com/products/docker-desktop/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt;, open the &lt;code&gt;.dmg&lt;/code&gt;, drag the whale icon to Applications, launch it. The first run asks for admin credentials (it needs to install a helper tool and set up networking). When the whale icon in the menu bar stops animating and says "Docker Desktop is running", you're ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt; Download &lt;a href="https://www.docker.com/products/docker-desktop/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt;. The installer will enable WSL2 if it isn't already on; this is required and needs a reboot. After the reboot, launch Docker Desktop. The same whale icon, now in the system tray, tells you when it's ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux:&lt;/strong&gt; Either install Docker Desktop (same download page) or go with the native packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ubuntu / Debian&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;docker.io docker-compose-plugin

&lt;span class="c"&gt;# Fedora / RHEL&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf &lt;span class="nb"&gt;install &lt;/span&gt;docker docker-compose-plugin

&lt;span class="c"&gt;# Arch&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;pacman &lt;span class="nt"&gt;-S&lt;/span&gt; docker docker-compose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the service and add your user to the &lt;code&gt;docker&lt;/code&gt; group so you don't need &lt;code&gt;sudo&lt;/code&gt; every time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; docker
&lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$USER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Log out and back in (or reboot) so the group change takes effect. Yes, you really need to log out. Running &lt;code&gt;newgrp docker&lt;/code&gt; works too but only in the current shell.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify it's all working:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nt"&gt;--version&lt;/span&gt;
docker compose version
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last command pulls a tiny test image and prints a greeting. If it fails with "permission denied" on Linux, you skipped the log-out-and-back-in step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apple Silicon users, one extra thing:&lt;/strong&gt; open Docker Desktop → Settings → General and make sure "Use Rosetta for x86/amd64 emulation" is enabled. This is the default on recent Docker Desktop builds. The Diction gateway image is built for amd64 (multi-arch is on the roadmap), so Docker needs Rosetta to run it on your M1/M2/M3/M4. Performance impact is negligible - the speech model image is multi-arch and runs natively on arm64, so Rosetta is only handling the small Go binary in front of it.&lt;/p&gt;

&lt;p&gt;While you're in Settings, also check &lt;strong&gt;Resources → Memory&lt;/strong&gt;. The default Docker Desktop VM ships with 2 GB, which is tight for &lt;code&gt;medium&lt;/code&gt; (~2.1 GB) and will OOM silently. Bump to 4 GB if you're running anything above &lt;code&gt;small&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create a Project Folder
&lt;/h2&gt;

&lt;p&gt;Pick a home for the compose file and any supporting config. Anywhere works. I use &lt;code&gt;~/diction&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/diction &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ~/diction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything in the rest of this article assumes you're sitting in that folder. Docker Compose looks for &lt;code&gt;docker-compose.yml&lt;/code&gt; in the current directory, so all the &lt;code&gt;docker compose&lt;/code&gt; commands Just Work as long as you &lt;code&gt;cd ~/diction&lt;/code&gt; first.&lt;/p&gt;

&lt;p&gt;If you're setting this up on a remote server (Linux box in a closet, NUC, etc.), SSH in and run the same command there. Where you edit the file is up to you: &lt;code&gt;nano docker-compose.yml&lt;/code&gt; on the server, VSCode Remote-SSH, or editing locally and &lt;code&gt;scp&lt;/code&gt;-ing the file over. All fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Write the Compose File
&lt;/h2&gt;

&lt;p&gt;Here's what we're about to spin up. Two containers working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;Diction Gateway&lt;/a&gt;&lt;/strong&gt;. The open-source Go service at the front of the stack. On the outside it speaks the standard OpenAI transcription API (&lt;code&gt;POST /v1/audio/transcriptions&lt;/code&gt;), which is what the Diction iPhone app talks to. On the inside it routes your audio to whichever speech model you've loaded, and optionally passes the transcript through an LLM for cleanup. The source is on GitHub, MIT licensed. Small, boring Go. Read it, fork it, bend it to your needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A voice model&lt;/strong&gt;. The engine that actually turns audio into text. For this starter stack we're using &lt;code&gt;faster-whisper&lt;/code&gt; - a compact, battle-tested open-source model that ships in sizes &lt;code&gt;tiny&lt;/code&gt;, &lt;code&gt;base&lt;/code&gt;, &lt;code&gt;small&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;large-v3&lt;/code&gt;, and &lt;code&gt;large-v3-turbo&lt;/code&gt;. Bigger means more accurate and slower. We'll run &lt;code&gt;small&lt;/code&gt;. It's the sweet spot for CPU-only machines: accurate enough for real dictation, and it transcribes a 5-second clip in a few seconds on a modern Mac mini or NUC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've got an NVIDIA GPU sitting in the machine, you can skip &lt;code&gt;small&lt;/code&gt; and run something far better (Parakeet or &lt;code&gt;large-v3-turbo&lt;/code&gt;). Jump to the "Got an NVIDIA GPU Sitting Idle?" section below before you paste the compose file. Otherwise continue here.&lt;/p&gt;

&lt;p&gt;Paste this into &lt;code&gt;~/diction/docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;whisper-small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-whisper-small&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-models:/root/.cache/huggingface&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-small&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-small&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;small&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;whisper-models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What each line does
&lt;/h3&gt;

&lt;p&gt;Quick tour so you know what you're pasting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;whisper-small&lt;/code&gt; service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image: fedirz/faster-whisper-server:latest-cpu&lt;/code&gt;. The voice model engine. &lt;code&gt;faster-whisper&lt;/code&gt; is a C++/CTranslate2 reimplementation of the original open-source voice model from OpenAI, running up to 4x faster with less memory. &lt;code&gt;fedirz/faster-whisper-server&lt;/code&gt; wraps it in a small Python server that speaks the OpenAI transcription API. The &lt;code&gt;-cpu&lt;/code&gt; tag is the CPU build. There's also a &lt;code&gt;-cuda&lt;/code&gt; tag for NVIDIA users (see the GPU section below).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container_name: diction-whisper-small&lt;/code&gt;. Just a friendly name so &lt;code&gt;docker ps&lt;/code&gt; shows something readable instead of a random string.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;restart: unless-stopped&lt;/code&gt;. If the container crashes or the host reboots, Docker brings it back. The only thing that stops it is you explicitly running &lt;code&gt;docker compose down&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;volumes: - whisper-models:/root/.cache/huggingface&lt;/code&gt;. The model weights are downloaded on first start (about 500MB for &lt;code&gt;small&lt;/code&gt;). This volume persists them across container rebuilds, so you don't re-download every time you pull a newer image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHISPER__MODEL: Systran/faster-whisper-small&lt;/code&gt;. The specific voice model to load. It's a HuggingFace repo ID. You can swap this for any CT2-compatible voice model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHISPER__INFERENCE_DEVICE: cpu&lt;/code&gt;. Tells it to run on CPU. Swap to &lt;code&gt;cuda&lt;/code&gt; if you've got an NVIDIA card (full example in the GPU section below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;gateway&lt;/code&gt; service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image: ghcr.io/omachala/diction-gateway:latest&lt;/code&gt;. The Diction gateway from GitHub Container Registry.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;platform: linux/amd64&lt;/code&gt;. The current published image is amd64-only. On Apple Silicon, Docker will run it under Rosetta transparently. On a native x86 host the line is a harmless no-op; you can drop it if you prefer a leaner file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ports: - "8080:8080"&lt;/code&gt;. Maps port 8080 on the host to 8080 in the container. This is the one your iPhone will talk to. If 8080 is already in use on your machine, change the left side: &lt;code&gt;"18080:8080"&lt;/code&gt; and use &lt;code&gt;http://your-ip:18080&lt;/code&gt; from the phone.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;depends_on: - whisper-small&lt;/code&gt;. Docker starts the whisper container first so the gateway doesn't throw connection-refused on startup. Not strictly required (the gateway retries), but makes logs cleaner.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DEFAULT_MODEL: small&lt;/code&gt;. The model the gateway routes to when the iPhone sends a request without specifying one. The gateway has a built-in mapping of short names (&lt;code&gt;small&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;large-v3-turbo&lt;/code&gt;, &lt;code&gt;parakeet-v3&lt;/code&gt;) to backend service URLs. Setting &lt;code&gt;DEFAULT_MODEL: small&lt;/code&gt; makes it expect a service named &lt;code&gt;whisper-small&lt;/code&gt; on port 8000. This is why the first service is named &lt;code&gt;whisper-small&lt;/code&gt; and not &lt;code&gt;whisper&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
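&lt;p&gt;The lookup the gateway does is easy to picture. Here's an illustrative sketch in Python - not the gateway's actual Go code, just the idea of mapping short model names to the compose service hostnames:&lt;/p&gt;

```python
# Sketch of short-name -> backend routing. Illustrative only;
# the real Diction gateway is a Go service.
BACKENDS = {
    "tiny": "http://whisper-tiny:8000",
    "small": "http://whisper-small:8000",
    "medium": "http://whisper-medium:8000",
    "large-v3-turbo": "http://whisper-large-turbo:8000",
}

def resolve_backend(model=None, default="small"):
    """Return the backend URL for a request's model field, or the default."""
    name = model or default
    if name not in BACKENDS:
        # this is where the gateway would answer 404
        raise KeyError(f"unknown model {name!r}")
    return BACKENDS[name]

print(resolve_backend())          # -> http://whisper-small:8000
print(resolve_backend("medium"))  # -> http://whisper-medium:8000
```

&lt;p&gt;This is also why renaming a service breaks things: the backend URL is derived from the service name, and Docker's internal DNS only resolves names you actually declared in the compose file.&lt;/p&gt;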

&lt;p&gt;&lt;strong&gt;&lt;code&gt;volumes:&lt;/code&gt; block at the bottom:&lt;/strong&gt; declares the named volume Docker uses for the model cache. Named volumes are managed by Docker itself and survive container rebuilds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model sizes and what to pick
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;small&lt;/code&gt; is the starter. It's accurate enough for everyday dictation and fits comfortably on any modern laptop or NUC. If you want something else, swap &lt;code&gt;WHISPER__MODEL&lt;/code&gt; in the compose file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;CPU latency (5s clip)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;39M&lt;/td&gt;
&lt;td&gt;~350 MB&lt;/td&gt;
&lt;td&gt;1-2s&lt;/td&gt;
&lt;td&gt;Fast, lower accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-small&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;244M&lt;/td&gt;
&lt;td&gt;~850 MB&lt;/td&gt;
&lt;td&gt;3-4s&lt;/td&gt;
&lt;td&gt;Sweet spot for CPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;769M&lt;/td&gt;
&lt;td&gt;~2.1 GB&lt;/td&gt;
&lt;td&gt;8-12s&lt;/td&gt;
&lt;td&gt;More accurate, slow on CPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepdml/faster-whisper-large-v3-turbo-ct2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;809M&lt;/td&gt;
&lt;td&gt;~2.3 GB&lt;/td&gt;
&lt;td&gt;&amp;lt;2s on GPU&lt;/td&gt;
&lt;td&gt;Best with NVIDIA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency numbers are from my own homelab (AMD Ryzen 9 7940HS, CPU-only). Apple Silicon is in the same ballpark: fast enough for &lt;code&gt;small&lt;/code&gt; to feel instant, slow enough that &lt;code&gt;medium&lt;/code&gt; will make you wait.&lt;/p&gt;

&lt;p&gt;Two rules when switching models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Also change &lt;code&gt;DEFAULT_MODEL&lt;/code&gt; on the gateway to match one of: &lt;code&gt;tiny&lt;/code&gt;, &lt;code&gt;small&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;large-v3-turbo&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Rename the service to the one the gateway expects: &lt;code&gt;whisper-tiny&lt;/code&gt;, &lt;code&gt;whisper-small&lt;/code&gt;, &lt;code&gt;whisper-medium&lt;/code&gt;, or &lt;code&gt;whisper-large-turbo&lt;/code&gt;. The gateway looks up its backend by service hostname.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Skip either and the gateway will give you a 404 when the app asks for a model.&lt;/p&gt;
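&lt;p&gt;For example, moving the starter stack from &lt;code&gt;small&lt;/code&gt; to &lt;code&gt;medium&lt;/code&gt; touches both places. Abridged - keep the rest of the file exactly as in Step 3:&lt;/p&gt;

```yaml
services:
  whisper-medium:                  # rule 2: the service name the gateway expects
    image: fedirz/faster-whisper-server:latest-cpu
    container_name: diction-whisper-medium
    restart: unless-stopped
    volumes:
      - whisper-models:/root/.cache/huggingface
    environment:
      WHISPER__MODEL: Systran/faster-whisper-medium
      WHISPER__INFERENCE_DEVICE: cpu

  gateway:
    # ...unchanged from Step 3, except:
    depends_on:
      - whisper-medium
    environment:
      DEFAULT_MODEL: medium        # rule 1: matches the service above
```

&lt;p&gt;And remember the RAM note from Step 1: &lt;code&gt;medium&lt;/code&gt; wants roughly 2.1 GB, so bump the Docker Desktop VM to 4 GB first.&lt;/p&gt;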

&lt;h3&gt;
  
  
  One caveat for Mac mini / Apple Silicon users
&lt;/h3&gt;

&lt;p&gt;Docker on macOS runs everything inside a Linux VM. That VM can't reach Apple's GPU or Neural Engine. Containers are CPU-only regardless of how nice your M4's GPU is. Sounds bad on paper, but for dictation workloads you won't feel it: the &lt;code&gt;small&lt;/code&gt; voice model handles a short sentence in well under five seconds on an M-series CPU. Longer dictations scale linearly. If you want GPU speed, either (a) run a Linux box with an NVIDIA card and keep the Mac as a client, or (b) use Diction's on-device mode on the iPhone itself (Core ML on the Neural Engine).&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Start Everything
&lt;/h2&gt;

&lt;p&gt;Make sure you're in the project folder, then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-d&lt;/code&gt; flag runs the containers in the background (detached mode).&lt;/p&gt;

&lt;p&gt;On the first run this takes a minute or two. Docker pulls two images from their registries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;fedirz/faster-whisper-server:latest-cpu&lt;/code&gt; - about 1.7 GB, includes the Python runtime and CTranslate2 binaries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ghcr.io/omachala/diction-gateway:latest&lt;/code&gt; - about 210 MB, a compiled Go binary plus ffmpeg for audio conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the pulls finish, the voice model container does one more thing on first boot: it downloads the model weights from HuggingFace into the &lt;code&gt;whisper-models&lt;/code&gt; volume (about 500 MB for &lt;code&gt;small&lt;/code&gt;). Subsequent restarts skip this step - the volume is persistent. That's why there's a &lt;code&gt;volumes:&lt;/code&gt; block in the compose file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check everything is healthy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see both services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;NAME                     STATUS
diction-gateway          Up 30 seconds
diction-whisper-small    Up 30 seconds (health: starting)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;health: starting&lt;/code&gt; on the whisper container is normal for the first couple of minutes. It's loading the model into RAM. Once that's done, the status will flip to &lt;code&gt;Up (healthy)&lt;/code&gt; or just &lt;code&gt;Up&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Watching logs
&lt;/h3&gt;

&lt;p&gt;If something looks wrong, look at the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-f&lt;/code&gt; follows them in real time. Ctrl+C to detach.&lt;/p&gt;

&lt;p&gt;You can also tail a single service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt; gateway
docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt; whisper-small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What healthy logs look like&lt;/strong&gt; (abbreviated):&lt;/p&gt;

&lt;p&gt;Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"gateway starting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"8080"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"backend registered"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"small"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"http://whisper-small:8000"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whisper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Common early errors:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pull access denied&lt;/code&gt; on the gateway image. A stale GitHub Container Registry token is cached in your Docker config (on macOS, usually in the login keychain from a past &lt;code&gt;docker login&lt;/code&gt;). Run &lt;code&gt;docker logout ghcr.io&lt;/code&gt; - yes, even if you don't think you're logged in - and try again.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;exec format error&lt;/code&gt; on Apple Silicon. Rosetta isn't enabled. Go back to Docker Desktop → Settings → General and flip the Rosetta option on.&lt;/li&gt;
&lt;li&gt;The voice model container is stuck on &lt;code&gt;health: starting&lt;/code&gt; for more than three minutes. That usually means it's still downloading weights on a slow connection. Check &lt;code&gt;docker compose logs -f whisper-small&lt;/code&gt; to see the download progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stopping and restarting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose stop        &lt;span class="c"&gt;# stop containers, keep their state&lt;/span&gt;
docker compose start       &lt;span class="c"&gt;# start them again&lt;/span&gt;
docker compose down        &lt;span class="c"&gt;# stop and remove containers (volumes survive)&lt;/span&gt;
docker compose down &lt;span class="nt"&gt;-v&lt;/span&gt;     &lt;span class="c"&gt;# stop, remove containers AND volumes (re-downloads weights)&lt;/span&gt;
docker compose pull        &lt;span class="c"&gt;# get newer images&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;       &lt;span class="c"&gt;# apply pulls / config changes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model cache in the &lt;code&gt;whisper-models&lt;/code&gt; volume is shared across rebuilds, so &lt;code&gt;docker compose pull &amp;amp;&amp;amp; docker compose up -d&lt;/code&gt; to upgrade is a ~30-second operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Test It
&lt;/h2&gt;

&lt;p&gt;Before you go anywhere near the iPhone, prove the server itself works. A broken stack is easier to debug from a terminal than from a keyboard extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get an audio file
&lt;/h3&gt;

&lt;p&gt;The quickest path: use your phone's built-in &lt;strong&gt;Voice Memos&lt;/strong&gt; app. Record yourself saying "Hello from my home server." Hit stop. Share → &lt;strong&gt;Save to Files&lt;/strong&gt;, or AirDrop to your Mac, or email it to yourself. You want the &lt;code&gt;.m4a&lt;/code&gt; file on the same machine that's running the containers.&lt;/p&gt;

&lt;p&gt;On Linux without a phone handy, record with &lt;code&gt;arecord&lt;/code&gt; or &lt;code&gt;sox&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 5 seconds of 16-bit mono WAV at 16 kHz - whisper's native format&lt;/span&gt;
arecord &lt;span class="nt"&gt;-f&lt;/span&gt; S16_LE &lt;span class="nt"&gt;-r&lt;/span&gt; 16000 &lt;span class="nt"&gt;-c&lt;/span&gt; 1 &lt;span class="nt"&gt;-d&lt;/span&gt; 5 voice-memo.wav
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On macOS, skip recording altogether and let the system generate a clip with &lt;code&gt;say&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;say &lt;span class="nt"&gt;-o&lt;/span&gt; voice-memo.aiff &lt;span class="s2"&gt;"Hello from my home server"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you an &lt;code&gt;.aiff&lt;/code&gt; the gateway accepts directly. Handy for scripted testing where you don't feel like holding a microphone.&lt;/p&gt;

&lt;p&gt;No microphone and no speech synth? Grab any short speech clip you have lying around. MP3, WAV, M4A, AIFF, FLAC, Ogg - they all work. The gateway's bundled ffmpeg re-encodes them before they reach the voice model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hit the gateway
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@voice-memo.m4a"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=small"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get back something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Hello from my home server."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole speech pipeline. Running on your hardware. Your audio never left the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ask for different response formats
&lt;/h3&gt;

&lt;p&gt;The same endpoint supports &lt;code&gt;response_format=text&lt;/code&gt; if you'd rather have a plain string (useful if you're piping it into a shell script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@voice-memo.m4a"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=small"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"response_format=text"&lt;/span&gt;
&lt;span class="c"&gt;# → Hello from my home server.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
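&lt;p&gt;If you find yourself doing that often, a tiny wrapper function saves the typing - &lt;code&gt;transcribe&lt;/code&gt; is my name for it, not something Diction ships:&lt;br&gt;
&lt;/p&gt;

```shell
# A small helper around the plain-text endpoint.
# Assumes the gateway from the earlier steps on localhost:8080.
transcribe() {
  curl -sS -X POST http://localhost:8080/v1/audio/transcriptions \
    -F "file=@$1" -F "model=small" -F "response_format=text"
}

transcribe voice-memo.m4a
```

&lt;p&gt;Drop it in your shell profile and &lt;code&gt;transcribe clip.wav&lt;/code&gt; prints plain text, ready for pipes.&lt;/p&gt;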



&lt;h3&gt;
  
  
  Check the response headers
&lt;/h3&gt;

&lt;p&gt;The gateway adds timing info to the response headers - useful for benchmarking without reading logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; - &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  http://localhost:8080/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@voice-memo.m4a"&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=small"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-Diction-Whisper-Ms&lt;/code&gt; - how many milliseconds the speech model took&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Diction-LLM-Ms&lt;/code&gt; - appears only if you've enabled the cleanup step in Step 7&lt;/li&gt;
&lt;/ul&gt;
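&lt;p&gt;If you only want the timing number, an &lt;code&gt;awk&lt;/code&gt; filter over the headers does it - the header names are the ones above; the filter itself is my own:&lt;br&gt;
&lt;/p&gt;

```shell
# Print just the whisper timing, in milliseconds, for one request.
curl -sS -D - -o /dev/null -X POST http://localhost:8080/v1/audio/transcriptions \
  -F "file=@voice-memo.m4a" -F "model=small" \
| awk -F': ' '{sub(/\r$/,"")} tolower($1)=="x-diction-whisper-ms" {print $2}'
```

&lt;p&gt;Wrap that in a loop and you've got a crude benchmark for comparing models.&lt;/p&gt;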

&lt;h3&gt;
  
  
  Talk to it from Python
&lt;/h3&gt;

&lt;p&gt;Since the gateway speaks the OpenAI transcription API, the official &lt;code&gt;openai&lt;/code&gt; Python SDK works against it directly. Useful if you want to script transcriptions from a laptop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://192.168.1.42:8080/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# the gateway doesn't check this by default
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice-memo.m4a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same story with the Node SDK, LangChain, or any other tool that expects OpenAI's speech API. Diction becomes a drop-in local replacement for &lt;code&gt;api.openai.com/v1/audio/transcriptions&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  If the test fails
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection refused.&lt;/strong&gt; The gateway container isn't running. &lt;code&gt;docker compose ps&lt;/code&gt; to confirm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;504 Gateway Timeout.&lt;/strong&gt; The whisper container is still starting (model loading into RAM). Give it another 60 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;400 Bad Request: "invalid audio file".&lt;/strong&gt; Your file is corrupted or in a format whisper doesn't understand. Try a freshly recorded clip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;404 Not Found.&lt;/strong&gt; You probably have a typo in the URL. The path is exactly &lt;code&gt;/v1/audio/transcriptions&lt;/code&gt; - plural, with &lt;code&gt;/v1/&lt;/code&gt; prefix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empty response / hang.&lt;/strong&gt; The voice model container crashed out of memory mid-transcription. Check &lt;code&gt;docker compose logs whisper-small&lt;/code&gt;. &lt;code&gt;small&lt;/code&gt; should be fine on any machine with 2GB of free RAM; if you upgraded to &lt;code&gt;medium&lt;/code&gt; and the host doesn't have 3GB free, it'll OOM.&lt;/li&gt;
&lt;/ul&gt;
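&lt;p&gt;Whatever the symptom, the fastest first move is usually the same - check container status and skim recent logs from both services (service names as in the compose file above):&lt;br&gt;
&lt;/p&gt;

```shell
# Container status, then the last few log lines from each service.
docker compose ps
for svc in gateway whisper-small; do
  printf -- '--- %s logs ---\n' "$svc"
  docker compose logs --tail 20 "$svc"
done
```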

&lt;h2&gt;
  
  
  Step 6: Find Your Server's LAN IP
&lt;/h2&gt;

&lt;p&gt;Your iPhone needs an address to reach this. Your server probably has two kinds: a public IP (facing the internet, you don't want to use that) and a private LAN IP (on your home WiFi, that's the one).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipconfig getifaddr en0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;en0&lt;/code&gt; is usually Wi-Fi on laptops and the built-in Ethernet on desktops. If it prints nothing (you're wired via a USB-C dongle, or on a Mac mini with Wi-Fi off), the right interface is somewhere else - try &lt;code&gt;en1&lt;/code&gt;, &lt;code&gt;en4&lt;/code&gt;, &lt;code&gt;en5&lt;/code&gt;. Quickest catch-all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ifconfig | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'inet '&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; 127.0.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick the &lt;code&gt;192.168.x.x&lt;/code&gt; or &lt;code&gt;10.x.x.x&lt;/code&gt; address. Ignore anything starting with &lt;code&gt;100.&lt;/code&gt; - that's Tailscale, not your LAN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;hostname&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, if you want a specific interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip &lt;span class="nt"&gt;-4&lt;/span&gt; addr show wlan0 | &lt;span class="nb"&gt;grep &lt;/span&gt;inet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;ipconfig&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;findstr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IPv4&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get something like &lt;code&gt;192.168.1.42&lt;/code&gt;. Write it down. This is what you'll paste into the Diction app in Step 8.&lt;/p&gt;
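&lt;p&gt;Worth one dry run before moving on: repeat the Step 5 test against the LAN IP instead of &lt;code&gt;localhost&lt;/code&gt; (&lt;code&gt;192.168.1.42&lt;/code&gt; stands in for whatever your command printed). If this works on the server, the phone on the same WiFi should reach it too:&lt;br&gt;
&lt;/p&gt;

```shell
# Same request as in Step 5, but through the address the iPhone will use.
curl -sS -X POST http://192.168.1.42:8080/v1/audio/transcriptions \
  -F "file=@voice-memo.m4a" -F "model=small"
```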

&lt;h3&gt;
  
  
  Pin it so it doesn't drift
&lt;/h3&gt;

&lt;p&gt;Your router hands out IPs via DHCP, which means the one you just wrote down might change next time the server reboots (or when the lease expires). Two ways to keep it stable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DHCP reservation.&lt;/strong&gt; Log into your router's admin page (usually &lt;code&gt;192.168.1.1&lt;/code&gt;, &lt;code&gt;192.168.0.1&lt;/code&gt;, or &lt;code&gt;10.0.0.1&lt;/code&gt;). Find the DHCP client list, locate your server by hostname or MAC address, and click the "reserve" / "static" option. From then on, your router will always hand out that same IP to that machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static IP on the machine.&lt;/strong&gt; On Linux, edit &lt;code&gt;/etc/netplan/&lt;/code&gt; or use your distro's network manager. On macOS, System Settings → Network → Wi-Fi → Details → TCP/IP → Configure IPv4 → Using DHCP with manual address. More work, more fragile. The router method is better.&lt;/li&gt;
&lt;/ol&gt;
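&lt;p&gt;For the second route on a netplan-based distro (Ubuntu and friends), the file looks roughly like this - interface name and addresses are placeholders for your network, so adjust them before running &lt;code&gt;sudo netplan apply&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

```yaml
# /etc/netplan/01-static.yaml - a sketch, not a drop-in file
network:
  version: 2
  ethernets:
    eth0:                          # your interface name may differ ("ip link" lists them)
      dhcp4: false
      addresses: [192.168.1.42/24]
      routes:
        - to: default
          via: 192.168.1.1         # your router
      nameservers:
        addresses: [192.168.1.1]
```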

&lt;p&gt;If you'd rather not deal with IPs at all and your setup is more portable (laptop moving between networks, for example), skip ahead to the "Reach It From Anywhere" section. Tailscale gives every machine a stable private address that follows it around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Add AI Cleanup (Optional but Nice)
&lt;/h2&gt;

&lt;p&gt;Skip this step and your dictation still works. You'll get raw transcription, which is usually 95% right. The remaining 5% is filler words ("um", "like"), missing commas, misheard homophones ("their" vs "there"), and sometimes a full sentence with no punctuation. AI cleanup fixes all of that before your agent ever sees it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does
&lt;/h3&gt;

&lt;p&gt;You say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;so um basically the meeting went well and uh they agreed to the timeline&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The gateway hands that to the LLM, which returns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The meeting went well. They agreed to the timeline.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the whole feature. Any OpenAI-compatible LLM works - OpenAI's own models, Groq, Anthropic (via its OpenAI compatibility layer), Together, Fireworks, a local Ollama install, anything that speaks &lt;code&gt;POST /chat/completions&lt;/code&gt;.&lt;/p&gt;
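&lt;p&gt;Before wiring a provider into the gateway, it's worth one direct request to confirm your key and model name work - this uses OpenAI's endpoint as the example; swap the URL, model, and key for your provider:&lt;br&gt;
&lt;/p&gt;

```shell
# A minimal chat/completions round trip; any OpenAI-compatible provider accepts this shape.
curl -sS https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Reply with OK."}]}'
```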

&lt;h3&gt;
  
  
  The flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iPhone → gateway → voice model → raw transcript
                              ↓
                    your LLM (chat/completions)
                              ↓
                    cleaned text → back to the iPhone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The iPhone sends &lt;code&gt;?enhance=true&lt;/code&gt; on the request when the app's AI Companion toggle is on. The gateway hits &lt;code&gt;{LLM_BASE_URL}/chat/completions&lt;/code&gt; with your system prompt + the transcript. Whatever comes back gets sent to the iPhone instead of the raw transcript. If the LLM errors out or times out, the gateway falls back to raw - your dictation doesn't break because of a downstream hiccup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Config reference
&lt;/h3&gt;

&lt;p&gt;Four environment variables on the gateway:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLM_BASE_URL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;OpenAI-compatible endpoint, e.g. &lt;code&gt;https://api.openai.com/v1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLM_MODEL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;Model identifier, e.g. &lt;code&gt;gpt-4o-mini&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLM_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;Bearer token (your provider's API key). Not needed for local Ollama.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LLM_PROMPT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;System prompt. Literal string, or a file path starting with &lt;code&gt;/&lt;/code&gt; if you want a longer one mounted as a volume.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both &lt;code&gt;LLM_BASE_URL&lt;/code&gt; and &lt;code&gt;LLM_MODEL&lt;/code&gt; must be set for cleanup to turn on. Miss either one and the feature silently stays off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: OpenAI (or any OpenAI-compatible provider)
&lt;/h3&gt;

&lt;p&gt;Easiest first step. Get a key at &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;platform.openai.com/api-keys&lt;/a&gt; and add $5 of credit. For cleanup that's hundreds of hours of dictation.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;~/diction/.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OPENAI_API_KEY=sk-your-key-here"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/diction/.env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update the &lt;code&gt;gateway&lt;/code&gt; service in &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-small&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;small&lt;/span&gt;
      &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${OPENAI_API_KEY}"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_PROMPT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clean&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;transcription.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Remove&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filler&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;words&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(um,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;uh,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;like).&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;punctuation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;capitalization.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Return&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;only&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cleaned&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;text,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;nothing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;else."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker Compose reads &lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt; from the &lt;code&gt;.env&lt;/code&gt; file in the same folder automatically. No extra flags needed.&lt;/p&gt;
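&lt;p&gt;You can confirm the interpolation worked without restarting anything - &lt;code&gt;docker compose config&lt;/code&gt; prints the fully resolved file:&lt;br&gt;
&lt;/p&gt;

```shell
# The resolved environment block; the key from .env should appear in clear text.
docker compose config | grep LLM_
```

&lt;p&gt;If &lt;code&gt;LLM_API_KEY&lt;/code&gt; comes out empty, the &lt;code&gt;.env&lt;/code&gt; file isn't sitting next to &lt;code&gt;docker-compose.yml&lt;/code&gt;.&lt;/p&gt;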

&lt;p&gt;&lt;strong&gt;Not tied to OpenAI.&lt;/strong&gt; Every major LLM provider exposes the same OpenAI-compatible &lt;code&gt;/chat/completions&lt;/code&gt; endpoint. Swap in the provider's base URL, model name, and key via the three &lt;code&gt;LLM_*&lt;/code&gt; variables and you're done. A few that work out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.claude.com/en/api/openai-sdk" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt; - Claude models via the OpenAI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://console.groq.com/keys" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; - fastest inference on the market, generous free tier&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.together.ai/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt; - broad open-model catalog&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fireworks.ai/" rel="noopener noreferrer"&gt;Fireworks&lt;/a&gt; - tuned Llama and Mixtral hosting&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://deepinfra.com/" rel="noopener noreferrer"&gt;DeepInfra&lt;/a&gt; - pay-per-token open models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; - one key, hundreds of models from every provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.mistral.ai/api/" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt; - native OpenAI-compatible endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick one, drop its &lt;code&gt;LLM_BASE_URL&lt;/code&gt; and &lt;code&gt;LLM_MODEL&lt;/code&gt; into the compose file, same shape.&lt;/p&gt;
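&lt;p&gt;For example, Groq's version of the same environment block looks like this (the model name was current when I wrote this - check their model list, these rotate):&lt;br&gt;
&lt;/p&gt;

```yaml
      LLM_BASE_URL: "https://api.groq.com/openai/v1"
      LLM_API_KEY: "${GROQ_API_KEY}"    # from a GROQ_API_KEY line in the same .env file
      LLM_MODEL: "llama-3.1-8b-instant"
```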

&lt;h3&gt;
  
  
  Option B: Local with Ollama (zero cost, fully private)
&lt;/h3&gt;

&lt;p&gt;If you've got enough RAM and want nothing leaving your house - not even the transcribed text - run the LLM locally.&lt;/p&gt;

&lt;p&gt;Add a third service to your compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-ollama&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama-models:/root/.ollama&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And update the &lt;code&gt;gateway&lt;/code&gt; service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-small&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;small&lt;/span&gt;
      &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://ollama:11434/v1"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma2:9b"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_PROMPT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clean&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;transcription.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Remove&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filler&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;words.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;punctuation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;capitalization.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Return&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;only&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cleaned&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;text,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;nothing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;else."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the Ollama volume to the bottom of the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;whisper-models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama-models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bring it up and pull a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;diction-ollama ollama pull gemma2:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;LLM_API_KEY&lt;/code&gt; isn't needed - Ollama doesn't check it.&lt;/p&gt;
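&lt;p&gt;A quick way to confirm Ollama is up and the model is pulled: hit its OpenAI-compatible endpoint directly from the host:&lt;br&gt;
&lt;/p&gt;

```shell
# Should return a chat completion; an error here usually means the model isn't pulled yet.
curl -sS http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma2:9b", "messages": [{"role": "user", "content": "Reply with OK."}]}'
```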

&lt;h4&gt;
  
  
  Which Ollama model?
&lt;/h4&gt;

&lt;p&gt;Sizes below are memory footprint - &lt;strong&gt;system RAM&lt;/strong&gt; if you run Ollama on CPU, &lt;strong&gt;VRAM&lt;/strong&gt; if you pass a GPU through to the container. Either way the number is the same.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma2:9b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9B&lt;/td&gt;
&lt;td&gt;~6 GB&lt;/td&gt;
&lt;td&gt;Best editing quality at this size. My pick.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;qwen2.5:7b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;~5 GB&lt;/td&gt;
&lt;td&gt;Strong at following cleanup instructions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;llama3.1:8b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;~5 GB&lt;/td&gt;
&lt;td&gt;Most popular, well-tested.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma3:4b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;~3 GB&lt;/td&gt;
&lt;td&gt;For tighter machines. Still OK for basic cleanup.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Under 7B tends to fail in a specific, annoying way: the model treats your transcript as a question and tries to answer it, instead of cleaning it up. Stick to 7B+ if you can spare the memory.&lt;/p&gt;

&lt;p&gt;If you have an NVIDIA GPU, pass it through to the Ollama container (same reservation block as the voice model GPU example further down) and you'll get 5-10x faster cleanup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apply the changes
&lt;/h3&gt;

&lt;p&gt;Once your compose file has the &lt;code&gt;LLM_*&lt;/code&gt; variables set, restart the gateway so it picks them up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker Compose detects the env change and recreates only the gateway container. The voice model container (and its loaded model) keeps running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test the cleanup
&lt;/h3&gt;

&lt;p&gt;Same voice memo as before, with &lt;code&gt;?enhance=true&lt;/code&gt; appended:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8080/v1/audio/transcriptions?enhance=true"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@voice-memo.m4a"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=small"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;?enhance=true&lt;/code&gt; you get the raw transcription. With it, the gateway sends the transcript through the LLM before returning. Quickest sanity check: record yourself saying some filler words ("um, this is uh a test like") and watch them disappear.&lt;/p&gt;

&lt;p&gt;To confirm the LLM is actually running (and wasn't silently disabled because of a missing env var), check the response headers for &lt;code&gt;X-Diction-LLM-Ms&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; - &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"http://localhost:8080/v1/audio/transcriptions?enhance=true"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@voice-memo.m4a"&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=small"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; diction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see both &lt;code&gt;X-Diction-Whisper-Ms&lt;/code&gt; and &lt;code&gt;X-Diction-LLM-Ms&lt;/code&gt; in the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dialing in the prompt
&lt;/h3&gt;

&lt;p&gt;The default prompt above is fine for generic cleanup. Adjust it to your taste. Some real prompts I've tried:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conservative cleaner&lt;/strong&gt; (preserves your voice, just fixes obvious errors):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Clean up this voice transcription. Fix punctuation and obvious typos only.
Do not rephrase or change word choice. Return only the cleaned text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Email-ready rewriter&lt;/strong&gt; (turns rambling into something you could actually send):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rewrite this voice note as a short professional email. Keep the meaning intact.
Return only the rewritten text, no greeting or sign-off.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bullet-pointer&lt;/strong&gt; (for dumping meeting notes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Convert this voice note into a bulleted list of the key points.
One bullet per idea. Return only the list.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Translator&lt;/strong&gt; (I dictate in English, send in German):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Translate this English voice note into natural German. Return only the translation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Long prompts via a file
&lt;/h3&gt;

&lt;p&gt;If your prompt is more than a one-liner, mount it as a file. Create &lt;code&gt;~/diction/cleanup-prompt.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a transcript cleaner.

Rules:
- Remove filler words (um, uh, er, like, you know).
- Fix grammar and punctuation.
- Preserve the speaker's voice and meaning.
- Fix common speech-to-text confusions: "there / their / they're", "affect / effect".
- Do not add a preamble.
- Return only the cleaned text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount it into the container and point &lt;code&gt;LLM_PROMPT&lt;/code&gt; at the file path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="c1"&gt;# ... rest of config&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./cleanup-prompt.txt:/config/cleanup-prompt.txt:ro&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${OPENAI_API_KEY}"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_PROMPT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/config/cleanup-prompt.txt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;LLM_PROMPT&lt;/code&gt; starts with &lt;code&gt;/&lt;/code&gt;, the gateway reads it as a file path. Otherwise it uses the string directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why gpt-4o-mini or a small local model instead of something bigger
&lt;/h3&gt;

&lt;p&gt;Cleanup is a simple task. The LLM only needs to polish, not reason. A frontier-tier model is overkill and slower. &lt;code&gt;gpt-4o-mini&lt;/code&gt; (cloud) or &lt;code&gt;gemma2:9b&lt;/code&gt; (local) hit the sweet spot for this workload. Save the expensive models for your actual conversations with the agent downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 8: Install Diction and Point It at Your Server
&lt;/h2&gt;

&lt;p&gt;Server's ready. Time to put the keyboard in front of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install the app
&lt;/h3&gt;

&lt;p&gt;On your iPhone, open the App Store and install &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Diction&lt;/a&gt;. It's free to download, and the modes you need for self-hosting (the entire point of this article) are free forever.&lt;/p&gt;

&lt;h3&gt;
  
  
  First run
&lt;/h3&gt;

&lt;p&gt;Open the app. It walks you through three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add the keyboard.&lt;/strong&gt; iOS requires you to manually add any third-party keyboard. The app sends you to Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction. Tap "Diction", then go back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allow Full Access.&lt;/strong&gt; Back in Keyboards, tap "Diction" in the list and flip "Allow Full Access" on. iOS will show a scary-sounding warning. It's required for any keyboard that makes network requests, which Diction has to do (it sends audio to your server). Diction has no QWERTY input, no text logging, and no analytics - there's nothing to capture even if it wanted to. Only the mic audio leaves the phone, and only to the endpoint you configure below. The source for the gateway is on GitHub, so you can audit exactly what the server does with the audio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant microphone access.&lt;/strong&gt; Back in the app, it asks for mic permission. Yes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Point it at your server
&lt;/h3&gt;

&lt;p&gt;Inside the Diction app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Settings&lt;/strong&gt; (gear icon, top right).&lt;/li&gt;
&lt;li&gt;Tap &lt;strong&gt;Mode&lt;/strong&gt;. Choose &lt;strong&gt;Self-Hosted&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Tap &lt;strong&gt;Endpoint&lt;/strong&gt;. Enter &lt;code&gt;http://192.168.1.42:8080&lt;/code&gt; (substituting your server's IP from Step 6).&lt;/li&gt;
&lt;li&gt;Scroll down. If you configured AI cleanup in Step 7, toggle &lt;strong&gt;AI Companion&lt;/strong&gt; on.&lt;/li&gt;
&lt;li&gt;Tap &lt;strong&gt;Test connection&lt;/strong&gt;. You should see a green check within a second or two. If not, see the troubleshooting below.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Take it for a spin
&lt;/h3&gt;

&lt;p&gt;Open any app that accepts text - Telegram, Messages, Notes, Mail, the Safari address bar, whatever. Tap to bring up the keyboard. Long-press the globe icon (bottom-left of the default keyboard) to switch keyboards. Pick Diction.&lt;/p&gt;

&lt;p&gt;You'll see one big mic button. Tap it, talk, release. The audio streams to your server. The transcription arrives back in about as much time as it takes for you to take your finger off the button.&lt;/p&gt;

&lt;p&gt;On a local network, end-to-end latency for a short sentence is typically under a second. Good enough that you stop thinking about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  If it doesn't connect
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Server not running? &lt;code&gt;docker compose ps&lt;/code&gt; on the server.&lt;/li&gt;
&lt;li&gt;iPhone not on the same WiFi as the server.&lt;/li&gt;
&lt;li&gt;IP address typo - re-check what Step 6 returned.&lt;/li&gt;
&lt;li&gt;Firewall blocking port 8080. On Linux with &lt;code&gt;ufw&lt;/code&gt;: &lt;code&gt;sudo ufw allow from 192.168.0.0/16 to any port 8080&lt;/code&gt;. On macOS, System Settings → Network → Firewall. Docker Desktop adds itself to the allow list on install, so inbound on published ports normally works - but if you've previously clicked "Deny" on a firewall prompt for Docker, that choice sticks. Flip it back under "Options…", or temporarily turn the firewall off to confirm that's the cause.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quickest sanity check: open Safari on the iPhone and try &lt;code&gt;http://192.168.1.42:8080/health&lt;/code&gt;. If the browser can't reach it, the app can't either.&lt;/p&gt;
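
&lt;p&gt;You can script that same check. A small sketch, assuming only that &lt;code&gt;/health&lt;/code&gt; answers HTTP 200 when the gateway is up:&lt;/p&gt;

```python
import urllib.request
import urllib.error

def health_url(endpoint: str) -> str:
    """Normalize the endpoint you typed into Diction and append /health."""
    return endpoint.rstrip("/") + "/health"

def is_up(endpoint: str, timeout: float = 2.0) -> bool:
    """True if the gateway answers /health with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(health_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

&lt;p&gt;If this returns &lt;code&gt;False&lt;/code&gt; from another machine but &lt;code&gt;True&lt;/code&gt; on the server itself, you're back to the firewall items above.&lt;/p&gt;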

&lt;h3&gt;
  
  
  Now dictate into your agent
&lt;/h3&gt;

&lt;p&gt;Open Telegram. Tap your agent's chat. Tap the globe to switch to the Diction keyboard. Tap the mic. Talk. Release. Your server transcribes, the LLM cleans it up, and the message lands in the composer ready to send. Hit send. Your agent replies. Loop.&lt;/p&gt;

&lt;p&gt;That's the whole point of the exercise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reach It From Anywhere (Not Just Home WiFi)
&lt;/h2&gt;

&lt;p&gt;Right now your dictation only works on your home network. The moment you walk out the door, the iPhone can't reach &lt;code&gt;192.168.1.42&lt;/code&gt; anymore. Three clean ways to fix this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tailscale (my pick)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;Tailscale&lt;/a&gt; builds a private mesh network between your devices over WireGuard. Install it on the server and on the iPhone, sign in to the same account on both, and your phone gets a stable &lt;code&gt;100.x.x.x&lt;/code&gt; address it can use to reach the server from anywhere - cellular, coffee shop WiFi, a plane with WiFi, wherever.&lt;/p&gt;

&lt;p&gt;Server side (Linux):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://tailscale.com/install.sh | sh
&lt;span class="nb"&gt;sudo &lt;/span&gt;tailscale up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On macOS, download the app and run it.&lt;/p&gt;

&lt;p&gt;iPhone side: install the Tailscale app from the App Store, sign in.&lt;/p&gt;

&lt;p&gt;On the server, grab the tailnet IP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tailscale ip &lt;span class="nt"&gt;-4&lt;/span&gt;
&lt;span class="c"&gt;# → 100.64.1.42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Back in the Diction app, change the Endpoint from &lt;code&gt;http://192.168.1.42:8080&lt;/code&gt; to &lt;code&gt;http://100.64.1.42:8080&lt;/code&gt;. Your dictation now works wherever you've got signal. Free for personal use (up to 100 devices).&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Tunnel (public URL, no port forwarding)
&lt;/h3&gt;

&lt;p&gt;If you'd rather have a pretty URL and don't want to install anything on the phone, &lt;a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/" rel="noopener noreferrer"&gt;Cloudflare Tunnel&lt;/a&gt; gives you an outbound tunnel from your server to Cloudflare's edge. No router config, no exposed ports.&lt;/p&gt;

&lt;p&gt;Add this service to your compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;cloudflared&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloudflare/cloudflared:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-cloudflared&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tunnel --no-autoupdate run&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;TUNNEL_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${CLOUDFLARE_TUNNEL_TOKEN}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the tunnel in the Cloudflare Zero Trust dashboard, grab the token, paste it into your &lt;code&gt;.env&lt;/code&gt;, set the public hostname to route to &lt;code&gt;http://gateway:8080&lt;/code&gt;. Done. Dictate over &lt;code&gt;https://dictation.yourdomain.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Free tier. Works great. Only caveat: your transcriptions pass through Cloudflare's network on the way. That's not plaintext (HTTPS all the way), but if "no third party in the path" is the whole reason you set this up, stick to Tailscale.&lt;/p&gt;

&lt;h3&gt;
  
  
  ngrok (testing / temporary)
&lt;/h3&gt;

&lt;p&gt;For quick testing, &lt;a href="https://ngrok.com/" rel="noopener noreferrer"&gt;ngrok&lt;/a&gt; gives you a public URL in one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ngrok http 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It prints a &lt;code&gt;https://xxx.ngrok-free.app&lt;/code&gt; URL. Paste that into the Diction app. Good for a demo or a five-minute test. Free tier URLs change every restart, which is annoying for permanent use. Also adds latency because your audio makes a round trip through ngrok's edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which one?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personal use, only you reach it:&lt;/strong&gt; Tailscale. Fast, private, no external hostnames.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Family / small team reaches the same server:&lt;/strong&gt; Cloudflare Tunnel. Pretty URL, TLS, one password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Just testing:&lt;/strong&gt; ngrok.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Already Have a Voice Model Server?
&lt;/h2&gt;

&lt;p&gt;If you've already got a voice model server running somewhere - a self-hosted &lt;code&gt;faster-whisper-server&lt;/code&gt;, a colleague's LocalAI instance, your employer's internal speech API - keep it. You don't need the voice model container from Step 3.&lt;/p&gt;

&lt;p&gt;What you still need is the Diction Gateway. The iPhone app talks to it for WebSocket streaming and the end-to-end encryption handshake - neither of which a plain OpenAI-compatible transcription server exposes. Point the gateway at your existing server with &lt;code&gt;CUSTOM_BACKEND_URL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://your-existing-server:8000&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-small&lt;/span&gt;
      &lt;span class="c1"&gt;# Optional LLM cleanup (Step 7):&lt;/span&gt;
      &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${OPENAI_API_KEY}"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini"&lt;/span&gt;
      &lt;span class="na"&gt;LLM_PROMPT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clean&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;transcription..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two extra knobs the &lt;code&gt;CUSTOM_BACKEND_*&lt;/code&gt; path supports if you need them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CUSTOM_BACKEND_AUTH: "Bearer sk-whatever"&lt;/code&gt;. Sent as the &lt;code&gt;Authorization&lt;/code&gt; header to your backend. For instances you've put an auth proxy in front of, or anything hosted that requires a token.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CUSTOM_BACKEND_NEEDS_WAV: "true"&lt;/code&gt;. Some backends (Canary, Parakeet) only accept WAV. The gateway transparently converts incoming audio with ffmpeg before forwarding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Point the iPhone at the gateway (&lt;code&gt;http://your-server:8080&lt;/code&gt;), leave your existing voice model server where it is, and get streaming plus LLM cleanup on top.&lt;/p&gt;
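
&lt;p&gt;Composed together, the two knobs shape each forwarded request roughly like this (my reading of the behavior above, not the gateway's actual source):&lt;/p&gt;

```python
def backend_request_plan(auth: str = "", needs_wav: bool = False) -> dict:
    """Decide headers and audio handling for a CUSTOM_BACKEND_* forward."""
    headers = {}
    if auth:
        # CUSTOM_BACKEND_AUTH is passed through verbatim as Authorization.
        headers["Authorization"] = auth
    return {
        "headers": headers,
        # CUSTOM_BACKEND_NEEDS_WAV: transcode with ffmpeg before forwarding.
        "convert_to_wav": needs_wav,
    }
```
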

&lt;h2&gt;
  
  
  Swap the Speech Model
&lt;/h2&gt;

&lt;p&gt;The starter compose file runs &lt;code&gt;small&lt;/code&gt;. That's a choice, not a commitment. Swapping to a different voice model size is two lines in your compose file plus a &lt;code&gt;docker compose up -d&lt;/code&gt;. The gateway has a short name for each model it knows how to route to:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Short name (&lt;code&gt;DEFAULT_MODEL&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;Service hostname&lt;/th&gt;
&lt;th&gt;Full model ID&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;whisper-tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-tiny&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;small&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;whisper-small&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-small&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;whisper-medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Systran/faster-whisper-medium&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;large-v3-turbo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;whisper-large-turbo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepdml/faster-whisper-large-v3-turbo-ct2&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;parakeet-v3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parakeet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;nvidia/parakeet-tdt-0.6b-v3&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To swap from &lt;code&gt;small&lt;/code&gt; to &lt;code&gt;medium&lt;/code&gt;, rewrite your compose file so the whisper service is named &lt;code&gt;whisper-medium&lt;/code&gt;, uses &lt;code&gt;WHISPER__MODEL: Systran/faster-whisper-medium&lt;/code&gt;, and the gateway's &lt;code&gt;DEFAULT_MODEL&lt;/code&gt; is &lt;code&gt;medium&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If the service name doesn't match the short name the gateway expects, you'll see &lt;code&gt;404 model not found&lt;/code&gt; on every request. That's the #1 reason people get stuck when upgrading.&lt;/p&gt;
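
&lt;p&gt;The lookup behind that 404 is effectively the table above. A sketch (same names as the table; the real implementation may differ):&lt;/p&gt;

```python
# Short model name -> compose service hostname, from the table above.
MODEL_ROUTES = {
    "tiny": "whisper-tiny",
    "small": "whisper-small",
    "medium": "whisper-medium",
    "large-v3-turbo": "whisper-large-turbo",
    "parakeet-v3": "parakeet",
}

def route(model: str) -> str:
    """Resolve a short model name to its service hostname, or fail like a 404."""
    if model not in MODEL_ROUTES:
        raise LookupError(f"model not found: {model}")  # surfaces as HTTP 404
    return MODEL_ROUTES[model]
```

&lt;p&gt;Rename the service without updating the mapping's expectations and every request falls into the error branch - which is exactly the failure mode described above.&lt;/p&gt;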

&lt;p&gt;Running multiple models at once? Add more services (&lt;code&gt;whisper-small&lt;/code&gt; + &lt;code&gt;whisper-medium&lt;/code&gt; side by side) and the app can switch between them per-request by setting the &lt;code&gt;model&lt;/code&gt; field in the request body. &lt;code&gt;DEFAULT_MODEL&lt;/code&gt; only applies when the request doesn't specify one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Cost Me
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The machine: whatever you already have idling at home&lt;/li&gt;
&lt;li&gt;Electricity: effectively zero while the speech model sits idle; a brief spike when you dictate.&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;code&gt;gpt-4o-mini&lt;/code&gt; is the cheap model. An hour of dictation costs roughly a cent. Five dollars of credit lasts months.&lt;/li&gt;
&lt;/ul&gt;
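
&lt;p&gt;The "roughly a cent" claim survives a back-of-envelope check. The assumptions are mine: ~150 spoken words per minute, ~1.3 tokens per word, and gpt-4o-mini's published prices at the time of writing ($0.15 per million input tokens, $0.60 per million output):&lt;/p&gt;

```python
# Back-of-envelope: cost of one hour of dictation cleanup with gpt-4o-mini.
# All numbers below are assumptions, not measurements.
WPM = 150                  # typical speaking pace, words per minute
TOKENS_PER_WORD = 1.3      # rough English tokenization ratio
PRICE_IN = 0.15 / 1e6      # USD per input token (gpt-4o-mini, at time of writing)
PRICE_OUT = 0.60 / 1e6     # USD per output token

tokens = WPM * 60 * TOKENS_PER_WORD            # ~11,700 tokens per hour
cost = tokens * PRICE_IN + tokens * PRICE_OUT  # cleanup echoes the input back out
print(f"${cost:.4f} per hour")                 # about $0.0088, i.e. under a cent
```
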

&lt;h2&gt;
  
  
  Got an NVIDIA GPU Sitting Idle?
&lt;/h2&gt;

&lt;p&gt;If the box you're setting this up on has an NVIDIA card in it, you can skip the &lt;code&gt;small&lt;/code&gt; model and run something that's genuinely state of the art. CPU-only is fine for dictation. GPU unlocks the models that the paid services are running - often faster than those services, because there's no network round trip.&lt;/p&gt;

&lt;p&gt;Two options. Pick one.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Parakeet TDT 0.6B v3&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;large-v3-turbo&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best at&lt;/td&gt;
&lt;td&gt;Speed + accuracy on European languages&lt;/td&gt;
&lt;td&gt;Multilingual breadth (99 languages)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WER (English)&lt;/td&gt;
&lt;td&gt;~6.3%&lt;/td&gt;
&lt;td&gt;~7.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;Sub-second&lt;/td&gt;
&lt;td&gt;Under 2s on consumer GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VRAM (INT8)&lt;/td&gt;
&lt;td&gt;~2 GB&lt;/td&gt;
&lt;td&gt;~2.3 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages&lt;/td&gt;
&lt;td&gt;25 European&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio format&lt;/td&gt;
&lt;td&gt;WAV only (gateway converts)&lt;/td&gt;
&lt;td&gt;Anything (voice model handles it)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Option A: Parakeet (fastest, 25 European languages)
&lt;/h3&gt;

&lt;p&gt;NVIDIA's Parakeet TDT 0.6B v3. On a recent consumer GPU (think RTX 3060 or better) it transcribes a 5-second clip in well under a second. Accuracy on clean English audio beats the large-v3 voice model on most benchmarks, at a fraction of the size and latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supported languages:&lt;/strong&gt; English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian. If you dictate in any of these, Parakeet is the better engine.&lt;/p&gt;

&lt;p&gt;If you dictate in Japanese, Mandarin, Arabic, Korean, or anything outside that list, use Option B.&lt;/p&gt;

&lt;p&gt;Replace the &lt;code&gt;whisper-small&lt;/code&gt; service in &lt;code&gt;docker-compose.yml&lt;/code&gt; with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parakeet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/achetronic/parakeet:latest-int8&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-parakeet&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5092:5092"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;parakeet&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parakeet-v3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway already knows how to speak to a service named &lt;code&gt;parakeet&lt;/code&gt; on port 5092. No extra wiring needed. Test it exactly the same way as before.&lt;/p&gt;

&lt;p&gt;You'll need the NVIDIA Container Toolkit installed on the host so Docker can pass the GPU through. &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt" rel="noopener noreferrer"&gt;One-line install&lt;/a&gt; if you haven't done it yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: large-v3-turbo voice model (multilingual, frontier-tier)
&lt;/h3&gt;

&lt;p&gt;The biggest model in this family, GPU-accelerated. This is what the paid cloud transcription services charge real money for. Runs great on any GPU with 6GB+ of VRAM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;whisper-large&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cuda&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-whisper-large&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-models:/root/.cache/huggingface&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-large-v3-turbo&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cuda&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__COMPUTE_TYPE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;float16&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diction-gateway&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;whisper-large&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;large-v3-turbo&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;whisper-models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First boot pulls about 1.6GB of model weights. After that it's warm and fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  What About NVIDIA Canary 1B?
&lt;/h3&gt;

&lt;p&gt;If you've been reading up on speech models recently, you've probably seen Canary 1B at the top of the accuracy benchmarks. Yes, it's better than both options above on paper. The catch: NVIDIA ships it through NeMo, not as a turnkey OpenAI-compatible container. Getting it wrapped in the API the gateway expects is real work. You'll end up writing a small serving layer yourself. I run one of those internally for the Diction cloud, but I'm not going to pretend you can copy-paste a compose block for it. If you're willing to build that wrapper, point the gateway at it via &lt;code&gt;CUSTOM_BACKEND_URL&lt;/code&gt; (see "Already Have a Voice Model Server?" above) and you're set.&lt;/p&gt;

&lt;p&gt;For everyone else: Parakeet or large-v3-turbo is already better than what most cloud services give you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OpenAI-Compatible API You Just Installed
&lt;/h2&gt;

&lt;p&gt;The gateway speaks the OpenAI audio transcription API. That means anything that knows how to talk to &lt;code&gt;api.openai.com/v1/audio/transcriptions&lt;/code&gt; also knows how to talk to your server. The iPhone keyboard is just one client of this API; you can also point laptops, scripts, or other services at the same URL.&lt;/p&gt;

&lt;p&gt;Quick Python example using the official OpenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://192.168.1.42:8080/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# not checked by default
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.m4a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same thing works for the Node SDK, LangChain, Flowise, n8n, anything. Treat it as a local stand-in for OpenAI's hosted API.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's supported
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /v1/audio/transcriptions&lt;/code&gt; with &lt;code&gt;file&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;language&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;response_format=json|text&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /v1/models&lt;/code&gt; - lists the speech engines and models the gateway can route to. Response shape is Diction's own (&lt;code&gt;{"providers": [{"id": "whisper", "models": [...]}, ...]}&lt;/code&gt;), not OpenAI's flat &lt;code&gt;data&lt;/code&gt; array, so OpenAI SDK &lt;code&gt;.models.list()&lt;/code&gt; calls won't parse it cleanly. Hit it directly with &lt;code&gt;curl&lt;/code&gt; if you want to see what's available.&lt;/li&gt;
&lt;li&gt;Multiple short-name aliases: &lt;code&gt;small&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;large-v3-turbo&lt;/code&gt;, &lt;code&gt;parakeet-v3&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;HuggingFace-style IDs: &lt;code&gt;Systran/faster-whisper-small&lt;/code&gt;, &lt;code&gt;nvidia/parakeet-tdt-0.6b-v3&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;
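&lt;p&gt;Since the OpenAI SDK won't parse that &lt;code&gt;/v1/models&lt;/code&gt; shape, here's a minimal sketch of flattening it yourself. The &lt;code&gt;providers&lt;/code&gt; structure matches the shape described above; the sample payload is illustrative, not a real gateway response:&lt;/p&gt;

```python
# Flatten Diction's /v1/models response into (provider, model) pairs.
# The {"providers": [...]} shape is the gateway's own; the sample
# payload below is made up for illustration.

def flatten_models(payload):
    """Return (provider_id, model_id) pairs from a Diction models response."""
    return [
        (provider["id"], model)
        for provider in payload.get("providers", [])
        for model in provider.get("models", [])
    ]

sample = {
    "providers": [
        {"id": "whisper", "models": ["small", "medium", "large-v3-turbo"]},
        {"id": "parakeet", "models": ["parakeet-v3"]},
    ]
}

for provider, model in flatten_models(sample):
    print(f"{provider}: {model}")
```

&lt;p&gt;In practice you'd feed this the JSON from &lt;code&gt;curl http://your-server:8080/v1/models&lt;/code&gt;.&lt;/p&gt;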

&lt;h3&gt;
  
  
  What's not supported
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Text-to-speech (&lt;code&gt;/v1/audio/speech&lt;/code&gt;). This is transcription only.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;response_format=verbose_json | srt | vtt&lt;/code&gt;. No word-level timestamps.&lt;/li&gt;
&lt;li&gt;Server-Sent Events streaming on the REST endpoint. Use the WebSocket &lt;code&gt;/v1/audio/stream&lt;/code&gt; for streaming.&lt;/li&gt;
&lt;li&gt;OpenAI's Realtime API (&lt;code&gt;/v1/realtime&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Authentication
&lt;/h3&gt;

&lt;p&gt;By default the gateway has &lt;code&gt;AUTH_ENABLED=false&lt;/code&gt;. Pass any non-empty string as the API key - nothing's checked. If you want to lock it down (e.g. when exposing it via Cloudflare Tunnel), set &lt;code&gt;AUTH_ENABLED=true&lt;/code&gt; and configure the token in your gateway env. The &lt;code&gt;server/docker-compose.yml&lt;/code&gt; in the public repo has a more elaborate example if you want one.&lt;/p&gt;
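&lt;p&gt;As a sketch, the header you'd attach once auth is on looks like this. I'm assuming the standard &lt;code&gt;Bearer&lt;/code&gt; scheme here - check the compose example in the repo for the exact env variable names:&lt;/p&gt;

```python
# Sketch: building the Authorization header for a gateway running with
# AUTH_ENABLED=true. Assumes a standard "Bearer <token>" scheme; the
# token value is whatever you configured in the gateway env.

def auth_headers(token: str) -> dict:
    """Build request headers for an auth-enabled gateway."""
    if not token:
        raise ValueError("AUTH_ENABLED=true needs a non-empty token")
    return {"Authorization": f"Bearer {token}"}

print(auth_headers("my-secret-token"))
```

&lt;p&gt;With the OpenAI SDK, the same thing is just passing your real token as &lt;code&gt;api_key&lt;/code&gt; instead of the &lt;code&gt;"anything"&lt;/code&gt; placeholder from the earlier example.&lt;/p&gt;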

&lt;h3&gt;
  
  
  Caveat: error response shape
&lt;/h3&gt;

&lt;p&gt;Diction's gateway returns errors as &lt;code&gt;{"error":"message"}&lt;/code&gt;, not OpenAI's nested &lt;code&gt;{"error":{"message":"...","type":"..."}}&lt;/code&gt;. Most SDKs surface these as a raw &lt;code&gt;HTTPError&lt;/code&gt; rather than a parsed &lt;code&gt;APIError&lt;/code&gt;. Catch both if you're writing something defensive.&lt;/p&gt;
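&lt;p&gt;A small helper, as a sketch, that copes with both error shapes (the helper name is mine):&lt;/p&gt;

```python
# Pull a human-readable message out of an error body regardless of shape.
# Diction's gateway returns {"error": "message"}; OpenAI nests it as
# {"error": {"message": "...", "type": "..."}}.

def error_message(body: dict) -> str:
    err = body.get("error")
    if isinstance(err, dict):   # OpenAI's nested shape
        return err.get("message", "unknown error")
    if isinstance(err, str):    # Diction's flat shape
        return err
    return "unknown error"

print(error_message({"error": "file too large"}))
print(error_message({"error": {"message": "invalid model", "type": "invalid_request_error"}}))
```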

&lt;h2&gt;
  
  
  Privacy: What Actually Happens to Your Audio
&lt;/h2&gt;

&lt;p&gt;The whole reason most people set this up is to avoid paying a random SaaS to process their voice. It's worth being precise about what this stack does and doesn't do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What leaves your iPhone:&lt;/strong&gt; raw audio, encoded as Opus (over WebSocket stream) or WAV (over REST), heading to the server endpoint you configured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In transit:&lt;/strong&gt; HTTP by default. Plaintext audio over your LAN. That's fine on a trusted home network. If you expose the gateway over the internet (Cloudflare Tunnel, ngrok, your own reverse proxy), put TLS in front of it. Tailscale wraps everything in WireGuard so you don't need to think about TLS at all - that's part of why I prefer it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What your server does with the audio:&lt;/strong&gt; feeds it to the voice model container. The voice model transcribes. Returns text. Audio gets thrown away - neither the gateway nor &lt;code&gt;faster-whisper-server&lt;/code&gt; persists audio anywhere. &lt;code&gt;docker compose logs&lt;/code&gt; contains request metadata (latency, model used, text length) but not the audio or the transcript. You can verify this yourself: &lt;code&gt;docker exec diction-whisper-small ls -la /tmp&lt;/code&gt; shows an essentially empty directory between requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If cleanup is enabled:&lt;/strong&gt; the transcript (plain text, no audio) gets sent to your configured LLM endpoint. That's the only point where data leaves your server. If you pick a local Ollama, nothing leaves the house at all. If you pick OpenAI/Groq/whatever, the transcript passes through their infrastructure. Their data policies apply to that leg - read them if it matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the Diction app does with your audio:&lt;/strong&gt; nothing. The keyboard's only job is to stream to your endpoint and insert the response. No analytics, no tracking, no background uploads. The app has no QWERTY input, so there's literally nothing to log even if it wanted to. Source for the server-side code is on GitHub (the iOS app itself isn't open source, but the data flow on the wire is straightforward: one POST per dictation, to the endpoint you configured).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full Access permission:&lt;/strong&gt; iOS requires this for any keyboard that touches the network. It's a coarse switch that also grants things like pasteboard access. Diction uses the network part and nothing else - again, no typed input, no pasteboard monitoring. If you'd rather not trust that claim, run the setup from this article and point Wireshark at the gateway's port. You'll see exactly one connection per dictation, to your endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Small Thing About "AI Companion"
&lt;/h2&gt;

&lt;p&gt;If you dig around the Diction app's settings you'll find an "AI Companion" toggle with its own prompt field. Worth knowing how that interacts with what you just built.&lt;/p&gt;

&lt;p&gt;The toggle is what tells the app to ask for cleanup (&lt;code&gt;?enhance=true&lt;/code&gt; in the request). It's the on/off switch. But the actual prompt the LLM sees is whatever you put in &lt;code&gt;LLM_PROMPT&lt;/code&gt; in your compose file. The in-app prompt field is used by the hosted Diction Cloud setup. On your own server, your env var wins. Every time.&lt;/p&gt;

&lt;p&gt;So: flip AI Companion on in the app if you want cleanup to run. Tune the prompt by editing &lt;code&gt;docker-compose.yml&lt;/code&gt; and running &lt;code&gt;docker compose up -d&lt;/code&gt; again. Nothing else to configure.&lt;/p&gt;
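&lt;p&gt;If you're curious what the toggle changes on the wire, here's a sketch of the request URL it produces. The &lt;code&gt;?enhance=true&lt;/code&gt; parameter is the one the app sends; the host and the helper name are placeholders of mine:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Sketch: how the AI Companion toggle maps onto the transcription request.
# With the toggle on, the app appends ?enhance=true; the gateway then runs
# the transcript through the LLM_PROMPT cleanup step before returning it.

def transcription_url(base: str, enhance: bool) -> str:
    url = base.rstrip("/") + "/v1/audio/transcriptions"
    if enhance:
        url += "?" + urlencode({"enhance": "true"})
    return url

print(transcription_url("http://192.168.1.42:8080", enhance=True))
```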

&lt;h2&gt;
  
  
  It's Open Source. Go Wild.
&lt;/h2&gt;

&lt;p&gt;The gateway is on GitHub at &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;omachala/diction&lt;/a&gt; under an open-source license. If there's a behavior you want that it doesn't have, fork it. If you hit a bug or add something other people would benefit from, I'd love a pull request. The codebase is small and deliberately boring Go. You don't need to be an expert to find your way around.&lt;/p&gt;

&lt;p&gt;Some things I know people want and haven't built yet: per-app routing (different models for different apps), a richer context API, swappable post-processing pipelines. If any of those scratch your itch, the code's right there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Heard of Speaches?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/speaches-ai/speaches" rel="noopener noreferrer"&gt;Speaches&lt;/a&gt; is the nearest neighbor - an OpenAI-compatible self-hosted speech server with transcription, TTS, and a realtime API. Good project for a general-purpose endpoint. It won't drive the Diction keyboard, though: the app opens a WebSocket at &lt;code&gt;/v1/audio/stream&lt;/code&gt; and does an X25519 + AES-GCM handshake on every request, and Speaches streams transcription over SSE on the REST endpoint with no knowledge of that handshake. That's why I wrote Diction Gateway - the keyboard's protocol baked in, end-to-end encrypted transcripts by default, BYO LLM cleanup in a single env var, and a thin wrapper mode (&lt;code&gt;CUSTOM_BACKEND_URL&lt;/code&gt;) so you can put it in front of any existing speech server. Even outside the keyboard use case, if you want a minimal OpenAI-compatible speech gateway with an LLM cleanup step wired in, reach for this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Go Next
&lt;/h2&gt;

&lt;p&gt;Some directions once the base setup is working:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ditch the cloud LLM for a local model.&lt;/strong&gt; You already saw the Ollama option in Step 7. Uncomment it in your compose file, &lt;code&gt;ollama pull gemma2:9b&lt;/code&gt;, done. Nothing leaves your house. I've got a &lt;a href="https://dev.to/omachala/i-plugged-ollama-into-my-iphone-keyboard-heres-the-full-self-hosted-stack-1ii8"&gt;full walkthrough of the Ollama side here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move off home WiFi.&lt;/strong&gt; Tailscale (see the Reach It From Anywhere section above) is the easy answer. Five minutes to set up, dictation works at the café.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upgrade the speech model.&lt;/strong&gt; Start with &lt;code&gt;small&lt;/code&gt;, move to &lt;code&gt;medium&lt;/code&gt; once you notice misheard words, jump to &lt;code&gt;large-v3-turbo&lt;/code&gt; if you've got a GPU. Accuracy climbs noticeably with each tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dictate in another language.&lt;/strong&gt; The voice model autodetects, so you don't have to do anything. If you're mostly in a European language and have a GPU, switch to Parakeet - it's meaningfully more accurate for those.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune the cleanup prompt.&lt;/strong&gt; The default prompt fixes filler words and punctuation. Try the email-ready rewriter, the bullet-pointer, or your own variant. See the prompt library in Step 7.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a second gateway.&lt;/strong&gt; Run one on your home server (high quality, slow connection over VPN) and one on a dev laptop (lower quality, instant local). Switch per-network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plug the gateway into other things.&lt;/strong&gt; It's an OpenAI-compatible speech endpoint. Any transcription workflow - meeting notes, voice memos pipeline, automatic subtitling - can point at it instead of OpenAI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute.&lt;/strong&gt; If you build something useful on top of this, PR it to &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;omachala/diction&lt;/a&gt;. Better prompts, better docs, new backends, whatever.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The keyboard is in the &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;App Store&lt;/a&gt;. You can self-host, use the Diction Cloud, or both. The app lets you switch per-app - self-host your Telegram dictation, use the cloud when you're offline from your tailnet, on-device only mode for the really sensitive stuff. Mix and match.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing the Thread
&lt;/h2&gt;

&lt;p&gt;What I like about this setup: I can talk to OpenClaw and the rest of my agents without worrying about who else is listening on the way in. The keyboard's as fast as the built-in one. Short dictations land in under a second. The only thing I pay is whatever my cleanup LLM costs - pennies on OpenAI, zero on local Ollama. The rest stays on my hardware.&lt;/p&gt;

&lt;p&gt;The project is still quite new, but the feedback from people using it daily has been genuinely amazing. I'm adding features almost every week and making the whole thing more robust with each release. If there's something missing for your workflow, say so - there's a good chance it's on its way or can be.&lt;/p&gt;

&lt;p&gt;If you found this useful, a GitHub star on &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;omachala/diction&lt;/a&gt; would be a lovely token of appreciation - it's the easiest way to tell me this stuff is worth building more of. Try the app, tell someone else who'd find it useful, and if you hit something that's broken or confusing in this walkthrough, ping me. I'll fix it.&lt;/p&gt;

&lt;p&gt;Happy dictating.&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>privacy</category>
    </item>
    <item>
      <title>I Plugged Ollama Into My iPhone Keyboard. Here's the Full Self-Hosted Stack.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 16 Apr 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/i-plugged-ollama-into-my-iphone-keyboard-heres-the-full-self-hosted-stack-1ii8</link>
      <guid>https://dev.to/omachala/i-plugged-ollama-into-my-iphone-keyboard-heres-the-full-self-hosted-stack-1ii8</guid>
      <description>&lt;p&gt;You've got Ollama running on your home server. Your iPhone keyboard is still phoning home.&lt;/p&gt;

&lt;p&gt;That gap bothered me enough to close it. Diction is an open-source iOS keyboard that speaks the standard OpenAI transcription API, &lt;code&gt;POST /v1/audio/transcriptions&lt;/code&gt;. Its gateway is open source Go. You configure it entirely with environment variables. The latest update adds something I wanted from the start: plug in your own LLM for post-processing. Ollama, OpenAI, anything with an OpenAI-compatible endpoint.&lt;/p&gt;

&lt;p&gt;Here's the full stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Each Piece Does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The speech engine&lt;/strong&gt; does the transcription. I ran Whisper for months. But there's a faster option if you dictate in English or another European language: NVIDIA's Parakeet model, available via the &lt;code&gt;achetronic/parakeet&lt;/code&gt; Docker image. It supports 25 languages. On CPU it's roughly 10x faster than Whisper, uses about 2GB RAM, and edges out Whisper large-v3 on accuracy for English. Models are baked into the image, so no first-run download.&lt;/p&gt;

&lt;p&gt;If you need Asian languages, Arabic, or anything outside those 25, use Whisper instead. Everything below works with either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gateway&lt;/strong&gt; sits in front of the speech engine. It handles WebSocket streaming, so your phone streams audio live while you're still talking. By the time you stop, the transcript is mostly ready. It also handles the new LLM post-processing step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ollama&lt;/strong&gt; cleans up the transcript after transcription. The gateway calls it with a system prompt you write, and the cleaned text is what gets inserted into the app. Your model, your prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;Three containers. No GPU required. Under 3GB RAM without Ollama, 8-10GB with a 9B model loaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parakeet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/achetronic/parakeet:latest-int8&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9006:5092"&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;parakeet&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parakeet-v3&lt;/span&gt;
      &lt;span class="na"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://ollama:11434/v1&lt;/span&gt;
      &lt;span class="na"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
      &lt;span class="na"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma2:9b&lt;/span&gt;
      &lt;span class="na"&gt;LLM_PROMPT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clean&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;transcription.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Remove&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filler&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;words&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(um,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;uh,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;like).&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;punctuation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grammar.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Return&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;only&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cleaned&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;text."&lt;/span&gt;

  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama-data:/root/.ollama&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama-data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull gemma2:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skip the Ollama block entirely if you don't want AI cleanup. The gateway checks for &lt;code&gt;LLM_BASE_URL&lt;/code&gt; on startup. If it's not set, transcriptions come back raw.&lt;/p&gt;

&lt;p&gt;If you already have Ollama running on a different machine, point &lt;code&gt;LLM_BASE_URL&lt;/code&gt; at it. Works with any model you've already pulled.&lt;/p&gt;

&lt;p&gt;For Whisper instead of Parakeet, swap the &lt;code&gt;parakeet&lt;/code&gt; service for &lt;code&gt;fedirz/faster-whisper-server:latest-cpu&lt;/code&gt; and set &lt;code&gt;DEFAULT_MODEL: small&lt;/code&gt; (or &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;large-v3-turbo&lt;/code&gt;) on the gateway.&lt;/p&gt;
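&lt;p&gt;A sketch of what that swap looks like in the compose file. The service name is my choice, the &lt;code&gt;LLM_*&lt;/code&gt; variables carry over unchanged from above, and I've left out any backend-URL wiring between the gateway and the Whisper container - check &lt;code&gt;server/docker-compose.yml&lt;/code&gt; in the repo for the exact form:&lt;/p&gt;

```yaml
services:
  whisper:
    image: fedirz/faster-whisper-server:latest-cpu

  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    depends_on:
      - whisper
    environment:
      DEFAULT_MODEL: small   # or medium, large-v3-turbo
```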




&lt;h2&gt;
  
  
  Connecting the App
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Install Diction from the App Store&lt;/li&gt;
&lt;li&gt;In iPhone Settings: General → Keyboard → Keyboards → Add New Keyboard → Diction&lt;/li&gt;
&lt;li&gt;Open the Diction app, switch to &lt;strong&gt;Self-Hosted&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Paste your server address: &lt;code&gt;http://192.168.1.100:8080&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A green dot confirms the connection&lt;/li&gt;
&lt;li&gt;Enable AI Enhancement in app settings (requires a Diction One subscription to unlock the toggle — the processing runs on your server, not Diction's)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The keyboard is now using your server. Audio goes from your phone to your server, Parakeet transcribes it, Ollama cleans the result, text lands in whatever app you're typing in. Nothing leaves your network.&lt;/p&gt;

&lt;p&gt;If you're away from home, Tailscale or a Cloudflare Tunnel connects your phone without opening router ports.&lt;/p&gt;




&lt;h2&gt;
  
  
  Writing a Prompt That Works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;LLM_PROMPT&lt;/code&gt; env var is a single system prompt. The gateway sends it with every transcription request. The transcript is the user message. You control both.&lt;/p&gt;
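&lt;p&gt;In OpenAI chat-completion terms, the request the gateway assembles looks roughly like this (a sketch; the helper name and the example strings are mine):&lt;/p&gt;

```python
# Sketch: the chat payload the gateway sends to the LLM endpoint.
# LLM_PROMPT becomes the system message; the raw transcript is the
# user message. The cleaned reply is what lands in your app.

def cleanup_messages(llm_prompt: str, transcript: str) -> list:
    return [
        {"role": "system", "content": llm_prompt},
        {"role": "user", "content": transcript},
    ]

msgs = cleanup_messages(
    "Remove filler words. Fix punctuation. Return only the cleaned text.",
    "um so I think we should uh ship it on friday",
)
print(msgs)
```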

&lt;p&gt;A few starting points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# General dictation
Remove filler words (um, uh, like, you know). Fix punctuation and grammar.
Preserve meaning and tone. Return only the cleaned result.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Technical / developer notes
Fix transcription errors. Preserve technical terms, command names, and file paths
exactly as spoken. Remove filler words. Return cleaned text only.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Medical or domain-specific
Fix transcription errors. Preserve all domain-specific terminology exactly as spoken.
Fix grammar and punctuation only. Return the corrected text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One practical note: models under 7B parameters often answer the transcript rather than clean it. Gemma2 9B is reliable. Qwen2.5 7B is borderline on this task. Anything 9B+ behaves predictably.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;p&gt;Audio from your phone to your server. Parakeet or Whisper transcribes it. Ollama cleans it. Text inserted. No third party, no word limits. The Diction gateway is fully open source on GitHub — inspect every line that runs in your network.&lt;/p&gt;

&lt;p&gt;Full setup docs at &lt;a href="https://diction.one/self-hosted" rel="noopener noreferrer"&gt;diction.one/self-hosted&lt;/a&gt;. If you're already running a homelab with Ollama, the marginal effort is a single compose file and 10 minutes.&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>privacy</category>
    </item>
    <item>
      <title>What's New in Diction 5.0</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:18:46 +0000</pubDate>
      <link>https://dev.to/omachala/whats-new-in-diction-50-1k0l</link>
      <guid>https://dev.to/omachala/whats-new-in-diction-50-1k0l</guid>
      <description>&lt;p&gt;Diction 5.0 is out. Three things changed in a meaningful way: the cloud is rebuilt from scratch, AI Companion can now edit text by voice, and the app is fully localized in 13 languages. Everything else is polish on top of that foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diction One is a different thing now
&lt;/h2&gt;

&lt;p&gt;I spent several weeks working with physical hardware for this. Dedicated speech models for each language family, better audio processing, faster response across the board. If you tried cloud mode in an earlier version and found it slow or inaccurate, I'd ask you to try it again.&lt;/p&gt;

&lt;p&gt;The mic is always warm now. Tap and start talking immediately. No noticeable gap between the button and your voice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your voice can now edit
&lt;/h2&gt;

&lt;p&gt;AI Companion in v4 could clean up what you said. In v5, it can edit what's already written.&lt;/p&gt;

&lt;p&gt;You're writing an email. You wrote "let's meet Tuesday" but the meeting moved. Place your cursor anywhere in that sentence and say "change Tuesday to Thursday." No selecting, no deleting, no retyping.&lt;/p&gt;

&lt;p&gt;Or select a paragraph that's too stiff and say "make this more casual." Diction rewrites the selection. The action bar turns indigo when text is selected so you always know what mode you're in.&lt;/p&gt;

&lt;p&gt;Long-press the action bar to rewrite text around your cursor without selecting anything first. Say what's wrong, Diction fixes it.&lt;/p&gt;

&lt;h2&gt;
  
  
  13 languages, properly localized
&lt;/h2&gt;

&lt;p&gt;Every screen now adapts to your system language: Settings, History, Insights, everything. Switch the UI language live from the picker without restarting the keyboard.&lt;/p&gt;

&lt;p&gt;Before this release, everyone got English regardless of what their iPhone was set to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other changes worth knowing
&lt;/h2&gt;

&lt;p&gt;Insights is redesigned. Your typing speed multiplier is the main number, with daily average, words per minute, days used, and time saved all visible at once.&lt;/p&gt;

&lt;p&gt;New mic release options. "After dictation" drops the mic the moment the transcription finishes. There are also 10-second and 30-second options for music and podcast listeners.&lt;/p&gt;

&lt;p&gt;AirPods work correctly now. Music stays in stereo while Diction uses the built-in mic. Nothing ducks.&lt;/p&gt;

&lt;p&gt;AI Companion is smarter about keeping your voice. It preserves natural speech patterns, writes numbers as digits, and doesn't drop sentences on longer recordings.&lt;/p&gt;

&lt;p&gt;For self-hosters: support for bringing your own language model for AI Companion, a one-command setup covering 25 European languages, and smart routing that picks the right speech model per language with a health-checked fallback. Open source at github.com/omachala/diction.&lt;/p&gt;

&lt;p&gt;Diction 5.0 is on the App Store now: &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;https://apps.apple.com/app/id6759807364&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>How Diction handles privacy</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/how-diction-handles-privacy-52mf</link>
      <guid>https://dev.to/omachala/how-diction-handles-privacy-52mf</guid>
      <description>&lt;p&gt;Voice keyboards sit in an uncomfortable position. Every app you use, every message you send, every search you type — the keyboard is there. It sees all of it.&lt;/p&gt;

&lt;p&gt;Most people install a keyboard and never think about this. I did, because I was building one.&lt;/p&gt;




&lt;p&gt;Earlier this year, someone reverse-engineered a popular voice keyboard and posted their findings. The app was collecting full browser URLs, names of focused apps, on-screen text scraped via the Accessibility API, clipboard contents including data copied from password managers, and sending it all back to a server. There was a function in the binary called &lt;code&gt;sendTrackResultToServer&lt;/code&gt;. None of this was in the privacy policy.&lt;/p&gt;

&lt;p&gt;This is not a hypothetical. It happened. And the only reason anyone found out is because the app was installed on a machine where someone was curious enough to look.&lt;/p&gt;

&lt;p&gt;That is the problem with closed-source software and privileged access: you cannot verify the claims. A privacy policy is a document. The code is what runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Full Access and what it actually enables
&lt;/h2&gt;

&lt;p&gt;When iOS asks if you want to allow Full Access for a keyboard, the permission is broader than most people realise. It enables network access (how keyboards send audio for transcription or sync dictionaries). But in the wrong hands it also means the keyboard code runs in a context where it could read clipboard data, monitor app usage patterns, or transmit information alongside its legitimate function.&lt;/p&gt;

&lt;p&gt;Diction has no QWERTY keys. There is nothing to type into it, so nothing to log in that sense. But I wanted to go further than just "we don't do the bad thing." I wanted to build it so you can verify we don't.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I built Diction with this in mind
&lt;/h2&gt;

&lt;p&gt;There are three ways Diction can process your audio, and I picked each one with this threat model in mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On-device&lt;/strong&gt; is the cleanest answer. Your audio never leaves your iPhone. A local speech model handles transcription, the result comes back, and that is it. No server, no transmission, no policy to read. If you want absolute certainty, this is the mode for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted&lt;/strong&gt; is for people who want cloud-quality transcription but on infrastructure they control. You point the app at your own server. Your audio goes there and nowhere else. I have no access to what you say or what gets transcribed. The server software is open source. You can read exactly what it does before you run it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diction One&lt;/strong&gt; is the hosted cloud option. Here I had to think carefully. Audio is processed in memory and discarded immediately after transcription. Nothing is written to disk. No transcriptions are stored or logged. And every transcription is encrypted with AES-256-GCM using a fresh X25519 key per request. The same standards WireGuard uses. I am not asking you to trust the policy. The implementation is in the open-source server code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The app itself
&lt;/h2&gt;

&lt;p&gt;The Diction app contains no analytics and no tracking code. No device identifiers, no usage events, no behavioural monitoring. The App Store privacy label reads "Data Not Collected." I can say that confidently because I wrote every line and there is nothing there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you can actually verify
&lt;/h2&gt;

&lt;p&gt;The server code is public at &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;. You can read the transcription handler and confirm that audio is not written anywhere. You can read the encryption implementation. If you run on-device mode, you can point a network inspector at the app and confirm no requests leave it.&lt;/p&gt;

&lt;p&gt;I built it this way because I wanted to use this keyboard myself. And I was not willing to just trust a policy page written by someone else.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Download on the App Store&lt;/a&gt; · &lt;a href="https://diction.one/privacy-first" rel="noopener noreferrer"&gt;diction.one/privacy-first&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What's New in Diction 4.0</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Wed, 01 Apr 2026 15:04:42 +0000</pubDate>
      <link>https://dev.to/omachala/whats-new-in-diction-40-2609</link>
      <guid>https://dev.to/omachala/whats-new-in-diction-40-2609</guid>
      <description>&lt;p&gt;Diction 4.0 is the biggest update since launch. The theme is straightforward: do more with your voice, with fewer rough edges. This release took hundreds of commits and more testing rounds than I want to count. Here is what landed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speak to Edit
&lt;/h2&gt;

&lt;p&gt;This is the headline feature. Select any text in any app, tap the mic, and say what you want changed.&lt;/p&gt;

&lt;p&gt;You are editing an email. You select "Wednesday works for me" and say "Thursday actually." Diction replaces the selection.&lt;/p&gt;

&lt;p&gt;It also handles instructions. Select a paragraph and say "translate to Czech." Or "make this shorter." Or "more formal." Diction figures out whether you are giving a literal replacement or an editing instruction and acts accordingly.&lt;/p&gt;

&lt;p&gt;Before this, voice keyboards were append-only. You could dictate new text, but editing meant switching to the regular keyboard. Now you stay in voice the whole time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Words Improve Transcription Directly
&lt;/h2&gt;

&lt;p&gt;In 3.0, custom words (My Words) only helped during AI Enhancement cleanup. Now they feed directly into the speech model as vocabulary hints.&lt;/p&gt;

&lt;p&gt;Your coworker's name is Kaelith. Your product is called Nexaro. You added both to My Words. Now the raw transcription gets them right on the first pass, even with AI Enhancement turned off.&lt;/p&gt;

&lt;p&gt;This matters most for anyone dictating technical terms, brand names, or anything the base model has never seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Long Recordings That Actually Finish
&lt;/h2&gt;

&lt;p&gt;Previous versions could cut off or lose the end of longer dictations. 4.0 rewrites the recording pipeline to handle long sessions without dropping audio.&lt;/p&gt;

&lt;p&gt;If you are dictating meeting notes, a long email, or journal entries, the transcript comes back complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Profile
&lt;/h2&gt;

&lt;p&gt;Tell Diction who you are and how you write. "I'm a software engineer. I write in short, direct sentences. I use American English."&lt;/p&gt;

&lt;p&gt;AI Enhancement uses your profile to match your style. Instead of generic cleanup, it produces text that sounds like you actually wrote it. The profile persists across all your dictations, so you set it once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guided Onboarding
&lt;/h2&gt;

&lt;p&gt;First launch used to throw permission dialogs at you and hope you figured it out. Now there is a step-by-step walkthrough: keyboard installation, permissions, first dictation. You know exactly where you are and what to do next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Better On-Device Setup
&lt;/h2&gt;

&lt;p&gt;Downloading speech models should not be confusing. The download flow is smoother now, preparation is faster, and the model is ready to use as soon as it finishes. No extra steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  No More Phantom Orange Dot
&lt;/h2&gt;

&lt;p&gt;Opening the Diction app used to activate the microphone, which lit up the iOS orange dot even though you were not dictating. Fixed. The mic only activates when you actually start a dictation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the Hood
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI Enhancement accuracy improved across apps&lt;/li&gt;
&lt;li&gt;UI polish across the keyboard, history, tones, and settings&lt;/li&gt;
&lt;li&gt;Stability improvements throughout. 4.0 is a significantly more stable release than 3.0.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Diction is a voice keyboard for iPhone. Tap the mic, speak, text appears wherever your cursor is. On-device, cloud, or self-hosted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;App Store&lt;/a&gt; / &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://diction.one" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>I Use NanoClaw in Telegram All Day. I Stopped Typing to It.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Sat, 28 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/i-run-an-ai-agent-in-telegram-all-day-i-stopped-typing-to-it-3g7o</link>
      <guid>https://dev.to/omachala/i-run-an-ai-agent-in-telegram-all-day-i-stopped-typing-to-it-3g7o</guid>
      <description>&lt;p&gt;I have NanoClaw connected to my Telegram. Throughout the day I send it things. Translate this. Summarise that article. What time is it in Tokyo. Draft a reply to this message. It responds in the same thread, without me leaving the app.&lt;/p&gt;

&lt;p&gt;It runs on my own machine, inside a container. The agent only has access to what you explicitly give it. Setup took about fifteen minutes: clone the repo, run Claude Code, type &lt;code&gt;/setup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I didn't anticipate: I was still typing everything. Long questions. Multi-sentence requests. Context I had to spell out carefully. The assistant was right there in Telegram, but slow input was still slowing me down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Diction comes in
&lt;/h2&gt;

&lt;p&gt;I built Diction to fix exactly this. It's an iOS keyboard extension. In Telegram, you switch to it, tap the mic, speak, and the text appears in the compose field. Send it like any message.&lt;/p&gt;

&lt;p&gt;I dictate to NanoClaw now. "Can you draft a short reply to this email, keep it friendly but firm." Things that would take a minute to type take ten seconds to say. NanoClaw gets the same message either way.&lt;/p&gt;

&lt;p&gt;Diction has an on-device mode that runs locally on your iPhone. Nothing leaves the device. For a setup where the whole point is keeping your data on your own hardware, that felt like the right match.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup, end to end
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NanoClaw:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone &lt;a href="https://github.com/qwibitai/nanoclaw" rel="noopener noreferrer"&gt;github.com/qwibitai/nanoclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run Claude Code in the repo directory&lt;/li&gt;
&lt;li&gt;Type &lt;code&gt;/setup&lt;/code&gt; — Claude Code handles everything&lt;/li&gt;
&lt;li&gt;Connect your Telegram bot token when prompted&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Diction:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install from the &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;App Store&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Settings → General → Keyboard → Add New Keyboard → Diction&lt;/li&gt;
&lt;li&gt;Switch to it in Telegram, tap the mic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole stack. A personal assistant on your own hardware, voice input on your own device.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/qwibitai/nanoclaw" rel="noopener noreferrer"&gt;NanoClaw on GitHub&lt;/a&gt; | &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Diction on the App Store&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>selfhosted</category>
      <category>ios</category>
      <category>ai</category>
    </item>
    <item>
      <title>Self-Host Speech-to-Text and Use It as Your iPhone Keyboard in 3 Commands</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 26 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6</link>
      <guid>https://dev.to/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6</guid>
      <description>&lt;p&gt;If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.&lt;/p&gt;

&lt;p&gt;Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.&lt;/p&gt;

&lt;p&gt;Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 3 Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/omachala/diction
&lt;span class="nb"&gt;cd &lt;/span&gt;diction
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the server running. Now install &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Diction&lt;/a&gt; on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Actually Running
&lt;/h2&gt;

&lt;p&gt;The Docker Compose setup spins up two services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;

  &lt;span class="na"&gt;whisper-small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-small&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;whisper-small&lt;/strong&gt;: the transcription engine — runs open-source Whisper via a REST API. CPU works fine for real-time dictation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;gateway&lt;/strong&gt;: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."&lt;/p&gt;

&lt;p&gt;The gateway exposes port 8080. That's the URL you put into the Diction app.&lt;/p&gt;




&lt;h2&gt;
  
  
  Making It Accessible From Your Phone
&lt;/h2&gt;

&lt;p&gt;Your phone needs to reach your server. A few options depending on your setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tailscale&lt;/strong&gt; (easiest): Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere. No port forwarding, no firewall rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://100.x.x.x:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reverse proxy&lt;/strong&gt; (for existing homelabbers): If you're already running Caddy, nginx, or Traefik, add a route to port 8080.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://diction.yourdomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Direct LAN&lt;/strong&gt; (simplest for home-only use): Just use your server's local IP. Works on home WiFi, not outside.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://192.168.1.100:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Choosing a Model
&lt;/h2&gt;

&lt;p&gt;Swap the transcription model by changing &lt;code&gt;WHISPER__MODEL&lt;/code&gt;. The model downloads automatically on first use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Speed (CPU)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tiny&lt;/td&gt;
&lt;td&gt;~350MB&lt;/td&gt;
&lt;td&gt;~1-2s&lt;/td&gt;
&lt;td&gt;Lower accuracy, great for low-power hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;~800MB&lt;/td&gt;
&lt;td&gt;~3-4s&lt;/td&gt;
&lt;td&gt;Good default for everyday dictation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;~1.8GB&lt;/td&gt;
&lt;td&gt;~8-12s&lt;/td&gt;
&lt;td&gt;Better with accents and background noise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;~3.5GB&lt;/td&gt;
&lt;td&gt;~20-30s&lt;/td&gt;
&lt;td&gt;Highest accuracy, benefits from GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most home servers, the small model hits the sweet spot — fast enough to feel real-time, accurate enough for messages and notes.&lt;/p&gt;

&lt;p&gt;Swap it in your compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;whisper-small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-medium&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Running on Lower-Power Hardware
&lt;/h2&gt;

&lt;p&gt;The small model runs well on modern CPUs. For a NAS or Raspberry Pi, try &lt;code&gt;tiny&lt;/code&gt; — less RAM (~350MB), faster responses, some accuracy trade-off. For real-time keyboard use, aim for sub-3 second round-trip. Small on a modern CPU or tiny on lower-power hardware gets you there.&lt;/p&gt;
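For reference, dropping to tiny is the same one-line change as the medium swap shown earlier. The `Systran/faster-whisper-tiny` model name follows the naming scheme of the other models in this post:

```yaml
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-tiny
      WHISPER__INFERENCE_DEVICE: cpu
```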




&lt;h2&gt;
  
  
  Already Running a Whisper Server?
&lt;/h2&gt;

&lt;p&gt;If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://your-server:8000&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-model-name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More details on connecting to existing servers in &lt;a href="https://dev.to/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh"&gt;this post&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The server and gateway are fully open source: &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>productivity</category>
    </item>
    <item>
      <title>You Already Have a Speech Server. Your iPhone Keyboard Should Use It.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Wed, 18 Mar 2026 09:36:35 +0000</pubDate>
      <link>https://dev.to/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh</link>
      <guid>https://dev.to/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh</guid>
      <description>&lt;p&gt;Someone posted on our GitHub Discussions this week. They'd been running a speech-to-text container on their homelab for months. Found Diction, an open-source iOS voice keyboard. Pointed the app at their server. Got a server error. The settings screen even said "endpoint reachable."&lt;/p&gt;

&lt;p&gt;Here's what was going wrong, and how two lines of config fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why direct connection fails
&lt;/h2&gt;

&lt;p&gt;Diction doesn't talk directly to speech servers. It connects through a lightweight gateway first.&lt;/p&gt;

&lt;p&gt;The reason is WebSockets. When you tap the mic, the app opens a WebSocket and streams raw PCM audio to the gateway in real time as you speak. When you're done, the gateway POSTs the full audio to your speech server, gets the transcript, and sends it back. The whole exchange happens in the time it takes to stop speaking.&lt;/p&gt;

&lt;p&gt;Without this, the alternative is: record the whole thing, send a file, wait. You'd feel every pause. The WebSocket is what makes it feel instant.&lt;/p&gt;

&lt;p&gt;The "endpoint reachable" check passes because the iOS app pings &lt;code&gt;/health&lt;/code&gt; or &lt;code&gt;/v1/models&lt;/code&gt;. Most speech servers expose these. But the actual transcription uses the WebSocket endpoint, which only the gateway handles. No gateway, no streaming.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;You don't need to run our speech containers. Just the gateway, pointed at yours.&lt;/p&gt;

&lt;p&gt;If your server is at &lt;code&gt;http://192.168.1.50:8000&lt;/code&gt;, this is your entire &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://192.168.1.50:8000&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-model-name-here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open Diction, go to &lt;strong&gt;Self-Hosted&lt;/strong&gt;, paste &lt;code&gt;http://192.168.1.50:8080&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;Your model stays where it is. The gateway handles the WebSocket layer, audio buffering, and forwarding. Audio still only goes to your server.&lt;/p&gt;

&lt;h2&gt;
  
  
  CUSTOM_BACKEND_MODEL
&lt;/h2&gt;

&lt;p&gt;One thing to get right: the model name.&lt;/p&gt;

&lt;p&gt;Most speech servers that follow the OpenAI-compatible API format expect a &lt;code&gt;model&lt;/code&gt; field in the transcription request to know which model to load. Without it, some return an error.&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;CUSTOM_BACKEND_MODEL&lt;/code&gt; to whatever name your server expects. Check your server's docs or the model you started it with. If your server only runs one model and ignores the field, you can omit it entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  WAV-only servers
&lt;/h2&gt;

&lt;p&gt;Some speech servers only accept WAV audio input. The gateway handles conversion automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://192.168.1.50:5092&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_NEEDS_WAV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this set, the gateway converts audio to 16kHz mono WAV via ffmpeg before forwarding. Your server gets the format it expects.&lt;/p&gt;

&lt;h2&gt;
  
  
  API key protection
&lt;/h2&gt;

&lt;p&gt;If your server is behind an API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://my-server:8000&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-model&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_AUTH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sk-your-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway injects the Authorization header on every request to your backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency on a local network
&lt;/h2&gt;

&lt;p&gt;I tested this end-to-end: generated a speech WAV, sent it through the gateway to a real speech container, got the transcript back correctly. On a local network with a CPU-only container, the round trip was under 5 seconds. With a dedicated GPU, it's near instant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the gateway actually does
&lt;/h2&gt;

&lt;p&gt;The gateway is open source at &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;. It's a small Go service that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accepts WebSocket connections from the iOS app&lt;/li&gt;
&lt;li&gt;Buffers incoming PCM audio frames&lt;/li&gt;
&lt;li&gt;Wraps them in a WAV header and POSTs to your speech backend&lt;/li&gt;
&lt;li&gt;Returns the transcript over the WebSocket&lt;/li&gt;
&lt;/ul&gt;
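The WAV-wrapping step in that list is just prepending a fixed 44-byte RIFF header to the buffered PCM. A standalone sketch in Go, assuming 16kHz mono 16-bit samples; the function name is mine, not the gateway's:

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// wavWrap prepends a canonical 44-byte RIFF/WAVE header to raw PCM audio.
// Illustrative sketch only; assumes mono 16-bit samples at sampleRate Hz.
func wavWrap(pcm []byte, sampleRate int) []byte {
	var buf bytes.Buffer
	dataLen := uint32(len(pcm))

	buf.WriteString("RIFF")
	binary.Write(&buf, binary.LittleEndian, 36+dataLen) // total size minus 8
	buf.WriteString("WAVE")

	buf.WriteString("fmt ")
	binary.Write(&buf, binary.LittleEndian, uint32(16))           // fmt chunk size
	binary.Write(&buf, binary.LittleEndian, uint16(1))            // audio format: PCM
	binary.Write(&buf, binary.LittleEndian, uint16(1))            // channels: mono
	binary.Write(&buf, binary.LittleEndian, uint32(sampleRate))   // sample rate
	binary.Write(&buf, binary.LittleEndian, uint32(sampleRate*2)) // byte rate: rate * 1 ch * 2 bytes
	binary.Write(&buf, binary.LittleEndian, uint16(2))            // block align
	binary.Write(&buf, binary.LittleEndian, uint16(16))           // bits per sample

	buf.WriteString("data")
	binary.Write(&buf, binary.LittleEndian, dataLen)
	buf.Write(pcm)
	return buf.Bytes()
}
```

That header is all a Whisper-style backend needs to treat the buffered frames as a complete audio file.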

&lt;p&gt;No cloud calls. No telemetry. The full source is in &lt;code&gt;/gateway/core/&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;If you're running a Whisper server and want to get started from scratch, the &lt;a href="https://dev.to/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6"&gt;3-command setup guide&lt;/a&gt; covers the full stack.&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>opensource</category>
    </item>
    <item>
      <title>You Wrote 14 Playwright Scripts Just to Screenshot Your Own App</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/you-wrote-14-playwright-scripts-just-to-screenshot-your-own-app-2ckf</link>
      <guid>https://dev.to/omachala/you-wrote-14-playwright-scripts-just-to-screenshot-your-own-app-2ckf</guid>
      <description>&lt;p&gt;It started simple. One Playwright script to capture the homepage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://myapp.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;homepage.png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the team needed the pricing page. So I added another script. Then the dashboard (which needs login first). Then the settings page (which needs a specific tab clicked). Then mobile versions.&lt;/p&gt;

&lt;p&gt;Two months later I had 14 Playwright scripts. Some shared a login helper. Some had hardcoded waits. One had a try-catch that silently swallowed errors because the cookie banner sometimes loaded and sometimes didn't.&lt;/p&gt;

&lt;p&gt;I was maintaining a bespoke test suite, except it wasn't testing anything. It was just taking pictures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Config, not code
&lt;/h2&gt;

&lt;p&gt;Here's what those 14 scripts look like as config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hiddenElements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".chat-widget"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screenshots"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"homepage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".hero"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pricing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/pricing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".pricing-grid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".dashboard"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".settings-panel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[data-tab='notifications']"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 14 screenshots. One command. No scripts to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cookie banner problem
&lt;/h2&gt;

&lt;p&gt;In my scripts, I had this pattern everywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.cookie-banner .dismiss&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* maybe it didn't show */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;a href="https://github.com/omachala/heroshot" rel="noopener noreferrer"&gt;heroshot&lt;/a&gt;, you define hidden elements once per domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hiddenElements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docs.myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".announcement-bar"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every screenshot on that domain hides those elements automatically. No try-catch. No timeouts. No "maybe it showed, maybe it didn't."&lt;/p&gt;

&lt;h2&gt;
  
  
  Actions for complex state
&lt;/h2&gt;

&lt;p&gt;The settings page needed a tab clicked first. In Playwright, that's 5 lines of setup. In config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[data-tab='notifications']"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Email preferences"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are 14 action types: click, type, hover, select_option, press_key, drag, wait, navigate, evaluate, fill_form, handle_dialog, file_upload, resize, and hide. That covers pretty much every pre-screenshot setup I've needed.&lt;/p&gt;
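
&lt;p&gt;To sketch how these chain together, here's a hypothetical entry that fills a login form before capturing. The URL and selectors are made up, and the &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;key&lt;/code&gt; field names on the &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;press_key&lt;/code&gt; actions are my guesses at the schema, not confirmed heroshot config, so check the docs before copying:&lt;/p&gt;

```json
{
  "name": "admin-dashboard",
  "url": "https://myapp.com/login",
  "selector": ".dashboard",
  "actions": [
    { "type": "type", "selector": "#email", "text": "demo@example.com" },
    { "type": "type", "selector": "#password", "text": "not-a-real-password" },
    { "type": "press_key", "key": "Enter" },
    { "type": "wait", "text": "Welcome back" }
  ]
}
```

&lt;p&gt;The pattern is the same as the settings-tab example: actions run in order, then the screenshot fires once the last one settles.&lt;/p&gt;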

&lt;h2&gt;
  
  
  When to use Playwright directly
&lt;/h2&gt;

&lt;p&gt;If you need complex conditional logic, dynamic data generation, or integration with a test framework, raw Playwright is the right tool.&lt;/p&gt;

&lt;p&gt;But if you're just taking pictures of known pages at known states, config is simpler, more readable, and doesn't break when someone renames a helper function.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
      <category>documentation</category>
    </item>
    <item>
      <title>I Stopped Paying $15/Month for Wispr Flow. Here's the Open-Source Replacement.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Mon, 16 Mar 2026 08:58:03 +0000</pubDate>
      <link>https://dev.to/omachala/i-stopped-paying-15month-for-wispr-flow-heres-the-open-source-replacement-313i</link>
      <guid>https://dev.to/omachala/i-stopped-paying-15month-for-wispr-flow-heres-the-open-source-replacement-313i</guid>
      <description>&lt;p&gt;I paid for Wispr Flow for five months.&lt;/p&gt;

&lt;p&gt;A monthly subscription. Every month. For voice-to-text on my iPhone.&lt;/p&gt;

&lt;p&gt;It's a good product. The AI editing layer is genuinely impressive — it strips filler words, fixes grammar, adapts to how you write. That part works. If you want the best cloud-based dictation and don't mind paying, Wispr delivers.&lt;/p&gt;

&lt;p&gt;But every time I used it, the same thought: &lt;em&gt;my voice is going to their cloud. Not my cloud. Theirs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I already run a home server. Docker Compose, Tailscale, the usual homelab stack. I had faster-whisper running for other things. The transcription engine was already there. I just didn't have a way to use it from my phone.&lt;/p&gt;

&lt;p&gt;So I built one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the switch actually looked like
&lt;/h2&gt;

&lt;p&gt;The server side was easy. I already had the transcription container. I wrote a small Go gateway to handle WebSocket streaming from the phone, and wrapped both in a compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;transcription&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;models:/root/.cache/huggingface&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;small&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;transcription&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docker compose up -d&lt;/code&gt; and it's running.&lt;/p&gt;
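
&lt;p&gt;Before pointing the phone at it, it's worth letting Compose verify the gateway actually came up. A sketch of a healthcheck stanza for the &lt;code&gt;gateway&lt;/code&gt; service, with the caveat that the &lt;code&gt;/health&lt;/code&gt; path and the presence of &lt;code&gt;wget&lt;/code&gt; in the image are my assumptions, so adjust to whatever the gateway really exposes:&lt;/p&gt;

```yaml
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small
    depends_on:
      - transcription
    # Hypothetical: poll an assumed /health endpoint so
    # `docker compose ps` shows healthy/unhealthy at a glance.
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```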

&lt;p&gt;The hard part was the iOS keyboard. Keyboard extensions on iOS run in a sandbox with a 48MB memory ceiling, no direct mic access without Full Access, and a text proxy that behaves differently in every app. That took months, not hours.&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://diction.one" rel="noopener noreferrer"&gt;Diction&lt;/a&gt; — a voice keyboard that connects to whatever transcription server you point it at.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's honestly worse
&lt;/h2&gt;

&lt;p&gt;Wispr's AI editing layer is better than raw transcription. It doesn't just transcribe — it rewrites. Filler words vanish, punctuation lands correctly, and it matches your tone. Diction transcribes what you say. It has optional AI cleanup now, but Wispr's has had years of refinement.&lt;/p&gt;

&lt;p&gt;Wispr also has a personal dictionary that learns your vocabulary over time. Diction has custom dictionaries too, but they're newer and simpler.&lt;/p&gt;

&lt;p&gt;If you don't want to think about infrastructure and just want the best cloud experience, Wispr is still a strong choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's better
&lt;/h2&gt;

&lt;p&gt;My audio stays on my network. I can verify that because the server code is open source — there's nothing to take on faith.&lt;/p&gt;

&lt;p&gt;No word limits. Wispr's free tier caps you at 1,000 words/week on iOS. Self-hosted Diction has no caps, no subscription, no catch.&lt;/p&gt;

&lt;p&gt;Latency on a local network is excellent. The small Whisper model on a modern CPU returns transcriptions in 2-4 seconds. With a GPU, it's near instant.&lt;/p&gt;

&lt;p&gt;And when my internet goes down, on-device mode keeps working. Wispr is cloud-only — no connection, no transcription.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest trade-off
&lt;/h2&gt;

&lt;p&gt;I traded polish for control. Wispr is more refined. Diction gives me ownership of the entire pipeline, from the mic to the model, and it's getting better with every release.&lt;/p&gt;

&lt;p&gt;If you're already running Docker at home and the idea of sending every word you speak to someone else's server bothers you, the self-hosted setup takes about 10 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Astro Docs Without a Single Manual Screenshot</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Sat, 14 Mar 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/omachala/astro-docs-without-a-single-manual-screenshot-3fi8</link>
      <guid>https://dev.to/omachala/astro-docs-without-a-single-manual-screenshot-3fi8</guid>
      <description>&lt;p&gt;I set up an Astro docs site last week. Content collections, MDX pages, Starlight theme. Beautiful.&lt;/p&gt;

&lt;p&gt;Then I needed screenshots. The dashboard page, the settings panel, the onboarding flow. Three screenshots, light and dark mode each, so six images total.&lt;/p&gt;

&lt;p&gt;I spent an hour taking them by hand. Opened the app, resized the browser, captured, cropped, saved, repeated. Six times.&lt;/p&gt;

&lt;p&gt;The next sprint, the onboarding flow changed. Screenshots were already stale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Config-driven screenshots
&lt;/h2&gt;

&lt;p&gt;Instead of capturing manually, define what you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/assets/screenshots"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screenshots"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".dashboard-grid"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".settings-panel"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"onboarding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/onboarding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".onboarding-wizard"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three config entries, six files. Light and dark variants are included automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using them in Astro
&lt;/h2&gt;

&lt;p&gt;In your MDX file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Image } from 'astro:assets';
import dashboard from '../../assets/screenshots/dashboard-light.png';

&amp;lt;Image src={dashboard} alt="Dashboard overview" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you want automatic dark mode switching, use a &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt;
    &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"/screenshots/dashboard-dark.png"&lt;/span&gt;
    &lt;span class="na"&gt;media=&lt;/span&gt;&lt;span class="s"&gt;"(prefers-color-scheme: dark)"&lt;/span&gt;
  &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"/screenshots/dashboard-light.png"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Dashboard"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heroshot has a shortcut for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot snippet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates the &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; markup for every screenshot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Astro + Starlight
&lt;/h2&gt;

&lt;p&gt;If you're using Starlight (Astro's docs theme), screenshots go in &lt;code&gt;src/assets/&lt;/code&gt; so Astro optimizes them at build time. Set the output directory accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/assets/screenshots"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starlight handles responsive images, lazy loading, and format conversion automatically. You just provide the source PNGs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping them fresh
&lt;/h2&gt;

&lt;p&gt;Add to your workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After every deploy&lt;/span&gt;
npx heroshot

&lt;span class="c"&gt;# Check if anything changed&lt;/span&gt;
git diff &lt;span class="nt"&gt;--name-only&lt;/span&gt; src/assets/screenshots/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If screenshots changed, commit them. If nothing changed, move on.&lt;/p&gt;
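
&lt;p&gt;If you'd rather not remember to run it, the same two commands drop into CI. A sketch of a GitHub Actions job, where the workflow name, trigger, and bot identity are placeholders and heroshot needs the deployed site reachable from the runner:&lt;/p&gt;

```yaml
# .github/workflows/screenshots.yml (hypothetical)
name: refresh-screenshots
on:
  workflow_dispatch:   # or chain it after your deploy workflow
jobs:
  shoot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx heroshot
      - name: Commit refreshed screenshots, if any changed
        run: |
          git config user.name "screenshot-bot"
          git config user.email "screenshot-bot@users.noreply.github.com"
          git add src/assets/screenshots/
          if ! git diff --cached --quiet; then
            git commit -m "chore: refresh screenshots"
            git push
          fi
```

&lt;p&gt;The guard on &lt;code&gt;git diff --cached --quiet&lt;/code&gt; keeps the job from pushing empty commits when nothing changed.&lt;/p&gt;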

&lt;p&gt;&lt;a href="https://github.com/omachala/heroshot" rel="noopener noreferrer"&gt;Heroshot&lt;/a&gt; is open source and works with any static site generator. Astro, Next.js, VitePress, whatever. The config is framework-agnostic.&lt;/p&gt;

</description>
      <category>astro</category>
      <category>documentation</category>
      <category>webdev</category>
      <category>automation</category>
    </item>
    <item>
      <title>Keep Your MkDocs Screenshots Up to Date (Material Theme)</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 29 Jan 2026 08:51:37 +0000</pubDate>
      <link>https://dev.to/omachala/keep-your-mkdocs-screenshots-up-to-date-material-theme-4h1i</link>
      <guid>https://dev.to/omachala/keep-your-mkdocs-screenshots-up-to-date-material-theme-4h1i</guid>
      <description>&lt;p&gt;You've got MkDocs with Material theme. Dark mode works. Code blocks adapt. But your screenshots? Still stuck in light mode.&lt;/p&gt;

&lt;p&gt;Here's how to fix that with &lt;a href="https://heroshot.sh" rel="noopener noreferrer"&gt;Heroshot&lt;/a&gt;, a CLI that captures screenshots and handles light/dark variants automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install the CLI
&lt;/h2&gt;

&lt;p&gt;Pick your preferred method:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;curl (standalone binary):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://heroshot.sh/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Homebrew:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;omachala/heroshot/heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;npm:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Docker:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull heroshot/heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Install the Python package
&lt;/h2&gt;

&lt;p&gt;The CLI captures screenshots. The Python package provides the MkDocs macro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Configure output directory
&lt;/h2&gt;

&lt;p&gt;MkDocs serves from &lt;code&gt;docs/&lt;/code&gt;. Create &lt;code&gt;.heroshot/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docs/heroshots"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-project/
├── docs/
│   ├── index.md
│   └── heroshots/    # screenshots go here
├── mkdocs.yml
└── .heroshot/
    └── config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Capture screenshots
&lt;/h2&gt;

&lt;p&gt;Start your MkDocs dev server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdocs serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal, run heroshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Or &lt;code&gt;npx heroshot&lt;/code&gt; / &lt;code&gt;docker run --rm -v $(pwd):/work heroshot/heroshot&lt;/code&gt; depending on install method)&lt;/p&gt;

&lt;p&gt;A browser opens with a visual picker. Navigate to &lt;code&gt;localhost:8000&lt;/code&gt;, click the elements you want to capture, and name them. Close the browser when you're done.&lt;/p&gt;

&lt;p&gt;You'll get two files per screenshot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dashboard-light.png&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dashboard-dark.png&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 5: Add the macro
&lt;/h2&gt;

&lt;p&gt;Update your &lt;code&gt;mkdocs.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;macros&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;heroshot&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it in your markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jinja"&gt;&lt;code&gt;&lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;heroshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard overview"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The macro expands to Material's &lt;code&gt;#only-light&lt;/code&gt; / &lt;code&gt;#only-dark&lt;/code&gt; syntax. When readers toggle the theme, screenshots swap automatically.&lt;/p&gt;
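
&lt;p&gt;For reference, Material's light/dark syntax is just a URL fragment on the image path, so the macro's output looks roughly like this (the exact paths depend on your &lt;code&gt;outputDirectory&lt;/code&gt;):&lt;/p&gt;

```markdown
![Dashboard overview](heroshots/dashboard-light.png#only-light)
![Dashboard overview](heroshots/dashboard-dark.png#only-dark)
```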

&lt;h2&gt;
  
  
  Step 6: Keep them fresh
&lt;/h2&gt;

&lt;p&gt;When your UI changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All configured screenshots regenerate. No manual cropping, no file hunting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Screenshots without theme variants
&lt;/h2&gt;

&lt;p&gt;For diagrams or architecture images that don't need dark mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jinja"&gt;&lt;code&gt;&lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;heroshot_single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"architecture"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"System architecture"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Renders a simple &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;




&lt;p&gt;Full docs: &lt;a href="https://heroshot.sh/docs/integrations/mkdocs" rel="noopener noreferrer"&gt;heroshot.sh/docs/integrations/mkdocs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example repo: &lt;a href="https://github.com/omachala/heroshot/tree/main/integrations/examples/mkdocs" rel="noopener noreferrer"&gt;github.com/omachala/heroshot/tree/main/integrations/examples/mkdocs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>documentation</category>
      <category>mkdocs</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
