If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.
Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.
Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.
## The 3 Commands
```shell
git clone https://github.com/omachala/diction
cd diction
docker compose --profile small up -d
```
That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.
The `--profile small` flag picks which model to run. The repo also ships `medium`, `large`, and `parakeet` profiles — more on those below.
## What's Actually Running
The compose file spins up two services:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    profiles: ["small"]
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu
```
- **whisper-small**: the transcription engine — runs open-source Whisper via a REST API. CPU works fine for real-time dictation.
- **gateway**: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."
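The buffering step is the interesting part of that pipeline. Here's a minimal sketch of the pattern in Python — purely illustrative, since the real gateway is written in Go and its frame size and flush threshold are internal details I'm assuming, not quoting:

```python
class FrameBuffer:
    """Accumulate PCM audio frames and release them in batches.

    Illustrative sketch of the buffer-and-forward pattern, not the
    gateway's actual code; sample rate and flush threshold are assumptions.
    """

    def __init__(self, sample_rate=16000, bytes_per_sample=2, flush_seconds=1.0):
        # Number of buffered bytes that corresponds to `flush_seconds` of audio.
        self._threshold = int(sample_rate * bytes_per_sample * flush_seconds)
        self._frames = []
        self._buffered = 0

    def add(self, frame: bytes):
        """Buffer one frame; return a joined chunk once enough audio is queued."""
        self._frames.append(frame)
        self._buffered += len(frame)
        if self._buffered >= self._threshold:
            chunk = b"".join(self._frames)
            self._frames, self._buffered = [], 0
            return chunk  # ready to forward to the transcription backend
        return None


buf = FrameBuffer(flush_seconds=0.5)  # flush every 0.5 s of 16 kHz, 16-bit mono
# 0.5 s at 16 kHz mono = 16000 bytes; feed 20 ms frames (640 bytes each)
chunks = [c for c in (buf.add(b"\x00" * 640) for _ in range(25)) if c is not None]
```

Batching like this is what keeps the round-trip short: the backend sees a steady stream of sub-second chunks rather than one big upload at the end.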
The gateway exposes port 8080. That's the URL you put into the Diction app.
## Making It Accessible From Your Phone
Your phone needs to reach your server. A few options depending on your setup:
- **Tailscale (easiest):** Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere. No port forwarding, no firewall rules.

  `http://100.x.x.x:8080`

- **Reverse proxy (for existing homelabbers):** If you're already running Caddy, nginx, or Traefik, add a route to port 8080.

  `https://diction.yourdomain.com`

- **Direct LAN (simplest for home-only use):** Just use your server's local IP. Works on home WiFi, not outside.

  `http://192.168.1.100:8080`
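Whichever route you pick, the app only needs a single base URL. A small sketch of the kind of normalization you might apply to an address before typing it in — this is a hypothetical helper, not Diction's actual code, and it assumes 8080 is the gateway's published port per the compose file:

```python
from urllib.parse import urlparse


def normalize_server_url(address: str, default_port: int = 8080) -> str:
    """Turn a bare host/IP into a full gateway URL; leave complete URLs alone."""
    if "://" not in address:
        address = "http://" + address
    parts = urlparse(address)
    # Append the gateway's default port only when none was given and the
    # scheme is plain http (a reverse-proxied https URL already routes itself).
    if parts.port is None and parts.scheme == "http":
        return f"http://{parts.hostname}:{default_port}"
    return address


print(normalize_server_url("192.168.1.100"))                   # adds :8080
print(normalize_server_url("https://diction.yourdomain.com"))  # left unchanged
```

The rule of thumb it encodes: LAN IPs and Tailscale IPs need the explicit port, while a reverse-proxied HTTPS hostname usually doesn't.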
## Choosing a Model
The repo ships four profiles. Pick one with `docker compose --profile <name> up -d`:
| Profile | Model | RAM | Speed (CPU) | Notes |
|---|---|---|---|---|
| `small` | Whisper small | ~850 MB | ~3-4 s | Good default for everyday dictation |
| `medium` | Whisper medium | ~2.1 GB | ~8-12 s | Better with accents and background noise |
| `large` | Whisper large-v3-turbo | ~2.3 GB | <2 s on GPU | Highest accuracy, benefits from GPU |
| `parakeet` | NVIDIA Parakeet TDT v3 | ~2 GB | ~10x faster than Whisper | 25 European languages, more accurate than Whisper for English |
For most home servers handling English or mixed multilingual dictation, `small` hits the sweet spot. If you dictate mostly in a European language (German, French, Spanish, Italian, Polish, Czech, …), Parakeet is the better engine.
Switching profiles: tear down and bring up with the new profile. The `DEFAULT_MODEL` on the gateway is already wired in the compose file, so no extra config is needed.

```shell
docker compose down
docker compose --profile parakeet up -d
```
Models download automatically on first start and cache in a shared Docker volume.
## Already Running a Whisper Server?
If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      CUSTOM_BACKEND_URL: http://your-server:8000
      CUSTOM_BACKEND_MODEL: your-model-name
```
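Before wiring the gateway in, it's worth confirming your existing server actually answers. faster-whisper-server speaks the OpenAI-compatible audio API, so querying its model listing is a reasonable smoke test — note that the `/v1/models` path and the `your-server:8000` host here are assumptions based on that compatibility, not something this repo documents:

```python
import json
import urllib.request

BACKEND = "http://your-server:8000"  # placeholder: point at your existing STT server


def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-style /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str) -> list:
    """Fetch and parse the backend's model listing."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))

# Usage (requires the backend to be reachable):
#   list_models(BACKEND)  # your model, e.g. Systran/faster-whisper-small, should be listed
```

If the model you set in `CUSTOM_BACKEND_MODEL` doesn't show up in that list, the gateway won't be able to transcribe against it either.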
More details on connecting to existing servers in this post.
The server and gateway are fully open source: github.com/omachala/diction