If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.
Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.
Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.
## The 3 Commands
```shell
git clone https://github.com/omachala/diction
cd diction
docker compose up -d
```
That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.
## What's Actually Running
The Docker Compose setup spins up two services:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu
```
- **whisper-small**: the transcription engine. Runs open-source Whisper behind a REST API; CPU works fine for real-time dictation.
- **gateway**: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."
The gateway listens on port `8080`. That address, on your server, is the URL you put into the Diction app.
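Before typing the URL into the app, it's worth confirming the port actually answers. A quick sketch in plain Python, nothing Diction-specific; the `192.168.1.100` address is a placeholder for your server:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder address -- substitute your server's LAN or Tailscale IP.
if port_open("192.168.1.100", 8080):
    print("gateway port is reachable")
else:
    print("gateway port is not reachable -- check Docker and your firewall")
```

If the port is closed, check `docker compose ps` on the server before touching app settings.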
## Making It Accessible From Your Phone
Your phone needs to reach your server. A few options depending on your setup:
- **Tailscale (easiest):** Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere, with no port forwarding and no firewall rules: `http://100.x.x.x:8080`
- **Reverse proxy (for existing homelabbers):** If you're already running Caddy, nginx, or Traefik, add a route to port 8080: `https://diction.yourdomain.com`
- **Direct LAN (simplest for home-only use):** Use your server's local IP. Works on home Wi-Fi, not outside it: `http://192.168.1.100:8080`
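For the reverse-proxy route, the Caddy version is about as short as it gets. A sketch, assuming Caddy runs on the same host as the gateway and `diction.yourdomain.com` is a placeholder domain pointed at it:

```caddyfile
diction.yourdomain.com {
    reverse_proxy localhost:8080
}
```

Caddy provisions HTTPS automatically and its `reverse_proxy` passes WebSocket upgrades through by default, which matters here since the app talks to the gateway over WebSockets. With nginx you'd need to forward the `Upgrade` and `Connection` headers yourself.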
## Choosing a Model
Swap the transcription model by changing `WHISPER__MODEL`. The model downloads automatically on first use.
| Size | RAM | Speed (CPU) | Notes |
|---|---|---|---|
| Tiny | ~350MB | ~1-2s | Lower accuracy, great for low-power hardware |
| Small | ~800MB | ~3-4s | Good default for everyday dictation |
| Medium | ~1.8GB | ~8-12s | Better with accents and background noise |
| Large | ~3.5GB | ~20-30s | Highest accuracy, benefits from GPU |
For most home servers, the small model hits the sweet spot — fast enough to feel real-time, accurate enough for messages and notes.
Swap it in your compose file:
```yaml
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-medium
      WHISPER__INFERENCE_DEVICE: cpu
```
## Running on Lower-Power Hardware
The small model runs well on modern CPUs. For a NAS or Raspberry Pi, try tiny — less RAM (~350MB), faster responses, some accuracy trade-off. For real-time keyboard use, aim for sub-3 second round-trip. Small on a modern CPU or tiny on lower-power hardware gets you there.
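The table above boils down to a RAM-for-accuracy trade. As a sketch of that reasoning (the footprints mirror the table; `pick_model` is a hypothetical helper, not part of Diction, and the Hugging Face model names are my assumption of what you'd put in `WHISPER__MODEL`):

```python
# Approximate per-model RAM footprints from the table above, in MB.
MODEL_RAM_MB = {
    "Systran/faster-whisper-tiny": 350,
    "Systran/faster-whisper-small": 800,
    "Systran/faster-whisper-medium": 1800,
    "Systran/faster-whisper-large-v3": 3500,
}


def pick_model(free_ram_mb: int) -> str:
    """Pick the largest (most accurate) model that fits the RAM budget.

    Falls back to tiny if nothing fits -- on hardware that constrained,
    expect noticeable accuracy trade-offs.
    """
    fitting = [m for m, ram in MODEL_RAM_MB.items() if ram <= free_ram_mb]
    if not fitting:
        return "Systran/faster-whisper-tiny"
    return max(fitting, key=MODEL_RAM_MB.get)


print(pick_model(1000))  # Raspberry Pi-class budget -> small
print(pick_model(8000))  # typical home server budget -> large
```

The same logic applies in reverse: if transcription feels slow, step down a size rather than up.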
## Already Running a Whisper Server?
If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      CUSTOM_BACKEND_URL: http://your-server:8000
      CUSTOM_BACKEND_MODEL: your-model-name
```
More details on connecting to existing servers in this post.
The server and gateway are fully open source: github.com/omachala/diction