If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.
Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.
Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.
## The 3 Commands
```shell
git clone https://github.com/omachala/diction
cd diction
docker compose --profile small up -d
```
That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.
The `--profile small` flag picks which model to run. The repo also ships `medium`, `large`, and `parakeet` profiles — more on those below.
## What's Actually Running
The compose file spins up two services:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    profiles: ["small"]
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu
```
- **whisper-small**: the transcription engine — runs open-source Whisper via a REST API. CPU works fine for real-time dictation.
- **gateway**: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."
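The buffering step is the interesting part of that pipeline. Here's a minimal sketch of the pattern in Python — purely illustrative, since the real gateway is written in Go and its frame size and flush threshold are internal details I'm assuming, not quoting:

```python
class FrameBuffer:
    """Accumulate PCM audio frames and release them in batches.

    Illustrative sketch of the buffer-and-forward pattern, not the
    gateway's actual code; sample rate and flush threshold are assumptions.
    """

    def __init__(self, sample_rate=16000, bytes_per_sample=2, flush_seconds=1.0):
        # Number of buffered bytes that corresponds to `flush_seconds` of audio.
        self._threshold = int(sample_rate * bytes_per_sample * flush_seconds)
        self._frames = []
        self._buffered = 0

    def add(self, frame: bytes):
        """Buffer one frame; return a joined chunk once enough audio is queued."""
        self._frames.append(frame)
        self._buffered += len(frame)
        if self._buffered >= self._threshold:
            chunk = b"".join(self._frames)
            self._frames, self._buffered = [], 0
            return chunk  # ready to forward to the transcription backend
        return None


buf = FrameBuffer(flush_seconds=0.5)  # flush every 0.5 s of 16 kHz, 16-bit mono
# 0.5 s at 16 kHz mono = 16000 bytes; feed 20 ms frames (640 bytes each)
chunks = [c for c in (buf.add(b"\x00" * 640) for _ in range(25)) if c is not None]
```

Batching like this is what keeps the round-trip short: the backend sees a steady stream of sub-second chunks rather than one big upload at the end.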
The gateway exposes port 8080. That's the URL you put into the Diction app.
## Making It Accessible From Your Phone
Your phone needs to reach your server. A few options depending on your setup:
- **Tailscale (easiest):** Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere. No port forwarding, no firewall rules.

  `http://100.x.x.x:8080`

- **Reverse proxy (for existing homelabbers):** If you're already running Caddy, nginx, or Traefik, add a route to port 8080.

  `https://diction.yourdomain.com`

- **Direct LAN (simplest for home-only use):** Just use your server's local IP. Works on home WiFi, not outside.

  `http://192.168.1.100:8080`
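Whichever route you pick, the app only needs a single base URL. A small sketch of the kind of normalization you might apply to an address before typing it in — this is a hypothetical helper, not Diction's actual code, and it assumes 8080 is the gateway's published port per the compose file:

```python
from urllib.parse import urlparse


def normalize_server_url(address: str, default_port: int = 8080) -> str:
    """Turn a bare host/IP into a full gateway URL; leave complete URLs alone."""
    if "://" not in address:
        address = "http://" + address
    parts = urlparse(address)
    # Append the gateway's default port only when none was given and the
    # scheme is plain http (a reverse-proxied https URL already routes itself).
    if parts.port is None and parts.scheme == "http":
        return f"http://{parts.hostname}:{default_port}"
    return address


print(normalize_server_url("192.168.1.100"))                   # adds :8080
print(normalize_server_url("https://diction.yourdomain.com"))  # left unchanged
```

The rule of thumb it encodes: LAN IPs and Tailscale IPs need the explicit port, while a reverse-proxied HTTPS hostname usually doesn't.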
## Choosing a Model
The repo ships four profiles. Pick one with `docker compose --profile <name> up -d`:
| Profile | Model | RAM | Speed (CPU) | Notes |
|---|---|---|---|---|
| `small` | Whisper small | ~850 MB | ~3-4 s | Good default for everyday dictation |
| `medium` | Whisper medium | ~2.1 GB | ~8-12 s | Better with accents and background noise |
| `large` | Whisper large-v3-turbo | ~2.3 GB | <2 s on GPU | Highest accuracy, benefits from GPU |
| `parakeet` | NVIDIA Parakeet TDT v3 | ~2 GB | ~10x faster than Whisper | 25 European languages, more accurate than Whisper for English |
For most home servers handling English or mixed multilingual dictation, `small` hits the sweet spot. If you dictate mostly in a European language (German, French, Spanish, Italian, Polish, Czech, …), Parakeet is the better engine.
Switching profiles: tear down and bring up with the new profile. The `DEFAULT_MODEL` on the gateway is already wired in the compose file, so no extra config is needed.

```shell
docker compose down
docker compose --profile parakeet up -d
```
Models download automatically on first start and cache in a shared Docker volume.
## Already Running a Whisper Server?
If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:
```yaml
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      CUSTOM_BACKEND_URL: http://your-server:8000
      CUSTOM_BACKEND_MODEL: your-model-name
```
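Before wiring the gateway in, it's worth confirming your existing server actually answers. faster-whisper-server speaks the OpenAI-compatible audio API, so querying its model listing is a reasonable smoke test — note that the `/v1/models` path and the `your-server:8000` host here are assumptions based on that compatibility, not something this repo documents:

```python
import json
import urllib.request

BACKEND = "http://your-server:8000"  # placeholder: point at your existing STT server


def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-style /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str) -> list:
    """Fetch and parse the backend's model listing."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))

# Usage (requires the backend to be reachable):
#   list_models(BACKEND)  # your model, e.g. Systran/faster-whisper-small, should be listed
```

If the model you set in `CUSTOM_BACKEND_MODEL` doesn't show up in that list, the gateway won't be able to transcribe against it either.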
More details on connecting to existing servers in this post.
The server and gateway are fully open source: github.com/omachala/diction