If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.
Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.
Here's something you might not have considered: that same transcription server can power a voice keyboard on your iPhone.
The 3 Commands
git clone https://github.com/omachala/diction
cd diction
docker compose up -d
That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.
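Before moving to the phone, you can confirm that something is accepting connections on the gateway's port. The sketch below is a plain TCP check built only on Python's standard library. It does not prove that transcription works, only that the port is open. The function name port_is_open and the example address are illustrative assumptions, not part of Diction.

```python
import socket


def port_is_open(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False


if __name__ == "__main__":
    # Hypothetical address: run on the server itself, or use its LAN or Tailscale IP
    print(port_is_open("127.0.0.1", 8080))
```

Run it from another machine on the same network, with the server's address substituted, to also rule out firewall problems.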
What's Actually Running
The Docker Compose setup spins up two services:
services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu
whisper-small: the transcription engine. It runs faster-whisper, an open-source implementation of OpenAI's Whisper model, and serves it over an OpenAI-compatible REST API. A CPU is enough for real-time dictation.
gateway: Diction's open-source Go gateway. It handles communication between the iOS app and the transcription backend: routing, streaming, and audio format conversion.
The gateway is the only service that publishes a port (8080), and its address is the URL you put into the Diction app.
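The app always talks to the gateway, but it can be useful to test the transcription engine on its own. faster-whisper-server follows the OpenAI transcription API, which takes a POST to /v1/audio/transcriptions with a multipart form containing a file field and a model field. The sketch below builds that request with the standard library.

It rests on two assumptions. First, the engine listens on its default port 8000. Second, you have published that port yourself, because the compose file above only publishes the gateway's 8080. The helper name build_transcription_request is hypothetical.

```python
import urllib.request
import uuid


def build_transcription_request(
    base_url: str, audio: bytes, filename: str, model: str
) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-style transcription endpoint."""
    # Random boundary; very unlikely to occur inside the audio bytes
    boundary = uuid.uuid4().hex
    # Text field telling the server which model to use
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field header; the raw audio bytes follow it directly
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send a request, pass it to urllib.request.urlopen, for example with a short WAV recording and the model name Systran/faster-whisper-small. With the default JSON response format, the reply is a JSON object whose text field holds the transcript.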
Making It Accessible From Your Phone
Your phone needs to reach your server. A few options:
Tailscale (easiest): Install Tailscale on both your server and your iPhone. Each device gets a private 100.x.x.x address that your other Tailscale devices can reach from anywhere, with no port forwarding or firewall rules.
http://100.x.x.x:8080
Reverse proxy (for existing homelabbers): If you're already running Caddy, nginx, or Traefik, add a route that forwards a subdomain to port 8080 and let the proxy handle TLS.
https://diction.yourdomain.com
Direct LAN (simplest for home-only use): Use your server's local IP. This works while your phone is on your home Wi-Fi, but not on mobile data or other networks.
http://192.168.1.100:8080
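The address itself tells you which of the three options a URL uses:

- Tailscale assigns addresses from the 100.64.0.0/10 range.
- Home LANs use the RFC 1918 private ranges.
- A reverse-proxy setup usually shows up as a hostname served over https.

The hypothetical helper below (not part of Diction) classifies a URL this way using Python's ipaddress and urllib.parse modules. One caveat: a Tailscale MagicDNS name will be reported as a hostname, since only literal IPs are checked.

```python
import ipaddress
from urllib.parse import urlsplit

# Tailscale hands out addresses from the 100.64.0.0/10 shared (CGNAT) range
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")
# RFC 1918 private ranges commonly used on home LANs
LAN_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def describe_server_url(url: str) -> tuple[str, bool]:
    """Return (kind, uses_tls) where kind is tailscale, lan, hostname, or other-ip."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    uses_tls = parts.scheme == "https"
    try:
        addr = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Not a literal IP, e.g. a subdomain served by a reverse proxy
        return "hostname", uses_tls
    if addr in TAILSCALE_NET:
        return "tailscale", uses_tls
    if any(addr in net for net in LAN_NETS):
        return "lan", uses_tls
    return "other-ip", uses_tls
```

A URL typed without a scheme, such as 192.168.1.100:8080, raises ValueError here. That is a reminder to include http:// or https://.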
Configuring Diction
Open the Diction app → Settings → Backend → Self-Hosted.
Enter your server URL. The app tests the connection and shows a green indicator when it's reachable.
That's it. The keyboard is now connected to your server.
Choosing a Model
You can swap the transcription model by changing the WHISPER__MODEL environment variable on the speech-to-text container. The model downloads automatically on first use, so expect the first request after a change to be slower.
| Model | RAM (approx.) | Latency (approx.) | Notes |
|---|---|---|---|
| Small | ~800MB | ~3-4s | Good default for everyday dictation |
| Medium | ~1.8GB | ~8-12s | Better with accents and background noise |
| Large | ~3.5GB | ~20-30s | Highest accuracy, needs a beefy CPU or GPU |
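For a home server or NAS, the small model is a solid starting point. If you have a GPU, the large model gives you transcription quality that rivals most paid services. Running on a GPU means switching to the image's CUDA variant (the latest-cuda tag), setting WHISPER__INFERENCE_DEVICE to cuda, and giving the container access to the GPU.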
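If you're unsure which row of the table fits your machine, a simple rule is to pick the largest model whose approximate RAM figure fits in free memory, leaving some headroom. The sketch below encodes the table's RAM numbers that way. It makes three assumptions:

- The 512 MB headroom default is made up.
- The function name pick_model is hypothetical.
- The large model is mapped to the Systran/faster-whisper-large-v3 repo.

```python
# Approximate RAM figures from the table, in MB, ordered smallest to largest
MODEL_RAM_MB = {
    "small": 800,
    "medium": 1800,
    "large": 3500,
}

# WHISPER__MODEL values; the large-v3 repo name is an assumption
MODEL_IDS = {
    "small": "Systran/faster-whisper-small",
    "medium": "Systran/faster-whisper-medium",
    "large": "Systran/faster-whisper-large-v3",
}


def pick_model(free_ram_mb: int, headroom_mb: int = 512) -> str:
    """Return the WHISPER__MODEL value for the largest model that fits in free RAM.

    headroom_mb (hypothetical default) leaves room for the gateway and the OS.
    """
    budget = free_ram_mb - headroom_mb
    fitting = [size for size, ram in MODEL_RAM_MB.items() if ram <= budget]
    if not fitting:
        raise ValueError(f"{free_ram_mb} MB free is not enough for any listed model")
    # Dicts preserve insertion order, so the last fitting entry is the largest model
    return MODEL_IDS[fitting[-1]]
```

This only considers memory. On a CPU-only box, the large model may fit but still be too slow for keyboard use, so check latency as well.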
Swap the model in your compose file:
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-medium
      WHISPER__INFERENCE_DEVICE: cpu
Or add multiple model containers and set DEFAULT_MODEL on the gateway to pick which one handles requests.
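To make the multi-container idea concrete, here is a sketch of the routing concept: use a requested model if a backend exists for it, otherwise fall back to whatever DEFAULT_MODEL names. This is not the Diction gateway's actual code. The backend URLs, the whisper-medium service name, and the values DEFAULT_MODEL accepts are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical model-name -> backend URL mapping on the compose network
BACKENDS = {
    "small": "http://whisper-small:8000",
    "medium": "http://whisper-medium:8000",
}


def backend_for(requested: Optional[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a backend URL: the requested model if known, else DEFAULT_MODEL."""
    env = os.environ if env is None else env
    if requested is not None and requested in BACKENDS:
        return BACKENDS[requested]
    # Fall back to the configured default (hypothetical "small" if unset)
    default = env.get("DEFAULT_MODEL", "small")
    if default not in BACKENDS:
        raise KeyError(f"DEFAULT_MODEL={default!r} has no matching backend")
    return BACKENDS[default]
```

Failing loudly on an unknown default, rather than silently picking some backend, makes a typo in the compose file easy to spot.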
Running on Lower-Power Hardware
The small model runs well on modern CPUs. For lower-power devices like a NAS or a Raspberry Pi, try the tiny model (Systran/faster-whisper-tiny). It uses less RAM (~350MB) and responds faster, at the cost of some accuracy.
For real-time keyboard use, aim for roughly 3 seconds or less per transcription. The small model on a fast modern CPU comes close to that, and the tiny model brings lower-power hardware into range.
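To see whether your hardware meets the 3 second target, time a few transcriptions of a representative clip and look at the slowest one, since that is the delay you'll notice while typing. In the sketch below, transcribe is a placeholder for whatever call you use, for instance a function that posts audio to your server and returns the text. The name within_budget is hypothetical.

```python
import time
from typing import Callable, Tuple


def within_budget(
    transcribe: Callable[[bytes], str],
    audio: bytes,
    budget_s: float = 3.0,
    runs: int = 3,
) -> Tuple[bool, float]:
    """Time transcribe(audio) `runs` times; return (worst <= budget_s, worst seconds)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        # Track the slowest run rather than the average
        worst = max(worst, time.perf_counter() - start)
    return worst <= budget_s, worst
```

Send one warm-up request first. After a model change, the first call can include the download and model load time, which doesn't reflect steady-state latency.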
The App
Diction is an iOS voice keyboard. Download it, enable it in Settings → General → Keyboard → Keyboards, turn on Full Access (iOS keyboard extensions need it to make network requests), and connect to your server.
Self-hosted mode is free with no word limits or restrictions. There's also a cloud option (Diction One) for users who don't want to run their own server.
Download on the App Store · diction.one · github.com/omachala/diction