I paid for Wispr Flow for five months.
A monthly subscription. Every month. For voice-to-text on my iPhone.
It's a good product. The AI editing layer is genuinely impressive — it strips filler words, fixes grammar, adapts to how you write. That part works. If you want the best cloud-based dictation and don't mind paying, Wispr delivers.
But every time I used it, the same thought: my voice is going to their cloud. Not my cloud. Theirs.
I already run a home server. Docker Compose, Tailscale, the usual homelab stack. I had faster-whisper running for other things. The transcription engine was already there. I just didn't have a way to use it from my phone.
So I built one.
What the switch actually looked like
The server side was easy. I already had the transcription container. I wrote a small Go gateway to handle WebSocket streaming from the phone, and wrapped both in a compose file:
services:
  transcription:
    image: fedirz/faster-whisper-server:latest-cpu
    volumes:
      - models:/root/.cache/huggingface
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small
    depends_on:
      - transcription
volumes:
  models:
Run docker compose up -d and it's running.
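To check the stack from a laptop before touching the phone, something like the sketch below could work. It is not the gateway's documented API: it assumes the gateway on port 8080 accepts an OpenAI-style multipart upload at /v1/audio/transcriptions and answers with JSON containing a "text" key. The route, the "model" form field, and the helper names are all assumptions made for illustration.

```python
import io
import json
import urllib.request
import uuid
import wave


def make_silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV of silence, usable as a smoke-test payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_transcription_request(base_url: str, wav_bytes: bytes, model: str = "small"):
    """Build (but do not send) a multipart POST carrying the audio clip.

    ASSUMPTION: the /v1/audio/transcriptions path and the "model"/"file"
    field names follow the OpenAI-style convention; adjust to your server.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n".encode(),
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="clip.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode(),
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, wav_bytes: bytes, model: str = "small", timeout: float = 30.0) -> str:
    """Send the request and return the transcript (assumes a JSON body with "text")."""
    req = build_transcription_request(base_url, wav_bytes, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["text"]
```

With the containers up, transcribe("http://localhost:8080", make_silent_wav()) should come back with an empty or near-empty string if those assumptions hold. Swap in a real recording to see actual output.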
The hard part was the iOS keyboard. Keyboard extensions on iOS run in a sandbox with a 48MB memory ceiling, no direct mic access without Full Access, and a text proxy that behaves differently in every app. That took months, not hours.
The result is Diction — a voice keyboard that connects to whatever transcription server you point it at.
What's honestly worse
Wispr's AI editing layer is better than raw transcription. It doesn't just transcribe — it rewrites. Filler words vanish, punctuation lands correctly, and it matches your tone. Diction transcribes what you say. It has optional AI cleanup now, but Wispr's has had years of refinement.
Wispr also has a personal dictionary that learns your vocabulary over time. Diction has custom dictionaries too, but they're newer and simpler.
If you don't want to think about infrastructure and just want the best cloud experience, Wispr is still a strong choice.
What's better
My audio stays on my network. I can verify that because the server code is open source — there's nothing to trust on faith.
No word limits. Wispr's free tier caps you at 1,000 words/week on iOS. Self-hosted Diction has no caps, no subscription, no catch.
Latency on a local network is excellent. The small Whisper model on a modern CPU returns transcriptions in 2-4 seconds. With a GPU, it's near instant.
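Those numbers depend on your hardware and clip length, so it's worth measuring your own. A minimal timing sketch follows. The measure_latency helper and the stand-in fake_transcribe are hypothetical names, and you would replace the stand-in with whatever call actually hits your server.

```python
import statistics
import time


def measure_latency(fn, runs: int = 5):
    """Call fn() `runs` times; return (median, worst) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)


def fake_transcribe():
    """Placeholder for a real transcription call; just sleeps briefly."""
    time.sleep(0.01)


if __name__ == "__main__":
    median, worst = measure_latency(fake_transcribe)
    print(f"median {median:.3f}s, worst {worst:.3f}s")
```

The median is reported alongside the worst case because the first request often includes model load time, which would skew a plain average.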
And when my internet goes down, on-device mode keeps working. Wispr is cloud-only — no connection, no transcription.
The honest trade-off
I traded polish for control. Wispr is more refined. Diction gives me ownership of the entire pipeline, from the mic to the model, and it's getting better with every release.
If you're already running Docker at home and the idea of sending every word you speak to someone else's server bothers you, the self-hosted setup takes about 10 minutes.