I paid for Wispr Flow for five months.
A monthly subscription. Every month. For voice-to-text on my iPhone.
It's a good product. But every time I used it, the same thought: my voice is going to their cloud. Not my cloud. Theirs.
That was enough.
I built Diction instead.
It's an iOS keyboard powered by open-source speech recognition running on your own server. Your audio goes from your phone to your machine and back. Nothing else touches it.
docker compose up -d
That's the server. Then install the keyboard, point it at your URL, and start talking.
Why Voice Keyboards Charge a Subscription
Wispr Flow runs speech-to-text on their infrastructure. Every transcription goes to their cloud. They have servers to pay for, so they charge you.
That's not a criticism — it's just the model. Cloud infrastructure costs money.
But if you already have a home server, a NAS, or a cheap VPS, you can run open-source transcription models yourself. For free. Forever.
Diction's server setup is a single docker-compose.yml:
services:
  transcription:
    image: fedirz/faster-whisper-server:latest-cpu
    volumes:
      - models:/root/.cache/huggingface
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small
    depends_on:
      - transcription
volumes:
  models:
Run docker compose up -d and your transcription server is running. The gateway, published on port 8080, handles communication between the iOS keyboard and the transcription service.
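If you want to smoke-test the server before installing the keyboard, a short script along these lines could do it. This is only a sketch under one big assumption: that the gateway on port 8080 accepts the same OpenAI-style POST /v1/audio/transcriptions multipart upload that faster-whisper-server exposes. This post doesn't document the gateway's API, so check the repo before relying on it. The helper names (make_silent_wav, build_transcription_request, send) are made up for illustration.

```python
import io
import urllib.request
import uuid
import wave


def make_silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV of silence to use as a test payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_transcription_request(
    base_url: str, wav: bytes, model: str = "small"
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio and model name.

    ASSUMPTION: the route and the field names ("file", "model") follow the
    OpenAI-compatible API; the gateway may differ.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="test.wav"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode(),
        wav,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",  # assumed route
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()
```

With the containers up, calling send(build_transcription_request("http://localhost:8080", make_silent_wav())) should return a response if the assumed route exists. Swap in a real recording to check the transcription itself. The first request may be slow while the model downloads into the models volume.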
The Keyboard
Diction is a keyboard extension — it replaces your standard keyboard entirely. One button. Tap it, speak, text is inserted into whatever app you're in.
It doesn't have a QWERTY layout. It doesn't have autocorrect. It does one thing: transcribe what you say, accurately.
Three modes:
- Self-Hosted — connects to your own transcription server over the internet
- On-Device — runs speech recognition locally on your iPhone, no server needed
- Diction One — our hosted cloud option if you'd rather not run Docker
For anyone already running a homelab, the self-hosted setup takes about 10 minutes.
The Quality Comparison
Wispr Flow uses cloud-based speech recognition + their own AI editing layer. The editing layer is genuinely good — it removes filler words, fixes grammar, adapts to your writing style.
Diction is pure speech-to-text transcription by default. What you say is what you get, minus obvious transcription errors.
If you want AI cleanup on your transcriptions, Diction supports that too — but it's optional, not forced.
For most use cases — emails, messages, Slack, notes — raw transcription accuracy is excellent. The AI editing layer is a nice-to-have, not a must-have.
Open Source
The server and gateway are fully open source: github.com/omachala/diction
The iOS keyboard extension has zero third-party dependencies — pure Swift and native frameworks only. The server is Go + Docker. Everything that touches your audio is auditable.
If you're paying a monthly subscription for voice-to-text, it's worth spending 10 minutes trying the self-hosted version.
Download on the App Store · diction.one · github.com/omachala/diction