DEV Community

Ondrej Machala

I Stopped Paying $15/Month for Wispr Flow. Here's the Open-Source Replacement.

I paid for Wispr Flow for five months.

A monthly subscription. Every month. For voice-to-text on my iPhone.

It's a good product. But every time I used it, the same thought: my voice is going to their cloud. Not my cloud. Theirs.

That was enough.


I built Diction instead.

It's an iOS keyboard powered by open-source speech recognition running on your own server. Your audio goes from your phone to your machine, and the transcribed text comes back. Nothing else touches it.

docker compose up -d

That's the server. Then install the keyboard, point it at your URL, and start talking.


Why Voice Keyboards Charge a Subscription

Wispr Flow runs speech-to-text on their infrastructure. Every transcription goes to their cloud. They have servers to pay for, so they charge you.

That's not a criticism — it's just the model. Cloud infrastructure costs money.

But if you already have a home server, a NAS, or a cheap VPS, you can run open-source transcription models yourself, with no subscription on top of hardware you already pay for.

Diction's server setup is a single docker-compose.yml:

services:
  transcription:
    image: fedirz/faster-whisper-server:latest-cpu
    volumes:
      - models:/root/.cache/huggingface

  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      DEFAULT_MODEL: small
    depends_on:
      - transcription

volumes:
  models:

Run docker compose up -d and your transcription server is running. The gateway handles iOS-to-server communication.
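If you want to confirm the gateway is actually listening before you touch the phone, a quick TCP check is enough. This is a minimal sketch: the only thing taken from the compose file is port 8080, and the helper name is_port_open is made up for illustration. It only proves something accepts connections on that port, not that transcription works.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the host port published by the gateway service in the compose file.
    print("gateway reachable:", is_port_open("localhost", 8080))
```

Swap "localhost" for your server's hostname or IP to run the same check from another machine on your network.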


The Keyboard

Diction is a keyboard extension — it replaces your standard keyboard entirely. One button. Tap it, speak, text is inserted into whatever app you're in.

It doesn't have a QWERTY layout. It doesn't have autocorrect. It does one thing: transcribe what you say, accurately.

Three modes:

  • Self-Hosted — connects to your own transcription server over the internet
  • On-Device — runs speech recognition locally on your iPhone, no server needed
  • Diction One — our hosted cloud option if you'd rather not run Docker
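To give a rough idea of what Self-Hosted mode involves on the wire, here is a sketch of a client building an upload request for one audio clip. Treat all the specifics as assumptions: faster-whisper-server describes itself as OpenAI-API compatible, so the sketch uses the OpenAI-style path /v1/audio/transcriptions with "file" and "model" form fields, but whether the Diction gateway exposes that same route, or accepts "small" as a model name, is not something this post confirms. The function build_transcription_request is hypothetical, and it only builds the request without sending it.

```python
import urllib.request
import uuid

# ASSUMPTION: OpenAI-style route; the Diction gateway's real path may differ.
TRANSCRIBE_PATH = "/v1/audio/transcriptions"


def build_transcription_request(base_url, audio_bytes, filename="clip.wav", model="small"):
    """Build (without sending) a multipart POST carrying one audio clip."""
    boundary = uuid.uuid4().hex
    # Text part with the model name, then the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that ends the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + TRANSCRIBE_PATH,
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the server running, passing the result to urllib.request.urlopen would send it. If the route or field names turn out to be different, only the constant and the two form-field names need to change.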

For anyone already running a homelab, the self-hosted setup takes about 10 minutes.


The Quality Comparison

Wispr Flow uses cloud-based speech recognition + their own AI editing layer. The editing layer is genuinely good — it removes filler words, fixes grammar, adapts to your writing style.

Diction is pure speech-to-text transcription by default. What you say is what you get, minus obvious transcription errors.

If you want AI cleanup on your transcriptions, Diction supports that too — but it's optional, not forced.
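To make "cleanup" concrete, here is a toy rule-based pass that strips a few filler words from a transcript. This is not how Wispr Flow or Diction do it (both are described as using an AI layer); it is a swapped-in regex illustration, and the FILLERS list and strip_fillers name are invented for the example.

```python
import re

# Hypothetical filler list, chosen only for illustration.
FILLERS = ("um", "uh", "erm")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(text):
    """Remove filler words and collapse any leftover runs of whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_fillers("Um, so I think, uh, we should ship it"))
# prints: so I think, we should ship it
```

A fixed word list cannot tell a filler from a meaningful word, and it does not fix capitalization or grammar, which is the gap an AI editing layer is meant to fill.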

For most use cases — emails, messages, Slack, notes — raw transcription accuracy is excellent. The AI editing layer is a nice-to-have, not a must-have.


Open Source

The server and gateway are fully open source: github.com/omachala/diction

The iOS keyboard extension has zero third-party dependencies — pure Swift and native frameworks only. The server is Go + Docker. Everything that touches your audio is auditable.


If you're paying a monthly subscription for voice-to-text, it's worth spending 10 minutes trying the self-hosted version.

Download on the App Store · diction.one · github.com/omachala/diction
