Apple Dictation Is Fine. Until You Try to Use It for Real Work.

#ios #swift #productivity #opensource

Apple Dictation is fine for short bursts.

A quick text. A search query. A one-sentence reminder. For those, it works.

But try dictating a long email, a paragraph of notes, or anything with technical terms, and you'll hit its limits fast. Word substitutions. Missed words at sentence boundaries. Foreign names garbled beyond recognition.

It's not that Apple Dictation is bad. It's that modern open-source speech recognition has gotten significantly better — and Apple's built-in option hasn't kept up.

So I built an alternative.

The Problem With Apple Dictation

Apple Dictation is free and built-in. Those are its best features.

Everything else is a compromise:

Quality: Apple's on-device model lags significantly behind modern open-source alternatives. The best open-source speech recognition models were trained on hundreds of thousands of hours of audio across dozens of languages. Apple's model is optimized for on-device power efficiency first, accuracy second.

Context loss: Apple Dictation doesn't persist context between sessions. Every time you tap the mic, it starts fresh. No speaker adaptation, no learning your vocabulary.

Integration limits: Apple Dictation only activates in specific spots. The keyboard mic button. The search bar. Some text fields. Not all text fields, not all apps, not all contexts.

No customization: You get one model. You can't upgrade it, replace it, or augment it.

What I Built

Diction is a replacement keyboard — a keyboard extension that lives in the keyboard picker alongside your other keyboards.

It has one button. The mic.

Tap it. Speak. Text appears in whatever app you're using.

Under the hood, it uses state-of-the-art open-source speech recognition, the same kind of models behind many paid voice keyboard apps. But I wanted control over where the audio goes, so I built a self-hosted server setup alongside it.

Why Open-Source Speech Recognition Is Better

Modern open-source speech recognition takes a fundamentally different approach.

Apple's model is designed to run in real time on constrained hardware, committing to words in small increments as you speak. Open-source models process whole audio chunks, so they see the full context of what you said before committing to a transcription.

The practical difference: these models handle sentence endings, mid-sentence pauses, and foreign words significantly better. If you dictate a long complex sentence, open-source models get more of it right.

They also handle accents substantially better. Apple Dictation has well-documented struggles with non-American-English accents. The best open-source models were trained on multilingual data and generalize better across speakers.

Three Modes

On-Device: Download a model to your iPhone. No server needed. No network. Works on a plane. The Standard model (142 MB) is free and fast. The Advanced model (632 MB) is also free, with higher accuracy.

Self-Hosted: Run your own transcription server. Docker Compose setup is included in the public repo. Your audio goes to your hardware — not anyone else's.

docker compose up -d

Point Diction at your server URL. Done.

Diction One: If you don't want to manage infrastructure, the hosted cloud option handles everything for you.
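
For the self-hosted mode, the client side boils down to posting recorded audio to your own server. Here is a rough sketch of building such a request in Swift. The `/transcribe` path, the `file` field name, and the WAV content type are placeholders I made up for illustration; the API spec in the repo defines the real contract.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds a multipart/form-data upload for a self-hosted transcription server.
/// NOTE: the "transcribe" path and the "file" field name are hypothetical.
func makeTranscriptionRequest(server: URL,
                              audio: Data,
                              boundary: String = UUID().uuidString) -> URLRequest {
    var request = URLRequest(url: server.appendingPathComponent("transcribe"))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    // One form part holding the raw audio bytes, then the closing boundary.
    var body = Data()
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n".utf8))
    body.append(Data("Content-Type: audio/wav\r\n\r\n".utf8))
    body.append(audio)
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    request.httpBody = body
    return request
}
```

You would hand the result to `URLSession` to actually send it. Keeping the request construction separate from the network call makes it easy to point the same code at a server on your LAN or at a hosted endpoint.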

Building a Keyboard Extension in Swift

Keyboard extensions on iOS are oddly limited. They run in a sandboxed extension process with strict memory limits. And critically, a keyboard extension gets no access to the microphone: you can't record audio from inside the keyboard itself, full stop.

Diction works around this by keeping the main app running in the background. The keyboard extension talks to the main app through App Group shared storage and Darwin notifications. The app handles the actual recording, transcription, and network calls, then passes the result back to the keyboard for insertion.
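
To make that handoff concrete, here is a minimal sketch of the pattern, not Diction's actual code. Darwin notifications cross process boundaries but carry no payload, so the transcript itself travels through shared `UserDefaults`. The group ID, notification name, and key below are all hypothetical.

```swift
import Foundation

// All identifiers here are hypothetical placeholders.
// On device you would open the store with UserDefaults(suiteName: appGroupID).
let appGroupID = "group.com.example.voicekeyboard"
let transcriptReady = "com.example.voicekeyboard.transcriptReady"
let transcriptKey = "latestTranscript"

/// Posts a Darwin notification. It reaches other processes but has no payload.
func postDarwinNotification(_ name: String) {
    CFNotificationCenterPostNotification(
        CFNotificationCenterGetDarwinNotifyCenter(),
        CFNotificationName(rawValue: name as CFString),
        nil, nil, true)
}

/// Main app side: store the transcript, then signal the keyboard.
func publishTranscript(_ text: String, to defaults: UserDefaults) {
    defaults.set(text, forKey: transcriptKey)
    postDarwinNotification(transcriptReady)
}

/// Keyboard side: read whatever the app stored last.
func latestTranscript(in defaults: UserDefaults) -> String? {
    defaults.string(forKey: transcriptKey)
}

/// Keyboard side: forward the Darwin notification to the in-process
/// NotificationCenter. The C callback cannot capture Swift state, so it
/// only re-posts under the same name.
func forwardDarwinNotification(_ name: String) {
    CFNotificationCenterAddObserver(
        CFNotificationCenterGetDarwinNotifyCenter(),
        nil,
        { _, _, name, _, _ in
            guard let name = name else { return }
            NotificationCenter.default.post(
                name: Notification.Name(name.rawValue as String), object: nil)
        },
        name as CFString,
        nil,
        .deliverImmediately)
}
```

In the keyboard, an ordinary `NotificationCenter` observer for that name would call `latestTranscript(in:)` and pass the string to `textDocumentProxy.insertText(_:)`. The same mechanism works in the other direction for a "start recording" signal from the keyboard to the app.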

It's more complex than it looks. But it means the keyboard works reliably across iOS 17 and 18.

Open Source

The full server setup is open source at github.com/omachala/diction.

Docker Compose file, gateway code, API spec — all of it. Run it on your own hardware, modify it, improve it.

The iOS app is pure Swift with zero analytics, zero tracking, and zero telemetry. The keyboard extension itself has no third-party dependencies at all.

If you're tired of Apple Dictation mangling your words, there's a better option — and the core experience is free.

Download on the App Store · diction.one · github.com/omachala/diction