Voice keyboards sit in an uncomfortable position. Every app you use, every message you send, every search you type: the keyboard is there. It sees all of it.
Most people install a keyboard and never think about this. I did, because I was building one.
Earlier this year, someone reverse-engineered a popular voice keyboard and posted their findings. The app was collecting full browser URLs, names of focused apps, on-screen text scraped via the Accessibility API, clipboard contents including data copied from password managers, and sending it all back to a server. There was a function in the binary called sendTrackResultToServer. None of this was in the privacy policy.
This is not a hypothetical. It happened. And the only reason anyone found out is because the app was installed on a machine where someone was curious enough to look.
That is the problem with closed-source software and privileged access: you cannot verify the claims. A privacy policy is a document. The code is what runs.
Full Access and what it actually enables
When iOS asks if you want to allow Full Access for a keyboard, the permission is broader than most people realise. It enables network access, which is how voice keyboards send audio for transcription or sync custom dictionaries. But in the wrong hands it also means the keyboard code runs in a context where it could read clipboard data, monitor app usage patterns, or transmit information alongside its legitimate function.
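On the extension side, Full Access is visible as a single boolean. A sketch of how a keyboard extension can check it and degrade gracefully (the class name and branches are illustrative, not Diction's actual code):

```swift
import UIKit

final class KeyboardViewController: UIInputViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // hasFullAccess is false until the user enables "Allow Full Access"
        // in Settings. Without it, the extension gets no network access.
        if self.hasFullAccess {
            // Network is available: audio could be sent to a server here.
        } else {
            // Degrade gracefully: only on-device features work in this state.
        }
    }
}
```

The point is that the OS enforces the network boundary, not the app's promises; everything beyond that boundary is up to the code.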
Diction has no QWERTY keys. There is nothing to type into it, so nothing to log in that sense. But I wanted to go further than just "we don't do the bad thing." I wanted to build it so you can verify we don't.
How I built Diction with this in mind
There are three ways Diction can process your audio, and I picked each one with this threat model in mind.
On-device is the cleanest answer. Your audio never leaves your iPhone. A local speech model handles transcription, the result comes back, and that is it. No server, no transmission, no policy to read. If you want absolute certainty, this is the mode for you.
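Apple's Speech framework can be pinned to local-only recognition. A hedged sketch of the on-device mode described above, not Diction's actual implementation:

```swift
import Speech

// Force on-device recognition so audio never leaves the phone.
// Function name and callback shape are illustrative.
func transcribeLocally(url: URL, completion: @escaping (String?) -> Void) {
    guard let recognizer = SFSpeechRecognizer(),
          recognizer.supportsOnDeviceRecognition else {
        completion(nil)  // local model unavailable on this device/locale
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    // Refuse to fall back to Apple's servers, even if local quality is worse.
    request.requiresOnDeviceRecognition = true
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            completion(result.bestTranscription.formattedString)
        } else if error != nil {
            completion(nil)
        }
    }
}
```

Setting `requiresOnDeviceRecognition` is the key line: without it, the framework is free to route audio through Apple's servers.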
Self-hosted is for people who want cloud-quality transcription but on infrastructure they control. You point the app at your own server. Your audio goes there and nowhere else. I have no access to what you say or what gets transcribed. The server software is open source. You can read exactly what it does before you run it.
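"Point the app at your own server" amounts to a plain upload to a URL you control. A minimal sketch, assuming a hypothetical `/transcribe` endpoint that accepts raw audio and returns text:

```swift
import Foundation

// Illustrative only: the endpoint path and content type are assumptions,
// not Diction's actual wire protocol.
func transcribe(audio: Data, server: URL) async throws -> String {
    var request = URLRequest(url: server.appendingPathComponent("transcribe"))
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    let (data, _) = try await URLSession.shared.upload(for: request, from: audio)
    return String(decoding: data, as: UTF8.self)
}
```

Because the server software is open source, you can match what the app sends against what the server actually does with it.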
Diction One is the hosted cloud option, and here I had to think carefully. Audio is processed in memory and discarded immediately after transcription. Nothing is written to disk. No transcriptions are stored or logged. And every request is encrypted with AES-256-GCM under a fresh X25519 key exchange, the same key-agreement primitive WireGuard uses. I am not asking you to trust the policy. The implementation is in the open-source server code.
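The per-request scheme can be sketched with CryptoKit: generate an ephemeral X25519 key pair, agree a shared secret with the server's public key, derive a 256-bit key via HKDF, and seal the audio with AES-GCM. The salt, info label, and wire format below are illustrative assumptions, not Diction's exact protocol:

```swift
import CryptoKit
import Foundation

// Hedged sketch of fresh-key-per-request encryption.
func sealAudio(_ audio: Data,
               serverPublicKey: Curve25519.KeyAgreement.PublicKey)
    throws -> (ciphertext: Data, ephemeralPublicKey: Data) {
    // A brand-new key pair for every request: no long-lived client key.
    let ephemeral = Curve25519.KeyAgreement.PrivateKey()
    let shared = try ephemeral.sharedSecretFromKeyAgreement(with: serverPublicKey)
    // Derive a 256-bit AES key from the shared secret.
    let key = shared.hkdfDerivedSymmetricKey(
        using: SHA256.self,
        salt: Data(),                            // hypothetical: empty salt
        sharedInfo: Data("diction-audio".utf8),  // hypothetical context label
        outputByteCount: 32)
    // AES-256-GCM gives confidentiality plus integrity in one pass.
    let box = try AES.GCM.seal(audio, using: key)
    // The server needs the ephemeral public key to derive the same secret.
    return (box.combined!, ephemeral.publicKey.rawRepresentation)
}
```

Because the ephemeral private key goes out of scope after the request, there is nothing to steal later that would decrypt past traffic.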
The app itself
The Diction app contains no analytics and no tracking code. No device identifiers, no usage events, no behavioural monitoring. The App Store privacy label reads "Data Not Collected." I can say that confidently because I wrote every line and there is nothing there.
What you can actually verify
The server code is public at github.com/omachala/diction. You can read the transcription handler and confirm that audio is not written anywhere. You can read the encryption implementation. If you run on-device mode, you can point a network inspector at the app and confirm no requests leave it.
I built it this way because I wanted to use this keyboard myself. And I was not willing to just trust a policy page written by someone else.