
Ondrej Machala


Self-Host Speech-to-Text and Use It as Your iPhone Keyboard in 3 Commands

If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.

Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.

Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.


The 3 Commands

git clone https://github.com/omachala/diction
cd diction
docker compose up -d

That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.


What's Actually Running

The Docker Compose setup spins up two services:

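services:
  # Diction's Go gateway: the endpoint the iOS app talks to
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"

  # Transcription engine: the small faster-whisper model, running on CPU
  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu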

whisper-small: the transcription engine. It runs an open-source faster-whisper model behind a REST API, and a CPU is enough for real-time dictation.

gateway: Diction's open-source Go gateway. It sits between the iOS app and the transcription backend and handles routing, streaming, and audio format conversion.

The gateway exposes port 8080. That's the URL you put into the Diction app.


Making It Accessible From Your Phone

Your phone needs to reach your server. A few options:

Tailscale (easiest): Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere. No port forwarding, no firewall rules.

http://100.x.x.x:8080

Reverse proxy (for existing homelabbers): If you're already running Caddy, nginx, or Traefik, add a route to port 8080. Get a subdomain, TLS, the whole thing.

https://diction.yourdomain.com

Direct LAN (simplest for home-only use): Just use your server's local IP. Works on home WiFi, not outside.

http://192.168.1.100:8080
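
To make the difference between the three options concrete, here is a minimal Python sketch that classifies a server URL by the kind of address it points at. It assumes Tailscale's usual address range (100.64.0.0/10); the function name classify_server_url is hypothetical and not part of Diction.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: Tailscale hands out addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")


def classify_server_url(url: str) -> str:
    """Return 'tailscale', 'lan', 'public-ip', or 'hostname' for a server URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, e.g. a subdomain served by a reverse proxy.
        return "hostname"
    if addr in TAILSCALE_NET:
        return "tailscale"
    if addr.is_private:
        # Private ranges such as 192.168.x.x only work on the home network.
        return "lan"
    return "public-ip"
```

A URL that comes back as "lan" will stop working as soon as the phone leaves home WiFi, which is the trade-off of the direct LAN option.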

Configuring Diction

Open the Diction app → Settings → Backend → Self-Hosted.

Enter your server URL. The app tests the connection and shows a green indicator when it's reachable.
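
If the indicator stays red, it can help to check reachability from another machine on the same network first. The following is a rough sketch of such a check, not the app's own test: it only confirms that a TCP connection to the URL's host and port succeeds. The function name can_connect is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling can_connect("http://192.168.1.100:8080") from a laptop on the same WiFi tells you whether the problem is the network path or the app configuration. A successful connection does not prove the gateway is healthy, only that the port is open.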

That's it. The keyboard is now connected to your server.


Choosing a Model

You can swap the transcription model by changing the WHISPER__MODEL environment variable on the speech-to-text container. The model downloads automatically on first use.

Size     RAM       Latency    Notes
Small    ~800 MB   ~3-4 s     Good default for everyday dictation
Medium   ~1.8 GB   ~8-12 s    Better with accents and background noise
Large    ~3.5 GB   ~20-30 s   Highest accuracy, needs a beefy CPU or GPU

For a home server or NAS, the small model is a solid starting point. If you have a GPU, the large model gives you transcription quality that rivals most paid services.
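
As a rough way to reason about the choice, the sketch below picks the largest model whose RAM need fits the memory you can spare. The RAM figures are the approximate ones from the table above; the headroom factor of 1.25 and the function name pick_model are assumptions for illustration, not something Diction provides.

```python
from typing import Optional

# Approximate RAM needs in MB, copied from the table above; smallest first.
MODEL_RAM_MB = [("small", 800), ("medium", 1800), ("large", 3500)]


def pick_model(free_ram_mb: int, headroom: float = 1.25) -> Optional[str]:
    """Return the largest model whose RAM need, times headroom, fits."""
    best = None
    for name, ram_mb in MODEL_RAM_MB:
        # Keep some slack so the container is not running at the memory limit.
        if ram_mb * headroom <= free_ram_mb:
            best = name
    return best
```

With about 4 GB free this returns "medium", because the large model plus headroom does not fit. RAM is only half of the picture: latency still depends on your CPU or GPU.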

Swap the model in your compose file:

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-medium
      WHISPER__INFERENCE_DEVICE: cpu

Or add multiple model containers and set DEFAULT_MODEL on the gateway to pick which one handles requests.


Running on Lower-Power Hardware

The small model runs well on modern CPUs. For lower-power devices like a NAS or a Raspberry Pi, try the tiny model variant — it uses less RAM (~350MB) and responds faster, at the cost of some accuracy.

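For real-time keyboard use, aim for latency under about 3 seconds. The small model on a modern CPU sits near that target (the table above lists roughly 3-4 seconds), and the tiny model on lower-power hardware is the fallback if dictation feels slow.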


The App

Diction is an iOS voice keyboard. Download it, enable it in Settings → General → Keyboard → Keyboards, grant Full Access, and connect to your server.

Self-hosted mode is free with no word limits or restrictions. There's also a cloud option (Diction One) for users who don't want to run their own server.

Download on the App Store · diction.one · github.com/omachala/diction
