Ondrej Machala

Self-Host Speech-to-Text and Use It as Your iPhone Keyboard in 3 Commands

If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.

Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.

Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.


The 3 Commands

git clone https://github.com/omachala/diction
cd diction
docker compose up -d

That's the server running. Now install Diction on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.
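
To confirm that both containers actually came up, something like the following works from inside the diction directory. The function name diction_status is just a label for this sketch; the service names gateway and whisper-small are the ones from the compose file.

```shell
# List the stack's containers, then show the last 20 log lines of each service.
# Run this from the directory that holds the compose file.
diction_status() {
  docker compose ps &&
    docker compose logs --tail 20 gateway whisper-small
}
```

Call it with `diction_status`. Both services should be listed as running before you move on to the phone.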


What's Actually Running

The Docker Compose setup spins up two services:

services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu

whisper-small: the transcription engine — runs open-source Whisper via a REST API. CPU works fine for real-time dictation.

gateway: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."

The gateway listens on port 8080. Your server's address plus that port is the URL you enter in the Diction app.


Making It Accessible From Your Phone

Your phone needs to reach your server. A few options depending on your setup:

Tailscale (easiest): Install Tailscale on both your server and iPhone. Each device gets a private 100.x.x.x address that your other Tailscale devices can reach from anywhere. No port forwarding, no firewall rules.

http://100.x.x.x:8080

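Reverse proxy (for existing homelabbers): If you're already running Caddy, nginx, or Traefik, add a route to port 8080. The gateway speaks WebSocket, so the proxy has to pass upgrade requests through: Caddy and Traefik do that by default, while nginx needs the Upgrade and Connection headers set explicitly.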

https://diction.yourdomain.com

Direct LAN (simplest for home-only use): Just use your server's local IP. Works on home WiFi, not outside.

http://192.168.1.100:8080
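
Whichever option you pick, it helps to check the address from another machine on the same network before typing it into the app. A minimal sketch, assuming curl is installed and you are using a plain http address; gateway_url and check_gateway are names made up for this example, not part of Diction:

```shell
# Build the URL the Diction app expects from a host and an optional port.
# 8080 is the gateway port published in the compose file.
gateway_url() {
  host="$1"
  port="${2:-8080}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Succeed if anything answers HTTP at that URL within 5 seconds.
# Without --fail, curl exits 0 on any HTTP response (even a 404), so this
# only shows the port is reachable, not that transcription works.
check_gateway() {
  curl --silent --output /dev/null --max-time 5 "$(gateway_url "$@")"
}
```

For example, `check_gateway 192.168.1.100 && echo reachable`. If that times out, the problem is networking (firewall, wrong IP, Tailscale not connected) rather than the app.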

Choosing a Model

Swap the transcription model by changing WHISPER__MODEL. The model downloads automatically on first use.

| Size | RAM | Speed (CPU) | Notes |
|---|---|---|---|
| Tiny | ~350MB | ~1-2s | Lower accuracy, great for low-power hardware |
| Small | ~800MB | ~3-4s | Good default for everyday dictation |
| Medium | ~1.8GB | ~8-12s | Better with accents and background noise |
| Large | ~3.5GB | ~20-30s | Highest accuracy, benefits from GPU |

For most home servers, the small model hits the sweet spot — fast enough to feel real-time, accurate enough for messages and notes.

Swap it in your compose file:

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-medium
      WHISPER__INFERENCE_DEVICE: cpu
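
If you'd rather script the change than edit the file by hand, here is one way to do it. This is a sketch: the helper name set_whisper_model is invented for the example, and the filename in the usage line is an assumption, so check what the compose file is actually called in the repo.

```shell
# Rewrite every WHISPER__MODEL line in a compose file to use the given model,
# keeping the original indentation. The result goes through a temp file, which
# avoids sed -i (its syntax differs between GNU and BSD/macOS sed).
set_whisper_model() {
  file="$1"
  model="$2"
  tmp="$(mktemp)" || return 1
  sed "s|^\([[:space:]]*WHISPER__MODEL:\).*|\1 ${model}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}
```

Usage: `set_whisper_model docker-compose.yml Systran/faster-whisper-medium && docker compose up -d`. Compose recreates a container when its configuration has changed, and the new model is downloaded the first time it is used.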

Running on Lower-Power Hardware

The small model runs well on modern CPUs. On a NAS or Raspberry Pi, try tiny: it needs less RAM (~350MB) and responds faster, at some cost in accuracy. For real-time keyboard use, aim for a round trip under 3 seconds. Small on a modern CPU or tiny on lower-power hardware should get you there.


Already Running a Whisper Server?

If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:

services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      CUSTOM_BACKEND_URL: http://your-server:8000
      CUSTOM_BACKEND_MODEL: your-model-name
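
If you don't want a compose file for a single container, the same configuration can be expressed as a plain docker run. Treat this as a sketch rather than the project's documented method: the run_gateway wrapper and the container name diction-gateway are invented here, while the image, port, and environment variables are the ones from the compose snippet.

```shell
# Start only the gateway, pointed at an existing speech-to-text server.
#   $1: backend URL, e.g. http://your-server:8000
#   $2: model name to request from that backend
run_gateway() {
  docker run -d \
    --name diction-gateway \
    -p 8080:8080 \
    -e "CUSTOM_BACKEND_URL=$1" \
    -e "CUSTOM_BACKEND_MODEL=$2" \
    ghcr.io/omachala/diction-gateway:latest
}
```

Usage: `run_gateway http://your-server:8000 your-model-name`. The phone still connects to port 8080 on the machine running the gateway.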

More details on connecting to existing servers in this post.


The server and gateway are fully open source: github.com/omachala/diction
