DEV Community

Andreas Paradisiotis

I built a local screen reader that reads your screen aloud — no cloud, no API keys

I got tired of switching between reading and listening, so I built sttts — a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.

What it does

  1. 🖱️ You draw a rectangle on any part of your screen
  2. 📸 It snapshots that region every N seconds
  3. 🔍 Pixel diff check — skips frames where nothing changed
  4. 🧠 LightOnOCR-2-1B reads the text (runs on an AMD GPU via ROCm)
  5. 🗣️ Kokoro-82M speaks it through your speakers (runs on CPU)
🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker
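The steps above can be sketched as a small loop. This is only an illustration of the control flow, not the code in sttts: `run_pipeline` and its parameters are hypothetical names, and the OCR, cleanup, and TTS stages are passed in as plain callables so the sketch stays independent of any model.

```python
from typing import Callable, Iterable, Optional, TypeVar

Frame = TypeVar("Frame")


def run_pipeline(
    frames: Iterable[Frame],
    has_changed: Callable[[Frame, Frame], bool],
    ocr: Callable[[Frame], str],
    clean: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Push frames through diff -> OCR -> clean -> TTS.

    Returns the number of frames that were actually spoken.
    All stage functions are stand-ins supplied by the caller.
    """
    last_processed: Optional[Frame] = None
    spoken = 0
    for frame in frames:
        # Skip frames that look the same as the last one we processed.
        if last_processed is not None and not has_changed(last_processed, frame):
            continue
        last_processed = frame
        text = clean(ocr(frame))
        if text:  # nothing to say if OCR produced only whitespace
            speak(text)
            spoken += 1
    return spoken
```

With stub stages, for example strings as frames and `list.append` as the speaker, a repeated frame is spoken only once.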

The killer feature — auto page-turn

You can draw a second rectangle over any button on screen. After TTS finishes speaking and the screen stays idle, sttts automatically clicks it. I use this with Kindle for PC — it reads the entire book hands-free, turning pages automatically.

# Draw OCR region, then draw the next-page button
uv run python capture.py --next-btn -i 2
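For the curious, here is roughly how a button click could be wired up with slop and xdotool. This is a sketch under assumptions, not the project's actual code: `Region`, `parse_slop`, `select_region`, and `click_command` are hypothetical names, and clicking the centre of the rectangle is my guess at a sensible target.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int

    @property
    def center(self) -> tuple:
        # Middle of the rectangle, in screen coordinates.
        return (self.x + self.width // 2, self.y + self.height // 2)


def parse_slop(output: str) -> Region:
    """Parse the 'x y w h' text printed by: slop -f '%x %y %w %h'."""
    x, y, w, h = (int(value) for value in output.split())
    return Region(x, y, w, h)


def select_region() -> Region:
    """Let the user draw a rectangle with slop (needs a running X session)."""
    result = subprocess.run(
        ["slop", "-f", "%x %y %w %h"],
        check=True, capture_output=True, text=True,
    )
    return parse_slop(result.stdout)


def click_command(region: Region) -> list:
    """Build an xdotool command that left-clicks the centre of the region."""
    cx, cy = region.center
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]
```

Passing the result of `click_command(...)` to `subprocess.run` would perform the click. Building the command as a list keeps the coordinate maths easy to check without touching the mouse.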

Models used

Both models (LightOnOCR-2-1B for OCR and Kokoro-82M for TTS) download automatically from Hugging Face on first run. No API keys, no subscriptions.

Smart idle detection

Pixel-level diff comparison means OCR and TTS only fire when something actually changed. Reading a static page? Silent. New content loaded? Speaks immediately.

# Only trigger OCR when >1% of pixels changed
uv run python capture.py --diff-threshold 1.0
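To make the threshold concrete, here is a dependency-free sketch of a percentage-of-changed-pixels check. It assumes two raw RGB buffers of equal size (for instance the `rgb` bytes of two mss screenshots) and counts a pixel as changed if any channel differs. `changed_percent` and `should_ocr` are hypothetical names, and the real tool may well use NumPy and a per-pixel tolerance instead.

```python
def changed_percent(prev: bytes, cur: bytes, bytes_per_pixel: int = 3) -> float:
    """Return the percentage of pixels that differ between two raw buffers."""
    if len(prev) != len(cur):
        raise ValueError("frames must have the same size")
    total_pixels = len(cur) // bytes_per_pixel
    if total_pixels == 0:
        return 0.0
    # Compare one pixel (a slice of bytes_per_pixel bytes) at a time.
    changed = sum(
        prev[i:i + bytes_per_pixel] != cur[i:i + bytes_per_pixel]
        for i in range(0, total_pixels * bytes_per_pixel, bytes_per_pixel)
    )
    return 100.0 * changed / total_pixels


def should_ocr(prev: bytes, cur: bytes, threshold: float = 1.0) -> bool:
    """Mirror the idea behind --diff-threshold: fire only above the threshold."""
    return changed_percent(prev, cur) > threshold
```

A pure-Python loop like this is slow on large regions, so treat it as an explanation of the arithmetic rather than something to run every frame.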

Quick start

# Install system deps
sudo apt-get install -y slop xdotool libportaudio2 libsndfile1

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and run
git clone https://github.com/paradisecy/sttts
cd sttts
uv sync
uv run python capture.py

Use cases

  • 📖 Hands-free ebook reading (Kindle, epub readers, PDFs)
  • 📊 Financial dashboards spoken aloud as they update
  • ♿ Accessibility tool for any app that lacks screen reader support
  • 💻 Read terminal output or logs aloud while working
  • 🌐 Listen to any webpage without a browser extension

Tech stack

  • Python 3.13
  • PyTorch 2.8 + ROCm 6.3 (AMD GPU)
  • mss for fast screen capture
  • transformers for OCR
  • kokoro for TTS
  • sounddevice for audio playback
  • slop + xdotool for region selection and mouse clicks

⭐ GitHub: paradisecy/sttts
