I built a local screen reader that reads your screen aloud — no cloud, no API keys

Andreas Paradisiotis — Sat, 11 Apr 2026 12:30:17 +0000

I got tired of switching between reading and listening, so I built sttts — a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.


What it does

🖱️ You draw a rectangle on any part of your screen
📸 It snapshots that region every N seconds
🔍 Pixel diff check — skips frames where nothing changed
🧠 LightOnOCR-2-1B reads the text (runs on AMD GPU via ROCm)
🗣️ Kokoro-82M speaks it through your speakers (runs on CPU)

🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker
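The chain above fits a simple polling loop. This is a minimal sketch, not the sttts source. It assumes each stage is a plain Python callable, and `run_once`, `run` and `clean_text` are hypothetical names. The cleanup rules (rejoining words hyphenated across line breaks, collapsing whitespace) are a guess at what a "clean text" step might do.

```python
import re
import time
from typing import Callable, Optional

Frame = bytes  # raw pixels of the captured region


def clean_text(raw: str) -> str:
    """Hypothetical cleanup step between OCR and TTS."""
    # Rejoin words split across lines: "implemen-\ntation" -> "implementation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines, tabs and repeated spaces into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def run_once(
    capture: Callable[[], Frame],
    changed: Callable[[Optional[Frame], Frame], bool],
    ocr: Callable[[Frame], str],
    speak: Callable[[str], None],
    reference: Optional[Frame],
) -> Optional[Frame]:
    """One tick of screen -> diff -> OCR -> clean -> TTS.

    Returns the new reference frame: the captured frame if it differed
    enough to be read, otherwise the old reference unchanged.
    """
    frame = capture()
    if not changed(reference, frame):
        return reference  # nothing new: skip OCR and TTS entirely
    text = clean_text(ocr(frame))
    if text:
        speak(text)
    return frame


def run(capture, changed, ocr, speak, interval_s: float, ticks: int) -> None:
    """Poll the region `ticks` times, sleeping `interval_s` between snapshots."""
    reference: Optional[Frame] = None
    for _ in range(ticks):
        reference = run_once(capture, changed, ocr, speak, reference)
        time.sleep(interval_s)
```

The sketch keeps the last *read* frame as the reference rather than the most recent capture. A slow drift of small changes therefore eventually crosses the threshold instead of being compared away one tick at a time.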

The killer feature — auto page-turn

You can draw a second rectangle over any button on screen. After TTS finishes speaking and the screen stays idle, sttts automatically clicks it. I use this with Kindle for PC — it reads the entire book hands-free, turning pages automatically.

# Draw OCR region, then draw the next-page button
uv run python capture.py --next-btn -i 2
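The click step is sketched below. It assumes the button rectangle comes from the second selection and that "idle" means "TTS is not playing and the OCR region hasn't changed for `idle_s` seconds". `Rect`, `should_turn_page`, `click_command`, `turn_page` and `idle_s` are hypothetical names. The post doesn't describe how sttts actually times the click.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A screen rectangle in pixels, e.g. the one drawn over the next-page button."""
    x: int
    y: int
    w: int
    h: int

    def center(self) -> tuple[int, int]:
        return self.x + self.w // 2, self.y + self.h // 2


def should_turn_page(tts_busy: bool, last_change: float, now: float, idle_s: float) -> bool:
    """True once speech has finished AND the OCR region has been static for idle_s seconds."""
    return not tts_busy and (now - last_change) >= idle_s


def click_command(button: Rect) -> list[str]:
    """xdotool argv: move the pointer to the button's centre, then left-click (button 1)."""
    cx, cy = button.center()
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]


def turn_page(button: Rect) -> None:
    # check=True raises CalledProcessError if xdotool exits with an error
    subprocess.run(click_command(button), check=True)
```

Clicking the centre of the rectangle rather than a corner gives some slack if the selection was drawn slightly off. `time.monotonic()` is a sensible clock for `last_change` and `now`, since it never jumps backwards.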

Models used

OCR: LightOnOCR-2-1B — fast, accurate, runs on AMD GPU via ROCm
TTS: Kokoro-82M — high quality, ~100ms latency on CPU

Both models download automatically from the Hugging Face Hub on first run. No API keys, no subscriptions.

Smart idle detection

Pixel-level diff comparison means OCR and TTS only fire when something actually changed. Reading a static page? Silent. New content loaded? Speaks immediately.

# Only trigger OCR when >1% of pixels changed
uv run python capture.py --diff-threshold 1.0
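Here is a plain-Python sketch of a pixel-diff gate like this. It assumes frames arrive as packed RGB bytes, which is what mss's `ScreenShot.rgb` returns. It also assumes `--diff-threshold` is a percentage of changed pixels, as the comment above suggests. `changed_percent` and `should_ocr` are hypothetical names, and the per-channel `tolerance` knob is an illustrative extra, not an sttts flag.

```python
from typing import Optional


def changed_percent(prev: bytes, curr: bytes, tolerance: int = 0) -> float:
    """Percentage of pixels where any of R, G, B moved by more than `tolerance`.

    Both buffers are packed RGB, 3 bytes per pixel.
    """
    if len(prev) != len(curr):
        return 100.0  # region was resized: treat the whole frame as new
    n_pixels = len(curr) // 3
    if n_pixels == 0:
        return 0.0
    changed = 0
    for i in range(0, n_pixels * 3, 3):
        if any(abs(prev[i + c] - curr[i + c]) > tolerance for c in range(3)):
            changed += 1
    return 100.0 * changed / n_pixels


def should_ocr(prev: Optional[bytes], curr: bytes, threshold: float = 1.0) -> bool:
    """Fire OCR on the first frame, then only when more than `threshold`% of pixels changed."""
    if prev is None:
        return True
    return changed_percent(prev, curr) > threshold
```

The pure-Python loop is fine for a small region but slow for a full page. The usual fix is to convert both buffers to NumPy arrays and compare them in one vectorised operation.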

Quick start

# Install system deps
sudo apt-get install -y slop xdotool libportaudio2 libsndfile1

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and run
git clone https://github.com/paradisecy/sttts
cd sttts
uv sync
uv run python capture.py

Use cases

📖 Hands-free ebook reading (Kindle, epub readers, PDFs)
📊 Financial dashboards spoken aloud as they update
♿ Accessibility tool for any app that lacks screen reader support
💻 Read terminal output or logs aloud while working
🌐 Listen to any webpage without a browser extension

Tech stack

Python 3.13
PyTorch 2.8 + ROCm 6.3 (AMD GPU)
mss for fast screen capture
transformers for OCR
kokoro for TTS
sounddevice for audio playback
slop + xdotool for region selection and mouse clicks
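The hand-off from region selection to capture could look like the sketch below. It assumes slop's `-f` format option (`%x %y %w %h` for position and size) and the `{"left", "top", "width", "height"}` dict that mss's `grab()` accepts. `parse_slop` and `select_region` are hypothetical helpers, not sttts functions.

```python
import subprocess


def parse_slop(output: str) -> dict[str, int]:
    """Turn `slop -f "%x %y %w %h"` output into the region dict mss's grab() accepts."""
    # Raises ValueError if slop printed anything other than four integers
    x, y, w, h = (int(v) for v in output.split())
    return {"left": x, "top": y, "width": w, "height": h}


def select_region() -> dict[str, int]:
    """Block until the user drags a rectangle, then return it as an mss region."""
    # check=True turns a non-zero slop exit into CalledProcessError
    out = subprocess.run(
        ["slop", "-f", "%x %y %w %h"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_slop(out)
```

With a region in hand, `with mss.mss() as sct: rgb = sct.grab(region).rgb` yields packed RGB bytes that a diff step can compare frame to frame.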

⭐ GitHub: paradisecy/sttts

DEV Community: Andreas Paradisiotis