<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andreas Paradisiotis</title>
    <description>The latest articles on DEV Community by Andreas Paradisiotis (@paradisecy).</description>
    <link>https://dev.to/paradisecy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873495%2Fecce284d-4090-4227-bd33-ae632794df07.png</url>
      <title>DEV Community: Andreas Paradisiotis</title>
      <link>https://dev.to/paradisecy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/paradisecy"/>
    <language>en</language>
    <item>
      <title>I built a local screen reader that reads your screen aloud — no cloud, no API keys</title>
      <dc:creator>Andreas Paradisiotis</dc:creator>
      <pubDate>Sat, 11 Apr 2026 12:30:17 +0000</pubDate>
      <link>https://dev.to/paradisecy/i-built-a-local-screen-reader-that-reads-your-screen-aloud-no-cloud-no-api-keys-m9</link>
      <guid>https://dev.to/paradisecy/i-built-a-local-screen-reader-that-reads-your-screen-aloud-no-cloud-no-api-keys-m9</guid>
      <description>&lt;p&gt;I got tired of switching between reading and listening, so I built &lt;strong&gt;sttts&lt;/strong&gt; — a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/nfkXIqK8Llg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;🖱️ You draw a rectangle on any part of your screen&lt;/li&gt;
&lt;li&gt;📸 It snapshots that region every N seconds&lt;/li&gt;
&lt;li&gt;🔍 Pixel diff check — skips frames where nothing changed&lt;/li&gt;
&lt;li&gt;🧠 LightOnOCR-2-1B reads the text (runs on AMD GPU via ROCm)&lt;/li&gt;
&lt;li&gt;🗣️ Kokoro-82M speaks it through your speakers (runs on CPU)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
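
&lt;p&gt;The steps above can be sketched as a small polling loop. This is a minimal illustration, not the actual sttts code: &lt;code&gt;run_pipeline&lt;/code&gt; and its parameters are hypothetical names, the capture, change-check, OCR and speech steps are passed in as plain callables, and the "clean text" step is assumed here to be simple whitespace collapsing.&lt;/p&gt;

```python
import time
from typing import Callable, Optional


def run_pipeline(
    capture: Callable[[], bytes],
    changed: Callable[[Optional[bytes], bytes], bool],
    ocr: Callable[[bytes], str],
    speak: Callable[[str], None],
    interval: float = 2.0,
    max_frames: int = 10,
) -> int:
    """Poll a screen region and speak new text; return how many frames were spoken."""
    previous = None  # no frame seen yet
    spoken = 0
    for _ in range(max_frames):
        frame = capture()
        # Skip OCR and TTS entirely when the change check reports nothing new.
        if changed(previous, frame):
            # Assumed "clean text" step: collapse runs of whitespace.
            text = " ".join(ocr(frame).split())
            if text:
                speak(text)
                spoken += 1
        previous = frame
        time.sleep(interval)
    return spoken
```

&lt;p&gt;In a real setup, &lt;code&gt;capture&lt;/code&gt; would wrap the screen grab, &lt;code&gt;ocr&lt;/code&gt; the OCR model and &lt;code&gt;speak&lt;/code&gt; the TTS playback. Passing them in as callables keeps the loop testable without a GPU or speakers.&lt;/p&gt;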



&lt;h2&gt;
  
  
  The killer feature — auto page-turn
&lt;/h2&gt;

&lt;p&gt;You can draw a second rectangle over any button on screen. Once TTS finishes speaking and the screen stays idle, sttts clicks that button automatically. I use this with Kindle for PC: it reads an entire book hands-free, turning the pages by itself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Draw OCR region, then draw the next-page button&lt;/span&gt;
uv run python capture.py &lt;span class="nt"&gt;--next-btn&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
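
&lt;p&gt;As a rough sketch of the page-turn logic described above, the decision and the click can be kept separate. The function names, the &lt;code&gt;required_idle&lt;/code&gt; parameter and the idea of counting idle frames are assumptions made for illustration; sttts may decide this differently. The command uses the standard &lt;code&gt;xdotool mousemove X Y click 1&lt;/code&gt; form to left-click the centre of the button rectangle.&lt;/p&gt;

```python
from typing import List


def should_turn_page(tts_done: bool, idle_frames: int, required_idle: int = 2) -> bool:
    """True once speech has finished and the region stayed unchanged long enough."""
    return tts_done and idle_frames >= required_idle


def click_command(x: int, y: int, width: int, height: int) -> List[str]:
    """Build an xdotool command that left-clicks the centre of a rectangle."""
    cx = x + width // 2
    cy = y + height // 2
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]
```

&lt;p&gt;The returned list can be handed to &lt;code&gt;subprocess.run(click_command(x, y, w, h), check=True)&lt;/code&gt; whenever &lt;code&gt;should_turn_page&lt;/code&gt; returns true.&lt;/p&gt;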



&lt;h2&gt;
  
  
  Models used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OCR&lt;/strong&gt;: &lt;a href="https://huggingface.co/lightonai/LightOnOCR-2-1B" rel="noopener noreferrer"&gt;LightOnOCR-2-1B&lt;/a&gt; — fast, accurate, runs on AMD GPU via ROCm&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TTS&lt;/strong&gt;: &lt;a href="https://huggingface.co/hexgrad/Kokoro-82M" rel="noopener noreferrer"&gt;Kokoro-82M&lt;/a&gt; — high quality, ~100ms latency on CPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both models download automatically from Hugging Face on first run. No API keys, no subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Smart idle detection
&lt;/h2&gt;

&lt;p&gt;Pixel-level diff comparison means OCR and TTS only fire when something has &lt;strong&gt;actually changed&lt;/strong&gt;. Reading a static page? Silent. New content loaded? It speaks immediately.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only trigger OCR when &amp;gt;1% of pixels changed&lt;/span&gt;
uv run python capture.py &lt;span class="nt"&gt;--diff-threshold&lt;/span&gt; 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
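
&lt;p&gt;To make the threshold concrete, here is a minimal sketch of a percentage-based change check. It assumes frames arrive as equal-sized 8-bit grayscale buffers (one byte per pixel), and the names &lt;code&gt;changed_percent&lt;/code&gt; and &lt;code&gt;should_run_ocr&lt;/code&gt; are hypothetical. The actual comparison in sttts may work differently, for example on RGB data or with a per-pixel tolerance.&lt;/p&gt;

```python
from typing import Optional


def changed_percent(prev: bytes, curr: bytes) -> float:
    """Percentage of pixels that differ between two same-sized grayscale frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    changed = sum(a != b for a, b in zip(prev, curr))
    return 100.0 * changed / len(curr)


def should_run_ocr(prev: Optional[bytes], curr: bytes, threshold: float = 1.0) -> bool:
    """Run OCR on the first frame, or when more than `threshold` percent changed."""
    return prev is None or changed_percent(prev, curr) > threshold
```

&lt;p&gt;With a threshold of 1.0, a 100-pixel frame in which one pixel changed sits exactly at 1% and is skipped, while two changed pixels trigger OCR.&lt;/p&gt;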



&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install system deps&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; slop xdotool libportaudio2 libsndfile1

&lt;span class="c"&gt;# Install uv&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh

&lt;span class="c"&gt;# Clone and run&lt;/span&gt;
git clone https://github.com/paradisecy/sttts
&lt;span class="nb"&gt;cd &lt;/span&gt;sttts
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run python capture.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📖 Hands-free ebook reading (Kindle, EPUB readers, PDFs)&lt;/li&gt;
&lt;li&gt;📊 Financial dashboards spoken aloud as they update&lt;/li&gt;
&lt;li&gt;♿ Accessibility tool for any app that lacks screen reader support&lt;/li&gt;
&lt;li&gt;💻 Read terminal output or logs aloud while working&lt;/li&gt;
&lt;li&gt;🌐 Listen to any webpage without a browser extension&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.13&lt;/li&gt;
&lt;li&gt;PyTorch 2.8 + ROCm 6.3 (AMD GPU)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mss&lt;/code&gt; for fast screen capture&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;transformers&lt;/code&gt; for OCR&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kokoro&lt;/code&gt; for TTS&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sounddevice&lt;/code&gt; for audio playback&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;slop&lt;/code&gt; + &lt;code&gt;xdotool&lt;/code&gt; for region selection and mouse clicks&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;⭐ GitHub: &lt;a href="https://github.com/paradisecy/sttts" rel="noopener noreferrer"&gt;paradisecy/sttts&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>a11y</category>
    </item>
  </channel>
</rss>
