<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Julia</title>
    <description>The latest articles on DEV Community by Julia (@julia_kafarska).</description>
    <link>https://dev.to/julia_kafarska</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3503874%2F2e36ee8e-96a8-4bc5-9ad1-891ac80eda63.jpg</url>
      <title>DEV Community: Julia</title>
      <link>https://dev.to/julia_kafarska</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/julia_kafarska"/>
    <language>en</language>
    <item>
      <title>I was annoyed at how much TTS costs, so I built my own, and it's actually good.</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Tue, 16 Jun 2026 15:20:56 +0000</pubDate>
      <link>https://dev.to/julia_kafarska/i-was-annoyed-at-how-much-tts-costs-so-i-built-my-own-and-its-actually-good-1bl</link>
      <guid>https://dev.to/julia_kafarska/i-was-annoyed-at-how-much-tts-costs-so-i-built-my-own-and-its-actually-good-1bl</guid>
      <description>&lt;p&gt;I read a lot through text-to-speech. Articles, docs, my own drafts.&lt;/p&gt;

&lt;p&gt;Every decent app had the same two problems: a subscription, and a cloud backend, which means you upload whatever you're reading to its servers before you can hear it.&lt;/p&gt;

&lt;p&gt;That second part bugged me more than the price. Reading is about as private as it gets, and it felt backwards to send all of it to someone else's machine just to hear it out loud.&lt;/p&gt;

&lt;p&gt;So I built my own. It's called Out Loud, it's MIT licensed, and it runs entirely on your device. No cloud, no account. The surprising part is that the voices are genuinely good now, good enough that I use it every day instead of the paid app I was on before.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcd56fekmayfj7oenv6i6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcd56fekmayfj7oenv6i6.png" alt="text-to-speech" width="800" height="990"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A year ago I would have assumed natural-sounding TTS needed a big cloud model. It doesn't anymore. Open-weight models like Kokoro-82M are small enough to run on a normal laptop and still sound natural rather than robotic. The hard part already existed and was open. Nobody had wrapped it into something you could just download and use.&lt;/p&gt;

&lt;p&gt;How it's put together is the part I find interesting. It's an Electron app. The main process owns the ONNX model, and inference runs in a worker thread so the UI never blocks while audio is being generated. The main process also exposes a local HTTP API on &lt;code&gt;127.0.0.1&lt;/code&gt;, port &lt;code&gt;51730&lt;/code&gt;. The Chrome and Safari extensions don't bundle the model at all. They just send text to that local API, which keeps them tiny and means there's a single engine to maintain. &lt;code&gt;espeak-ng&lt;/code&gt; handles phonemization and &lt;code&gt;onnxruntime-node&lt;/code&gt; does the on-device inference. Nothing in that chain leaves your machine.&lt;/p&gt;

&lt;p&gt;The API is the same whether the caller is the desktop UI, an extension, or a curl command. You can list voices with a GET to &lt;code&gt;/api/v1/audio/voices&lt;/code&gt;, or POST text and a voice name to &lt;code&gt;/api/v1/audio/speech&lt;/code&gt; and get a WAV file back. The repo is about 99% JS and TS: the UI is React and Vite, and the engine and main process are TypeScript.&lt;/p&gt;
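
&lt;p&gt;Here is a minimal Python sketch of calling those two endpoints from a script, using only the standard library. The host, port, and paths are the ones above. The JSON body and its field names (&lt;code&gt;text&lt;/code&gt;, &lt;code&gt;voice&lt;/code&gt;) are assumptions for illustration, so check the repo for the exact request shape before relying on them.&lt;/p&gt;

```python
import json
import urllib.request

# Host and port as described in the post; the two paths are the endpoints it names.
BASE_URL = "http://127.0.0.1:51730"


def voices_request():
    """Build the GET request that lists the available voices."""
    return urllib.request.Request(f"{BASE_URL}/api/v1/audio/voices", method="GET")


def speech_request(text, voice):
    """Build the POST request that asks for synthesized audio.

    ASSUMPTION: a JSON body with the fields "text" and "voice" is a guess
    made for this sketch; the real API may expect different names.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, voice, path="out.wav", timeout=30):
    """Send the request to the running app and write the response bytes to disk."""
    with urllib.request.urlopen(speech_request(text, voice), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return path
```

&lt;p&gt;With the desktop app running, &lt;code&gt;save_speech("Hello from a script.", voice_name)&lt;/code&gt; should write &lt;code&gt;out.wav&lt;/code&gt;, where &lt;code&gt;voice_name&lt;/code&gt; is one of the names returned by the voices endpoint. The request builders are kept separate from the network call so you can inspect what would be sent without the app running.&lt;/p&gt;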

&lt;p&gt;It's past 1,700 downloads now, and the thing people keep bringing up is the offline and private side rather than the price or the voice count. That was the useful signal. The itch wasn't only mine. There's a real group of people who would rather run things locally than rent them.&lt;/p&gt;

&lt;p&gt;There are 50+ voices across 8 languages, counting US and UK English separately: English (US and UK), Japanese, Chinese, Spanish, Brazilian Portuguese, Italian, and Hindi. It runs on macOS, Windows, and Linux, plus the Chrome extension.&lt;/p&gt;

&lt;p&gt;You can hear the voices in your browser before installing anything: &lt;a href="https://out-loud.io" rel="noopener noreferrer"&gt;https://out-loud.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Code is here: &lt;a href="https://github.com/light-cloud-com/out-loud" rel="noopener noreferrer"&gt;https://github.com/light-cloud-com/out-loud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy listening!&lt;/p&gt;

</description>
      <category>tts</category>
      <category>opensource</category>
      <category>offline</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
