<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: justin chen</title>
    <description>The latest articles on DEV Community by justin chen (@chendrizzy).</description>
    <link>https://dev.to/chendrizzy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4002632%2F6fd1ba67-a962-445d-91a8-a49a83fa1bcb.png</url>
      <title>DEV Community: justin chen</title>
      <link>https://dev.to/chendrizzy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chendrizzy"/>
    <language>en</language>
    <item>
      <title>How I Built a Local TTS Daemon That Actually Knows When to Shut Up (claude-tts v0.1.0)</title>
      <dc:creator>justin chen</dc:creator>
      <pubDate>Thu, 25 Jun 2026 15:41:30 +0000</pubDate>
      <link>https://dev.to/chendrizzy/how-i-built-a-local-tts-daemon-that-actually-knows-when-to-shut-up-claude-tts-v010-2431</link>
      <guid>https://dev.to/chendrizzy/how-i-built-a-local-tts-daemon-that-actually-knows-when-to-shut-up-claude-tts-v010-2431</guid>
      <description>&lt;p&gt;I run Claude Code — Anthropic's CLI coding agent — for long builds and test runs. The agent does real work: it edits files, runs tests, reads errors, tries again. The problem is I need to be somewhere else. I can't watch the terminal.&lt;/p&gt;

&lt;p&gt;The obvious answer is text-to-speech. The naive implementation is catastrophic. Five minutes of listening to your computer narrate &lt;code&gt;====&amp;gt;  eslint  --ext .ts --ignore-path .gitignore .  &amp;amp;&amp;amp;  tsc --noEmit  |  grep  -E 'error TS'&lt;/code&gt; will make you never try this again.&lt;/p&gt;

&lt;p&gt;So the real problem — the engineering problem — is filtering. Not speaking everything. Not speaking nothing. Speaking the right slice: status pivots, errors, final answers. And staying quiet through the noise.&lt;/p&gt;

&lt;p&gt;I spent a few weeks building this as a Claude Code plugin. It's called &lt;a href="https://github.com/chendrizzy/claude-tts" rel="noopener noreferrer"&gt;claude-tts&lt;/a&gt;, it's v0.1.0, MIT licensed, and this is how it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Four Moving Parts
&lt;/h2&gt;

&lt;p&gt;The system is four pipeline stages. The hooks hand events to the daemon over a Unix socket, and the rest of the pipeline runs in the daemon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code hooks
      │  (raw agent event, JSON over socket)
      ▼
ContentRouter (filter brain)
      │  should_speak? → text to speak
      ▼
GenerateStage (TTS synthesis)
      │  audio file
      ▼
PlaybackStage (OS audio)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code fires hooks at &lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, and &lt;code&gt;Stop&lt;/code&gt;. Each hook sends an event payload to the daemon over a Unix socket (&lt;code&gt;~/.local/share/claude-tts/claude-tts.sock&lt;/code&gt; on XDG systems, &lt;code&gt;/tmp/claude-tts.sock&lt;/code&gt; as fallback). The daemon processes these asynchronously; hooks return immediately and don't block the agent.&lt;/p&gt;
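
&lt;p&gt;To make the hook side concrete, here is a minimal sketch of a fire-and-forget client. The two socket paths are the ones above; the function names, the newline framing, and the 0.2 second timeout are assumptions for illustration, not the plugin's actual code.&lt;/p&gt;

```python
import json
import socket
from pathlib import Path

# Socket locations named in the article; the helper names below are hypothetical.
PRIMARY_SOCKET = Path.home() / ".local/share/claude-tts/claude-tts.sock"
FALLBACK_SOCKET = Path("/tmp/claude-tts.sock")


def send_event(event: dict, socket_path: Path, timeout: float = 0.2) -> bool:
    """Send one JSON event and return at once; never raise into the hook."""
    # Newline-delimited JSON is an assumed framing, chosen for simplicity.
    payload = json.dumps(event).encode("utf-8") + b"\n"
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.settimeout(timeout)
            client.connect(str(socket_path))
            client.sendall(payload)
        return True
    except OSError:
        # Daemon not running or socket missing: stay silent, do not block the agent.
        return False


def notify(event: dict) -> bool:
    """Try the primary socket path first, then the /tmp fallback."""
    return any(send_event(event, path) for path in (PRIMARY_SOCKET, FALLBACK_SOCKET))
```

&lt;p&gt;The point of swallowing &lt;code&gt;OSError&lt;/code&gt; is that a missing daemon should cost the agent nothing: the hook returns &lt;code&gt;False&lt;/code&gt; and exits.&lt;/p&gt;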

&lt;p&gt;The audio side is swappable: Kokoro MLX on Apple Silicon for local neural TTS, &lt;code&gt;edge-tts&lt;/code&gt; for Azure voices (needs internet), or the zero-dependency fallback — macOS &lt;code&gt;say&lt;/code&gt; / Linux &lt;code&gt;espeak&lt;/code&gt;. The LLM side is equally swappable: Ollama by default, any OpenAI-compatible endpoint (LM Studio, llama.cpp server, vLLM, Groq), or &lt;code&gt;null&lt;/code&gt; for fully deterministic operation with no model at all.&lt;/p&gt;

&lt;p&gt;But the interesting part is the filter.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Filter Brain: Why Deterministic Rules Alone Produce Spoken Gibberish
&lt;/h2&gt;

&lt;p&gt;My first pass was purely rule-based. I wrote regexes: speak lines that look like test results (&lt;code&gt;N passed&lt;/code&gt;, &lt;code&gt;N failed&lt;/code&gt;), speak lines that look like errors, drop everything else.&lt;/p&gt;

&lt;p&gt;The result was still gibberish. Not because the rules were wrong — they correctly classified the &lt;em&gt;class&lt;/em&gt; of content — but because individual tool outputs routinely contain content that passes classification while being unspeakable as audio.&lt;/p&gt;

&lt;p&gt;Here's a concrete example. A Bash tool output from a linting run might classify as "error output" (correct — it contains errors) and pass the should-speak gate, then get handed to TTS as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;src/auth/middleware.ts:47:12 - error TS2345: Argument of type 'string | undefined'
is not assignable to parameter of type 'string'.

&lt;/span&gt;&lt;span class="gp"&gt;46    const token = headers['authorization']?.split(' ')[1];&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not speakable. It's a syntax-dense, whitespace-dependent error format. TTS will read it character by character, including the tildes.&lt;/p&gt;

&lt;p&gt;But more insidious: a lot of agent output that &lt;em&gt;looks&lt;/em&gt; like structured data to a human eye is actually gibberish tokens to a TTS engine. Git commit SHAs (&lt;code&gt;a3f9b2c1d4e5&lt;/code&gt;), UUIDs (&lt;code&gt;f47ac10b-58cc-4372-a567-0e02b2c3d479&lt;/code&gt;), base64 blobs, hex color codes, diff hunk headers (&lt;code&gt;@@ -23,7 +28,4 @@&lt;/code&gt;), &lt;code&gt;ls -l&lt;/code&gt; permission triplets — these all slip through regex classifiers because they look like "output" but sound like keyboard mashing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Two-Stage Filter
&lt;/h3&gt;

&lt;p&gt;The solution is two gates in sequence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 1: Should we speak this event at all?&lt;/strong&gt; This is the &lt;code&gt;ContentRouter&lt;/code&gt;. For structured signals — test counts, error totals, build success/failure, final agent answers — the answer is deterministic. The router knows what a test result line looks like (&lt;code&gt;\d+ passed&lt;/code&gt;, &lt;code&gt;\d+ failed&lt;/code&gt;). It knows a &lt;code&gt;Stop&lt;/code&gt; hook event is the agent's final answer and should almost always be spoken. It knows a &lt;code&gt;Read&lt;/code&gt; tool invocation is never worth narrating.&lt;/p&gt;

&lt;p&gt;For the ambiguous middle — a Bash tool that ran something you can't immediately classify — the router consults a local LLM judge. The judge receives the raw stdout alongside compact context built from signals already on the event: which tool ran, a short target hint (e.g. "ran the tests in test_router"), and whether similar output was recently spoken this session. The verdict must be exactly &lt;code&gt;SPEAK&lt;/code&gt;; anything else is treated as &lt;code&gt;SKIP&lt;/code&gt;. This means model weirdness degrades to silence, not spoken garbage.&lt;/p&gt;
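
&lt;p&gt;Gate 1 can be sketched in a few lines. This is a simplified illustration, not the real &lt;code&gt;ContentRouter&lt;/code&gt;: the function names and the &lt;code&gt;judge&lt;/code&gt; callable are hypothetical, &lt;code&gt;Stop&lt;/code&gt; events are always spoken here (the real router says "almost always"), and trimming whitespace before comparing the verdict is my assumption.&lt;/p&gt;

```python
import re
from typing import Callable, Optional

# Patterns quoted in the article; everything else here is a hypothetical sketch.
TEST_COUNT = re.compile(r"\d+ (?:passed|failed)")


def parse_verdict(raw: str) -> bool:
    """Only the exact token SPEAK counts; any other reply means stay silent."""
    return raw.strip() == "SPEAK"


def should_speak(hook: str, tool: str, stdout: str,
                 judge: Optional[Callable[[str], str]] = None) -> bool:
    if hook == "Stop":              # final answer: deterministic yes
        return True
    if tool == "Read":              # file reads: deterministic no
        return False
    if TEST_COUNT.search(stdout):   # structured test result: deterministic yes
        return True
    if judge is None:               # no model configured: ambiguous output is skipped
        return False
    try:
        return parse_verdict(judge(stdout))
    except Exception:
        return False                # a failing judge degrades to silence
```

&lt;p&gt;Note that a chatty reply such as "Sure! SPEAK" is rejected. The strict comparison is what makes model weirdness fall back to silence.&lt;/p&gt;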

&lt;p&gt;&lt;strong&gt;Gate 2: Is the text actually speakable?&lt;/strong&gt; Even after gate 1 passes, the text gets a speakability check in &lt;code&gt;is_speakable()&lt;/code&gt; in &lt;code&gt;daemon/text_utils.py&lt;/code&gt;. Several things happen here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Strip code artifacts: &lt;code&gt;===&lt;/code&gt; → &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; → &lt;code&gt;and&lt;/code&gt;, &lt;code&gt;!=&lt;/code&gt; → &lt;code&gt;not equal&lt;/code&gt;, &lt;code&gt;=&amp;gt;&lt;/code&gt; → &lt;code&gt;to&lt;/code&gt;. Drop git SHAs (hex strings mixing letters and digits in a run of 7+), UUIDs, base64 blobs, &lt;code&gt;@@...@@&lt;/code&gt; diff hunks, ISO-8601 timestamps, env-var assignment dumps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check that the normalized text contains real words. The daemon loads &lt;code&gt;/usr/share/dict/words&lt;/code&gt; on macOS; on Linux (where the system dictionary is absent by default) it falls back to a bundled public-domain wordlist (&lt;code&gt;daemon/data/words.txt.gz&lt;/code&gt;). A conservative inflectional stemmer handles the fact that most system dictionaries store base forms only: "passed" isn't in the dictionary, but stripping &lt;code&gt;-ed&lt;/code&gt; yields "pass", which is.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Drop text where the real-word ratio is too low, or where a vowelless non-acronym token dominates in a low-real-word context.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
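
&lt;p&gt;The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in &lt;code&gt;daemon/text_utils.py&lt;/code&gt;: the vocabulary is a tiny stand-in for a real dictionary, the 0.6 ratio threshold is invented, the suffix list is my guess at "conservative", and the symbol rewrites and the vowelless-token rule are left out.&lt;/p&gt;

```python
import re

# Tiny stand-in vocabulary; the real gate loads a full dictionary file.
WORDS = {"all", "test", "pass", "fail", "build", "error", "in", "the", "auth", "module"}

UUID = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.I)
HUNK = re.compile(r"@@.*?@@")
HEX_RUN = re.compile(r"\b[0-9a-f]{7,}\b", re.I)


def _is_sha_like(token: str) -> bool:
    """A hex run that mixes letters and digits, as a git SHA usually does."""
    return any(c.isdigit() for c in token) and any(c.isalpha() for c in token)


def normalize(text: str) -> str:
    """Drop UUIDs, diff hunk headers, and SHA-like hex runs of 7+ characters."""
    text = UUID.sub(" ", text)
    text = HUNK.sub(" ", text)
    return HEX_RUN.sub(lambda m: " " if _is_sha_like(m.group()) else m.group(), text)


def is_real_word(token: str) -> bool:
    token = token.lower()
    if token in WORDS:
        return True
    # Conservative inflectional stemming: try dropping one common suffix.
    for suffix in ("ed", "ing", "es", "s"):
        if token.endswith(suffix) and token[: -len(suffix)] in WORDS:
            return True
    return False


def is_speakable(text: str, min_ratio: float = 0.6) -> bool:
    """True when enough of the remaining alphabetic tokens are real words."""
    tokens = re.findall(r"[A-Za-z]+", normalize(text))
    if not tokens:
        return False
    real = sum(is_real_word(t) for t in tokens)
    return real / len(tokens) >= min_ratio
```

&lt;p&gt;With this sketch, "All tests passed in the auth module" passes the gate, while a bare diff hunk header or a line of SHAs and UUIDs does not.&lt;/p&gt;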

&lt;p&gt;Across roughly 4,500 real speak decisions in my production shadow log, the rate of spoken code-artifact gibberish reached 0%. Getting there took nine rounds of filter refinement: early passes got the markup class to zero but were blind to non-markup gibberish (orphan punctuation, non-word tokens from ps output and agent IDs); later rounds addressed those.&lt;/p&gt;

&lt;p&gt;Here's a terminal transcript showing the pipeline in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Agent runs: pytest tests/ -v&lt;/span&gt;

&lt;span class="c"&gt;# [daemon receives PostToolUse event for Bash]&lt;/span&gt;
&lt;span class="c"&gt;# ContentRouter:&lt;/span&gt;
&lt;span class="c"&gt;#   tool=Bash, cmd="pytest tests/ -v"&lt;/span&gt;
&lt;span class="c"&gt;#   → _is_test_command() True&lt;/span&gt;
&lt;span class="c"&gt;#   → extracts "23 passed, 2 failed" from stdout&lt;/span&gt;
&lt;span class="c"&gt;#   → context_hint: "test result from the test suite"&lt;/span&gt;
&lt;span class="c"&gt;#   → should_speak: True, route: test_result&lt;/span&gt;
&lt;span class="c"&gt;# GenerateStage: synthesize "In the test suite: 23 passed, 2 failed" → /tmp/tts_chunk_482.wav&lt;/span&gt;
&lt;span class="c"&gt;# PlaybackStage: afplay /tmp/tts_chunk_482.wav&lt;/span&gt;

&lt;span class="c"&gt;# What you actually hear:&lt;/span&gt;
&lt;span class="s2"&gt;"In the test suite: 23 passed, 2 failed"&lt;/span&gt;

&lt;span class="c"&gt;# Agent runs: cat package.json | grep '"version"'&lt;/span&gt;
&lt;span class="c"&gt;# ContentRouter: tool=Bash, short stdout, no test pattern → SKIP (silent)&lt;/span&gt;

&lt;span class="c"&gt;# Agent final answer (Stop hook):&lt;/span&gt;
&lt;span class="c"&gt;# ContentRouter: stop_event → PRIORITY_HIGH → summarize if long&lt;/span&gt;
&lt;span class="c"&gt;# Heard: "Done. Updated the auth middleware, fixed the token null check, all 23 tests passing."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The macOS afplay Bug: A Concrete Engineering Anecdote
&lt;/h2&gt;

&lt;p&gt;When I added the &lt;code&gt;say&lt;/code&gt;/&lt;code&gt;espeak&lt;/code&gt; fallback engine (the zero-dependency path that works with no ML and no network), I ran into a silent failure that took some digging to understand.&lt;/p&gt;

&lt;p&gt;The TTS pipeline works like this: &lt;code&gt;GenerateStage&lt;/code&gt; calls &lt;code&gt;engine.synthesize(text, audio_path)&lt;/code&gt; which writes an audio file, then returns &lt;code&gt;True&lt;/code&gt; on success. &lt;code&gt;PlaybackStage&lt;/code&gt; then calls &lt;code&gt;afplay &amp;lt;audio_path&amp;gt;&lt;/code&gt; separately. The two stages are decoupled intentionally — generation and playback are different concerns.&lt;/p&gt;

&lt;p&gt;The bug: &lt;code&gt;GenerateStage&lt;/code&gt; was naming all non-Kokoro outputs with a &lt;code&gt;.mp3&lt;/code&gt; extension. That's fine for &lt;code&gt;edge-tts&lt;/code&gt;, which actually writes MP3 bytes. But &lt;code&gt;SystemTTSEngine&lt;/code&gt; wraps macOS &lt;code&gt;say&lt;/code&gt;, which writes WAVE/AIFF output. So the pipeline was writing RIFF/WAVE bytes into a file called &lt;code&gt;tts_chunk_482.mp3&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On Linux, &lt;code&gt;mpv&lt;/code&gt; and &lt;code&gt;ffplay&lt;/code&gt; content-sniff the file header. They play WAVE bytes regardless of what the filename says. The tests passed. The CI on macOS also passed because the tests used mocked subprocess calls.&lt;/p&gt;

&lt;p&gt;The production failure looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;afplay /tmp/tts_chunk_482.mp3
Error: AudioFileOpen failed &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'dta?'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit code 1. No audio. The daemon logged a PlaybackStage failure and moved on. &lt;code&gt;synthesize()&lt;/code&gt; had returned &lt;code&gt;True&lt;/code&gt; — the file existed, had nonzero size, &lt;code&gt;say&lt;/code&gt; exited 0. The failure was invisible to the generation stage.&lt;/p&gt;

&lt;p&gt;The root cause: macOS &lt;code&gt;afplay&lt;/code&gt; (and &lt;code&gt;AudioToolbox&lt;/code&gt; generally) selects the audio decoder &lt;strong&gt;from the file extension&lt;/strong&gt;, not the file's byte content. WAVE bytes in a &lt;code&gt;.mp3&lt;/code&gt; file fail to open. The same bytes in a &lt;code&gt;.wav&lt;/code&gt; file play fine.&lt;/p&gt;

&lt;p&gt;The fix is &lt;code&gt;_audio_ext_for(engine)&lt;/code&gt; in &lt;code&gt;generate_stage.py&lt;/code&gt; — it returns &lt;code&gt;"wav"&lt;/code&gt; for any engine in &lt;code&gt;_WAV_ENGINES&lt;/code&gt; (kokoro, say, espeak, system) and &lt;code&gt;"mp3"&lt;/code&gt; for edge-tts. But the &lt;em&gt;lesson&lt;/em&gt; is more interesting: mocked subprocess tests cannot catch format/extension mismatches. The real check is an integration test that runs the actual &lt;code&gt;say&lt;/code&gt; binary and then calls &lt;code&gt;afinfo &amp;lt;output_path&amp;gt;&lt;/code&gt; to verify AudioToolbox can open it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
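&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;platform&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;

&lt;span class="c1"&gt;# SystemTTSEngine comes from the daemon package (import path omitted here).&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.skipif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;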
    &lt;span class="n"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;system&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Darwin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;say&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;afinfo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;macOS + say + afinfo required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_system_real_say_writes_afplay_openable_wav&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;real.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SystemTTSEngine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;synthesize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan 3d regression check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RIFF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WAVE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;rc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;afinfo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
                        &lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEVNULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEVNULL&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;rc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test lives in CI and runs on macOS runners. It would have caught the original bug.&lt;/p&gt;
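
&lt;p&gt;The extension fix itself is tiny. Here is a sketch of the mapping, assuming the engine names listed above; the function bodies and the &lt;code&gt;chunk_path&lt;/code&gt; helper are hypothetical, and this version treats every engine outside &lt;code&gt;_WAV_ENGINES&lt;/code&gt; as MP3, which may be broader than the real code.&lt;/p&gt;

```python
# Engine names come from the article; the function bodies are a guess at the shape.
_WAV_ENGINES = frozenset({"kokoro", "say", "espeak", "system"})


def audio_ext_for(engine: str) -> str:
    """Return the file extension matching the bytes the engine really writes."""
    return "wav" if engine.lower() in _WAV_ENGINES else "mp3"


def chunk_path(engine: str, chunk_id: int, directory: str = "/tmp") -> str:
    """Build the output path so the name always agrees with the container format."""
    return f"{directory}/tts_chunk_{chunk_id}.{audio_ext_for(engine)}"
```

&lt;p&gt;Deriving the extension from the engine in one place means the generation and playback stages cannot disagree about the format.&lt;/p&gt;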




&lt;h2&gt;
  
  
  Graceful Degradation: The LLM Is an Upgrade, Not a Requirement
&lt;/h2&gt;

&lt;p&gt;The design decision I'm most glad I made: the LLM is optional.&lt;/p&gt;

&lt;p&gt;The naive approach to "smart content filtering" is to make the LLM a hard dependency. The problem is that this means the daemon fails or produces no output if Ollama isn't running, the model isn't downloaded, or the model returns garbage.&lt;/p&gt;

&lt;p&gt;Instead, the daemon has three tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Full LLM mode.&lt;/strong&gt; Ollama (or any OpenAI-compatible endpoint) is configured and reachable. The judge classifies ambiguous content; the summarizer condenses long output. The model I recommend is &lt;code&gt;qwen2.5-coder:1.5b&lt;/code&gt;: 986 MB, fast on CPU, and good at following strict single-word verdict instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: Deterministic mode (&lt;code&gt;llm_provider.type = "null"&lt;/code&gt;).&lt;/strong&gt; No model. The router still runs all deterministic rules: test counts, error lines, build status, final answers. Long content is truncated at a character threshold rather than summarized. You lose the judgment on ambiguous Bash output, but you still get the important signals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Zero-dependency audio.&lt;/strong&gt; Even if you have no Ollama and no Kokoro, &lt;code&gt;say&lt;/code&gt; on macOS and &lt;code&gt;espeak&lt;/code&gt;/&lt;code&gt;espeak-ng&lt;/code&gt; on Linux are usually already installed. No Python ML deps, no model downloads, no network.&lt;/p&gt;

&lt;p&gt;The configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"llm_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder:1.5b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"base_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:11434"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"voice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kokoro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bf_emma"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To drop to deterministic mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"llm_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"voice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"say"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;null&lt;/code&gt; provider still participates in the &lt;code&gt;LLMProvider&lt;/code&gt; interface — it returns deterministic outputs from &lt;code&gt;provider.judge()&lt;/code&gt; and &lt;code&gt;provider.summarize()&lt;/code&gt;. Nothing downstream knows the difference.&lt;/p&gt;
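
&lt;p&gt;A null-object provider of this kind might look like the sketch below. Only the names &lt;code&gt;LLMProvider&lt;/code&gt;, &lt;code&gt;judge()&lt;/code&gt;, and &lt;code&gt;summarize()&lt;/code&gt; come from the project; the method signatures, the fixed &lt;code&gt;SKIP&lt;/code&gt; verdict, and the trailing ellipsis are assumptions.&lt;/p&gt;

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Assumed interface shape; the real signatures may differ."""

    def judge(self, text: str, context: str) -> str: ...
    def summarize(self, text: str, max_chars: int) -> str: ...


class NullProvider:
    """No model: a fixed verdict and plain truncation behind the same interface."""

    def judge(self, text: str, context: str) -> str:
        # Without a model, ambiguous output is never spoken.
        return "SKIP"

    def summarize(self, text: str, max_chars: int) -> str:
        # Truncate at a character threshold instead of summarizing.
        if len(text) > max_chars:
            return text[:max_chars].rstrip() + "..."
        return text
```

&lt;p&gt;Because callers only see &lt;code&gt;judge()&lt;/code&gt; and &lt;code&gt;summarize()&lt;/code&gt;, swapping providers is a config change rather than a code path.&lt;/p&gt;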

&lt;p&gt;For other LLM backends, the &lt;code&gt;openai_compat&lt;/code&gt; provider takes a &lt;code&gt;base_url&lt;/code&gt; and optional API key. That's the same adapter for LM Studio, llama.cpp server, vLLM, Groq, or OpenAI itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Platform Audio Format Safety
&lt;/h2&gt;

&lt;p&gt;The platform layer abstracts OS-specific audio playback. macOS uses &lt;code&gt;afplay&lt;/code&gt;. Linux uses a decoder-first chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LINUX_PLAYER_CHAIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffplay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# container-agnostic decoder
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mpv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# container-agnostic decoder
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pw-play&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# PipeWire: WAV only
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paplay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# PulseAudio: WAV only
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aplay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# ALSA: WAV only
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The order matters. &lt;code&gt;ffplay&lt;/code&gt; and &lt;code&gt;mpv&lt;/code&gt; content-sniff and handle any container. &lt;code&gt;pw-play&lt;/code&gt;, &lt;code&gt;paplay&lt;/code&gt;, and &lt;code&gt;aplay&lt;/code&gt; are WAV-only. By probing &lt;code&gt;shutil.which()&lt;/code&gt; in decoder-first order, the daemon uses the most capable player available. On a minimal system with only &lt;code&gt;aplay&lt;/code&gt;, it also defaults the engine to &lt;code&gt;say&lt;/code&gt;/&lt;code&gt;espeak&lt;/code&gt; (which writes WAV), so the format always matches the player.&lt;/p&gt;
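
&lt;p&gt;The probing step is simple enough to sketch. The chain is the one shown above; &lt;code&gt;pick_player&lt;/code&gt;, &lt;code&gt;player_supports&lt;/code&gt;, and the injectable &lt;code&gt;which&lt;/code&gt; parameter are hypothetical names I'm using for illustration.&lt;/p&gt;

```python
import shutil
from typing import Callable, Optional, Sequence

LINUX_PLAYER_CHAIN = ["ffplay", "mpv", "pw-play", "paplay", "aplay"]
WAV_ONLY_PLAYERS = {"pw-play", "paplay", "aplay"}


def pick_player(chain: Sequence[str] = LINUX_PLAYER_CHAIN,
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first player in the chain that is installed, or None."""
    for name in chain:
        if which(name):
            return name
    return None


def player_supports(player: str, ext: str) -> bool:
    """WAV-only players must never be handed an MP3 file."""
    return ext == "wav" or player not in WAV_ONLY_PLAYERS
```

&lt;p&gt;Passing &lt;code&gt;which&lt;/code&gt; as a parameter keeps the selection logic testable without installing any players.&lt;/p&gt;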

&lt;p&gt;Service installation is platform-native: &lt;code&gt;launchd&lt;/code&gt; on macOS with a generated plist, &lt;code&gt;systemd --user&lt;/code&gt; on Linux with &lt;code&gt;loginctl enable-linger&lt;/code&gt; so the daemon survives user session logout. The &lt;code&gt;SessionStart&lt;/code&gt; hook auto-launches the daemon if it isn't running.&lt;/p&gt;




&lt;h2&gt;
  
  
  CI Caught What My Local Gate Missed
&lt;/h2&gt;

&lt;p&gt;The Linux portability story has a good cautionary note.&lt;/p&gt;

&lt;p&gt;My speakability gate, &lt;code&gt;is_speakable()&lt;/code&gt;, loads &lt;code&gt;/usr/share/dict/words&lt;/code&gt; to check vocabulary ratios. On macOS, that file is always present (235,976 words). On Ubuntu CI runners, it's absent by default. So the gate was silently disabled on Linux: gibberish tokens that were correctly dropped on macOS were kept on Linux.&lt;/p&gt;

&lt;p&gt;My local gate (663 tests, all passing) didn't catch this. My static portability audit checked for macOS-only commands (&lt;code&gt;afplay&lt;/code&gt;, &lt;code&gt;launchctl&lt;/code&gt;, &lt;code&gt;say&lt;/code&gt;) but not for filtering logic that diverges on a system data file.&lt;/p&gt;

&lt;p&gt;The CI matrix caught it on first push: all three macOS cells green, all three Ubuntu cells red on &lt;code&gt;test_is_speakable_drops_noise&lt;/code&gt;. The fix was bundling a public-domain wordlist (&lt;code&gt;daemon/data/words.txt.gz&lt;/code&gt;) as a fallback when the system dictionary isn't present, with a test that forces the bundled dict path. The lesson: a cross-platform CI matrix catches data-file and locale divergences that a command-level audit cannot.&lt;/p&gt;
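
&lt;p&gt;A fallback loader along these lines is one way to do it. The system path is the real one; &lt;code&gt;load_words&lt;/code&gt; and its parameters are hypothetical, and taking both paths as arguments is what lets a test force the bundled branch.&lt;/p&gt;

```python
import gzip
from pathlib import Path

SYSTEM_DICT = Path("/usr/share/dict/words")


def load_words(system_path: Path, bundled_gz: Path) -> frozenset:
    """Prefer the system dictionary; fall back to the bundled gzip wordlist."""
    if system_path.is_file():
        raw = system_path.read_text(encoding="utf-8", errors="ignore")
    else:
        with gzip.open(bundled_gz, "rt", encoding="utf-8") as handle:
            raw = handle.read()
    # One word per line; lowercase so lookups are case-insensitive.
    return frozenset(w.strip().lower() for w in raw.splitlines() if w.strip())
```

&lt;p&gt;In a test, pointing &lt;code&gt;system_path&lt;/code&gt; at a file that does not exist exercises the same branch an Ubuntu runner takes.&lt;/p&gt;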




&lt;h2&gt;
  
  
  What v0.1.0 Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;This is a first release from a solo author. The filter brain works well on my actual Claude Code sessions. The CI matrix is green on macOS and Ubuntu across Python 3.11–3.13. The quality bar is enforced rather than aspirational: &lt;code&gt;make verify&lt;/code&gt; fails if code-artifact gibberish leaks to speech, if classification regressions appear, or if the test count drops.&lt;/p&gt;

&lt;p&gt;Honest caveats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kokoro&lt;/strong&gt; (the local neural voice) is Apple Silicon only and requires a separate &lt;code&gt;mlx-audio&lt;/code&gt; interpreter. Point &lt;code&gt;MLX_PYTHON&lt;/code&gt; at it. &lt;code&gt;edge-tts&lt;/code&gt; works everywhere but needs internet. &lt;code&gt;say&lt;/code&gt;/&lt;code&gt;espeak&lt;/code&gt; work with no deps anywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Volume control&lt;/strong&gt; is macOS-only right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows&lt;/strong&gt; has no native service install — WSL2 or Docker would work, but it's not wired.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setup&lt;/strong&gt; currently requires manual steps (see below). A &lt;code&gt;/tts:setup&lt;/code&gt; command that handles calibration and service install is the next milestone; it is not in this release.&lt;/li&gt;
&lt;li&gt;This is v0.1.0. I'd love feedback, bug reports, and contributors.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Manual setup (current):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/chendrizzy/claude-tts
&lt;span class="nb"&gt;cd &lt;/span&gt;claude-tts
uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; edge
&lt;span class="nb"&gt;cp &lt;/span&gt;config.example.json config.json
&lt;span class="c"&gt;# Edit config.json to set your engine and LLM backend&lt;/span&gt;
&lt;span class="c"&gt;# Wire the hooks from hooks/hooks.json into your Claude Code settings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires Python &amp;gt;= 3.11 and &lt;code&gt;uv&lt;/code&gt;. For Kokoro, create a separate &lt;code&gt;mlx-audio&lt;/code&gt; Python environment and set &lt;code&gt;MLX_PYTHON&lt;/code&gt; to point at it. For Ollama, run &lt;code&gt;ollama pull qwen2.5-coder:1.5b&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/chendrizzy/claude-tts" rel="noopener noreferrer"&gt;github.com/chendrizzy/claude-tts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The filter brain is the part I'm most interested in improving. If you use Claude Code for long autonomous runs and have opinions on what &lt;em&gt;should&lt;/em&gt; be spoken vs. what should stay silent, I'd genuinely like to hear them: open an issue or leave a comment here.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>tts</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
