DEV Community

Adrien Cossa

Text-to-Speech for Claude Code — Hear What the Agent Is Doing

Claude Code can already listen to you. Run /voice and you get push-to-talk dictation — you speak, it transcribes into the prompt (docs). What it does not do is talk back. When I leave a long task running, I either babysit the terminal or miss the moment it finishes or asks a question.

So I added the other half: text-to-speech. A hook reads the agent's replies aloud. I can be in another room and still hear "done, tests pass" or "I need a decision here". This post has two parts — a small recipe anyone can paste into their config, and how I wired the same idea into my own tooling for the times I'm not at my desk.

This is a personal hack, not a Claude Code feature. It reads short text aloud after the agent stops. That's it. No wake words, no conversation, no reading code blocks (you don't want that).

The recipe: a hook + your OS speech command

Claude Code hooks run a shell command on lifecycle events. The two that matter here:

  • Stop: fires when the agent finishes responding. It receives a JSON payload on stdin whose transcript_path field holds the path to the conversation transcript.
  • Notification: fires when Claude Code wants your attention (a permission prompt, an idle nudge). It receives a JSON payload on stdin with the notification text in its message field.

Notification is the simplest win, so start there. Every OS ships a speech command: say on macOS, spd-say or espeak-ng on Linux, and a one-line PowerShell call on Windows.
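If you would rather not hard-code one command per machine, a small helper can pick whichever of those commands is installed. This is a minimal Python sketch, not part of Claude Code: the names pick_speech_command and speak are made up for illustration, and Windows is left out because the PowerShell route needs a different invocation.

```python
import platform
import shutil
import subprocess

# Commands named in the post, in order of preference per OS.
# Each one reads the text to speak from stdin.
CANDIDATES = {
    "Darwin": [["say"]],
    "Linux": [["spd-say", "-e"], ["espeak-ng"]],
}


def pick_speech_command(system=None, which=shutil.which):
    """Return the first installed speech command for this OS, or None."""
    system = system or platform.system()
    for cmd in CANDIDATES.get(system, []):
        if which(cmd[0]):
            return cmd
    return None


def speak(text):
    """Pipe text to the chosen command; stay silent if none is installed."""
    cmd = pick_speech_command()
    if cmd and text.strip():
        subprocess.run(cmd, input=text, text=True, check=False)
```

Returning None instead of raising keeps a hook built on this quiet on machines with no speech command at all.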

Here is a Notification hook that speaks the message. Put it in ~/.claude/settings.json:

{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.message // empty' | say"
          }
        ]
      }
    ]
  }
}

jq reads the message field from the JSON on stdin, and say (macOS) reads piped text aloud. On Linux swap say for spd-say -e or espeak-ng, both of which also read stdin. On Windows, point the command at PowerShell:

"command": "jq -r '.message // empty' | powershell -Command \"Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak([Console]::In.ReadToEnd())\""

That covers the "needs your attention" case. If you also want the agent to read its actual reply, add a Stop hook. The wrinkle: Stop gives you the transcript path, not the text. The transcript is JSONL (one JSON object per line), so you pull the last assistant text block out of it:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "jq -rs 'map(select(.type==\"assistant\")) | last | .message.content[]? | select(.type==\"text\") | .text' \"$(jq -r .transcript_path)\" 2>/dev/null | head -c 600 | say"
          }
        ]
      }
    ]
  }
}
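The same extraction can also live in a small script, which is easier to harden than a one-liner. The sketch below assumes the transcript layout that the jq filter relies on (entries with type "assistant" and a message.content list of typed blocks); last_assistant_text is a hypothetical helper name. Unlike the filter, it walks backwards until it finds an assistant entry that actually contains text, so a trailing tool call does not leave it with nothing to say.

```python
import json


def last_assistant_text(jsonl_lines):
    """Return the text of the most recent assistant entry that has any, else ""."""
    for line in reversed(list(jsonl_lines)):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue
        # Keep only the text blocks; tool calls and other block types are ignored.
        texts = [
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        ]
        if any(texts):
            return "\n".join(texts)
    return ""
```

A Stop hook script would parse the payload with json.load(sys.stdin), open the file named by its transcript_path field, and pass the open file to last_assistant_text.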

A few honest caveats, because this is where it gets rough:

  • Cap the length. head -c 600 stops say from droning through a 4 KB status report. It counts bytes, not characters, so it can cut a non-ASCII character in half. Pick your own limit.
  • Strip markdown if you can. Read aloud, code fences and URLs are noise. The recipe above doesn't strip them — for a one-liner it's tolerable, but a real version should.
  • The transcript shape is not a stable public contract. The jq filter above matches the current JSONL layout. If Claude Code changes it, the filter breaks. Treat it as a hack, not an API.
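The first two caveats can be handled together before the text reaches the speech command. What follows is a rough Python sketch, assuming a regex pass is good enough for short status messages (it is not a markdown parser). speakable is a hypothetical name, and the default of 600 simply mirrors the limit used in the recipe.

```python
import re


def speakable(text, limit=600):
    """Drop code fences, URLs and simple markdown markers, then cap the length."""
    # Fenced code blocks, including one left unterminated at the end.
    text = re.sub(r"```.*?(```|\Z)", " ", text, flags=re.DOTALL)
    # [label](url) becomes just the label.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Bare URLs are dropped entirely.
    text = re.sub(r"https?://\S+", " ", text)
    # Heading and blockquote markers at the start of a line.
    text = re.sub(r"^\s*(#{1,6}|>)\s+", "", text, flags=re.MULTILINE)
    # Emphasis asterisks and inline-code backticks.
    text = re.sub(r"[`*]+", "", text)
    # Collapse newlines and runs of spaces.
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Cut at the last word boundary inside the limit (characters, not bytes).
    return text[:limit].rsplit(" ", 1)[0]
```

Underscores are left alone on purpose, so identifiers such as transcript_path are not mangled when read aloud.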

For most people the Notification hook alone is enough, and it's the part least likely to break.

The t3 extra: speak settings

I keep my Claude Code automation in a project called teatree. It has a t3 speak command driven by one [teatree.speak] table:

[teatree.speak]
local = "dm"   # what plays on this machine's speakers: "dm" | "all" | "off"
slack = true   # attach a spoken audio file to each bot→user Slack DM

local controls the speakers in front of you: dm reads only the bot's DMs to you, all also reads every agent turn aloud, off is silent. slack attaches a spoken audio file to each bot→user DM. The two are independent, and both default to off, so nothing is spoken until you configure them.
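To make the two settings concrete, the rules above can be written as a small decision function. This is an illustrative Python sketch only, not teatree's actual code: the function name destinations, the is_dm flag, and the "speakers" and "slack" labels are invented for the example.

```python
def destinations(local="off", slack=False, is_dm=False):
    """Return where one message would be spoken, given the two settings.

    local: "dm" | "all" | "off"   (defaults to "off")
    slack: attach audio to bot-to-user DMs (defaults to False)
    is_dm: True for a bot-to-user DM, False for an ordinary agent turn
    """
    out = []
    # "all" speaks every turn; "dm" speaks only DMs; anything else is silent.
    if local == "all" or (local == "dm" and is_dm):
        out.append("speakers")
    # The Slack attachment only applies to DMs and ignores the local setting.
    if slack and is_dm:
        out.append("slack")
    return out
```

With the table shown earlier (local = "dm", slack = true), a DM goes to both destinations and an ordinary agent turn goes to neither.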

Two destinations because there are two places I am. At the desk, local plays through the speakers the moment a DM lands — no clicking. Away from it, slack is what I reach for: the spoken text arrives as an audio file attached to the DM, and on the phone I press play. Not hands-free, but I can listen while moving instead of stopping to read.

Two operational notes. The voice comes from macOS say. And slack needs the bot's file-upload permission, so an existing bot has to be reinstalled once to grant it.

Where it stands

The hook recipe is the part I'd actually recommend trying — it's a few lines and it degrades gracefully. The teatree side is tied to my own setup, so take it as one way to structure the same idea rather than something to copy verbatim.

I'm still figuring out how much to read aloud. local = "all" gets chatty fast. dm is calmer but misses things. If you try this, I'd be curious what threshold works for you.
