Building Audio Alerts with TTS for AI Coding Agents
Ever wish your AI coding agent could actually speak to you? I've been working on adding audio alerts with text-to-speech to Gemini CLI, and here's how it works.
The Problem
When running AI coding agents in the background, you often miss important moments:
- When the agent needs your permission to run a tool
- When an error occurs
- When a long task completes
You either have to constantly check the terminal or miss important updates entirely.
The Solution: Audio Alerts + TTS
I built a hook system that plays a themed sound followed by a spoken message.
Sounds
Each theme has its own sound:
- retro - 8-bit game sounds
- espionage - High-tech clicks
- hero - Dramatic fanfare
- portal - Magical transitions
- premium - Elegant chimes
Spoken Messages
After each sound, a TTS message plays:
| Event | Retro Theme | Espionage Theme |
|---|---|---|
| Permission needed | "Permission needed" | "Agent requesting permission" |
| Error | "Error detected" | "Critical failure detected" |
| Done | "Game over. You win" | "Mission accomplished" |
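The table above can be sketched as a small lookup function. This is only an illustration, not the extension's actual code: the function name alert_message and the event keys permission, error, and done are hypothetical, and only the two themes shown in the table are covered.

```shell
# Hypothetical helper: look up the spoken message for a theme/event pair.
# Only the two themes from the table are included; anything else returns
# a non-zero status so the caller can decide on a fallback.
alert_message() {
  local theme=$1 event=$2
  case "$theme:$event" in
    retro:permission)     echo "Permission needed" ;;
    retro:error)          echo "Error detected" ;;
    retro:done)           echo "Game over. You win" ;;
    espionage:permission) echo "Agent requesting permission" ;;
    espionage:error)      echo "Critical failure detected" ;;
    espionage:done)       echo "Mission accomplished" ;;
    *) return 1 ;;
  esac
}
```

Calling alert_message espionage done prints the espionage "done" line, which can then be passed to whatever TTS command the platform provides.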
Cross-Platform Support
The implementation works on:
- macOS: Uses the built-in say command and afplay
- Linux: Supports espeak, gtts-cli, and play
- Windows: Uses PowerShell's System.Speech and SoundPlayer
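Here is a rough sketch of how platform detection for the speech side might look. The function names tts_backend and speak_with_backend are made up for this example, the preference for espeak over gtts-cli on Linux is an assumption, and the real extension may pick tools differently. The gtts-cli branch also assumes that SoX's play was built with MP3 support.

```shell
# Hypothetical helper: map an OS name (default: output of `uname -s`)
# to one of the TTS tools listed above.
tts_backend() {
  local os=${1:-$(uname -s)}
  case "$os" in
    Darwin) echo "say" ;;
    Linux)
      # Assumption: prefer espeak when installed, else try gtts-cli.
      if command -v espeak >/dev/null 2>&1; then
        echo "espeak"
      else
        echo "gtts-cli"
      fi ;;
    MINGW*|MSYS*|CYGWIN*) echo "powershell" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: speak a message using the detected backend.
speak_with_backend() {
  local message=$1 escaped
  case "$(tts_backend)" in
    say)      say "$message" ;;
    espeak)   espeak "$message" ;;
    gtts-cli) gtts-cli "$message" | play -t mp3 - ;;
    powershell)
      # Double any single quotes so the text is safe inside a
      # PowerShell single-quoted string.
      escaped=$(printf '%s' "$message" | sed "s/'/''/g")
      powershell.exe -NoProfile -Command \
        "Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('$escaped')" ;;
    *) return 1 ;;
  esac
}
```

Keeping detection separate from playback makes the detection part easy to check on any machine, since tts_backend accepts the OS name as an argument.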
How It Works
The hook intercepts Gemini CLI events. For each one, it plays the theme's sound, pauses briefly, and then speaks the message:
# Play sound
play_audio "$ASSETS_DIR/done.wav"
# Wait a bit
sleep 0.3
# Speak the message
speak "Task complete"
Everything runs in the background without blocking, so the agent keeps working while you get audio feedback.
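One common way to get this non-blocking behavior in a shell hook is to run the whole sequence in a background subshell. The sketch below assumes that approach, and the extension may do it differently. The play_audio and speak functions here are print-only stand-ins so the example runs anywhere, and alert_async is a hypothetical name.

```shell
# Stand-ins for the real players so the sketch runs on any machine.
# A real hook would call the platform's audio and TTS tools instead.
play_audio() { printf 'sound: %s\n' "$1"; }
speak()      { printf 'speech: %s\n' "$1"; }

# Run sound -> short pause -> speech in a background subshell, so the
# caller returns immediately instead of waiting for the audio to finish.
alert_async() {
  local sound_file=$1 message=$2
  (
    play_audio "$sound_file"
    sleep 0.3
    speak "$message"
  ) &
}
```

A call such as alert_async "$ASSETS_DIR/done.wav" "Task complete" returns right away. The sound and speech still play in order, because they run one after another inside the same subshell.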
Installation
Clone the repo:
git clone https://github.com/HainanZhao/gemini-extension-audio-alerts
cd gemini-extension-audio-alerts
Set your theme:
export AUDIO_ALERTS_THEME=retro
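For illustration, a hook script could read that variable along these lines. The resolve_theme name and the fallback to retro for unset or unrecognized values are assumptions made for this sketch, not documented behavior of the extension.

```shell
# Hypothetical helper: resolve the active theme from AUDIO_ALERTS_THEME.
# Assumption: unset or unrecognized values fall back to "retro".
resolve_theme() {
  local theme=${AUDIO_ALERTS_THEME:-retro}
  case "$theme" in
    retro|espionage|hero|portal|premium) echo "$theme" ;;
    *) echo "retro" ;;
  esac
}
```

Validating against the known theme names means a typo in the variable still produces an alert rather than a missing sound file.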
What's Next
Possible enhancements:
- Custom voice selection
- More themes
- Sound mixing
- Volume control
The code is open source and available at: https://github.com/HainanZhao/gemini-extension-audio-alerts
This post was written with assistance from AI coding agents.