Claude has a Voice Mode, but it is only available in Anthropic's consumer app. Here's how to add voice to Claude in your own projects using local speech processing.
What We're Building
A Python app that:
- Listens for a wake word ("Hey Claude")
- Transcribes what you say
- Sends text to Claude's API
- Speaks the response back
The voice processing runs locally using Picovoice, so audio never leaves your machine. Only the transcribed text is sent to Claude.
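The four steps above form a simple pipeline. The sketch below is only an illustration of that flow: `handle_one_request` and its four callable parameters are hypothetical names, not part of the Picovoice or Anthropic SDKs. Passing the steps in as functions means the flow can be tried with stand-ins before any microphone or API key is involved.

```python
from typing import Callable


def handle_one_request(
    wait_for_wake_word: Callable[[], None],
    transcribe: Callable[[], str],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one wake word -> transcript -> Claude -> speech cycle.

    All four callables are placeholders (assumptions for this sketch);
    the real versions are built in the sections that follow.
    """
    wait_for_wake_word()        # blocks until the wake phrase is heard
    transcript = transcribe()   # local speech-to-text
    reply = ask(transcript)     # only this text leaves the machine
    speak(reply)                # local text-to-speech
    return reply
```

With stand-in functions, for example `handle_one_request(lambda: None, lambda: "hi", str.upper, print)`, the order of the steps can be checked without any hardware.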
Setup
You'll need:
- Python 3.9+
- A mic and speakers
- Picovoice AccessKey (free from their console)
- Claude API key
Install all required Python SDKs and dependencies with a single terminal command:
pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic
These packages include:
- Porcupine Wake Word Python SDK: pvporcupine
- Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
- Orca Text-to-Speech Python SDK: pvorca
- Picovoice Python Recorder library: pvrecorder
- Picovoice Python Speaker library: pvspeaker
- Anthropic Python library: anthropic, used for Claude API integration
Train a Custom Wake Word
- Sign up for a free account at console.picovoice.ai
- Navigate to the Porcupine page
- Enter your wake phrase such as "Hey Claude" and test it using the microphone button
- Click "Train", select the target platform, and download the .ppn model file
For tips on designing an effective wake word, review the choosing a wake word guide.
Add Wake Word Detection
The following snippet captures audio from your default microphone and detects your custom wake word locally:
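import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    # Load the custom wake word model trained in the Picovoice Console
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )
    # The recorder must deliver frames of exactly the size Porcupine expects
    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()
    print("Listening...")
    while True:
        audio_frame = recorder.read()
        # process() returns the index of the detected keyword, or -1 if none
        if porcupine.process(audio_frame) >= 0:
            print("Heard wake word!")
            break
    # Release the microphone and the engine
    recorder.stop()
    recorder.delete()
    porcupine.delete()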
Speech to Text
After wake word detection, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:
import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )
    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()
    print("Listening...")
    transcript = ""
    while True:
        audio_frame = recorder.read()
        # Cheetah returns text as it becomes available, plus an endpoint flag
        partial_transcript, is_endpoint = cheetah.process(audio_frame)
        transcript += partial_transcript
        if is_endpoint:
            # The speaker has stopped talking: collect any remaining text
            transcript += cheetah.flush()
            break
    recorder.stop()
    recorder.delete()
    cheetah.delete()
    return transcript.strip()
Each completed segment returns text, which is ready to send to Claude's API.
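If nothing intelligible is said after the wake word, the transcript may come back empty or as stray whitespace, which is not a useful prompt. The helper below is a minimal sketch of a guard to run before calling Claude. `clean_transcript` and its `min_words` threshold are hypothetical names chosen for this example.

```python
from typing import Optional


def clean_transcript(transcript: str, min_words: int = 1) -> Optional[str]:
    """Collapse runs of whitespace and return None if too little was heard.

    `min_words` is an arbitrary threshold for this sketch; tune it to taste.
    """
    text = " ".join(transcript.split())
    if len(text.split()) < min_words:
        return None
    return text
```

A caller could skip the API request and go back to listening whenever this returns `None`.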
Send to Claude
Once speech is transcribed, send the text to Claude using Anthropic's messages endpoint:
from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,
        messages=[
            {"role": "user", "content": transcript}
        ]
    )
    return response.content[0].text
This minimal integration sends text to Claude while all speech processing remains on-device.
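The function above sends each question on its own, so Claude has no memory of earlier turns. One way to support follow-up questions is to keep a running list of messages and pass it as `messages` on every call. The sketch below assumes a small helper class, `ConversationHistory`, which is a hypothetical name and not part of the Anthropic library. It trims the oldest user/assistant pair first so the list always starts with a user turn.

```python
from typing import Dict, List

Message = Dict[str, str]


class ConversationHistory:
    """Keeps the most recent turns in the role/content format used above."""

    def __init__(self, max_turns: int = 10):
        # One "turn" here means a user message plus the assistant reply
        self.max_messages = max_turns * 2
        self._messages: List[Message] = []

    def _append(self, role: str, text: str) -> None:
        self._messages.append({"role": role, "content": text})
        # Drop whole user/assistant pairs from the front when over the limit
        while len(self._messages) > self.max_messages:
            del self._messages[:2]

    def add_user(self, text: str) -> None:
        self._append("user", text)

    def add_assistant(self, text: str) -> None:
        self._append("assistant", text)

    def messages(self) -> List[Message]:
        # Return a copy so callers cannot modify the stored history
        return list(self._messages)
```

In use, call `add_user(transcript)` before the request, pass `history.messages()` as the `messages` argument, and call `add_assistant(reply)` once the response arrives.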
Text to Speech
Transform Claude's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:
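import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)
    # synthesize() returns a tuple; the first element holds the PCM samples
    audio = orca.synthesize(text)
    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
    speaker.start()
    speaker.write(audio[0])
    # Wait for the buffered audio to finish playing before stopping
    speaker.flush()
    speaker.stop()
    speaker.delete()
    orca.delete()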
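Longer replies may be easier to handle if they are synthesized and played in smaller pieces, for example one or two sentences at a time, so that no single piece is too long for the 10-second speaker buffer configured above. The following is a rough sketch of such a splitter. `split_for_speech` and the `max_chars` default are assumptions made for this example, and the sentence detection is a simple punctuation-based heuristic.

```python
import re
from typing import List


def split_for_speech(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the synthesis step in turn, reusing one engine and one speaker instead of creating them per chunk.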
Full Code
This implementation combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.
import argparse

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic

class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )
        recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()
        print("Listening for wake word...")
        while True:
            audio_frame = recorder.read()
            # process() returns the keyword index, or -1 if nothing was detected
            if porcupine.process(audio_frame) >= 0:
                break
        recorder.stop()
        recorder.delete()
        porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )
        recorder = pvrecorder.PvRecorder(frame_length=cheetah.frame_length)
        recorder.start()
        transcript = ""
        while True:
            audio_frame = recorder.read()
            partial, is_endpoint = cheetah.process(audio_frame)
            transcript += partial
            if is_endpoint:
                # The speaker has stopped talking: collect any remaining text
                transcript += cheetah.flush()
                break
        recorder.stop()
        recorder.delete()
        cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        # synthesize() returns a tuple; the first element holds the PCM samples
        audio = orca.synthesize(text)
        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
        speaker.start()
        speaker.write(audio[0])
        # Wait for the buffered audio to finish playing before stopping
        speaker.flush()
        speaker.stop()
        speaker.delete()
        orca.delete()

    def run(self):
        while True:
            self.listen_for_wake_word()
            transcript = self.transcribe_speech()
            print(f"You said: {transcript}")
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)

if __name__ == "__main__":
    # Read the keys and the wake word model path from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument("--access_key", required=True)
    parser.add_argument("--claude_api_key", required=True)
    parser.add_argument("--keyword_path", required=True)
    args = parser.parse_args()

    assistant = ClaudeVoiceAssistant(
        picovoice_key=args.access_key,
        claude_key=args.claude_api_key,
        wake_word_path=args.keyword_path
    )
    assistant.run()
Running the Assistant
To run the voice-enabled Claude assistant, update the model path to match your local file and have both API keys ready:
- Picovoice AccessKey (copy it from the Picovoice Console)
- Claude API key (available from the Claude Console)
python claude_voice.py \
--access_key YOUR_PICOVOICE_ACCESS_KEY \
--claude_api_key YOUR_CLAUDE_API_KEY \
--keyword_path PATH_TO_WAKE_WORD_MODEL
The Claude voice assistant is now running and ready to listen, transcribe, and respond.
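Keys typed on the command line tend to end up in shell history. A common alternative is to read them from environment variables. The helper below is a sketch of that approach: `read_key` is a hypothetical function, and `PICOVOICE_ACCESS_KEY` is a variable name chosen for this example. `ANTHROPIC_API_KEY` is the variable the Anthropic Python library looks for by default when no `api_key` is passed.

```python
import os
from typing import Mapping, Optional


def read_key(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before starting the assistant"
        )
    return value
```

For example, `read_key("PICOVOICE_ACCESS_KEY")` and `read_key("ANTHROPIC_API_KEY")` could supply the two keys the assistant needs.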
Troubleshooting Audio Device Issues
- Problem: "Failed to initialize PvRecorder" or "Audio device not found"
- Solution: Make sure the recorder is opened with the correct audio device index (the code above uses the default device). List the available audio devices:
import pvrecorder

for i, device in enumerate(pvrecorder.PvRecorder.get_available_devices()):
    print(f"{i}: {device}")
# Then use: recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length, device_index=INDEX)
- Problem: No audio output from speaker
- Solution: Check speaker volume and connections. Verify PvSpeaker initialization:
import pvspeaker

speaker = pvspeaker.PvSpeaker(
    sample_rate=22050,
    bits_per_sample=16,
    buffer_size_secs=10
)
speaker.start()
# Test with a simple tone or audio
speaker.stop()
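For the "simple tone" mentioned in the comment above, a short sine wave can be generated with the standard library alone. This is a minimal sketch: `sine_tone` is a hypothetical helper, and it assumes the speaker accepts 16-bit samples as a list of integers, matching the `bits_per_sample=16` setting used throughout.

```python
import math
from typing import List


def sine_tone(
    frequency_hz: float = 440.0,
    duration_secs: float = 1.0,
    sample_rate: int = 22050,
    amplitude: float = 0.3,
) -> List[int]:
    """Generate a sine wave as signed 16-bit integer samples."""
    total_samples = int(duration_secs * sample_rate)
    peak = int(amplitude * 32767)  # keep well below full scale to avoid clipping
    return [
        int(peak * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(total_samples)
    ]
```

Writing the result of `sine_tone()` to the speaker between `start()` and `stop()` should produce an audible one-second tone if the output device is working.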
Check out the original tutorial at Picovoice's Add Voice to Claude Blog.