Claude has a Voice Mode, but it is only available in Anthropic's consumer app. This tutorial shows how to add voice to Claude in your own projects using Picovoice's on-device models.
Unlike cloud speech APIs that send audio to remote servers, Picovoice processes all audio locally. This removes network latency from the speech steps, so the interaction feels faster and smoother.
What We're Building:
A Python application that:
- Listens for a wake word using Porcupine Wake Word
- Transcribes what you say using Cheetah Streaming Speech-to-Text
- Sends text to Claude's API
- Speaks the response back using Orca Streaming Text-to-Speech
What you'll need:
- Python 3.9+
- A mic and speakers
- Picovoice AccessKey from the Picovoice Console
- Claude API key from the Claude Console
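Both keys are secrets, so one option is to read them from environment variables instead of pasting them into source files. The sketch below is only an illustration: `load_keys` is a hypothetical helper, and `PICOVOICE_ACCESS_KEY` is a variable name chosen for this example. `ANTHROPIC_API_KEY` is the name the Anthropic Python library reads by default.

```python
import os
from typing import Dict, Mapping, Optional

# PICOVOICE_ACCESS_KEY is a name chosen for this sketch (an assumption).
# ANTHROPIC_API_KEY is the variable the Anthropic client reads by default.
REQUIRED_VARS = ("PICOVOICE_ACCESS_KEY", "ANTHROPIC_API_KEY")


def load_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required keys, or raise if any is missing or blank."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later from inside one of the SDKs.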
Step 1: Install All Required Dependencies
Install all required Python SDKs and dependencies with a single terminal command:
pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic
These packages include:
- Porcupine Wake Word Python SDK: pvporcupine
- Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
- Orca Text-to-Speech Python SDK: pvorca
- Picovoice Python Recorder library: pvrecorder
- Picovoice Python Speaker library: pvspeaker
- Anthropic Python library: anthropic (used for Claude API integration)
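To confirm the install succeeded before moving on, a short check like the following may help. It is a sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package names are the ones from the install command.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

PACKAGES = ("pvporcupine", "pvcheetah", "pvorca", "pvrecorder", "pvspeaker", "anthropic")


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```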
Step 2: Design a Custom Wake Phrase
- Sign up for a free account at Picovoice Console.
- Navigate to the Porcupine page.
- Enter your wake phrase such as "Hey Chatbot" and test it using the microphone button.
- Click "Train", select the target platform, and download the .ppn model file.
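A mistyped path to the downloaded model is an easy mistake to make, so it can be worth checking the file before handing it to Porcupine. This is a minimal sketch; `check_keyword_file` is a hypothetical helper, not part of the Porcupine SDK.

```python
from pathlib import Path


def check_keyword_file(path: str) -> Path:
    """Return the resolved path to a .ppn file, or raise a descriptive error."""
    keyword = Path(path).expanduser()
    if keyword.suffix.lower() != ".ppn":
        raise ValueError(f"Expected a .ppn file, got: {keyword.name}")
    if not keyword.is_file():
        raise FileNotFoundError(f"Wake word model not found: {keyword}")
    return keyword.resolve()
```

The check only covers the path and extension. It cannot tell whether the model was trained for the platform you are running on, which is the choice made in the "Train" step.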
Step 3: Activate Chatbot with Wake Phrase
The code below captures input from your default microphone and identifies the custom wake phrase without any cloud dependency:
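import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    # Load the custom wake word model trained in Step 2
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )
    # Porcupine expects audio in frames of exactly frame_length samples
    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()
    print("Listening...")
    while True:
        audio_frame = recorder.read()
        # process() returns the index of the detected keyword, or -1 if none
        if porcupine.process(audio_frame) >= 0:
            print("Heard wake word!")
            break
    # Release the microphone and the engine
    recorder.stop()
    recorder.delete()
    porcupine.delete()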
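In the listing above, an interruption inside the loop (for example Ctrl+C) skips the cleanup calls. One way to guarantee they run is a small context manager. The sketch below takes plain callables so it can be tried without a microphone; `released` and `wait_for_wake_word` are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager


@contextmanager
def released(*cleanups):
    """Run each cleanup callable on exit, in order, even if the body raises."""
    try:
        yield
    finally:
        for cleanup in cleanups:
            try:
                cleanup()
            except Exception as exc:  # keep releasing the remaining resources
                print(f"Cleanup failed: {exc}")


def wait_for_wake_word(read_frame, detect):
    """Block until detect(frame) returns a keyword index (>= 0)."""
    while True:
        index = detect(read_frame())
        if index >= 0:
            return index


# With the tutorial's objects (needs a microphone), usage might look like:
# with released(recorder.stop, recorder.delete, porcupine.delete):
#     recorder.start()
#     wait_for_wake_word(recorder.read, porcupine.process)
```

Passing `recorder.read` and `porcupine.process` as arguments also makes the loop testable with stand-in functions.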
Step 4: Convert Speech-to-Text
Next, transcribe the audio in real-time with Cheetah Streaming Speech-to-Text:
import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )
    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()
    print("Listening...")
    transcript = ""
    while True:
        audio_frame = recorder.read()
        # process() returns the newly transcribed text and an endpoint flag
        partial_transcript, is_endpoint = cheetah.process(audio_frame)
        transcript += partial_transcript
        if is_endpoint:
            # The speaker has paused: flush() returns any remaining buffered text
            transcript += cheetah.flush()
            break
    recorder.stop()
    recorder.delete()
    cheetah.delete()
    return transcript.strip()
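The loop above runs until Cheetah reports an endpoint, so it can wait a long time if that never happens. A possible safeguard is to cap the number of frames read. The following sketch is an assumption-laden illustration: `transcribe_with_limit` and `max_seconds` are hypothetical, and the engine is passed in as callables so the logic can be exercised without audio hardware.

```python
def transcribe_with_limit(read_frame, process, flush,
                          frame_length, sample_rate, max_seconds=15.0):
    """Collect partial transcripts until an endpoint or until max_seconds elapse."""
    # Each frame covers frame_length / sample_rate seconds of audio
    max_frames = int(max_seconds * sample_rate / frame_length)
    parts = []
    for _ in range(max_frames):
        partial, is_endpoint = process(read_frame())
        parts.append(partial)
        if is_endpoint:
            break
    # Collect whatever text the engine is still buffering, on endpoint or timeout
    parts.append(flush())
    return "".join(parts).strip()


# With the tutorial's objects, a call might look like:
# transcribe_with_limit(recorder.read, cheetah.process, cheetah.flush,
#                       cheetah.frame_length, cheetah.sample_rate)
```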
Step 5: Send Text Prompts to Claude
Next, send the text to Claude using Anthropic's messages endpoint:
from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,  # keeps spoken replies short
        messages=[
            {"role": "user", "content": transcript}
        ]
    )
    # The reply is a list of content blocks; the first one holds the text
    return response.content[0].text
This sends only the text to Claude while all audio is processed locally.
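As written, each request contains a single user message, so Claude has no memory of earlier turns. One possible extension is to keep a bounded history and add a system prompt asking for speech-friendly answers. The sketch below only builds the request arguments; the `Conversation` class, its method names, and the prompt text are hypothetical. `system` and `messages` are parameters of `messages.create`.

```python
from typing import Any, Dict, List


class Conversation:
    """Keeps the last few exchanges so follow-up questions have context."""

    def __init__(self, system_prompt: str, max_turns: int = 10):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.system_prompt = system_prompt
        self.max_turns = max_turns
        self.history: List[Dict[str, str]] = []

    def request_kwargs(self, user_text: str, model: str, max_tokens: int = 200) -> Dict[str, Any]:
        """Build keyword arguments for client.messages.create()."""
        return {
            "model": model,
            "max_tokens": max_tokens,
            "system": self.system_prompt,
            "messages": self.history + [{"role": "user", "content": user_text}],
        }

    def record_turn(self, user_text: str, assistant_text: str) -> None:
        """Store a completed exchange, dropping the oldest beyond max_turns."""
        self.history.append({"role": "user", "content": user_text})
        self.history.append({"role": "assistant", "content": assistant_text})
        self.history = self.history[-2 * self.max_turns:]
```

A call would then look like `client.messages.create(**conv.request_kwargs(transcript, model="claude-3-haiku-20240307"))`, followed by `conv.record_turn(transcript, reply)` once the reply arrives. Recording the turn only after a successful response keeps the history strictly alternating between user and assistant.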
Step 6: Convert Text-to-Speech
Convert Claude's text response into natural speech with Orca Streaming Text-to-Speech and PvSpeaker:
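import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)
    # synthesize() returns a tuple; the first element is the 16-bit PCM audio
    pcm = orca.synthesize(text)[0]
    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
    speaker.start()
    # flush() blocks until all samples have been written and played;
    # write() followed directly by stop() can cut the audio short
    speaker.flush(pcm)
    speaker.stop()
    speaker.delete()
    orca.delete()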
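Claude's replies often contain Markdown (asterisks, backticks, bullet markers) that does not read well aloud. A rough cleanup pass before synthesis might look like the sketch below. `prepare_for_speech` is a hypothetical helper, and the rules are deliberately simple: it does not cover every Markdown construct, and it does not check which characters Orca itself accepts.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the listing readable


def prepare_for_speech(text: str) -> str:
    """Strip common Markdown so the synthesizer reads words, not symbols."""
    fenced = re.escape(FENCE) + r".*?" + re.escape(FENCE)
    text = re.sub(fenced, " ", text, flags=re.DOTALL)        # drop fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                 # unwrap inline code
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)     # keep link text, drop URL
    text = re.sub(r"^[ \t]*(#{1,6}|[-*+])[ \t]+", "", text,
                  flags=re.MULTILINE)                        # headings and bullets
    text = re.sub(r"\*{1,3}", "", text)                      # bold/italic asterisks
    return re.sub(r"\s+", " ", text).strip()                 # collapse whitespace
```

In the tutorial's flow this would sit between the Claude call and the speech step, for example `speak_response(prepare_for_speech(reply), access_key)`.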
Full Python Code
The full code uses the following Picovoice models together: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.
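import argparse

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic


class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path, audio_device_index=-1):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path
        # -1 selects the default microphone
        self.audio_device_index = audio_device_index

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )
        recorder = pvrecorder.PvRecorder(
            frame_length=porcupine.frame_length,
            device_index=self.audio_device_index
        )
        recorder.start()
        print("Listening for wake word...")
        while True:
            audio_frame = recorder.read()
            # process() returns the keyword index, or -1 if nothing was detected
            if porcupine.process(audio_frame) >= 0:
                break
        recorder.stop()
        recorder.delete()
        porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )
        recorder = pvrecorder.PvRecorder(
            frame_length=cheetah.frame_length,
            device_index=self.audio_device_index
        )
        recorder.start()
        transcript = ""
        while True:
            audio_frame = recorder.read()
            partial, is_endpoint = cheetah.process(audio_frame)
            transcript += partial
            if is_endpoint:
                # The speaker has paused: collect any remaining buffered text
                transcript += cheetah.flush()
                break
        recorder.stop()
        recorder.delete()
        cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        # synthesize() returns a tuple; the first element is the 16-bit PCM audio
        pcm = orca.synthesize(text)[0]
        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
        speaker.start()
        # flush() blocks until all samples have been written and played
        speaker.flush(pcm)
        speaker.stop()
        speaker.delete()
        orca.delete()

    def run(self):
        while True:
            self.listen_for_wake_word()
            transcript = self.transcribe_speech()
            print(f"You said: {transcript}")
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)


if __name__ == "__main__":
    # Command-line flags match the launch command in the next section
    parser = argparse.ArgumentParser()
    parser.add_argument("--access_key", required=True, help="Picovoice AccessKey")
    parser.add_argument("--claude_api_key", required=True, help="Claude API key")
    parser.add_argument("--keyword_path", required=True, help="Path to the .ppn wake word model")
    parser.add_argument("--audio_device_index", type=int, default=-1,
                        help="Microphone index (-1 uses the default device)")
    args = parser.parse_args()

    assistant = ClaudeVoiceAssistant(
        picovoice_key=args.access_key,
        claude_key=args.claude_api_key,
        wake_word_path=args.keyword_path,
        audio_device_index=args.audio_device_index
    )
    assistant.run()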
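The `run` loop above sends every transcript to Claude, including an empty one if nothing was recognized, and any exception from the API call ends the program. A more defensive version of a single turn could look like the sketch below. `handle_turn` and its `fallback` message are hypothetical; `ask` and `speak` stand in for the `ask_claude` and `speak_response` methods.

```python
def handle_turn(transcript, ask, speak, fallback="Sorry, something went wrong."):
    """Process one exchange; return the spoken reply, or None if nothing was said."""
    if not transcript.strip():
        return None                      # nothing heard: skip the API call
    try:
        reply = ask(transcript)
    except Exception as exc:             # network error, bad key, rate limit, ...
        print(f"Claude request failed: {exc}")
        reply = fallback
    speak(reply)
    return reply


# Inside run(), the three request/response lines could become:
# handle_turn(transcript, self.ask_claude, self.speak_response)
```

Speaking a fallback message keeps the assistant responsive by voice, which matters when no one is watching the terminal.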
Launching the Chatbot
To run the voice-enabled Claude chatbot, save the full code as claude_voice.py, point the command below at your wake word model file, and have both keys ready:
- Picovoice AccessKey (copy it from the Picovoice Console)
- Claude API key (available from the Claude Console)
python claude_voice.py \
--access_key YOUR_PICOVOICE_ACCESS_KEY \
--claude_api_key YOUR_CLAUDE_API_KEY \
--keyword_path PATH_TO_WAKE_WORD_MODEL
The Claude voice chatbot is now running.
Troubleshooting Audio Device Issues
- Problem: "Failed to initialize PvRecorder" or "Audio device not found"
- Solution: Make sure to use the correct --audio_device_index parameter. To check, list available audio devices with the following Python code:
from pvrecorder import PvRecorder
print(PvRecorder.get_available_devices())
- Problem: No audio output from speaker
- Solution: Check speaker volume and permissions. Verify PvSpeaker initialization with the following Python code:
from pvspeaker import PvSpeaker
print(PvSpeaker.get_available_devices())
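Both calls return a list of device names, and a device's position in that list is its index. If the right device is known by name, a small lookup avoids counting by hand. This is a sketch: `find_device_index` is a hypothetical helper, and the device names in the usage comment are made up.

```python
from typing import List


def find_device_index(devices: List[str], name_fragment: str) -> int:
    """Return the index of the first device whose name contains the fragment.

    Returns -1 when nothing matches, which PvRecorder and PvSpeaker
    treat as "use the default device".
    """
    needle = name_fragment.casefold()
    for index, name in enumerate(devices):
        if needle in name.casefold():
            return index
    return -1


# Example with made-up names:
# find_device_index(["Built-in Microphone", "USB Audio Device"], "usb")
# With real hardware: find_device_index(PvRecorder.get_available_devices(), "usb")
```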
The tutorial was originally published on Picovoice