Khushi Nakra

Posted on Nov 22

Build a Voice Assistant for Claude AI in Python

#python #ai #claude #tutorial

Claude has a Voice Mode, but it's only available in Anthropic's consumer app. Here's how to add voice to Claude in your own projects using on-device speech processing.

What We're Building

A Python app that:

Listens for a wake word ("Hey Claude")
Transcribes what you say
Sends text to Claude's API
Speaks the response back

All voice processing runs locally with Picovoice, so your audio isn't sent to the cloud; only the transcribed text goes to Claude.

Setup

You'll need:

Python 3.9+
A mic and speakers
Picovoice AccessKey (free from their console)
Claude API key

Install all required Python SDKs and dependencies with a single terminal command:

pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic

These packages include:

Porcupine Wake Word Python SDK: pvporcupine
Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
Orca Text-to-Speech Python SDK: pvorca
Picovoice Python Recorder library: pvrecorder
Picovoice Python Speaker library: pvspeaker
Anthropic Python library: anthropic - used for Claude API integration
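Before wiring up the microphone, it can help to confirm that every package actually imports. Here's a minimal sketch using only the standard library's `importlib.util.find_spec`; the module names are the import names of the six packages above, and the helper name `check_dependencies` is just an illustrative choice.

```python
import importlib.util

# Import names of the packages installed by the pip command above
REQUIRED_MODULES = ["pvporcupine", "pvcheetah", "pvorca", "pvrecorder", "pvspeaker", "anthropic"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    missing = [name for name, found in check_dependencies().items() if not found]
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

If anything is reported missing, rerun the pip command in the same virtual environment you use to launch the assistant.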

Train a Custom Wake Word

Sign up for a free account at console.picovoice.ai
Navigate to the Porcupine page
Enter your wake phrase, such as "Hey Claude", and test it with the microphone button
Click "Train", select the target platform, and download the .ppn model file

For tips on designing an effective wake word, see Picovoice's guide to choosing a wake word.

Add Wake Word Detection

The following snippet captures audio from your default microphone and detects your custom wake word locally:

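import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    # Load the wake word model (.ppn) trained in the Picovoice Console
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )

    # Read frames from the default microphone, sized to what Porcupine expects
    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()

    print("Listening...")

    try:
        while True:
            audio_frame = recorder.read()
            # process() returns the keyword index, or -1 if nothing was detected
            if porcupine.process(audio_frame) >= 0:
                print("Heard wake word!")
                break
    finally:
        # Release the microphone and the engine even if an error occurs
        recorder.stop()
        recorder.delete()
        porcupine.delete()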
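Because `porcupine.process()` returns the index of the detected keyword within `keyword_paths` (or -1 when none was heard), you can load several wake words at once and tell them apart. The sketch below keeps the detection loop separate from the hardware: `process` and `read_frame` are passed in as plain callables, a hypothetical design chosen here so the loop can be exercised without a microphone. With real engines you would pass `porcupine.process` and `recorder.read`. The optional `max_frames` cap is also an addition for illustration.

```python
from typing import Callable, List, Optional, Sequence


def wait_for_keyword(
    process: Callable[[Sequence[int]], int],
    read_frame: Callable[[], Sequence[int]],
    keywords: List[str],
    max_frames: Optional[int] = None,
) -> Optional[str]:
    """Feed frames to a Porcupine-style process() until a keyword is detected.

    Returns the detected keyword's name, or None if max_frames frames
    pass without a detection.
    """
    frames_read = 0
    while max_frames is None or frames_read < max_frames:
        keyword_index = process(read_frame())
        frames_read += 1
        if keyword_index >= 0:
            # Index order matches the keyword_paths list given to pvporcupine.create()
            return keywords[keyword_index]
    return None
```

For example, after creating Porcupine with `keyword_paths=["hey_claude.ppn", "stop.ppn"]` (the second model is hypothetical; you'd train it the same way), `wait_for_keyword(porcupine.process, recorder.read, ["hey claude", "stop"])` tells you which phrase was spoken.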

Speech to Text

After the wake word is detected, capture audio frames and transcribe them in real time with Cheetah Streaming Speech-to-Text:

import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    # Automatic punctuation gives Claude cleaner input
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()

    print("Listening...")
    transcript = ""

    try:
        while True:
            audio_frame = recorder.read()
            # Each frame may yield new text; is_endpoint becomes True after a pause in speech
            partial_transcript, is_endpoint = cheetah.process(audio_frame)
            transcript += partial_transcript

            if is_endpoint:
                # flush() returns any text still buffered for this utterance
                transcript += cheetah.flush()
                break
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()

    return transcript.strip()

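Cheetah returns partial text as you speak and sets is_endpoint once it detects a pause after your speech. flush() then returns any remaining buffered text, so the function hands back the complete utterance, ready to send to Claude's API.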
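One practical gap in the loop above: Cheetah signals an endpoint after a stretch of silence that follows speech, so if nobody says anything after the wake word, the loop can wait indefinitely. A frame cap avoids that. Picovoice engines consume 16 kHz audio, and at a frame length of 512 samples each frame covers 512 / 16000 = 32 ms (read the real values from `cheetah.sample_rate` and `cheetah.frame_length` rather than trusting these defaults). This is a sketch under those assumptions; `frames_for_duration`, `transcribe_until_endpoint`, and the injected callables are hypothetical names chosen so the logic runs without a microphone.

```python
from typing import Callable, Sequence, Tuple


def frames_for_duration(seconds: float, sample_rate: int = 16000, frame_length: int = 512) -> int:
    """Approximate number of frames in `seconds` of audio (rounded down)."""
    return int(seconds * sample_rate // frame_length)


def transcribe_until_endpoint(
    process: Callable[[Sequence[int]], Tuple[str, bool]],
    flush: Callable[[], str],
    read_frame: Callable[[], Sequence[int]],
    max_frames: int,
) -> str:
    """Accumulate Cheetah-style partial transcripts until an endpoint or the frame cap."""
    transcript = ""
    for _ in range(max_frames):
        partial, is_endpoint = process(read_frame())
        transcript += partial
        if is_endpoint:
            break
    # flush() returns any buffered text, whether we stopped on an endpoint or the cap
    transcript += flush()
    return transcript.strip()
```

With real engines this would be called as `transcribe_until_endpoint(cheetah.process, cheetah.flush, recorder.read, frames_for_duration(10, cheetah.sample_rate, cheetah.frame_length))`, which gives up after roughly 10 seconds (312 frames at the default values).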

Send to Claude

Once speech is transcribed, send the text to Claude through Anthropic's Messages API:

from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,
        messages=[
            {"role": "user", "content": transcript}
        ]
    )

    return response.content[0].text

This minimal integration sends text to Claude while all speech processing remains on-device.
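As written, `ask_claude` sends each utterance on its own, so Claude has no memory of earlier turns. The Messages API is stateless: to hold a conversation you resend the prior turns, alternating `user` and `assistant` roles and starting with a user message. Below is a minimal sketch of that, which also uses the optional `system` parameter to ask for short, plain-text answers that sound better when spoken. The class name, prompt wording, and `max_turns` limit are illustrative choices, not part of the original tutorial.

```python
class ClaudeConversation:
    """Keeps a rolling message history for Anthropic's Messages API."""

    def __init__(self, client, model="claude-3-haiku-20240307", max_tokens=200, max_turns=10):
        self.client = client          # e.g. an anthropic.Anthropic instance
        self.model = model
        self.max_tokens = max_tokens
        self.max_turns = max_turns    # illustrative cap on remembered user/assistant pairs
        self.messages = []
        # Illustrative system prompt: short, plain replies are easier to listen to
        self.system = "You are a voice assistant. Answer briefly in plain sentences without Markdown."

    def ask(self, transcript):
        self.messages.append({"role": "user", "content": transcript})
        try:
            response = self.client.messages.create(
                model=self.model,
                max_tokens=self.max_tokens,
                system=self.system,
                messages=self.messages,
            )
        except Exception:
            # Remove the unanswered user turn so the history keeps alternating
            self.messages.pop()
            raise
        reply = response.content[0].text
        self.messages.append({"role": "assistant", "content": reply})
        # Drop the oldest user/assistant pair once the history grows too long
        while len(self.messages) > 2 * self.max_turns:
            del self.messages[:2]
        return reply
```

Usage would look like `conversation = ClaudeConversation(Anthropic(api_key=api_key))` followed by `conversation.ask(transcript)` on each turn. Trimming in pairs keeps the first remembered message a user turn.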

Text to Speech

Transform Claude's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:

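import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)

    # synthesize() returns (pcm, alignments); only the PCM samples are needed here
    pcm, _ = orca.synthesize(text)

    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
    speaker.start()

    try:
        # write() only accepts what fits in the buffer, so keep writing the remainder
        written = 0
        while written < len(pcm):
            written += speaker.write(pcm[written:])
        # Block until all buffered audio has played before stopping
        speaker.flush()
    finally:
        speaker.stop()
        speaker.delete()
        orca.delete()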
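Claude's replies can contain Markdown (asterisks, headings, backticks) or characters Orca can't pronounce, and a single synthesis call has a character limit. Here's a minimal clean-up sketch, assuming you pass in `orca.valid_characters` and `orca.max_character_limit` (properties the pvorca SDK exposes; check them against the version you installed). It strips common Markdown symbols, replaces unsupported characters with spaces, and splits the text at sentence boundaries into chunks that fit the limit. The function name and regular expressions are illustrative choices.

```python
import re
from typing import Iterable, List


def prepare_for_tts(text: str, valid_characters: Iterable[str], max_chars: int) -> List[str]:
    """Clean text for a TTS engine and split it into chunks of at most max_chars."""
    allowed = set(valid_characters)
    # Remove Markdown markers Claude often emits: **bold**, # headings, `code`, > quotes
    text = re.sub(r"[*_#`>]", "", text)
    # Replace anything the engine cannot speak (emoji, newlines, ...) with a space
    text = "".join(ch if ch in allowed else " " for ch in text)
    text = re.sub(r"\s+", " ", text).strip()

    chunks: List[str] = []
    current = ""
    # Split after sentence-ending punctuation, keeping the punctuation
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Hard-split any single sentence longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then go through `orca.synthesize()` and be played in turn.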

Full Code

This implementation combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.

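import argparse

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic

class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path, audio_device_index=-1):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path
        # -1 selects the system default microphone
        self.audio_device_index = audio_device_index

    def _make_recorder(self, frame_length):
        return pvrecorder.PvRecorder(
            frame_length=frame_length,
            device_index=self.audio_device_index
        )

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )

        recorder = self._make_recorder(porcupine.frame_length)
        recorder.start()

        print("Listening for wake word...")

        try:
            # process() returns -1 until the wake word is detected
            while porcupine.process(recorder.read()) < 0:
                pass
        finally:
            recorder.stop()
            recorder.delete()
            porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )

        recorder = self._make_recorder(cheetah.frame_length)
        recorder.start()

        transcript = ""
        try:
            while True:
                partial, is_endpoint = cheetah.process(recorder.read())
                transcript += partial
                if is_endpoint:
                    # Collect any text still buffered for this utterance
                    transcript += cheetah.flush()
                    break
        finally:
            recorder.stop()
            recorder.delete()
            cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        # synthesize() returns (pcm, alignments)
        pcm, _ = orca.synthesize(text)

        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
        speaker.start()
        try:
            written = 0
            while written < len(pcm):
                written += speaker.write(pcm[written:])
            # Wait for buffered audio to finish playing
            speaker.flush()
        finally:
            speaker.stop()
            speaker.delete()
            orca.delete()

    def run(self):
        while True:
            self.listen_for_wake_word()
            transcript = self.transcribe_speech()
            if not transcript:
                # Nothing was said; the Messages API does not accept empty text
                continue
            print(f"You said: {transcript}")
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Voice assistant for Claude")
    parser.add_argument("--access_key", required=True, help="Picovoice AccessKey")
    parser.add_argument("--claude_api_key", required=True, help="Claude API key")
    parser.add_argument("--keyword_path", required=True, help="Path to the .ppn wake word model")
    parser.add_argument("--audio_device_index", type=int, default=-1,
                        help="Microphone index (-1 = system default)")
    args = parser.parse_args()

    assistant = ClaudeVoiceAssistant(
        picovoice_key=args.access_key,
        claude_key=args.claude_api_key,
        wake_word_path=args.keyword_path,
        audio_device_index=args.audio_device_index
    )
    try:
        assistant.run()
    except KeyboardInterrupt:
        print("\nStopping.")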

Running the Assistant

To run the voice-enabled Claude assistant, save the full code as claude_voice.py and have the path to your .ppn wake word model and both API keys ready:

Picovoice AccessKey (copy it from the Picovoice Console)
Claude API key (available from the Claude Console)

python claude_voice.py \
  --access_key YOUR_PICOVOICE_ACCESS_KEY \
  --claude_api_key YOUR_CLAUDE_API_KEY \
  --keyword_path PATH_TO_WAKE_WORD_MODEL
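Keys typed on the command line end up in your shell history. One common alternative, sketched here as an assumption rather than something the original script supports, is to fall back to environment variables when a flag is omitted. `ANTHROPIC_API_KEY` is the variable the `anthropic` client already reads by default; `PICOVOICE_ACCESS_KEY` is a hypothetical name chosen for this example, and `resolve_key` is likewise illustrative.

```python
import os
from typing import Mapping, Optional


def resolve_key(cli_value: Optional[str], env_var: str, environ: Mapping[str, str] = os.environ) -> str:
    """Prefer an explicit command-line value, otherwise read the named environment variable."""
    value = cli_value or environ.get(env_var)
    if not value:
        # Exit with a readable message instead of failing later inside an SDK call
        raise SystemExit(f"Missing key: pass it as a flag or set {env_var}")
    return value
```

For instance, `resolve_key(None, "PICOVOICE_ACCESS_KEY")` returns the exported value, or exits with a hint if it isn't set.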

The Claude voice assistant is now running and ready to listen, transcribe, and respond.

Troubleshooting Audio Device Issues

Problem: "Failed to initialize PvRecorder" or "Audio device not found"
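Solution: Make sure the recorder is using the right microphone. List the available audio devices, then pass your microphone's index to PvRecorder as device_index:

import pvrecorder

for i, device in enumerate(pvrecorder.PvRecorder.get_available_devices()):
    print(f"{i}: {device}")

# Then pass the index when creating the recorder (frame_length is still required), e.g.:
# recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length, device_index=INDEX)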
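Device indices can shift when hardware is plugged in or removed, so matching on part of the device name can be more robust than hard-coding a number. A small sketch, assuming `devices` is the list of name strings returned by `PvRecorder.get_available_devices()`; the function name is illustrative, and the fallback of -1 is PvRecorder's default-device index.

```python
from typing import List


def find_device_index(devices: List[str], name_fragment: str) -> int:
    """Return the index of the first device whose name contains name_fragment (case-insensitive).

    Falls back to -1, which PvRecorder treats as the system default device.
    """
    fragment = name_fragment.lower()
    for index, name in enumerate(devices):
        if fragment in name.lower():
            return index
    return -1
```

For example, `find_device_index(pvrecorder.PvRecorder.get_available_devices(), "usb")` picks the first USB microphone if one is connected.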

Problem: No audio output from speaker
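Solution: Check speaker volume and connections, then confirm PvSpeaker can play audio on its own with a short test tone:

import math
import pvspeaker

sample_rate = 22050
speaker = pvspeaker.PvSpeaker(sample_rate=sample_rate, bits_per_sample=16, buffer_size_secs=10)
# One second of a quiet 440 Hz sine tone as 16-bit PCM samples
tone = [int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / sample_rate)) for n in range(sample_rate)]
speaker.start()
speaker.write(tone)   # 1 s of audio fits easily in the 10 s buffer
speaker.flush()       # block until the tone has finished playing
speaker.stop()
speaker.delete()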

Check out the original tutorial at Picovoice's Add Voice to Claude Blog.

DEV Community