DEV Community

Khushi Nakra

Build a Voice Assistant for Claude AI in Python

Claude has a Voice Mode, but it's only available in Anthropic's consumer app. Here's how to add voice to Claude in your own projects using local speech processing.

What We're Building

A Python app that:

  • Listens for a wake word ("Hey Claude")
  • Transcribes what you say
  • Sends text to Claude's API
  • Speaks the response back

The voice processing runs locally using Picovoice, so audio doesn't get sent to the cloud - only text goes to Claude.

Setup

You'll need:

  • Python 3.9+
  • A mic and speakers
  • Picovoice AccessKey (free from their console)
  • Claude API key

Install all required Python SDKs and dependencies with a single terminal command:

pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic
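To confirm the install before wiring anything together, a quick check like the one below can report which packages are missing. This is a minimal sketch, not part of the original tutorial: the helper name `missing_packages` is made up for illustration, and it only checks that each module can be found, not that it works with your audio hardware.

```python
from importlib.util import find_spec

# Module names installed by the pip command above.
REQUIRED = ["pvporcupine", "pvcheetah", "pvorca", "pvrecorder", "pvspeaker", "anthropic"]


def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages are installed.")
```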

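These packages include:

  • pvporcupine: Porcupine Wake Word engine
  • pvcheetah: Cheetah Streaming Speech-to-Text engine
  • pvorca: Orca Streaming Text-to-Speech engine
  • pvrecorder: microphone audio capture
  • pvspeaker: audio playback through the speakers
  • anthropic: Anthropic's Python SDK for calling Claude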

Train a Custom Wake Word

  1. Sign up for a free account at console.picovoice.ai
  2. Navigate to the Porcupine page
  3. Enter your wake phrase such as "Hey Claude" and test it using the microphone button
  4. Click "Train", select the target platform, and download the .ppn model file

For tips on designing an effective wake word, review the choosing a wake word guide.

Add Wake Word Detection

The following snippet captures audio from your default microphone and detects your custom wake word locally:

import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()

    print("Listening...")

    while True:
        audio_frame = recorder.read()
        if porcupine.process(audio_frame) >= 0:
            print("Heard wake word!")
            break

    recorder.stop()
    porcupine.delete()
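The snippet above releases the recorder and Porcupine only when the loop exits normally, so pressing Ctrl+C while it waits skips the cleanup. One possible variation moves the release into `finally` blocks. The sketch assumes the caller creates the `porcupine` and `recorder` objects exactly as above and passes them in; the function name `wait_for_wake_word` is illustrative.

```python
def wait_for_wake_word(porcupine, recorder):
    """Block until the wake word is heard, then release both resources.

    `porcupine` and `recorder` are assumed to be created by the caller,
    for example with pvporcupine.create(...) and pvrecorder.PvRecorder(...).
    """
    try:
        recorder.start()
        try:
            while True:
                # process() returns the keyword index, or -1 if nothing was detected.
                if porcupine.process(recorder.read()) >= 0:
                    return
        finally:
            recorder.stop()
    finally:
        # Runs on normal exit, on errors, and on Ctrl+C (KeyboardInterrupt).
        recorder.delete()
        porcupine.delete()
```

Because the function deletes both objects, the caller would create fresh ones for each wake-word cycle, just as the original snippet does.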

Speech to Text

After wake word detection, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:

import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()

    print("Listening...")
    transcript = ""

    while True:
        audio_frame = recorder.read()
        partial_transcript, is_endpoint = cheetah.process(audio_frame)
        transcript += partial_transcript

        if is_endpoint:
            transcript += cheetah.flush()
            break

    recorder.stop()
    cheetah.delete()

    return transcript.strip()

The function returns once Cheetah detects an endpoint (a pause at the end of speech), and the resulting transcript is ready to send to Claude's API.
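The loop above runs until Cheetah reports an endpoint, so it keeps listening if the room never goes quiet. A hedged variation is to cap how long it listens. The sketch below assumes the `cheetah` and `recorder` objects are created as in the snippet above and passed in; `transcribe_with_timeout` and its `max_seconds` default are illustrative choices, not part of the Picovoice API.

```python
def transcribe_with_timeout(cheetah, recorder, max_seconds=15.0):
    """Transcribe until an endpoint is detected or max_seconds of audio has been read."""
    # Each recorder.read() returns frame_length samples, so this is the
    # number of frames that make up max_seconds of audio.
    max_frames = int(max_seconds * cheetah.sample_rate / cheetah.frame_length)
    transcript = ""
    recorder.start()
    try:
        for _ in range(max_frames):
            partial, is_endpoint = cheetah.process(recorder.read())
            transcript += partial
            if is_endpoint:
                break
        # flush() returns any text still buffered inside the engine.
        transcript += cheetah.flush()
    finally:
        recorder.stop()
    return transcript.strip()
```

If the cap is reached without an endpoint, the function still returns whatever was transcribed up to that point.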

Send to Claude

Once speech is transcribed, send the text to Claude using Anthropic's messages endpoint:

from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,
        messages=[
            {"role": "user", "content": transcript}
        ]
    )

    return response.content[0].text

This minimal integration sends text to Claude while all speech processing remains on-device.
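Because the reply is read aloud, it can help to ask for short, plain-text answers and to keep earlier turns so follow-up questions have context. The sketch below is one way to do that. The `VoiceConversation` class, the prompt wording, and the `max_turns` limit are assumptions for illustration; the only API features it relies on are the `system` and `messages` parameters of `client.messages.create`.

```python
class VoiceConversation:
    """Keeps a short message history and builds arguments for client.messages.create."""

    # Hypothetical prompt wording; adjust to taste.
    SYSTEM_PROMPT = (
        "You are a voice assistant. Reply in one to three short sentences "
        "of plain text, without markdown, lists, or code."
    )

    def __init__(self, model="claude-3-haiku-20240307", max_tokens=200, max_turns=5):
        self.model = model
        self.max_tokens = max_tokens
        self.max_turns = max(1, max_turns)  # user/assistant pairs to keep
        self.history = []

    def request_for(self, transcript):
        """Return keyword arguments for the next request, including prior turns."""
        messages = self.history + [{"role": "user", "content": transcript}]
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "system": self.SYSTEM_PROMPT,
            "messages": messages,
        }

    def record(self, transcript, reply):
        """Store a completed exchange and drop the oldest ones beyond max_turns."""
        self.history.append({"role": "user", "content": transcript})
        self.history.append({"role": "assistant", "content": reply})
        self.history = self.history[-2 * self.max_turns:]
```

With the `client` from the snippet above, a turn would look like `response = client.messages.create(**conversation.request_for(transcript))`, followed by `conversation.record(transcript, response.content[0].text)`.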

Text to Speech

Transform Claude's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:

import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)

    audio = orca.synthesize(text)

    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
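    speaker.start()
    # audio[0] holds the PCM samples. flush() writes them and blocks until
    # playback finishes; write() alone only fills the buffer, so calling
    # stop() right after it can cut the response off.
    speaker.flush(audio[0])
    speaker.stop()
    speaker.delete()

    orca.delete()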
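Claude's replies can contain markdown symbols or other characters that a speech engine may reject or read aloud awkwardly. A hedged pre-processing step is sketched below. The helper `clean_for_speech` is hypothetical: it strips common markdown markers and, if given a collection of allowed characters, drops anything outside it. Orca's Python binding appears to expose such a collection as `orca.valid_characters`, but treat that attribute name as an assumption and check it against the version you have installed.

```python
import re


def clean_for_speech(text, allowed=None):
    """Strip markdown markers and, optionally, characters outside `allowed`."""
    # Drop emphasis, heading, and code markers such as *, _, #, and backticks.
    text = re.sub(r"[*_#`]+", "", text)
    if allowed is not None:
        allowed = set(allowed)
        # Keep whitespace so words do not run together.
        text = "".join(ch for ch in text if ch in allowed or ch.isspace())
    # Collapse runs of whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

It would be applied just before synthesis, for example `speak_response(clean_for_speech(reply), access_key)`.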

Full Code

This implementation combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic

class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )

        recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()

        print("Listening for wake word...")

        while True:
            audio_frame = recorder.read()
            if porcupine.process(audio_frame) >= 0:
                break

        recorder.stop()
        porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )

        recorder = pvrecorder.PvRecorder(frame_length=cheetah.frame_length)
        recorder.start()

        transcript = ""
        while True:
            audio_frame = recorder.read()
            partial, is_endpoint = cheetah.process(audio_frame)
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()
                break

        recorder.stop()
        cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        audio = orca.synthesize(text)

        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
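        speaker.start()
        # flush() writes the PCM samples and blocks until playback finishes;
        # write() followed directly by stop() can cut the response off.
        speaker.flush(audio[0])
        speaker.stop()
        speaker.delete()
        orca.delete()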

    def run(self):
        while True:
            self.listen_for_wake_word()
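            transcript = self.transcribe_speech()
            if not transcript:
                # Nothing was heard; skip the API call and wait for the wake word again.
                continue
            print(f"You said: {transcript}")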
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)

if __name__ == "__main__":
    assistant = ClaudeVoiceAssistant(
        picovoice_key="YOUR_PICOVOICE_KEY",
        claude_key="YOUR_CLAUDE_KEY",
        wake_word_path="hey_claude.ppn"
    )
    assistant.run()

Running the Assistant

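To run the voice-enabled Claude assistant, save the full code as claude_voice.py and have your wake word model file and both API keys ready. The full code above hard-codes these values in its __main__ block, so either replace the placeholders there and run python claude_voice.py, or extend the script to read them from command-line flags and run it like this: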

python claude_voice.py \
  --access_key YOUR_PICOVOICE_ACCESS_KEY \
  --claude_api_key YOUR_CLAUDE_API_KEY \
  --keyword_path PATH_TO_WAKE_WORD_MODEL

The Claude voice assistant is now running and ready to listen, transcribe, and respond.
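The command above passes the keys and model path as flags, whereas the full code listing hard-codes them. One way to bridge the two, shown as a sketch rather than the original script, is to parse the flags with `argparse`. The flag names mirror the command above; the environment-variable fallbacks are an added assumption (`PICOVOICE_ACCESS_KEY` is a made-up name, while `ANTHROPIC_API_KEY` is the variable the Anthropic SDK reads by default).

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags used in the run command, falling back to environment variables."""
    parser = argparse.ArgumentParser(description="Voice assistant for Claude")
    parser.add_argument(
        "--access_key",
        default=os.environ.get("PICOVOICE_ACCESS_KEY"),  # hypothetical variable name
        help="Picovoice AccessKey",
    )
    parser.add_argument(
        "--claude_api_key",
        default=os.environ.get("ANTHROPIC_API_KEY"),
        help="Claude API key",
    )
    parser.add_argument(
        "--keyword_path",
        required=True,
        help="Path to the .ppn wake word model",
    )
    args = parser.parse_args(argv)
    if not args.access_key or not args.claude_api_key:
        parser.error("both --access_key and --claude_api_key are required")
    return args
```

The `__main__` block in the full code could then call `args = parse_args()` and pass `args.access_key`, `args.claude_api_key`, and `args.keyword_path` to `ClaudeVoiceAssistant`.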

Troubleshooting Audio Device Issues

  • Problem: "Failed to initialize PvRecorder" or "Audio device not found"
  • Solution: Pass the correct device_index to PvRecorder. List the available audio devices to find it:
import pvrecorder

for i, device in enumerate(pvrecorder.PvRecorder.get_available_devices()):
    print(f"{i}: {device}")
# Then use: recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length, device_index=INDEX)
  • Problem: No audio output from speaker
  • Solution: Check speaker volume and connections. Verify PvSpeaker initialization:
import pvspeaker

speaker = pvspeaker.PvSpeaker(
    sample_rate=22050,
    bits_per_sample=16,
    buffer_size_secs=10
)
speaker.start()
# Test with a simple tone or audio
speaker.stop()
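For the "simple tone" mentioned in the comment above, a short sine wave is enough to tell whether the speaker path works. The sketch below generates 16-bit PCM samples in pure Python. `make_tone` and `play_test_tone` are illustrative helpers, and the playback part assumes `pvspeaker` is installed and that `flush()` blocks until the buffered audio has played.

```python
import math


def make_tone(frequency_hz=440.0, duration_secs=1.0, sample_rate=22050, volume=0.3):
    """Return a sine wave as a list of 16-bit signed PCM samples."""
    amplitude = int(volume * 32767)
    count = int(duration_secs * sample_rate)
    return [
        int(amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    ]


def play_test_tone():
    """Play the tone through PvSpeaker (imported lazily so make_tone works without it)."""
    import pvspeaker

    speaker = pvspeaker.PvSpeaker(
        sample_rate=22050,
        bits_per_sample=16,
        buffer_size_secs=10,
    )
    speaker.start()
    try:
        # Write the samples and wait for playback to finish.
        speaker.flush(make_tone())
    finally:
        speaker.stop()
        speaker.delete()
```

If `play_test_tone()` produces a one-second beep, the output device is working and any silence in the assistant is more likely to come from the synthesis step.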

Check out the original tutorial at Picovoice's Add Voice to Claude Blog.