DEV Community

Khushi Nakra

Build a Voice Chatbot using Claude API and Python

Claude has a voice mode, but it is only available in Anthropic's consumer apps. This tutorial shows how to add voice to Claude in your own projects using Picovoice's on-device models.

Unlike cloud speech APIs that send audio to remote servers, Picovoice processes audio on the device. Wake word detection, transcription, and speech synthesis therefore involve no network round trip; only the text request to Claude goes over the network.

What We're Building:

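A Python application that:

  • Waits for a custom wake phrase with Porcupine Wake Word
  • Transcribes what you say in real time with Cheetah Streaming Speech-to-Text
  • Sends the transcript to Claude through the Anthropic API
  • Speaks Claude's reply with Orca Streaming Text-to-Speech and PvSpeaker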

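What you'll need:

  • Python 3 and pip
  • A microphone and a speaker
  • A Picovoice AccessKey (available with a free Picovoice Console account)
  • A Claude API key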

Step 1: Install All Required Dependencies

Install all required Python SDKs and dependencies with a single terminal command:

pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic
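
Before moving on, it can help to confirm that every package is visible to the interpreter you plan to use. The snippet below is a minimal sketch that only checks whether each module can be found; `REQUIRED` and `missing_packages` are names made up for this example, and the check does not verify versions.

```python
import importlib.util

# Import names of the packages installed above (the same as their pip names)
REQUIRED = ["pvporcupine", "pvcheetah", "pvorca", "pvrecorder", "pvspeaker", "anthropic"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages are installed.")
```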

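These packages include:

  • pvporcupine: Porcupine Wake Word
  • pvcheetah: Cheetah Streaming Speech-to-Text
  • pvorca: Orca Streaming Text-to-Speech
  • pvrecorder: PvRecorder, for capturing audio from the microphone
  • pvspeaker: PvSpeaker, for playing audio through the speaker
  • anthropic: Anthropic's Python SDK for the Claude API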

Step 2: Design a Custom Wake Phrase

  1. Sign up for a free account at Picovoice Console.
  2. Navigate to the Porcupine page.
  3. Enter your wake phrase, such as "Hey Chatbot", and test it using the microphone button.
  4. Click "Train", select the target platform, and download the .ppn model file.

Step 3: Activate Chatbot with Wake Phrase

The code below captures input from your default microphone and identifies the custom wake phrase without any cloud dependency:

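import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    # Load the custom wake word model downloaded from Picovoice Console
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )

    # Porcupine expects frames of exactly frame_length samples
    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()

    print("Listening...")

    try:
        while True:
            audio_frame = recorder.read()
            # process() returns the keyword index, or -1 if nothing was detected
            if porcupine.process(audio_frame) >= 0:
                print("Heard wake word!")
                break
    finally:
        # Release the microphone and the engine, even if an error occurs
        recorder.stop()
        recorder.delete()
        porcupine.delete()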
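
The value returned by `porcupine.process` is the index of the detected keyword within `keyword_paths`, or -1 when nothing was detected, which matters once you load more than one wake phrase. The sketch below separates that loop from the hardware so you can try it without a microphone: `read_frame` and `process` stand in for `recorder.read` and `porcupine.process`, and `labels` and `max_frames` are hypothetical additions for this example.

```python
def wait_for_keyword(read_frame, process, labels, max_frames=None):
    """Read frames until process() reports a keyword, then return its label.

    read_frame and process stand in for recorder.read and porcupine.process.
    Returns None if max_frames frames pass without a detection.
    """
    frames_seen = 0
    while max_frames is None or frames_seen < max_frames:
        index = process(read_frame())
        if index >= 0:
            # The index points at the matching entry of keyword_paths
            return labels[index]
        frames_seen += 1
    return None


if __name__ == "__main__":
    # Fake detector: two silent frames, then the second keyword fires
    results = iter([-1, -1, 1])
    heard = wait_for_keyword(lambda: [0] * 512, lambda frame: next(results),
                             ["hey chatbot", "stop"])
    print(heard)
```

With the real objects, `labels` would list one name per entry in `keyword_paths`, in the same order.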

Step 4: Convert Speech-to-Text

Next, transcribe the user's speech in real time with Cheetah Streaming Speech-to-Text:

import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()

    print("Listening...")
    transcript = ""

    try:
        while True:
            audio_frame = recorder.read()
            # process() returns the newly transcribed text and an endpoint flag
            partial_transcript, is_endpoint = cheetah.process(audio_frame)
            transcript += partial_transcript

            if is_endpoint:
                # The speaker has stopped; collect any text still buffered
                transcript += cheetah.flush()
                break
    finally:
        # Release the microphone and the engine, even if an error occurs
        recorder.stop()
        recorder.delete()
        cheetah.delete()

    return transcript.strip()
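
The loop above only ends when Cheetah reports an endpoint, so in a noisy room it could keep listening for a long time. One possible safeguard is to cap the number of frames. The sketch below shows the idea with stand-ins for the real calls: `read_frame`, `process` and `flush` take the place of `recorder.read`, `cheetah.process` and `cheetah.flush`, and `max_frames` is a hypothetical parameter for this example.

```python
def collect_transcript(read_frame, process, flush, max_frames):
    """Accumulate partial transcripts until an endpoint or the frame cap.

    read_frame, process and flush stand in for recorder.read,
    cheetah.process and cheetah.flush.
    """
    parts = []
    for _ in range(max_frames):
        partial, is_endpoint = process(read_frame())
        parts.append(partial)
        if is_endpoint:
            break
    # flush() returns any text still buffered inside the engine
    parts.append(flush())
    return "".join(parts).strip()


if __name__ == "__main__":
    # Fake engine: two partial results, the second one marks the endpoint
    outputs = iter([("Hello ", False), ("world", True)])
    print(collect_transcript(lambda: [], lambda frame: next(outputs),
                             lambda: "!", max_frames=100))
```

With the real engine, a cap of roughly N seconds would be `N * cheetah.sample_rate // cheetah.frame_length` frames.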

Step 5: Send Text Prompts to Claude

Next, send the text to Claude using Anthropic's messages endpoint:

from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,
        messages=[
            {"role": "user", "content": transcript}
        ]
    )

    return response.content[0].text

This sends only the text to Claude while all audio is processed locally.
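
Note that `ask_claude` sends each transcript as a standalone message, so Claude has no memory of earlier turns. If you want follow-up questions to work, one option is to keep the history yourself and pass it as the `messages` argument. The class below is a minimal sketch of that idea; `Conversation` and `max_turns` are names invented for this example, not part of the Anthropic SDK.

```python
class Conversation:
    """Keeps alternating user/assistant messages for the messages argument."""

    def __init__(self, max_turns=10):
        # One turn is a user message plus the assistant reply
        self.max_turns = max_turns
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})
        # Drop the oldest complete turns once the history grows too long,
        # so the list always starts with a user message
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            del self.messages[:excess]


if __name__ == "__main__":
    conversation = Conversation(max_turns=2)
    conversation.add_user("What is a wake word?")
    conversation.add_assistant("A phrase that activates a voice assistant.")
    conversation.add_user("Give me an example.")
    print(conversation.messages)
```

In the chatbot you would call `add_user(transcript)` before the request, pass `conversation.messages` to `client.messages.create`, and call `add_assistant` with the reply text afterwards.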

Step 6: Convert Text-to-Speech

Convert Claude's text response into natural speech with Orca Streaming Text-to-Speech and PvSpeaker:

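import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)

    # The first element of the result holds the PCM samples
    pcm = orca.synthesize(text)[0]

    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
    speaker.start()
    # flush() blocks until every sample has been played; write() followed
    # directly by stop() can cut the reply short
    speaker.flush(pcm)
    speaker.stop()

    # Release the audio device and the engine
    speaker.delete()
    orca.delete()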
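
`speak_response` synthesizes the whole reply before playback starts, so long answers begin with a pause. One way to shorten that pause is to synthesize and play the reply sentence by sentence. The helper below is a rough sketch of the splitting step only; `split_sentences` is a name made up for this example, and the regular expression is naive (it will also split after abbreviations such as "Dr.").

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    for sentence in split_sentences("Sure. Here is the answer! Anything else?"):
        # Each sentence could be synthesized and played here in turn
        print(sentence)
```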

Full Python Code

The full code uses the following Picovoice models together: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic

class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )

        recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()

        print("Listening for wake word...")

        while True:
            audio_frame = recorder.read()
            if porcupine.process(audio_frame) >= 0:
                break

        recorder.stop()
        recorder.delete()  # release the microphone before the next stage opens it
        porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )

        recorder = pvrecorder.PvRecorder(frame_length=cheetah.frame_length)
        recorder.start()

        transcript = ""
        while True:
            audio_frame = recorder.read()
            partial, is_endpoint = cheetah.process(audio_frame)
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()
                break

        recorder.stop()
        recorder.delete()  # release the microphone before the next loop iteration
        cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        # The first element of the result holds the PCM samples
        pcm = orca.synthesize(text)[0]

        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
        speaker.start()
        # flush() blocks until every sample has been played
        speaker.flush(pcm)
        speaker.stop()
        speaker.delete()
        orca.delete()

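    def run(self):
        while True:
            self.listen_for_wake_word()
            transcript = self.transcribe_speech()
            # Nothing was transcribed, so go back to listening
            # instead of sending an empty prompt
            if not transcript:
                continue
            print(f"You said: {transcript}")
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)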

if __name__ == "__main__":
    assistant = ClaudeVoiceAssistant(
        picovoice_key="YOUR_PICOVOICE_KEY",
        claude_key="YOUR_CLAUDE_KEY",
        wake_word_path="hey_claude.ppn"
    )
    assistant.run()
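
The listing above keeps both keys in the source file, which is easy to commit by accident. A common alternative is to read them from environment variables. The helper below is a minimal sketch: `load_keys` and the variable name `PICOVOICE_ACCESS_KEY` are assumptions made for this example, while `ANTHROPIC_API_KEY` is the name Anthropic's SDK also looks for by default.

```python
import os


def load_keys(env=None):
    """Read both keys from environment variables instead of the source file."""
    env = os.environ if env is None else env
    names = {
        "picovoice_key": "PICOVOICE_ACCESS_KEY",  # hypothetical variable name
        "claude_key": "ANTHROPIC_API_KEY",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}


if __name__ == "__main__":
    # A plain dict stands in for os.environ here
    print(load_keys({"PICOVOICE_ACCESS_KEY": "pv-demo", "ANTHROPIC_API_KEY": "sk-demo"}))
```

The returned dictionary matches the constructor arguments, so it could be used as `ClaudeVoiceAssistant(wake_word_path="hey_claude.ppn", **load_keys())`.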

Launching the Chatbot

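To run the voice-enabled Claude chatbot, save the full code as claude_voice.py and have your Picovoice AccessKey, your Claude API key, and the path to your .ppn wake word file ready. The command below assumes the script reads these values from command-line flags. The listing above instead takes them from the placeholder values in its __main__ block, so either edit those values and run python claude_voice.py without flags, or add argument parsing to the script: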

python claude_voice.py \
  --access_key YOUR_PICOVOICE_ACCESS_KEY \
  --claude_api_key YOUR_CLAUDE_API_KEY \
  --keyword_path PATH_TO_WAKE_WORD_MODEL

The Claude voice chatbot is now running.
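
The flags in the launch command are not parsed anywhere in the full listing, which takes its values from the `__main__` block. If you want the command to work as written, a small `argparse` setup along these lines could be added. This is a sketch rather than the original script: `build_parser` is a name made up for this example, and treating -1 as "default device" for `--audio_device_index` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the launch command."""
    parser = argparse.ArgumentParser(description="Voice chatbot for Claude")
    parser.add_argument("--access_key", required=True, help="Picovoice AccessKey")
    parser.add_argument("--claude_api_key", required=True, help="Claude API key")
    parser.add_argument("--keyword_path", required=True,
                        help="Path to the .ppn wake word model")
    # Assumption: -1 means "use the default audio device"
    parser.add_argument("--audio_device_index", type=int, default=-1,
                        help="Index of the microphone to record from")
    return parser


if __name__ == "__main__":
    # Parse a sample argument list instead of the real command line
    args = build_parser().parse_args(
        ["--access_key", "PV", "--claude_api_key", "CLAUDE", "--keyword_path", "hey.ppn"]
    )
    print(args.access_key, args.keyword_path, args.audio_device_index)
```

The parsed values would then replace the placeholders, for example `ClaudeVoiceAssistant(args.access_key, args.claude_api_key, args.keyword_path)`.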

Troubleshooting Audio Device Issues

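  • Problem: "Failed to initialize PvRecorder" or "Audio device not found"
  • Solution: Make sure the correct input device is selected. The code in this tutorial opens the default microphone; to use a different one, pass its index as the device_index argument of PvRecorder (or through the --audio_device_index parameter, if your script exposes one). To find the index, list the available audio devices with the following Python code: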
from pvrecorder import PvRecorder
print(PvRecorder.get_available_devices())
  • Problem: No audio output from speaker
  • Solution: Check the speaker volume and the audio permissions of your terminal or IDE. Then list the output devices that PvSpeaker can see with the following Python code:
from pvspeaker import PvSpeaker
print(PvSpeaker.get_available_devices())
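
Both calls above return a list of device names, and a device's position in that list is the index to pass as `device_index`. If you would rather select a device by name than by number, a small lookup like the one below could help. It is a sketch that works on any list of names; `find_device_index` is a hypothetical helper, not part of PvRecorder or PvSpeaker.

```python
def find_device_index(devices, name_fragment):
    """Return the index of the first device whose name contains name_fragment.

    Matching is case-insensitive. Returns -1 when nothing matches.
    """
    wanted = name_fragment.lower()
    for index, name in enumerate(devices):
        if wanted in name.lower():
            return index
    return -1


if __name__ == "__main__":
    # Sample names; in practice pass PvRecorder.get_available_devices()
    sample_devices = ["Built-in Microphone", "USB Headset"]
    print(find_device_index(sample_devices, "usb"))
```

The result could then be used as `PvRecorder(frame_length=..., device_index=index)`, where -1 falls back to the default device.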

This tutorial was originally published on Picovoice.
