DEV Community: Khushi Nakra

Trying On-Device LLM Inference on Windows with Python

Khushi Nakra — Tue, 17 Feb 2026 18:23:21 +0000

Cloud-based language models are widely used, but running models on-device can help reduce latency, recurring API costs, and data privacy concerns.

Below is a minimal example of running a compressed large language model on a Windows machine using picoLLM.

Why Run Models On-Device?

Running models locally can:

keep data on the device
avoid network latency

At the same time, local inference introduces challenges such as hardware constraints and model optimization. picoLLM makes it easier to run compressed open-weight models across platforms.

Setup

Install Python:

https://www.python.org/downloads/

Install picoLLM:

pip install picollm

Get an AccessKey and download a model from: https://console.picovoice.ai/

picoLLM supports models such as Llama, Gemma, Mixtral, Mistral, and Phi, and runs across Windows, macOS, Linux, Raspberry Pi, mobile, and browsers.

Minimal Python Example

Import the package and initialize the engine:

import picollm

pllm = picollm.create(
    access_key,
    model_path
)

Generate a completion:

res = pllm.generate(prompt="what is the air-speed velocity of an unladen swallow?")
print(res.completion)

Streaming tokens:

res = pllm.generate(
    prompt="what is the air-speed velocity of an unladen swallow?",
    stream_callback=lambda x: print(x, flush=True, end="")
)

Release the engine when finished:

pllm.release()
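
If generation raises an exception, the release() call above is never reached. A minimal sketch of one way to guard against that is a small context manager. The helper name managed_engine is hypothetical and not part of picoLLM; it only assumes that the factory passed in behaves like picollm.create and that the returned object has release(), as shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(create_engine, access_key, model_path):
    """Create an engine and make sure release() runs, even on errors.

    create_engine is assumed to behave like picollm.create: it takes an
    AccessKey and a model path and returns an object with release().
    """
    engine = create_engine(access_key, model_path)
    try:
        yield engine
    finally:
        engine.release()


# Intended usage (needs the picollm package, a valid AccessKey, and a model file):
#
#     import picollm
#     with managed_engine(picollm.create, access_key, model_path) as pllm:
#         print(pllm.generate(prompt="Hello").completion)
```

The factory is passed in as an argument so the helper can be exercised with a stand-in object before a real model is downloaded.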

Node.js Example

The same idea in Node.js:

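const { PicoLLM } = require("@picovoice/picollm-node");

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  const pllm = new PicoLLM(accessKey, modelPath);

  const res = await pllm.generate(
    "what is the air-speed velocity of an unladen swallow?",
    {
      streamCallback: (token) => process.stdout.write(token)
    }
  );

  pllm.release();
}

main();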

Additional Resources

Python API docs: https://picovoice.ai/docs/api/picollm-python/
Node.js API docs: https://picovoice.ai/docs/api/picollm-nodejs/
Demo repository: https://github.com/Picovoice/picollm/tree/main/demo

For a full step-by-step walkthrough and detailed explanation, see the original guide:

https://picovoice.ai/blog/how-to-run-a-local-llm-on-windows/

Trying On-Device LLM Inference with Python

Khushi Nakra — Tue, 17 Feb 2026 18:17:27 +0000

Running large language models on-device is becoming increasingly practical. Instead of sending prompts to a cloud API, models can run directly on hardware, improving privacy and reducing network dependency.

Below is a minimal example of running an LLM in Python using the picoLLM inference engine.

Install the SDK

Install the Python package:

pip install picollm

Get an AccessKey and a Model

To run a model, you need:

An AccessKey from the Picovoice Console
A downloaded model file

You can create an account and download models here:

https://console.picovoice.ai/

picoLLM supports several open-weight models such as Gemma, Llama, Mistral, Mixtral, and Phi, and runs on Linux, macOS, Windows, and Raspberry Pi with CPU or GPU inference.
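
Since both the AccessKey and the model path are needed before the engine can be created, it can help to keep them out of the source code. The following is a rough sketch that reads them from environment variables. The variable names PICOVOICE_ACCESS_KEY and PICOLLM_MODEL_PATH and the helper load_settings are assumptions made for illustration; picoLLM does not define them.

```python
import os


def load_settings(env=None):
    """Return (access_key, model_path) read from environment variables.

    The variable names used here are illustrative, not defined by picoLLM.
    """
    env = os.environ if env is None else env
    access_key = env.get("PICOVOICE_ACCESS_KEY", "").strip()
    model_path = env.get("PICOLLM_MODEL_PATH", "").strip()

    # Report every missing value at once instead of failing on the first.
    missing = [
        name
        for name, value in (
            ("PICOVOICE_ACCESS_KEY", access_key),
            ("PICOLLM_MODEL_PATH", model_path),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return access_key, model_path
```

The returned pair can then be passed straight to picollm.create(access_key, model_path).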

Minimal Example

Import the package:

import picollm

Create an engine instance:

engine = picollm.create(access_key, model_path)

Generate a completion:

res = engine.generate(prompt)
print(res.completion)

Next Steps

For full API details and step-by-step demos:

Python SDK docs: https://picovoice.ai/docs/api/picollm-python/
Quick start guide: https://picovoice.ai/docs/quick-start/picollm-python/
Demo source code: https://github.com/Picovoice/picollm/tree/main/demo/python

For a full walkthrough and explanation, see the original article:

https://picovoice.ai/blog/how-to-run-llms-locally-with-python/

Why Bigger Language Models Don’t Always Perform Better

Khushi Nakra — Tue, 17 Feb 2026 18:08:53 +0000

A common assumption in machine learning is that increasing model size improves performance. As a result, language models have grown larger and increasingly dependent on powerful cloud infrastructure.

But larger models are not always more efficient or better performing, and the cost of training and running them can be substantial.

If you're new to the space, it helps to first understand what a large language model is and how modern systems are evaluated.

When Bigger Stops Helping

Training large models requires significant computing resources, often limiting development to organizations with access to large GPU clusters. Even for large companies, the operational cost of running these systems can be extremely high, and inference expenses can quickly add up.

This has led researchers to question whether simply increasing parameter count is the most effective approach, especially as more teams explore running AI locally or at the edge rather than relying entirely on cloud infrastructure.

Rethinking How Models Use Compute

The paper Training Compute-Optimal Large Language Models explored how training resources should be balanced between model size and dataset scale.

One of its key findings was that many language models had been trained with more parameters than necessary relative to the amount of data they were given. A smaller model trained on more tokens was shown to outperform significantly larger ones.

The Chinchilla model demonstrated this clearly, outperforming larger models such as Gopher and GPT-3 when trained with a more balanced compute budget.
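
To make the balance between model size and data concrete, here is a back-of-the-envelope sketch. It relies on two approximations that are commonly quoted alongside the paper rather than stated in this article: training compute of about 6 FLOPs per parameter per token, and a compute-optimal ratio of roughly 20 training tokens per parameter. The parameter and token counts are the publicly reported figures for each model; treat all of the numbers as rough.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # common approximation: C ~= 6 * N * D
TOKENS_PER_PARAM = 20       # rule of thumb often associated with the paper


def training_flops(params, tokens):
    """Approximate training compute for a model of the given size."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(flops):
    """Split a FLOP budget so that tokens ~= TOKENS_PER_PARAM * params."""
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


# (parameters, training tokens) as publicly reported
MODELS = {
    "Chinchilla": (70e9, 1.4e12),
    "Gopher": (280e9, 300e9),
    "GPT-3": (175e9, 300e9),
}

for name, (params, tokens) in MODELS.items():
    print(
        f"{name}: {tokens / params:.1f} tokens per parameter, "
        f"~{training_flops(params, tokens):.2e} training FLOPs"
    )
```

Under these assumptions Chinchilla and Gopher use a similar training budget, but Chinchilla spends it on a model one quarter the size trained on several times more tokens.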

Evidence From LLaMA

Meta’s LLaMA models reinforced similar conclusions. Smaller models trained on larger datasets achieved strong results on many benchmarks and showed that parameter count alone is not a reliable measure of performance.

Later versions improved further by increasing training data and context length rather than increasing size. These developments also influenced how researchers think about evaluating large language models, where efficiency and real-world performance matter as much as raw scale.

The Takeaway

Recent research suggests that improving training efficiency may be more effective than simply increasing model size. As language models continue to evolve, efficiency is becoming an important part of how performance is measured.

For a deeper explanation of the research behind these ideas, you can read the original article below.

Originally published on Picovoice

How LLM Orchestration Works and Why Developers Use LangChain

Khushi Nakra — Mon, 02 Feb 2026 23:10:25 +0000

Calling an LLM API is easy. The hard part is everything around it — feeding it the right context, chaining multiple calls together, remembering previous interactions, and deciding when the model should use a tool vs. generate text. That's the problem LLM orchestration frameworks solve, and LangChain is the most widely adopted one.

Harrison Chase open-sourced LangChain in late 2022. It grew fast, attracting thousands of contributors, and Chase went on to raise roughly $30M in early funding rounds to build a company around it.

The Core Idea

A standalone LLM call is stateless and isolated. You send a prompt, you get a response. But most real applications need more than that — they need to pull data from external sources, maintain conversation history, or pick between different actions depending on the input.

LangChain abstracts this into a modular system. Think of it as middleware between your application logic and the LLM. It connects models like GPT-4, LLaMA, or Claude to data sources like Google Drive, Notion, or a vector database, and orchestrates the flow between them.

The analogy that tends to stick: LangChain is to LLMs what Zapier is to SaaS apps. It connects things and automates the workflow between them.

Chains: Composing Multi-Step LLM Workflows

The central abstraction in LangChain is the chain — a sequence of operations that run in order. A simple chain might take user input, format it into a prompt, send it to an LLM, and parse the output. A more complex chain might query a database first, inject the results into the prompt, call the model, then store the response.
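
The simple chain described above can be sketched in plain Python. This is not LangChain's API; make_chain, format_prompt, fake_llm, and parse_output are hypothetical names, and the model call is a stand-in that only echoes a word so the example runs without any service.

```python
def make_chain(*steps):
    """Compose steps so each one receives the previous step's output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def format_prompt(question):
    # Step 1: turn raw user input into a prompt.
    return f"Answer in one word. Question: {question}"


def fake_llm(prompt):
    # Step 2: stand-in for a real model call; it upper-cases the last word.
    return "  " + prompt.rstrip("?").split()[-1].upper() + "\n"


def parse_output(text):
    # Step 3: clean up the raw model output.
    return text.strip()


chain = make_chain(format_prompt, fake_llm, parse_output)
print(chain("What colour is the sky?"))
```

Swapping fake_llm for a function that calls a real model leaves the other steps untouched, which is the main appeal of the chain abstraction.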

LangChain breaks this down into composable modules:

Models — manage prompt formatting, call the LLM, and extract structured output from responses.
Retrieval — connects the model to external data through retrieval-augmented generation (RAG). This is how you ground LLM responses in your own documents or databases.
Memory — persists state between calls so the model can handle follow-up questions and maintain conversational context.
Agents — the dynamic counterpart to chains. Instead of following a fixed sequence, agents let the model decide which tools to call and what actions to take at each step.
Callbacks — hooks for logging, monitoring, and streaming intermediate results.

LangChain ships with pre-built chains for common patterns (summarization, Q&A, conversational retrieval), but developers can also compose custom chains from individual components.
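
To illustrate how an agent differs from a fixed chain, the following sketch lets a decision function pick a tool at each step and keeps a simple memory of the exchange. Again, this is plain Python rather than LangChain code: Agent, TOOLS, calculator, and fake_model_decide are hypothetical, and the "decision" is a hard-coded rule standing in for what a model would choose.

```python
def calculator(expression):
    """Evaluate expressions of the form '<number> <op> <number>'."""
    left, op, right = expression.split()
    left, right = float(left), float(right)
    return str({"+": left + right, "-": left - right, "*": left * right}[op])


TOOLS = {"calculator": calculator}


def fake_model_decide(user_input):
    """Stand-in for the model's choice of action.

    Returns (tool_name, tool_input) to call a tool, or (None, reply)
    to answer directly.
    """
    if any(ch.isdigit() for ch in user_input):
        return "calculator", user_input
    return None, f"You said: {user_input}"


class Agent:
    def __init__(self, decide, tools):
        self.decide = decide
        self.tools = tools
        self.memory = []  # (role, text) pairs persisted between calls

    def step(self, user_input):
        self.memory.append(("user", user_input))
        tool_name, payload = self.decide(user_input)
        reply = self.tools[tool_name](payload) if tool_name else payload
        self.memory.append(("assistant", reply))
        return reply


agent = Agent(fake_model_decide, TOOLS)
print(agent.step("2 + 3"))
print(agent.step("hello"))
```

In a real framework the decision function would be a model call that sees the tool descriptions and the stored memory.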

What Developers Are Building With It

The most common use cases right now are chatbots, document Q&A systems, and summarization pipelines. But as LLM capabilities expand, the range of applications is growing — code generation assistants, data analysis agents, and multi-modal workflows that combine text, voice, and structured data.

Other Frameworks Worth Knowing

LangChain isn't the only option. Depending on your use case, these alternatives might be a better fit:

Guidance (Microsoft) — template-based control over LLM outputs with constrained generation.
Haystack (deepset) — focused on building production-grade search and RAG pipelines.
Hugging Face Agents — lightweight agent framework tied to the Hugging Face ecosystem.
Griptape — emphasizes structured, predictable workflows over open-ended agent behavior.
AutoChain (Forethought) — designed specifically for conversational AI agents.

The LLM tooling space is still early. New orchestration frameworks are appearing regularly, and existing ones are evolving fast. The best choice depends on whether you need flexible agent behavior, structured pipelines, or tight integration with a specific model ecosystem.

This article was originally published on Picovoice

How to Generate Speech from Text in JavaScript

Khushi Nakra — Fri, 05 Dec 2025 12:43:22 +0000

Text-to-Speech (TTS) allows applications to produce spoken audio from text. Whether you're building a reader, an assistant, or adding simple voice output, the Orca Text-to-Speech Web SDK makes it possible to generate speech directly in the browser. This guide walks through the minimal setup needed to get it running.

What You'll Build

A minimal JavaScript setup that:

Installs the Orca Text-to-Speech Web SDK
Loads an Orca model file
Creates an OrcaWorker instance
Calls synthesize() to generate raw PCM audio from text

You can use this as a foundation for any audio playback tools.

1. Install the Orca Web SDK

npm install @picovoice/orca-web

2. Get Your Picovoice AccessKey

Log in to (or sign up for) the Picovoice Console. It is free and no credit card is required. Copy your AccessKey to the clipboard — you'll use it when initializing the SDK.

3. Add the Orca Model File

Download the Orca Text-to-Speech model for the voice you prefer and add it to your project in one of two ways.

Option A: Copy the Model to a Public Directory

cp ${ORCA_PARAMS_PATH} ${PUBLIC_DIRECTORY}/${ORCA_PARAMS}

Option B: Convert the Model to Base64

Use the pvbase64 script included in the package:

npx pvbase64 -i ${ORCA_PARAMS_PATH} -o ${OUTPUT_DIRECTORY}/${MODEL_NAME}.js

Then create an object containing the Orca model options:

import base64model from '${OUTPUT_DIRECTORY}/${MODEL_NAME}.js'

const orcaModel = {
  publicPath: '${PUBLIC_DIRECTORY}/${ORCA_PARAMS}',

  // or

  base64: base64model,
}

4. Initialize Orca in JavaScript

import { OrcaWorker } from "@picovoice/orca-web";

const orca = await OrcaWorker.create(
  "${ACCESS_KEY}",
  orcaModel
);

5. Convert Text to Speech

// returns raw PCM
const pcm = await orca.synthesize("${TEXT}");

When you're done using Orca, release resources explicitly:

await orca.release();

Explore Further

The Orca Text-to-Speech Web SDK is open source and available on GitHub. There is also an open-source text-to-speech web demo built with Orca that you can reference or extend.

This tutorial was originally published on Picovoice.

Implement Noise Suppression in JavaScript

Khushi Nakra — Fri, 05 Dec 2025 12:18:50 +0000

Noise suppression can significantly improve the audio quality in any web app. This tutorial shows how to integrate the Koala Noise Suppression Web SDK from Picovoice and run speech enhancement directly in the browser. Since processing happens on-device, audio never leaves the user's machine, keeping voice data private with low latency.

What You'll Build

A simple webpage that accesses the microphone and applies Koala's noise-suppression model. This serves as a foundation for building voice chat tools, meeting apps, or any browser-based audio application.

1. Create a New Web Project

npm init

2. Install Dependencies

npm install @picovoice/web-voice-processor @picovoice/koala-web

Install a simple web server for testing:

npm install http-server --save-dev

3. Prepare the Noise Suppression Model

Convert the Koala model into base64 so it can be loaded in the browser:

npx pvbase64 -i ${DOWNLOADED_MODEL_PATH} -o koala_params.js

Replace ${DOWNLOADED_MODEL_PATH} with the path to your downloaded model file.

4. Create index.html

<!DOCTYPE html>
<html lang="en">
<head>
  <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/koala-web/dist/iife/index.js"></script>
  <script src="koala_params.js"></script>
</head>
<body>
  <h1>Koala Web Demo</h1>
  <input type="button" id="start" value="Start Koala" onclick="startKoala()" />

  <script type="application/javascript">
    function errorCallback(error) {
      console.log(error);
    }

    function processErrorCallback(error) {
      console.log(error);
    }

    function processCallback(enhancedPcm) {
      console.log(enhancedPcm);
    }

    async function startKoala() {
      try {
        let koala = await KoalaWeb.KoalaWorker.create(
          ACCESS_KEY,
          processCallback,
          { base64: modelParams },
          { processErrorCallback: processErrorCallback }
        );

        await window.WebVoiceProcessor.WebVoiceProcessor.subscribe(koala);
      } catch (err) {
        errorCallback(err);
      }
    }
  </script>
</body>
</html>

Remember to replace ACCESS_KEY with your AccessKey from the Picovoice Console, written as a quoted string.

5. Run Locally

npx http-server -a localhost -p 5000

You can see the page at http://localhost:5000.

How It Works

The browser captures microphone audio.
Koala processes each audio frame and removes background noise.
The enhanced PCM is returned through processCallback.
All processing is local, giving lower latency and improved privacy.

This tutorial was originally published on Picovoice.

Speaker Diarization Frameworks in Python: Tutorial and Code Walkthrough

Khushi Nakra — Thu, 27 Nov 2025 23:57:14 +0000

Speaker diarization identifies and separates different speakers in an audio file. Think of it as automatically labeling "Speaker A spoke from 0:00-0:15, Speaker B spoke from 0:15-0:30" throughout your recording.

It is essential for applications like meeting transcription, podcast editing, call center analytics, and interview processing. Speaker diarization becomes crucial when you need to know "who" said it along with "what" was said.

This tutorial walks through four different Python frameworks for speaker diarization:

pyannote.audio
NVIDIA NeMo
Simple Diarizer
Falcon Speaker Diarization

1. pyannote.audio

Getting started with pyannote.audio for speaker diarization is straightforward. Follow these steps:

Install the pyannote.audio package using pip:

pip3 install pyannote.audio

Obtain your authentication token to download pretrained models by visiting their Hugging Face pages.
Use the following Python code to perform speaker diarization on an audio file:

from pyannote.audio import Pipeline

# Replace "${ACCESS_TOKEN_GOES_HERE}" with your authentication token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization",
    use_auth_token="${ACCESS_TOKEN_GOES_HERE}")

# Replace "${AUDIO_FILE_PATH}" with the path to your audio file
diarization = pipeline("${AUDIO_FILE_PATH}")

for segment, _, speaker in diarization.itertracks(yield_label=True):
    print(f'Speaker "{speaker}" - "{segment}"')

This code will perform speaker diarization and print out the identified speakers along with their corresponding segments in the audio file.
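
Once the segments are available, a common next step is to total the speaking time per speaker. The sketch below works on plain (start, end, speaker) tuples so it runs without pyannote.audio installed; the helper speaking_time and the sample turns are made up for illustration.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration of each speaker's turns.

    turns is an iterable of (start_sec, end_sec, speaker) tuples.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# With pyannote.audio, the tuples could be built from the loop above as:
#   turns = [(seg.start, seg.end, spk)
#            for seg, _, spk in diarization.itertracks(yield_label=True)]
turns = [
    (0.0, 15.0, "SPEAKER_00"),
    (15.0, 30.0, "SPEAKER_01"),
    (30.0, 42.5, "SPEAKER_00"),
]

for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"{speaker}: {seconds:.1f}s")
```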

2. NVIDIA NeMo

To perform speaker diarization using NVIDIA NeMo, follow these steps:

Install dependencies:

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip3 install Cython

Install NeMo:

pip install git+https://github.com/NVIDIA/NeMo.git@r1.20.0#egg=nemo_toolkit[all]

Download the config file for the inference from the NeMo GitHub repository.
Generate and store the manifest file by running the following code:

import json
import os

from nemo.collections.asr.models import ClusteringDiarizer
from omegaconf import OmegaConf

INPUT_FILE = '/PATH/TO/AUDIO_FILE.wav'
MANIFEST_FILE = '/PATH/TO/MANIFEST_FILE.json'

meta = {
    'audio_filepath': INPUT_FILE,
    'offset': 0,
    'duration': None,
    'label': 'infer',
    'text': '-',
    'num_speakers': None,
    'rttm_filepath': None,
    'uem_filepath': None
}
with open(MANIFEST_FILE, 'w') as fp:
    json.dump(meta, fp)
    fp.write('\n')

Replace /PATH/TO/AUDIO_FILE.wav with the path to your audio file and /PATH/TO/MANIFEST_FILE.json with the desired path for your manifest file.

Load the config file and define a ClusteringDiarizer object:

OUTPUT_DIR = '/PATH/TO/OUTPUT_DIR'
MODEL_CONFIG = '/PATH/TO/CONFIG_FILE.yaml'

config = OmegaConf.load(MODEL_CONFIG)
config.diarizer.manifest_filepath = MANIFEST_FILE
config.diarizer.out_dir = OUTPUT_DIR
config.diarizer.oracle_vad = False
config.diarizer.clustering.parameters.oracle_num_speakers = False

sd_model = ClusteringDiarizer(cfg=config)

Replace /PATH/TO/OUTPUT_DIR and /PATH/TO/CONFIG_FILE.yaml with the desired paths for your output directory and config file, respectively.

Perform speaker diarization on the audio file:

sd_model.diarize()

The speaker diarization output will be stored in the OUTPUT_DIR directory as a Rich Transcription Time Marked (RTTM) file.
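
RTTM is a plain-text format with one speaker turn per line, so the result is easy to post-process. Here is a minimal parsing sketch. It assumes the standard RTTM field order (type, file ID, channel, onset, duration, two placeholder fields, speaker name, then further placeholders); the Turn type, the parse_rttm helper, and the sample lines are illustrative, and the exact file name NeMo writes under OUTPUT_DIR is not covered here.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    file_id: str
    start_sec: float
    duration_sec: float
    speaker: str


def parse_rttm(lines):
    """Extract speaker turns from RTTM lines, skipping anything else."""
    turns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return turns


# Made-up sample lines in RTTM layout
sample = [
    "SPEAKER meeting 1 0.00 15.00 <NA> <NA> speaker_0 <NA> <NA>",
    "SPEAKER meeting 1 15.00 15.00 <NA> <NA> speaker_1 <NA> <NA>",
]

for turn in parse_rttm(sample):
    end_sec = turn.start_sec + turn.duration_sec
    print(f"{turn.speaker}: {turn.start_sec:.2f}s to {end_sec:.2f}s")
```

To read a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.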

3. Simple Diarizer

Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:

Install the package using pip:

pip install simple_diarizer

Define a Diarizer object:

from simple_diarizer.diarizer import Diarizer

diarization = Diarizer(embed_model='xvec', cluster_method='sc')

Perform speaker diarization on an audio file by either passing the number of speakers:

# Replace "${AUDIO_FILE_PATH}" with the path to your audio file
segments = diarization.diarize("${AUDIO_FILE_PATH}", num_speakers=NUM_SPEAKERS)

Or by passing a threshold value:

segments = diarization.diarize("${AUDIO_FILE_PATH}", threshold=THRESHOLD)

The segments variable stores the speaker information and timing details, including start and end times for each segment.

4. Falcon Speaker Diarization

Falcon Speaker Diarization is an on-device speaker diarization engine powered by deep learning. To get started with Falcon, follow these steps:

Install the package using pip:

pip install pvfalcon

Sign up for Picovoice Console for free and copy your AccessKey.
Create an instance of the engine:

import pvfalcon

# Replace "${ACCESS_KEY}" with your Picovoice Console AccessKey
falcon = pvfalcon.create(access_key="${ACCESS_KEY}")

Perform speaker diarization on an audio file:

# Replace "${AUDIO_FILE_PATH}" with the path to your audio file
segments = falcon.process_file("${AUDIO_FILE_PATH}")
for segment in segments:
    print(
        "{speaker_tag=%d start_sec=%.2f end_sec=%.2f}"
        % (segment.speaker_tag, segment.start_sec, segment.end_sec)
    )

Each segment in the segments array includes timing information and speaker identification.
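
Diarization output often contains several consecutive segments from the same speaker. The sketch below merges them when the pause between them is short. It uses a namedtuple stand-in with the same three attributes printed above (speaker_tag, start_sec, end_sec) so it runs without pvfalcon; merge_adjacent and the max_gap_sec default are assumptions chosen for illustration.

```python
from collections import namedtuple

# Stand-in with the same attribute names as the segments printed above.
Segment = namedtuple("Segment", "speaker_tag start_sec end_sec")


def merge_adjacent(segments, max_gap_sec=0.5):
    """Merge consecutive segments from the same speaker.

    Two segments are merged when they share a speaker_tag and the pause
    between them is at most max_gap_sec seconds.
    """
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.speaker_tag == seg.speaker_tag
            and seg.start_sec - last.end_sec <= max_gap_sec
        ):
            merged[-1] = Segment(last.speaker_tag, last.start_sec, seg.end_sec)
        else:
            merged.append(Segment(seg.speaker_tag, seg.start_sec, seg.end_sec))
    return merged


example = [
    Segment(1, 0.0, 2.0),
    Segment(1, 2.2, 4.0),
    Segment(2, 4.1, 6.0),
    Segment(1, 6.0, 7.0),
]

for seg in merge_adjacent(example):
    print(seg)
```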

For more information about Falcon Speaker Diarization, check out the Falcon Speaker Diarization product page or refer to the Falcon Speaker Diarization Python SDK quick start guide.

This tutorial was originally published on Picovoice

Build a Voice Chatbot using Claude API and Python

Khushi Nakra — Sat, 22 Nov 2025 00:25:28 +0000

Claude has a Voice Mode, but it is only available in Anthropic's consumer app. This tutorial shows how to add voice to Claude for your own projects using Picovoice's on-device models.

Unlike cloud speech APIs that send audio to remote servers, Picovoice processes all audio locally. This removes network latency from the voice pipeline and makes the interaction faster and smoother.

What We're Building:

A Python application that:

Listens for a wake word using Porcupine Wake Word
Transcribes what you say using Cheetah Streaming Speech-to-Text
Sends text to Claude's API
Speaks the response back using Orca Streaming Text-to-Speech

What you'll need:

Python 3.9+
A mic and speakers
Picovoice AccessKey from the Picovoice Console
Claude API key from the Claude Console

Step 1: Install All Required Dependencies

Install all required Python SDKs and dependencies with a single terminal command:

pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker anthropic

These packages include:

Porcupine Wake Word Python SDK: pvporcupine
Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
Orca Text-to-Speech Python SDK: pvorca
Picovoice Python Recorder library: pvrecorder
Picovoice Python Speaker library: pvspeaker
Anthropic Python library: anthropic - used for Claude API integration

Step 2: Design a Custom Wake Phrase

Sign up for a free account at Picovoice Console.
Navigate to the Porcupine page.
Enter your wake phrase such as "Hey Chatbot" and test it using the microphone button.
Click "Train", select the target platform, and download the .ppn model file.

Step 3: Activate Chatbot with Wake Phrase

The code below captures input from your default microphone and identifies the custom wake phrase without any cloud dependency:

import pvporcupine
import pvrecorder

def listen_for_wake_word(access_key, wake_word_path):
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[wake_word_path]
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=porcupine.frame_length
    )
    recorder.start()

    print("Listening...")

    while True:
        audio_frame = recorder.read()
        if porcupine.process(audio_frame) >= 0:
            print("Heard wake word!")
            break

    recorder.stop()
    porcupine.delete()

Step 4: Convert Speech-to-Text

Next, transcribe the audio in real-time with Cheetah Streaming Speech-to-Text:

import pvcheetah
import pvrecorder

def transcribe_speech(access_key):
    cheetah = pvcheetah.create(
        access_key=access_key,
        enable_automatic_punctuation=True
    )

    recorder = pvrecorder.PvRecorder(
        frame_length=cheetah.frame_length
    )
    recorder.start()

    print("Listening...")
    transcript = ""

    while True:
        audio_frame = recorder.read()
        partial_transcript, is_endpoint = cheetah.process(audio_frame)
        transcript += partial_transcript

        if is_endpoint:
            transcript += cheetah.flush()
            break

    recorder.stop()
    cheetah.delete()

    return transcript.strip()

Step 5: Send Text Prompts to Claude

Next, send the text to Claude using Anthropic's messages endpoint:

from anthropic import Anthropic

def ask_claude(transcript, api_key):
    client = Anthropic(api_key=api_key)

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=200,
        messages=[
            {"role": "user", "content": transcript}
        ]
    )

    return response.content[0].text

This sends only the text to Claude while all audio is processed locally.
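
Each call above is independent, so Claude does not see earlier turns. If follow-up questions matter, one possible approach is to keep a short message history and pass it as the messages argument. The sketch below only builds that list in the same role/content shape used above; the ConversationHistory class and its max_turns limit are hypothetical and not part of the Anthropic library.

```python
class ConversationHistory:
    """Keep recent user/assistant messages in the shape used above."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self._messages = []

    def add_user(self, text):
        self._messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self._messages.append({"role": "assistant", "content": text})

    def messages(self):
        # Keep roughly the last max_turns exchanges, and make sure the
        # list starts with a user message.
        recent = self._messages[-2 * self.max_turns:]
        while recent and recent[0]["role"] != "user":
            recent = recent[1:]
        return recent


# Possible use inside ask_claude (not executed here):
#
#     history.add_user(transcript)
#     response = client.messages.create(
#         model="claude-3-haiku-20240307",
#         max_tokens=200,
#         messages=history.messages(),
#     )
#     history.add_assistant(response.content[0].text)
```

Capping the history keeps requests small, which matters for a voice assistant where response time is noticeable.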

Step 6: Convert Text-to-Speech

Convert Claude's text response into natural speech with Orca Streaming Text-to-Speech and PvSpeaker:

import pvorca
import pvspeaker

def speak_response(text, access_key):
    orca = pvorca.create(access_key=access_key)

    audio = orca.synthesize(text)

    speaker = pvspeaker.PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        buffer_size_secs=10
    )
    speaker.start()
    speaker.write(audio[0])
    speaker.flush()  # wait until the buffered audio has finished playing
    speaker.stop()

    orca.delete()

Full Python Code

The full code uses the following Picovoice models together: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.

import pvporcupine
import pvcheetah
import pvorca
import pvrecorder
import pvspeaker
from anthropic import Anthropic

class ClaudeVoiceAssistant:
    def __init__(self, picovoice_key, claude_key, wake_word_path):
        self.picovoice_key = picovoice_key
        self.claude_client = Anthropic(api_key=claude_key)
        self.wake_word_path = wake_word_path

    def listen_for_wake_word(self):
        porcupine = pvporcupine.create(
            access_key=self.picovoice_key,
            keyword_paths=[self.wake_word_path]
        )

        recorder = pvrecorder.PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()

        print("Listening for wake word...")

        while True:
            audio_frame = recorder.read()
            if porcupine.process(audio_frame) >= 0:
                break

        recorder.stop()
        porcupine.delete()

    def transcribe_speech(self):
        cheetah = pvcheetah.create(
            access_key=self.picovoice_key,
            enable_automatic_punctuation=True
        )

        recorder = pvrecorder.PvRecorder(frame_length=cheetah.frame_length)
        recorder.start()

        transcript = ""
        while True:
            audio_frame = recorder.read()
            partial, is_endpoint = cheetah.process(audio_frame)
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()
                break

        recorder.stop()
        cheetah.delete()
        return transcript.strip()

    def ask_claude(self, transcript):
        response = self.claude_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=200,
            messages=[{"role": "user", "content": transcript}]
        )
        return response.content[0].text

    def speak_response(self, text):
        orca = pvorca.create(access_key=self.picovoice_key)
        audio = orca.synthesize(text)

        speaker = pvspeaker.PvSpeaker(
            sample_rate=orca.sample_rate,
            bits_per_sample=16,
            buffer_size_secs=10
        )
        speaker.start()
        speaker.write(audio[0])
        speaker.flush()  # wait until the buffered audio has finished playing
        speaker.stop()
        orca.delete()

    def run(self):
        while True:
            self.listen_for_wake_word()
            transcript = self.transcribe_speech()
            print(f"You said: {transcript}")
            response = self.ask_claude(transcript)
            print(f"Claude: {response}")
            self.speak_response(response)

if __name__ == "__main__":
    assistant = ClaudeVoiceAssistant(
        picovoice_key="YOUR_PICOVOICE_KEY",
        claude_key="YOUR_CLAUDE_KEY",
        wake_word_path="hey_claude.ppn"
    )
    assistant.run()

Launching the Chatbot

To run the voice-enabled Claude chatbot, save the code as claude_voice.py, point the wake word path to your downloaded .ppn file, and have both keys ready:

Picovoice AccessKey (copy it from the Picovoice Console)
Claude API key (available from the Claude Console)

python claude_voice.py \
  --access_key YOUR_PICOVOICE_ACCESS_KEY \
  --claude_api_key YOUR_CLAUDE_API_KEY \
  --keyword_path PATH_TO_WAKE_WORD_MODEL
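
The command above passes its settings as flags. A minimal sketch of how the script could accept them with argparse follows; the flag names come from the command, while the parse_args helper and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags used in the launch command."""
    parser = argparse.ArgumentParser(description="Voice chatbot for Claude")
    parser.add_argument("--access_key", required=True,
                        help="Picovoice AccessKey")
    parser.add_argument("--claude_api_key", required=True,
                        help="Claude API key")
    parser.add_argument("--keyword_path", required=True,
                        help="Path to the .ppn wake word model")
    return parser.parse_args(argv)


# In claude_voice.py, the __main__ block could then become:
#
#     args = parse_args()
#     ClaudeVoiceAssistant(
#         picovoice_key=args.access_key,
#         claude_key=args.claude_api_key,
#         wake_word_path=args.keyword_path,
#     ).run()
```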

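Once the script starts, the chatbot waits for the wake phrase. Note that the listing above reads the keys and wake word path from the placeholder strings in its __main__ block, so the flags shown here only take effect if the script also parses command-line arguments.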

Troubleshooting Audio Device Issues

Problem: "Failed to initialize PvRecorder" or "Audio device not found"
Solution: Make sure the correct audio device is selected. PvRecorder accepts a device_index argument; list the available devices with the following Python code and pass the index of the one you want:

from pvrecorder import PvRecorder
print(PvRecorder.get_available_devices())

Problem: No audio output from speaker
Solution: Check speaker volume and permissions. Verify PvSpeaker initialization with the following Python code:

from pvspeaker import PvSpeaker
print(PvSpeaker.get_available_devices())
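
If playback is still silent, it can help to check whether the problem lies in synthesis or in the speaker. One way, sketched here with the standard library only, is to write the PCM samples to a WAV file and open it in any audio player. The helper save_pcm_as_wav is hypothetical, it assumes 16-bit mono samples as configured for PvSpeaker above, and the sample values and rate in the demo are arbitrary.

```python
import os
import struct
import tempfile
import wave


def save_pcm_as_wav(pcm, sample_rate, path):
    """Write 16-bit mono PCM samples (a sequence of ints) to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Demo with arbitrary values; with Orca this would be
#   save_pcm_as_wav(audio[0], orca.sample_rate, "reply.wav")
demo_pcm = [0, 1000, -1000, 32767, -32768]
demo_path = os.path.join(tempfile.gettempdir(), "orca_check.wav")
save_pcm_as_wav(demo_pcm, 22050, demo_path)

# Read the file back to confirm what was written.
with wave.open(demo_path, "rb") as wav_file:
    frames = wav_file.getnframes()
    rate = wav_file.getframerate()
print(frames, rate)
```

If the WAV file sounds correct, synthesis works and the issue is on the playback side.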

The tutorial was originally published on Picovoice