<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mart Schweiger</title>
    <description>The latest articles on DEV Community by Mart Schweiger (@martschweiger).</description>
    <link>https://dev.to/martschweiger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3802221%2Fcdb4c7a2-d4f4-444d-908e-30d6ea3bd1a7.png</url>
      <title>DEV Community: Mart Schweiger</title>
      <link>https://dev.to/martschweiger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/martschweiger"/>
    <language>en</language>
    <item>
      <title>Best API for building a speech-to-speech voice agent in 2026</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:28:14 +0000</pubDate>
      <link>https://dev.to/martschweiger/best-api-for-building-a-speech-to-speech-voice-agent-in-2026-56np</link>
      <guid>https://dev.to/martschweiger/best-api-for-building-a-speech-to-speech-voice-agent-in-2026-56np</guid>
      <description>&lt;p&gt;A speech-to-speech voice agent API replaces the three separate components most teams used to wire together—&lt;a href="https://www.assemblyai.com/products/streaming-speech-to-text" rel="noopener noreferrer"&gt;streaming speech-to-text&lt;/a&gt;, a language model, and text-to-speech—with a single API that takes audio in and returns audio out. In 2026, that category has gone from "interesting demo" to "default way to ship a production voice agent," and the gap between providers is now measurable in latency, accuracy, and what they let you do with tool calls.&lt;/p&gt;

&lt;p&gt;This guide compares the speech-to-speech voice agent APIs developers actually pick from in 2026, what each one is best at, and how to choose between a true speech-to-speech API and a chained STT-LLM-TTS pipeline. We'll cover &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;AssemblyAI's Voice Agent API&lt;/a&gt;, OpenAI Realtime, Google Gemini Live, Deepgram, ElevenLabs Conversational AI, Retell, Bland, and Hume. We'll also show where Vapi and Pipecat fit if you'd rather orchestrate the components yourself, a topic our &lt;a href="https://www.assemblyai.com/blog/orchestration-tools-ai-voice-agents" rel="noopener noreferrer"&gt;orchestration tools comparison&lt;/a&gt; covers in more depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is a speech-to-speech voice agent API?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A speech-to-speech voice agent API is a single API endpoint—usually a WebSocket—that accepts a user's audio stream and returns the agent's audio response, with everything in between (transcription, reasoning, tool calls, voice synthesis) hidden behind one connection. You send mic audio in. You get the agent's voice back. You don't manage three providers, three sets of API keys, or three sets of latency budgets.&lt;/p&gt;

&lt;p&gt;That's the practical definition. Under the hood, there are two architectural patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chained (cascading) speech-to-speech APIs&lt;/strong&gt;: Internally pipe streaming STT → LLM → streaming TTS, but expose a single API. The advantage is that you can swap each layer for best-in-class models. AssemblyAI's Voice Agent API is the leading example.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native speech-to-speech models&lt;/strong&gt;: A single model trained end-to-end on audio that takes audio tokens in and emits audio tokens out, in some cases with no intermediate text. OpenAI Realtime, Google Gemini Live, and Hume's EVI fall here. The pitch is lower latency and richer audio understanding (laughter, tone). The trade-off is less transparency, narrower language support, and weaker text reasoning than a frontier text LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both expose the same developer surface—one connection, audio in/audio out—so the choice is about which trade-offs match your application.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Best speech-to-speech voice agent APIs in 2026&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Speech accuracy&lt;/th&gt;
&lt;th&gt;P50 latency&lt;/th&gt;
&lt;th&gt;Tool calling&lt;/th&gt;
&lt;th&gt;Languages&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AssemblyAI Voice Agent API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chained, single WebSocket&lt;/td&gt;
&lt;td&gt;Industry-leading on phone audio, alphanumerics (16.7% missed entity rate)&lt;/td&gt;
&lt;td&gt;307ms STT + sub-second end-to-end&lt;/td&gt;
&lt;td&gt;Yes, model-routed, with intermediate speech (no silence during tool calls)&lt;/td&gt;
&lt;td&gt;6 streaming (EN/ES/FR/DE/IT/PT), expanding&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$4.50/hr&lt;/strong&gt; flat&lt;/td&gt;
&lt;td&gt;Production voice agents where speech accuracy decides whether it ships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native speech-to-speech (GPT-4o audio)&lt;/td&gt;
&lt;td&gt;Strong on clean studio audio, weaker on telephony (23.3% missed entity rate)&lt;/td&gt;
&lt;td&gt;~500–800ms end-to-end&lt;/td&gt;
&lt;td&gt;Yes, OpenAI tool format (goes silent during tool calls)&lt;/td&gt;
&lt;td&gt;~50 (varies by feature)&lt;/td&gt;
&lt;td&gt;~$18/hr per-token billing across 30+ event types&lt;/td&gt;
&lt;td&gt;Demos, browser-first apps, conversational toys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deepgram Voice Agent API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chained, cascading&lt;/td&gt;
&lt;td&gt;Good general accuracy, weaker on entities (25.5% missed entity rate)&lt;/td&gt;
&lt;td&gt;~1–1.5 seconds end-to-end&lt;/td&gt;
&lt;td&gt;Yes, custom functions supported (goes silent during tool calls)&lt;/td&gt;
&lt;td&gt;EN, ES, NL, FR, DE, IT, JA&lt;/td&gt;
&lt;td&gt;~$4.50/hr, concurrency commitments required&lt;/td&gt;
&lt;td&gt;Teams already invested in Deepgram's ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini Live API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native speech-to-speech (Gemini 2 audio)&lt;/td&gt;
&lt;td&gt;Strong on Google's voice eval set&lt;/td&gt;
&lt;td&gt;~600–900ms end-to-end&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;30+&lt;/td&gt;
&lt;td&gt;Usage-based, varies by tier&lt;/td&gt;
&lt;td&gt;Apps already on GCP / Gemini, multimodal (vision + voice) demos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Conversational AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chained, ElevenLabs-orchestrated&lt;/td&gt;
&lt;td&gt;Depends on STT chosen (configurable)&lt;/td&gt;
&lt;td&gt;Sub-second end-to-end&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;30+&lt;/td&gt;
&lt;td&gt;Per-minute, ~$0.09–0.30/min&lt;/td&gt;
&lt;td&gt;Teams that want premium TTS as the headline and don't want to tune STT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retell&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chained, orchestrated&lt;/td&gt;
&lt;td&gt;Configurable STT&lt;/td&gt;
&lt;td&gt;Sub-500ms voice-to-voice on phone&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;td&gt;Per-minute, ~$0.07–0.17/min&lt;/td&gt;
&lt;td&gt;Phone-first agents prioritizing turn-taking naturalness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bland&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chained, self-hostable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Sub-second&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;Per-minute or self-hosted&lt;/td&gt;
&lt;td&gt;Enterprises with strict data residency / on-prem requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hume EVI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native speech-to-speech, emotion-aware&lt;/td&gt;
&lt;td&gt;Decent&lt;/td&gt;
&lt;td&gt;Sub-second&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;English-focused&lt;/td&gt;
&lt;td&gt;Per-minute&lt;/td&gt;
&lt;td&gt;Emotion-sensitive use cases (mental health, coaching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vapi&lt;/td&gt;
&lt;td&gt;Orchestration (not S2S, but feels like it)&lt;/td&gt;
&lt;td&gt;Depends on chosen STT&lt;/td&gt;
&lt;td&gt;Sub-second when tuned&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Wide&lt;/td&gt;
&lt;td&gt;Per-minute + pass-through provider costs&lt;/td&gt;
&lt;td&gt;Teams that want to swap STT/LLM/TTS per-deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipecat / LiveKit Agents&lt;/td&gt;
&lt;td&gt;Open-source orchestration&lt;/td&gt;
&lt;td&gt;Depends on STT&lt;/td&gt;
&lt;td&gt;Sub-second when tuned&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Wide&lt;/td&gt;
&lt;td&gt;Compute + provider costs&lt;/td&gt;
&lt;td&gt;Teams who want full ownership of the pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things stand out in 2026. End-to-end latency under one second is now table stakes, not a differentiator: most providers on this list will get you there with a reasonable network (the table above puts Deepgram at ~1–1.5 seconds). What separates them is &lt;strong&gt;speech accuracy on real-world audio&lt;/strong&gt; (phone calls, accents, alphanumerics), &lt;strong&gt;how tool calling behaves under load&lt;/strong&gt;, and &lt;strong&gt;whether the pricing model survives contact with a real customer base&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In our &lt;a href="https://www.assemblyai.com/voice-agent-report" rel="noopener noreferrer"&gt;Voice Agent Report&lt;/a&gt;, 76% of respondents rated speech-to-text accuracy as the single most important non-negotiable when building voice agents, above latency, cost, and integration capabilities. That finding maps directly to what we see in the comparison data: the accuracy gap between providers on real-world entities (phone numbers, emails, confirmation codes) is where production agents succeed or fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to choose the best speech-to-speech voice agent API&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The voice agent you ship depends on four decisions. Get any of them wrong and the agent feels off, even if the demo was great.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Speech-to-text accuracy on your actual audio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most providers benchmark on studio audio. Your users are on phones, in cars, in drive-thrus, and rattling off order numbers and email addresses. The two accuracy metrics that actually matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Alphanumeric accuracy&lt;/strong&gt;: How well the model captures phone numbers, confirmation codes, emails, order IDs. This is where the gap between providers shows up most clearly. In head-to-head testing, AssemblyAI's &lt;a href="https://www.assemblyai.com/universal-3-pro-streaming" rel="noopener noreferrer"&gt;Universal-3 Pro Streaming&lt;/a&gt; delivers a 16.7% alphanumeric missed error rate, compared to 23.3% for OpenAI and 25.5% for Deepgram. That's the difference between capturing "RX-7704132" correctly on the first try and hearing "dash seven seven zero four one three two." AssemblyAI's Universal-3 Pro Streaming also delivers 21% fewer alphanumeric errors and 28% better accuracy on consecutive numbers than the previous generation. This is the single most under-measured metric in voice agent demos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity accuracy on proper nouns&lt;/strong&gt;: Company names, people's names, drug names, product titles. If your agent writes "Corel" instead of "Coral" into the CRM, the lead is unreachable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Native speech-to-speech models like OpenAI Realtime and Gemini Live were trained more on clean conversational audio than on telephony, which shows up the moment you put them on a Twilio call.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Turn-taking and interruption handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Poor turn detection is the most common reason voice agents feel unnatural. The agent either talks over the user or sits in awkward silence. The best implementations handle turn detection at the model level, not as an afterthought bolted on with a fixed silence timer.&lt;/p&gt;

&lt;p&gt;AssemblyAI's Universal-3 Pro Streaming includes acoustic turn detection built directly into the model, with semantic endpointing that combines acoustic pauses with intent signals (a semantic + neural network + VAD approach rather than basic silence-based VAD). Retell ships its own proprietary turn-taking model. OpenAI Realtime's server VAD is competent, but its configurable timeouts still trip up agents on calls with hesitant speakers. Deepgram relies on traditional VAD only, without the semantic or neural layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Tool calling reliability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real voice agents don't just talk—they book the appointment, look up the order, charge the card. That means the underlying LLM has to call tools mid-conversation, fast enough that the silence doesn't become obvious.&lt;/p&gt;

&lt;p&gt;The bar to clear: tool calls under 500ms round-trip, structured outputs that don't hallucinate parameters, and the ability to call multiple tools in a single turn. But there's a UX dimension most teams overlook: &lt;strong&gt;what happens while the tool call is executing?&lt;/strong&gt; AssemblyAI's Voice Agent API generates intermediate speech during tool execution—the agent says something like "Let me look that up for you" rather than going silent. Both OpenAI Realtime and Deepgram go silent during tool calls, which creates an awkward dead-air gap that makes users wonder if the connection dropped.&lt;/p&gt;

&lt;p&gt;AssemblyAI's Voice Agent API exposes a clean function-calling surface that routes through the underlying model with structured-output guarantees. OpenAI Realtime supports tool calling natively. Some orchestration platforms add their own retry and validation logic on top.&lt;/p&gt;

&lt;p&gt;If your agent's job is "capture data and put it somewhere"—booking a meeting, qualifying a lead, taking an order, scheduling a callback—tool calling reliability is what decides whether the agent actually does its job.&lt;/p&gt;
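
&lt;p&gt;If you are orchestrating the pipeline yourself, the "no dead air" behavior described above can be approximated with a small timeout wrapper. The sketch below is an illustration only, not AssemblyAI's implementation: &lt;code&gt;run_tool_with_filler&lt;/code&gt;, &lt;code&gt;speak&lt;/code&gt;, and the 300ms patience value are hypothetical names and defaults chosen for the example.&lt;/p&gt;

```python
import asyncio


async def run_tool_with_filler(tool, speak, filler="Let me look that up for you.", patience_s=0.3):
    """Await tool(); if it is still running after patience_s seconds, speak a filler phrase."""
    task = asyncio.create_task(tool())
    done, _pending = await asyncio.wait({task}, timeout=patience_s)
    if not done:
        # The tool is slow: cover the gap with a short transition phrase.
        await speak(filler)
    return await task


async def demo():
    spoken = []

    async def slow_lookup():
        await asyncio.sleep(0.2)  # stand-in for a real order lookup
        return {"order_id": "RX-7704132", "status": "shipped"}

    async def speak(text):
        spoken.append(text)  # stand-in for pushing text to a TTS stream

    result = await run_tool_with_filler(slow_lookup, speak, patience_s=0.05)
    return result, spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

&lt;p&gt;A tool that returns within the patience window produces no filler at all, which keeps fast lookups from sounding padded.&lt;/p&gt;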

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Pricing model and unit economics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the trap most teams fall into during pilots. Per-minute pricing looks cheap until you're running 500 simultaneous calls during a support spike. Per-token audio pricing (OpenAI Realtime) is unpredictable because audio output tokens cost 10–20x as much as text tokens, so a chatty TTS voice burns through your budget.&lt;/p&gt;

&lt;p&gt;A few patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flat hourly pricing&lt;/strong&gt;: AssemblyAI's Voice Agent API at &lt;strong&gt;$4.50/hour&lt;/strong&gt; covers STT, LLM inference, TTS, and tool calling. One bill, one line of math to model what a 5-minute call costs. No separate meters for audio in, audio out, text in, text out. No concurrency commitments. Easy to forecast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-minute, all-in&lt;/strong&gt;: Retell, Bland, ElevenLabs Conversational AI. Predictable, but adds up at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat hourly with concurrency commitments&lt;/strong&gt;: Deepgram's voice agent API is also ~$4.50/hour, but requires concurrency-metered billing, meaning you're committing to a certain number of simultaneous sessions. That changes the economics at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-token audio&lt;/strong&gt;: OpenAI Realtime. ~$18/hour with 30+ billing event types. Best for low-volume; hard to forecast at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pass-through + platform fee&lt;/strong&gt;: Vapi, LiveKit. You pay each underlying provider plus a platform fee, which is flexible but adds accounting overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Forecast what 100 hours of conversation actually costs across the providers you're considering. At the rates above, that is roughly $450 at $4.50/hour versus roughly $1,800 at ~$18/hour, a 4x gap that matters once you stop being charged for demo calls and start being charged for production.&lt;/p&gt;
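
&lt;p&gt;A forecast like this fits in a few lines of Python. The sketch below uses only the approximate rates quoted in this article ($4.50/hour flat, ~$18/hour per-token, $0.07–0.30/minute per-minute) and ignores concurrency commitments, platform fees, and volume discounts. The function names are illustrative, so treat the output as a rough estimate rather than a quote.&lt;/p&gt;

```python
# Approximate list prices quoted in this article; not vendor quotes.
HOURLY_RATES_USD = {
    "flat hourly ($4.50/hr)": 4.50,
    "per-token audio (~$18/hr)": 18.00,
}
PER_MINUTE_RATES_USD = {
    "per-minute, low end ($0.07/min)": 0.07,
    "per-minute, high end ($0.30/min)": 0.30,
}


def hourly_equivalent(per_minute_usd: float) -> float:
    """Convert a per-minute price into dollars per hour of conversation."""
    return per_minute_usd * 60


def forecast(hours: float) -> dict[str, float]:
    """Estimated spend in USD for the given hours of conversation under each model."""
    costs = {name: rate * hours for name, rate in HOURLY_RATES_USD.items()}
    for name, per_minute in PER_MINUTE_RATES_USD.items():
        costs[name] = hourly_equivalent(per_minute) * hours
    return {name: round(cost, 2) for name, cost in costs.items()}


def call_cost(minutes: float, hourly_usd: float) -> float:
    """Cost of a single call of the given length at a flat hourly rate."""
    return round(hourly_usd * minutes / 60, 4)


if __name__ == "__main__":
    for name, cost in forecast(100).items():
        print(f"{name}: ${cost:,.2f}")
    print("5-minute call at $4.50/hr:", call_cost(5, 4.50))
```

&lt;p&gt;Swap in your own expected hours and negotiated rates before comparing providers.&lt;/p&gt;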

&lt;h2&gt;
  
  
  &lt;strong&gt;AssemblyAI Voice Agent API: one WebSocket, flat-rate, built on Universal-3 Pro Streaming&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AssemblyAI's Voice Agent API is a single WebSocket that takes user audio in and streams agent audio out, with STT, LLM, TTS, turn detection, and tool calling handled inside the connection. It replaces three separate providers with one bill, one set of logs, and one set of latency variables to tune.&lt;/p&gt;

&lt;p&gt;What makes it work as a speech-to-speech voice agent API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speech accuracy that survives phone audio.&lt;/strong&gt; The STT layer is Universal-3 Pro Streaming, the same model trusted by enterprise voice agent teams for production deployments—307ms P50 latency, native 8kHz mulaw support, immutable transcripts, and a 16.7% alphanumeric missed error rate that's measurably better than OpenAI (23.3%) and Deepgram (25.5%). When the STT is this accurate, the whole conversation is better because the agent is actually responding to what was said.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling that doesn't go silent.&lt;/strong&gt; Define your tools, the model calls them, results stream back into the conversation. Unlike OpenAI Realtime and Deepgram, the agent generates intermediate speech during tool execution—natural transition phrases like "Let me check on that"—so your users never hear dead air. Useful for the lead-qualification, appointment-setting, and structured-data-capture use cases where voice agents have the strongest product-market fit today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-session updates without reconnecting.&lt;/strong&gt; Update the system prompt, voice, tools, and VAD settings mid-conversation with a JSON message—no reconnection, no redeployment. OpenAI Realtime only supports updating prompt and tools. Deepgram supports prompt and voice only. AssemblyAI is the only provider that lets you update all four mid-session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session resumption.&lt;/strong&gt; If the WebSocket drops, reconnect within 30 seconds and pick up where the conversation left off. Context is preserved. Neither OpenAI Realtime nor Deepgram offers session resumption—a dropped connection means starting over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat-rate pricing.&lt;/strong&gt; $4.50/hour of session time, no per-token audio surprises, no per-provider invoices, no concurrency commitments. This includes STT, LLM, TTS, turn detection, and tool calling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One API to learn.&lt;/strong&gt; The Voice Agent API is one WebSocket. You don't wire together a streaming STT WebSocket, an LLM HTTP endpoint, a TTS streaming connection, and your own turn-detection logic. The plumbing is in the API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built for production.&lt;/strong&gt; Unlimited concurrency, session resumption, structured logs per session, and the same SOC 2 / BAA-eligible infrastructure that already runs AssemblyAI's speech-to-text platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it fits in the landscape: AssemblyAI's Voice Agent API is the choice when &lt;strong&gt;speech accuracy decides whether the agent ships&lt;/strong&gt;. If your agent is taking phone calls, capturing structured data, or operating in a regulated industry where you need a BAA, this is the speech-to-speech voice agent API to build on.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When to use a chained pipeline instead&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A speech-to-speech voice agent API is the right answer for most teams in 2026. But there are three cases where chaining the layers yourself still wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need a specific LLM&lt;/strong&gt;: A frontier text LLM like Claude or Gemini that isn't exposed inside any S2S API yet. Most S2S APIs let you choose, but if you need a model that isn't on the list, chain it yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need a specific TTS voice&lt;/strong&gt;: A cloned voice, a specific accent, or a non-standard language model. Most S2S APIs let you bring your own TTS, but if you need fine control, a chained pipeline is more flexible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have regulated data residency&lt;/strong&gt;: Some industries require every layer to run in your VPC. A chained, self-hosted pipeline (with Bland for the orchestration, or fully self-built) is the only path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're chaining, the layer that decides whether the agent works is still the streaming STT. The &lt;a href="https://www.assemblyai.com/blog/real-time-speech-to-text-best-for-voice-agents" rel="noopener noreferrer"&gt;best streaming speech-to-text model for voice agents&lt;/a&gt; discussion comes down to the same accuracy and latency criteria covered above.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Common use cases for speech-to-speech voice agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The pattern in 2026 is consistent: &lt;a href="https://www.assemblyai.com/blog/ai-voice-agents" rel="noopener noreferrer"&gt;speech-to-speech voice agents&lt;/a&gt; work best on high-volume, structured calls where the agent's job is to &lt;strong&gt;capture or look up data&lt;/strong&gt; rather than reason open-endedly. The teams shipping production agents converge on these use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lead qualification and outbound sales discovery&lt;/strong&gt;: Ask BANT questions, book qualified meetings, sync to the CRM. Turn-taking quality is the differentiator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Appointment scheduling and confirmations&lt;/strong&gt;: Medical offices, salons, service businesses. Alphanumeric accuracy on dates, times, and confirmation codes is non-negotiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Food ordering and reservations&lt;/strong&gt;: High-accuracy data capture on menu items, special requests, payment info.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer support tier-1 deflection&lt;/strong&gt;: Order status, account questions, basic troubleshooting. Best paired with explicit escalation paths. See our guide to &lt;a href="https://www.assemblyai.com/blog/voice-ai-for-customer-service" rel="noopener noreferrer"&gt;Voice AI for customer service&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insurance verification and benefits lookup&lt;/strong&gt;: Getting plan numbers, group IDs, and member info right the first time, the same accuracy bar that drives &lt;a href="https://www.assemblyai.com/solutions/medical" rel="noopener noreferrer"&gt;voice agents in healthcare&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outbound reminders and surveys&lt;/strong&gt;: Post-visit follow-ups, payment reminders, satisfaction surveys.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread across all of these: the agent is capturing or retrieving specific data, the conversation has a predictable structure, and the cost of a transcription error is concrete. That's where a speech-to-speech voice agent API earns its keep over a human agent or an IVR.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to evaluate a speech-to-speech voice agent API before you commit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Demos are unreliable. Vendor benchmarks are unreliable. Here's the evaluation loop teams actually use before signing a contract:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Record 50 real or representative calls&lt;/strong&gt; for your use case, including accents, background noise, alphanumeric content, and interruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run them through each API's playground or trial.&lt;/strong&gt; Measure word error rate (WER) on the alphanumeric tokens specifically—phone numbers, confirmation codes, emails, dollar amounts. General WER is misleading. Look at the missed entity rates: AssemblyAI sits at 16.7%, OpenAI at 23.3%, Deepgram at 25.5%. Run your own audio to see how those numbers hold on your data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time the turn-taking.&lt;/strong&gt; Mark every "caller-stops-speaking" moment and measure how long until the agent starts responding. Sub-800ms is the threshold for natural-feeling conversation. Pay attention to how each provider handles turn detection—semantic + neural approaches outperform basic VAD on hesitant or accented speakers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test tool calling under load.&lt;/strong&gt; Define three real tools and have the agent call them mid-conversation. Measure round-trip time and error rate. Also note whether the agent speaks naturally during tool execution or goes silent—this makes a bigger UX difference than most teams expect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read every transcript.&lt;/strong&gt; You'll catch prompt failures, silently wrong transcriptions, and hallucinated tool parameters that you'd never notice by listening.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most teams skip step 2 and ship with a model that fumbles confirmation codes silently. Don't.&lt;/p&gt;
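
&lt;p&gt;Steps 2 and 3 of the loop above are easy to script once you have reference entities and timestamps for each call. The sketch below is one possible scoring approach, with assumptions: it counts an entity as missed when it does not appear verbatim (case-insensitively) in the transcript, which may not match how the vendor figures quoted above were computed, and the function names are illustrative.&lt;/p&gt;

```python
from statistics import median


def missed_entity_rate(expected: list[str], transcript: str) -> float:
    """Fraction of reference entities that do not appear verbatim in the transcript."""
    if not expected:
        return 0.0
    normalized = transcript.lower()
    missed = [entity for entity in expected if entity.lower() not in normalized]
    return len(missed) / len(expected)


def p50_turn_latency_ms(turns: list[tuple[float, float]]) -> float:
    """Median gap between caller-stops-speaking and agent-starts-speaking.

    Each turn is (caller_stop_s, agent_start_s), in seconds from call start.
    """
    gaps_ms = [(agent_start - caller_stop) * 1000 for caller_stop, agent_start in turns]
    return median(gaps_ms)


if __name__ == "__main__":
    entities = ["RX-7704132", "jane@example.com", "555-0142"]
    transcript = "My code is rx-7704132 and my email is jane@example.com"
    print("missed entity rate:", missed_entity_rate(entities, transcript))
    print("P50 turn latency (ms):", p50_turn_latency_ms([(1.0, 1.5), (4.0, 4.75), (9.0, 10.0)]))
```

&lt;p&gt;Verbatim matching is deliberately strict: a confirmation code that is almost right still fails a database lookup, so it should count as a miss.&lt;/p&gt;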

&lt;h2&gt;
  
  
  &lt;strong&gt;Final words&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The right speech-to-speech voice agent API in 2026 depends less on the marketing material and more on what your agent has to actually hear. If your users are on phones, capturing structured data, or operating in regulated environments, the bar is speech accuracy first, latency second, and pricing predictability third—in that order. The chained-architecture S2S APIs (with AssemblyAI's Voice Agent API as the leading example for accuracy-critical use cases) tend to outperform native speech-to-speech models on real-world telephony, even when the native models look better in studio-audio demos.&lt;/p&gt;

&lt;p&gt;For most teams shipping a production voice agent this year, the AssemblyAI Voice Agent API is the right starting point. One WebSocket, $4.50/hour, Universal-3 Pro Streaming for the parts that matter, and flat-rate pricing you can forecast. Teams that need finer control over the stack can drop our &lt;a href="https://www.assemblyai.com/products/streaming-speech-to-text" rel="noopener noreferrer"&gt;Streaming Speech-to-Text product&lt;/a&gt; into their existing &lt;a href="https://www.assemblyai.com/solutions/voice-agents" rel="noopener noreferrer"&gt;voice agent orchestrator&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Frequently asked questions&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is a speech-to-speech voice agent API?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A speech-to-speech voice agent API is a single API—usually a WebSocket—that accepts a user's audio stream and returns the agent's audio response. It hides the streaming speech-to-text, language model, tool calling, and text-to-speech behind one connection, so developers don't have to manage three separate providers, three API keys, or three latency budgets to ship a voice agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is the best speech-to-speech voice agent API in 2026?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The best speech-to-speech voice agent API in 2026 is AssemblyAI's Voice Agent API for production deployments where speech accuracy matters—it's a single WebSocket built on Universal-3 Pro Streaming with 307ms P50 latency, native phone-audio support, tool calling, and flat $4.50/hour pricing. In our Voice Agent Report, 76% of builders rated transcription accuracy as the most important non-negotiable, and AssemblyAI delivers the lowest alphanumeric missed error rate (16.7%) compared to OpenAI (23.3%) and Deepgram (25.5%). OpenAI Realtime is competitive for browser-first demos. Retell is competitive for phone-first agents prioritizing turn-taking naturalness. The right choice depends on whether your users are on phones, what data the agent has to capture, and how predictable you need pricing to be.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does a speech-to-speech voice agent API differ from chaining STT, LLM, and TTS yourself?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A speech-to-speech voice agent API gives you one API endpoint that takes audio in and returns audio out, with STT, LLM, TTS, turn detection, and tool calling handled inside the API. Chaining the layers yourself gives you full control over each component—choice of LLM, choice of TTS voice, on-prem deployment—but you own the plumbing: the WebSocket bridge, turn detection logic, retry handling, and three separate provider relationships. Most teams in 2026 default to a speech-to-speech voice agent API and only chain when they need a specific LLM, voice, or data residency setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Which speech-to-speech voice agent API is cheapest?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's Voice Agent API at &lt;strong&gt;$4.50/hour&lt;/strong&gt; flat-rate is the most predictable option and has one of the lowest unit costs in the category: one bill, no concurrency commitments, and you can model what a 5-minute call costs in one line of math ($4.50 × 5/60 ≈ $0.38). Per-minute APIs like Retell and ElevenLabs Conversational AI typically land between $0.07 and $0.30 per minute depending on tier, which works out to ~$4.20–$18/hour. Deepgram's voice agent API is also ~$4.50/hour but requires concurrency-metered billing, which changes the economics at scale. OpenAI Realtime runs ~$18/hour with per-token billing across 30+ event types, which makes it best suited to low volume and significantly more expensive and less predictable at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can I use a speech-to-speech voice agent API with Twilio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most speech-to-speech voice agent APIs can be bridged to Twilio Voice with a WebSocket server that forwards Twilio's 8kHz mulaw audio into the speech-to-speech API and streams the agent's audio response back as mulaw frames for Twilio to play. The cleanest setup uses an API that accepts mulaw natively at 8kHz—AssemblyAI's Voice Agent API and Universal-3 Pro Streaming both support this without resampling, which saves latency. Some providers like Retell ship a Twilio adapter directly.&lt;/p&gt;
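
&lt;p&gt;The bridge itself mostly shuttles base64-encoded mulaw between two WebSockets. The sketch below shows only the Twilio side of that translation, assuming the Media Streams JSON shape (&lt;code&gt;event&lt;/code&gt;, &lt;code&gt;streamSid&lt;/code&gt;, &lt;code&gt;media.payload&lt;/code&gt;) as we understand it from Twilio's documentation; verify the field names against the current docs. The helper names are illustrative, and the WebSocket server and the speech-to-speech connection are left out.&lt;/p&gt;

```python
import base64
import json

MULAW_SAMPLE_RATE_HZ = 8000  # 8-bit mu-law: one byte per sample


def decode_media_message(raw: str):
    """Return the mu-law bytes from a Twilio "media" event, or None for other events."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])


def encode_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap agent audio (8 kHz mu-law) in a "media" event for Twilio to play."""
    payload = base64.b64encode(mulaw_audio).decode("ascii")
    return json.dumps({"event": "media", "streamSid": stream_sid, "media": {"payload": payload}})


def duration_ms(mulaw_audio: bytes) -> float:
    """Playback length of a mu-law buffer at 8 kHz."""
    return len(mulaw_audio) * 1000 / MULAW_SAMPLE_RATE_HZ


if __name__ == "__main__":
    frame = bytes([0xFF]) * 160  # 160 bytes of mu-law silence
    outbound = encode_media_message("MZ-example", frame)  # placeholder stream SID
    print(duration_ms(decode_media_message(outbound)), "ms of audio round-tripped")
```

&lt;p&gt;Because the payload stays in mulaw end to end, nothing in this path resamples audio, which is the latency saving the answer above refers to.&lt;/p&gt;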

&lt;h3&gt;
  
  
  &lt;strong&gt;Do speech-to-speech voice agent APIs support multiple languages?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes, but coverage varies widely. AssemblyAI's Voice Agent API launched with 6 streaming languages (English, Spanish, French, German, Italian, Portuguese) with native code-switching, and language coverage is expanding. OpenAI Realtime supports around 50 languages but has hallucination and language-switching issues mid-call. Google Gemini Live covers 30+. If you need a specific language combination, test with real audio in those languages before you commit, because accuracy in a given language can differ significantly between studio benchmarks and real-world phone audio.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do I evaluate which speech-to-speech voice agent API is best for my use case?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Record 50 representative calls for your use case, run them through each API's playground or trial, and measure four things: word error rate on the entities that matter (phone numbers, confirmation codes, names, emails), end-to-end turn-taking latency, tool call round-trip time, and unit cost at your expected volume. General WER and marketing benchmarks are misleading—the only evaluation that predicts production behavior is the one that uses your audio, your tools, and your scale.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>comparison</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to build a voice agent with Twilio and AssemblyAI</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:27:30 +0000</pubDate>
      <link>https://dev.to/martschweiger/how-to-build-a-voice-agent-with-twilio-and-assemblyai-439m</link>
      <guid>https://dev.to/martschweiger/how-to-build-a-voice-agent-with-twilio-and-assemblyai-439m</guid>
      <description>&lt;p&gt;Building a voice agent on Twilio with AssemblyAI takes one WebSocket server that bridges Twilio Voice Media Streams into Universal-3 Pro Streaming, your LLM of choice, and a text-to-speech model — all under an 800ms turn budget. This tutorial walks through every piece: the TwiML to open the audio stream, the FastAPI WebSocket bridge that handles 8kHz mulaw audio in both directions, the LLM loop with tool calling, and the deployment considerations that decide whether your agent feels human or obviously robotic on a real phone call.&lt;/p&gt;

&lt;p&gt;By the end of this guide, you'll have a working inbound phone-based voice agent that answers a Twilio number, transcribes the caller in real time, calls tools (order lookup, callback scheduling, human transfer), and speaks back — all with code you can fork and ship today. The full repository is at the end of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Twilio + AssemblyAI works for phone-based voice agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Twilio is the most common telephony layer for &lt;a href="https://www.assemblyai.com/solutions/voice-agents" rel="noopener noreferrer"&gt;voice agents&lt;/a&gt; because it handles the PSTN connection, gives you a phone number in minutes, and exposes the call audio as a Media Stream you can bridge into your own backend over a WebSocket. The audio comes in at 8kHz mulaw — the standard telephony format, not the 16kHz PCM most audio tools assume.&lt;/p&gt;

&lt;p&gt;AssemblyAI's &lt;a href="https://www.assemblyai.com/universal-3-pro-streaming" rel="noopener noreferrer"&gt;Universal-3 Pro Streaming&lt;/a&gt; model is built specifically for this. It accepts pcm_mulaw at sample_rate=8000 natively, so you don't pay the round-trip latency cost of resampling phone audio into 16kHz PCM and back. Combined with 307ms P50 latency, immutable transcripts, and 21% fewer alphanumeric errors than the previous generation of streaming speech-to-text models, it's the speech-to-text layer that decides whether your agent captures a confirmation code on the first try or makes the caller repeat it.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Caller's phone
       │
   Twilio Voice (PSTN)
       │  TwiML → open WebSocket
       ▼
  Your FastAPI server (this tutorial)
   ┌────┴────┐
   ▼         ▲
 AssemblyAI    ElevenLabs TTS
 Universal-3   (ulaw_8000 output)
 Pro Streaming
   │             ▲
   │ transcript  │ audio
   ▼             │
   GPT-4o + tool calling
     │
     └─► action + spoken reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Audio flows in two directions continuously. Twilio sends inbound audio (caller → your server → AssemblyAI). Your server generates an LLM response, runs it through ElevenLabs, and streams the synthesized audio back to Twilio as mulaw frames. All of it stays inside one WebSocket per call.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Before you start&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;a href="https://www.assemblyai.com/dashboard/signup" rel="noopener noreferrer"&gt;AssemblyAI account&lt;/a&gt; with API key access to Universal-3 Pro Streaming&lt;/li&gt;
&lt;li&gt;A Twilio account with a Voice-enabled phone number&lt;/li&gt;
&lt;li&gt;An OpenAI API key (or another LLM provider)&lt;/li&gt;
&lt;li&gt;An ElevenLabs API key (or another streaming TTS provider with mulaw output)&lt;/li&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;li&gt;ngrok for exposing your local server to Twilio during development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install the dependencies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install fastapi uvicorn websockets python-dotenv openai elevenlabs twilio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Configure the Twilio TwiML for an inbound call&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When someone calls your Twilio number, Twilio fetches a TwiML document from your server and uses it to decide what to do with the call. To stream the call audio to your WebSocket, you return TwiML with a &amp;lt;Connect&amp;gt; block that contains a &amp;lt;Stream&amp;gt; element:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# server.py
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()

@app.post("/twilio/voice")
async def twilio_voice(request: Request):
    host = request.url.hostname
    twiml = f"""&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;Response&amp;gt;
  &amp;lt;Connect&amp;gt;
    &amp;lt;Stream url="wss://{host}/media-stream" /&amp;gt;
  &amp;lt;/Connect&amp;gt;
&amp;lt;/Response&amp;gt;"""
    return Response(content=twiml, media_type="application/xml")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the Twilio console, set the phone number's voice webhook to POST &lt;a href="https://your-host/twilio/voice" rel="noopener noreferrer"&gt;https://your-host/twilio/voice&lt;/a&gt;. When a call comes in, Twilio will hit this endpoint, parse the TwiML, and open a WebSocket to /media-stream that carries the call audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: Bridge Twilio Media Streams to Universal-3 Pro Streaming&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the core of the agent. The WebSocket handler receives Twilio's audio frames, forwards them to AssemblyAI, listens for transcripts, and routes them into the LLM loop.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# server.py (continued)
import asyncio
import base64
import json
import os
import websockets
from fastapi import WebSocket

ASSEMBLY_WS = "wss://streaming.assemblyai.com/v3/ws"

@app.websocket("/media-stream")
async def media_stream(twilio_ws: WebSocket):
    await twilio_ws.accept()
    stream_sid = None

    # Open the AssemblyAI streaming session. Note: pcm_mulaw at 8kHz
    aai_url = (
        f"{ASSEMBLY_WS}"
        f"?speech_model=u3-rt-pro"
        f"&amp;amp;encoding=pcm_mulaw"
        f"&amp;amp;sample_rate=8000"
    )
    aai_ws = await websockets.connect(
        aai_url,
        # websockets 14+ names this additional_headers; older releases use extra_headers
        additional_headers={"Authorization": os.environ["ASSEMBLYAI_API_KEY"]},
    )

    async def pump_twilio_to_aai():
        nonlocal stream_sid
        async for raw in twilio_ws.iter_text():
            event = json.loads(raw)
            if event["event"] == "start":
                stream_sid = event["start"]["streamSid"]
            elif event["event"] == "media":
                audio_b64 = event["media"]["payload"]
                # Twilio sends base64-encoded mulaw. AssemblyAI accepts raw bytes.
                await aai_ws.send(base64.b64decode(audio_b64))
            elif event["event"] == "stop":
                await aai_ws.close()
                return

    async def pump_aai_to_llm():
        async for message in aai_ws:
            data = json.loads(message)
            if data.get("type") == "Turn" and data.get("end_of_turn"):
                transcript = data.get("transcript", "").strip()
                if transcript:
                    await handle_user_turn(transcript, twilio_ws, stream_sid)

    await asyncio.gather(pump_twilio_to_aai(), pump_aai_to_llm())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The critical settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speech_model=u3-rt-pro selects Universal-3 Pro Streaming&lt;/li&gt;
&lt;li&gt;encoding=pcm_mulaw and sample_rate=8000 tell AssemblyAI to expect raw mulaw without resampling&lt;/li&gt;
&lt;li&gt;format_turns=true gives you properly cased and punctuated transcripts ready for the LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When end_of_turn is true, the caller has finished speaking and you have a complete utterance to send to the LLM.&lt;/p&gt;
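&lt;p&gt;The end-of-turn check can also be pulled into a small helper, which makes it easy to unit test without a live connection. This is a minimal sketch: the field names (type, end_of_turn, transcript) come from the handler above, while the helper name and the sample payloads are made up for illustration, and real messages may carry additional fields.&lt;/p&gt;

```python
import json


def final_transcript(message):
    """Return the transcript of a completed turn, or None for any other message."""
    data = json.loads(message)
    if data.get("type") != "Turn" or not data.get("end_of_turn"):
        return None
    text = data.get("transcript", "").strip()
    return text or None


if __name__ == "__main__":
    # Hypothetical payloads shaped like the fields the handler reads
    partial = json.dumps({"type": "Turn", "end_of_turn": False, "transcript": "my order"})
    final = json.dumps({"type": "Turn", "end_of_turn": True, "transcript": " My order ID is AB3792. "})
    print(final_transcript(partial))
    print(final_transcript(final))
```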

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: Run the LLM loop with tool calling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;handle_user_turn is where the conversation logic lives. It takes the transcript, sends it to the LLM with the available tools, and either calls a tool or responds with text that becomes the agent's spoken reply.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# server.py (continued)
from openai import AsyncOpenAI

openai = AsyncOpenAI()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order by order ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "e.g. AB3792"}
                },
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "transfer_to_human",
            "description": "Transfer the caller to a human agent.",
            "parameters": {
                "type": "object",
                "properties": {
                    "reason": {"type": "string"}
                },
                "required": ["reason"],
            },
        },
    },
]

conversation = [
    {
        "role": "system",
        "content": (
            "You are a friendly phone-based voice agent for a shoe retailer. "
            "Keep replies short — one or two sentences. "
            "Use get_order_status to look up orders. "
            "Use transfer_to_human if the caller asks for a person or is upset."
        ),
    }
]

async def handle_user_turn(transcript, twilio_ws, stream_sid):
    conversation.append({"role": "user", "content": transcript})
    response = await openai.chat.completions.create(
        model="gpt-4o",
        messages=conversation,
        tools=TOOLS,
        tool_choice="auto",
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        # Record the assistant's tool request, then append each tool result
        conversation.append(msg.model_dump())
        for call in msg.tool_calls:
            result = await dispatch_tool(call.function.name, call.function.arguments)
            conversation.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
        # Ask the model again so it can turn the tool results into a spoken reply
        followup = await openai.chat.completions.create(
            model="gpt-4o", messages=conversation
        )
        reply = followup.choices[0].message.content
    else:
        reply = msg.content

    conversation.append({"role": "assistant", "content": reply})
    await speak(reply, twilio_ws, stream_sid)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The tool dispatcher is where your business logic lives. For a real deployment, replace the stubs with calls to your CRM, order management system, or scheduling backend.&lt;/p&gt;
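&lt;p&gt;The snippet above calls dispatch_tool without showing it, so here is one way it might look. Treat it as a sketch under assumptions: the FAKE_ORDERS data, the HANDLERS table, and the error shapes are hypothetical placeholders, and the only contract taken from the tutorial is that dispatch_tool receives the tool name plus a JSON string of arguments and returns a string for the tool message.&lt;/p&gt;

```python
import asyncio
import json

# Hypothetical in-memory data standing in for a real order system
FAKE_ORDERS = {"AB3792": {"status": "shipped", "eta": "2026-05-22"}}


async def get_order_status(order_id):
    order = FAKE_ORDERS.get(order_id.strip().upper())
    if order is None:
        return {"error": "order_not_found", "order_id": order_id}
    return order


async def transfer_to_human(reason):
    # A real handler would trigger the telephony handoff here
    return {"transferred": True, "reason": reason}


HANDLERS = {
    "get_order_status": get_order_status,
    "transfer_to_human": transfer_to_human,
}


async def dispatch_tool(name, arguments):
    """Run the named tool and return its result as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown_tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
        result = await handler(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed or mismatched arguments become a structured error
        result = {"error": "bad_arguments", "detail": str(exc)}
    return json.dumps(result)


if __name__ == "__main__":
    print(asyncio.run(dispatch_tool("get_order_status", '{"order_id": "ab3792"}')))
```

&lt;p&gt;Returning structured errors instead of raising keeps the conversation going: the LLM sees the error in the tool message and can ask the caller to repeat the order ID.&lt;/p&gt;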

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 4: Stream the TTS audio back to Twilio as mulaw&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Twilio expects audio frames as base64-encoded mulaw at 8kHz. ElevenLabs supports a ulaw_8000 output format that produces exactly this — which means no resampling, no conversion, just stream the bytes back.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# server.py (continued)
from elevenlabs.client import AsyncElevenLabs

eleven = AsyncElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

async def speak(text, twilio_ws, stream_sid):
    audio_stream = eleven.text_to_speech.stream(
        voice_id=os.environ.get("ELEVENLABS_VOICE_ID", "EXAVITQu4vr4xnSDxMaL"),
        text=text,
        model_id="eleven_turbo_v2_5",
        output_format="ulaw_8000",
    )
    async for chunk in audio_stream:
        payload = base64.b64encode(chunk).decode()
        await twilio_ws.send_text(json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": payload},
        }))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each chunk gets streamed to Twilio as a media event. Twilio plays the audio to the caller as it arrives, which means the caller hears the first word of the agent's reply while the rest is still being synthesized.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 5: Run it and connect Twilio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Start your server and expose it through ngrok:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn server:app --port 8000
ngrok http 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Copy the https://*.ngrok-free.dev URL ngrok prints. In the Twilio console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Buy or pick a Voice-enabled phone number&lt;/li&gt;
&lt;li&gt;Open the number's configuration&lt;/li&gt;
&lt;li&gt;Under "A call comes in," set the webhook to &lt;a href="https://your-ngrok-url/twilio/voice" rel="noopener noreferrer"&gt;https://your-ngrok-url/twilio/voice&lt;/a&gt; with method POST&lt;/li&gt;
&lt;li&gt;Save&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Call the number from your phone. You should hear the agent pick up and respond in natural conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Latency budget: where your milliseconds go&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A natural-feeling phone agent answers in under 800ms from when the caller stops speaking to when the caller hears the first audio of the reply. Here's where that budget gets spent on a Twilio + AssemblyAI stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Typical latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AssemblyAI end-of-turn finalization&lt;/td&gt;
&lt;td&gt;~150–250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM first-token generation (GPT-4o)&lt;/td&gt;
&lt;td&gt;~200–400ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTS first-byte (ElevenLabs streaming)&lt;/td&gt;
&lt;td&gt;~200–400ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Twilio round-trip&lt;/td&gt;
&lt;td&gt;~50–100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total perceived latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~600–1100ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three things blow the budget the moment you stop being careful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resampling audio.&lt;/strong&gt; Anything that converts 8kHz mulaw to 16kHz PCM (and back) costs 50–150ms each way. AssemblyAI's Universal-3 Pro Streaming and ElevenLabs's ulaw_8000 output both keep audio in mulaw end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-streaming LLMs.&lt;/strong&gt; Waiting for the full response before TTS starts is a guaranteed dead zone. Stream tokens from the LLM and chunk them to TTS sentence-by-sentence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold-start tools.&lt;/strong&gt; A tool call that hits a slow database eats your entire turn. Cache hot data and time out slow lookups aggressively.&lt;/li&gt;
&lt;/ul&gt;
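&lt;p&gt;To make the sentence-by-sentence chunking concrete, here is a minimal sketch of a buffer that turns a stream of LLM tokens into complete sentences, each of which could be handed to TTS as soon as it is ready. The function name and the splitting rule (., ! or ? followed by whitespace) are assumptions for illustration. The rule will split incorrectly on abbreviations such as "Dr. Smith", and a production version would consume an async token stream instead of a list.&lt;/p&gt;

```python
import re

# A sentence is considered finished at ., ! or ? followed by whitespace
SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens):
    """Yield complete sentences as soon as the token stream finishes them."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = SENTENCE_END.search(buffer)
            if match is None:
                break
            cut = match.start() + 1  # keep the punctuation mark
            sentence = buffer[:cut].strip()
            if sentence:
                yield sentence
            buffer = buffer[cut:].lstrip()
    tail = buffer.strip()
    if tail:
        yield tail  # flush whatever is left when the stream ends


if __name__ == "__main__":
    stream = ["Your order", " shipped", ". It", " arrives", " Friday", "!"]
    for sentence in sentences_from_tokens(stream):
        print(sentence)
```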

&lt;h2&gt;
  
  
  &lt;strong&gt;What about the AssemblyAI Voice Agent API?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If your voice agent doesn't need Twilio specifically — for example a browser-based assistant, a mobile app, or an embedded device — the &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;Voice Agent API&lt;/a&gt; wraps STT, LLM, TTS, turn detection, and tool calling behind a single WebSocket at a flat $4.50/hour (&lt;a href="https://www.assemblyai.com/blog/introducing-our-voice-agent-api" rel="noopener noreferrer"&gt;announcement&lt;/a&gt;). You skip the three-provider plumbing entirely.&lt;/p&gt;

&lt;p&gt;For Twilio-bridged phone calls today, the chained architecture in this tutorial is still the most flexible path — it lets you pick exactly the LLM, TTS voice, and tool definitions you want. The Voice Agent API is the right choice for everything that isn't a PSTN inbound call, and Twilio integration through the Voice Agent API is on the roadmap.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The complete repository&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Fork the runnable repo at &lt;a href="https://github.com/kelsey-aai/twilio-voice-agent-assemblyai" rel="noopener noreferrer"&gt;github.com/kelsey-aai/twilio-voice-agent-assemblyai&lt;/a&gt;. It includes the FastAPI server, tool dispatcher, sample tools (get_order_status, transfer_to_human), a .env.example, and ngrok setup instructions. Total length: ~250 lines of Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Frequently asked questions&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do I build a voice agent with Twilio and AssemblyAI?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To build a voice agent with Twilio and AssemblyAI, point your Twilio phone number at a TwiML endpoint that opens a &amp;lt;Stream&amp;gt; to your server's WebSocket. In the WebSocket handler, forward Twilio's 8kHz mulaw audio frames to AssemblyAI's Universal-3 Pro Streaming API using encoding=pcm_mulaw and sample_rate=8000. When AssemblyAI returns a finalized turn, pass the transcript to an LLM (GPT-4o, Claude) with your tool definitions (see our &lt;a href="https://www.assemblyai.com/blog/build-voice-agent-function-calling" rel="noopener noreferrer"&gt;function calling tutorial&lt;/a&gt; for a deeper walkthrough), then stream the LLM's reply through a TTS model that supports ulaw_8000 output (like ElevenLabs) back to Twilio as base64-encoded media events.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why use AssemblyAI for a Twilio voice agent?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's Universal-3 Pro Streaming model is built for the audio Twilio actually sends, 8kHz mulaw, so it needs no resampling, which would cost latency. It delivers 307ms P50 latency, immutable transcripts your downstream LLM can trust, and 21% fewer alphanumeric errors than the previous generation, which matters when the agent is capturing confirmation codes, phone numbers, or email addresses over a phone line. For an overview of the broader category, see &lt;a href="https://www.assemblyai.com/blog/ai-voice-agents" rel="noopener noreferrer"&gt;AI voice agents in 2026&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does the Voice Agent API work with Twilio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The AssemblyAI Voice Agent API is the simplest path for voice agents that don't need Twilio specifically — a single WebSocket replaces STT, LLM, and TTS at $4.50/hour. Native Twilio integration through the Voice Agent API is on the roadmap. Today, the chained architecture in this tutorial (Universal-3 Pro Streaming + your LLM + your TTS, bridged through a Twilio Media Streams WebSocket) is the standard path for Twilio-based phone agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What latency should I expect from a Twilio voice agent?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A well-tuned Twilio voice agent built on AssemblyAI Universal-3 Pro Streaming, GPT-4o, and ElevenLabs typically hits 600–1100ms from caller-stops-talking to caller-hears-reply. The biggest latency killers are resampling audio (use native mulaw end-to-end), non-streaming LLM responses (stream tokens), and slow tool calls (cache hot data and time out slow lookups aggressively).&lt;/p&gt;
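&lt;p&gt;One way to bound slow tool calls is to wrap each one in asyncio.wait_for and return a structured error when it runs long, so the agent can say something useful instead of going silent. The helper name, the 1.5 second default, and the error shape below are assumptions chosen for illustration, not values from the tutorial's repository.&lt;/p&gt;

```python
import asyncio


async def call_with_timeout(tool_coro, timeout_s=1.5):
    """Await a tool coroutine, returning a structured error if it runs too long."""
    try:
        return await asyncio.wait_for(tool_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"error": "tool_timeout", "timeout_s": timeout_s}


async def slow_lookup():
    # Stand-in for a backend call that takes longer than the budget allows
    await asyncio.sleep(0.2)
    return {"status": "shipped"}


if __name__ == "__main__":
    print(asyncio.run(call_with_timeout(slow_lookup(), timeout_s=0.01)))
    print(asyncio.run(call_with_timeout(slow_lookup(), timeout_s=1.0)))
```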

&lt;h3&gt;
  
  
  &lt;strong&gt;How much does it cost to run a phone-based voice agent?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The cost breaks down across four components: Twilio voice (per-minute, varies by country), AssemblyAI Universal-3 Pro Streaming ($0.15/hour of session time), the LLM (varies by provider — typically a few cents per minute of conversation for GPT-4o), and TTS (per-character or per-minute). End-to-end you're looking at a few cents per minute at scale, with the exact number driven by which LLM and TTS you choose.&lt;/p&gt;
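&lt;p&gt;For a back-of-the-envelope estimate, the four components can be summed per minute. In the sketch below only the $0.15/hour streaming rate comes from this post. The telephony, LLM, and TTS figures are placeholders to be replaced with your own providers' pricing.&lt;/p&gt;

```python
def cost_per_minute(stt_per_hour, telephony_per_min, llm_per_min, tts_per_min):
    """Sum per-minute costs; the hourly STT rate is converted to minutes."""
    return stt_per_hour / 60 + telephony_per_min + llm_per_min + tts_per_min


if __name__ == "__main__":
    # Speech-to-text alone at $0.15 per hour of session time
    print(round(cost_per_minute(0.15, 0.0, 0.0, 0.0), 4))
    # Placeholder rates for telephony, LLM, and TTS (not real quotes)
    print(round(cost_per_minute(0.15, 0.01, 0.02, 0.03), 4))
```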

&lt;h3&gt;
  
  
  &lt;strong&gt;Can a Twilio voice agent handle multiple simultaneous calls?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. AssemblyAI's Universal-3 Pro Streaming supports unlimited concurrent streams at a flat $0.15/hour with no separate negotiation required. Twilio handles concurrency per-account based on your plan. The constraint at scale is usually your own server's WebSocket concurrency limits — FastAPI with uvicorn workers handles hundreds of concurrent calls comfortably on modest hardware.&lt;/p&gt;
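&lt;p&gt;One thing to watch when you go from one test call to many: handle_user_turn in this tutorial appends to a single module-level conversation list, so concurrent callers would share one history. A minimal sketch of per-call state, keyed by Twilio's streamSid, might look like the following. The CallSessions class and its method names are hypothetical and not part of the tutorial's repository.&lt;/p&gt;

```python
class CallSessions:
    """Keeps a separate message history for every active call."""

    def __init__(self, system_prompt):
        self._system_prompt = system_prompt
        self._histories = {}

    def history(self, stream_sid):
        # Create the history lazily on the first turn of a call
        if stream_sid not in self._histories:
            self._histories[stream_sid] = [
                {"role": "system", "content": self._system_prompt}
            ]
        return self._histories[stream_sid]

    def end(self, stream_sid):
        # Drop the history once the call is over, e.g. on Twilio's stop event
        self._histories.pop(stream_sid, None)


if __name__ == "__main__":
    sessions = CallSessions("You are a friendly phone-based voice agent.")
    sessions.history("call-a").append({"role": "user", "content": "Where is my order?"})
    print(len(sessions.history("call-a")), len(sessions.history("call-b")))
```

&lt;p&gt;With something like this in place, handle_user_turn would read and append to sessions.history(stream_sid) instead of the shared list.&lt;/p&gt;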

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>telephony</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build an AI voice agent for customer support that can look up orders</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:27:21 +0000</pubDate>
      <link>https://dev.to/martschweiger/build-an-ai-voice-agent-for-customer-support-that-can-look-up-orders-4nlj</link>
      <guid>https://dev.to/martschweiger/build-an-ai-voice-agent-for-customer-support-that-can-look-up-orders-4nlj</guid>
      <description>&lt;p&gt;Tier-1 customer support is mostly the same five conversations on repeat: where's my order, can I change my address, can I get a refund, when does this ship, can I talk to a human. They're predictable, they're high-volume, and they don't need a person — they need a voice agent that can actually look things up.&lt;/p&gt;

&lt;p&gt;This tutorial walks you through building one. By the end, you'll have a Python voice agent that answers calls, listens for an order ID or email, calls into your backend to check the status, and reads the result back to the customer in real time. When something goes off-script, it transfers to a human with the full conversation context attached.&lt;/p&gt;

&lt;p&gt;We're using &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;AssemblyAI's Voice Agent API&lt;/a&gt; — one WebSocket that handles the speech understanding, LLM reasoning, voice generation, turn detection, and tool calling in a single connection. Total time to a working prototype: about an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most support voice agents fail
&lt;/h2&gt;

&lt;p&gt;Before we build, it's worth knowing where these things break. The pattern is almost always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Customer says "my order ID is A-B-3-7-9-2"&lt;/li&gt;
&lt;li&gt;STT mishears it as "a b 37 92" or "ABE 379 to"&lt;/li&gt;
&lt;li&gt;The LLM calls get_order_status("ab3792") or worse, asks the customer to repeat&lt;/li&gt;
&lt;li&gt;Customer hangs up&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent didn't fail because the LLM was wrong. It failed because the speech-to-text layer couldn't capture the entity correctly. This is why entity accuracy on alphanumerics, emails, and phone numbers matters more than overall WER for support agents — and why we're building on Universal-3 Pro Streaming, which has a 16.7% mixed-entity error rate vs. 23–25% for competing models.&lt;/p&gt;

&lt;p&gt;The second-most-common failure: dead air during tool calls. The customer asks a question, the agent calls a backend, and there's a 2–3 second silence while the lookup runs. The Voice Agent API solves this by speaking a natural transition phrase ("let me check that for you") while the tool runs — no dead air, no awkward pauses.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Python voice support agent that handles three real workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Order status lookup&lt;/strong&gt; — customer says "where's my order?" → agent asks for the ID → looks it up → reads back status, ETA, tracking number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer info verification&lt;/strong&gt; — customer provides email or phone number → agent looks up the account → confirms identity before proceeding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human escalation&lt;/strong&gt; — customer asks for a person, or the agent gets stuck → graceful transfer with conversation context preserved&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AssemblyAI Voice Agent API (one WebSocket: STT + LLM + TTS)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;A backend with order data — we'll mock it; replace with your real CRM or order management system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install "websockets&amp;gt;=14" pyaudio python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Create .env:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASSEMBLYAI_API_KEY=your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Voice Agent API uses a single endpoint: wss://agents.assemblyai.com/v1/ws. One key, one connection, no separate STT or TTS providers to wire in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Define the support tools
&lt;/h2&gt;

&lt;p&gt;Tools are the agent's interface to your backend. The Voice Agent API uses standard JSON Schema, so anything you can describe with a schema, the agent can call.&lt;/p&gt;

&lt;p&gt;For a support agent, you typically want four tools:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

TOOLS = [
    {
        "type": "function",
        "name": "get_order_status",
        "description": "Look up an order's current status, shipping ETA, and tracking number by order ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order ID, e.g. ORD-12345 or 78231-ABC.",
                },
            },
            "required": ["order_id"],
        },
    },
    {
        "type": "function",
        "name": "lookup_account_by_email",
        "description": "Find a customer account using their email address.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "description": "The customer's email address."},
            },
            "required": ["email"],
        },
    },
    {
        "type": "function",
        "name": "list_recent_orders",
        "description": "List the customer's most recent orders. Use after the account is verified.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
                "limit": {"type": "integer", "description": "Max number of orders to return.", "default": 5},
            },
            "required": ["account_id"],
        },
    },
    {
        "type": "function",
        "name": "transfer_to_human",
        "description": "Transfer the call to a human agent. Use when the customer asks, when you can't help, or when the issue is sensitive.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "description": "Short reason for the transfer."},
                "summary": {"type": "string", "description": "Brief summary of the conversation so far."},
            },
            "required": ["reason", "summary"],
        },
    },
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now implement the actual functions. Replace these stubs with calls to your real backend:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ORDERS_DB = {
    "ORD-12345": {"status": "shipped", "eta": "2026-05-09", "tracking": "1Z999AA10123456784"},
    "ORD-67890": {"status": "processing", "eta": "2026-05-12", "tracking": None},
}

ACCOUNTS_DB = {
    "jane@example.com": {"account_id": "ACC-001", "name": "Jane Doe"},
}

ACCOUNT_ORDERS = {
    "ACC-001": [
        {"order_id": "ORD-12345", "date": "2026-05-01", "total": "$84.99"},
        {"order_id": "ORD-12100", "date": "2026-04-22", "total": "$42.00"},
    ],
}

def run_tool(name: str, args: dict) -&amp;gt; dict:
    if name == "get_order_status":
        order = ORDERS_DB.get(args["order_id"].upper())
        if not order:
            return {"error": "order_not_found", "order_id": args["order_id"]}
        return order

    if name == "lookup_account_by_email":
        account = ACCOUNTS_DB.get(args["email"].lower())
        if not account:
            return {"error": "account_not_found"}
        return account

    if name == "list_recent_orders":
        orders = ACCOUNT_ORDERS.get(args["account_id"], [])
        return {"orders": orders[: args.get("limit", 5)]}

    if name == "transfer_to_human":
        # In production: trigger your call routing / queue handoff here
        return {"transferred": True, "queue": "support-tier-2"}

    return {"error": f"unknown_tool: {name}"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The error-shape pattern matters. When get_order_status can't find an order, it returns a structured error rather than throwing — that gives the LLM the context it needs to apologize and ask the customer to verify the ID, instead of crashing the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Write the system prompt
&lt;/h2&gt;

&lt;p&gt;The system prompt is where you encode the agent's behavior. For support, you want a few things every time: identity and tone, when to ask for verification before sharing details, when to use which tool, when to transfer to a human, and specific phrasing for transition moments (the "let me check that" line).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SYSTEM_PROMPT = """
You are Avery, a customer support agent for Acme Corp. Your goal is to help customers
quickly and accurately. You have access to tools that let you look up orders and accounts.

Behavior rules:
- Greet warmly and ask how you can help.
- For order questions, ask for the order ID first if the customer hasn't given it.
- If a customer gives an email address, use lookup_account_by_email to verify
  the account.
- Read order status, ETA, and tracking number clearly. Don't read raw timestamps —
  say dates naturally (e.g., "Friday, May 9th").
- When you need to call a tool, say a brief transition like "Let me check on that"
  or "One moment while I pull that up."
- If the customer asks for a human, sounds frustrated, or has a complex issue
  (refund disputes, damaged product, billing errors), use transfer_to_human and
  include a short summary.
- Never make up an order ID, status, or tracking number. If a tool returns an error,
  apologize, ask the customer to verify the ID, and try again.
- Keep replies short and conversational. This is a phone call, not an email.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The "never make up" line is the most important sentence in the prompt. Without it, LLMs sometimes invent plausible-sounding tracking numbers when the lookup fails. With it, they ask for clarification instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Connect to the Voice Agent API
&lt;/h2&gt;

&lt;p&gt;Now the WebSocket connection. The pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open wss://agents.assemblyai.com/v1/ws with your API key&lt;/li&gt;
&lt;li&gt;Send session.update with the system prompt, tools, voice, and greeting&lt;/li&gt;
&lt;li&gt;Wait for session.ready, then start streaming microphone audio&lt;/li&gt;
&lt;li&gt;Handle incoming events: tool.call, reply.audio, transcript.user, reply.done&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
import base64
import json
import os

import pyaudio
import websockets

API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
WS_URL = "wss://agents.assemblyai.com/v1/ws"
SAMPLE_RATE = 24000

async def run_agent():
    async with websockets.connect(
        WS_URL,
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the agent
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": SYSTEM_PROMPT,
                "greeting": "Hi, this is Avery from Acme support. How can I help?",
                "output": {"voice": "ivy"},
                "tools": TOOLS,
            },
        }))

        # Set up microphone capture and speaker playback
        pa = pyaudio.PyAudio()
        mic = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                      input=True, frames_per_buffer=1024)
        speaker = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                          output=True)

        ready = asyncio.Event()
        pending_tools = []

        async def send_audio():
            await ready.wait()
            import base64
            while True:
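                # Read in a worker thread so the blocking call doesn't stall the event loop.
                audio = await asyncio.to_thread(mic.read, 1024, exception_on_overflow=False)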
                await ws.send(json.dumps({
                    "type": "input.audio",
                    "audio": base64.b64encode(audio).decode(),
                }))
                await asyncio.sleep(0)

        async def handle_messages():
            async for raw in ws:
                event = json.loads(raw)
                t = event.get("type")

                if t == "session.ready":
                    ready.set()
                    print("Agent ready. Start speaking.")

                elif t == "transcript.user":
                    print(f"\nUser: {event['text']}")

                elif t == "transcript.agent":
                    print(f"Agent: {event['text']}")

                elif t == "reply.audio":
                    import base64
                    speaker.write(base64.b64decode(event["data"]))

                elif t == "tool.call":
                    name = event["name"]
                    args = event.get("arguments", {})
                    print(f"  [tool] {name}({args})")
                    result = run_tool(name, args)
                    pending_tools.append({"call_id": event["call_id"], "result": result})

                elif t == "reply.done":
                    if event.get("status") == "interrupted":
                        pending_tools.clear()
                    elif pending_tools:
                        for tool in pending_tools:
                            await ws.send(json.dumps({
                                "type": "tool.result",
                                "call_id": tool["call_id"],
                                "result": json.dumps(tool["result"]),
                            }))
                        pending_tools.clear()

        await asyncio.gather(send_audio(), handle_messages())

if __name__ == "__main__":
    asyncio.run(run_agent())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A few details that the docs flag and you'd otherwise debug for an hour:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't send tool.result immediately&lt;/strong&gt; when you receive tool.call. Accumulate results and send them inside the reply.done handler. Sending too early causes timing issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discard pending tool results on interruption.&lt;/strong&gt; If the user speaks while the agent is generating a transition phrase, you'll get reply.done with status: "interrupted" — clear the buffer and wait for the next turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice names are case-sensitive.&lt;/strong&gt; Use lowercase: ivy, claire, dawn. An invalid voice returns session.error.&lt;/li&gt;
&lt;/ul&gt;
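&lt;p&gt;The first two rules can be pulled out of the event loop into a small buffer, which makes them easier to test without a live connection. This is a sketch under assumptions: ToolResultBuffer is a hypothetical helper name, and the message shape mirrors the tool.result event in the script above.&lt;/p&gt;

```python
import json

class ToolResultBuffer:
    """Holds tool results until the matching reply.done event arrives."""

    def __init__(self):
        self._pending = []

    def add(self, call_id, result):
        self._pending.append({"call_id": call_id, "result": result})

    def on_reply_done(self, status):
        """Return the tool.result messages to send; empty if interrupted."""
        pending, self._pending = self._pending, []
        if status == "interrupted":
            return []  # the user barged in, so stale results are discarded
        return [
            json.dumps({
                "type": "tool.result",
                "call_id": item["call_id"],
                "result": json.dumps(item["result"]),
            })
            for item in pending
        ]
```

&lt;p&gt;In the handler, call add() on each tool.call and send whatever on_reply_done() returns when reply.done arrives.&lt;/p&gt;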

&lt;h2&gt;
  
  
  Step 4: Test the three workflows
&lt;/h2&gt;

&lt;p&gt;Run the script and walk through each support scenario. You should hear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow 1 — Order lookup:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; You: "Hi, I'm trying to check on order O-R-D 1-2-3-4-5"
Agent: "Sure, let me check on that... I see order ORD-12345. It shipped and is
        on its way — you should have it by Friday, May 9th. The tracking number
        is 1Z999AA10123456784."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Workflow 2 — Email-based account lookup:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; You: "I forgot my order ID. Can you look me up by email?"
Agent: "Of course. What's the email on the account?"
You: "It's jane at example dot com."
Agent: "One moment... Got it, you're Jane Doe. I see two recent orders:
        ORD-12345 from May 1st for $84.99, and ORD-12100 from April 22nd
        for $42.00. Which one are you asking about?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Workflow 3 — Human transfer:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; You: "I just want to talk to a person."
Agent: "I understand. Let me get you over to a teammate now."
[tool.call: transfer_to_human({"reason": "user requested human", "summary": "..."})]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Speak the order ID with hesitation, mumbles, accents, and natural disfluencies — that's where Universal-3 Pro Streaming earns its keep. The agent should still extract the ID correctly because it's tuned for the alphanumeric tokens that voice agents act on.&lt;/p&gt;
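&lt;p&gt;Even with accurate transcription, the ID may reach your tool spelled out or spaced differently than your backend expects. A small normalizer in front of the lookup is a cheap safeguard. This sketch assumes IDs follow the letters-hyphen-digits shape of the ORD-12345 example; normalize_order_id is a hypothetical helper:&lt;/p&gt;

```python
import re

def normalize_order_id(spoken):
    """Collapse a spelled-out ID such as 'O-R-D 1-2-3-4-5' into 'ORD-12345'."""
    compact = re.sub(r"[^A-Za-z0-9]", "", spoken).upper()
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", compact)
    if match is None:
        return None  # not letters followed by digits; ask the caller to repeat
    return f"{match.group(1)}-{match.group(2)}"
```

&lt;p&gt;Returning None for anything that doesn't fit the pattern lets the tool report a clear error instead of querying with a garbled ID.&lt;/p&gt;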

&lt;h2&gt;
  
  
  Step 5: Take it to the phone
&lt;/h2&gt;

&lt;p&gt;So far the agent runs locally against your computer's microphone, but real customer support runs on phones. Twilio Media Streams is the standard bridge: your server accepts the inbound call from Twilio and opens a parallel connection to the Voice Agent API, forwarding audio in both directions.&lt;/p&gt;

&lt;p&gt;The Voice Agent API supports audio/pcmu (G.711 μ-law at 8 kHz) natively, which matches Twilio's codec exactly. No transcoding, no resampling. The &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/connect-to-twilio" rel="noopener noreferrer"&gt;Twilio integration guide&lt;/a&gt; walks through the full bridge in about 100 lines of TypeScript.&lt;/p&gt;
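&lt;p&gt;The inbound half of that bridge can be sketched in a few lines of Python. It assumes Twilio's Media Streams message shape (an event field, with base64 audio under media.payload) and reuses the input.audio event from Step 3. The twilio_to_agent name is hypothetical, and the session is assumed to be configured for audio/pcmu as the linked guide describes:&lt;/p&gt;

```python
import json

def twilio_to_agent(raw_message):
    """Map one Twilio Media Streams message to an input.audio event, or None."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None  # non-media events (such as "start" or "stop") carry no audio
    # The payload is already base64 mu-law at 8 kHz, so it passes through as-is.
    return json.dumps({"type": "input.audio", "audio": message["media"]["payload"]})
```

&lt;p&gt;Because both sides speak base64 μ-law, the bridge never has to decode the audio itself.&lt;/p&gt;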

&lt;h2&gt;
  
  
  What to harden before production
&lt;/h2&gt;

&lt;p&gt;Three things you'll want to nail down before pointing this at real customers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replace the in-memory mocks&lt;/strong&gt; with calls to your actual CRM or order management system. Add timeouts and error handling so a slow backend doesn't kill the conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything.&lt;/strong&gt; Save user transcripts, tool calls, results, and the agent's responses tied to a session ID. Conversation logs are your debugging tool when something goes wrong on call #4,712.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune turn detection for your acoustic environment.&lt;/strong&gt; The defaults work for most use cases. For phone audio with background noise, you may want to raise min_end_of_turn_silence_ms slightly so the agent doesn't cut off thoughtful pauses.&lt;/li&gt;
&lt;/ul&gt;
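&lt;p&gt;For the logging point, one JSON object per line keyed by session ID is usually enough to reconstruct a call later. This is a minimal sketch; SessionLog and its field names are hypothetical, and the in-memory stream stands in for a file or log shipper:&lt;/p&gt;

```python
import io
import json
import time

class SessionLog:
    """Writes one JSON object per line, each tagged with the session ID."""

    def __init__(self, session_id, stream):
        self.session_id = session_id
        self.stream = stream

    def record(self, kind, **fields):
        entry = {"ts": time.time(), "session_id": self.session_id, "kind": kind, **fields}
        self.stream.write(json.dumps(entry) + "\n")

# Usage: an in-memory stream here; a file or log shipper in production.
log = SessionLog("sess_demo", io.StringIO())
log.record("transcript.user", text="Where is my order?")
log.record("tool.call", name="get_order_status", arguments={"order_id": "ORD-12345"})
```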

&lt;h2&gt;
  
  
  Where to go from there
&lt;/h2&gt;

&lt;p&gt;Once the basic order-lookup loop works, the same tool-calling pattern extends to every other support workflow you have: cancel an order, update a shipping address, request a refund, schedule a callback, fetch FAQ answers from a knowledge base. Add the function, describe it in the system prompt, and the agent picks it up — no new infrastructure.&lt;/p&gt;
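&lt;p&gt;As an illustration, adding a cancellation workflow might look like the sketch below. The cancel_order tool, its schema, and the in-memory order table are all hypothetical; the definition follows the function-with-JSON-Schema shape used for tools, and the handler would be wired into your existing tool dispatcher:&lt;/p&gt;

```python
# Hypothetical tool definition in the function-with-JSON-Schema shape.
CANCEL_ORDER_TOOL = {
    "type": "function",
    "name": "cancel_order",
    "description": "Cancel an order that has not shipped yet.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Hypothetical in-memory state standing in for an order management system.
_ORDERS = {"ORD-12100": "processing", "ORD-12345": "shipped"}

def cancel_order(order_id):
    """Cancel the order if possible; otherwise return an explicit error."""
    status = _ORDERS.get(order_id)
    if status is None:
        return {"error": "order_not_found"}
    if status == "shipped":
        return {"error": "already_shipped"}
    _ORDERS[order_id] = "cancelled"
    return {"order_id": order_id, "status": "cancelled"}
```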

&lt;p&gt;The compounding win: every conversation goes through the same Voice Agent API connection, the same transcription model, the same billing relationship. You're not assembling a new vendor stack; you're adding tools to an agent that already works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;Try the Voice Agent API live&lt;/a&gt; on the product page — it's the same API you'd ship with — or &lt;a href="https://www.assemblyai.com/dashboard/signup" rel="noopener noreferrer"&gt;grab a free API key with $50 in starter credits&lt;/a&gt; and have your first agent answering calls by end of day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I build an AI voice agent for customer support that can look up orders?
&lt;/h3&gt;

&lt;p&gt;Build it on AssemblyAI's Voice Agent API, register a get_order_status function as a tool with JSON Schema, and connect to the WebSocket at wss://agents.assemblyai.com/v1/ws. The agent transcribes the customer's speech, decides when to call your function, executes it through your backend, and speaks the result back — all on a single connection. Most developers ship a working agent in an afternoon because there's no SDK to learn and no separate STT, LLM, or TTS providers to wire together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does speech-to-text accuracy matter so much for support voice agents?
&lt;/h3&gt;

&lt;p&gt;Support agents constantly need to capture alphanumeric tokens — order IDs, account numbers, email addresses, phone numbers — and a single transcription error breaks the workflow. If the STT layer mishears "ORD-12345" as "or 12 three 45," your get_order_status function gets a garbled ID and returns nothing. AssemblyAI's Voice Agent API is built on Universal-3 Pro Streaming, which has a 16.7% mixed-entity error rate vs. 23–25% for competing models — that's the difference between tool calls that succeed and tool calls that silently fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does tool calling work with the AssemblyAI Voice Agent API?
&lt;/h3&gt;

&lt;p&gt;You register tools by passing an array of function definitions in session.tools on a session.update event. When the agent decides to call a tool, it emits a tool.call event with the function name and arguments. You execute the function and accumulate results, then send tool.result events inside your reply.done handler — not immediately on tool.call. While the tool runs, the agent speaks a brief transition phrase like "let me check that for you" so the conversation never goes silent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I connect AssemblyAI's Voice Agent API to phone calls with Twilio?
&lt;/h3&gt;

&lt;p&gt;Yes — the Voice Agent API supports audio/pcmu (G.711 μ-law at 8 kHz) natively, which matches Twilio's codec exactly with no transcoding needed. You set up a server that accepts the inbound Twilio Media Streams call, opens a parallel WebSocket to the Voice Agent API, and forwards audio in both directions. The official &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/connect-to-twilio" rel="noopener noreferrer"&gt;Twilio integration guide&lt;/a&gt; walks through inbound and outbound calling in about 100 lines of TypeScript.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the best way to handle escalation to a human in a customer support voice agent?
&lt;/h3&gt;

&lt;p&gt;Register a transfer_to_human tool with parameters for reason and summary, and instruct the agent in the system prompt to call it when the customer asks for a person, sounds frustrated, or has a complex issue (refund disputes, billing errors, damaged products). The agent generates a short summary of the conversation that you forward to your human queue, so the receiving agent doesn't have to ask the customer to repeat themselves. This is one of the most important workflows to design well — a poor handoff feels worse than no AI at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does it cost to run a customer support voice agent on AssemblyAI?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API is a flat $4.50/hr, covering speech understanding, LLM reasoning, voice generation, turn detection, and tool calling in one bill. There are no per-token surcharges, no concurrency caps, and no separate invoices for STT, LLM, and TTS providers. Usage is billed by the minute on actual conversation duration, and a free tier is available for testing.&lt;/p&gt;
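&lt;p&gt;For budgeting, the flat rate makes the arithmetic simple. The sketch below pro-rates the hourly price by conversation minutes; it is an estimate only and assumes nothing about how partial minutes are rounded on the invoice. The estimated_cost helper and the call volumes are hypothetical:&lt;/p&gt;

```python
HOURLY_RATE_USD = 4.50  # flat rate quoted above

def estimated_cost(total_minutes):
    """Pro-rated cost in USD for a given number of conversation minutes."""
    return round(total_minutes * HOURLY_RATE_USD / 60, 2)

# Example: 1,000 calls averaging 4 minutes each.
monthly = estimated_cost(1000 * 4)
```

&lt;p&gt;At that volume, 4,000 minutes of conversation comes to $300.&lt;/p&gt;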

&lt;h3&gt;
  
  
  Do voice agents built with AssemblyAI work with healthcare workflows subject to HIPAA?
&lt;/h3&gt;

&lt;p&gt;Yes — AssemblyAI offers a Business Associate Addendum (BAA) for customers processing protected health information (PHI) and is SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0 certified. For clinical use cases (medical front-office voice agents, healthcare contact centers), enable Medical Mode with domain="medical-v1" to improve transcription accuracy on medication names, procedures, conditions, and dosages. Do not point the agent at real PHI without a signed BAA in place.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>customersupport</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build a real-time voice AI agent in Python with the AssemblyAI Voice Agent API</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:26:37 +0000</pubDate>
      <link>https://dev.to/martschweiger/build-a-real-time-voice-ai-agent-in-python-with-the-assemblyai-voice-agent-api-4477</link>
      <guid>https://dev.to/martschweiger/build-a-real-time-voice-ai-agent-in-python-with-the-assemblyai-voice-agent-api-4477</guid>
      <description>&lt;p&gt;You can build a working real-time voice agent in Python in well under 100 lines of code if you use the right primitive. This tutorial walks through building one on the &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;AssemblyAI Voice Agent API&lt;/a&gt; — a single WebSocket that wraps streaming speech-to-text, an LLM, text-to-speech, turn detection, and tool calling at $4.50/hour flat. No three-provider pipeline to wire up, no separate STT WebSocket plus LLM HTTP plus TTS stream to coordinate. Audio in, audio out, tool calls in between.&lt;/p&gt;

&lt;p&gt;By the end of this guide, you'll have a runnable Python voice agent that listens through your microphone, holds a real conversation, and calls Python functions to take actions. The companion repository is linked at the end. If you'd rather chain streaming STT, an LLM, and a TTS provider yourself, our &lt;a href="https://www.assemblyai.com/blog/python-voice-agent-tutorial" rel="noopener noreferrer"&gt;Python voice agent tutorial&lt;/a&gt; covers that path, or see the &lt;a href="https://www.assemblyai.com/blog/build-a-voice-agent-5-minutes-voice-agent-api" rel="noopener noreferrer"&gt;5-minute Voice Agent API quickstart&lt;/a&gt; for an even faster path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use the Voice Agent API for a Python voice agent
&lt;/h2&gt;

&lt;p&gt;The traditional "voice agent in Python" tutorial wires together a streaming STT API, an LLM HTTP endpoint, and a TTS streaming connection — three providers, three sets of credentials, three sets of latency variables to tune, and your own turn detection logic to write. The result works, but it's a lot of plumbing.&lt;/p&gt;

&lt;p&gt;The Voice Agent API replaces all of that with one WebSocket. You connect once, send audio frames, and receive both audio output and tool call events on the same stream. Three properties make it useful for production Python voice agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One bill, one set of logs.&lt;/strong&gt; $4.50/hour of session time covers STT, LLM inference, TTS, turn detection, and tool calling. You're not pasting three invoices into a cost spreadsheet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speech accuracy that works on real audio.&lt;/strong&gt; &lt;a href="https://www.assemblyai.com/universal-3-pro-streaming" rel="noopener noreferrer"&gt;Universal-3 Pro Streaming&lt;/a&gt; sits underneath — 307ms P50 latency, immutable transcripts, native 8kHz mulaw support for telephony, and 21% fewer alphanumeric errors than the previous generation of &lt;a href="https://www.assemblyai.com/products/streaming-speech-to-text" rel="noopener noreferrer"&gt;streaming STT&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling that maps to Python functions cleanly.&lt;/strong&gt; Define tools as JSON schemas, the LLM calls them, results stream back into the conversation. No separate function-calling API or LLM provider to manage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Microphone
     │  PCM16 24kHz mono
     ▼
  Your Python script
     │  WebSocket: input.audio frames
     ▼
  AssemblyAI Voice Agent API
   ┌────────────────────────────────┐
   │  STT + Turn detection           │
   │      ↓                          │
   │  LLM + tool calling             │
   │      ↓                          │
   │  TTS                            │
   └────────────────────────────────┘
     │
     │  WebSocket: reply.audio + tool.call events
     ▼
  Your Python script
     ├─► Speaker playback
     └─► Dispatch tool calls back to LLM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Audio flows in two directions on the same WebSocket. Your script captures mic audio, base64-encodes it, and sends it as input.audio events. The API returns audio playback chunks as reply.audio events and structured tool.call events when the LLM decides to invoke one of your tools. You dispatch the tool, send back a tool.result, and the conversation continues.&lt;/p&gt;
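&lt;p&gt;The two audio directions boil down to a pair of small helpers. This sketch uses the same field names as the full script later in this guide (audio on input.audio, data on reply.audio); the helper names themselves are hypothetical:&lt;/p&gt;

```python
import base64
import json

def encode_input_audio(pcm_bytes):
    """Wrap raw PCM bytes in an input.audio event."""
    return json.dumps({"type": "input.audio", "audio": base64.b64encode(pcm_bytes).decode()})

def decode_reply_audio(raw_event):
    """Return the PCM bytes from a reply.audio event, or None for other events."""
    event = json.loads(raw_event)
    if event.get("type") != "reply.audio":
        return None
    return base64.b64decode(event["data"])
```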

&lt;h2&gt;
  
  
  Before you start
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;a href="https://www.assemblyai.com/dashboard/signup" rel="noopener noreferrer"&gt;AssemblyAI account&lt;/a&gt; with Voice Agent API access&lt;/li&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;li&gt;A working microphone and speakers (use &lt;strong&gt;headphones&lt;/strong&gt; for clean barge-in — desktop mics pick up the agent's own voice and cause it to interrupt itself)&lt;/li&gt;
&lt;li&gt;portaudio installed system-wide (brew install portaudio on macOS, apt install portaudio19-dev on Debian/Ubuntu)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install the dependencies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install "websockets&amp;gt;=14" python-dotenv pyaudio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Drop your API key into a .env file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASSEMBLYAI_API_KEY=your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 1: Capture microphone audio
&lt;/h2&gt;

&lt;p&gt;PyAudio captures raw PCM audio. The Voice Agent API's default audio/pcm encoding is &lt;strong&gt;24 kHz, 16-bit, mono&lt;/strong&gt; — the audio format docs recommend ~50 ms chunks for low latency.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# audio.py
import threading
from queue import Queue
import pyaudio

SAMPLE_RATE = 24000
CHUNK_SIZE = 1200  # frames per chunk: 50 ms at 24 kHz (2,400 bytes of 16-bit mono)

class Mic:
    def __init__(self):
        self._pa = pyaudio.PyAudio()
        self.queue = Queue()
        self._running = False

    def start(self):
        self._running = True
        self._stream = self._pa.open(
            format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
            input=True, frames_per_buffer=CHUNK_SIZE,
        )
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while self._running:
            chunk = self._stream.read(CHUNK_SIZE, exception_on_overflow=False)
            self.queue.put(chunk)

    def stop(self):
        self._running = False
        self._stream.stop_stream(); self._stream.close()
        self._pa.terminate()

class Speaker:
    def __init__(self):
        self._pa = pyaudio.PyAudio()
        self._stream = self._open()

    def _open(self):
        return self._pa.open(
            format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE, output=True,
        )

    def play(self, audio_bytes):
        self._stream.write(audio_bytes)

    def flush_and_restart(self):
        # Called on barge-in: drop any queued speech and reopen the stream.
        try:
            self._stream.stop_stream(); self._stream.close()
        except Exception:
            pass
        self._stream = self._open()

    def close(self):
        self._stream.stop_stream(); self._stream.close()
        self._pa.terminate()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
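&lt;p&gt;If you change the sample rate or chunk duration, the chunk size has to change with it. PyAudio counts frames (samples per channel), not bytes, so the arithmetic is worth spelling out. The helper names below are hypothetical:&lt;/p&gt;

```python
SAMPLE_RATE = 24000      # samples per second
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono

def chunk_frames(duration_ms):
    """Frames (samples per channel) in a chunk of the given duration."""
    return SAMPLE_RATE * duration_ms // 1000

def chunk_bytes(duration_ms):
    """Bytes per chunk before base64 encoding."""
    return chunk_frames(duration_ms) * BYTES_PER_SAMPLE * CHANNELS
```

&lt;p&gt;For a 50 ms chunk that gives 1,200 frames, or 2,400 bytes of raw audio per message.&lt;/p&gt;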
&lt;h2&gt;
  
  
  Step 2: Open the Voice Agent API session
&lt;/h2&gt;

&lt;p&gt;The Voice Agent API connection starts with a session.update message that declares your system prompt, the tools you want available, the agent's voice, and an opening greeting. The API picks audio/pcm (24 kHz) by default, so you don't need to specify input/output format explicitly.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agent.py
import asyncio, base64, json, os
import websockets
from dotenv import load_dotenv

from audio import Mic, Speaker
from tools import TOOLS, dispatch_tool

load_dotenv()

VOICE_AGENT_WS = "wss://agents.assemblyai.com/v1/ws"

SYSTEM_PROMPT = """You are a helpful voice assistant.
Keep replies short and conversational — one or two sentences.
Use the available tools to answer questions when relevant."""

async def open_session(ws):
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "system_prompt": SYSTEM_PROMPT,
            "greeting": "Hi! How can I help?",
            "tools": TOOLS,
            "output": {"voice": "ivy"},
        },
    }))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A few details worth flagging up front, because they're the easy ones to get wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The auth header for the Voice Agent API uses &lt;strong&gt;Authorization: Bearer YOUR_KEY&lt;/strong&gt; — note the Bearer prefix. This is different from every other AssemblyAI endpoint, which accepts the raw API key with no prefix.&lt;/li&gt;
&lt;li&gt;The first message you send is session.update, not session.start. All config nests under a session object.&lt;/li&gt;
&lt;li&gt;The voice field is a named voice from the Voice Agent API catalog (e.g. ivy, james, sophie) — not an ElevenLabs voice ID. See the &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/voices" rel="noopener noreferrer"&gt;voices reference&lt;/a&gt; for the full list.&lt;/li&gt;
&lt;li&gt;You must wait for the server's session.ready event before sending any audio.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Pump audio in, route events out
&lt;/h2&gt;

&lt;p&gt;Two coroutines run concurrently: one sends mic chunks once the session is ready, the other reads events as they arrive.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def run_agent():
    mic = Mic()
    speaker = Speaker()

    async with websockets.connect(
        VOICE_AGENT_WS,
        additional_headers={"Authorization": f"Bearer {os.environ['ASSEMBLYAI_API_KEY']}"},
    ) as ws:
        await open_session(ws)

        ready = asyncio.Event()
        pending_tools = []
        loop = asyncio.get_running_loop()

        async def send_audio():
            await ready.wait()
            mic.start()
            while True:
                chunk = await loop.run_in_executor(None, mic.queue.get)
                await ws.send(json.dumps({
                    "type": "input.audio",
                    "audio": base64.b64encode(chunk).decode(),
                }))

        async def receive():
            async for raw in ws:
                event = json.loads(raw)
                kind = event["type"]

                if kind == "session.ready":
                    ready.set()
                    print(f"Session ready: {event.get('session_id')}")

                elif kind == "reply.audio":
                    speaker.play(base64.b64decode(event["data"]))

                elif kind == "tool.call":
                    # Accumulate — flush on reply.done, not now.
                    result = dispatch_tool(event["name"], event.get("arguments", {}))
                    pending_tools.append({"call_id": event["call_id"], "result": result})

                elif kind == "reply.done":
                    if event.get("status") == "interrupted":
                        pending_tools.clear()
                        speaker.flush_and_restart()
                    elif pending_tools:
                        for tool in pending_tools:
                            value = tool["result"]
                            if not isinstance(value, str):
                                value = json.dumps(value)
                            await ws.send(json.dumps({
                                "type": "tool.result",
                                "call_id": tool["call_id"],
                                "result": value,
                            }))
                        pending_tools.clear()

                elif kind == "transcript.user":
                    print(f"You:   {event['text']}")

                elif kind == "transcript.agent":
                    print(f"Agent: {event['text']}")

        await asyncio.gather(send_audio(), receive())

if __name__ == "__main__":
    asyncio.run(run_agent())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's the entire voice agent loop. The Voice Agent API handles every layer of the pipeline (STT, LLM, TTS, turn detection) inside the WebSocket. Your job is to feed it audio, play what comes back, and dispatch tool calls.&lt;/p&gt;

&lt;p&gt;Two more easy-to-miss details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool result timing.&lt;/strong&gt; Per the &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/tool-calling" rel="noopener noreferrer"&gt;tool calling docs&lt;/a&gt;, accumulate tool results when tool.call fires and send them inside the reply.done handler — not immediately. The agent generates a short transition phrase ("let me check on that") while the tools run; sending results too early can cause timing issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interruption handling.&lt;/strong&gt; When the user barges in, the server sends reply.done with status: "interrupted". Drop any queued tool results and flush the speaker so the caller doesn't keep hearing the previous reply.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Implement the tools
&lt;/h2&gt;

&lt;p&gt;The dispatch_tool function is where your agent does real work. The Voice Agent API delivers tool.call events with arguments already parsed as a Python dict — no json.loads() needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# tools.py
TOOLS = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "type": "function",
        "name": "remember",
        "description": "Save something the user wants you to remember.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
            "required": ["fact"],
        },
    },
]

_memory = []

def dispatch_tool(name, args):
    if name == "get_weather":
        # In production: call a real weather API.
        return f"It's 68°F and partly cloudy in {args['city']}."
    if name == "remember":
        _memory.append(args["fact"])
        return f"Got it. I'll remember: {args['fact']}"
    return f"Unknown tool: {name}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The "type": "function" field on each tool is required. Forget it and the API will reject the session.update with a validation error.&lt;/p&gt;
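&lt;p&gt;A quick local check can catch that mistake before you open a connection. This sketch only tests the two fields discussed here and is not a full schema validator; validate_tools is a hypothetical helper:&lt;/p&gt;

```python
def validate_tools(tools):
    """Return a list of problems found in a tools array; empty means none found."""
    problems = []
    for index, tool in enumerate(tools):
        label = tool.get("name", f"tool #{index}")
        if tool.get("type") != "function":
            problems.append(f'{label}: missing "type": "function"')
        if "name" not in tool:
            problems.append(f"{label}: missing name")
    return problems
```

&lt;p&gt;Calling validate_tools(TOOLS) at startup and failing fast on a non-empty result saves a round trip to the server.&lt;/p&gt;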

&lt;p&gt;In production, replace the stubs with calls to a real weather API, your CRM, a database, or whatever your application actually does. The tool dispatcher is pure Python — anything you can do from a Python function, the voice agent can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run it
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The agent greets you. Try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What's the weather in San Francisco?"&lt;/li&gt;
&lt;li&gt;"Remember that my passport expires in March."&lt;/li&gt;
&lt;li&gt;"What did I just ask you to remember?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full flow: your speech → STT → LLM (with tools available) → tool call (if applicable) → tool result → LLM continues → TTS → speaker. All in under a second, on one WebSocket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency: getting under 500ms perceived
&lt;/h2&gt;

&lt;p&gt;A natural-feeling voice agent responds in under 800ms from when you stop talking to when you hear the reply. Best-in-class teams target sub-500ms. Where your milliseconds go on the Voice Agent API:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Typical latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mic chunk → server&lt;/td&gt;
&lt;td&gt;~50–100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;End-of-turn detection&lt;/td&gt;
&lt;td&gt;~100–200ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM first-token&lt;/td&gt;
&lt;td&gt;~200–400ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTS first-byte → speaker&lt;/td&gt;
&lt;td&gt;~100–250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perceived total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~450–950ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
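&lt;p&gt;The totals in the last row are just the column sums, which is handy to keep in code if you track your own measurements per stage. The numbers below are copied from the table; the helper names are hypothetical:&lt;/p&gt;

```python
# (stage, low_ms, high_ms) copied from the table above.
STAGES = [
    ("Mic chunk to server", 50, 100),
    ("End-of-turn detection", 100, 200),
    ("LLM first-token", 200, 400),
    ("TTS first-byte to speaker", 100, 250),
]

def total_budget(stages):
    """Sum the best-case and worst-case latencies across all stages."""
    low = sum(stage[1] for stage in stages)
    high = sum(stage[2] for stage in stages)
    return low, high

def largest_stage(stages):
    """Name of the stage with the largest worst-case latency."""
    return max(stages, key=lambda stage: stage[2])[0]
```

&lt;p&gt;With these figures the LLM's first token is the largest single contributor, which is why the tips below focus on not adding client-side delay on top of it.&lt;/p&gt;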

&lt;p&gt;The Voice Agent API streams audio output as it's generated, so the user hears the first word of the reply while the rest is still being synthesized. The biggest latency wins on the Python side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't buffer mic audio.&lt;/strong&gt; Send 50ms chunks as they arrive — that's what the audio.py example does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't block in the tool dispatcher.&lt;/strong&gt; If a tool call takes more than 500ms, the silence becomes audible. Cache hot data, set aggressive timeouts, and consider returning a placeholder ("Let me check on that") while the real call resolves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the streaming audio output.&lt;/strong&gt; Play reply.audio chunks as they arrive; never wait for the full response.&lt;/li&gt;
&lt;/ul&gt;
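&lt;p&gt;One way to enforce a timeout on a blocking tool is to run it in a worker thread and bound the wait. This is a sketch, not the tutorial's own implementation: call_with_timeout and slow_lookup are hypothetical, and the 0.5-second default simply mirrors the 500ms threshold mentioned above.&lt;/p&gt;

```python
import asyncio
import time

async def call_with_timeout(func, args, timeout_s=0.5):
    """Run a blocking tool in a worker thread; give up after timeout_s seconds."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func, **args), timeout_s)
    except asyncio.TimeoutError:
        # The worker thread keeps running; only the wait is abandoned.
        return {"error": "tool_timeout"}

def slow_lookup(order_id):
    """Hypothetical slow backend call."""
    time.sleep(0.2)
    return {"order_id": order_id}
```

&lt;p&gt;Returning an explicit error on timeout lets the agent apologize and retry instead of leaving the caller in silence.&lt;/p&gt;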

&lt;h2&gt;
  
  
  Handling interruptions
&lt;/h2&gt;

&lt;p&gt;Real conversations include interruptions. The user changes their mind, asks a follow-up while the agent is still talking, or says "wait, no, the other one." The Voice Agent API handles this server-side: barge-in is semantic — back-channels like "uh-huh" don't trigger an interruption, but "wait, stop" does.&lt;/p&gt;

&lt;p&gt;When the user actually interrupts, the server sends reply.done with status: "interrupted" (and transcript.agent with interrupted: true and the trimmed text). Your client should flush any queued speaker audio and drop any pending tool results, exactly as shown in the receive() loop above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going to production
&lt;/h2&gt;

&lt;p&gt;The agent above runs against your local microphone. To deploy it, swap the audio transport:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phone calls (PSTN)&lt;/strong&gt;: Bridge through Twilio Media Streams. The Voice Agent API supports audio/pcmu (G.711 μ-law at 8 kHz) natively, so phone audio stays in μ-law end-to-end with no resampling. See our &lt;a href="https://www.assemblyai.com/blog/build-voice-agent-livekit" rel="noopener noreferrer"&gt;LiveKit voice agent guide&lt;/a&gt; if you'd rather use an orchestrator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web apps&lt;/strong&gt;: Capture audio in the browser with AudioWorklet, then stream it to the Voice Agent API. See &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/browser-integration" rel="noopener noreferrer"&gt;Browser integration&lt;/a&gt; for the temporary-token flow that keeps your API key off the client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile&lt;/strong&gt;: Same pattern. The native audio capture APIs (iOS AVAudioEngine, Android AudioRecord) emit PCM you can forward through your server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For all production deployments, add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session persistence (save the session_id from session.ready and use session.resume to reconnect within 30 seconds without losing context)&lt;/li&gt;
&lt;li&gt;Per-session structured logs (user transcript, agent transcript, tool calls, tool results)&lt;/li&gt;
&lt;li&gt;PII redaction on transcripts before they hit your warehouse&lt;/li&gt;
&lt;li&gt;A timeout-and-retry policy for tool calls so a slow backend doesn't kill the call&lt;/li&gt;
&lt;/ul&gt;
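
&lt;p&gt;As a rough illustration of the logging and redaction items, the sketch below emits one JSON log line per transcript event and masks emails and phone numbers first. The regular expressions are deliberately crude assumptions for demonstration, and redact and log_line are hypothetical helpers; a production system would more likely rely on a dedicated PII redaction service.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import re

# Crude patterns for illustration only; real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def redact(text):
    """Mask emails first, then anything that looks like a phone number."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def log_line(session_id, role, text):
    """Build one structured log record with the transcript text redacted."""
    return json.dumps({
        "session_id": session_id,
        "role": role,
        "text": redact(text),
    })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Redacting before serialization means the raw transcript never reaches the log sink, which is the property the warehouse requirement is after.&lt;/p&gt;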

&lt;h2&gt;
  
  
  The complete repository
&lt;/h2&gt;

&lt;p&gt;Fork the runnable Python repo at &lt;a href="https://github.com/kelsey-aai/python-voice-agent-api" rel="noopener noreferrer"&gt;github.com/kelsey-aai/python-voice-agent-api&lt;/a&gt;. It includes mic capture, speaker playback, the WebSocket loop, the tool dispatcher, and example tools you can swap for your own. Around 200 lines of Python end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I build a real-time voice agent in Python?
&lt;/h3&gt;

&lt;p&gt;The fastest way to build a real-time &lt;a href="https://www.assemblyai.com/blog/ai-voice-agents" rel="noopener noreferrer"&gt;voice agent in Python&lt;/a&gt; in 2026 is to open a WebSocket to the AssemblyAI Voice Agent API at wss://agents.assemblyai.com/v1/ws, stream microphone audio in as input.audio events, and play the reply.audio events you get back. The Voice Agent API handles streaming speech-to-text, the LLM, text-to-speech, turn detection, and tool calling on a single connection at $4.50/hour, so you don't need to wire up three separate providers. With PyAudio for microphone access and the websockets library, the entire agent fits in well under 100 lines of Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between the Voice Agent API and chaining STT-LLM-TTS in Python?
&lt;/h3&gt;

&lt;p&gt;The chained approach uses three providers: a streaming STT API like AssemblyAI Universal-3 Pro Streaming, an LLM like GPT-4o, and a streaming TTS like ElevenLabs. You write the WebSocket bridge, turn detection logic, and retry handling yourself. The Voice Agent API replaces all of that with a single WebSocket — one provider, one bill, one set of logs. Chained pipelines give you finer control over each layer; the Voice Agent API is faster to ship and easier to operate at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I add tool calling to a Python voice agent?
&lt;/h3&gt;

&lt;p&gt;Define tools as JSON schemas in the tools field of your session.update message. Each tool needs "type": "function", a name, a description, and a parameter schema. When the LLM decides to call a tool, the Voice Agent API emits a tool.call event on the WebSocket with the tool name, the arguments (a JSON object, which json.loads turns into a Python dict), and a call_id. Your Python dispatcher runs the actual function, then you send back a tool.result event with that call_id and the result. Send tool results inside your reply.done handler, not immediately on tool.call, because the agent speaks a transition phrase while the tools run.&lt;/p&gt;

&lt;h3&gt;
  
  
  How low can latency go on a Python voice agent?
&lt;/h3&gt;

&lt;p&gt;A well-tuned Python voice agent on the Voice Agent API typically lands at 450–950ms perceived latency from end-of-turn to first audio out. The biggest wins are: (1) keep mic chunks small (~50ms) so end-of-turn detection fires fast, (2) don't block in your tool dispatcher — cache and timeout aggressively, and (3) play reply.audio chunks as they arrive instead of buffering. Universal-3 Pro Streaming alone hits 307ms P50 for transcription, which is the floor for the STT layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use a different LLM with the Voice Agent API?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API ships with frontier-quality LLMs under the hood, selected for low-latency conversational performance. If you specifically need a model that isn't available through the Voice Agent API, you can fall back to a chained architecture where you use AssemblyAI Universal-3 Pro Streaming for the STT layer and bring your own LLM and TTS. Most teams find the Voice Agent API model selection meets their needs and prefer the simpler architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I handle interruptions in a Python voice agent?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API detects barge-in semantically: back-channels like "uh-huh" don't interrupt, but "wait, stop" does. When the user actually interrupts, the server emits reply.done with status: "interrupted" and transcript.agent with interrupted: true. Your Python client should flush the speaker buffer (close and reopen the PyAudio output stream, or call abort() on a sounddevice output stream), drop any pending tool results, and continue listening for the user's new turn. This is what makes interruptions feel natural: the agent stops talking immediately instead of waiting for the previous reply to finish.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to create an AI cold-calling agent with the Voice Agent API</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:26:28 +0000</pubDate>
      <link>https://dev.to/martschweiger/how-to-create-an-ai-cold-calling-agent-with-the-voice-agent-api-70p</link>
      <guid>https://dev.to/martschweiger/how-to-create-an-ai-cold-calling-agent-with-the-voice-agent-api-70p</guid>
      <description>&lt;p&gt;An &lt;a href="https://www.assemblyai.com/solutions/voice-agents" rel="noopener noreferrer"&gt;AI cold-calling agent&lt;/a&gt; placed correctly does 500 lead-qualification calls in parallel for the cost of a single SDR. Placed poorly, it sounds like a robocall and gets hung up on in five seconds. The difference between the two isn't the LLM or the TTS — it's the speech accuracy on phone audio, the turn-taking that decides whether the agent interrupts a hesitant prospect, and the compliance layer that keeps you out of TCPA trouble.&lt;/p&gt;

&lt;p&gt;This tutorial walks through building an AI cold-calling agent on the &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;AssemblyAI Voice Agent API&lt;/a&gt; for the conversation layer, with Twilio for outbound dialing. The Voice Agent API gives you one WebSocket for STT, LLM, TTS, turn detection, and tool calling — you don't wire three providers together. You write the outbound dialer, the compliance gate, and the function dispatcher. The companion repository is linked at the end.&lt;/p&gt;

&lt;p&gt;If you're looking for the chained STT + LLM + TTS architecture instead, our &lt;a href="https://www.assemblyai.com/blog/how-to-create-ai-cold-calling-agent" rel="noopener noreferrer"&gt;original AI cold-calling agent guide&lt;/a&gt; covers that path with Universal-3 Pro Streaming directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI cold-calling agent does
&lt;/h2&gt;

&lt;p&gt;An AI cold-calling agent is an outbound voice AI system that dials a prospect, delivers a pitch in natural conversation, adapts in real time based on what the prospect says, and books qualified meetings or gathers disposition data. Unlike a robocall (one-way recorded message) or a power dialer with a human rep, it conducts a two-way conversation autonomously.&lt;/p&gt;

&lt;p&gt;The use cases where AI cold-calling agents work well today share three traits — high volume, structured pitch, and concrete success criteria (see our &lt;a href="https://www.assemblyai.com/blog/build-voice-agent-outbound-call-assemblyai" rel="noopener noreferrer"&gt;outbound calls walkthrough&lt;/a&gt; for the simpler "agent dials a single number" pattern):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Outbound SDR prospecting&lt;/strong&gt;: open with a relevant hook, qualify BANT, book a demo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Appointment setting&lt;/strong&gt; for field sales, financial advisors, home services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-engagement of lapsed leads&lt;/strong&gt; in a CRM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Survey and research calls&lt;/strong&gt; at scale&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event follow-up and RSVP confirmation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Renewal and upsell motions&lt;/strong&gt; for existing customers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread: one script, thousands of conversations, a measurable booking rate or disposition. That's where the Voice Agent API's combination of speech accuracy, tool calling, and flat-rate pricing pays for itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  CRM / lead list (Salesforce, HubSpot, CSV)
       │
       ▼
  dialer.py
       │  compliance_gate()  ← TCPA, DNC, state laws, time windows
       ▼
  Twilio outbound dial
       │  TwiML → open Media Stream
       ▼
  bridge_server.py
       │  Twilio Media Stream ↔ Voice Agent API WebSocket
       ▼
  AssemblyAI Voice Agent API
   ┌──────────────────────────────────┐
   │  STT + Turn detection             │
   │      ↓                            │
   │  LLM with sales prompt + tools    │
   │      ↓                            │
   │  TTS                              │
   └──────────────────────────────────┘
       │
       │  tool calls
       ▼
  - book_meeting    (calendar API)
  - log_disposition (CRM update)
  - honor_dnc       (suppression list)
  - mark_callback   (scheduling)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Voice Agent API handles the conversation. Your code handles three things outside the conversation: the &lt;strong&gt;dialer&lt;/strong&gt; (who to call, when, at what concurrency), the &lt;strong&gt;compliance gate&lt;/strong&gt; (TCPA, DNC, state consent), and the &lt;strong&gt;tool dispatcher&lt;/strong&gt; (book a meeting, update the CRM, honor a do-not-call request).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use the Voice Agent API for cold-calling
&lt;/h2&gt;

&lt;p&gt;Three things make the Voice Agent API a strong fit for outbound voice agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speech accuracy on phone audio.&lt;/strong&gt; Cold calls capture emails, phone numbers, company names, and job titles — "five one five, nine eight two, four zero zero zero," "J at acme dot io," "director of rev ops." &lt;a href="https://www.assemblyai.com/universal-3-pro-streaming" rel="noopener noreferrer"&gt;Universal-3 Pro Streaming&lt;/a&gt; (the STT layer under the Voice Agent API) delivers 21% fewer alphanumeric errors and 28% better accuracy on consecutive numbers than the previous generation. That's the difference between a booked meeting in your calendar and a typo you never catch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling that maps to the booking moment.&lt;/strong&gt; When a prospect says "yes, Tuesday at 2pm works," the agent has to fire book_meeting immediately — not in the next turn. The Voice Agent API's tool calling is structured-output reliable, which matters when one missed booking is the whole point of the call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat $4.50/hour pricing.&lt;/strong&gt; Outbound is bursty by nature. You don't want per-token surprises when the dialer fires 500 simultaneous calls. The Voice Agent API's flat hourly rate covers STT, LLM, TTS, and tool calls all-in.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Before you start
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;a href="https://www.assemblyai.com/dashboard/signup" rel="noopener noreferrer"&gt;AssemblyAI account&lt;/a&gt; with Voice Agent API access&lt;/li&gt;
&lt;li&gt;A Twilio account with an outbound-capable phone number (and a verified caller ID if your trial requires it)&lt;/li&gt;
&lt;li&gt;A list of leads with consent to be contacted (CSV is fine for testing — production should integrate your real CRM)&lt;/li&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install fastapi uvicorn "websockets&amp;gt;=14" python-dotenv twilio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 1: Build the compliance gate first
&lt;/h2&gt;

&lt;p&gt;Compliance is where AI cold-calling teams burn the most money: TCPA statutory damages run $500 per violating call, rising to as much as $1,500 when the violation is willful. Build the gate before you write a line of dialer code.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
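&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# compliance.py
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo

# Internal DNC list: whitespace-separated phone numbers.
DNC_LIST = set(Path("suppression.txt").read_text().split())

# States commonly treated as all-party consent for call recording.
# Partial list; confirm the current set with counsel.
RECORDING_CONSENT_STATES = {"CA", "FL", "PA", "WA", "IL", "MD", "MT", "NH"}

def on_federal_dnc(phone):
    # Placeholder: wire this to your DNC scrub provider. It raises so an
    # unscrubbed number can never pass the gate by accident.
    raise NotImplementedError("connect a federal DNC scrub provider")

def compliance_gate(lead):
    """Return (ok, reason). Any failed check blocks the call."""
    # 1. Internal suppression (previous DNC requests, unsubscribes)
    if lead["phone"] in DNC_LIST:
        return False, "internal DNC"

    # 2. Federal DNC registry
    if on_federal_dnc(lead["phone"]):
        return False, "federal DNC"

    # 3. Time window: no calls before 8am or after 9pm local time.
    # Fail closed when the timezone is unknown instead of guessing one.
    tz_name = lead.get("timezone")
    if not tz_name:
        return False, "unknown timezone"
    local_hour = datetime.now(ZoneInfo(tz_name)).hour
    if local_hour &amp;lt; 8 or local_hour &amp;gt;= 21:
        return False, f"outside TCPA window ({local_hour}:00 local)"

    # 4. Recording consent: flag leads in all-party consent states.
    if lead.get("state") in RECORDING_CONSENT_STATES:
        # Agent must disclose recording at the top of the call.
        lead["needs_recording_disclosure"] = True

    return True, "ok"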
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Build this as a hard gate. No call goes out if any check fails.&lt;/p&gt;
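
&lt;p&gt;The time-window check is easier to unit test when the current time is passed in instead of read from the clock. The sketch below assumes the same 8am to 9pm window as the gate above; within_calling_window is a hypothetical helper, not part of the companion repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def within_calling_window(now_utc, tz_name, start_hour=8, end_hour=21):
    """True when the lead's local hour is at or after 8am and before 9pm."""
    local = now_utc.astimezone(ZoneInfo(tz_name))
    return local.hour in range(start_hour, end_hour)

# 12:30 UTC on a May morning: 8:30am in New York, 5:30am in Los Angeles.
sample = datetime(2026, 5, 19, 12, 30, tzinfo=timezone.utc)
east_ok = within_calling_window(sample, "America/New_York")
west_ok = within_calling_window(sample, "America/Los_Angeles")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here east_ok is True and west_ok is False, which is exactly the case a hardcoded Eastern default would get wrong for a West Coast lead.&lt;/p&gt;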

&lt;h2&gt;
  
  
  Step 2: Define the agent's tools
&lt;/h2&gt;

&lt;p&gt;Four tools the agent can call mid-conversation. In production, replace the stubs with real CRM, calendar, and DNC API calls. Each tool needs "type": "function" at the top level — the Voice Agent API validates this on session.update.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# tools.py
TOOLS = [
    {
        "type": "function",
        "name": "book_meeting",
        "description": "Book a meeting on the rep's calendar.",
        "parameters": {
            "type": "object",
            "properties": {
                "lead_id": {"type": "string"},
                "preferred_time": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["lead_id", "preferred_time", "email"],
        },
    },
    {
        "type": "function",
        "name": "log_disposition",
        "description": "Record the call outcome in the CRM.",
        "parameters": {
            "type": "object",
            "properties": {
                "lead_id": {"type": "string"},
                "disposition": {
                    "type": "string",
                    "enum": ["booked", "not_now", "not_interested",
                             "wrong_person", "left_voicemail", "dnc"],
                },
                "notes": {"type": "string"},
            },
            "required": ["lead_id", "disposition"],
        },
    },
    {
        "type": "function",
        "name": "honor_dnc",
        "description": "Add the prospect to the do-not-call list immediately.",
        "parameters": {
            "type": "object",
            "properties": {
                "lead_id": {"type": "string"},
                "phone": {"type": "string"},
            },
            "required": ["lead_id", "phone"],
        },
    },
    {
        "type": "function",
        "name": "mark_callback",
        "description": "Schedule a callback at the prospect's preferred time.",
        "parameters": {
            "type": "object",
            "properties": {
                "lead_id": {"type": "string"},
                "preferred_time": {"type": "string"},
            },
            "required": ["lead_id", "preferred_time"],
        },
    },
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The honor_dnc tool is the most important one. If the prospect says anything that sounds like a do-not-call request ("take me off your list," "don't call me again," "remove me"), the agent must call this tool &lt;strong&gt;immediately&lt;/strong&gt;, acknowledge, and end the call politely. No upselling, no "can I just ask one question." TCPA violations on DNC requests are the most expensive mistake a cold-calling agent can make.&lt;/p&gt;
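
&lt;p&gt;The bridge server later imports a dispatch_tool function from tools.py, which the listing above does not define. One possible shape for it is sketched below. The four handlers are stubs with made-up return values, and SUPPRESSED stands in for a real suppression store; treat all of it as an assumption, not the companion repository's code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical dispatcher; replace the stubs with real integrations.
SUPPRESSED = set()

def book_meeting(lead_id, preferred_time, email):
    return {"status": "booked", "lead_id": lead_id, "time": preferred_time}

def log_disposition(lead_id, disposition, notes=""):
    return {"status": "logged", "disposition": disposition}

def honor_dnc(lead_id, phone):
    SUPPRESSED.add(phone)
    return {"status": "suppressed"}

def mark_callback(lead_id, preferred_time):
    return {"status": "callback_scheduled", "time": preferred_time}

HANDLERS = {
    "book_meeting": book_meeting,
    "log_disposition": log_disposition,
    "honor_dnc": honor_dnc,
    "mark_callback": mark_callback,
}

def dispatch_tool(name, arguments):
    """Route a tool.call to its handler; report errors instead of raising."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:
        # Missing or unexpected arguments from the model.
        return {"error": f"bad arguments for {name}: {exc}"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning an error payload instead of raising keeps one malformed tool call from tearing down the whole phone call.&lt;/p&gt;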

&lt;h2&gt;
  
  
  Step 3: Write the system prompt
&lt;/h2&gt;

&lt;p&gt;The system prompt is where the script lives. Every cold-calling prompt needs these sections:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# prompts.py
SYSTEM_PROMPT = """You are an AI sales development representative for Datafold.
You are calling {prospect_name}, {prospect_title} at {prospect_company}.

DISCLOSURE (required):
- Open every call by stating: "Hi {first_name}, this is an AI assistant calling
  on behalf of Datafold. This call is being recorded."
- This is non-negotiable. Recording disclosure is legally required in CA, FL,
  PA, WA, and several other states.

OPENER (15 seconds):
- "I'm reaching out because we help data teams catch breaking changes before
  they hit production. Do you have 30 seconds for me to explain why I'm calling?"
- If yes, continue. If no, ask when's better and call mark_callback.

DISCOVERY (ask only 2 questions, max):
1. "How is your team handling data quality today — manual review, dbt tests,
   or something else?"
2. "How often does a broken model make it to production?"

PITCH (one sentence):
- "Datafold gives data teams CI for their pipelines. Customers like Patreon
  and Faire catch 90% of regressions before they ship."

CTA:
- Offer two specific times in the prospect's time zone.
- Call book_meeting with their email when they accept.

OBJECTION MAP:
- "How did you get my number?" → "You opted in on our website last month."
- "Send me an email" → "Happy to. What's the best address?" (call mark_callback)
- "Not the right person" → "Who handles data quality on your team?"
- "We already use [X]" → "Got it. Most of our customers use [X] alongside Datafold."
- "Not interested" → "No problem. Mind if I ask why?" (then call log_disposition)

DNC HANDLING (highest priority):
- If the prospect says ANYTHING like "take me off your list," "don't call me
  again," "remove me," "stop calling": call honor_dnc IMMEDIATELY, say "Of
  course, you're removed from our list. Sorry to bother you. Have a good day,"
  and end the call. Do NOT try to recover the conversation.

STYLE:
- One or two sentences per turn. Conversational, not formal.
- Listen for tone. If they sound annoyed, wrap up gracefully.
- Never claim to be human. If asked, confirm you're AI.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That prompt is the entire sales playbook. The Voice Agent API will follow it turn by turn, calling tools when the conversation hits the right moments.&lt;/p&gt;
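
&lt;p&gt;Because the prompt is filled in with str.format, a lead record missing any placeholder would raise a KeyError in the middle of call setup. A small pre-flight check, sketched here with a shortened stand-in template, can catch that before dialing. TEMPLATE, template_fields, and render_prompt are hypothetical names for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import string

# Shortened stand-in for SYSTEM_PROMPT with the same placeholders.
TEMPLATE = (
    "You are calling {prospect_name}, {prospect_title} at {prospect_company}. "
    "Open with: Hi {first_name}."
)

def template_fields(template):
    """Collect the placeholder names used in a format string."""
    parsed = string.Formatter().parse(template)
    return {name for _, name, _, _ in parsed if name}

def render_prompt(template, lead):
    """Format the prompt, failing early with a readable error."""
    missing = sorted(template_fields(template) - set(lead))
    if missing:
        raise ValueError(f"lead is missing prompt fields: {missing}")
    return template.format(**lead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running the same check over the whole lead list before the dialer starts surfaces incomplete CRM records while they are still cheap to fix.&lt;/p&gt;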

&lt;h2&gt;
  
  
  Step 4: Wire up the dialer
&lt;/h2&gt;

&lt;p&gt;The dialer pulls leads from your list, runs each through the compliance gate, and places Twilio calls. It controls concurrency and respects time-of-day rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# dialer.py
import asyncio
import csv
import os

from twilio.rest import Client

from compliance import compliance_gate

twilio = Client(os.environ["TWILIO_SID"], os.environ["TWILIO_TOKEN"])

async def dial_lead(lead, callback_url):
    ok, reason = compliance_gate(lead)
    if not ok:
        # Audit trail for skipped leads; swap for a CRM write in production.
        print(f"Skipping {lead['lead_id']}: {reason}")
        return

    # The Twilio client is synchronous, so run it in a worker thread
    # to keep the event loop free.
    call = await asyncio.to_thread(
        twilio.calls.create,
        to=lead["phone"],
        from_=os.environ["TWILIO_FROM"],
        url=f"{callback_url}/twilio/voice?lead_id={lead['lead_id']}",
        machine_detection="Enable",  # Twilio reports AnsweredBy to the webhook
        record=True,                 # Recorded for QA; disclose it on the call
    )
    print(f"Dialing {lead['lead_id']}: {call.sid}")

async def run_dialer(leads_csv, max_concurrent=10):
    # The semaphore caps in-flight dial requests, not live calls.
    sem = asyncio.Semaphore(max_concurrent)
    with open(leads_csv, newline="") as f:
        leads = list(csv.DictReader(f))

    async def with_limit(lead):
        async with sem:
            await dial_lead(lead, os.environ["PUBLIC_URL"])
            await asyncio.sleep(2)  # pace outbound dials
    await asyncio.gather(*(with_limit(lead) for lead in leads))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

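&lt;p&gt;The machine_detection="Enable" flag turns on Twilio's answering machine detection. Twilio does not hang up by itself: it passes an AnsweredBy value (for example human or machine_start) to your /twilio/voice webhook, and that handler should return a Hangup response for machines so you don't waste a Voice Agent API session on voicemail. Important: never leave a recorded message, which is a TCPA violation in most contexts.&lt;/p&gt;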

&lt;h2&gt;
  
  
  Step 5: Bridge Twilio Media Streams to the Voice Agent API
&lt;/h2&gt;

&lt;p&gt;The bridge server is what connects Twilio's outbound call audio to the Voice Agent API WebSocket. Twilio sends G.711 μ-law at 8 kHz; the Voice Agent API accepts it natively when you set the encoding to audio/pcmu.&lt;/p&gt;

&lt;p&gt;A few details that are easy to get wrong on this endpoint specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The auth header is Authorization: Bearer YOUR_KEY — note the &lt;strong&gt;Bearer&lt;/strong&gt; prefix. This is unique to the Voice Agent API; the rest of AssemblyAI accepts the raw key.&lt;/li&gt;
&lt;li&gt;The first WebSocket message is a session.update event with all config nested under a session object. There is no session.start.&lt;/li&gt;
&lt;li&gt;The agent's voice is a named voice from the &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/voices" rel="noopener noreferrer"&gt;Voice Agent API catalog&lt;/a&gt; (ivy, james, sophie, etc.) — not an ElevenLabs voice ID.&lt;/li&gt;
&lt;li&gt;The telephony audio encoding is audio/pcmu (G.711 μ-law). Sample rate is implicit (8 kHz). Don't pass pcm_mulaw or a sample_rate field — the API ignores them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You must wait for session.ready before sending any input.audio frames.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# bridge_server.py
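import asyncio, json, os
from urllib.parse import parse_qs

import websockets
from fastapi import FastAPI, Query, Request, WebSocket
from fastapi.responses import Response

from prompts import SYSTEM_PROMPT
from tools import TOOLS, dispatch_tool

VOICE_AGENT_WS = "wss://agents.assemblyai.com/v1/ws"
ASSEMBLYAI_KEY = os.environ["ASSEMBLYAI_API_KEY"]

# Maps lead_id to the lead dict. Assumption: you fill this from your CRM
# or leads.csv before dialing.
LEAD_CACHE = {}

app = FastAPI()

@app.post("/twilio/voice")
async def twilio_voice(request: Request, lead_id: str = Query(...)):
    # With machine detection on, Twilio adds AnsweredBy to the form-encoded
    # webhook body. Hang up on machines instead of opening a session.
    form = parse_qs((await request.body()).decode())
    answered_by = form.get("AnsweredBy", ["human"])[0]
    if answered_by.startswith("machine") or answered_by == "fax":
        hangup = "&amp;lt;Response&amp;gt;&amp;lt;Hangup/&amp;gt;&amp;lt;/Response&amp;gt;"
        return Response(content=hangup, media_type="application/xml")

    host = request.url.hostname
    # lead_id travels in the path because the Stream url does not
    # support query string parameters.
    twiml = f"""&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;Response&amp;gt;
  &amp;lt;Connect&amp;gt;
    &amp;lt;Stream url="wss://{host}/media-stream/{lead_id}" /&amp;gt;
  &amp;lt;/Connect&amp;gt;
&amp;lt;/Response&amp;gt;"""
    return Response(content=twiml, media_type="application/xml")

@app.websocket("/media-stream/{lead_id}")
async def media_stream(twilio_ws: WebSocket, lead_id: str):
    await twilio_ws.accept()
    lead = LEAD_CACHE[lead_id]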
    stream_sid = {"value": None}

    session_config = {
        "type": "session.update",
        "session": {
            "system_prompt": SYSTEM_PROMPT.format(**lead),
            "tools": TOOLS,
            "input": {"format": {"encoding": "audio/pcmu"}},
            "output": {
                "voice": "ivy",
                "format": {"encoding": "audio/pcmu"},
            },
        },
    }

    async with websockets.connect(
        VOICE_AGENT_WS,
        additional_headers={"Authorization": f"Bearer {ASSEMBLYAI_KEY}"},
    ) as va_ws:
        await va_ws.send(json.dumps(session_config))

        ready = asyncio.Event()
        pending_tools = []

        async def pump_twilio_to_va():
            async for raw in twilio_ws.iter_text():
                event = json.loads(raw)
                kind = event.get("event")
                if kind == "start":
                    stream_sid["value"] = event["start"]["streamSid"]
                elif kind == "media":
                    if not ready.is_set():
                        continue
                    # Twilio sends base64 mulaw; AAI accepts it directly.
                    await va_ws.send(json.dumps({
                        "type": "input.audio",
                        "audio": event["media"]["payload"],
                    }))
                elif kind == "stop":
                    return

        async def pump_va_to_twilio():
            async for raw in va_ws:
                event = json.loads(raw)
                t = event.get("type")

                if t == "session.ready":
                    ready.set()

                elif t == "reply.audio" and stream_sid["value"]:
                    await twilio_ws.send_text(json.dumps({
                        "event": "media",
                        "streamSid": stream_sid["value"],
                        "media": {"payload": event["data"]},
                    }))

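                elif t == "tool.call":
                    # Run the synchronous tool in a thread so the event
                    # loop is not blocked while it works.
                    result = await asyncio.to_thread(
                        dispatch_tool, event["name"], event.get("arguments", {})
                    )
                    pending_tools.append(
                        {"call_id": event["call_id"], "result": result}
                    )

                elif t == "reply.done":
                    if event.get("status") == "interrupted":
                        pending_tools.clear()
                        # Flush audio Twilio has buffered but not yet played.
                        if stream_sid["value"]:
                            await twilio_ws.send_text(json.dumps({
                                "event": "clear",
                                "streamSid": stream_sid["value"],
                            }))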
                    else:
                        for tool in pending_tools:
                            value = tool["result"]
                            if not isinstance(value, str):
                                value = json.dumps(value)
                            await va_ws.send(json.dumps({
                                "type": "tool.result",
                                "call_id": tool["call_id"],
                                "result": value,
                            }))
                        pending_tools.clear()

                elif t == "transcript.user":
                    print(f"[{lead_id}] User: {event['text']}")
                elif t == "transcript.agent":
                    print(f"[{lead_id}] Agent: {event['text']}")

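        # Stop both pumps as soon as either side hangs up; a plain gather()
        # would keep waiting on the Voice Agent socket after Twilio stops.
        tasks = [
            asyncio.create_task(pump_twilio_to_va()),
            asyncio.create_task(pump_va_to_twilio()),
        ]
        _, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()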
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two subtleties worth understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool result timing.&lt;/strong&gt; Per the &lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/tool-calling" rel="noopener noreferrer"&gt;tool calling docs&lt;/a&gt;, accumulate tool results when tool.call fires and send them inside reply.done — not immediately. The agent speaks a transition phrase ("let me check") while the tools run; sending too early causes timing issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio pass-through.&lt;/strong&gt; Twilio's media.payload and AssemblyAI's input.audio.audio (and reply.audio.data) are all base64-encoded μ-law strings, so the bridge moves bytes through without any decode/re-encode step.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Compliance: the part most teams underweight
&lt;/h2&gt;

&lt;p&gt;Three things separate a working AI cold-calling agent from a $50,000 TCPA settlement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scrub against the federal DNC registry&lt;/strong&gt; before every call. Integrate a real scrub provider, or subscribe to the National Do Not Call Registry data directly through telemarketing.donotcall.gov, which charges for access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honor state DNC lists.&lt;/strong&gt; Several states maintain their own — California, Pennsylvania, Indiana, Tennessee. Your scrub vendor should cover these.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-party consent disclosure.&lt;/strong&gt; In CA, FL, PA, WA, and several other states, you must disclose at the top of the call that the call is being recorded and that the caller is AI. Your system prompt's DISCLOSURE section is doing this work — never remove it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build all three as hard gates. If any check fails, the call doesn't go out. Log every disposition with a timestamp so you can prove compliance during an audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring success
&lt;/h2&gt;

&lt;p&gt;Three numbers tell you whether your AI cold-calling agent is working (see our broader &lt;a href="https://www.assemblyai.com/blog/ai-voice-agents" rel="noopener noreferrer"&gt;AI voice agents guide&lt;/a&gt; for context on conversion metrics across use cases):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection rate&lt;/strong&gt;: percentage of calls that reach a live human. Healthy: 30–50% with a local-presence dialer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation rate&lt;/strong&gt;: percentage of connected calls that last more than 30 seconds. Healthy: 25–40%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Book rate&lt;/strong&gt;: percentage of conversations that end in a booked meeting. Healthy: 5–15% for warm/intent leads, 1–3% for cold lists.&lt;/li&gt;
&lt;/ul&gt;
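
&lt;p&gt;These three rates chain together: each one uses the previous stage as its denominator. A small helper makes that explicit. The function name and the sample counts are illustrative assumptions, not benchmark data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def funnel_rates(dialed, connected, conversations, booked):
    """Return the three funnel percentages, each relative to the prior stage."""
    def pct(part, whole):
        # Guard against empty stages early in a campaign.
        return round(100 * part / whole, 1) if whole else 0.0

    return {
        "connection_rate": pct(connected, dialed),
        "conversation_rate": pct(conversations, connected),
        "book_rate": pct(booked, conversations),
    }

# Illustrative counts: 1,000 dials, 400 live answers,
# 120 calls past 30 seconds, 6 meetings booked.
rates = funnel_rates(1000, 400, 120, 6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With those sample counts the helper reports a 40% connection rate, a 30% conversation rate, and a 5% book rate.&lt;/p&gt;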

&lt;p&gt;Read every transcript for the first 500 calls. You'll catch prompt failures, silently wrong transcriptions on company names, and tool-call timing issues that you'd never notice listening to the audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  The complete repository
&lt;/h2&gt;

&lt;p&gt;Fork the runnable repo at &lt;a href="https://github.com/kelsey-aai/cold-calling-voice-agent-api" rel="noopener noreferrer"&gt;github.com/kelsey-aai/cold-calling-voice-agent-api&lt;/a&gt;. It includes the dialer, the compliance gate, the bridge server, the tool dispatcher, the system prompt, and a sample leads.csv. Around 400 lines of Python total.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I create an AI cold-calling agent with the Voice Agent API?
&lt;/h3&gt;

&lt;p&gt;To create an AI cold-calling agent with the AssemblyAI Voice Agent API, build four pieces: a dialer that pulls leads from your CRM and places outbound Twilio calls, a compliance gate that scrubs against DNC registries and TCPA time windows, a bridge server that connects Twilio Media Streams to the Voice Agent API WebSocket at wss://agents.assemblyai.com/v1/ws, and a tool dispatcher with book_meeting, log_disposition, honor_dnc, and mark_callback. Define a sales-specific system prompt with disclosure, opener, discovery, pitch, CTA, objection map, and DNC handling rules. The Voice Agent API handles the conversation — your code handles dialing, compliance, and integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is AI cold-calling legal?
&lt;/h3&gt;

&lt;p&gt;AI cold-calling is legal in most U.S. jurisdictions if you comply with TCPA (federal), state-level consent laws, and disclose that the caller is AI. Specifically: scrub against the federal DNC registry before every call, respect TCPA calling windows (no calls before 8am or after 9pm in the recipient's local time), get two-party consent for recording in states that require it (CA, FL, PA, WA, and others), and disclose AI identity at the top of the call. The cost of getting this wrong is steep — $500–$1,500 per violating call. Build the compliance gate as a hard barrier and consult legal counsel before scaling.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does it cost to run an AI cold-calling agent?
&lt;/h3&gt;

&lt;p&gt;On the AssemblyAI Voice Agent API, you pay $4.50/hour of session time — STT, LLM, TTS, turn detection, and tool calls included. Twilio outbound voice adds a few cents per minute. A typical 90-second qualification call costs roughly $0.12–$0.18 all-in. At the typical 30–50% connection rate, the cost per actual conversation is closer to $0.30. Compare against a human SDR at fully-loaded $70–100/hour and the unit economics generally favor the agent for high-volume top-of-funnel motions.&lt;/p&gt;
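
&lt;p&gt;The arithmetic behind these figures can be sketched in a few lines of Python. The $4.50/hour session rate comes from the answer above; the telephony rate is a placeholder assumption rather than a quoted Twilio price, and cost_per_conversation is a deliberately pessimistic upper bound that charges every dial attempt as if it were a full-length call.&lt;/p&gt;

```python
# Rates are inputs. The telephony figure is a hypothetical placeholder,
# not a quoted Twilio price; replace it with your actual rate.
SESSION_RATE_PER_HOUR = 4.50        # Voice Agent API session rate quoted above
TELEPHONY_RATE_PER_MINUTE = 0.014   # assumed outbound voice rate


def cost_per_call(duration_seconds):
    """Session time plus telephony for one connected call, in dollars."""
    session = SESSION_RATE_PER_HOUR * duration_seconds / 3600
    telephony = TELEPHONY_RATE_PER_MINUTE * duration_seconds / 60
    return session + telephony


def cost_per_conversation(duration_seconds, connection_rate):
    """Upper bound: treats every dial attempt as costing a full-length call."""
    return cost_per_call(duration_seconds) / connection_rate


if __name__ == "__main__":
    print(f"per 90 s call: ${cost_per_call(90):.3f}")
    print(f"per conversation at 50%: ${cost_per_conversation(90, 0.5):.2f}")
```

&lt;p&gt;With these inputs a 90-second call comes to about $0.13, and at a 50% connection rate the upper bound per conversation is about $0.27. Swap in your own telephony rate and call lengths before relying on the numbers.&lt;/p&gt;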

&lt;h3&gt;
  
  
  What speech-to-text accuracy do I need for cold-calling?
&lt;/h3&gt;

&lt;p&gt;The accuracy that matters for cold-calling is &lt;strong&gt;alphanumeric accuracy on phone audio&lt;/strong&gt; — capturing emails, phone numbers, company names, and job titles correctly the first time. Universal-3 Pro Streaming, which is the STT layer under the Voice Agent API, delivers 21% fewer alphanumeric errors and 28% better accuracy on consecutive numbers than the previous generation. That accuracy is the difference between booking a meeting in the rep's calendar (&lt;a href="mailto:alex@acme.io"&gt;alex@acme.io&lt;/a&gt;) and a typo your CRM never catches (&lt;a href="mailto:alec@akme.io"&gt;alec@akme.io&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Can the Voice Agent API place outbound calls directly?
&lt;/h3&gt;

&lt;p&gt;Today, you use Twilio (or another telephony provider) for the outbound dial, and bridge the resulting Media Stream into the Voice Agent API WebSocket. The Voice Agent API handles the conversation; Twilio handles the PSTN connection and the audio transport. Native outbound dialing through the Voice Agent API is on the roadmap — the bridge pattern in this tutorial is the standard path today, and the code in the companion repo handles it cleanly in about 100 lines.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>telephony</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Multi-language voice agents: Building agents that speak to anyone</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:25:44 +0000</pubDate>
      <link>https://dev.to/martschweiger/multi-language-voice-agents-building-agents-that-speak-to-anyone-40fk</link>
      <guid>https://dev.to/martschweiger/multi-language-voice-agents-building-agents-that-speak-to-anyone-40fk</guid>
      <description>&lt;p&gt;Building multilingual &lt;a href="https://www.assemblyai.com/blog/ai-voice-agents" rel="noopener noreferrer"&gt;voice agents&lt;/a&gt; requires coordinating four critical components—speech-to-text, language models, text-to-speech, and orchestration software—all working together within strict timing constraints to maintain natural conversation flow. The challenge isn't just connecting these pieces; each component must handle multiple languages, accents, and real-time language switching while keeping responses under one second.&lt;/p&gt;

&lt;p&gt;This guide walks you through the &lt;a href="https://www.assemblyai.com/blog/the-voice-ai-stack-for-building-agents" rel="noopener noreferrer"&gt;technical architecture&lt;/a&gt;, performance requirements, and implementation considerations for production multilingual voice agents. You'll learn how to handle automatic language detection, manage code-switching scenarios where users mix languages mid-sentence, and build systems that maintain conversation context across language transitions—essential knowledge for creating voice experiences that truly work for global audiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the core components of a multilingual voice agent?
&lt;/h2&gt;

&lt;p&gt;A multilingual voice agent is an AI system that listens to speech in multiple languages, understands what you're saying, and responds back in natural conversation. This means it can handle a customer service call where someone starts speaking Spanish, switches to English for technical terms, then back to Spanish—all in real-time.&lt;/p&gt;

&lt;p&gt;You need four components working together: speech-to-text converts your voice to text, language models understand and generate responses, text-to-speech converts responses back to speech, and orchestration software coordinates everything within milliseconds.&lt;/p&gt;

&lt;p&gt;The challenge isn't just connecting these pieces. Each component must handle multiple languages while keeping the conversation feeling natural and fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speech-to-text for multilingual support
&lt;/h3&gt;

&lt;p&gt;Speech-to-text (STT) is the foundation that converts spoken words into text that AI models can understand. This means turning "¿Puedes ayudarme?" into text that the system can process, regardless of accent or speaking speed.&lt;/p&gt;

&lt;p&gt;You have two main processing options: &lt;a href="https://www.assemblyai.com/blog/introducing-multilingual-universal-streaming" rel="noopener noreferrer"&gt;streaming transcription&lt;/a&gt; that processes speech as you speak, and batch processing that waits for the complete recording. Voice agents need streaming transcription because users expect a response almost as soon as they finish talking.&lt;/p&gt;

&lt;p&gt;Here's what makes multilingual STT challenging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language detection:&lt;/strong&gt; The system must identify which language you're speaking within seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accent handling:&lt;/strong&gt; Spanish from Mexico sounds different from Spanish from Argentina&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-switching:&lt;/strong&gt; When you mix languages mid-sentence like "Can you check mi cuenta"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your speech-to-text transcribes "schedule appointment" as "cancel appointment," even perfect AI models downstream can't fix that error.&lt;/p&gt;

&lt;h3&gt;
  
  
  Language models and multilingual reasoning
&lt;/h3&gt;

&lt;p&gt;Language models take the transcribed text and figure out what you actually want, then generate appropriate responses. Large Language Models (LLMs) handle multiple languages through two approaches: translating everything to one language internally, or processing multiple languages directly.&lt;/p&gt;

&lt;p&gt;Direct multilingual processing works better because it keeps cultural context intact. "How can I help you?" and "¿En qué puedo ayudarle?" aren't just translations—they carry different levels of formality that matter for customer experience.&lt;/p&gt;

&lt;p&gt;Your language model also needs to remember context when you switch languages. If you start in Spanish, use English technical terms, then return to Spanish, the model must follow along without losing track of what you're trying to accomplish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Text-to-speech synthesis across languages
&lt;/h3&gt;

&lt;p&gt;Text-to-speech (TTS) turns the AI's written response back into natural speech. This isn't just pronunciation—it's matching the rhythm, emotion, and cultural tone appropriate for each language.&lt;/p&gt;

&lt;p&gt;Modern TTS systems offer multiple voice options per language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Demographics:&lt;/strong&gt; Different ages, genders, and speaking styles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional accents:&lt;/strong&gt; British vs American English, European vs Latin American Spanish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tone matching:&lt;/strong&gt; Professional for banking, casual for shopping, empathetic for support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some languages create unique challenges. Mandarin uses pitch to change word meaning, while Arabic connects words in complex ways that affect pronunciation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-time orchestration and coordination
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.assemblyai.com/blog/orchestration-tools-ai-voice-agents" rel="noopener noreferrer"&gt;Orchestration software&lt;/a&gt; acts like air traffic control for your voice agent. This means managing timing between components, handling interruptions when users start speaking again, and keeping conversation state—all while staying under one second response time.&lt;/p&gt;

&lt;p&gt;Think of orchestration as the conductor making sure your voice agent doesn't talk over users, doesn't lose context, and recovers gracefully from errors.&lt;/p&gt;

&lt;p&gt;Key responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline management:&lt;/strong&gt; Moving data smoothly between STT, LLM, and TTS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interruption handling:&lt;/strong&gt; Stopping playback when users interrupt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State tracking:&lt;/strong&gt; Remembering conversation history and language preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error recovery:&lt;/strong&gt; Handling network issues without breaking the conversation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are the performance requirements for multilingual voice agents?
&lt;/h2&gt;

&lt;p&gt;Users expect voice agents to respond within one second of finishing their sentence. Anything longer makes conversations feel awkward and unnatural.&lt;/p&gt;

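&lt;p&gt;Here's where that crucial second typically gets spent. The upper ends of these ranges add up to 1,400ms, so you can't afford the worst case at every stage:&lt;/p&gt;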

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Time used&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speech-to-text&lt;/td&gt;
&lt;td&gt;200–400ms&lt;/td&gt;
&lt;td&gt;Converting your speech to text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM processing&lt;/td&gt;
&lt;td&gt;100–300ms&lt;/td&gt;
&lt;td&gt;Understanding and generating response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-speech&lt;/td&gt;
&lt;td&gt;300–600ms&lt;/td&gt;
&lt;td&gt;Converting response to speech&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network overhead&lt;/td&gt;
&lt;td&gt;50–100ms&lt;/td&gt;
&lt;td&gt;Data moving between systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total target&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Under 1000ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Must stay under one second&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Multilingual support makes these targets harder to hit. Language detection adds time, some languages process slower than others, and translation (when needed) creates additional delays.&lt;/p&gt;
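
&lt;p&gt;One way to keep the budget honest is to sum it explicitly. The sketch below copies the stage ranges from the table and compares a set of measured latencies against the 1000ms target. The function names and the sample measurements are illustrative assumptions, not output from any particular provider.&lt;/p&gt;

```python
# Stage ranges in milliseconds, copied from the table above.
STAGE_RANGES_MS = {
    "speech_to_text": (200, 400),
    "llm": (100, 300),
    "text_to_speech": (300, 600),
    "network": (50, 100),
}
BUDGET_MS = 1000


def total_range(stages):
    """Return (best_case, worst_case) total latency in ms."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


def headroom(measured_ms, budget_ms=BUDGET_MS):
    """Budget left after summing per-stage latencies (negative means over budget)."""
    return budget_ms - sum(measured_ms.values())


if __name__ == "__main__":
    print(total_range(STAGE_RANGES_MS))  # (650, 1400)
    # Hypothetical measurements for one turn.
    sample = {"speech_to_text": 300, "llm": 200, "text_to_speech": 400, "network": 75}
    print(headroom(sample))  # 25
```

&lt;p&gt;The best case leaves 350ms of slack, while the worst case overshoots by 400ms, so any time added by language detection or translation has to be recovered from another stage.&lt;/p&gt;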

&lt;h3&gt;
  
  
  Latency requirements for conversational quality
&lt;/h3&gt;

&lt;p&gt;The one-second rule comes from natural human conversation patterns. People typically pause 200–500ms before responding, so a voice agent responding in 800ms feels natural while 1500ms creates awkward silence.&lt;/p&gt;

&lt;p&gt;But perceived speed matters more than actual speed. If your agent starts responding quickly—even with "Let me check that for you"—users perceive faster service than an agent that stays silent for 800ms then gives a complete answer.&lt;/p&gt;

&lt;p&gt;Streaming helps here. Instead of waiting for complete responses, you can start speaking as soon as the first few words are ready. This cuts perceived latency by 30–40% while keeping the same actual processing time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accuracy requirements across languages and accents
&lt;/h3&gt;

&lt;p&gt;You need at least 90% word accuracy across all supported languages for reliable voice agents. The challenge? That 90% must work for English speakers from Boston, Spanish speakers from Mexico, and Mandarin speakers from Beijing—not just clear, neutral accents.&lt;/p&gt;

&lt;p&gt;Errors compound through your pipeline. If speech-to-text achieves 85% accuracy and your language model correctly interprets 90% of that text, you're down to about 76.5% end-to-end accuracy (0.85 × 0.90 = 0.765). At that rate, nearly one in four requests goes wrong somewhere in the pipeline.&lt;/p&gt;
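
&lt;p&gt;The compounding is simple multiplication, which the sketch below makes explicit. It assumes errors at each stage are independent, which is a simplification, and both function names are illustrative.&lt;/p&gt;

```python
import math


def end_to_end_accuracy(stage_accuracies):
    """Multiply per-stage accuracies, assuming stage errors are independent."""
    return math.prod(stage_accuracies)


def required_stt_accuracy(target, downstream_accuracy):
    """STT accuracy needed to reach `target` given the downstream stage's accuracy."""
    return target / downstream_accuracy


if __name__ == "__main__":
    print(round(end_to_end_accuracy([0.85, 0.90]), 3))   # 0.765
    print(round(required_stt_accuracy(0.85, 0.90), 3))   # 0.944
```

&lt;p&gt;Read the second result the other way around: to reach 85% end to end with a language model that interprets 90% of its input correctly, speech-to-text alone has to be above 94% accurate.&lt;/p&gt;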

&lt;p&gt;Critical accuracy areas include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Names and addresses:&lt;/strong&gt; Personal information must be captured exactly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numbers:&lt;/strong&gt; Account numbers, phone numbers, and dollar amounts can't have errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent preservation:&lt;/strong&gt; The core request must survive even if some words are wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;High-quality speech-to-text models like AssemblyAI's Universal-2 model support 99 languages with industry-leading accuracy, creating a reliable foundation when errors can't be tolerated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key implementation considerations
&lt;/h2&gt;

&lt;p&gt;Moving from prototype to production means solving practical challenges that don't show up in demos. These details often determine whether your voice agent delights users or frustrates them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Language detection and real-time switching
&lt;/h3&gt;

&lt;p&gt;Automatic language detection sounds straightforward—identify the language and proceed. Real conversations are messier. Users greet in one language then switch to another, use technical English terms while speaking Spanish, or have accents that confuse detection.&lt;/p&gt;

&lt;p&gt;Most successful systems use a hybrid approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial detection:&lt;/strong&gt; Identify language from the first 2–3 seconds of speech&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence scoring:&lt;/strong&gt; Avoid false switches when detection isn't certain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context clues:&lt;/strong&gt; Use user profiles or phone number regions as hints&lt;/li&gt;
&lt;/ul&gt;
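
&lt;p&gt;The three ideas in the list above can be combined into a small state tracker. This is a minimal sketch under stated assumptions: the LanguageTracker class, the 0.8 confidence threshold, and the two-confirmation rule are all hypothetical choices for illustration, and the detected language and confidence are assumed to come from whatever speech-to-text service you use.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class LanguageTracker:
    """Tracks the active conversation language and resists low-confidence switches."""
    active: str                     # starting language, e.g. from a profile or phone-region hint
    min_confidence: float = 0.8     # hypothetical threshold; tune on your own data
    confirmations_needed: int = 2   # consecutive detections required before switching
    _candidate: str = ""
    _streak: int = 0

    def observe(self, detected, confidence):
        """Feed one detection result; return the language to use for the next turn."""
        if detected == self.active or self.min_confidence > confidence:
            # Same language, or the detector is unsure: keep the current language.
            self._candidate, self._streak = "", 0
            return self.active
        if detected == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = detected, 1
        if self._streak >= self.confirmations_needed:
            self.active = detected
            self._candidate, self._streak = "", 0
        return self.active


if __name__ == "__main__":
    tracker = LanguageTracker(active="en")
    print(tracker.observe("es", 0.50))  # en (low confidence, ignored)
    print(tracker.observe("es", 0.90))  # en (first confirmation)
    print(tracker.observe("es", 0.95))  # es (second confirmation, switch)
```

&lt;p&gt;Requiring consecutive confirmations trades a slightly slower switch for fewer false ones. A single borrowed word, such as an English product name in a Spanish sentence, won't flip the whole conversation.&lt;/p&gt;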

&lt;p&gt;The trickiest scenario? &lt;a href="https://www.assemblyai.com/blog/real-time-transcription-code-switches-multilingual-speakers" rel="noopener noreferrer"&gt;Code-switching&lt;/a&gt; where users naturally mix languages mid-sentence. "Can you check mi cuenta, I think there's a problem" requires handling English and Spanish simultaneously without breaking conversation flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing multilingual voice agent accuracy
&lt;/h3&gt;

&lt;p&gt;Testing multilingual voice agents requires systematic validation across language combinations, not just individual languages. A system perfect in English and Spanish separately might fail when users switch between them.&lt;/p&gt;

&lt;p&gt;Start with single-language testing using native speakers with various accents and natural speaking styles. Record actual conversations, not scripted readings—natural speech includes hesitations, corrections, and informal phrases that scripts miss.&lt;/p&gt;

&lt;p&gt;Then test language transitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mixed conversations:&lt;/strong&gt; Spanish speakers using English product names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical explanations:&lt;/strong&gt; Users switching languages to explain complex issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cultural context:&lt;/strong&gt; Different communication styles across cultures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Essential testing scenarios include accent variations across regions, background noise from realistic environments, different speaking speeds, and code-switching patterns common in your user base.&lt;/p&gt;
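
&lt;p&gt;To compare results across these scenarios you need a consistent metric, and word error rate (WER) is the usual one. The sketch below computes it as the word-level edit distance divided by the reference length. It is a bare-bones version that lowercases and splits on whitespace, so a production evaluation would likely need more careful text normalization per language.&lt;/p&gt;

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions) over reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(substitution, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Code-switched reference vs. a transcript that misheard the Spanish words.
    print(word_error_rate("can you check mi cuenta", "can you check me quenta"))  # 0.4
```

&lt;p&gt;Tracking WER separately for single-language and code-switched test sets makes regressions in language switching visible instead of letting them average out.&lt;/p&gt;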

&lt;h2&gt;
  
  
  Common use cases for multilingual voice agents
&lt;/h2&gt;

&lt;p&gt;Multilingual voice agents excel where businesses need to serve diverse populations efficiently. Here are three high-impact applications you're likely to encounter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer support automation
&lt;/h3&gt;

&lt;p&gt;Customer support represents the biggest deployment of multilingual voice agents today. These systems handle routine requests—password resets, balance checks, order tracking—in dozens of languages without requiring multilingual human agents for every shift.&lt;/p&gt;

&lt;p&gt;Success depends on seamless escalation to humans. When the voice agent can't resolve your issue, it must transfer you to a human agent while preserving conversation context and language preference. Nobody wants to repeat their problem in a different language.&lt;/p&gt;

&lt;p&gt;Integration with existing systems matters here. The voice agent needs access to the customer's account information and the ability to update records in real time. This means a Spanish-speaking customer can check order status, update delivery addresses, and receive confirmation without waiting for a Spanish-speaking human agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice assistants for global applications
&lt;/h3&gt;

&lt;p&gt;Consumer apps use multilingual voice assistants to reach global markets. Think banking apps that let you check balances, transfer money, or report lost cards through voice commands in your preferred language.&lt;/p&gt;

&lt;p&gt;These applications need cultural adaptation beyond translation. A voice assistant in Japan should understand indirect communication styles, while one in New York can be more direct. The same request gets phrased completely differently based on cultural expectations.&lt;/p&gt;

&lt;p&gt;Privacy becomes critical with sensitive financial or personal information. Your voice agent must handle this data across different regulatory environments while maintaining consistent service quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contact center automation
&lt;/h3&gt;

&lt;p&gt;Enterprise &lt;a href="https://www.assemblyai.com/solutions/contact-centers" rel="noopener noreferrer"&gt;contact centers&lt;/a&gt; deploy multilingual voice agents to handle peak call volumes and provide 24/7 coverage. Instead of staffing overnight shifts with multilingual agents, you deploy voice agents that handle routine calls in any supported language.&lt;/p&gt;

&lt;p&gt;The business case is clear: one multilingual voice agent replaces dozens of language-specific phone menu systems while providing better service. Callers get natural conversation instead of pressing buttons through complex menus.&lt;/p&gt;

&lt;p&gt;Compliance considerations vary by industry and caller location. Your voice agent must adapt its behavior for call recording requirements, data retention rules, and disclosure obligations based on applicable regulations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;Building reliable multilingual voice agents requires coordinating speech-to-text, language models, text-to-speech, and orchestration—all working within tight timing constraints that keep conversations natural. Your foundation starts with accurate speech recognition, because transcription errors cascade through every step, turning helpful interactions into frustrated customers.&lt;/p&gt;

&lt;p&gt;The implementation challenges we've covered show why thoughtful architecture matters more than raw technology. With accurate transcription as your starting point, you can build voice agents that truly communicate with anyone, anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What components do I need to build a multilingual voice agent?
&lt;/h3&gt;

&lt;p&gt;You need four integrated components: speech-to-text for converting speech to text, language models for understanding and generating responses, text-to-speech for voice synthesis, and orchestration software to coordinate everything in real-time within one second.&lt;/p&gt;

&lt;h3&gt;
  
  
  How quickly do multilingual voice agents need to respond?
&lt;/h3&gt;

&lt;p&gt;Target under 1000ms end-to-end latency for natural conversation flow. A typical budget is 200–400ms for speech-to-text, 100–300ms for language model processing, 300–600ms for text-to-speech synthesis, and 50–100ms of network overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can voice agents detect language automatically during conversations?
&lt;/h3&gt;

&lt;p&gt;Yes, modern speech-to-text models detect language within the first 2–3 seconds of speech and can handle language switches mid-conversation. The system maintains conversation context across language changes without requiring users to specify their language preference.&lt;/p&gt;

&lt;h3&gt;
  
  
  What speech accuracy do I need for multilingual voice agents?
&lt;/h3&gt;

&lt;p&gt;Aim for at least 90% word accuracy across all supported languages and accents. Lower accuracy causes errors to compound through the pipeline, reducing end-to-end reliability below acceptable thresholds for production deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test multilingual voice agent performance before launch?
&lt;/h3&gt;

&lt;p&gt;Test systematically with native speakers across regional accents, speaking speeds, and background noise conditions. Validate both single-language accuracy and language-switching scenarios, measuring word error rates, intent recognition, and task completion rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  What infrastructure supports multilingual voice agents at scale?
&lt;/h3&gt;

&lt;p&gt;You need &lt;a href="https://www.assemblyai.com/blog/choosing-a-stt-api-for-voice-agents" rel="noopener noreferrer"&gt;streaming speech-to-text APIs&lt;/a&gt;, multilingual language model services, text-to-speech capabilities, and orchestration platforms that handle concurrent conversations. The infrastructure must scale horizontally without degrading response times.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do multilingual voice agents handle mixed-language conversations?
&lt;/h3&gt;

&lt;p&gt;Advanced speech-to-text models can transcribe code-switching where speakers mix languages mid-sentence. Success depends on training data that includes natural bilingual speech patterns and systems designed to maintain context across language transitions.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>i18n</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Build a voice agent for telehealth triage</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 19 May 2026 19:25:34 +0000</pubDate>
      <link>https://dev.to/martschweiger/build-a-voice-agent-for-telehealth-triage-33j</link>
      <guid>https://dev.to/martschweiger/build-a-voice-agent-for-telehealth-triage-33j</guid>
      <description>&lt;h1&gt;
  
  
  Build a voice agent for telehealth triage
&lt;/h1&gt;

&lt;p&gt;A telehealth triage voice agent answers a patient's call, captures symptoms in their own words, scores severity against a defined protocol, and routes the patient to the right care level: emergency, urgent care, scheduled visit, or self-care guidance. It doesn't diagnose, doesn't prescribe, and doesn't decide; it triages, in the same way an experienced nurse on a phone line would, then hands off with structured notes attached.&lt;/p&gt;

&lt;p&gt;This tutorial walks through building one on the AssemblyAI Voice Agent API with a clinical-specialty prompt and the architectural controls HIPAA requires — encrypted audio, BAA-backed deployment, PII redaction, and audit logging. We'll cover the triage protocol, symptom capture, severity scoring with tool calls, and the handoff that gets the patient to the right next step. The companion repository is linked at the end.&lt;/p&gt;

&lt;p&gt;This is a triage agent, not a clinical decision-maker. Everything in this guide assumes a human clinician makes the final call — the voice agent's job is to capture the data, run the protocol, and route the patient.&lt;/p&gt;

&lt;h2&gt;
  
  
  What telehealth triage looks like as a voice agent
&lt;/h2&gt;

&lt;p&gt;A triage call follows a predictable structure. The agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Greets the patient and confirms identity (name, date of birth)&lt;/li&gt;
&lt;li&gt;Asks for the chief complaint in the patient's own words&lt;/li&gt;
&lt;li&gt;Walks through a symptom protocol (when did it start, severity, associated symptoms)&lt;/li&gt;
&lt;li&gt;Captures red-flag symptoms that escalate severity&lt;/li&gt;
&lt;li&gt;Calls a score_severity tool that runs the captured symptoms through a triage algorithm&lt;/li&gt;
&lt;li&gt;Routes the patient — ER (911), urgent care, scheduled visit, or self-care&lt;/li&gt;
&lt;li&gt;Logs structured notes to the EHR for the receiving clinician&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pattern works for telehealth voice agents because it has a defined protocol, concrete success criteria (was the patient routed correctly?), and a clear failure mode (escalate to a human nurse if anything is unclear). It's not asking the voice agent to diagnose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use the Voice Agent API for telehealth triage
&lt;/h2&gt;

&lt;p&gt;Three properties matter specifically for healthcare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speech accuracy on medical terminology.&lt;/strong&gt; Patients say "metoprolol" and "lisinopril" and "I have a history of A-fib." A model that mishears any of these creates a downstream safety issue. &lt;a href="https://www.assemblyai.com/universal-3-pro-streaming" rel="noopener noreferrer"&gt;Universal-3 Pro Streaming&lt;/a&gt;, the STT layer under the &lt;a href="https://www.assemblyai.com/products/voice-agent-api" rel="noopener noreferrer"&gt;Voice Agent API&lt;/a&gt;, performs strongly on medical conversations; for post-call note generation and billing-grade documentation, AssemblyAI's &lt;a href="https://www.assemblyai.com/docs/pre-recorded-audio/medical-mode" rel="noopener noreferrer"&gt;Medical Mode&lt;/a&gt; async API is purpose-built for clinical terminology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BAA-backed deployment for processing PHI.&lt;/strong&gt; AssemblyAI enables covered entities and their business associates subject to HIPAA to use AssemblyAI services to process protected health information (PHI), and offers a Business Associate Addendum (BAA) required under HIPAA. Without a BAA you legally cannot route PHI through the service, regardless of how good the model is. &lt;a href="https://www.assemblyai.com/contact/sales" rel="noopener noreferrer"&gt;Contact our sales team&lt;/a&gt; to execute a BAA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling for protocolized triage.&lt;/strong&gt; The triage protocol lives in tool calls — score_severity, route_to_care_level, schedule_callback, escalate_to_nurse. The agent calls tools rather than generating free-form clinical guidance, which is what keeps the system inside the bounds of triage and out of the bounds of diagnosis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Patient call (PSTN via Twilio, or telehealth app)
        │
        ▼
  Voice Agent API (one WebSocket)
   ┌────────────────────────────────────┐
   │  Universal-3 Pro Streaming (STT)    │
   │     ↓                               │
   │  LLM with triage protocol           │
   │     ↓                               │
   │  TTS                                │
   └────────────────────────────────────┘
        │
        │  tool calls
        ▼
   Tool dispatcher
    - capture_symptom         (structured)
    - score_severity          (runs triage algorithm)
    - route_to_care_level     (ER / urgent / scheduled / self-care)
    - escalate_to_nurse       (live RN handoff)
    - log_to_ehr              (encrypted PHI write)

  (post-call)
        │
        ▼
   Async Medical Mode API
   - billing-grade SOAP note
   - ICD-10 candidate codes
   - quality review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Voice Agent API runs the patient-facing conversation. The protocol logic lives in your tools. Post-call documentation goes through the async Medical Mode API for clinical-quality notes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before you start
&lt;/h2&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;a href="https://www.assemblyai.com/dashboard/signup" rel="noopener noreferrer"&gt;AssemblyAI account&lt;/a&gt; — for healthcare deployments, &lt;a href="https://www.assemblyai.com/contact/sales" rel="noopener noreferrer"&gt;contact our sales team&lt;/a&gt; to execute a BAA before processing any PHI&lt;/li&gt;
&lt;li&gt;A defined triage protocol from your clinical team. This guide uses a simplified version for illustration; your real protocol should come from licensed clinicians and be reviewed against ESI (Emergency Severity Index) or your organization's equivalent&lt;/li&gt;
&lt;li&gt;An EHR integration target (Epic, Cerner, athena, custom)&lt;/li&gt;
&lt;li&gt;A licensed RN available for live escalations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Don't deploy a telehealth triage agent into production without (1) a BAA executed with AssemblyAI, (2) clinical review of every prompt and tool, (3) an always-available escalation path to a human nurse, and (4) IRB or compliance review per your organization's policies. The agent in this tutorial is a working starter — not a production-ready clinical system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Define the triage protocol in the system prompt
&lt;/h2&gt;

&lt;p&gt;The system prompt is where the protocol lives. Its critical rules are what separate a triage agent from a chatbot:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SYSTEM_PROMPT = """You are an AI telehealth triage assistant for ACME Health.

You are NOT a doctor. You do NOT diagnose. You do NOT prescribe. Your job is
to capture symptoms, run a triage protocol, and route the patient to the
right care level. A licensed clinician makes the final decision.

CALL FLOW:
1. Greet the patient. Confirm name and date of birth.
2. Ask the chief complaint in their own words. Capture it verbatim using
   capture_symptom(category='chief_complaint', detail=...).
3. Walk through the OPQRST protocol:
   - Onset (when did it start?)
   - Provocation/Palliation (what makes it worse or better?)
   - Quality (sharp, dull, throbbing?)
   - Region/Radiation (where, does it spread?)
   - Severity (1–10)
   - Timing (constant, intermittent?)
   Call capture_symptom for each.
4. Screen for red flags relevant to the complaint:
   - Chest pain / shortness of breath / arm pain → cardiac red flags
   - Severe headache / vision changes / weakness → stroke red flags
   - High fever / stiff neck → meningitis red flags
   - Severe abdominal pain / blood → surgical red flags
   - Suicidal ideation → mental health red flags
   If ANY red flag is present, call escalate_to_nurse IMMEDIATELY and
   say: "These symptoms need immediate attention. I'm connecting you to
   our on-call nurse right now."
5. Call score_severity with all captured symptoms.
6. Based on the result, call route_to_care_level with the recommendation.

CRITICAL RULES:
- Never tell the patient what they have. Use "your symptoms suggest..." not
  "you have...".
- Never recommend medication or dosage changes.
- If the patient asks medical questions outside triage, say:
  "I can't answer that. Let me connect you with our nurse line."
  and call escalate_to_nurse.
- If you're uncertain at any point, escalate.

STYLE:
- Speak calmly. One or two sentences per turn.
- Use plain language, not medical jargon. "Pressure in your chest" not
  "thoracic discomfort".
- Confirm critical details back: "You said the pain started Tuesday — is
  that right?"
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The escalate-on-uncertainty rule is the most important. A triage agent that confidently routes a heart attack to "schedule a visit" is dangerous. One that escalates to a human nurse the moment red flags appear is safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Define the tools
&lt;/h2&gt;

&lt;p&gt;Each tool needs "type": "function" at the top level — the Voice Agent API validates this on session.update.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOLS = [
    {
        "type": "function",
        "name": "capture_symptom",
        "description": "Record a symptom or piece of OPQRST data.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["chief_complaint", "onset", "provocation",
                             "quality", "region", "severity",
                             "timing", "red_flag"],
                },
                "detail": {"type": "string"},
            },
            "required": ["category", "detail"],
        },
    },
    {
        "type": "function",
        "name": "score_severity",
        "description": (
            "Score the patient's severity based on captured symptoms. "
            "Returns an ESI-style level (1=critical, 5=non-urgent)."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "symptoms": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["symptoms"],
        },
    },
    {
        "type": "function",
        "name": "route_to_care_level",
        "description": "Route the patient to the appropriate care level.",
        "parameters": {
            "type": "object",
            "properties": {
                "level": {
                    "type": "string",
                    "enum": ["emergency", "urgent_care", "scheduled_visit",
                             "self_care"],
                },
                "reason": {"type": "string"},
            },
            "required": ["level", "reason"],
        },
    },
    {
        "type": "function",
        "name": "escalate_to_nurse",
        "description": (
            "Connect the patient to a live registered nurse immediately. "
            "Call this for any red-flag symptom or any time the protocol "
            "is unclear."
        ),
        "parameters": {
            "type": "object",
            "properties": {"reason": {"type": "string"}},
            "required": ["reason"],
        },
    },
    {
        "type": "function",
        "name": "log_to_ehr",
        "description": "Write structured triage notes to the EHR.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_id": {"type": "string"},
                "symptoms": {"type": "object"},
                "severity": {"type": "integer"},
                "disposition": {"type": "string"},
            },
            "required": ["patient_id", "symptoms", "severity", "disposition"],
        },
    },
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The score_severity tool is where your clinical algorithm lives. In the repo, it's a simple rule-based scorer for demonstration; in production, this is the function your clinical team reviews and signs off on.&lt;/p&gt;
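&lt;p&gt;One way to connect these definitions to your backend is a small dispatch table keyed by tool name. The sketch below is a minimal illustration, not the repo's implementation: the handler functions, the dispatch_tool_call helper, the in-memory session dict, and the assumption that arguments arrive as a JSON string are all hypothetical, so adapt them to the tool-call events your session actually receives.&lt;/p&gt;

```python
import json

# Hypothetical in-memory session state; a real agent would persist this per call.
session = {"symptoms": [], "escalated": False}

# Tools that may still run once the call has been handed to a nurse.
ALLOWED_AFTER_ESCALATION = {"escalate_to_nurse", "log_to_ehr"}

def handle_capture_symptom(args):
    # Store the symptom under its OPQRST category for later scoring and charting.
    session["symptoms"].append({"category": args["category"], "detail": args["detail"]})
    return {"status": "recorded", "count": len(session["symptoms"])}

def handle_escalate_to_nurse(args):
    # Placeholder: a real handler would transfer the call to the nurse line.
    session["escalated"] = True
    return {"status": "escalated", "reason": args["reason"]}

# Map tool names from TOOLS to Python callables (only two are shown here).
HANDLERS = {
    "capture_symptom": handle_capture_symptom,
    "escalate_to_nurse": handle_escalate_to_nurse,
}

def dispatch_tool_call(name, arguments_json):
    """Run the handler for a tool call and return a JSON-serializable result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return {"error": "arguments were not valid JSON"}
    # After an escalation, refuse further triage tools so the agent hands off.
    if session["escalated"] and name not in ALLOWED_AFTER_ESCALATION:
        return {"error": "call already escalated to nurse"}
    try:
        return handler(args)
    except KeyError as missing:
        return {"error": f"missing required argument: {missing.args[0]}"}
```

&lt;p&gt;Refusing triage tools after an escalation enforces the hand-off in code rather than relying on the prompt alone.&lt;/p&gt;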

&lt;h2&gt;
  
  
  Step 3: Severity scoring logic
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RED_FLAG_KEYWORDS = {
    "cardiac": ["chest pain", "pressure", "tight", "shortness of breath",
                "arm pain", "jaw pain", "sweating"],
    "stroke":  ["face drooping", "weakness", "slurred speech", "vision",
                "confusion"],
    "surgical":["severe abdominal", "blood in stool", "vomiting blood",
                "rigid abdomen"],
    "sepsis":  ["high fever", "stiff neck", "altered mental"],
    "mental":  ["suicidal", "self-harm", "kill myself"],
}

def score_severity(symptoms):
    # Illustrative substring matching over all captured symptoms, lowercased.
    text = " ".join(s.lower() for s in symptoms)
    # Any red-flag keyword short-circuits to ESI-style level 1.
    for category, keywords in RED_FLAG_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return {"level": 1, "category": category, "route": "emergency"}
    # Otherwise check from most to least severe; the first match wins.
    if any(kw in text for kw in ["severe pain", "9/10", "10/10", "can't breathe"]):
        return {"level": 2, "route": "emergency"}
    if any(kw in text for kw in ["moderate pain", "7/10", "8/10", "fever 101", "fever 102"]):
        return {"level": 3, "route": "urgent_care"}
    if any(kw in text for kw in ["mild pain", "5/10", "6/10"]):
        return {"level": 4, "route": "scheduled_visit"}
    # Nothing matched: default to the least urgent level.
    return {"level": 5, "route": "self_care"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is illustrative only. Real telehealth triage uses validated scoring (ESI, AMTS, organization-specific protocols) developed and reviewed by clinical staff. Don't ship anything to production without that review.&lt;/p&gt;
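&lt;p&gt;The keyword rules above look for written forms such as "5/10", while a caller is more likely to say "5 out of 10", and the transcript may keep that spoken form. A small normalization pass before scoring can bridge the gap. The helpers below are a hypothetical sketch (normalize_pain_scale and normalize_symptoms are not part of the repo), and they only handle ratings on a ten-point scale.&lt;/p&gt;

```python
import re

# Matches spoken ratings such as "5 out of 10" or "a 7 out of ten".
_RATING = re.compile(r"\b(10|[0-9])\s+out\s+of\s+(?:10|ten)\b", re.IGNORECASE)

def normalize_pain_scale(text):
    """Rewrite 'N out of 10' as 'N/10' so keyword rules see one canonical form."""
    return _RATING.sub(lambda m: f"{m.group(1)}/10", text)

def normalize_symptoms(symptoms):
    # Apply the rewrite to every captured symptom before scoring.
    return [normalize_pain_scale(s) for s in symptoms]

print(normalize_symptoms(["I sprained my ankle yesterday, pain is 5 out of 10"]))
```

&lt;p&gt;Passing the normalized list to score_severity lets the sprained-ankle case match the "5/10" rule instead of falling through to self_care.&lt;/p&gt;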

&lt;h2&gt;
  
  
  Step 4: Audit logging and PHI controls
&lt;/h2&gt;

&lt;p&gt;Every transcript event from the Voice Agent API is PHI. Treat it as such:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encrypt at rest.&lt;/strong&gt; Use envelope encryption (KMS) for any persisted audio or transcripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypt in transit.&lt;/strong&gt; The Voice Agent API WebSocket is TLS — no additional work there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit log every access.&lt;/strong&gt; Who read which call, when, from where.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply PII redaction to anything that leaves your VPC.&lt;/strong&gt; Phone numbers, addresses, SSNs, and names should be redacted before transcripts hit analytics warehouses or training pipelines.&lt;/li&gt;
&lt;li&gt;
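&lt;strong&gt;Set retention policies.&lt;/strong&gt; Retention requirements vary by jurisdiction and record type; many healthcare orgs keep triage call transcripts for around 7 years, so confirm the period with your compliance team and configure your storage accordingly.&lt;/li&gt;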
&lt;/ul&gt;

&lt;p&gt;The Voice Agent API's events (transcript.user, transcript.agent, tool.call, tool.result) are exactly what you'd write to the EHR. Build the log_to_ehr tool to flush a structured record at the end of every call.&lt;/p&gt;
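&lt;p&gt;As a rough illustration of the last two points, the sketch below assembles one structured record per call and produces a redacted copy for anything that leaves your VPC. The event shape (a dict with "type" and "text" keys), the helper names, and the two regex patterns are assumptions made for this example; pattern-based redaction of this kind misses names and addresses, so treat it as a starting point rather than a complete control.&lt;/p&gt;

```python
import json
import re
from datetime import datetime, timezone

# Illustrative patterns only; they cover US-style SSNs and phone numbers.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

def redact(text):
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def build_triage_record(patient_id, events, severity, disposition):
    """Collect transcript and tool events into one record for log_to_ehr."""
    return {
        "patient_id": patient_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "disposition": disposition,
        "events": events,
    }

def _redact_event(event):
    # Only transcript-style events carry a "text" field in this sketch.
    if "text" in event:
        return {**event, "text": redact(event["text"])}
    return dict(event)

def export_for_analytics(record):
    """Return a JSON copy with the patient ID removed and event text redacted."""
    safe = {
        **record,
        "patient_id": "[REDACTED]",
        "events": [_redact_event(e) for e in record["events"]],
    }
    return json.dumps(safe)
```

&lt;p&gt;The full record goes to the EHR through log_to_ehr; only the output of export_for_analytics should reach warehouses or training pipelines.&lt;/p&gt;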

&lt;h2&gt;
  
  
  Step 5: Test against representative cases
&lt;/h2&gt;

&lt;p&gt;Before any patient calls the agent, run it against a clinical test suite:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Expected route&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"I have crushing chest pain and my left arm is numb"&lt;/td&gt;
&lt;td&gt;emergency (cardiac red flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I have a fever of 102 and a stiff neck"&lt;/td&gt;
&lt;td&gt;emergency (sepsis red flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I sprained my ankle yesterday, pain is 5 out of 10"&lt;/td&gt;
&lt;td&gt;urgent_care or scheduled_visit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I have a runny nose and slight cough for two days"&lt;/td&gt;
&lt;td&gt;self_care&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I'm having thoughts of hurting myself"&lt;/td&gt;
&lt;td&gt;escalate_to_nurse (mental health red flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Run at least 200 cases through the agent with clinician review of every disposition. The cost of a missed escalation is a clinical safety event; the cost of an over-escalation is overuse of the nurse line. Tune until both are within your organization's tolerance.&lt;/p&gt;
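&lt;p&gt;To track both error types across a test run, a small summary function is usually enough. The sketch below assumes you have already collected, for each case, the clinician's expected route and the route the agent produced; CaseResult, ESCALATION_ROUTES, and summarize are hypothetical names for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass

# Routes that count as an escalation for safety-metric purposes.
ESCALATION_ROUTES = {"emergency", "escalate_to_nurse"}

@dataclass
class CaseResult:
    utterance: str
    expected: str   # route the reviewing clinician expects
    actual: str     # route the agent produced

def summarize(results):
    """Count missed escalations and over-escalations across reviewed cases."""
    missed = [r for r in results
              if r.expected in ESCALATION_ROUTES and r.actual not in ESCALATION_ROUTES]
    over = [r for r in results
            if r.expected not in ESCALATION_ROUTES and r.actual in ESCALATION_ROUTES]
    total = len(results)
    return {
        "total": total,
        "missed_escalations": len(missed),
        "over_escalations": len(over),
        "missed_rate": len(missed) / total if total else 0.0,
        "over_rate": len(over) / total if total else 0.0,
    }
```

&lt;p&gt;Reporting the two rates separately keeps the safety-critical number (missed escalations) from being averaged away by the operational one.&lt;/p&gt;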

&lt;h2&gt;
  
  
  Step 6: Post-call documentation with Medical Mode
&lt;/h2&gt;

&lt;p&gt;After the call, run the captured audio through AssemblyAI's &lt;a href="https://www.assemblyai.com/docs/pre-recorded-audio/medical-mode" rel="noopener noreferrer"&gt;Medical Mode&lt;/a&gt; async API for billing-grade clinical documentation. Enable it with the domain="medical-v1" parameter on a standard pre-recorded transcript request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    domain="medical-v1",       # enables Medical Mode
    speaker_labels=True,        # provider/patient separation
    keyterms_prompt=["Lispro", "Humalog", "metoprolol"],
)
call_audio_url = "https://example.com/triage-call.wav"  # placeholder: URL of the recorded call
transcript = aai.Transcriber().transcribe(call_audio_url, config)
# Then send transcript.text through the LLM Gateway for SOAP generation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Medical Mode is purpose-built for medication names, procedures, conditions, and dosages. It's billed as a separate add-on (see &lt;a href="https://www.assemblyai.com/pricing" rel="noopener noreferrer"&gt;pricing&lt;/a&gt;). Combine it with &lt;a href="https://www.assemblyai.com/docs/guides/soap-note-generation" rel="noopener noreferrer"&gt;LLM Gateway SOAP generation&lt;/a&gt; to produce structured chart entries from the transcript.&lt;/p&gt;

&lt;h2&gt;
  
  
  The complete repository
&lt;/h2&gt;

&lt;p&gt;Fork the runnable repo at &lt;a href="https://github.com/kelsey-aai/telehealth-triage-voice-agent" rel="noopener noreferrer"&gt;github.com/kelsey-aai/telehealth-triage-voice-agent&lt;/a&gt;. It includes the triage agent loop, the OPQRST protocol prompt, the red-flag scorer, the routing logic, and a sample EHR adapter stub. Around 350 lines of Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I build a voice agent for telehealth triage?
&lt;/h3&gt;

&lt;p&gt;To build a voice agent for telehealth triage, open an AssemblyAI Voice Agent API session with a clinical-specialty system prompt that walks the patient through an OPQRST symptom protocol, screens for red flags, and routes via tool calls. The agent should never diagnose or prescribe — it captures symptoms with capture_symptom, scores severity with score_severity (your clinical algorithm), routes via route_to_care_level, and escalates to a live RN through escalate_to_nurse whenever red flags appear or the protocol is unclear. All of this runs inside one WebSocket at wss://agents.assemblyai.com/v1/ws, with audit logging, encrypted transcripts, and a BAA executed with AssemblyAI before any PHI is processed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use the Voice Agent API for healthcare workflows subject to HIPAA?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI is considered a business associate under HIPAA and offers a standard Business Associate Addendum (BAA) for customers processing PHI. Before processing any PHI you need to execute the BAA with AssemblyAI — &lt;a href="https://www.assemblyai.com/contact/sales" rel="noopener noreferrer"&gt;contact our sales team&lt;/a&gt;. The Voice Agent API uses TLS for transit, supports PII redaction, and provides per-session audit logs. Your application also needs its own architecture aligned to HIPAA — encryption at rest, role-based access controls, audit logging, retention policies — to meet your obligations end-to-end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can a telehealth voice agent diagnose patients?
&lt;/h3&gt;

&lt;p&gt;No. A telehealth triage voice agent should never diagnose, prescribe, or make clinical decisions. Its role is to capture symptoms, run a defined triage protocol developed by licensed clinicians, score severity, and route the patient to the appropriate care level: emergency, urgent care, scheduled visit, or self-care. A human clinician (nurse, physician, NP) makes the final clinical decision. The system prompt should explicitly forbid diagnostic statements: never "you have...", and "your symptoms suggest..." only when leading into a routing decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the Voice Agent API handle medical terminology?
&lt;/h3&gt;

&lt;p&gt;The STT layer under the Voice Agent API is Universal-3 Pro Streaming, which performs well on conversational medical terminology like medication names and common conditions. For billing-grade clinical documentation — SOAP notes, ICD-10 candidate coding, structured chart entries — AssemblyAI's separate &lt;a href="https://www.assemblyai.com/docs/async-stt/medical-mode" rel="noopener noreferrer"&gt;Medical Mode&lt;/a&gt; async API is purpose-built for clinical accuracy. Enable it with domain="medical-v1" on a pre-recorded transcript request. The common architecture is: real-time triage on the Voice Agent API, post-call documentation through Medical Mode async, both under the same BAA.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens when the agent encounters a red flag?
&lt;/h3&gt;

&lt;p&gt;When the agent detects a red flag — cardiac symptoms (chest pain, arm pain, shortness of breath), stroke symptoms (facial drooping, slurred speech, weakness), surgical symptoms (severe abdominal pain), sepsis indicators (high fever with stiff neck), or mental health emergencies (suicidal ideation) — it should immediately call escalate_to_nurse with the reason, tell the patient "These symptoms need immediate attention. I'm connecting you to our on-call nurse right now," and hand off the call along with the captured symptoms. Red-flag escalation must be automatic, not conditional. Never let the agent continue triaging after a red flag is captured.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between this and a healthcare scheduling voice agent?
&lt;/h3&gt;

&lt;p&gt;A healthcare scheduling voice agent books appointments, verifies insurance, and handles prescription refills — administrative tasks where the worst-case error is a rescheduled appointment. A telehealth triage voice agent captures clinical symptoms and routes to care levels — clinical tasks where the worst-case error is a missed cardiac event. The two have different risk profiles, different prompts, different tools, and different review processes. A team building both should keep them as separate agents with separate audit trails. Our &lt;a href="https://www.assemblyai.com/blog/voice-agents-healthcare" rel="noopener noreferrer"&gt;healthcare voice agents guide&lt;/a&gt; covers the scheduling/administrative side.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>ai</category>
      <category>healthcare</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to add automatic LLM fallbacks to your voice pipeline</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 12 May 2026 18:01:08 +0000</pubDate>
      <link>https://dev.to/martschweiger/how-to-add-automatic-llm-fallbacks-to-your-voice-pipeline-4cn0</link>
      <guid>https://dev.to/martschweiger/how-to-add-automatic-llm-fallbacks-to-your-voice-pipeline-4cn0</guid>
      <description>&lt;p&gt;Your voice agent is mid-conversation when Anthropic's API returns a 529 overloaded error. The user is waiting. Your code throws. The call drops.&lt;/p&gt;

&lt;p&gt;This is the failure mode most voice pipelines aren't built for, and it's getting worse, not better. As more applications standardize on a single LLM provider, a regional outage at that provider stalls every downstream voice agent that depends on it. The fix isn't more retries on the same model; it's an automatic switch to a different one.&lt;/p&gt;

&lt;p&gt;This tutorial walks you through adding automatic LLM fallbacks to a voice pipeline using AssemblyAI's LLM Gateway. With one extra parameter in your request, the Gateway will automatically retry failed calls on a backup model (in this tutorial, Kimi to Claude to Gemini) without you writing a line of retry logic. By the end, you'll have a runnable Python pipeline that transcribes live audio with Universal-3 Pro Streaming, routes the transcript through a primary LLM with a fallback chain, and stays online even when a single provider goes down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why fallbacks matter more for voice than for chat
&lt;/h2&gt;

&lt;p&gt;In a chat app, an LLM error means a spinner and a retry button. In a Voice AI pipeline, it means dead air. The user is on the phone, waiting for a response, and a five-second silence while you reconnect to a different provider already feels like a hang-up.&lt;/p&gt;

&lt;p&gt;Three failure modes that fallbacks solve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provider rate limits.&lt;/strong&gt; OpenAI, Anthropic, and Google all enforce per-account TPM (tokens per minute) ceilings. A traffic spike on a Monday morning sales line can blow through your default tier before lunch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional outages.&lt;/strong&gt; Provider status pages show multi-hour incidents in a typical quarter. If your only LLM call is to a single model, your uptime is capped at theirs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model deprecations.&lt;/strong&gt; A model gets sunset on short notice. Without a fallback configured, every voice session that hits the deprecated model fails until you ship a code change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Gateway sits in front of every supported provider. You point your client at one endpoint, specify a primary model, and list one or two fallbacks. When the primary fails—overloaded, rate-limited, or unavailable—the Gateway transparently retries on the next model in line and returns the response as if nothing went wrong.&lt;/p&gt;
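&lt;p&gt;Conceptually, the Gateway runs something like the loop below on your behalf. This is a sketch of the general try-in-order pattern, not the Gateway's actual implementation; ProviderError, complete_with_fallbacks, fake_call, and the model names are hypothetical and exist only for illustration.&lt;/p&gt;

```python
class ProviderError(Exception):
    """Hypothetical error standing in for overloaded, rate-limited, or unavailable."""

def complete_with_fallbacks(models, call_model, prompt):
    """Try each model in order and return (model_used, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            # Remember why this model failed, then move to the next one.
            errors.append(f"{model}: {exc}")
    raise ProviderError("all models failed: " + "; ".join(errors))

def fake_call(model, prompt):
    # Simulate the primary being overloaded while the backup answers.
    if model == "primary-model":
        raise ProviderError("529 overloaded")
    return f"reply to {prompt!r}"

print(complete_with_fallbacks(["primary-model", "backup-model"], fake_call, "hello"))
```

&lt;p&gt;The caller sees one successful result and the name of the model that produced it, which is the same contract the Gateway exposes through the model field of its response.&lt;/p&gt;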

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Python voice pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Streams microphone audio to AssemblyAI's Universal-3 Pro streaming speech-to-text model&lt;/li&gt;
&lt;li&gt;On end-of-turn, sends the final transcript to LLM Gateway with kimi-k2.5 as the primary model and claude-sonnet-4-6 and gemini-2.5-flash as fallbacks&lt;/li&gt;
&lt;li&gt;Prints the agent's response—and logs which model actually handled the call&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You'll also see how to chain multiple fallbacks, override prompts per fallback model, and tune retry behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AssemblyAI Universal-3 Pro Streaming (speech-to-text)&lt;/li&gt;
&lt;li&gt;AssemblyAI LLM Gateway (LLM routing with fallbacks)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;assemblyai requests python-dotenv pyaudio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a .env file with your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You only need one key. The same key authenticates both the streaming STT WebSocket and the LLM Gateway endpoint—no separate accounts with OpenAI, Anthropic, or Google required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Connect to Universal-3 Pro Streaming
&lt;/h2&gt;

&lt;p&gt;For a voice agent, you want the lowest-latency path from speech to text, then immediately hand the transcript to the LLM. We'll use AssemblyAI's v3 streaming API, which returns immutable final transcripts in roughly 300ms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;assemblyai.streaming.v3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClientOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingParameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;BeginEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TurnEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TerminationEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_begin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeginEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session started: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TurnEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_of_turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;respond_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r&lt;/span&gt;&lt;span class="s"&gt;Partial: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STT error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_terminated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TerminationEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session terminated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The on_turn handler is where the LLM call happens. Every time the user finishes speaking, we hand the final transcript to respond_with_fallback—the function we're about to define.&lt;/p&gt;
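&lt;p&gt;If the streaming client invokes on_turn on the same thread that reads transcript events, a slow LLM request inside the handler could delay later events. One option, sketched below under that assumption, is to push final transcripts onto a queue and let a worker thread make the LLM call. The start_responder helper and the None stop signal are illustrative names, not part of the AssemblyAI SDK.&lt;/p&gt;

```python
import queue
import threading

def start_responder(respond, turns):
    """Start a daemon thread that calls respond(text) for each queued transcript."""
    def worker():
        while True:
            text = turns.get()
            if text is None:          # None is the stop signal
                turns.task_done()
                break
            try:
                respond(text)
            except Exception as exc:
                # Keep the worker alive if one LLM call fails.
                print(f"LLM call failed: {exc}")
            finally:
                turns.task_done()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

# Usage: on_turn would call turns.put(event.transcript) instead of the LLM.
turns = queue.Queue()
replies = []
worker_thread = start_responder(replies.append, turns)
turns.put("what are your hours?")
turns.put(None)
worker_thread.join()
```

&lt;p&gt;In the pipeline, respond would be respond_with_fallback, and the handler returns as soon as the transcript is queued.&lt;/p&gt;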

&lt;h2&gt;
  
  
  Step 2: Add the fallback chain
&lt;/h2&gt;

&lt;p&gt;Here's the part that matters. The request is a standard chat completions call with two extra fields, fallbacks and fallback_config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://llm-gateway.assemblyai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kimi-k2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Primary: fast, low-latency
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful voice assistant. Keep responses to one or two short sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fallbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# First fallback
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;# Second fallback
&lt;/span&gt;            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fallback_config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;      &lt;span class="c1"&gt;# Try up to two fallbacks
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All models failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;actual_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actual_model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details that matter for production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model field in the response reflects which model actually answered. If your primary failed and the Gateway used Claude instead, you'll see claude-sonnet-4-6 in the response—and you'll only be billed for that model.&lt;/li&gt;
&lt;li&gt;Without a fallbacks array, the Gateway still does one automatic retry on the primary after 500ms (default fallback_config.retry: true). That handles transient blips; the fallbacks array handles outright failures.&lt;/li&gt;
&lt;li&gt;fallback_config.depth controls how many fallbacks to try. Setting it to 2 means the Gateway will try the primary, then the first fallback, then the second.&lt;/li&gt;
&lt;/ul&gt;
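
&lt;p&gt;To make the first point concrete, here is a minimal sketch of how you might log whether a fallback answered. It assumes the Gateway echoes back the same model identifier string you sent in the request (worth verifying against a real response), and describe_fallback is a hypothetical helper, not part of any SDK:&lt;/p&gt;

```python
def describe_fallback(requested_model: str, result: dict) -> str:
    """Return a log line saying whether the primary or a fallback answered."""
    actual_model = result.get("model")
    if actual_model is None:
        return "model field missing from response"
    if actual_model == requested_model:
        return f"primary answered: {actual_model}"
    return f"fallback answered: {actual_model} (requested {requested_model})"


# Hypothetical response body, trimmed to the fields the tutorial code reads.
sample = {
    "model": "claude-sonnet-4-6",
    "choices": [{"message": {"content": "Your order ships tomorrow."}}],
}
print(describe_fallback("kimi-k2.5", sample))
```

&lt;p&gt;Emitting a line like this on every turn gives you a rough fallback rate from your own logs.&lt;/p&gt;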

&lt;h2&gt;
  
  
  Step 3: Choose the right primary and fallback models
&lt;/h2&gt;

&lt;p&gt;Latency and capability vary widely across providers. For voice, you want a fast primary because the user is waiting in real time, and a more reliable secondary in case the fast one is overloaded.&lt;/p&gt;

&lt;p&gt;Pulled from the LLM Gateway model list, here are sensible voice agent pairings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Primary&lt;/th&gt;
&lt;th&gt;Fallback 1&lt;/th&gt;
&lt;th&gt;Fallback 2&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Latency-critical (phone agent)&lt;/td&gt;
&lt;td&gt;kimi-k2.5 (~1.2s)&lt;/td&gt;
&lt;td&gt;gemini-2.5-flash-lite (~1.1s)&lt;/td&gt;
&lt;td&gt;gpt-5-nano (~3.2s)&lt;/td&gt;
&lt;td&gt;Fast primary and first fallback; three different providers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality-first (clinical, legal)&lt;/td&gt;
&lt;td&gt;claude-sonnet-4-6&lt;/td&gt;
&lt;td&gt;gemini-2.5-pro&lt;/td&gt;
&lt;td&gt;gpt-5.1&lt;/td&gt;
&lt;td&gt;Highest-quality models from each provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced (most consumer apps)&lt;/td&gt;
&lt;td&gt;gpt-5.2 (~1.6s)&lt;/td&gt;
&lt;td&gt;claude-haiku-4-5-20251001 (~4.1s)&lt;/td&gt;
&lt;td&gt;kimi-k2.5&lt;/td&gt;
&lt;td&gt;Speed + cross-provider redundancy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key constraint: your fallbacks should be on different providers from the primary. A Claude Sonnet to Claude Haiku fallback won't help during an Anthropic outage—both calls hit the same upstream.&lt;/p&gt;
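
&lt;p&gt;One way to enforce that constraint is a small check in your config validation. The sketch below guesses the provider from the model name prefix; the prefix table is an assumption based on the model names used in this post, and provider_of and same_provider_fallbacks are hypothetical helpers:&lt;/p&gt;

```python
# Assumed mapping from model-name prefix to upstream provider.
PROVIDER_PREFIXES = {
    "claude": "anthropic",
    "gpt": "openai",
    "gemini": "google",
    "kimi": "moonshot",
    "qwen": "alibaba",
}


def provider_of(model: str) -> str:
    """Guess the upstream provider from the model name prefix."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown provider for model {model!r}")


def same_provider_fallbacks(primary: str, fallbacks: list[str]) -> list[str]:
    """Return the fallbacks that share an upstream provider with the primary."""
    primary_provider = provider_of(primary)
    return [m for m in fallbacks if provider_of(m) == primary_provider]


# A Claude-to-Claude chain gets flagged; the cross-provider entry does not.
print(same_provider_fallbacks(
    "claude-sonnet-4-6",
    ["claude-haiku-4-5-20251001", "gpt-5.1"],
))
```

&lt;p&gt;An empty list means every fallback sits on a different provider from the primary.&lt;/p&gt;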

&lt;h2&gt;
  
  
  Step 4: Override fields per fallback
&lt;/h2&gt;

&lt;p&gt;Sometimes a fallback model needs a different prompt. Maybe your primary uses a 4,000-token system prompt that your cheaper fallback doesn't have the context window for. Or you want the fallback to be more concise to keep latency in check.&lt;/p&gt;

&lt;p&gt;LLM Gateway lets you override any request field per fallback:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fallbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Be very concise. One sentence max.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any field you don't override is inherited from the original request. This is especially useful when your primary is tuned with a long, detailed system prompt and you want a stripped-down version on the backup.&lt;/p&gt;
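
&lt;p&gt;The inheritance rule can be pictured as a merge of the original request with the fallback entry. The sketch below is only a mental model, not the Gateway's implementation: it assumes a shallow, top-level merge in which an overridden field (including messages) replaces the original wholesale. effective_request is a hypothetical helper, and the temperature and max_tokens values are made up for illustration:&lt;/p&gt;

```python
def effective_request(base: dict, override: dict) -> dict:
    """Approximate the request a fallback runs with: base fields plus overrides."""
    # Drop the fallback settings themselves; they describe routing, not the call.
    inherited = {
        key: value
        for key, value in base.items()
        if key not in ("fallbacks", "fallback_config")
    }
    # Shallow merge: an overridden field replaces the original value wholesale.
    return {**inherited, **override}


user_text = "Where is my order?"
override = {
    "model": "claude-sonnet-4-6",
    "messages": [
        {"role": "system", "content": "Be very concise. One sentence max."},
        {"role": "user", "content": user_text},
    ],
    "max_tokens": 80,
}
base = {
    "model": "kimi-k2.5",
    "messages": [
        {"role": "system", "content": "(long, detailed system prompt)"},
        {"role": "user", "content": user_text},
    ],
    "temperature": 0.3,  # not overridden, so the fallback inherits it
    "max_tokens": 300,
    "fallbacks": [override],
}

merged = effective_request(base, override)
print(merged["model"], merged["temperature"], merged["max_tokens"])
```

&lt;p&gt;Under these assumptions the fallback runs with its own model, prompt, and token limit while keeping the primary's temperature.&lt;/p&gt;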

&lt;h2&gt;
  
  
  Step 5: Putting it all together
&lt;/h2&gt;

&lt;p&gt;Wire the streaming client to your fallback-enabled response function:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
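&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assemblyai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;aai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;assemblyai.streaming.v3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClientOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingParameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;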

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;StreamingClientOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;api_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;streaming.assemblyai.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Begin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_begin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Termination&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_terminated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;StreamingParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;speech_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;u3-rt-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MicrophoneStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;terminate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it, speak into your microphone, and watch the printed model name. Then, if you want to see the fallback fire, set the primary model parameter to a deliberately invalid string like "this-model-does-not-exist". The primary request will fail, the Gateway will route to your first fallback, and you'll get a normal response with the fallback model name in the output.&lt;/p&gt;
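
&lt;p&gt;If you run this test often, a flag on the payload builder saves editing the model string by hand. This is a sketch only: build_payload and its force_fallback flag are hypothetical, and the model names are placeholders taken from examples in this post, so substitute your own chain:&lt;/p&gt;

```python
INVALID_MODEL = "this-model-does-not-exist"


def build_payload(user_text: str, primary: str, force_fallback: bool = False) -> dict:
    """Build a chat completions body, optionally with a primary that cannot succeed."""
    return {
        # Swapping in an invalid primary forces the Gateway onto the fallbacks.
        "model": INVALID_MODEL if force_fallback else primary,
        "messages": [{"role": "user", "content": user_text}],
        "fallbacks": [
            {"model": "claude-sonnet-4-6"},
            {"model": "gemini-2.5-flash"},
        ],
        "fallback_config": {"depth": 2},
    }


normal = build_payload("Where is my order?", primary="kimi-k2.5")
drill = build_payload("Where is my order?", primary="kimi-k2.5", force_fallback=True)
print(normal["model"], drill["model"])
```

&lt;p&gt;Keep the flag out of production configuration so a drill can never reach live calls.&lt;/p&gt;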

&lt;h2&gt;
  
  
  What this gets you in production
&lt;/h2&gt;

&lt;p&gt;Three things change in your voice pipeline as soon as fallbacks are in place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider outages stop being your incidents.&lt;/strong&gt; When Anthropic, OpenAI, or Google has a regional issue, your voice sessions keep flowing—they just route through whichever provider is healthy. You don't get paged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate-limit spikes self-heal.&lt;/strong&gt; A traffic spike that would have hit your tokens-per-minute (TPM) ceiling on the primary now overflows to your fallback providers automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model migrations are zero-downtime.&lt;/strong&gt; When a new model ships, you can flip the primary to the new model and keep the old one as a fallback. If anything goes wrong, traffic falls back automatically while you debug.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can layer more on top of this (separate fallback chains per use case, EU-resident endpoints for GDPR compliance, prompt caching to amortize cost), but a single fallbacks array gets you most of the resilience for a few extra lines of JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to build next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pair fallbacks with&lt;/strong&gt; &lt;a href="https://www.assemblyai.com/docs/llm-gateway/chat-completions" rel="noopener noreferrer"&gt;&lt;strong&gt;streaming chat completions&lt;/strong&gt;&lt;/a&gt; so the user hears the first sentence while the LLM is still generating the rest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add&lt;/strong&gt; &lt;a href="https://www.assemblyai.com/docs/llm-gateway/tool-calling" rel="noopener noreferrer"&gt;&lt;strong&gt;tool calling&lt;/strong&gt;&lt;/a&gt; to let your voice agent look up orders, schedule callbacks, or transfer to a human—same fallback behavior carries through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidate to one API.&lt;/strong&gt; If you're managing this on top of a separate STT provider, AssemblyAI's Voice Agent API bundles speech understanding, LLM reasoning, and voice generation into a single WebSocket—same fallback patterns apply at the LLM layer, and there's nothing to wire together.&lt;/li&gt;
&lt;/ul&gt;
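
&lt;p&gt;For the streaming suggestion, the piece you write yourself is the buffering that turns streamed text chunks into whole sentences for text-to-speech. Here is a minimal sketch, assuming you already have an iterable of text deltas from a streaming response (how you obtain them is not shown). sentences_from_deltas is a hypothetical helper, and the punctuation-based splitting is deliberately naive:&lt;/p&gt;

```python
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def sentences_from_deltas(deltas: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as the streamed chunks complete it."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        while True:
            # Locate the earliest sentence-ending character in the buffer.
            positions = [buffer.find(ch) for ch in SENTENCE_ENDINGS if ch in buffer]
            if not positions:
                break
            cut = min(positions) + 1
            sentence, buffer = buffer[:cut].strip(), buffer[cut:]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


chunks = ["Your order", " shipped. It arr", "ives Friday", "! Anything else"]
for sentence in sentences_from_deltas(chunks):
    print(sentence)
```

&lt;p&gt;Splitting on punctuation alone will break on decimals and abbreviations, so treat this as a starting point rather than a finished segmenter.&lt;/p&gt;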

&lt;p&gt;Voice agents need to be built for the failures that will actually happen, not the happy path. Fallbacks turn LLM availability from a single point of failure into a non-event.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an LLM fallback and why does my voice pipeline need one?
&lt;/h3&gt;

&lt;p&gt;An LLM fallback is a backup model that automatically takes over when your primary model fails—whether from a provider outage, rate limit, or transient error. Voice pipelines need fallbacks because a failed LLM call means dead air on a live call, which is much worse than a failed text request that the user can retry. With AssemblyAI's LLM Gateway, you specify a fallbacks array in your request and the Gateway transparently retries on the next model if the primary fails—no custom retry logic required.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does AssemblyAI's LLM Gateway handle automatic LLM failover?
&lt;/h3&gt;

&lt;p&gt;LLM Gateway accepts a fallbacks array of up to two backup models per request. If the primary model fails, the Gateway automatically retries the request with the first fallback, then the second, until one succeeds. The response payload reflects the model that actually answered, and you're billed only for that model. By default, the Gateway also performs one automatic retry on the primary after 500 ms to handle transient errors before falling back to a different provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM providers does AssemblyAI's LLM Gateway support for fallback chains?
&lt;/h3&gt;

&lt;p&gt;LLM Gateway supports 25+ models across Anthropic Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5), OpenAI GPT (GPT-5.2, 5.1, 5, 4.1, mini, nano, gpt-oss), Google Gemini (3 Flash Preview, 2.5 Pro/Flash/Flash-Lite), Alibaba Cloud Qwen, and Moonshot AI Kimi. For voice fallback chains, the key constraint is to chain across different providers—a Claude to Claude fallback won't help during an Anthropic outage because both calls hit the same upstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I add automatic LLM fallbacks to my voice pipeline?
&lt;/h3&gt;

&lt;p&gt;Add a fallbacks array to your chat/completions request body—that's it. The Gateway handles retries, model switching, and billing automatically. A typical voice agent pairing is kimi-k2.5 as the primary (~1.2s latency), claude-sonnet-4-6 as the first fallback for higher quality, and gemini-2.5-flash-lite as a second fallback for additional provider redundancy. Set fallback_config.depth: 2 to use both backups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I customize the prompt or temperature for each fallback model?
&lt;/h3&gt;

&lt;p&gt;Yes—LLM Gateway lets you override any request field per fallback. This is useful when your primary uses a long, detailed system prompt that a smaller fallback can't accommodate, or when you want the fallback to be more concise to keep latency in check. Any field you don't override on the fallback is inherited from the original request, so you only need to specify what changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does billing work when an LLM Gateway request falls back to a different model?
&lt;/h3&gt;

&lt;p&gt;You're charged only for the model that actually returned the response, at that model's per-token rate. If your primary fails and the Gateway retries with a fallback, you pay only for the fallback model's tokens—not for the failed primary attempt. All usage shows up on a single AssemblyAI invoice across providers, with no markup on top of model rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between LLM Gateway fallbacks and writing my own retry logic?
&lt;/h3&gt;

&lt;p&gt;LLM Gateway fallbacks handle the entire retry-and-route flow inside the Gateway, so your application code makes one request and gets one response—no custom timeout handling, no model-switching logic, no per-provider error mapping. Writing it yourself works for chat apps where a few seconds of retry latency is fine, but in a voice pipeline every second of dead air costs you, and built-in fallbacks fire faster than client-side retries because the Gateway is already inside the network path.&lt;/p&gt;
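
&lt;p&gt;For comparison, a hand-rolled version looks roughly like the sketch below. It is illustrative only: complete_with_client_side_fallback and flaky_call are hypothetical, and a real implementation would also need per-provider clients, timeouts, and error mapping:&lt;/p&gt;

```python
from typing import Callable, Sequence


def complete_with_client_side_fallback(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # real code would catch provider-specific errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_call(model: str) -> str:
    # Stand-in for a real provider call; the first model always fails here.
    if model == "kimi-k2.5":
        raise TimeoutError("simulated provider timeout")
    return f"reply from {model}"


model, reply = complete_with_client_side_fallback(
    ["kimi-k2.5", "claude-sonnet-4-6"], flaky_call
)
print(model, reply)
```

&lt;p&gt;Each failed attempt in this loop costs a full client round trip before the next model is tried, which is the latency the Gateway-side version is meant to avoid.&lt;/p&gt;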

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>llm</category>
    </item>
    <item>
      <title>AssemblyAI LLM Gateway vs. OpenRouter vs. LLM Gateway.io: Pricing, security, and reliability compared</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 12 May 2026 18:00:32 +0000</pubDate>
      <link>https://dev.to/martschweiger/assemblyai-llm-gateway-vs-openrouter-vs-llm-gatewayio-pricing-security-and-reliability-4hg3</link>
      <guid>https://dev.to/martschweiger/assemblyai-llm-gateway-vs-openrouter-vs-llm-gatewayio-pricing-security-and-reliability-4hg3</guid>
      <description>&lt;p&gt;Picking an LLM gateway used to be a niche infrastructure decision. In 2026, it's table stakes for any team running production AI workloads—especially voice agents, where a single provider outage means dead air on a live call.&lt;/p&gt;

&lt;p&gt;Three names come up over and over again in this evaluation: AssemblyAI's LLM Gateway, OpenRouter, and LLM Gateway.io. They sound similar on the surface—all three give you a single API for routing requests across Claude, GPT, Gemini, and other major providers—but they're built for different workloads and they price, fail over, and handle data very differently.&lt;/p&gt;

&lt;p&gt;This post compares the three head-to-head on the dimensions that actually matter when you're shipping: pricing model, reliability features, security posture, model coverage, and developer experience. By the end, you'll know which one fits your stack—and where the cheap-on-paper option will cost you more downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick verdict
&lt;/h2&gt;

&lt;p&gt;If you're building...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice agents, AI scribes, meeting tools, or anything on top of audio&lt;/strong&gt;&lt;br&gt;
AssemblyAI LLM Gateway — speech-native context, one billing relationship, sits next to your STT&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A general-purpose LLM app, side project, or model marketplace UI&lt;/strong&gt;&lt;br&gt;
OpenRouter — widest model selection (300+), BYO-key option, strong for experimentation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A self-hosted gateway you fully control, with custom routing logic&lt;/strong&gt;&lt;br&gt;
LLM Gateway.io — open-source, self-hostable, maximum customization&lt;/p&gt;

&lt;p&gt;The rest of this post unpacks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  What each one actually is
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AssemblyAI LLM Gateway
&lt;/h3&gt;

&lt;p&gt;A managed, OpenAI-compatible chat completions API that routes to 25+ models across Anthropic, OpenAI, Google, Alibaba Cloud Qwen, and Moonshot AI Kimi. Available at llm-gateway.assemblyai.com/v1/chat/completions (US) or llm-gateway.eu.assemblyai.com/v1/chat/completions (EU). Built specifically for Voice AI workloads—designed to take transcripts from AssemblyAI's Universal-3 Pro Streaming or pre-recorded models and apply LLMs to them with native preservation of speaker labels, timestamps, and conversation structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; teams already using AssemblyAI for transcription, or any team building voice agents, conversation intelligence, AI medical scribes, or audio analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenRouter
&lt;/h3&gt;

&lt;p&gt;A model marketplace that aggregates 300+ models from dozens of providers behind a single OpenAI-compatible endpoint. OpenRouter operates as a billing intermediary—you pay OpenRouter, OpenRouter pays the upstream provider—typically at a small markup over direct API rates, with bring-your-own-API-key supported on most models for users who want to bypass the markup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; general-purpose LLM applications, hobbyist and prosumer use cases, and teams that want access to long-tail or specialized open-source models that other gateways don't carry.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Gateway.io
&lt;/h3&gt;

&lt;p&gt;An open-source LLM gateway that you can self-host or use through their managed cloud. Focuses on infrastructure-level features: custom routing rules, observability, caching, rate limiting, and budget controls. Less of a marketplace and more of a control plane you put in front of your LLM traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; teams with strict deployment requirements (air-gapped, on-prem, regulated industries) or teams that need deep customization of routing logic and want to own the infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing, head-to-head
&lt;/h2&gt;

&lt;p&gt;This is where the differences are sharpest—and where the cheapest sticker price isn't always the cheapest total cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AssemblyAI LLM Gateway&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;LLM Gateway.io&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Markup over provider rates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None — pay model-specific rates&lt;/td&gt;
&lt;td&gt;Small markup on most models (BYOK avoids it)&lt;/td&gt;
&lt;td&gt;None when self-hosted; managed plan has its own pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Billing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unified with your AssemblyAI account (single invoice)&lt;/td&gt;
&lt;td&gt;Separate OpenRouter account&lt;/td&gt;
&lt;td&gt;Separate or self-hosted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free tier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — $50 in starter credits&lt;/td&gt;
&lt;td&gt;Yes — limited free models&lt;/td&gt;
&lt;td&gt;Open-source is free; managed has tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Volume discounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available via custom plans&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Self-hosted: scale at infrastructure cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hidden costs to watch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None obvious&lt;/td&gt;
&lt;td&gt;BYOK still pays small platform fee on some providers&lt;/td&gt;
&lt;td&gt;Self-hosted ops overhead (hosting, monitoring, scaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The quiet cost of OpenRouter for high-volume production traffic is the per-token markup, which adds up across millions of tokens. The quiet cost of self-hosting LLM Gateway.io is the engineering time to keep it healthy. AssemblyAI's pricing is the most predictable: each model's listed rate, no markup, one bill.&lt;/p&gt;

&lt;p&gt;For voice workloads specifically, the bigger pricing story is what's &lt;em&gt;not&lt;/em&gt; on this table. If you're already paying for speech-to-text, LLM Gateway adds the LLM layer on the same bill—no second vendor relationship, no separate procurement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model coverage
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AssemblyAI LLM Gateway&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;LLM Gateway.io&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total models&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;25+&lt;/td&gt;
&lt;td&gt;300+&lt;/td&gt;
&lt;td&gt;Whatever you configure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anthropic Claude&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All major models (Opus 4.7, Sonnet 4.6, Haiku 4.5)&lt;/td&gt;
&lt;td&gt;All major models&lt;/td&gt;
&lt;td&gt;Yes (BYO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI GPT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPT-5.2, 5.1, 5, 4.1, GPT-5 mini/nano, gpt-oss&lt;/td&gt;
&lt;td&gt;All major models&lt;/td&gt;
&lt;td&gt;Yes (BYO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash Preview, 2.5 Pro/Flash/Flash-Lite&lt;/td&gt;
&lt;td&gt;All major Gemini models&lt;/td&gt;
&lt;td&gt;Yes (BYO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open-source / specialty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen3, Kimi K2.5, gpt-oss&lt;/td&gt;
&lt;td&gt;Long tail (Mistral, Llama variants, Cohere, fine-tunes, etc.)&lt;/td&gt;
&lt;td&gt;Yes (BYO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;New model availability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same week as upstream release in most cases&lt;/td&gt;
&lt;td&gt;Within hours to days&lt;/td&gt;
&lt;td&gt;Depends on your config&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenRouter wins on raw breadth: if you need an obscure fine-tune or a specific open-source variant, it's there. AssemblyAI's lineup is curated to production-grade frontier models and best-in-class fast models, which is what almost every voice agent or audio app actually needs. LLM Gateway.io, being the gateway layer rather than the model layer, gives you whatever you wire up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reliability features
&lt;/h2&gt;

&lt;p&gt;For voice and real-time use cases, this is the table that matters most.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AssemblyAI LLM Gateway&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;LLM Gateway.io&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automatic fallback to backup model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — built-in fallbacks array, up to 2 backups&lt;/td&gt;
&lt;td&gt;Yes — fallback model parameter&lt;/td&gt;
&lt;td&gt;Yes — configurable routing rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retry on transient failure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — automatic 500ms retry by default&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Per-fallback field overrides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — override prompt, temp, max_tokens per backup&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (custom logic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streaming support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (OpenAI models)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt caching&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — Anthropic and OpenAI caching supported&lt;/td&gt;
&lt;td&gt;Provider-dependent&lt;/td&gt;
&lt;td&gt;Provider-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-region failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;US + EU endpoints&lt;/td&gt;
&lt;td&gt;Single global endpoint&lt;/td&gt;
&lt;td&gt;Whatever you build&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AssemblyAI's fallback mechanism is worth a closer look. You can specify a chain of up to two backup models; if your primary fails, the Gateway transparently retries with the next model in line and returns the response as if nothing happened. The response payload includes the actual model that handled the request, and you're only billed for that model. For voice pipelines where every second of dead air costs you, this is the feature that turns LLM availability from a single point of failure into a non-event.&lt;/p&gt;

&lt;p&gt;OpenRouter's fallback support is similar in concept but implemented differently—you specify fallbacks at the request level and the platform handles routing. LLM Gateway.io gives you the most flexibility because you write the routing logic, but that flexibility is also work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and compliance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AssemblyAI LLM Gateway&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;LLM Gateway.io&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SOC 2 Type 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Self-hosted: depends on your setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HIPAA BAA available&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited (varies by provider)&lt;/td&gt;
&lt;td&gt;Self-hosted: yours to maintain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EU data residency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — dedicated EU endpoint&lt;/td&gt;
&lt;td&gt;No dedicated EU endpoint&lt;/td&gt;
&lt;td&gt;Self-hosted: yours to deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PCI DSS v4.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Self-hosted: yours to certify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ISO 27001:2022&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Self-hosted: yours to certify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data retention controls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Configurable; opt out of training&lt;/td&gt;
&lt;td&gt;Provider-dependent&lt;/td&gt;
&lt;td&gt;You control everything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For regulated industries—healthcare, financial services, legal—the compliance story is the deciding factor. AssemblyAI offers a Business Associate Agreement for HIPAA workloads and is SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0 certified. The EU endpoint guarantees data never leaves the European Union, which matters under GDPR.&lt;/p&gt;

&lt;p&gt;OpenRouter's compliance posture is thinner—it's a marketplace, and the underlying compliance ultimately depends on the provider you route to. LLM Gateway.io self-hosted shifts every compliance burden onto your team, which is either a feature (full control) or a bug (full responsibility) depending on your org.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice and audio: where the real differences show up
&lt;/h2&gt;

&lt;p&gt;This is where AssemblyAI's gateway separates from the others, and the comparison stops being symmetric.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speech-native context preservation.&lt;/strong&gt; When you pass an AssemblyAI transcript to LLM Gateway, speaker labels, timestamps, and conversation structure are preserved in the prompt automatically. You don't flatten the transcript; the model receives the structured speech data. Generic LLM gateways can't do this because they're not aware of the upstream STT.&lt;/p&gt;
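&lt;p&gt;For comparison, this is roughly the glue code you would write by hand with a generic gateway to keep speaker labels and timestamps in the prompt. It is a sketch under assumptions: the utterance fields (&lt;code&gt;speaker&lt;/code&gt;, &lt;code&gt;start_ms&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;) and the function name are hypothetical, and the output format is one reasonable choice rather than what LLM Gateway produces internally.&lt;/p&gt;

```python
def format_utterances(utterances: list[dict]) -> str:
    """Render diarized utterances as one prompt line each: [mm:ss] Speaker X: text."""
    lines = []
    for utterance in utterances:
        seconds = utterance["start_ms"] // 1000
        stamp = f"{seconds // 60:02d}:{seconds % 60:02d}"
        lines.append(f"[{stamp}] Speaker {utterance['speaker']}: {utterance['text']}")
    return "\n".join(lines)


# Hypothetical diarized transcript segments.
utterances = [
    {"speaker": "A", "start_ms": 0, "text": "Thanks for calling, how can I help?"},
    {"speaker": "B", "start_ms": 4200, "text": "I want to check on my order."},
    {"speaker": "A", "start_ms": 65300, "text": "It shipped this morning."},
]

prompt_context = format_utterances(utterances)
print(prompt_context)
```

&lt;p&gt;Flattening the transcript to plain text would lose exactly the information these lines keep: who spoke, and when.&lt;/p&gt;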

&lt;p&gt;&lt;strong&gt;Same-account billing with transcription.&lt;/strong&gt; If you're already using AssemblyAI for STT or the Voice Agent API, every LLM call shows up on the same invoice. No reconciling tokens with minutes-of-audio across two vendors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming integration.&lt;/strong&gt; AssemblyAI's streaming API returns final transcripts in roughly 300 ms; you can hand each segment to LLM Gateway in real time for live summarization, translation, sentiment tagging, or agentic logic—no separate pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built for audio-specific workloads.&lt;/strong&gt; Meeting summarization, action item extraction, SOAP note generation for ambient AI scribes, sales call analytics, real-time translation—these are all first-class patterns in the docs and they work the same way you'd expect a chat completion to work.&lt;/p&gt;

&lt;p&gt;OpenRouter and LLM Gateway.io can technically do all of this—you just have to glue the audio side together yourself. For one or two endpoints, that's fine. For a production voice product with complex prompts, multiple LLM tasks per call, and tight latency budgets, the integrated path saves real engineering time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer experience
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AssemblyAI LLM Gateway&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;LLM Gateway.io&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API compatibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible chat completions&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single AssemblyAI API key&lt;/td&gt;
&lt;td&gt;OpenRouter key (or BYOK)&lt;/td&gt;
&lt;td&gt;Self-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SDKs / docs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Official AssemblyAI SDKs (Python, Node, .NET, Java, etc.) + docs&lt;/td&gt;
&lt;td&gt;Their own SDK + community libraries&lt;/td&gt;
&lt;td&gt;Open-source repo + docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Playground&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — test models side-by-side&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Self-hosted only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minutes (just swap the base URL)&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Hours to days when self-hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Migration friction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same OpenAI-compatible request schema&lt;/td&gt;
&lt;td&gt;Same OpenAI-compatible request schema&lt;/td&gt;
&lt;td&gt;Same OpenAI-compatible request schema&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are easy to adopt because they all speak the same chat completions schema. Switching from one to another requires changing a base URL and an API key—not a rewrite. That's the right way to think about lock-in: low.&lt;/p&gt;
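&lt;p&gt;A small sketch of what "changing a base URL and an API key" looks like in practice. The AssemblyAI URL is the one used elsewhere in this article; the second URL, both keys, the &lt;code&gt;Bearer&lt;/code&gt; prefix, and the model name are placeholders, and the &lt;code&gt;Gateway&lt;/code&gt; class and &lt;code&gt;build_chat_request&lt;/code&gt; helper are hypothetical. Check each provider's docs for the exact auth header format it expects.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    chat_url: str
    api_key: str
    auth_prefix: str = ""  # some gateways expect a scheme such as "Bearer "


def build_chat_request(gateway: Gateway, model: str, user_text: str) -> dict:
    """Build keyword arguments for requests.post; the JSON body never changes."""
    return {
        "url": gateway.chat_url,
        "headers": {"authorization": gateway.auth_prefix + gateway.api_key},
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": user_text}],
        },
    }


assemblyai = Gateway(
    chat_url="https://llm-gateway.assemblyai.com/v1/chat/completions",
    api_key="ASSEMBLYAI_KEY_PLACEHOLDER",
)
other = Gateway(
    chat_url="https://gateway.example.com/v1/chat/completions",  # placeholder URL
    api_key="OTHER_KEY_PLACEHOLDER",
    auth_prefix="Bearer ",
)

request_a = build_chat_request(assemblyai, "placeholder-model", "Hello")
request_b = build_chat_request(other, "placeholder-model", "Hello")
print(request_a["json"] == request_b["json"])
```

&lt;p&gt;Either dictionary can be passed as &lt;code&gt;requests.post(**request_a, timeout=15)&lt;/code&gt;. Only the URL and the auth header differ between the two; the request body is identical.&lt;/p&gt;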

&lt;h2&gt;
  
  
  When to pick each one
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick AssemblyAI LLM Gateway if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building voice agents, AI scribes, conversation intelligence, or any audio-first product&lt;/li&gt;
&lt;li&gt;You're already using AssemblyAI for transcription and want to consolidate&lt;/li&gt;
&lt;li&gt;You need a BAA for HIPAA workloads, EU data residency, or PCI compliance&lt;/li&gt;
&lt;li&gt;You want predictable pricing without per-token markups&lt;/li&gt;
&lt;li&gt;You want fallbacks, prompt caching, and EU/US endpoints out of the box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick OpenRouter if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building a chat app, agent product, or general LLM tool unrelated to audio&lt;/li&gt;
&lt;li&gt;You need access to a long tail of open-source or specialty models&lt;/li&gt;
&lt;li&gt;You want to experiment across many models before committing&lt;/li&gt;
&lt;li&gt;You're a hobbyist or prosumer who values selection over enterprise compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick LLM Gateway.io if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have hard requirements to self-host or run air-gapped&lt;/li&gt;
&lt;li&gt;You need to write custom routing logic (e.g., regulatory rules, cost-aware routing across BYO accounts)&lt;/li&gt;
&lt;li&gt;You have engineering capacity to operate the infrastructure&lt;/li&gt;
&lt;li&gt;You're standardizing across many internal teams and want one control plane&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The hidden tradeoff
&lt;/h2&gt;

&lt;p&gt;The real question isn't "which gateway has the most features." It's "which one will I regret picking in six months when my workload doubles."&lt;/p&gt;

&lt;p&gt;For voice and audio workloads, that answer is almost always the gateway that's natively integrated with your speech stack. The marginal latency, the speech-aware context, the unified billing, the compliance—all of it adds up to engineering hours you don't spend wiring two vendors together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an LLM gateway and why would I use one?
&lt;/h3&gt;

&lt;p&gt;An LLM gateway is a routing layer that sits between your application and multiple LLM providers, giving you one API endpoint for Claude, GPT, Gemini, and other models. You'd use one to avoid vendor lock-in, add automatic failover when a provider has an outage, unify billing across models, and switch models without rewriting client code. AssemblyAI's LLM Gateway, OpenRouter, and LLM Gateway.io are the three main options—they serve different workloads and price differently.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between AssemblyAI's LLM Gateway and OpenRouter?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's LLM Gateway is purpose-built for Voice AI workloads: it natively preserves speaker labels, timestamps, and conversation structure when you pass transcripts. OpenRouter is a general-purpose model marketplace that aggregates 300+ models with a per-token markup. For voice agents, AI scribes, and audio applications, the integrated approach offers advantages in handling speech context and unified billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM gateway is best for voice agents?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's LLM Gateway is the strongest fit for voice agents because it integrates with Universal-3 Pro Streaming and the Voice Agent API through the same WebSocket layer. This configuration provides unified authentication, combined billing, automatic fallbacks across providers, and native speech context preservation, all of which take additional engineering to achieve with a generic gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does LLM Gateway pricing compare to calling LLM providers directly?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's LLM Gateway charges model-specific rates with no markup, billed through your AssemblyAI account. OpenRouter adds a small per-token platform fee, though its bring-your-own-API-key option can reduce this. LLM Gateway.io is free as open-source software when self-hosted, though your team absorbs the infrastructure costs; alternatively, you can opt for its managed tier. For high-volume production, AssemblyAI and self-hosted LLM Gateway.io provide the most predictable cost structures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does AssemblyAI's LLM Gateway support EU data residency and HIPAA compliance?
&lt;/h3&gt;

&lt;p&gt;Yes—a dedicated EU endpoint at llm-gateway.eu.assemblyai.com/v1/chat/completions keeps all request and response data inside the European Union, supporting Anthropic Claude and most Google Gemini models. AssemblyAI provides a Business Associate Agreement for HIPAA workloads and maintains SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0 certification, representing the strictest compliance posture among the three platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I switch between LLM gateways without rewriting my code?
&lt;/h3&gt;

&lt;p&gt;Yes—all three gateways use OpenAI-compatible chat completions schemas, so switching typically requires changing only the base URL and API key. This means lock-in remains low; you can evaluate one platform against another and migrate without rewriting application code. Moving from direct OpenAI integration to any of these gateways involves similarly minimal changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM gateway should I use for HIPAA-regulated healthcare apps?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI's LLM Gateway is the most straightforward choice for HIPAA workloads because the company offers a Business Associate Agreement and operates SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0-certified infrastructure. If you need data isolation beyond what a BAA covers, self-hosted LLM Gateway.io gives you complete deployment control but leaves compliance certification to your team. OpenRouter is generally a poor fit for regulated healthcare data because compliance support varies across its upstream providers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
      <category>comparison</category>
    </item>
    <item>
      <title>Stream LLM responses in a voice pipeline: Tool calling, structured outputs, and real-time actions</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 12 May 2026 17:59:55 +0000</pubDate>
      <link>https://dev.to/martschweiger/stream-llm-responses-in-a-voice-pipeline-tool-calling-structured-outputs-and-real-time-actions-1h67</link>
      <guid>https://dev.to/martschweiger/stream-llm-responses-in-a-voice-pipeline-tool-calling-structured-outputs-and-real-time-actions-1h67</guid>
      <description>&lt;p&gt;When a user finishes a sentence in a voice conversation, they expect to hear the agent start replying within roughly a second. Anything longer feels broken. The fastest way to hit that target isn't a faster LLM—it's not waiting for the LLM to finish before you start speaking.&lt;/p&gt;

&lt;p&gt;Streaming the LLM response, sentence by sentence, into a TTS engine is the trick that turns a 4-second response time into a sub-second one. And once you're streaming, you can layer on tool calling for real-world actions and structured outputs for predictable downstream code—all without giving up that latency budget.&lt;/p&gt;

&lt;p&gt;This tutorial walks through how to build that pipeline using AssemblyAI's LLM Gateway and Universal-3 Pro Streaming. By the end, you'll have a Python voice pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Streams microphone audio into AssemblyAI for live transcription&lt;/li&gt;
&lt;li&gt;Streams the LLM response token-by-token through LLM Gateway&lt;/li&gt;
&lt;li&gt;Calls tools mid-conversation to look up data or trigger actions&lt;/li&gt;
&lt;li&gt;Returns structured JSON when the workflow needs predictable output&lt;/li&gt;
&lt;li&gt;Hands each completed sentence to TTS as it arrives&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why streaming matters more in voice than in chat
&lt;/h2&gt;

&lt;p&gt;In a chat UI, streaming is a nice-to-have—you see the response appear word by word instead of all at once. In a voice agent, it's the difference between conversational and broken.&lt;/p&gt;

&lt;p&gt;The math is simple. End-to-end voice latency is roughly:&lt;/p&gt;

&lt;p&gt;STT finalization (200-500 ms)&lt;br&gt;&lt;br&gt;
+ LLM time-to-first-token (150-400 ms)&lt;br&gt;&lt;br&gt;
+ TTS time-to-first-audio (200-400 ms)&lt;br&gt;&lt;br&gt;
+ network overhead (50-150 ms)&lt;br&gt;&lt;br&gt;
= 600-1500 ms before the user hears anything  &lt;/p&gt;

&lt;p&gt;If you wait for the full LLM response before sending text to TTS, add another 1-3 seconds onto that. Users notice. Conversation breaks. They start over.&lt;/p&gt;

&lt;p&gt;If you stream, flushing each completed sentence to TTS as soon as the LLM emits it, the user hears the first sentence while the LLM is still generating the second. Time to first audio stays at the budget above (ideally at the 600-900 ms end of it, which feels conversational) instead of growing with the length of the reply.&lt;/p&gt;
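&lt;p&gt;The latency budget above can be checked with a few lines of arithmetic. The stage ranges and the 1-3 second penalty are the figures quoted in this section; the variable and function names are only for illustration. Note that the upper bounds sum to 1450 ms, which the text rounds to 1500.&lt;/p&gt;

```python
# Per-stage latency ranges in milliseconds, as (best case, worst case).
STAGES_MS = {
    "stt_finalization": (200, 500),
    "llm_time_to_first_token": (150, 400),
    "tts_time_to_first_audio": (200, 400),
    "network_overhead": (50, 150),
}

# Extra wait if the full LLM reply is generated before TTS starts (1-3 seconds).
FULL_REPLY_WAIT_MS = (1000, 3000)


def time_to_first_audio(stages: dict) -> tuple:
    """Sum the best-case and worst-case latency across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


streamed = time_to_first_audio(STAGES_MS)
buffered = (
    streamed[0] + FULL_REPLY_WAIT_MS[0],
    streamed[1] + FULL_REPLY_WAIT_MS[1],
)

print("streamed:", streamed)  # streamed: (600, 1450)
print("buffered:", buffered)  # buffered: (1600, 4450)
```

&lt;p&gt;Even in the best case, waiting for the full reply pushes the first audio past 1.6 seconds, which is already outside what feels conversational.&lt;/p&gt;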
&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Python pipeline that handles three voice agent patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Streamed conversational replies&lt;/strong&gt;—the user asks a question; the agent's voice starts within ~1 second and flows naturally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling&lt;/strong&gt;—the user says "what's my order status?"; the agent calls get_order_status(order_id) and speaks the result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured outputs&lt;/strong&gt;—the agent returns a JSON object matching a schema (e.g., {intent, urgency, escalate}), which your code consumes directly without parsing freeform text&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AssemblyAI Universal-3 Pro Streaming (speech-to-text)&lt;/li&gt;
&lt;li&gt;AssemblyAI LLM Gateway (streaming chat completions, tools, structured outputs)&lt;/li&gt;
&lt;li&gt;A TTS engine of your choice (we'll use a placeholder—same pattern works with any streaming TTS)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;assemblyai requests python-dotenv pyaudio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Create .env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_key_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same AssemblyAI API key authenticates both the streaming STT WebSocket and the LLM Gateway endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Stream tokens from LLM Gateway
&lt;/h2&gt;

&lt;p&gt;LLM Gateway supports OpenAI-style streaming on OpenAI models. Set stream: True in the request and read the response as a Server-Sent Events (SSE) stream. Each chunk carries a small content delta, usually a token or a few tokens of text; you stitch the deltas together as they arrive.&lt;/p&gt;

&lt;p&gt;The key trick for voice: don't wait for the full response. Buffer tokens, watch for sentence boundaries (., !, ?), and flush each completed sentence to TTS the instant it's ready.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_llm_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Stream the LLM response. Yield each completed sentence as it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s generated.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://llm-gateway.assemblyai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a friendly voice assistant. Keep replies short.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;sentence_endings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DONE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence_endings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence_endings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt; &lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generator yields each completed sentence as it's ready. Your TTS engine consumes these one at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send a sentence to your TTS engine. Replace with your provider&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [TTS] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# tts_client.stream(sentence)
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_final_transcript&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;stream_llm_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single change—yielding sentences as they arrive instead of waiting for the full reply—typically cuts perceived response time by 60-80% for any reply longer than two sentences.&lt;/p&gt;
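&lt;p&gt;One caveat with splitting on every ".", "!", or "?": decimals such as "3.50" get cut in half when the tokens arrive in separate chunks. A slightly more careful variant, sketched here as an optional alternative rather than the tutorial's method, only treats punctuation as a sentence boundary when whitespace follows it. The helper name &lt;code&gt;pop_sentences&lt;/code&gt; and the sample token stream are hypothetical.&lt;/p&gt;

```python
import re

# A boundary is sentence punctuation followed by at least one whitespace character.
_BOUNDARY = re.compile(r"[.!?]+\s+")


def pop_sentences(buffer: str) -> tuple[list[str], str]:
    """Split off complete sentences; return them plus the unfinished remainder."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(buffer):
        sentence = buffer[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    return sentences, buffer[start:]


# Simulated token stream; note the decimal split across two chunks.
tokens = ["The total is 3", ".50 today", ". Anything", " else?", " Thanks"]

spoken = []
buffer = ""
for token in tokens:
    buffer += token
    ready, buffer = pop_sentences(buffer)
    spoken.extend(ready)

# End of stream: flush whatever is left, as the [DONE] branch does.
if buffer.strip():
    spoken.append(buffer.strip())

print(spoken)
```

&lt;p&gt;The tradeoff is that a sentence is only released once the next chunk begins with whitespace (or the stream ends), and abbreviations like "Dr. Smith" will still split. For most short voice replies that is an acceptable price for not reading "3" and "50 today" as separate sentences.&lt;/p&gt;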

&lt;h2&gt;
  
  
  Step 2: Add tool calling
&lt;/h2&gt;

&lt;p&gt;Voice agents become useful the moment they can do something—look up an order, check inventory, schedule a callback, transfer to a human. LLM Gateway supports OpenAI-compatible tool calling across every supported model (Claude, OpenAI, Gemini), so you write the code once and it works no matter which provider you route to.&lt;/p&gt;

&lt;p&gt;Define your tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up the status of a customer order by order ID.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The order ID, e.g. ORD-12345&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule_callback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Schedule a callback with a sales rep.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preferred_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preferred_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implement the actual functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_order_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-09&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;schedule_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preferred_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CB-9982&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;preferred_time&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;TOOL_REGISTRY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_order_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule_callback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;schedule_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now extend the LLM call to handle tool requests. The Gateway returns a &lt;code&gt;tool_calls&lt;/code&gt; field on the assistant message; you execute each tool, append the result to the conversation history, and call the model again to let it produce its spoken response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_llm_response_with_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Stream a follow-up reply using the existing conversation history.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://llm-gateway.assemblyai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;sentence_endings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DONE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence_endings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence_endings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt; &lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;split_idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://llm-gateway.assemblyai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;fn_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;fn_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_REGISTRY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;fn_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;stream_llm_response_with_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_yield_once&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_yield_once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;stream_llm_response_with_history&lt;/code&gt; is the same streaming function from Step 1, except it sends the full conversation history (which now includes the tool result) so the model can speak the answer naturally.&lt;/p&gt;

&lt;p&gt;The clean part: tool calling and streaming compose. The model decides it needs a tool, your code executes it (a natural moment for a filler line such as "let me check that for you"), and the follow-up call streams the spoken result sentence by sentence, which is the conversational rhythm users expect.&lt;/p&gt;
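
&lt;p&gt;The loop in &lt;code&gt;respond_with_tools&lt;/code&gt; trusts the model to name a registered tool and to send well-formed arguments. A slightly more defensive dispatcher might look like the sketch below. &lt;code&gt;run_tool_call&lt;/code&gt; and its &lt;code&gt;registry&lt;/code&gt; parameter are hypothetical names, and the error payload shape is an assumption rather than anything the Gateway prescribes; the idea is only that an error goes back to the model as a tool result instead of crashing the call.&lt;/p&gt;

```python
import json


def run_tool_call(tool_call, registry):
    """Execute one OpenAI-style tool call and return a JSON string for the tool message."""
    name = tool_call["function"]["name"]
    handler = registry.get(name)
    if handler is None:
        # The model asked for a tool that was never registered.
        return json.dumps({"error": "unknown tool: " + name})
    try:
        args = json.loads(tool_call["function"]["arguments"] or "{}")
        result = handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": "bad arguments for " + name + ": " + str(exc)})
    return json.dumps(result)
```

&lt;p&gt;The returned string can be used as the &lt;code&gt;content&lt;/code&gt; of the &lt;code&gt;tool&lt;/code&gt; message, so the model can apologize or ask the caller to repeat the details. Note that a &lt;code&gt;TypeError&lt;/code&gt; raised inside the tool itself is reported the same way.&lt;/p&gt;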

&lt;p&gt;A note for entity-heavy use cases: if your tool parameters include order IDs, phone numbers, or email addresses, your speech-to-text accuracy on those tokens is what determines whether tool calls succeed. Universal-3 Pro Streaming has a roughly 16.7% mixed-entity error rate, versus 23-25% for competing models. That's the difference between &lt;code&gt;ORD-12345&lt;/code&gt; and "or 12 three 45" getting passed to your function.&lt;/p&gt;
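
&lt;p&gt;Whatever transcription model you use, it can help to validate entity arguments before a tool acts on them. The sketch below is one way to do that for order IDs. It assumes the format is "ORD-" followed by five digits, which is inferred only from the &lt;code&gt;ORD-12345&lt;/code&gt; example in the tool description, and &lt;code&gt;normalize_order_id&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import re

# Hypothetical ID format inferred from the example "ORD-12345": "ORD" plus five digits.
_ORDER_ID = re.compile(r"^ORD(\d{5})$")


def normalize_order_id(raw):
    """Return a canonical "ORD-12345" string, or None if the input does not fit the format."""
    # Drop spaces, hyphens and other separators that transcription may add or lose.
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    match = _ORDER_ID.match(compact)
    if match is None:
        return None
    return "ORD-" + match.group(1)
```

&lt;p&gt;Calling this at the top of &lt;code&gt;get_order_status&lt;/code&gt; and returning an error result when it yields &lt;code&gt;None&lt;/code&gt; gives the model a chance to ask the caller to repeat the ID rather than looking up a garbled one.&lt;/p&gt;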

&lt;h2&gt;
  
  
  Step 3: Use structured outputs for predictable JSON
&lt;/h2&gt;

&lt;p&gt;Sometimes you don't want a spoken reply; you want machine-readable output your downstream code can act on: routing decisions, intent classification, sentiment scoring, escalation flags. LLM Gateway supports structured outputs via JSON schema, which guarantees the model returns exactly the shape you specified.&lt;/p&gt;

&lt;p&gt;Define the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;classification_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boolean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;additionalProperties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# strict mode on OpenAI models expects closed objects&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send it with response_format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_utterance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://llm-gateway.assemblyai.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s intent for a customer service workflow.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intent_classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;classification_schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
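        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# seconds; keeps a stalled request from blocking the call&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# surface HTTP errors before parsing the body&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;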

&lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_utterance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to cancel my subscription right now.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {"intent": "cancel", "urgency": "high", "escalate": True, "summary": "..."}
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;transfer_to_human&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back a parsed dict you can route on directly. Pair this with streaming for the user-facing reply: classify the intent (structured), then stream a conversational acknowledgment based on the classification. The user hears a friendly sentence in under a second while your code routes the call in the background.&lt;/p&gt;
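
&lt;p&gt;Even with strict mode on, it can be worth checking the parsed dict before routing on it, for instance if a response is cut short or a different model is swapped in later. The sketch below is one way to do that. The validate_classification helper and its fall-back-to-escalation policy are assumptions made for illustration, not part of LLM Gateway; the allowed values mirror the enums in classification_schema.&lt;/p&gt;

```python
from typing import Any

# Allowed values mirror the enums in classification_schema.
ALLOWED_INTENTS = ("billing", "support", "sales", "cancel", "other")
ALLOWED_URGENCY = ("low", "medium", "high")

# Hypothetical safe default: when in doubt, hand the call to a person.
FALLBACK = {
    "intent": "other",
    "urgency": "medium",
    "escalate": True,
    "summary": "Classification failed validation.",
}


def validate_classification(data: Any) -> dict:
    """Return data if it matches the expected shape, else a copy of FALLBACK."""
    if not isinstance(data, dict):
        return dict(FALLBACK)
    ok = (
        data.get("intent") in ALLOWED_INTENTS
        and data.get("urgency") in ALLOWED_URGENCY
        and isinstance(data.get("escalate"), bool)
        and isinstance(data.get("summary"), str)
    )
    return data if ok else dict(FALLBACK)
```

&lt;p&gt;Wrapping the call as validate_classification(classify_utterance(user_text)) keeps the routing code free of key and type checks.&lt;/p&gt;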

&lt;h2&gt;
  
  
  Step 4: Wire it all together with streaming STT
&lt;/h2&gt;

&lt;p&gt;The full pipeline looks like this—STT WebSocket on the inbound side, streamed LLM Gateway responses on the outbound side, with tool calls and structured outputs available when needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;assemblyai.streaming.v3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingClientOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingParameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;BeginEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TurnEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StreamingError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful voice assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TurnEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_of_turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;user_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_utterance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
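        &lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Let me get a human on the line for you.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;transfer_to_human&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# same hand-off as the classification example above&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;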

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;respond_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assemblyai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;aai&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StreamingClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;StreamingClientOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;api_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;streaming.assemblyai.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StreamingEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;StreamingParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;speech_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;u3-rt-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MicrophoneStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;terminate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Speak into your microphone, ask about an order ID, and watch the agent execute the tool call and stream the spoken reply back. The combination of streaming STT, streaming LLM, and tool calling produces the responsive voice experience users now expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use which technique
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Use it when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streaming reply only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The user asked a question; you want a fast, conversational answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool calling + streamed reply&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The agent needs to act on real data (order lookup, scheduling, transfers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structured outputs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You need machine-readable output for routing, classification, or downstream logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structured + streamed combo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Classify the intent in JSON, then stream a conversational acknowledgment to the user&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Skip the wiring with the Voice Agent API
&lt;/h2&gt;

&lt;p&gt;That is a lot to tie together: streaming, tools, structured outputs, and an STT-LLM-TTS pipeline. If you're building a single voice agent and don't need to swap LLM providers per request, AssemblyAI's Voice Agent API bundles all of this behind one WebSocket. You set a system prompt, register tools, and get back streamed audio with built-in turn detection and barge-in. It uses the same Universal-3 Pro Streaming foundation and the same fallback patterns, with no glue code to maintain.&lt;/p&gt;

&lt;p&gt;The lower-level approach in this tutorial is the right call when you need maximum control—choosing different LLMs per request, applying custom retry logic, or running structured-output classification in parallel with the spoken reply. Both paths are first-class on AssemblyAI; pick the one that matches your constraint.&lt;/p&gt;

&lt;p&gt;Streaming everything is the new baseline for voice. Tool calling and structured outputs are what turn a streaming chatbot into something that can actually do work. Build for both and your voice agent stops feeling like a demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does it mean to stream LLM responses in a voice pipeline?
&lt;/h3&gt;

&lt;p&gt;Streaming LLM responses means receiving and processing the model's output token by token as it's generated, instead of waiting for the full response to complete. In a voice pipeline, streaming lets you forward each completed sentence to a text-to-speech engine the moment the LLM emits it—so the user hears the first sentence of the agent's reply while the LLM is still generating the second. This typically cuts perceived response time by 60–80% for any reply longer than two sentences.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I stream LLM responses through AssemblyAI's LLM Gateway?
&lt;/h3&gt;

&lt;p&gt;Set stream: True in your chat/completions request and read the response as a Server-Sent Events (SSE) stream. Each chunk carries a small piece of text, often a single token, in the choices[0].delta.content field. Some chunks, such as the final one, carry no content, so treat a missing field as an empty string. Buffer the pieces, watch for sentence-ending punctuation, and flush each completed sentence to your TTS engine as soon as it's ready. Streaming is supported on OpenAI models in LLM Gateway today.&lt;/p&gt;
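
&lt;p&gt;The buffer-and-flush step can be sketched on its own, independent of the HTTP layer. The example assumes the delta strings have already been pulled out of the SSE chunks. The iter_sentences name is hypothetical, and the boundary rule (a period, exclamation mark, or question mark followed by whitespace) is deliberately naive.&lt;/p&gt;

```python
import re
from typing import Iterable, Iterator

# Naive boundary: ., !, or ? followed by whitespace. Abbreviations such as
# "Dr. Smith" will split early; swap in a real sentence splitter if needed.
_SENTENCE_END = re.compile(r"[.!?]\s+")


def iter_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text deltas."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            # Keep the punctuation, drop the whitespace that follows it.
            sentence = buffer[: match.start() + 1].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail
```

&lt;p&gt;Because a sentence is only released once the whitespace after its punctuation arrives, the final sentence of a reply is flushed when the stream closes. Each yielded sentence can be handed straight to speak().&lt;/p&gt;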

&lt;h3&gt;
  
  
  How does tool calling work with the LLM Gateway?
&lt;/h3&gt;

&lt;p&gt;Tool calling lets your voice agent invoke functions to access data or trigger actions—looking up an order, scheduling a callback, transferring to a human. Define your tools as JSON Schema in the tools array, and when the model decides to call one it returns a tool_calls field on the assistant message. You execute the tool, append the result to the conversation history, and call the Gateway again to let the model produce a spoken response that incorporates the tool output. The schema is OpenAI-compatible, so the same code works across Claude, GPT, and Gemini.&lt;/p&gt;
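
&lt;p&gt;The execute-and-append step might look like the sketch below. It assumes the Gateway returns tool calls in the OpenAI chat-completions shape (an id plus a function name and a JSON-encoded arguments string), which is what "OpenAI-compatible" suggests but is worth confirming against a real response. The run_tool_calls helper, the sample get_order_status function, and the hand-written assistant message are all illustrative.&lt;/p&gt;

```python
import json
from typing import Callable


def run_tool_calls(assistant_message: dict, registry: dict[str, Callable[..., dict]]) -> list[dict]:
    """Execute each requested tool and return the messages to append to history."""
    # The assistant message that requested the tools must precede their results.
    new_messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = registry.get(name)
        if handler is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            result = handler(**args)
        new_messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return new_messages


# Hypothetical tool and a hand-written assistant message for illustration.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_order_status", "arguments": '{"order_id": "ORD-12345"}'},
        }
    ],
}
history_update = run_tool_calls(assistant_message, {"get_order_status": get_order_status})
```

&lt;p&gt;Extending the conversation history with history_update and calling the Gateway again gives the model what it needs to phrase the spoken answer.&lt;/p&gt;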

&lt;h3&gt;
  
  
  Can I get structured JSON outputs from the LLM Gateway for voice agents?
&lt;/h3&gt;

&lt;p&gt;Yes—LLM Gateway supports structured outputs via JSON schema using the response_format parameter. This guarantees the model returns exactly the shape you specified, which is useful for intent classification, routing decisions, sentiment scoring, and any voice agent workflow that needs machine-readable output your downstream code can consume directly. A common voice pattern is to classify intent in JSON first, then stream a conversational acknowledgment back to the user while your code routes the call in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the latency budget for a real-time voice agent using streamed LLM responses?
&lt;/h3&gt;

&lt;p&gt;A well-tuned voice pipeline targets 600–900 ms from the moment the user stops speaking to the moment they hear the agent's first audio. That budget breaks down roughly as: 200–500 ms for STT finalization, 150–400 ms for LLM time-to-first-token, 200–400 ms for TTS time-to-first-audio, and 50–150 ms of network overhead. The low ends of those ranges add up to 600 ms and the high ends to 1,450 ms, so staying under 900 ms means most stages have to land near the bottom of their range. Streaming everything (STT transcripts, LLM tokens, TTS audio) is what makes that budget achievable.&lt;/p&gt;
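
&lt;p&gt;The arithmetic behind that budget is easy to check in a few lines. The stage names and the within_target helper are made up for the example; the numbers are the ranges quoted in the answer.&lt;/p&gt;

```python
# (low_ms, high_ms) per stage, copied from the ranges quoted in the answer.
BUDGET_MS = {
    "stt_finalization": (200, 500),
    "llm_time_to_first_token": (150, 400),
    "tts_time_to_first_audio": (200, 400),
    "network_overhead": (50, 150),
}

best_case = sum(low for low, _ in BUDGET_MS.values())
worst_case = sum(high for _, high in BUDGET_MS.values())


def within_target(measured_ms: dict, target_ms: int = 900) -> bool:
    """True when the measured per-stage latencies add up to the target or less."""
    return target_ms >= sum(measured_ms.values())


print(best_case, worst_case)  # 600 1450
```

&lt;p&gt;Feeding within_target your own per-stage measurements shows quickly which stage is eating the budget.&lt;/p&gt;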

&lt;h3&gt;
  
  
  When should I use the Voice Agent API instead of wiring streaming STT and LLM Gateway separately?
&lt;/h3&gt;

&lt;p&gt;Use the Voice Agent API when you're building a single voice agent and want one WebSocket that handles STT, LLM, TTS, turn detection, and tool calling out of the box. Use the lower-level streaming STT plus LLM Gateway approach when you need more control—choosing different LLMs per request, applying custom retry logic, or running structured-output classification in parallel with the spoken reply. Both options use the same Universal-3 Pro Streaming foundation, so accuracy is identical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does streaming work with tool calling and structured outputs?
&lt;/h3&gt;

&lt;p&gt;Yes—streaming composes with both. With tool calling, the agent thinks for a moment, executes the tool, then streams the spoken result token by token. With structured outputs, you typically don't stream the JSON itself (you want the complete object before parsing) but you can stream a separate conversational acknowledgment to the user while the structured classification finalizes in parallel.&lt;/p&gt;
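
&lt;p&gt;One way to run the two requests side by side is a small thread pool. The sketch below uses stand-in functions in place of the real network calls: classify_stub, acknowledge_stub, and handle_turn are hypothetical names, and the sleeps only simulate latency.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
import time


# Stand-ins for the real network calls; both are hypothetical stubs.
def classify_stub(user_text: str) -> dict:
    time.sleep(0.05)  # pretend this is the structured-output request
    escalate = "cancel" in user_text.lower()
    return {"intent": "cancel" if escalate else "other", "escalate": escalate}


def acknowledge_stub(user_text: str) -> list[str]:
    time.sleep(0.05)  # pretend this is the streamed reply, already split into sentences
    return ["Got it.", "Let me take a look."]


def handle_turn(user_text: str) -> tuple[list[str], dict]:
    """Run classification and the spoken acknowledgment concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        classification_future = pool.submit(classify_stub, user_text)
        ack_future = pool.submit(acknowledge_stub, user_text)
        # Both requests are in flight; the wait is roughly the slower of the two.
        spoken = ack_future.result()
        classification = classification_future.result()
    return spoken, classification
```

&lt;p&gt;In a real agent the acknowledgment would be spoken sentence by sentence as it streams rather than collected into a list first; the list keeps the example short.&lt;/p&gt;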

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>llm</category>
    </item>
    <item>
      <title>Build an AI voice agent for customer support that can look up orders</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 12 May 2026 17:59:18 +0000</pubDate>
      <link>https://dev.to/martschweiger/build-an-ai-voice-agent-for-customer-support-that-can-look-up-orders-1lil</link>
      <guid>https://dev.to/martschweiger/build-an-ai-voice-agent-for-customer-support-that-can-look-up-orders-1lil</guid>
      <description>&lt;p&gt;Tier-1 customer support is mostly the same five conversations on repeat: where's my order, can I change my address, can I get a refund, when does this ship, can I talk to a human. They're predictable, they're high-volume, and they don't need a person—they need a voice agent that can actually look things up.&lt;/p&gt;

&lt;p&gt;This tutorial walks you through building one. By the end, you'll have a Python voice agent that answers calls, listens for an order ID or email, calls into your backend to check the status, and reads the result back to the customer in real time. When something goes off-script, it transfers to a human with the full conversation context attached.&lt;/p&gt;

&lt;p&gt;We're using AssemblyAI's Voice Agent API—one WebSocket that handles the speech understanding, LLM reasoning, voice generation, turn detection, and tool calling in a single connection. Total time to a working prototype: about an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most support voice agents fail
&lt;/h2&gt;

&lt;p&gt;Before we build, it's worth knowing where these things break. The pattern is almost always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Customer says "my order ID is A-B-3-7-9-2"&lt;/li&gt;
&lt;li&gt;STT mishears it as "a b 37 92" or "ABE 379 to"&lt;/li&gt;
&lt;li&gt;The LLM calls get_order_status("ab3792") or, worse, asks the customer to repeat&lt;/li&gt;
&lt;li&gt;Customer hangs up&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent didn't fail because the LLM was wrong. It failed because the speech-to-text layer couldn't capture the entity correctly. This is why entity accuracy on alphanumerics, emails, and phone numbers matters more than overall WER for support agents—and why we're building on Universal-3 Pro Streaming, which has a "16.7% mixed-entity error rate vs. 23-25% for competing models."&lt;/p&gt;

&lt;p&gt;The second-most-common failure: dead air during tool calls. The customer asks a question, the agent calls a backend, and there's a 2-3 second silence while the lookup runs. The Voice Agent API solves this by speaking a natural transition phrase ("let me check that for you") while the tool runs—no dead air, no awkward pauses.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Python voice support agent that handles three real workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Order status lookup&lt;/strong&gt;—customer says "where's my order?" then the agent asks for the ID, looks it up, and reads back status, ETA, and tracking number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer info verification&lt;/strong&gt;—customer provides email or phone number, the agent looks up the account, and confirms identity before proceeding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human escalation&lt;/strong&gt;—customer asks for a person, or the agent gets stuck, and a graceful transfer happens with conversation context preserved&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AssemblyAI Voice Agent API (one WebSocket: STT + LLM + TTS)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;A backend with order data—we'll mock it; replace with your real CRM or order management system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;websockets pyaudio python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create .env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your_key_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Voice Agent API uses a single endpoint: wss://agents.assemblyai.com/v1/ws. One key, one connection, no separate STT or TTS providers to wire in.&lt;/p&gt;
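
&lt;p&gt;Loading the key at startup and failing early when it is missing saves a confusing connection error later. A minimal sketch, assuming the .env file above; the load_api_key helper and the VOICE_AGENT_URL constant name are illustrative, and the import is wrapped so the snippet still runs without python-dotenv installed.&lt;/p&gt;

```python
import os
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # installed above via python-dotenv
except ImportError:  # keep the sketch importable without the package
    load_dotenv = None

VOICE_AGENT_URL = "wss://agents.assemblyai.com/v1/ws"  # endpoint quoted above


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, raising early with a clear message if it is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies values from .env into os.environ
        env = os.environ
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key or key == "your_key_here":
        raise RuntimeError("Set ASSEMBLYAI_API_KEY in .env before starting the agent.")
    return key
```

&lt;p&gt;Passing a plain dict as env makes the function easy to exercise in tests without touching the real environment.&lt;/p&gt;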

&lt;h2&gt;
  
  
  Step 1: Define the support tools
&lt;/h2&gt;

&lt;p&gt;Tools are the agent's interface to your backend. The Voice Agent API uses standard JSON Schema, so anything you can describe with a schema, the agent can call.&lt;/p&gt;

&lt;p&gt;For a support agent, you typically want four tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up an order&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current status, shipping ETA, and tracking number by order ID.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s order ID, e.g. ORD-12345 or 78231-ABC.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_account_by_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find a customer account using their email address.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s email address.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_recent_orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List the customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s most recent orders. Use after the account is verified.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max number of orders to return.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transfer_to_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transfer the call to a human agent. Use when the customer asks, when you can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t help, or when the issue is sensitive.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Short reason for the transfer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Brief summary of the conversation so far.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
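
&lt;p&gt;Because each tool carries a schema, you can optionally sanity-check the arguments the model sends before touching your backend. The sketch below is an assumption layered on top of the walkthrough, not part of the Voice Agent API: check_args, JSON_TYPES, and example_tool are hypothetical names, and the check only covers the required and type fields rather than full JSON Schema validation.&lt;/p&gt;

```python
# Hypothetical helper: a minimal check of tool arguments against the
# "required" and "type" fields of a JSON Schema-style tool definition.
# This is NOT a full JSON Schema validator.

JSON_TYPES = {
    "string": str,
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_args(tool, args):
    """Return a list of problems; an empty list means the args look usable."""
    schema = tool["parameters"]
    properties = schema.get("properties", {})
    problems = []

    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")

    # Every supplied key must be declared and match its declared type.
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # no type declared, or a type this sketch does not know
        # bool is a subclass of int in Python, so reject it for "integer".
        is_bool_as_int = type_name == "integer" and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected):
            problems.append(f"{key} should be of type {type_name}")

    return problems


# Example definition in the same shape as the list_recent_orders entry.
example_tool = {
    "type": "function",
    "name": "list_recent_orders",
    "parameters": {
        "type": "object",
        "properties": {
            "account_id": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["account_id"],
    },
}

print(check_args(example_tool, {"account_id": "ACC-001", "limit": 2}))
print(check_args(example_tool, {"limit": "two"}))
```

&lt;p&gt;If the list is non-empty, one option is to return it to the model as a structured error so it can ask the customer for the missing detail.&lt;/p&gt;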



&lt;p&gt;Now implement the actual functions. Replace these stubs with calls to your real backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ORDERS_DB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-12345&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-09&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tracking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Z999AA10123456784&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-67890&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-12&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tracking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ACCOUNTS_DB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jane@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACC-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jane Doe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ACCOUNT_ORDERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACC-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-12345&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$84.99&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-12100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-22&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$42.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ORDERS_DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_account_by_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;account&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ACCOUNTS_DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;account&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_recent_orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ACCOUNT_ORDERS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transfer_to_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transferred&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support-tier-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown_tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shape of the error matters. When get_order_status can't find an order, it returns a structured error instead of raising an exception. That gives the LLM the context it needs to apologize and ask the customer to verify the ID, rather than crashing the conversation.&lt;/p&gt;
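
&lt;p&gt;The same idea can be extended to failures you did not plan for, such as a missing argument or a backend exception. The following is a minimal sketch under stated assumptions: safe_call and demo_handler are hypothetical names, and the error keys missing_argument and tool_failed are illustrative rather than anything the API defines.&lt;/p&gt;

```python
def safe_call(handler, name, args):
    """Run a tool handler and convert any exception into a structured error.

    handler is any callable with the same (name, args) signature as run_tool.
    """
    try:
        return handler(name, args)
    except KeyError as exc:
        # Most often a required argument the model left out. Note that a
        # KeyError raised deeper in your backend would land here as well.
        return {"error": "missing_argument", "argument": str(exc.args[0])}
    except Exception as exc:
        # Report only the exception type so no internal details reach the caller.
        return {"error": "tool_failed", "detail": type(exc).__name__}


def demo_handler(name, args):
    """Stand-in for run_tool, using a tiny hard-coded order table."""
    orders = {"ORD-12345": {"status": "shipped"}}
    order = orders.get(args["order_id"].upper())
    return order or {"error": "order_not_found"}


print(safe_call(demo_handler, "get_order_status", {"order_id": "ord-12345"}))
print(safe_call(demo_handler, "get_order_status", {}))
print(safe_call(demo_handler, "get_order_status", {"order_id": 12345}))
```

&lt;p&gt;Every path returns a dict, so the model always has something to respond to, whichever branch ran.&lt;/p&gt;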

&lt;h2&gt;
  
  
  Step 2: Write the system prompt
&lt;/h2&gt;

&lt;p&gt;The system prompt is where you encode the agent's behavior. For support, you want a few things every time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity and tone&lt;/li&gt;
&lt;li&gt;When to ask for verification before sharing details&lt;/li&gt;
&lt;li&gt;When to use which tool&lt;/li&gt;
&lt;li&gt;When to transfer to a human&lt;/li&gt;
&lt;li&gt;Specific phrasing for transition moments (the "let me check that" line)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are Avery, a customer support agent for Acme Corp. Your goal is to help customers
quickly and accurately. You have access to tools that let you look up orders and accounts.

Behavior rules:
- Greet warmly and ask how you can help.
- For order questions, ask for the order ID first if the customer hasn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t given it.
- If a customer gives an email address, use lookup_account_by_email to verify the account. If they offer only a phone number, ask for the email on the account.
- Read order status, ETA, and tracking number clearly. Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t read raw timestamps —
  say dates naturally (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Friday, May 9th&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).
- When you need to call a tool, say a brief transition like &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Let me check on that&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
  or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;One moment while I pull that up.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
- If the customer asks for a human, sounds frustrated, or has a complex issue
  (refund disputes, damaged product, billing errors), use transfer_to_human and
  include a short summary.
- Never make up an order ID, status, or tracking number. If a tool returns an error,
  apologize, ask the customer to verify the ID, and try again.
- Keep replies short and conversational. This is a phone call, not an email.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "never make up" line is the most important sentence in the prompt. Without it, LLMs sometimes invent plausible-sounding tracking numbers when the lookup fails. With it, they ask for clarification instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Connect to the Voice Agent API
&lt;/h2&gt;

&lt;p&gt;Now the WebSocket connection. The pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open wss://agents.assemblyai.com/v1/ws with your API key&lt;/li&gt;
&lt;li&gt;Send session.update with the system prompt, tools, voice, and greeting&lt;/li&gt;
&lt;li&gt;Wait for session.ready, then start streaming microphone audio&lt;/li&gt;
&lt;li&gt;Handle incoming events—tool.call, reply.audio, transcript.user, reply.done
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WS_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wss://agents.assemblyai.com/v1/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WS_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;additional_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;greeting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi, this is Avery from Acme support. How can I help?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ivy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="n"&gt;pa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PyAudio&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frames_per_buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;speaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;pending_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_audio&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception_on_overflow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="p"&gt;}))&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.ready&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent ready. Start speaking.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply.audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
                    &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [tool] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply.done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interrupted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                            &lt;span class="p"&gt;}))&lt;/span&gt;
                        &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;send_audio&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details that the docs flag and you'd otherwise debug for an hour:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't send tool.result immediately&lt;/strong&gt; when you receive tool.call. Accumulate results and send them inside the reply.done handler. Sending too early causes timing issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discard pending tool results on interruption.&lt;/strong&gt; If the user speaks while the agent is generating a transition phrase, you'll get reply.done with status: "interrupted"—clear the buffer and wait for the next turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice names are case-sensitive.&lt;/strong&gt; Use lowercase: ivy, james, mia, winter, bella. An invalid voice returns session.error.&lt;/li&gt;
&lt;/ul&gt;
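&lt;p&gt;The buffering rule in the first two bullets can be isolated into a small helper. This is a minimal sketch, not part of the API: the &lt;code&gt;ToolResultBuffer&lt;/code&gt; class and its method names are hypothetical, and only the event names and fields (tool.result, call_id, reply.done, status) come from the script above.&lt;/p&gt;

```python
import json


class ToolResultBuffer:
    """Holds tool results until the turn's reply.done event arrives (hypothetical helper)."""

    def __init__(self):
        self._pending = []

    def add(self, call_id, result):
        # Call this from the tool.call handler once the tool has run.
        self._pending.append({"call_id": call_id, "result": result})

    def flush(self, reply_done_event):
        """Return the tool.result messages to send for this reply.done event."""
        pending, self._pending = self._pending, []
        if reply_done_event.get("status") == "interrupted":
            # The user spoke over the agent: drop stale results, wait for the next turn.
            return []
        return [
            json.dumps({
                "type": "tool.result",
                "call_id": item["call_id"],
                "result": json.dumps(item["result"]),
            })
            for item in pending
        ]


buffer = ToolResultBuffer()
buffer.add("call_1", {"status": "shipped"})
messages = buffer.flush({"type": "reply.done"})
print(messages)
```

&lt;p&gt;Keeping the flush logic in one place makes the interruption case easy to unit test without opening a WebSocket.&lt;/p&gt;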

&lt;h2&gt;
  
  
  Step 4: Test the three workflows
&lt;/h2&gt;

&lt;p&gt;Run the script and walk through each support scenario. You should hear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow 1—Order lookup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "Hi, I'm trying to check on order O-R-D 1-2-3-4-5"
Agent: "Sure, let me check on that... I see order ORD-12345. It shipped and is
        on its way — you should have it by Friday, May 9th. The tracking number
        is 1Z999AA10123456784."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Workflow 2—Email-based account lookup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "I forgot my order ID. Can you look me up by email?"
Agent: "Of course. What's the email on the account?"
You: "It's jane at example dot com."
Agent: "One moment... Got it, you're Jane Doe. I see two recent orders:
        ORD-12345 from May 1st for $84.99, and ORD-12100 from April 22nd
        for $42.00. Which one are you asking about?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Workflow 3—Human transfer:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;You:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I just want to talk to a person."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I understand. Let me get you over to a teammate now."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;tool.call:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;transfer_to_human(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user requested human"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Speak the order ID with hesitation, mumbles, accents, and natural disfluencies—that's where Universal-3 Pro Streaming earns its keep. The agent should still extract the ID correctly because it's tuned for the alphanumeric tokens that voice agents act on.&lt;/p&gt;
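&lt;p&gt;Even with accurate transcription, it can help to validate the ID inside your tool before it reaches the backend. The sketch below is an assumption-heavy example: &lt;code&gt;normalize_order_id&lt;/code&gt; is a hypothetical helper, and the "ORD plus five digits" format is inferred only from the sample IDs in this tutorial.&lt;/p&gt;

```python
import re
from typing import Optional

# Assumed ID format: "ORD" followed by exactly five digits (inferred from the examples).
ORDER_ID_PATTERN = re.compile(r"^ORD(\d{5})$")


def normalize_order_id(spoken: str) -> Optional[str]:
    """Turn a transcribed ID such as 'O-R-D 1-2-3-4-5' into 'ORD-12345', or None."""
    # Drop spaces, hyphens, and punctuation, then compare in upper case.
    compact = re.sub(r"[^A-Za-z0-9]", "", spoken).upper()
    match = ORDER_ID_PATTERN.match(compact)
    if match is None:
        return None  # Let the agent ask the customer to repeat the ID.
    return f"ORD-{match.group(1)}"


print(normalize_order_id("O-R-D 1-2-3-4-5"))
print(normalize_order_id("or 12 three 45"))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for anything that doesn't match lets the tool report a clear "ID not recognized" result instead of a silent empty lookup.&lt;/p&gt;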

&lt;h2&gt;
  
  
  Step 5: Take it to the phone
&lt;/h2&gt;

&lt;p&gt;This works on your own machine through your microphone, but real customer support runs on phones. Twilio Media Streams is the standard bridge: your server accepts the inbound call from Twilio and opens a parallel connection to the Voice Agent API, forwarding audio in both directions.&lt;/p&gt;

&lt;p&gt;The Voice Agent API supports audio/pcmu (G.711 u-law at 8 kHz) natively, which matches Twilio's codec exactly. No transcoding, no resampling. The Twilio integration guide walks through the full bridge in about 100 lines of TypeScript.&lt;/p&gt;
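&lt;p&gt;The integration guide uses TypeScript, but the forwarding step itself is small enough to sketch in Python. The two functions below are hypothetical helpers that only translate message envelopes. They assume the session has already been configured for audio/pcmu in both directions (that configuration is not shown here) and that the base64 payloads can be passed through unchanged. The Twilio side uses the Media Streams &lt;code&gt;media&lt;/code&gt; message shape, and the agent side uses the input.audio and reply.audio events from the script above.&lt;/p&gt;

```python
import json
from typing import Optional


def twilio_to_agent(raw: str) -> Optional[str]:
    """Map a Twilio Media Streams message to an input.audio event, or None."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # connected / start / stop / mark messages carry no audio
    # Twilio's payload is already base64-encoded mu-law audio.
    return json.dumps({"type": "input.audio", "audio": msg["media"]["payload"]})


def agent_to_twilio(raw: str, stream_sid: str) -> Optional[str]:
    """Map a reply.audio event to a Twilio media message, or None."""
    event = json.loads(raw)
    if event.get("type") != "reply.audio":
        return None
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": event["data"]},
    })


inbound = json.dumps({"event": "media", "media": {"payload": "AAEC"}})
print(twilio_to_agent(inbound))
print(agent_to_twilio(json.dumps({"type": "reply.audio", "data": "AAEC"}), "MZ123"))
```

&lt;p&gt;The &lt;code&gt;streamSid&lt;/code&gt; value arrives in Twilio's &lt;code&gt;start&lt;/code&gt; message, so a real bridge would capture it there before sending any audio back.&lt;/p&gt;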

&lt;h2&gt;
  
  
  What to harden before production
&lt;/h2&gt;

&lt;p&gt;Three things you'll want to nail down before pointing this at real customers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replace the in-memory mocks&lt;/strong&gt; with calls to your actual CRM or order management system. Add timeouts and error handling so a slow backend doesn't kill the conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything.&lt;/strong&gt; Save user transcripts, tool calls, results, and the agent's responses tied to a session ID. Conversation logs are your debugging tool when something goes wrong on call #4,712. Conversation intelligence features like speaker diarization can help you analyze these logs at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune turn detection for your acoustic environment.&lt;/strong&gt; The defaults work for most use cases. For phone audio with background noise, you may want to raise min_end_of_turn_silence_ms slightly so the agent doesn't cut off thoughtful pauses.&lt;/li&gt;
&lt;/ul&gt;
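&lt;p&gt;For the first item, one way to keep a slow backend from stalling the call is to wrap each tool in a timeout and return a structured error the agent can talk around. This is a sketch under assumptions: &lt;code&gt;run_tool_safely&lt;/code&gt; and &lt;code&gt;demo_lookup&lt;/code&gt; are hypothetical names, the tools are assumed to be async functions (the script above calls a synchronous &lt;code&gt;run_tool&lt;/code&gt;), and the error payload shape is made up for illustration.&lt;/p&gt;

```python
import asyncio


async def run_tool_safely(tool, args, timeout_s=3.0):
    """Run an async tool with a deadline; always return a JSON-serializable dict."""
    try:
        return await asyncio.wait_for(tool(**args), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The backend was too slow: give the agent something it can say out loud.
        return {"error": "timeout", "message": "The lookup is taking too long."}
    except Exception as exc:
        # Any other backend failure becomes a result instead of crashing the session.
        return {"error": "backend_error", "message": str(exc)}


async def demo_lookup(order_id, delay_s=0.0):
    # Stand-in for a real CRM or order-management call.
    await asyncio.sleep(delay_s)
    return {"order_id": order_id, "status": "shipped"}


slow = asyncio.run(
    run_tool_safely(demo_lookup, {"order_id": "ORD-12345", "delay_s": 1.0}, timeout_s=0.05)
)
fast = asyncio.run(run_tool_safely(demo_lookup, {"order_id": "ORD-12345"}))
print(slow)
print(fast)
```

&lt;p&gt;Because the wrapper never raises, the reply.done handler can send every buffered result the same way, whether the tool succeeded or not.&lt;/p&gt;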

&lt;h2&gt;
  
  
  Where to go from here
&lt;/h2&gt;

&lt;p&gt;Once the basic order-lookup loop works, the same tool-calling pattern extends to every other support workflow you have: cancel an order, update a shipping address, request a refund, schedule a callback, fetch FAQ answers from a knowledge base. Add the function, register its definition in session.tools, describe when to use it in the system prompt, and the agent picks it up with no new infrastructure.&lt;/p&gt;

&lt;p&gt;The compounding win: every conversation goes through the same Voice Agent API connection, the same transcription model, the same billing relationship. You're not assembling a new vendor stack; you're adding tools to an agent that already works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I build an AI voice agent for customer support that can look up orders?
&lt;/h3&gt;

&lt;p&gt;Build it on AssemblyAI's Voice Agent API, register a get_order_status function as a tool with JSON Schema, and connect to the WebSocket at wss://agents.assemblyai.com/v1/ws. The agent transcribes the customer's speech, decides when to call your function, executes it through your backend, and speaks the result back—all on a single connection. Most developers ship a working agent in an afternoon because there's no SDK to learn and no separate STT, LLM, or TTS providers to wire together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does speech-to-text accuracy matter so much for support voice agents?
&lt;/h3&gt;

&lt;p&gt;Support agents constantly need to capture alphanumeric tokens (order IDs, account numbers, email addresses, phone numbers), and a single transcription error breaks the workflow. If the STT layer mishears "ORD-12345" as "or 12 three 45," your get_order_status function gets a garbled ID and returns nothing. AssemblyAI's Voice Agent API is built on Universal-3 Pro Streaming, which reports a 16.7% mixed-entity error rate versus 23–25% for competing models. That is the difference between tool calls that succeed and tool calls that silently fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does tool calling work with the AssemblyAI Voice Agent API?
&lt;/h3&gt;

&lt;p&gt;You register tools by passing an array of function definitions in session.tools on a session.update event. When the agent decides to call a tool, it emits a tool.call event with the function name and arguments. You execute the function and accumulate results, then send tool.result events inside your reply.done handler—not immediately on tool.call. While the tool runs, the agent speaks a brief transition phrase like "let me check that for you" so the conversation never goes silent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I connect AssemblyAI's Voice Agent API to phone calls with Twilio?
&lt;/h3&gt;

&lt;p&gt;Yes—the Voice Agent API supports audio/pcmu (G.711 u-law at 8 kHz) natively, which matches Twilio's codec exactly with no transcoding needed. You set up a server that accepts the inbound Twilio Media Streams call, opens a parallel WebSocket to the Voice Agent API, and forwards audio in both directions. The official Twilio integration guide walks through inbound and outbound calling in about 100 lines of TypeScript.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the best way to handle escalation to a human in a customer support voice agent?
&lt;/h3&gt;

&lt;p&gt;Register a transfer_to_human tool with parameters for reason and summary, and instruct the agent in the system prompt to call it when the customer asks for a person, sounds frustrated, or has a complex issue (refund disputes, billing errors, damaged products). The agent generates a short summary of the conversation that you forward to your human queue, so the receiving agent doesn't have to ask the customer to repeat themselves. This is one of the most important workflows to design well—a poor handoff feels worse than no AI at all.&lt;/p&gt;
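&lt;p&gt;A possible shape for that tool is sketched below. The &lt;code&gt;reason&lt;/code&gt; and &lt;code&gt;summary&lt;/code&gt; parameters come from the description above; the outer definition layout (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;parameters&lt;/code&gt;) is an assumption modeled on common JSON Schema function definitions, so check it against the API reference. The in-memory &lt;code&gt;queue&lt;/code&gt; list is a placeholder for your real handoff system.&lt;/p&gt;

```python
import json

# Assumed definition layout; only the parameter names come from the text above.
TRANSFER_TO_HUMAN = {
    "type": "function",
    "name": "transfer_to_human",
    "description": "Hand the call to a human agent when the customer asks for one, "
                   "sounds frustrated, or has a complex issue.",
    "parameters": {
        "type": "object",
        "properties": {
            "reason": {"type": "string", "description": "Why the call is being escalated."},
            "summary": {"type": "string", "description": "Short recap of the conversation so far."},
        },
        "required": ["reason", "summary"],
    },
}


def transfer_to_human(reason, summary, queue):
    """Placeholder handler: push the handoff context onto a human-agent queue."""
    queue.append({"reason": reason, "summary": summary})
    return {"transferred": True, "queue_position": len(queue)}


human_queue = []
result = transfer_to_human("user requested human", "Customer asked about a refund.", human_queue)
print(json.dumps(result))
```

&lt;p&gt;Marking both parameters as required nudges the model to always produce a summary, which is the part the receiving agent relies on.&lt;/p&gt;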

&lt;h3&gt;
  
  
  How much does it cost to run a customer support voice agent on AssemblyAI?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API is a flat $4.50/hr, covering speech understanding, LLM reasoning, voice generation, turn detection, and tool calling in one bill. There are no per-token surcharges, no concurrency caps, and no separate invoices for STT, LLM, and TTS providers. Usage is billed by the minute based on actual conversation duration, and a free tier is available for testing.&lt;/p&gt;
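&lt;p&gt;To turn the hourly rate into a budget figure, a rough estimate looks like the sketch below. The call volume and average duration are made-up inputs, and the function ignores the free tier and any per-call rounding rules, which aren't specified here.&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("4.50")   # flat rate quoted above, in USD
PER_MINUTE = HOURLY_RATE / 60   # works out to 0.075 USD per minute


def estimate_monthly_cost(calls_per_month, avg_minutes_per_call):
    """Rough monthly cost in USD; ignores the free tier and rounding rules."""
    total_minutes = Decimal(calls_per_month) * Decimal(str(avg_minutes_per_call))
    return (total_minutes * PER_MINUTE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical workload: 1,000 calls a month averaging 4 minutes each.
print(estimate_monthly_cost(1000, 4))
```

&lt;p&gt;At that volume the estimate is 4,000 minutes at $0.075 per minute, or $300 per month.&lt;/p&gt;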

&lt;h3&gt;
  
  
  Do voice agents built with AssemblyAI work with healthcare workflows?
&lt;/h3&gt;

&lt;p&gt;AssemblyAI offers a BAA for HIPAA workloads and is SOC 2 Type 2, ISO 27001:2022, and PCI DSS v4.0 certified. For clinical use cases (medical front-office voice agents, healthcare contact centers), enable Medical Mode with domain="medical-v1" to improve transcription accuracy on medication names, procedures, conditions, and dosages. Do not point the agent at real PHI without a signed BAA in place.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building a voice-powered e-commerce shopping assistant</title>
      <dc:creator>Mart Schweiger</dc:creator>
      <pubDate>Tue, 12 May 2026 17:57:59 +0000</pubDate>
      <link>https://dev.to/martschweiger/building-a-voice-powered-e-commerce-shopping-assistant-30ok</link>
      <guid>https://dev.to/martschweiger/building-a-voice-powered-e-commerce-shopping-assistant-30ok</guid>
      <description>&lt;p&gt;Voice shopping has crossed an inflection point. Search by typing is being replaced by search by saying—"show me waterproof hiking boots under $150 in size 10," "add the second one to my cart," "what's the return policy on these." For e-commerce teams, that's both an opportunity and a problem: the existing product search and checkout flow was designed for clicks and keystrokes, not natural language.&lt;/p&gt;

&lt;p&gt;This tutorial walks through building a voice-powered shopping assistant that customers can actually talk to. By the end, you'll have a Python voice agent that handles four real shopping workflows—product search, add-to-cart, order tracking, and checkout assistance—all on top of a single WebSocket connection using AssemblyAI's Voice Agent API.&lt;/p&gt;

&lt;p&gt;The same pattern works whether you're embedding voice into a mobile shopping app, an in-store kiosk, a smart speaker integration, or a phone-based ordering line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why voice e-commerce is different from voice support
&lt;/h2&gt;

&lt;p&gt;If you've built a customer support Voice AI agent, the shopping use case looks similar—but the constraints are sharper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entity accuracy is everything.&lt;/strong&gt; Sizes ("size ten and a half"), SKUs ("SKU 9-9-2-1-A"), prices ("under one fifty"), quantities ("get me three of those"). Mishear any of those and you've added the wrong item, the wrong size, or the wrong quantity to a cart someone is about to check out with.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversations are exploratory.&lt;/strong&gt; Support calls have a clear job-to-be-done; shopping conversations meander. The customer browses, narrows, compares, asks about returns, gets distracted, comes back. The agent has to track all of that without losing context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stakes shift mid-conversation.&lt;/strong&gt; "Tell me about this jacket" is low-stakes. "Charge my saved card for $284.50" is not. The agent needs to know when to ask for confirmation and when to just answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accents and code-switching show up.&lt;/strong&gt; Shoppers around the world pronounce brand names, colors, and product categories differently, and the agent needs to handle that gracefully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Voice Agent API addresses these directly. It is built on Universal-3 Pro Streaming for high entity accuracy (16.7% mixed-entity error rate vs. 23–25% for competitors), and it supports mid-conversation system prompt updates, so you can tighten or loosen the agent's behavior as the customer moves from browsing to buying.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Python voice shopping assistant that handles four workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Product search&lt;/strong&gt;: "show me wireless headphones under $200" → searches your catalog → reads back top results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart management&lt;/strong&gt;: "add the second one in black" → adds to cart → confirms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order tracking&lt;/strong&gt;: "where's my order from last week?" → looks up customer orders → reads back status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkout assistance&lt;/strong&gt;: guides the user through review and confirmation, and never executes payment without an explicit verbal "yes"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AssemblyAI Voice Agent API (one WebSocket: STT + LLM + TTS)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;A product catalog and order DB—we'll mock both; replace with your real Shopify, Commerce Cloud, or BigCommerce backend&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;websockets pyaudio python-dotenv

&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Endpoint: wss://agents.assemblyai.com/v1/ws. One key, one connection—the same key works for all AssemblyAI products.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Define the shopping tools
&lt;/h2&gt;

&lt;p&gt;The toolset shapes what your agent can do. Start with the tools that cover the four workflows above and grow from there. The Voice Agent API supports tool calling natively, so each tool is defined as a JSON function schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the product catalog. Use whenever the customer is browsing or asking what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s available.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Free-text query, e.g. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;waterproof hiking boots&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_product_details&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get full details on a specific product, including return policy and stock.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add_to_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add a product to the customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s cart. Confirm size, color, and quantity before calling.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Specific size/color variant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;view_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read back the customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current cart with subtotal.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remove_from_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Remove an item from the cart by line item ID.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Submit the order using the customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s saved payment and shipping. ONLY call after explicit verbal &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to a clear confirmation prompt.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The exact phrase the customer said to confirm, e.g. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes, place the order&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;track_order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up the status of a customer order.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
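&lt;p&gt;Because each tool is a plain JSON function schema, your own code can reuse it to sanity-check a tool call before dispatching it. The sketch below is an illustration, not part of the Voice Agent API: it assumes your handler receives a tool definition and a dict of arguments, and the helper names missing_required and apply_defaults are hypothetical.&lt;/p&gt;

```python
# A trimmed copy of one TOOLS entry, repeated here so the sketch runs on its own.
ADD_TO_CART = {
    "type": "function",
    "name": "add_to_cart",
    "parameters": {
        "type": "object",
        "properties": {
            "product_id": {"type": "string"},
            "variant_id": {"type": "string"},
            "quantity": {"type": "integer", "default": 1},
        },
        "required": ["product_id", "variant_id"],
    },
}


def missing_required(tool, args):
    """List the required parameter names that are absent from args."""
    required = tool["parameters"].get("required", [])
    return [name for name in required if name not in args]


def apply_defaults(tool, args):
    """Return a copy of args with schema defaults filled in for omitted parameters."""
    filled = dict(args)
    for name, spec in tool["parameters"]["properties"].items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    return filled


# Hypothetical tool-call arguments, as the agent might send them.
call_args = {"product_id": "SKU-2201"}
print(missing_required(ADD_TO_CART, call_args))
print(apply_defaults(ADD_TO_CART, call_args))
```

&lt;p&gt;If missing_required returns anything, one option is to send an error result back to the agent so it asks the customer for the missing detail instead of guessing a variant.&lt;/p&gt;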



&lt;p&gt;The confirmation_phrase parameter on checkout is what prevents accidental orders. The system prompt tells the agent to call checkout only after the customer explicitly says yes, and the required parameter forces the agent to record what was actually said. Because a prompt instruction alone is not a guarantee, your backend can additionally enforce an allowlist so that only accepted phrases ("yes", "place the order", "go ahead") trigger the actual payment.&lt;/p&gt;
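&lt;p&gt;A minimal sketch of that backend check, assuming the checkout tool call reaches your code as a dict of arguments. The allowlist contents and the helper names (normalize_phrase, is_confirmed, guarded_checkout) are hypothetical, and place_order stands in for whatever function actually charges the card.&lt;/p&gt;

```python
import re

# Hypothetical allowlist; tune it to the confirmation prompt your agent uses.
ACCEPTED_CONFIRMATIONS = {
    "yes",
    "yes place the order",
    "place the order",
    "go ahead",
}


def normalize_phrase(phrase):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", phrase.lower())
    return " ".join(cleaned.split())


def is_confirmed(confirmation_phrase):
    """Return True only when the normalized phrase is on the allowlist."""
    return normalize_phrase(confirmation_phrase) in ACCEPTED_CONFIRMATIONS


def guarded_checkout(args, place_order):
    """Call place_order() only if the recorded phrase passes the allowlist."""
    phrase = args.get("confirmation_phrase") or ""
    if not is_confirmed(phrase):
        # Hand the phrase back so the agent can ask for a clear confirmation.
        return {"status": "needs_confirmation", "heard": phrase}
    return place_order()
```

&lt;p&gt;Exact matching after normalization is deliberately strict: a hedged answer such as "yes, I guess, maybe later" is rejected and the agent has to ask again, which is the safer failure mode when money is involved.&lt;/p&gt;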

&lt;h2&gt;
  
  
  Step 2: Implement the backend (mocked)
&lt;/h2&gt;

&lt;p&gt;Replace these stubs with calls to your real catalog and order systems.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;CATALOG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-2201&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trail Runner 3 hiking boots&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;139.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;footwear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;waterproof&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hiking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-2201-BK-10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-2201-BK-11&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-2201-BR-10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return_policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30-day free returns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-3104&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summit Pro waterproof boots&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;199.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;footwear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;waterproof&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hiking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SKU-3104-BK-10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return_policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30-day free returns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;CART&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# In production, scope this per session
&lt;/span&gt;&lt;span class="n"&gt;ORDERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-9981&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-09&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tracking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Z999AA10123456784&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CATALOG&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_product_details&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CATALOG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add_to_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CATALOG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;out_of_stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                    &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LI-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;color&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;added&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;variant_not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;view_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;subtotal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subtotal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remove_from_cart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Slice-assign in place: a `global CART` here would come after earlier uses of CART in this function, which is a SyntaxError&lt;/span&gt;
        &lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CART&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;removed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_item_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cart_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;accepted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;place the order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;place it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go ahead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buy it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;accepted&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation_unclear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phrase_received&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmation_phrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-9982&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CART&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;track_order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ORDERS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown_tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
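
&lt;p&gt;The comment on &lt;code&gt;CART&lt;/code&gt; above says to scope the cart per session in production. One way that could look is sketched below. The &lt;code&gt;SessionCart&lt;/code&gt; class, the &lt;code&gt;cart_for&lt;/code&gt; helper, and the session ID strings are hypothetical names for illustration, not part of any SDK, and where the session ID comes from depends on your transport. The sketch also swaps the length-based line item IDs for a counter, since &lt;code&gt;len(CART) + 1&lt;/code&gt; can repeat an ID once an item has been removed.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SessionCart:
    """Cart state for one caller (hypothetical helper, not an SDK type)."""

    items: list = field(default_factory=list)
    # Monotonic counter: an ID is never reused, even after removals.
    _ids: count = field(default_factory=lambda: count(1))

    def add(self, line: dict) -> dict:
        # Copy the line and stamp it with the next unused line item ID.
        stamped = {**line, "line_item_id": f"LI-{next(self._ids)}"}
        self.items.append(stamped)
        return stamped

    def remove(self, line_item_id: str) -> bool:
        # Returns True only if something was actually removed.
        before = len(self.items)
        self.items = [i for i in self.items if i["line_item_id"] != line_item_id]
        return len(self.items) != before

    def subtotal(self) -> float:
        return round(sum(i["price"] * i["quantity"] for i in self.items), 2)


# One cart per session ID (assumed to be supplied by your connection layer).
CARTS: dict[str, SessionCart] = {}


def cart_for(session_id: str) -> SessionCart:
    # Create the cart lazily the first time a session is seen.
    if session_id not in CARTS:
        CARTS[session_id] = SessionCart()
    return CARTS[session_id]
```

&lt;p&gt;With this shape, each tool handler would look up &lt;code&gt;cart_for(session_id)&lt;/code&gt; instead of touching a module-level list, so two concurrent callers never see each other's items.&lt;/p&gt;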



&lt;h2&gt;
  
  
  Step 3: Write a shopping-aware system prompt
&lt;/h2&gt;

&lt;p&gt;A shopping system prompt should encode three patterns: how to describe products over the phone (brief and easy to follow by ear), how to gather variant details (size, color, quantity) before adding to the cart, and how to confirm checkout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are Riley, a friendly voice shopping assistant for Trailgear, an outdoor retailer.

Behavior rules:

PRODUCT SEARCH
- When customers ask about products, call search_products with a clean query.
- Read back top 2-3 results conversationally. Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t list more than 3 unless asked.
- Format prices naturally: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one hundred thirty-nine dollars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; not &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;139.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.
- Mention only the most relevant detail per product (price + key feature). Save the rest for follow-ups.

VARIANT SELECTION
- Before adding to cart, confirm size, color, and quantity. Never assume.
- If a variant is out of stock, say so immediately and offer the closest alternative.
- Read sizes naturally: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size ten&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; not &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size 10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.

CART MANAGEMENT
- After adding, briefly confirm what was added and the new cart size.
- If the customer asks &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s in my cart,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; call view_cart and read it back with subtotal.

CHECKOUT
- Before calling checkout, summarize the cart and explicitly ask: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Should I place the order?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
- ONLY call checkout if the customer responds with a clear yes (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;place it,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go ahead,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).
- If the response is ambiguous, ask again. Do not interpret &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sure I think so&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; as confirmation.
- After checkout succeeds, read back the order ID slowly so the customer can write it down.

ORDER TRACKING
- For order status questions, ask for the order ID.
- When reading a tracking number, slow down and group digits in pairs.

GENERAL
- Keep replies short and conversational. This is voice, not chat.
- When you call a tool, say a brief transition like &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Let me look that up.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
- Never invent products, prices, or stock — if the catalog doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t have it, say so.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "explicit yes" pattern is what makes this safe to point at production payments. The agent's prompt forbids it from calling checkout on ambiguous responses, and the backend independently validates the confirmation phrase. Belt and suspenders.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Wire the WebSocket
&lt;/h2&gt;

&lt;p&gt;This is essentially the same WebSocket loop as a support agent—the difference is in the tools and prompt, not the protocol. If you've already built a voice agent with function calling, this will look familiar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WS_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wss://agents.assemblyai.com/v1/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_assistant&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WS_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;additional_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;greeting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi, this is Riley from Trailgear. What can I help you find today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="n"&gt;pa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PyAudio&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;mic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frames_per_buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;speaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pyaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;pending_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_audio&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception_on_overflow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="p"&gt;}))&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.ready&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Riley is ready.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Riley: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply.audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [tool] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="si"&gt;{}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}))&lt;/span&gt;
                    &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply.done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interrupted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                            &lt;span class="p"&gt;}))&lt;/span&gt;
                        &lt;span class="n"&gt;pending_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;send_audio&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_assistant&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
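
&lt;p&gt;The loop above buffers tool results and sends them only when reply.done arrives, discarding them if the reply was interrupted. One way to make that rule easy to unit-test without a live socket is to pull it into a pure function. This is a sketch: the function name is hypothetical, and the event and field names are copied from the loop above.&lt;/p&gt;

```python
import json


def flush_pending_tools(pending_tools, reply_status):
    """Return the tool.result messages to send for a reply.done event.

    Mirrors the loop above: an interrupted reply discards buffered results,
    any other status sends one tool.result per buffered call.
    The buffer is cleared in both cases.
    """
    messages = []
    if reply_status != "interrupted":
        for tool in pending_tools:
            messages.append(json.dumps({
                "type": "tool.result",
                "call_id": tool["call_id"],
                "result": json.dumps(tool["result"]),
            }))
    pending_tools.clear()
    return messages
```

&lt;p&gt;Inside handle_messages, the reply.done branch would then reduce to sending each message returned by flush_pending_tools(pending_tools, event.get("status")).&lt;/p&gt;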



&lt;h2&gt;
  
  
  Step 5: Test the four workflows
&lt;/h2&gt;

&lt;p&gt;Run the script and walk through each shopping flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "I'm looking for waterproof hiking boots under one fifty."
Riley: "Let me check our catalog... I've got the Trail Runner 3 at $139,
        which is waterproof and great for day hikes. There's also the
        Summit Pro with extra ankle support, but at $199 it's over your
        budget. Want to hear more about either?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Add to cart:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "The Trail Runner threes — size ten, in black."
Riley: "Got it, the Trail Runner 3 in black, size ten. How many?"
You: "Just one."
Riley: "One moment... Added. That's one item in your cart."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Checkout:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "Okay let's check out."
Riley: "Sure. You have one Trail Runner 3 in black, size ten, for $139.
        Should I place the order?"
You: "Yes, place it."
Riley: "Order placed. Your order ID is O-R-D 9-9-8-2."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Track order:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "Where's my order from last week — ORD 9-9-8-1?"
Riley: "Let me check... ORD-9981 has shipped and should arrive Friday,
        May 9th. Tracking is 1Z 99 9A A1 01 23 45 67 84."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tracking number readback is intentional—grouping digits in pairs is a common voice pattern that makes long alphanumerics easier to write down.&lt;/p&gt;
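
&lt;p&gt;If you would rather not rely on the prompt alone for this, one option is to pre-group the tracking number in the tool result before the agent reads it. A small sketch, with a hypothetical helper name:&lt;/p&gt;

```python
def group_in_pairs(code):
    """Split an alphanumeric code into space-separated pairs for readback."""
    # Drop spaces and hyphens so differently formatted inputs group the same way.
    compact = "".join(ch for ch in code if ch.isalnum())
    pairs = [compact[i:i + 2] for i in range(0, len(compact), 2)]
    return " ".join(pairs)


print(group_in_pairs("1Z999AA10123456784"))
# prints: 1Z 99 9A A1 01 23 45 67 84
```

&lt;p&gt;An odd-length code leaves a single trailing character as its own group.&lt;/p&gt;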

&lt;h2&gt;
  
  
  Where this gets harder in production
&lt;/h2&gt;

&lt;p&gt;Two patterns to plan for once the basic loop works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personalization.&lt;/strong&gt; Authenticated shoppers expect the agent to know their saved address, recent purchases, and size preferences. Add a get_customer_profile() tool gated on session auth, then feed its result into the system prompt with a mid-conversation session.update so the agent can personalize without re-asking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn refinement.&lt;/strong&gt; "Show me hiking boots" → "in waterproof" → "size ten only" → "under $150." Each refinement should narrow the same result set rather than triggering a fresh search. Pass a session_filters object as a tool parameter and have the agent accumulate filters across turns.&lt;/li&gt;
&lt;/ul&gt;
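
&lt;p&gt;For the multi-turn refinement case, the accumulation step can be sketched as a small merge function. The filter keys below (category, waterproof, size, max_price) are hypothetical. Use whatever fields your catalog search accepts.&lt;/p&gt;

```python
def merge_filters(session_filters, new_filters):
    """Return a new filter dict with the latest refinement layered on top."""
    merged = dict(session_filters)
    for key, value in new_filters.items():
        if value is None:
            # An explicit None clears a filter, e.g. "any color is fine".
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged


# Each turn narrows the same result set instead of starting a fresh search.
filters = {}
for refinement in [
    {"category": "hiking boots"},
    {"waterproof": True},
    {"size": "10"},
    {"max_price": 150},
]:
    filters = merge_filters(filters, refinement)

print(filters)
# prints: {'category': 'hiking boots', 'waterproof': True, 'size': '10', 'max_price': 150}
```

&lt;p&gt;Returning a new dict rather than mutating the old one keeps each turn's filter state easy to log and replay.&lt;/p&gt;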

&lt;h2&gt;
  
  
  Where to take it from here
&lt;/h2&gt;

&lt;p&gt;The same architecture extends to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In-store kiosks&lt;/strong&gt; for hands-free product search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phone-based ordering lines&lt;/strong&gt; for restaurants, takeout, and reorders (&lt;a href="https://www.assemblyai.com/docs/voice-agents/voice-agent-api/connect-to-twilio" rel="noopener noreferrer"&gt;Twilio Media Streams + the Voice Agent API Twilio integration&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile shopping apps&lt;/strong&gt; with a press-and-hold voice button&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart speaker integrations&lt;/strong&gt; that hand off to your agent when the user wants to shop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What stays the same across all of those: one WebSocket, one system prompt, one tool registry. The voice agent is the same regardless of the front-end channel.&lt;/p&gt;

&lt;p&gt;Voice shopping isn't replacing search bars or product pages—it's running alongside them, picking up the conversational moments those interfaces can't handle. Build for the conversational moments and the rest of your funnel benefits from it. For teams building AI-powered customer service workflows, the same voice agent architecture handles both pre-sale shopping and post-sale support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I build a voice-powered shopping assistant for e-commerce?
&lt;/h3&gt;

&lt;p&gt;Build it on AssemblyAI's Voice Agent API and register the four core shopping verbs as tools: search_products, add_to_cart, view_cart, and checkout. The agent transcribes the customer's speech, calls your catalog and cart functions, and speaks the result back—all on a single WebSocket. Most developers have a working voice shopping assistant running the same day, with no SDK to install and no separate STT, LLM, or TTS providers to manage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can a voice shopping assistant handle product variants like size, color, and quantity?
&lt;/h3&gt;

&lt;p&gt;Yes—define the variant fields as parameters on your add_to_cart tool (e.g., variant_id, quantity) and instruct the agent in the system prompt to confirm size, color, and quantity before calling the function. The Voice Agent API is built on Universal-3 Pro Streaming, which has industry-leading accuracy on alphanumeric tokens like sizes, SKUs, and quantities—that's what makes "size ten and a half" reliably parse as 10.5 instead of 10 or 1010.&lt;/p&gt;
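
&lt;p&gt;Even with accurate transcription, it is worth validating the arguments on your side before touching the cart. A hedged sketch, assuming add_to_cart takes a string variant_id and an integer quantity. The quantity cap and the function name are illustrative.&lt;/p&gt;

```python
MAX_QUANTITY = 20  # Assumption: an arbitrary per-line cap for illustration.


def validate_add_to_cart(args):
    """Return a list of problems with add_to_cart arguments (empty if valid)."""
    problems = []
    variant_id = args.get("variant_id")
    if not isinstance(variant_id, str) or not variant_id.strip():
        problems.append("variant_id is required")
    quantity = args.get("quantity")
    # bool is a subclass of int, so reject it explicitly.
    is_int = isinstance(quantity, int) and not isinstance(quantity, bool)
    if not is_int or quantity not in range(1, MAX_QUANTITY + 1):
        problems.append(f"quantity must be a whole number from 1 to {MAX_QUANTITY}")
    return problems
```

&lt;p&gt;Returning the problems as data lets the tool hand them back to the agent, which can then re-ask the customer instead of failing silently.&lt;/p&gt;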

&lt;h3&gt;
  
  
  How do I prevent accidental orders in a voice checkout flow?
&lt;/h3&gt;

&lt;p&gt;Use a two-layer pattern: the system prompt instructs the agent to only call the checkout tool after an explicit verbal "yes" to a clear confirmation prompt, and the checkout function itself accepts a confirmation_phrase parameter that your backend independently validates against an accepted list ("yes," "place the order," "go ahead," "confirm"). This belt-and-suspenders design ensures ambiguous responses like "sure I think so" never trigger a real charge.&lt;/p&gt;

&lt;h3&gt;
  
  
  What channels can I deploy a voice shopping assistant on?
&lt;/h3&gt;

&lt;p&gt;The same Voice Agent API connection powers in-app voice (mobile or web with a press-and-hold button), in-store kiosks for hands-free product search, phone-based ordering lines via Twilio Media Streams, and smart speaker integrations that hand off to your agent for shopping. The system prompt and tool registry stay the same across channels—only the front-end audio path changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the AssemblyAI Voice Agent API compare to Vapi or Retell for e-commerce?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API is infrastructure rather than a platform—it gives you a standard JSON WebSocket with full control over conversation design, tool integrations, and agent behavior, so your voice shopping experience can feel uniquely yours instead of like every other agent built on a no-code platform. Vapi and Retell are higher-level platforms that work well for non-technical configuration but constrain custom integrations and agent personality. For e-commerce teams that already have engineering capacity, the Voice Agent API is typically a better fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I personalize a voice shopping assistant for authenticated customers?
&lt;/h3&gt;

&lt;p&gt;Add a get_customer_profile tool that returns the customer's saved address, payment, recent purchases, and size preferences, gated on session auth. The Voice Agent API supports mid-conversation session.update events, so you can update the system prompt with the customer's context after they authenticate without dropping the connection. The agent can then personalize recommendations, default to known sizes, and skip questions like "what's your shipping address?"&lt;/p&gt;
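
&lt;p&gt;A rough sketch of building that update message. The session.update envelope follows the one used in Step 4. The profile field names are hypothetical, and whether the API accepts an update that carries only system_prompt is an assumption to check against the docs.&lt;/p&gt;

```python
import json


def build_profile_update(base_prompt, profile):
    """Build a session.update message that appends customer context to the prompt."""
    lines = ["", "CUSTOMER CONTEXT (authenticated)"]
    # Only include fields that are present so the agent never reads back blanks.
    for label, key in [
        ("Name", "name"),
        ("Preferred shoe size", "shoe_size"),
        ("Saved shipping address", "shipping_address"),
    ]:
        value = profile.get(key)
        if value:
            lines.append(f"- {label}: {value}")
    return json.dumps({
        "type": "session.update",
        "session": {"system_prompt": base_prompt + "\n".join(lines)},
    })
```

&lt;p&gt;After the customer authenticates, you would send the result over the existing connection, for example with await ws.send(build_profile_update(SYSTEM_PROMPT, profile)).&lt;/p&gt;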

&lt;h3&gt;
  
  
  How much does it cost to run a voice shopping assistant on AssemblyAI?
&lt;/h3&gt;

&lt;p&gt;The Voice Agent API is a flat $4.50/hr, covering STT, LLM, voice generation, turn detection, and tool calling on a single bill. There are no per-token surcharges, no concurrency caps, and no separate invoices for the STT, LLM, and TTS layers. Usage is billed by the minute on actual conversation duration. A free tier with $50 in starter credits is available for testing.&lt;/p&gt;
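
&lt;p&gt;As a back-of-the-envelope check, $4.50 per hour works out to $0.075 per minute. The sketch below estimates spend from total conversation minutes. Rounding to the nearest cent is an assumption, not a statement of how partial minutes are actually billed.&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("4.50")  # Flat rate quoted above, in USD per hour.


def estimate_cost(total_minutes):
    """Estimate spend in USD for a number of conversation minutes."""
    cost = HOURLY_RATE * Decimal(total_minutes) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 1,000 calls averaging 3 minutes each is 3,000 minutes, or 50 hours.
print(estimate_cost(1000 * 3))  # prints 225.00
```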

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>ecommerce</category>
    </item>
  </channel>
</rss>
