<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: WanjohiChristopher</title>
    <description>The latest articles on DEV Community by WanjohiChristopher (@wanjohichristopher).</description>
    <link>https://dev.to/wanjohichristopher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F326317%2F0db4c2e4-befa-413b-acc7-1900e8e582a4.JPG</url>
      <title>DEV Community: WanjohiChristopher</title>
      <link>https://dev.to/wanjohichristopher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wanjohichristopher"/>
    <language>en</language>
    <item>
      <title>Voxtral TTS: Is Open-Source Voice AI About to Disrupt ElevenLabs?</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Fri, 29 May 2026 15:29:08 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/voxtral-tts-is-open-source-voice-ai-about-to-disrupt-elevenlabs-1k51</link>
      <guid>https://dev.to/wanjohichristopher/voxtral-tts-is-open-source-voice-ai-about-to-disrupt-elevenlabs-1k51</guid>
      <description>&lt;p&gt;The voice AI landscape has been dominated by a handful of closed providers for years. If you wanted state-of-the-art text-to-speech (TTS), realistic voice cloning, emotional speech generation, and low-latency streaming, you typically had one option: pay for an API.&lt;/p&gt;

&lt;p&gt;That may be changing.&lt;/p&gt;

&lt;p&gt;In March 2026, Mistral AI released &lt;a href="https://huggingface.co/mistralai/Voxtral-4B-TTS-2603" rel="noopener noreferrer"&gt;Voxtral TTS&lt;/a&gt;, a &lt;strong&gt;4-billion-parameter&lt;/strong&gt; open-weights text-to-speech model that challenges the long-standing assumption that frontier voice AI must remain proprietary. In Mistral's human evaluations, native speakers preferred Voxtral over ElevenLabs for multilingual voice cloning in &lt;strong&gt;68.4%&lt;/strong&gt; of side-by-side comparisons, judged on naturalness and expressivity. (Worth noting that scope: the headline number is specifically for multilingual cloning, not a blanket "better at everything" claim.)&lt;/p&gt;

&lt;p&gt;For AI engineers, voice-agent builders, and researchers, this is one of the most important open-weight AI releases of the year.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Voice AI Has Been Different
&lt;/h2&gt;

&lt;p&gt;Unlike large language models, speech synthesis has remained largely controlled by commercial providers. While the AI community gained access to powerful open-weight language models such as Llama, Qwen, DeepSeek, and Mistral, high-quality TTS remained mostly locked behind APIs.&lt;/p&gt;

&lt;p&gt;There were several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speech datasets are expensive to collect.&lt;/li&gt;
&lt;li&gt;Natural prosody is difficult to model.&lt;/li&gt;
&lt;li&gt;Real-time inference requires significant optimization.&lt;/li&gt;
&lt;li&gt;Voice cloning introduces safety and abuse concerns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, companies such as ElevenLabs built strong moats around their speech technology. Voxtral represents one of the first serious attempts to challenge that moat using an open-weight approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Voxtral TTS?
&lt;/h2&gt;

&lt;p&gt;Voxtral TTS is an open-weights text-to-speech model released by Mistral AI. Key capabilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4 billion&lt;/strong&gt; parameters&lt;/li&gt;
&lt;li&gt;Streaming generation&lt;/li&gt;
&lt;li&gt;Approximately &lt;strong&gt;70 ms&lt;/strong&gt; time-to-first-audio under optimized H200 inference conditions&lt;/li&gt;
&lt;li&gt;Voice cloning from a &lt;strong&gt;3-second&lt;/strong&gt; reference clip&lt;/li&gt;
&lt;li&gt;Reference-based emotion transfer&lt;/li&gt;
&lt;li&gt;Natural pauses and conversational speech patterns&lt;/li&gt;
&lt;li&gt;Cross-lingual voice transfer&lt;/li&gt;
&lt;li&gt;Support for &lt;strong&gt;9 languages&lt;/strong&gt; - Arabic, Dutch, English, French, German, Hindi, Italian, Portuguese, and Spanish&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the most impressive capabilities is cross-lingual voice transfer. For example, a French speaker's voice can be used to generate natural English speech without retraining the model. This has significant implications for multilingual assistants, customer support systems, and global AI products.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the 70 ms Latency Matters
&lt;/h2&gt;

&lt;p&gt;Many people focus on voice quality. Engineers focus on latency. A voice assistant may sound amazing, but if it takes &lt;strong&gt;500 milliseconds&lt;/strong&gt; to begin speaking, users perceive it as slow.&lt;/p&gt;

&lt;p&gt;Human conversations operate on extremely short turn-taking cycles. Research consistently shows that delays above a few hundred milliseconds make conversations feel unnatural. Mistral reports a time-to-first-audio of approximately &lt;strong&gt;70 milliseconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For comparison:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human conversational response gaps are often around &lt;strong&gt;200 milliseconds&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Many cloud TTS APIs require significantly longer startup times.&lt;/li&gt;
&lt;li&gt;Real-time AI agents depend heavily on reducing latency at every stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is particularly relevant for systems such as customer service agents, AI receptionists, real-time translators, interactive tutoring systems, and autonomous voice assistants. Low-latency speech generation is becoming as important as model intelligence itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Voxtral Works
&lt;/h2&gt;

&lt;p&gt;Voxtral uses a hybrid architecture that splits speech generation into two stages, tied together by a custom neural codec. A common misconception is that it replaces autoregressive generation with flow matching. It does not. It uses both, for different parts of the problem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    ref["Voice reference&amp;lt;br/&amp;gt;(3-30s)"] --&amp;gt; enc["Voxtral Codec&amp;lt;br/&amp;gt;encoder"]
    enc --&amp;gt;|"ref audio tokens (12.5 Hz)"| bb["Autoregressive Decoder&amp;lt;br/&amp;gt;Backbone (Ministral-3B)"]
    text["Text prompt tokens"] --&amp;gt; bb
    bb --&amp;gt; lin["Linear Head&amp;lt;br/&amp;gt;semantic token"]
    bb --&amp;gt; flow["Flow-Matching Transformer&amp;lt;br/&amp;gt;(acoustic head)&amp;lt;br/&amp;gt;acoustic tokens"]
    lin -.-&amp;gt;|"conditions per timestep"| flow
    lin --&amp;gt;|semantic| dec["Voxtral Codec&amp;lt;br/&amp;gt;decoder (VQ-FSQ)"]
    flow --&amp;gt;|acoustic| dec
    dec --&amp;gt; out["24 kHz waveform"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Autoregressive Semantic Backbone
&lt;/h3&gt;

&lt;p&gt;The model is built on Mistral's Ministral-3B architecture. A voice reference (&lt;strong&gt;3 to 30 seconds&lt;/strong&gt;) is first encoded by the Voxtral Codec into audio tokens at a &lt;strong&gt;12.5 Hz&lt;/strong&gt; frame rate, where each frame carries both a semantic token and an acoustic token. Those reference tokens, together with the text prompt tokens, are fed to the autoregressive decoder backbone, which generates a sequence of semantic tokens one step at a time until it emits a special end-of-audio token.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Flow-Matching Acoustic Head
&lt;/h3&gt;

&lt;p&gt;This is where Voxtral diverges from a pure autoregressive design, but it layers flow matching on top of autoregression rather than abandoning it. At each timestep, the semantic token produced by the backbone conditions a separate acoustic head, a flow-matching transformer, which predicts the acoustic tokens. So the system is autoregressive for the semantic stream and flow-matching for the acoustic stream. Flow matching fills in high-fidelity acoustic detail while keeping inference fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Voxtral Codec (Hybrid VQ-FSQ)
&lt;/h3&gt;

&lt;p&gt;Both token streams are encoded and decoded by the Voxtral Codec, a speech tokenizer Mistral trained from scratch. It uses a split quantization scheme: vector quantization (VQ) for the semantic tokens and finite scalar quantization (FSQ) for the acoustic tokens. The semantic path also receives a distillation loss from a supervised ASR model, which keeps those tokens linguistically meaningful. At the end, the semantic and acoustic tokens are decoded together into the final &lt;strong&gt;24 kHz&lt;/strong&gt; waveform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    inp["24 kHz audio"] --&amp;gt; encoder["Encoder&amp;lt;br/&amp;gt;Conv + Transformer&amp;lt;br/&amp;gt;(to 12.5 Hz)"]
    encoder --&amp;gt; vq["VQ&amp;lt;br/&amp;gt;semantic tokens"]
    encoder --&amp;gt; fsq["FSQ&amp;lt;br/&amp;gt;acoustic tokens"]
    vq --&amp;gt; decoder["Decoder&amp;lt;br/&amp;gt;Transformer + Conv"]
    fsq --&amp;gt; decoder
    decoder --&amp;gt; outp["Reconstructed&amp;lt;br/&amp;gt;24 kHz audio"]
    asr["Supervised ASR model"] -.-&amp;gt;|"distillation loss"| vq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. 12.5 Hz Frame Rate
&lt;/h3&gt;

&lt;p&gt;Operating at a low &lt;strong&gt;12.5 Hz&lt;/strong&gt; frame rate keeps the number of tokens the model has to generate small. That is a major reason Voxtral can reach roughly &lt;strong&gt;70 ms&lt;/strong&gt; time-to-first-audio while still producing natural-sounding speech.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice Cloning in Three Seconds
&lt;/h2&gt;

&lt;p&gt;Perhaps the most attention-grabbing feature is voice cloning from only &lt;strong&gt;three seconds&lt;/strong&gt; of reference audio. Historically, voice cloning systems required minutes of training audio, speaker adaptation procedures, and fine-tuning.&lt;/p&gt;

&lt;p&gt;Modern foundation models are increasingly able to infer speaker characteristics from extremely short samples. Voxtral extracts speaker identity information from a brief reference clip and conditions generation on those characteristics. The result is speech that preserves vocal tone, speaking style, rhythm, and intonation. This dramatically lowers the barrier for personalized voice applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implications for AI Agents
&lt;/h2&gt;

&lt;p&gt;The biggest impact may not be content creation. It may be AI agents. Most modern voice-agent stacks contain several components: speech-to-text (ASR), a language model, and text-to-speech (TTS). Historically, the TTS component has often been the most closed and expensive layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    cin["Caller audio"] --&amp;gt; asr["ASR&amp;lt;br/&amp;gt;(speech to text)"]
    asr --&amp;gt; llm["LLM&amp;lt;br/&amp;gt;(response)"]
    llm --&amp;gt; tts["Voxtral TTS&amp;lt;br/&amp;gt;(text to speech)"]
    tts --&amp;gt; cout["Audio reply"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voxtral enables developers to self-host that layer. This creates opportunities for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower infrastructure costs&lt;/li&gt;
&lt;li&gt;Reduced vendor lock-in&lt;/li&gt;
&lt;li&gt;Better privacy controls&lt;/li&gt;
&lt;li&gt;Fully local voice agents&lt;/li&gt;
&lt;li&gt;Edge deployment scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams building conversational AI, this is potentially transformative.&lt;/p&gt;

&lt;p&gt;For voice AI engineers, Voxtral is arguably more interesting as an architectural contribution than as a benchmark result. The hybrid autoregressive plus flow-matching design demonstrates a path toward combining low latency, strong speaker similarity, and expressive speech generation in a single model. Expect future open-weight voice models to adopt similar hybrid architectures.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;One important caveat.&lt;/strong&gt; Voxtral's weights are released under &lt;strong&gt;CC BY-NC 4.0&lt;/strong&gt;, a non-commercial license inherited from the voice datasets it was trained on (EARS, CML-TTS, IndicVoices-R, and others). You can self-host it today for research, prototyping, internal tools, and personal projects, but shipping it inside a commercial product would require a separate commercial license from Mistral. So the "self-host to cut costs" story is real for experimentation, but it is not yet a drop-in replacement for a paid API in production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Does This Kill ElevenLabs?
&lt;/h2&gt;

&lt;p&gt;No. At least not yet. ElevenLabs still maintains several advantages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production Infrastructure.&lt;/strong&gt; Running a research model and operating a globally scalable voice platform are very different challenges. ElevenLabs has invested heavily in reliability, scaling, monitoring, and developer tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proprietary Datasets.&lt;/strong&gt; Data remains one of the strongest competitive advantages in AI. Even if architectures become public, proprietary speech datasets can continue to provide significant performance benefits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise Features.&lt;/strong&gt; Organizations often care about compliance, security, support, SLAs, and governance. These are areas where commercial providers continue to have advantages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Licensing.&lt;/strong&gt; For now, Voxtral's non-commercial license is itself part of ElevenLabs' moat. A startup can prototype on Voxtral for free, but the moment it wants to charge customers it has to either negotiate a commercial license with Mistral or pay for a production-ready API. That keeps the commercial door at least partly closed, regardless of how good the model sounds.&lt;/p&gt;

&lt;p&gt;One additional consideration is that all benchmark results reported in this article originate from Mistral's own evaluations. While the results are impressive, independent third-party benchmarking will be important to validate performance across broader workloads, deployment environments, and real-world voice-agent applications. As with any frontier AI model, external validation often reveals strengths and weaknesses not captured in vendor evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens Next?
&lt;/h2&gt;

&lt;p&gt;The most likely outcome is not that ElevenLabs disappears. Instead, we may see the same pattern that occurred in large language models. Open-weight systems become increasingly capable, while commercial providers continue competing through infrastructure, convenience, reliability, and specialized features.&lt;/p&gt;

&lt;p&gt;This shifts the market from "Can open source compete?" to "Why pay for closed systems if open models are good enough?" That is exactly what happened with LLMs. Voice AI may be following the same trajectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Researchers
&lt;/h2&gt;

&lt;p&gt;For researchers working on speech processing, voice agents, target speaker extraction, conversational AI, and human-computer interaction, Voxtral provides something extremely valuable: access. Researchers can now inspect, evaluate, modify, and build upon a frontier-level speech model rather than treating it as a black-box API.&lt;/p&gt;

&lt;p&gt;Historically, breakthroughs in AI accelerate when researchers gain direct access to the underlying models. Voxtral could become a similar catalyst for speech AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Voxtral TTS is more than another model release. It signals a broader shift in the voice AI ecosystem. For years, speech synthesis remained one of the strongest proprietary strongholds in artificial intelligence. Mistral's release demonstrates that frontier-quality voice generation can increasingly be delivered through open-weight models.&lt;/p&gt;

&lt;p&gt;Whether Voxtral ultimately dethrones ElevenLabs is almost beside the point. The real story is that developers, startups, researchers, and open-source communities now have access to a serious alternative. And history suggests that when powerful AI technology becomes openly available, innovation accelerates rapidly.&lt;/p&gt;

&lt;p&gt;The next generation of voice agents may not be built on closed APIs. They may be built on open foundations.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://wanjohichristopher.com/blog/ai/voxtral-tts/" rel="noopener noreferrer"&gt;wanjohichristopher.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2603.25551" rel="noopener noreferrer"&gt;Voxtral TTS paper (arXiv:2603.25551)&lt;/a&gt; · &lt;a href="https://huggingface.co/mistralai/Voxtral-4B-TTS-2603" rel="noopener noreferrer"&gt;Model weights on Hugging Face&lt;/a&gt; (CC BY-NC 4.0)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceagents</category>
    </item>
    <item>
      <title>I built a phone number you can call and argue with an AI. Here's the part nobody tells you.</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Thu, 28 May 2026 02:34:53 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/i-built-a-phone-number-you-can-call-and-argue-with-an-ai-heres-the-part-nobody-tells-you-41f4</link>
      <guid>https://dev.to/wanjohichristopher/i-built-a-phone-number-you-can-call-and-argue-with-an-ai-heres-the-part-nobody-tells-you-41f4</guid>
      <description>&lt;p&gt;&lt;em&gt;Audience: engineers and the people who hire them. ~10 min read.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I wanted one thing: dial a regular phone number and have an AI support agent pick&lt;br&gt;
up and actually help. Pull from a knowledge base, book an appointment, sound like&lt;br&gt;
a person. The text-chat version of this is a solved problem now. The phone&lt;br&gt;
version is where the interesting engineering hides, because a phone call is a&lt;br&gt;
real-time, full-duplex audio stream, the model in the middle is slow, and the&lt;br&gt;
transcription is noisy enough that you can't treat it as authoritative.&lt;/p&gt;

&lt;p&gt;This is the story of building voice for TeaVoice, an AI customer-support&lt;br&gt;
platform. I'll show you the path I took, the wall I hit, and the four problems&lt;br&gt;
that don't exist in chat and absolutely do exist on a phone.&lt;/p&gt;

&lt;p&gt;First, a quick glossary so the rest reads clean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PSTN&lt;/strong&gt;: the regular phone network. Actual calls, not app-to-app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DID&lt;/strong&gt;: the phone number people dial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telnyx&lt;/strong&gt;: my telephony provider. It bridges the phone call to my server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook&lt;/strong&gt;: Telnyx HTTP-POSTs my server when something happens on the call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media stream&lt;/strong&gt;: Telnyx sends me the &lt;em&gt;raw audio&lt;/em&gt;, live, instead of a transcript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;STT / TTS&lt;/strong&gt;: speech→text and text→speech.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VAD&lt;/strong&gt;: voice activity detection. Figuring out when the caller stopped talking.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Attempt 1: let the phone company do the hard part
&lt;/h2&gt;

&lt;p&gt;The obvious first move is to let Telnyx handle speech. They have an API for it.&lt;br&gt;
The loop looks clean on a whiteboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[Answer the call] --&amp;gt; B[Speak the greeting]
    B --&amp;gt; C[Wait for 'speak finished']
    C --&amp;gt; D[Start transcription]
    D --&amp;gt; E[Caller talks]
    E --&amp;gt; F[Transcription webhook arrives]
    F --&amp;gt; G[Stop transcription]
    G --&amp;gt; H[Run it through the AI]
    H --&amp;gt; I[Speak the reply]
    I --&amp;gt; C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Every box from "start transcription" to "transcription webhook" is Telnyx's to&lt;br&gt;
own. That's the part that bit me.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I built that. Then I spent a genuinely humbling number of hours discovering that&lt;br&gt;
the provider's transcription has trapdoors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One transcription engine returns &lt;code&gt;200 OK&lt;/code&gt; and then sends zero transcription
events. Forever. No error. It just silently does nothing.&lt;/li&gt;
&lt;li&gt;The moment I added a config option to pick a better transcription model, the
whole thing went quiet again. Same &lt;code&gt;200 OK&lt;/code&gt;, still no events.&lt;/li&gt;
&lt;li&gt;Their built-in "AI assistant" feature can only be started once per call, so you
can't use it to drive a turn-by-turn conversation with your own logic.&lt;/li&gt;
&lt;li&gt;And the speech recognition keeps transcribing the agent's own voice as if the
caller said it, so you have to choreograph exactly when you start and stop
listening around when you're talking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It wasn't that the provider was bad. It was that the more of the audio pipeline I&lt;br&gt;
handed off, the less I could control the two things that actually matter: latency&lt;br&gt;
and correctness. I was tuning a black box.&lt;/p&gt;

&lt;p&gt;So I stopped asking the phone company to listen for me.&lt;/p&gt;


&lt;h2&gt;
  
  
  Attempt 2: take over the audio
&lt;/h2&gt;

&lt;p&gt;The better path: have Telnyx fork the raw audio of the call to my server over a&lt;br&gt;
WebSocket, and run my own everything. Now the flow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TB
    Caller([📞 Caller on PSTN]) &amp;lt;--&amp;gt;|phone audio| Telnyx[Telnyx Call Control]

    subgraph CP["Control plane: HTTP webhooks"]
        direction LR
        W1["call.initiated&amp;lt;br/&amp;gt;route number, create&amp;lt;br/&amp;gt;record, answer"]
        W2["call.answered&amp;lt;br/&amp;gt;start media stream"]
        W3["call.hangup&amp;lt;br/&amp;gt;clean up, finalize"]
    end

    subgraph DP["Data plane: media WebSocket"]
        direction TB
        VAD["VAD: detect end of turn&amp;lt;br/&amp;gt;(loudness + ~1.5s silence)"]
        STT["Speech-to-Text (Whisper)"]
        AI["AI pipeline&amp;lt;br/&amp;gt;guardrails → search → LLM&amp;lt;br/&amp;gt;(same brain as web chat)"]
        TTS["Text-to-Speech"]
        VAD --&amp;gt; STT --&amp;gt; AI --&amp;gt; TTS
    end

    Telnyx --&amp;gt;|HTTP events| CP
    Telnyx &amp;lt;--&amp;gt;|raw L16 audio| VAD
    TTS -.-&amp;gt;|synthesized audio back| Telnyx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two clean halves here. The &lt;strong&gt;control plane&lt;/strong&gt; is still webhooks, but&lt;br&gt;
tiny now. Just three events:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;call starts&lt;/strong&gt; → look up which business and which agent this number belongs
to, create a call record, answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;call answered&lt;/strong&gt; → tell Telnyx "stream the audio to this WebSocket."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;call hangs up&lt;/strong&gt; → clean up timers, finalize the record.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;data plane&lt;/strong&gt; is the audio WebSocket, and that's where everything&lt;br&gt;
interesting lives.&lt;/p&gt;

&lt;p&gt;The big win: the same AI brain that powers web chat now powers the phone. The&lt;br&gt;
transcript runs through the identical pipeline of content guardrails,&lt;br&gt;
knowledge-base search, the LLM, and output checks. I just wrap it with&lt;br&gt;
voice-specific instructions. One brain, two mouths.&lt;/p&gt;

&lt;p&gt;That's the architecture. Now the four problems that only exist on a phone.&lt;/p&gt;


&lt;h2&gt;
  
  
  Problem 1: "Are they done talking?"
&lt;/h2&gt;

&lt;p&gt;In chat, the user presses Enter. That's the turn boundary, handed to you for&lt;br&gt;
free. On a phone there's no Enter. You get a relentless stream of audio chunks&lt;br&gt;
and you have to &lt;em&gt;decide&lt;/em&gt; when the caller has finished a thought.&lt;/p&gt;

&lt;p&gt;I do the cheap, boring thing that works: measure how loud each chunk is (RMS&lt;br&gt;
amplitude), and call it "end of turn" after about 1.5 seconds of silence&lt;br&gt;
following speech. Buffers shorter than ~100ms get thrown away as noise. No ML, no&lt;br&gt;
fancy endpointing model. Just a loudness threshold and a silence counter.&lt;/p&gt;

&lt;p&gt;It's not glamorous, and it occasionally clips someone who pauses mid-sentence to&lt;br&gt;
think. But it's predictable, it adds zero latency, and "predictable" beats&lt;br&gt;
"clever" when you're debugging a live phone call.&lt;/p&gt;


&lt;h2&gt;
  
  
  Problem 2: the AI keeps interviewing itself
&lt;/h2&gt;

&lt;p&gt;Here's a bug that doesn't exist anywhere else. Because the audio stream is&lt;br&gt;
bidirectional (my TTS audio goes back out the same pipe the caller's audio comes&lt;br&gt;
in), the agent hears its own voice, transcribes it, and treats it as the caller&lt;br&gt;
talking. The AI ends up in a conversation with itself. It's funny for about ten&lt;br&gt;
seconds.&lt;/p&gt;

&lt;p&gt;Two guards fix it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;While I'm playing audio to the caller, I drop every incoming chunk on the
floor. The agent is deaf while it's speaking.&lt;/li&gt;
&lt;li&gt;For a full second &lt;em&gt;after&lt;/em&gt; I finish speaking, I keep ignoring incoming audio,
because there's a tail of echo and network delay where my own voice is still
arriving.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Crude? Yes, and it's a real tradeoff: going deaf while I talk means the caller&lt;br&gt;
can't interrupt me, which is closer to a walkie-talkie than a natural&lt;br&gt;
conversation. But echo cancellation is a rabbit hole, and "go deaf while you&lt;br&gt;
talk, plus a one-second cooldown" eliminated the self-conversation completely. It&lt;br&gt;
was a debugging-first choice, not ideal conversational UX.&lt;/p&gt;


&lt;h2&gt;
  
  
  Problem 3: the transcription is just... wrong a lot
&lt;/h2&gt;

&lt;p&gt;Phone audio is 8–16kHz of compressed, noisy garbage compared to a podcast mic.&lt;br&gt;
Whisper does its best, but you get transcripts like "I wanna book a point mint&lt;br&gt;
for toose day." If you treat that as gospel and the AI replies "I'm sorry, I&lt;br&gt;
didn't understand" every third turn, the call is unusable.&lt;/p&gt;

&lt;p&gt;The fix wasn't a better STT model. It was telling the AI to expect garbage and&lt;br&gt;
guess anyway. Before each turn I inject instructions that say, in effect:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This text came from speech recognition and may be wrong. Figure out what the&lt;br&gt;
caller &lt;em&gt;probably&lt;/em&gt; meant and help them. Don't say "could you repeat that" over&lt;br&gt;
and over. If it's truly unintelligible, ask &lt;strong&gt;one&lt;/strong&gt; specific clarifying&lt;br&gt;
question. Keep your answer to 1–2 sentences, because it's going to be read out&lt;br&gt;
loud.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;"Book a point mint for toose day" becomes "Sure, I can book an appointment for&lt;br&gt;
Tuesday. What time works?" The model is a fantastic error-correcting decoder if&lt;br&gt;
you give it permission to be one. That instruction prefix did more for call&lt;br&gt;
quality than anything I changed in the audio layer.&lt;/p&gt;

&lt;p&gt;Two details that mattered. I pass those instructions as a &lt;em&gt;separate&lt;/em&gt; system&lt;br&gt;
message, not glued onto the transcript, because otherwise the model occasionally&lt;br&gt;
repeats them back as if the caller said them. And I cap replies at 1–2 sentences,&lt;br&gt;
because nobody wants an AI reading a five-paragraph essay at them over the phone.&lt;/p&gt;


&lt;h2&gt;
  
  
  Problem 4: one thing at a time
&lt;/h2&gt;

&lt;p&gt;Audio chunks arrive continuously and I process turns as async tasks, so it's&lt;br&gt;
entirely possible for two turns to start overlapping: two TTS clips playing at&lt;br&gt;
once, two "am I speaking?" flags fighting each other. I wrap the&lt;br&gt;
AI-plus-speak-plus-playback part of each turn in a lock so exactly one turn runs&lt;br&gt;
at a time. Simple, and it kills a whole category of race conditions.&lt;/p&gt;


&lt;h2&gt;
  
  
  The whole turn, as one state machine
&lt;/h2&gt;

&lt;p&gt;Those four problems aren't separate features. They're a single loop. Here's the&lt;br&gt;
life of one conversational turn, including the deaf-while-speaking and cooldown&lt;br&gt;
states that keep the bot from hearing itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stateDiagram-v2
    [*] --&amp;gt; Listening
    Listening --&amp;gt; Capturing: caller speaks, loud enough
    Capturing --&amp;gt; Listening: too short, discard as noise
    Capturing --&amp;gt; Processing: ~1.5s of silence
    Processing --&amp;gt; Speaking: STT then AI then TTS
    Processing --&amp;gt; [*]: caller said goodbye
    Speaking --&amp;gt; Cooldown: playback finished
    Cooldown --&amp;gt; Listening: 1s echo guard elapsed

    note right of Speaking
        Deaf while speaking:
        every inbound chunk dropped
    end note
    note right of Cooldown
        Still deaf for 1s:
        tail echo is still arriving
    end note
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How the pros do this
&lt;/h2&gt;

&lt;p&gt;Before you conclude I invented something weird in a basement: I didn't. The&lt;br&gt;
cascade I just described (telephony → audio stream → speech-to-text → LLM →&lt;br&gt;
text-to-speech → back) is &lt;em&gt;the&lt;/em&gt; standard voice-agent architecture. It's what&lt;br&gt;
&lt;strong&gt;Pipecat AI&lt;/strong&gt; and &lt;strong&gt;LiveKit Agents&lt;/strong&gt; are frameworks for, and what platforms like&lt;br&gt;
Vapi, Retell, Deepgram's Voice Agent API, and ElevenLabs' Conversational AI all&lt;br&gt;
run under the hood. Pipecat in particular follows the same shape as&lt;br&gt;
what's in this post: transport → VAD → STT → LLM → TTS → transport, frame by&lt;br&gt;
frame. &lt;strong&gt;I hand-rolled a mini-Pipecat, emphasis on &lt;em&gt;mini&lt;/em&gt;.&lt;/strong&gt; The frameworks do&lt;br&gt;
the hard parts properly (interruption handling, streaming orchestration,&lt;br&gt;
partial-transcript routing) where I cut corners. If I were&lt;br&gt;
starting today and didn't want to learn these lessons the hard way, I'd reach for&lt;br&gt;
one of those frameworks first.&lt;/p&gt;

&lt;p&gt;Where the serious systems pull ahead is that they replace each of my deliberately&lt;br&gt;
crude mechanisms with a purpose-built model. My "wait for 1.5 seconds of silence"&lt;br&gt;
turn detection becomes a semantic turn-taking model (Deepgram's endpointing,&lt;br&gt;
ElevenLabs' dedicated turn-taking model, LiveKit's turn detector) that knows the&lt;br&gt;
difference between "I'm done" and "I'm thinking." My "go deaf while I'm speaking"&lt;br&gt;
echo guard becomes real acoustic echo cancellation plus true barge-in that cuts&lt;br&gt;
the bot off mid-sentence the instant you interrupt. And my batch "synthesize the&lt;br&gt;
whole reply, then play it" becomes streaming TTS, fed token-by-token straight from&lt;br&gt;
the LLM so the caller hears the first words while the rest is still generating:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Piece&lt;/th&gt;
&lt;th&gt;My crude version&lt;/th&gt;
&lt;th&gt;The production version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Turn detection&lt;/td&gt;
&lt;td&gt;Loudness + 1.5s silence&lt;/td&gt;
&lt;td&gt;Semantic turn-taking / endpointing model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Echo / interruption&lt;/td&gt;
&lt;td&gt;Go deaf while speaking&lt;/td&gt;
&lt;td&gt;Acoustic echo cancellation + real barge-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-speech&lt;/td&gt;
&lt;td&gt;Batch, then play&lt;/td&gt;
&lt;td&gt;Streaming, fed from LLM tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speech-to-text&lt;/td&gt;
&lt;td&gt;Buffer a turn, transcribe once&lt;/td&gt;
&lt;td&gt;Continuous streaming with partial results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There's also a second paradigm worth knowing about, because it changes the whole&lt;br&gt;
picture. Everything above is a &lt;strong&gt;cascade&lt;/strong&gt;: three separate models in a row,&lt;br&gt;
flexible and debuggable but paying a latency tax at every hop. The frontier&lt;br&gt;
(Google's Gemini Live, OpenAI's Realtime API) is moving to &lt;strong&gt;speech-to-speech&lt;/strong&gt;.&lt;br&gt;
From the developer's perspective it's one model that takes audio in and emits&lt;br&gt;
audio out, with no separate transcription or synthesis step to wire up. It's&lt;br&gt;
lower latency and far better at tone, laughter, and interruptions. But it's a black box you can't inspect or swap pieces of, which is&lt;br&gt;
the exact problem that made me abandon "let the phone company do it" in the first&lt;br&gt;
place. Google is the tell here. Their &lt;em&gt;contact-center&lt;/em&gt; product is a cascade like&lt;br&gt;
mine, while their &lt;em&gt;frontier&lt;/em&gt; product is speech-to-speech: same company, two&lt;br&gt;
answers, because the right one depends on whether you value control or latency&lt;br&gt;
more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    subgraph Cascade["Cascade: what I built (and Pipecat, Vapi, Deepgram...)"]
        direction LR
        a1([audio in]) --&amp;gt; a2[STT] --&amp;gt; a3[LLM] --&amp;gt; a4[TTS] --&amp;gt; a5([audio out])
    end
    subgraph S2S["Speech-to-speech: the frontier (Gemini Live, OpenAI Realtime)"]
        direction LR
        b1([audio in]) --&amp;gt; b2[one model] --&amp;gt; b3([audio out])
    end
    Cascade ~~~ S2S
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Three boxes, three latency hops, three things you can swap and debug. Versus one&lt;br&gt;
box that's faster and more natural but that you can't open up.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So here's the honest placement of this project. The core cascade architecture is&lt;br&gt;
industry-standard, the mechanisms are the simple-but-debuggable versions of what&lt;br&gt;
the specialists productize, and the next rung up the ladder is either swapping in&lt;br&gt;
better models for each stage or collapsing the whole cascade into a realtime&lt;br&gt;
speech-to-speech model.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it costs, and what I'd do next
&lt;/h2&gt;

&lt;p&gt;Every turn logs its budget: &lt;code&gt;STT time / AI time / TTS time&lt;/code&gt;. That single log line&lt;br&gt;
is the most useful thing I added, because on a phone call &lt;strong&gt;latency is the&lt;br&gt;
product.&lt;/strong&gt; A 4-second silence after someone asks a question feels broken even if&lt;br&gt;
the answer is perfect. In my setup, the LLM call dominated that budget, which points at the&lt;br&gt;
obvious next moves: stream the TTS so the caller hears the first words while the&lt;br&gt;
rest is still generating, and start synthesizing speech from the model's tokens&lt;br&gt;
as they arrive instead of waiting for the full reply.&lt;/p&gt;

&lt;p&gt;I'd also replace the loudness-based turn detection with a real endpointing model,&lt;br&gt;
and graduate the echo guard from "go deaf for a second" to actual acoustic echo&lt;br&gt;
cancellation. None of that was needed to ship something that works, though, and&lt;br&gt;
that's the point. The crude versions held up, and they were debuggable at 2am&lt;br&gt;
with a phone in one hand.&lt;/p&gt;




&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The interesting part of voice AI turned out not to be the AI. It's the seam&lt;br&gt;
between a real-time audio stream and a slow, fallible language model: knowing when&lt;br&gt;
someone's done talking, stopping the bot from hearing itself, making the model&lt;br&gt;
robust to a transcriber that's wrong a third of the time, and watching your&lt;br&gt;
latency budget like it's the only metric that matters. Get those right with&lt;br&gt;
embarrassingly simple mechanisms, and the LLM part, the part everyone thinks is&lt;br&gt;
hard, is genuinely the easy bit you already built for chat.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Hermes Agent vs Openclaw</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Sun, 24 May 2026 02:58:43 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/hermes-vs-openclaw-2j4d</link>
      <guid>https://dev.to/wanjohichristopher/hermes-vs-openclaw-2j4d</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc" class="crayons-story__hidden-navigation-link"&gt;Hermes vs OpenClaw: The Two Most-Starred AI Agent Frameworks of 2026&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/wanjohichristopher" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F326317%2F0db4c2e4-befa-413b-acc7-1900e8e582a4.JPG" alt="wanjohichristopher profile" class="crayons-avatar__image" width="343" height="479"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/wanjohichristopher" class="crayons-story__secondary fw-medium m:hidden"&gt;
              WanjohiChristopher
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                WanjohiChristopher
                
              
              &lt;div id="story-author-preview-content-3728082" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/wanjohichristopher" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F326317%2F0db4c2e4-befa-413b-acc7-1900e8e582a4.JPG" class="crayons-avatar__image" alt="" width="343" height="479"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;WanjohiChristopher&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 22&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc" id="article-link-3728082"&gt;
          Hermes vs OpenClaw: The Two Most-Starred AI Agent Frameworks of 2026
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/comparison"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;comparison&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Hermes vs OpenClaw: The Two Most-Starred AI Agent Frameworks of 2026</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Fri, 22 May 2026 20:22:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc</link>
      <guid>https://dev.to/wanjohichristopher/hermes-vs-openclaw-the-two-most-starred-ai-agent-frameworks-of-2026-43pc</guid>
      <description>&lt;p&gt;The open-source agent space hit a real inflection point in 2026. Two projects now sit near the top of GitHub's charts, and they represent two very different ideas about what a personal AI agent should look like.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt;&lt;/strong&gt;: 163k stars, built by &lt;a href="https://nousresearch.com" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt;, written in Python. Tagline: &lt;em&gt;"The agent that grows with you."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/strong&gt;: 374k stars, sponsored by OpenAI, GitHub, NVIDIA, and Vercel, written in TypeScript. Tagline: &lt;em&gt;"Your own personal AI assistant. Any OS. Any platform. The lobster way. 🦞"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first glance they're solving the same problem: a personal assistant that lives across messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal, iMessage…) and can reason, plan, and call tools. But once you dig in, they're going in pretty different directions. And one of them is already trying to migrate users away from the other.&lt;/p&gt;

&lt;p&gt;Here's what I learned reading the READMEs side by side.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 10-Second Summary
&lt;/h2&gt;

&lt;p&gt;Both projects ship the same baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-channel chat across Telegram, Discord, Slack, WhatsApp, Signal, iMessage, and others&lt;/li&gt;
&lt;li&gt;Tool calling for browser, shell, files, and scheduling&lt;/li&gt;
&lt;li&gt;Sandboxed execution&lt;/li&gt;
&lt;li&gt;Pluggable LLM providers (OpenAI, Anthropic, OpenRouter, local models)&lt;/li&gt;
&lt;li&gt;Persistent memory and per-user state&lt;/li&gt;
&lt;li&gt;MIT license&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The differences are where it gets interesting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Built by&lt;/td&gt;
&lt;td&gt;Nous Research&lt;/td&gt;
&lt;td&gt;openclaw org (sponsored by OpenAI, GitHub, NVIDIA, Vercel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;TypeScript (Node 22.19+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub stars&lt;/td&gt;
&lt;td&gt;163k&lt;/td&gt;
&lt;td&gt;374k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standout feature&lt;/td&gt;
&lt;td&gt;A closed learning loop: self-improving skills and agent-curated memory&lt;/td&gt;
&lt;td&gt;Live Canvas: an agent-driven visual workspace, plus native macOS, iOS, and Android apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channels&lt;/td&gt;
&lt;td&gt;Telegram, Discord, Slack, WhatsApp, Signal, Email, CLI&lt;/td&gt;
&lt;td&gt;22+ including iMessage, Teams, Matrix, LINE, Feishu, Mattermost, WeChat, QQ, Nostr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills standard&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; plus &lt;a href="https://github.com/plastic-labs/honcho" rel="noopener noreferrer"&gt;Honcho&lt;/a&gt; dialectic user modeling&lt;/td&gt;
&lt;td&gt;Bundled, managed, and workspace skills, plus the ClawHub registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;td&gt;MCP-native, 40+ built-in, RPC subagents&lt;/td&gt;
&lt;td&gt;Browser, canvas, nodes, cron, sessions, channel actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Local, Docker, SSH, Singularity, Modal, Daytona, Vercel Sandbox&lt;/td&gt;
&lt;td&gt;Local Gateway as the control plane, plus companion macOS, iOS, and Android apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideal user&lt;/td&gt;
&lt;td&gt;Developers who want an agent that learns from them across sessions&lt;/td&gt;
&lt;td&gt;People who want a polished personal assistant on every device and channel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Makes Hermes Different: The Closed Learning Loop
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks treat memory like a database. You store facts, you retrieve them later, end of story. Hermes turns memory into a feedback loop instead.&lt;/p&gt;

&lt;p&gt;A few specifics worth calling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous skill creation.&lt;/strong&gt; After a complex task, the agent can write its own skill (basically a reusable procedure) and save it for later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills self-improve during use.&lt;/strong&gt; When a skill fails or works well, the agent updates it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Periodic memory nudges.&lt;/strong&gt; The agent reviews and curates its own memory in the background, not just when you ask.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FTS5 session search with LLM summarization.&lt;/strong&gt; Past conversations are first-class context. Hermes can search and summarize what it has already done with you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honcho dialectic user modeling.&lt;/strong&gt; A separate component builds a persistent model of who you are across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agentskills.io standard.&lt;/strong&gt; Skills are portable across compatible agents, so you can share them or consume them from others.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bet behind all of this: an agent that gets sharper the more you use it is worth more than one that's smart on day one. As far as I can tell, Hermes is the only mainstream agent actually shipping this kind of closed loop today.&lt;/p&gt;

&lt;p&gt;The README puts it plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The self-improving AI agent. It creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Makes OpenClaw Different: Channel Breadth and the Live Canvas
&lt;/h2&gt;

&lt;p&gt;OpenClaw is optimizing for surface area, and two things really jump out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the channel list is huge.&lt;/strong&gt; WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, iMessage, IRC, Microsoft Teams, Matrix, Feishu, LINE, Mattermost, Nextcloud Talk, Nostr, Synology Chat, Tlon, Twitch, Zalo, WeChat, QQ, WebChat. Then add native macOS, iOS, and Android on top. If your team or your family is on it, OpenClaw probably bridges it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, the Live Canvas (with the A2UI protocol).&lt;/strong&gt; This is OpenClaw's most unique feature: an agent-driven visual workspace where the assistant can render and manipulate a live UI alongside the conversation. The agent draws a chart, builds a form, or sets up a kanban board on a shared canvas you can both see and edit. A2UI is the protocol that makes it work.&lt;/p&gt;

&lt;p&gt;Beyond those two, OpenClaw also ships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice Wake and Talk Mode.&lt;/strong&gt; Wake words on macOS and iOS, continuous voice on Android, with ElevenLabs as the default and system TTS as a fallback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A native macOS menu-bar app&lt;/strong&gt; with a push-to-talk overlay, gateway health, and WebChat built in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent routing.&lt;/strong&gt; Route inbound channels, accounts, and peers to isolated agents, each with its own workspaces and sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxing.&lt;/strong&gt; Docker by default, with SSH and OpenShell backends available.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw's bet is essentially that most users don't want to live in a CLI. They want voice, vision, and presence on every device they already use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security: Same Primitives, Different Defaults
&lt;/h2&gt;

&lt;p&gt;Both projects take messaging-platform exposure seriously, and they share most of the primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DM pairing.&lt;/strong&gt; Unknown senders get a pairing code, and messages aren't processed until you approve.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Allowlist-based access control.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxed tool execution&lt;/strong&gt; for sessions that aren't your trusted main one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doctor commands&lt;/strong&gt; (&lt;code&gt;hermes doctor&lt;/code&gt;, &lt;code&gt;openclaw doctor&lt;/code&gt;) that flag risky configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where they diverge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw documents an explicit Gateway exposure runbook for anyone running the gateway on a publicly reachable network. Worth reading before you open the port.&lt;/li&gt;
&lt;li&gt;Hermes leans more on container and terminal isolation. Its seven terminal backends (Docker, Modal, Daytona, Vercel Sandbox, and others) let you scope exactly where tools actually run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither one is meaningfully "safer by default." The real risk in both cases is the same: an agent connected to your messaging platforms is a fat target. Treat every inbound DM as untrusted input, and follow each project's security guide before any remote exposure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration Tool: A Competitive Tell
&lt;/h2&gt;

&lt;p&gt;The most revealing fact in the two READMEs (and the one most articles miss) is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hermes ships a built-in OpenClaw migration command.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes claw migrate              &lt;span class="c"&gt;# Interactive migration&lt;/span&gt;
hermes claw migrate &lt;span class="nt"&gt;--dry-run&lt;/span&gt;    &lt;span class="c"&gt;# Preview&lt;/span&gt;
hermes claw migrate &lt;span class="nt"&gt;--preset&lt;/span&gt; user-data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It imports SOUL.md persona files, MEMORY.md and USER.md entries, user-created skills (into &lt;code&gt;~/.hermes/skills/openclaw-imports/&lt;/code&gt;), command allowlists, messaging settings, allowlisted API keys, TTS assets, and workspace AGENTS.md instructions.&lt;/p&gt;

&lt;p&gt;That's not the behavior of a complementary project. That's a successor framework betting it can convert the larger user base. Nous Research is basically saying, in code, &lt;em&gt;if you're on OpenClaw, here's the door.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Whether the bet pays off depends on whether the closed learning loop matters more to users than channel breadth and the visual canvas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Pick?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick Hermes if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want an agent that actually &lt;em&gt;learns&lt;/em&gt;, improves its own skills, remembers you, and gets sharper over months.&lt;/li&gt;
&lt;li&gt;You live in Python and want MCP-native tool integration.&lt;/li&gt;
&lt;li&gt;You're a researcher or developer experimenting with agent cognition, trajectory training, or self-improvement.&lt;/li&gt;
&lt;li&gt;You're comfortable in a TUI and want serverless hosting (Modal, Daytona, Vercel Sandbox).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick OpenClaw if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a polished personal assistant across every device: macOS menu bar, iOS, Android, voice.&lt;/li&gt;
&lt;li&gt;You need the niche messaging channels (iMessage, Teams, Matrix, WeChat, QQ, LINE, Feishu).&lt;/li&gt;
&lt;li&gt;The Live Canvas matters for your workflow (visual outputs, shared UIs).&lt;/li&gt;
&lt;li&gt;You're in a TypeScript shop and want it Node-native.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use both?&lt;/strong&gt; Probably not the move. They overlap heavily, and Hermes' migration tool suggests Nous expects you to pick one eventually.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Two years ago the agent debate was basically &lt;em&gt;can these systems do anything useful at all?&lt;/em&gt; In 2026 we've moved past that. The real question now is whether your agent should get smarter over time or just be everywhere you are.&lt;/p&gt;

&lt;p&gt;Hermes is the strongest bet on the first answer. OpenClaw is the strongest bet on the second. Both are MIT-licensed, both are production-grade, and both have raised the bar for what an open-source personal AI agent can be.&lt;/p&gt;

&lt;p&gt;The next interesting question is whether either project (or maybe some hybrid that hasn't shown up yet) manages to do both at scale.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Dig deeper:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent: &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;github.com/NousResearch/hermes-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OpenClaw: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;github.com/openclaw/openclaw&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>comparison</category>
    </item>
    <item>
      <title>Building an AI-Powered Customer Churn Prediction Pipeline on AWS (Step-by-Step)</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Thu, 01 Jan 2026 00:56:43 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-an-ai-powered-customer-churn-prediction-pipeline-on-aws-step-by-step-2l1f</link>
      <guid>https://dev.to/aws-builders/building-an-ai-powered-customer-churn-prediction-pipeline-on-aws-step-by-step-2l1f</guid>
      <description>&lt;p&gt;Hey folks! 👋&lt;/p&gt;

&lt;p&gt;I recently built a customer churn prediction system that not only predicts &lt;em&gt;who&lt;/em&gt; will leave — but also explains &lt;em&gt;why&lt;/em&gt; in plain English using Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;In this tutorial, I'll walk you through building the entire pipeline from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we achieved:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;84.2% AUC&lt;/strong&gt; on validation data&lt;/li&gt;
&lt;li&gt;✅ Real-time predictions via SageMaker endpoint&lt;/li&gt;
&lt;li&gt;✅ Natural language explanations powered by Claude (Bedrock)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's dive in!&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 What We're Building
&lt;/h2&gt;

&lt;p&gt;An end-to-end ML pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingests&lt;/strong&gt; customer data into S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trains&lt;/strong&gt; a churn prediction model with SageMaker XGBoost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploys&lt;/strong&gt; a real-time inference endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explains&lt;/strong&gt; predictions using Amazon Bedrock (Claude)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exposes&lt;/strong&gt; everything via API Gateway + Lambda&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: AWS account, basic Python knowledge&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodpfhf83fj77qn1aq76i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodpfhf83fj77qn1aq76i.png" alt="AWS Churn Prediction Architecture" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pipeline consists of 5 tiers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Services&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Store raw customer data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SageMaker Training&lt;/td&gt;
&lt;td&gt;Train XGBoost model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Store model artifacts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference &amp;amp; AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SageMaker Endpoint, Bedrock&lt;/td&gt;
&lt;td&gt;Real-time predictions + NL explanations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API Gateway, Lambda&lt;/td&gt;
&lt;td&gt;Expose REST API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 1: Set Up S3 and Upload Data
&lt;/h2&gt;

&lt;p&gt;First, create an S3 bucket and upload the dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set bucket name with your account ID&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;churn-prediction-&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create bucket&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;

&lt;span class="c"&gt;# Upload your data&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;WA_Fn-UseC_-Telco-Customer-Churn.csv s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/raw/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;📥 &lt;strong&gt;Dataset&lt;/strong&gt;: Download the &lt;a href="https://www.kaggle.com/datasets/blastchar/telco-customer-churn" rel="noopener noreferrer"&gt;Telco Customer Churn dataset&lt;/a&gt; from Kaggle.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 2: Create SageMaker IAM Role
&lt;/h2&gt;

&lt;p&gt;In AWS Console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;IAM&lt;/strong&gt; → &lt;strong&gt;Roles&lt;/strong&gt; → &lt;strong&gt;Create role&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;SageMaker - Execution&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add policies: &lt;code&gt;AmazonSageMakerFullAccess&lt;/code&gt; + &lt;code&gt;AmazonS3FullAccess&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Name it: &lt;code&gt;SageMakerChurnRole&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Step 3: Train the Model
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;train_churn.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.inputs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TrainingInput&lt;/span&gt;

&lt;span class="c1"&gt;# Config
&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ROLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;PREFIX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn-prediction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boto_region_name&lt;/span&gt;

&lt;span class="c1"&gt;# Load and prepare data
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;WA_Fn-UseC_-Telco-Customer-Churn.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TotalCharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TotalCharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coerce&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Encode categorical columns
&lt;/span&gt;&lt;span class="n"&gt;cat_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gender&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Partner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Dependents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PhoneService&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MultipleLines&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;InternetService&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OnlineSecurity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OnlineBackup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DeviceProtection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TechSupport&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;StreamingTV&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;StreamingMovies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Contract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PaperlessBilling&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PaymentMethod&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cat_cols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;

&lt;span class="c1"&gt;# Features
&lt;/span&gt;&lt;span class="n"&gt;feature_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SeniorCitizen&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tenure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MonthlyCharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TotalCharges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cat_cols&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;feature_cols&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Churn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Split and save
&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stratify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;test_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Upload to S3
&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PREFIX&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/train/train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PREFIX&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/test/test.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Train XGBoost
&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_uris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;xgboost&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.7-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Estimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ROLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.m5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PREFIX&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sagemaker_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_hyperparameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;binary:logistic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;auc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TrainingInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PREFIX&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TrainingInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PREFIX&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Deploy endpoint
&lt;/span&gt;&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;initial_instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.t2.medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn-prediction-endpoint&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;serializer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serializers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CSVSerializer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;churn-prediction-YOUR_ACCOUNT_ID
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arn:aws:iam::YOUR_ACCOUNT_ID:role/SageMakerChurnRole
python3 train_churn.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-01-01 00:24:27 Uploading - Uploading generated training model
2026-01-01 00:24:27 Completed - Training job completed
Training seconds: 103
Billable seconds: 103

✅ Training complete!
Model artifact: s3://churn-prediction-905418352184/churn-prediction/output/sagemaker-xgboost-2026-01-01-00-22-03-339/output/model.tar.gz

Deploying endpoint (3-5 min)...
INFO:sagemaker:Creating model with name: sagemaker-xgboost-2026-01-01-00-24-53-959
INFO:sagemaker:Creating endpoint-config with name churn-prediction-endpoint
INFO:sagemaker:Creating endpoint with name churn-prediction-endpoint
---------------!
✅ Endpoint deployed: churn-prediction-endpoint
Test prediction: 0.4% churn probability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4: Create Lambda with Bedrock Integration
&lt;/h2&gt;

&lt;p&gt;Create a Lambda function &lt;code&gt;ChurnPredictionAPI&lt;/code&gt; with this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;sagemaker_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ENDPOINT_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SAGEMAKER_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn-prediction-endpoint&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;

    &lt;span class="c1"&gt;# Get prediction from SageMaker
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;EndpointName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ENDPOINT_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ContentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text/csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;features&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;churn_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate explanation with Bedrock Claude
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A customer has &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;churn_prob&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; churn probability.
Customer: Tenure &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tenure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; months, $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;monthly_charges&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/month, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; contract.
In 2 sentences, explain the risk and suggest one retention action.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;bedrock_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-haiku-20240307-v1:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;explanation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bedrock_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;risk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;churn_prob&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;churn_prob&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;churn_probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;churn_prob&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;explanation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;explanation&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc4zh52rpmdaj0vguvb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc4zh52rpmdaj0vguvb3.png" alt="Test Successful" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: Python 3.11&lt;/li&gt;
&lt;li&gt;Timeout: 30 seconds&lt;/li&gt;
&lt;li&gt;Role: &lt;code&gt;LambdaChurnRole&lt;/code&gt; (with SageMaker + Bedrock permissions)&lt;/li&gt;
&lt;li&gt;Environment variable: &lt;code&gt;SAGEMAKER_ENDPOINT=churn-prediction-endpoint&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 5: Create API Gateway
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create an &lt;strong&gt;HTTP API&lt;/strong&gt; in API Gateway&lt;/li&gt;
&lt;li&gt;Add Lambda integration → &lt;code&gt;ChurnPredictionAPI&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Create POST route: &lt;code&gt;/predict&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Deploy and get your invoke URL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh05fwunl0ri1uv0x56vy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh05fwunl0ri1uv0x56vy.png" alt="Create API Gateway" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dof7nenytw0lhbo1udl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dof7nenytw0lhbo1udl.png" alt="configure" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Test the API
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://YOUR_API_URL/predict"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgesmii5w1bkowmo61kha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgesmii5w1bkowmo61kha.png" alt="ChurnAPI" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
(.venv) server@DLG lambda_package % curl -X POST "https://jxairjovmi.execute-api.us-east-1.amazonaws.com/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
  }'

{"churn_probability": "0.6%", "risk_level": "Low", "explanation": "The customer's high churn probability of 0.6% and the month-to-month contract indicate a significant risk of losing the customer. To mitigate this risk, a retention action could be to offer the customer a longer-term contract with a discounted monthly rate or additional benefits, which may help increase their loyalty and reduce the likelihood of churn."}%     
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧹 Cleanup
&lt;/h2&gt;

&lt;p&gt;Don't forget to delete resources to avoid charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete SageMaker endpoint (most expensive!)&lt;/span&gt;
aws sagemaker delete-endpoint &lt;span class="nt"&gt;--endpoint-name&lt;/span&gt; churn-prediction-endpoint
aws sagemaker delete-endpoint-config &lt;span class="nt"&gt;--endpoint-config-name&lt;/span&gt; churn-prediction-endpoint

&lt;span class="c"&gt;# Delete Lambda&lt;/span&gt;
aws lambda delete-function &lt;span class="nt"&gt;--function-name&lt;/span&gt; ChurnPredictionAPI

&lt;span class="c"&gt;# Delete S3 bucket&lt;/span&gt;
aws s3 rb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  💡 Key Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker XGBoost is production-ready&lt;/strong&gt; — achieved 84% AUC with minimal tuning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock adds real business value&lt;/strong&gt; — converting predictions to actionable insights makes ML accessible to non-technical stakeholders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM permissions are tricky&lt;/strong&gt; — create roles via Console if CLI gives explicit deny errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost awareness matters&lt;/strong&gt; — always delete endpoints when not in use (~$0.05/hour adds up!)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/datasets/blastchar/telco-customer-churn" rel="noopener noreferrer"&gt;Telco Churn Dataset (Kaggle)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html" rel="noopener noreferrer"&gt;SageMaker XGBoost Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/" rel="noopener noreferrer"&gt;Amazon Bedrock Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading! If this helped you, follow me for more AWS + Data Engineering content.&lt;/p&gt;

&lt;p&gt;Questions? Leave a comment below!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>dataengineering</category>
      <category>python</category>
    </item>
    <item>
      <title>𝗩𝗼𝗶𝗰𝗲 𝗔𝗜: 𝗧𝗧𝗦 - 𝗚𝗶𝘃𝗶𝗻𝗴 𝗬𝗼𝘂𝗿 𝗔𝗜 𝗮 𝗩𝗼𝗶𝗰𝗲</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Tue, 23 Dec 2025 13:45:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/--5a</link>
      <guid>https://dev.to/wanjohichristopher/--5a</guid>
      <description>&lt;p&gt;We've covered how Voice AI listens (ASR), understands (NLU), decides (Dialog Management), remembers (Context), and writes (NLG).&lt;/p&gt;

&lt;p&gt;Now for the final piece: 🔊 Making it speak.&lt;/p&gt;

&lt;p&gt;That's TTS - Text-to-Speech.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53a954o692eai0l0dv6m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53a954o692eai0l0dv6m.gif" alt="TTS" width="600" height="247"&gt;&lt;/a&gt;&lt;br&gt;
𝗧𝗵𝗲 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:&lt;br&gt;
Input: "Great news! Your flight to Paris is confirmed."&lt;br&gt;
Output: 〰️〰️〰️ (audio waveform).&lt;/p&gt;

&lt;p&gt;𝗧𝗵𝗲 𝗧𝗧𝗦 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲:&lt;br&gt;
1️⃣ 𝗧𝗲𝘅𝘁 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀 &lt;br&gt;
 • "How to pronounce this?"&lt;br&gt;
 • Normalization ($50 → "fifty dollars")&lt;br&gt;
 • Grapheme-to-phoneme conversion&lt;br&gt;
 • Homograph resolution (read vs read)&lt;br&gt;
2️⃣ 𝗣𝗿𝗼𝘀𝗼𝗱𝘆 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻 &lt;br&gt;
 • How should it sound?&lt;br&gt;
 • Pitch contour (intonation)&lt;br&gt;
 • Duration (speed)&lt;br&gt;
 • Stress &amp;amp; emphasis&lt;br&gt;
 • Pauses&lt;br&gt;
3️⃣ 𝗔𝗰𝗼𝘂𝘀𝘁𝗶𝗰 𝗠𝗼𝗱𝗲𝗹 &lt;br&gt;
 • Generate mel spectrogram.&lt;br&gt;
 • Tacotron 2, FastSpeech 2, VITS.&lt;br&gt;
 • Maps phonemes → audio features.&lt;br&gt;
4️⃣ 𝗩𝗼𝗰𝗼𝗱𝗲𝗿 &lt;br&gt;
 • Convert to audio waveform.&lt;br&gt;
 • HiFi-GAN, WaveGlow, WaveNet.&lt;br&gt;
 • Spectrogram → actual audio.&lt;/p&gt;

&lt;p&gt;🎯 And that closes the loop:&lt;br&gt;
Listen → Think → Speak&lt;/p&gt;

&lt;p&gt;That’s the full Voice AI pipeline.&lt;/p&gt;

&lt;p&gt;Thanks for following along - next, I'll likely recap the full system and share a few real-world failure modes that make or break Voice AI in production. More coming soon. Keep building!!&lt;/p&gt;

&lt;p&gt;Cheers!!&lt;/p&gt;

</description>
      <category>tts</category>
      <category>ai</category>
    </item>
    <item>
      <title>𝗩𝗼𝗶𝗰𝗲 𝗔𝗜: 𝗡𝗟𝗚 - 𝗧𝘂𝗿𝗻𝗶𝗻𝗴 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻𝘀 𝗜𝗻𝘁𝗼 𝗪𝗼𝗿𝗱𝘀</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Mon, 22 Dec 2025 13:43:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/--2g92</link>
      <guid>https://dev.to/wanjohichristopher/--2g92</guid>
      <description>&lt;p&gt;Voice AI listens (ASR), understands (NLU), and decides (Dialog Management).&lt;/p&gt;

&lt;p&gt;But decisions aren't responses.&lt;br&gt;
The system knows: &lt;br&gt;
▶️ Action: inform &lt;br&gt;
▶️ Flight: booked &lt;br&gt;
▶️ Destination: Paris &lt;br&gt;
▶️ Date: Dec 20 &lt;br&gt;
▶️ Confirmation: AB123&lt;/p&gt;

&lt;p&gt;That's not what we say to a user.&lt;/p&gt;

&lt;p&gt;This is where 𝗡𝗟𝗚 (Natural Language Generation) comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj6ubvjql468a6x0yblk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj6ubvjql468a6x0yblk.gif" alt="NLG" width="560" height="241"&gt;&lt;/a&gt;&lt;br&gt;
It transforms structured data into natural speech: &lt;br&gt;
Example:&lt;br&gt;
🤖 "Great news! Your flight to Paris on December 20th is confirmed. Your confirmation number is AB123. Have a wonderful trip!"&lt;/p&gt;

&lt;p&gt;𝗧𝗵𝗲 𝗡𝗟𝗚 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲:&lt;br&gt;
1️⃣ 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴 &lt;br&gt;
 🔹"What information to convey?" &lt;br&gt;
 🔹Select facts, order them, prioritize.&lt;br&gt;
2️⃣ 𝗦𝗲𝗻𝘁𝗲𝗻𝗰𝗲 𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴 &lt;br&gt;
 🔹"How to structure it?" &lt;br&gt;
 🔹One sentence or multiple? &lt;br&gt;
 🔹Combine facts?&lt;br&gt;
3️⃣ 𝗦𝘂𝗿𝗳𝗮𝗰𝗲 𝗥𝗲𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 &lt;br&gt;
 🔹"What exact words to use?" .&lt;br&gt;
 🔹Grammar, vocabulary, tone, fluency.&lt;/p&gt;

&lt;p&gt;𝗧𝗵𝗲 𝗲𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻:&lt;br&gt;
🔹Templates → slot-filling.&lt;br&gt;
🔹Statistical → n-grams, HMMs.&lt;br&gt;
🔹Neural → Seq2Seq, Transformers.&lt;br&gt;
🔹LLMs → GPT, Claude (SOTA) .&lt;br&gt;
Below are 𝗿𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻s based on use case:&lt;br&gt;
🔹Need predictability → Templates.&lt;br&gt;
🔹Need natural variety → LLM.&lt;br&gt;
🔹Need both → Hybrid (LLM + guardrails).&lt;/p&gt;

&lt;p&gt;The difference between a robotic assistant and a delightful one? NLG.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>𝗩𝗼𝗶𝗰𝗲 𝗔𝗜: 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 &amp; 𝗠𝗲𝗺𝗼𝗿𝘆 - 𝗪𝗵𝘆 𝗖𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻𝘀 𝗗𝗼𝗻'𝘁 𝗥𝗲𝘀𝗲𝘁</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Sun, 21 Dec 2025 14:40:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/--4c64</link>
      <guid>https://dev.to/wanjohichristopher/--4c64</guid>
      <description>&lt;p&gt;Dialog Management means = deciding what to do next.&lt;/p&gt;

&lt;p&gt;But something else makes Voice AI feel human instead of robotic:&lt;/p&gt;

&lt;p&gt;🧠 Context and memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uwu0umv4babqqqszqmt.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uwu0umv4babqqqszqmt.gif" alt="Context and Memory" width="720" height="469"&gt;&lt;/a&gt;&lt;br&gt;
𝗪𝗵𝘆 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀&lt;br&gt;
Consider this exchange:&lt;br&gt;
🗣️ "Book me a flight to Paris." &lt;br&gt;
🗣️ "Make it business class."&lt;br&gt;
That second sentence only makes sense if the system remembers the first.&lt;br&gt;
That's context.&lt;br&gt;
𝗪𝗵𝗮𝘁 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 &amp;amp; 𝗺𝗲𝗺𝗼𝗿𝘆 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗶𝗻𝗰𝗹𝘂𝗱𝗲:&lt;br&gt;
→ 𝗦𝗵𝗼𝗿𝘁-𝘁𝗲𝗿𝗺 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 (session memory)&lt;br&gt;
 🔹Recent turns.&lt;br&gt;
 🔹Slot values.&lt;br&gt;
 🔹Corrections.&lt;br&gt;
 🔹Current dialog state.&lt;br&gt;
→ 𝗟𝗼𝗻𝗴-𝘁𝗲𝗿𝗺 𝗺𝗲𝗺𝗼𝗿𝘆&lt;br&gt;
 🔹User preferences.&lt;br&gt;
 🔹Past interactions.&lt;br&gt;
 🔹Frequent locations.&lt;br&gt;
 🔹Knowledge (RAG documents).&lt;br&gt;
This information feeds directly into Dialog Management so the system can make better decisions.&lt;/p&gt;

&lt;p&gt;Without memory, every interaction would feel like the first one.&lt;/p&gt;

&lt;p&gt;LLMs can reason - but the architecture decides what to remember, when to retrieve it, and when to forget.&lt;/p&gt;

&lt;p&gt;That balance is what makes Voice AI feel natural and safe.&lt;/p&gt;

</description>
      <category>context</category>
      <category>memory</category>
      <category>voiceai</category>
    </item>
    <item>
      <title>𝗩𝗼𝗶𝗰𝗲 𝗔𝗜: 𝗗𝗶𝗮𝗹𝗼𝗴 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 - 𝗧𝗵𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Sat, 20 Dec 2025 16:07:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/--5bjc</link>
      <guid>https://dev.to/wanjohichristopher/--5bjc</guid>
      <description>&lt;p&gt;We've talked about how Voice AI listens (ASR) and understands (NLU).&lt;/p&gt;

&lt;p&gt;But once the system understands the user, there's a harder question:&lt;br&gt;
👉 What should happen next?&lt;/p&gt;

&lt;p&gt;This is where Dialog Management comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3m90fnvkok8qu4fd5w5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3m90fnvkok8qu4fd5w5.gif" alt="Dialogue Management - Conversational Flow" width="80" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's not about generating responses - it's about orchestrating decisions across multiple turns.&lt;/p&gt;

&lt;p&gt;E𝘅𝗮𝗺𝗽𝗹𝗲:&lt;br&gt;
👤 "Book a flight to Paris" &lt;br&gt;
🤖 [dest: Paris, origin: ❓] → "Where from?" &lt;br&gt;
👤 "New York" &lt;br&gt;
🤖 [all slots filled ] → "NYC → Paris. Confirm?"&lt;/p&gt;

&lt;p&gt;That decision flow? That's Dialog Management.&lt;/p&gt;

&lt;p&gt;𝗨𝗻𝗱𝗲𝗿 𝘁𝗵𝗲 𝗵𝗼𝗼𝗱, 𝗶𝘁 𝗵𝗮𝗻𝗱𝗹𝗲𝘀:&lt;br&gt;
→ Tracking conversation state across turns.&lt;br&gt;
→ Knowing what's been said vs what's missing.&lt;br&gt;
→ Deciding when to ask vs when to act.&lt;br&gt;
→ Handling corrections and errors.&lt;br&gt;
→ Executing actions and tools safely.&lt;/p&gt;

&lt;p&gt;This is what turns one-shot commands (from the user) into real conversations.&lt;/p&gt;

&lt;p&gt;Modern Voice AI agents may use LLMs here - but structure is still essential for reliability and safety.&lt;/p&gt;

&lt;p&gt;Without dialog management, even the best models feel unpredictable.&lt;/p&gt;

&lt;p&gt;➡️ Next up: How Voice AI remembers - context &amp;amp; memory management.&lt;/p&gt;

</description>
      <category>asr</category>
      <category>voiceai</category>
      <category>tts</category>
    </item>
    <item>
      <title>𝗩𝗼𝗶𝗰𝗲 𝗔𝗜: 𝗡𝗟𝗨 (𝗡𝗮𝘁𝘂𝗿𝗮𝗹 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴) - 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗪𝗵𝗮𝘁 𝗬𝗼𝘂 𝗔𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗠𝗲𝗮𝗻𝘁</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Fri, 19 Dec 2025 13:33:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/--1dp0</link>
      <guid>https://dev.to/wanjohichristopher/--1dp0</guid>
      <description>&lt;p&gt;What happens after the text arrives from ASR.&lt;/p&gt;

&lt;p&gt;🗣️ Say you tell a voice assistant:&lt;br&gt;
 "Book me a flight to Paris next Friday"&lt;/p&gt;

&lt;p&gt;ASR does its job and converts that into text.&lt;/p&gt;

&lt;p&gt;But at this point, the system still doesn’t really understand anything.&lt;br&gt;
It doesn’t know:&lt;br&gt;
 🔹what you’re trying to do.&lt;br&gt;
 🔹which parts of the sentence matter.&lt;br&gt;
 🔹or what information is missing.&lt;/p&gt;

&lt;p&gt;That’s where NLU (Natural Language Understanding) comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv24lxqrp8gmypki72cn.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv24lxqrp8gmypki72cn.gif" alt="NLU" width="760" height="308"&gt;&lt;/a&gt;&lt;br&gt;
Here’s what NLU figures out behind the scenes:&lt;/p&gt;

&lt;p&gt;1️⃣ - 𝗜𝗻𝘁𝗲𝗻𝘁 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻&lt;br&gt;
 What are you trying to do?&lt;br&gt;
 → You want to book a flight.&lt;/p&gt;

&lt;p&gt;2️⃣ - 𝗘𝗻𝘁𝗶𝘁𝘆 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻- details (entities)&lt;br&gt;
 → destination: Paris&lt;br&gt;
 → date: next Friday&lt;/p&gt;

&lt;p&gt;3️⃣ And finally - 𝗦𝗹𝗼𝘁 𝗙𝗶𝗹𝗹𝗶𝗻𝗴 - what’s missing&lt;br&gt;
 → where are you flying from?&lt;/p&gt;

&lt;p&gt;So the system knows it needs to ask a follow-up.&lt;/p&gt;

&lt;p&gt;That's the moment where the conversation starts to feel natural instead of scripted.&lt;/p&gt;

&lt;p&gt;With models like GPT-4 or Claude, etc, a lot of this NLU work can now happen in one step without training separate intent classifiers or entity models. The model reasons about intent, details, and gaps together.&lt;/p&gt;

&lt;p&gt;That’s a big reason modern Voice AI agents feel more flexible than the older "say it exactly this way" systems.&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>nlu</category>
      <category>tts</category>
      <category>agents</category>
    </item>
    <item>
      <title>ASR (Automatic Speech Recognition)</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Thu, 18 Dec 2025 22:30:00 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/asr-automatic-speech-recognition-nab</link>
      <guid>https://dev.to/wanjohichristopher/asr-automatic-speech-recognition-nab</guid>
      <description>&lt;p&gt;Yesterday I shared the full Voice AI pipeline.&lt;br&gt;
Today we're diving deep into Stage 1: ASR (Automatic Speech Recognition).&lt;/p&gt;

&lt;p&gt;You speak → It becomes text.&lt;/p&gt;

&lt;p&gt;Simple, right? Here's what actually happens:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33dp4n2fuenlgbljane6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33dp4n2fuenlgbljane6.png" alt="ASR" width="800" height="441"&gt;&lt;/a&gt;&lt;br&gt;
𝟭. 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻&lt;br&gt;
Raw audio → Digital representation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MFCCs (Mel-Frequency Cepstral Coefficients)&lt;/li&gt;
&lt;li&gt;Spectrograms&lt;/li&gt;
&lt;li&gt;Filter Banks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;𝟮. 𝗔𝗰𝗼𝘂𝘀𝘁𝗶𝗰 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴&lt;br&gt;
Maps audio features to phonemes&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traditional: HMM-GMM, DNN-HMM&lt;/li&gt;
&lt;li&gt;Modern: Transformers, Conformers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;𝟯. 𝗗𝗲𝗰𝗼𝗱𝗶𝗻𝗴 &amp;amp; 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴&lt;br&gt;
Phonemes → Words using probabilities&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beam Search, CTC, Attention mechanisms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;𝟰. 𝗣𝗼𝘀𝘁-𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴&lt;br&gt;
Clean up the output&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spell checking, punctuation, capitalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The evolution has been wild:&lt;/p&gt;

&lt;p&gt;𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 (1980s-2010s):&lt;br&gt;
→ HMM + GMM&lt;br&gt;
→ Required phonetic alignment&lt;br&gt;
→ Separate components stitched together&lt;/p&gt;

&lt;p&gt;𝗦𝗧𝗔𝗧𝗘-𝗢𝗙-𝗧𝗛𝗘-𝗔𝗥𝗧 (Now):&lt;br&gt;
→ Whisper: 680K hours of training, 50+ languages&lt;br&gt;
→ Wav2Vec 2.0: Self-supervised, works with limited data&lt;/p&gt;

&lt;p&gt;Get ASR wrong and your entire voice pipeline fails. It's the foundation.&lt;/p&gt;

&lt;p&gt;I've attached a diagram breaking down the full ASR architecture.&lt;/p&gt;

&lt;p&gt;What ASR model are you using? Any surprises with accuracy or latency?&lt;/p&gt;

</description>
      <category>automatic</category>
      <category>asr</category>
      <category>voiceai</category>
      <category>tts</category>
    </item>
    <item>
      <title>VOICE AI SYSTEM ARCHITECTURE</title>
      <dc:creator>WanjohiChristopher</dc:creator>
      <pubDate>Thu, 18 Dec 2025 04:22:41 +0000</pubDate>
      <link>https://dev.to/wanjohichristopher/voice-ai-system-architecture-4h7</link>
      <guid>https://dev.to/wanjohichristopher/voice-ai-system-architecture-4h7</guid>
      <description>&lt;p&gt;🎙️I’ve been diving deep into Voice AI Agents and decided to map out how they actually work.&lt;/p&gt;

&lt;p&gt;You know when you ask Alexa or ChatGPT Voice a question and it just… responds intelligently?&lt;/p&gt;

&lt;p&gt;There’s a lot happening in that split second.&lt;/p&gt;

&lt;p&gt;How do voice agents work?&lt;/p&gt;

&lt;p&gt;At a high level, every voice agent needs to handle three tasks:&lt;/p&gt;

&lt;p&gt;👉Listen - capture audio and transcribe it&lt;br&gt;
👉Think - interpret intent, reason, plan&lt;br&gt;
👉Speak - generate audio and stream it back to the user&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lwkwpgsu39l2c00vx99.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lwkwpgsu39l2c00vx99.gif" alt="Voice AI Architecture" width="480" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A Voice AI Agent typically goes through five core stages:&lt;br&gt;
 🔹Speech is converted to text (ASR).&lt;br&gt;
 🔹The system understands intent and entities (NLU).&lt;br&gt;
 🔹It reasons about what action to take (Dialog Manager / Agent Logic).&lt;br&gt;
 🔹It generates a response (NLG).&lt;br&gt;
 🔹Speaks it back naturally (TTS).&lt;/p&gt;

&lt;p&gt;This same agent-style architecture powers Alexa, Siri, Google Assistant, and modern LLM-based voice agents like ChatGPT Voice.&lt;/p&gt;

&lt;p&gt;I put together a diagram to visualize the full end-to-end pipeline behind Voice AI Agents - from speech input to intelligent action and response.&lt;/p&gt;

&lt;p&gt;I’m planning to break down each component and share more on how agent-based voice systems are built.&lt;/p&gt;

&lt;p&gt;Which Voice AI agent do you interact with the most?&lt;/p&gt;

</description>
      <category>voice</category>
      <category>ai</category>
      <category>agents</category>
      <category>tts</category>
    </item>
  </channel>
</rss>
