<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kollaikal-rupesh</title>
    <description>The latest articles on DEV Community by kollaikal-rupesh (@kollaikalrupesh).</description>
    <link>https://dev.to/kollaikalrupesh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850832%2Fef4b439f-f3c0-435c-a48b-d9e9156cdbd2.png</url>
      <title>DEV Community: kollaikal-rupesh</title>
      <link>https://dev.to/kollaikalrupesh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kollaikalrupesh"/>
    <language>en</language>
    <item>
      <title>Just published my first post, 9 PRs I contributed to Pipecat: automatic service failover, fixing Smart Turn at 8kHz telephony, WebSocket reconnection loops, and more. Feedback welcome!</title>
      <dc:creator>kollaikal-rupesh</dc:creator>
      <pubDate>Mon, 30 Mar 2026 07:20:53 +0000</pubDate>
      <link>https://dev.to/kollaikalrupesh/just-published-my-first-post-9-prs-i-contributed-to-pipecat-automatic-service-failover-fixing-2n1h</link>
      <guid>https://dev.to/kollaikalrupesh/just-published-my-first-post-9-prs-i-contributed-to-pipecat-automatic-service-failover-fixing-2n1h</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l" class="crayons-story__hidden-navigation-link"&gt;Hardening Pipecat: A Month of Fixing What Matters&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/kollaikalrupesh" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850832%2Fef4b439f-f3c0-435c-a48b-d9e9156cdbd2.png" alt="kollaikalrupesh profile" class="crayons-avatar__image" width="420" height="420"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/kollaikalrupesh" class="crayons-story__secondary fw-medium m:hidden"&gt;
              kollaikal-rupesh
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                kollaikal-rupesh
                
              
              &lt;div id="story-author-preview-content-3429252" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/kollaikalrupesh" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850832%2Fef4b439f-f3c0-435c-a48b-d9e9156cdbd2.png" class="crayons-avatar__image" alt="" width="420" height="420"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;kollaikal-rupesh&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 30&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l" id="article-link-3429252"&gt;
          Hardening Pipecat: A Month of Fixing What Matters
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Hardening Pipecat: A Month of Fixing What Matters</title>
      <dc:creator>kollaikal-rupesh</dc:creator>
      <pubDate>Mon, 30 Mar 2026 07:19:35 +0000</pubDate>
      <link>https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l</link>
      <guid>https://dev.to/kollaikalrupesh/hardening-pipecat-a-month-of-fixing-what-matters-44l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Over the past month, I've been contributing to &lt;a href="https://github.com/pipecat-ai/pipecat" rel="noopener noreferrer"&gt;Pipecat&lt;/a&gt;, the open-source Python framework for building real-time voice and multimodal AI agents. My focus has been on reliability: fixing race conditions, adding resilience mechanisms, and closing gaps that surface in production telephony deployments. This post covers 9 pull requests in &lt;code&gt;pipecat-ai/pipecat&lt;/code&gt; (4 merged, 2 still open, and 3 closed as superseded or folded into other work), plus a merged fix in &lt;code&gt;pipecat-ai/pipecat-flows&lt;/code&gt; and code review contributions on other community PRs.&lt;/p&gt;

&lt;p&gt;Here's what I shipped and why it matters if you're building voice AI agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Automatic Service Failover for Production Resilience
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PR &lt;a href="https://github.com/pipecat-ai/pipecat/pull/3870" rel="noopener noreferrer"&gt;#3870&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;Merged&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Pipecat's &lt;code&gt;ServiceSwitcher&lt;/code&gt; only supported manual switching between services (e.g., swapping STT providers). In production, when a service goes down, you want automatic failover — not a human operator pressing a button.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I built:&lt;/strong&gt; &lt;code&gt;ServiceSwitcherStrategyFailover&lt;/code&gt;, a strategy that listens for non-fatal &lt;code&gt;ErrorFrame&lt;/code&gt; emissions from the active service and automatically rotates to the next available service in the list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ServiceSwitcherStrategyFailover&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceSwitcherStrategy&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Automatically switches to the next service on failure.

    Recovery and fallback policies are left to application code
    via the on_service_switched event.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FrameProcessor&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# Switches to next service in the list
&lt;/span&gt;        &lt;span class="c1"&gt;# Failed service stays in list for manual recovery
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The design deliberately separates &lt;em&gt;detection&lt;/em&gt; (automatic) from &lt;em&gt;recovery&lt;/em&gt; (application-defined). The framework handles failover; your code decides when to re-enable the failed service via the &lt;code&gt;on_service_switched&lt;/code&gt; event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;switcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ServiceSwitcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;primary_stt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backup_stt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;strategy_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ServiceSwitcherStrategyFailover&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@switcher.strategy.event_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_service_switched&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_switched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Your recovery logic here — health checks, alerting, etc.
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also refactored the architecture: manual switching and &lt;code&gt;handle_error()&lt;/code&gt; moved into the base &lt;code&gt;ServiceSwitcherStrategy&lt;/code&gt; class, making them available to all strategy implementations. The default strategy type now works without explicit configuration, simplifying the common case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; 264 additions and 86 deletions across the switcher, the LLM switcher, and an example, plus 20 new tests. A subsequent bug fix (&lt;a href="https://github.com/pipecat-ai/pipecat/pull/4149" rel="noopener noreferrer"&gt;#4149&lt;/a&gt;) tightened the &lt;code&gt;push_frame&lt;/code&gt; error check so that &lt;code&gt;handle_error&lt;/code&gt; fires only for errors originating from the &lt;em&gt;active&lt;/em&gt; managed service, preventing pass-through &lt;code&gt;ErrorFrame&lt;/code&gt;s from downstream processors (e.g., TTS errors propagating upstream through an LLM switcher) from incorrectly triggering failover.&lt;/p&gt;
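
&lt;p&gt;The rotation behavior is easy to model in isolation. Here is a minimal sketch (illustrative only, not the actual &lt;code&gt;ServiceSwitcherStrategyFailover&lt;/code&gt; code; the class name is made up for this post):&lt;/p&gt;

```python
# Minimal sketch of failover rotation: the failed service stays in the
# list so application code can re-enable it later. Illustrative only,
# not the real ServiceSwitcherStrategyFailover implementation.
class FailoverSketch:
    def __init__(self, services):
        self.services = list(services)
        self.active_index = 0

    @property
    def active(self):
        return self.services[self.active_index]

    def handle_error(self):
        """Rotate to the next service; return it, or None if exhausted."""
        if self.active_index + 1 >= len(self.services):
            return None  # nothing left to fail over to
        self.active_index += 1
        return self.active


switcher = FailoverSketch(["primary_stt", "backup_stt"])
assert switcher.active == "primary_stt"
switcher.handle_error()          # primary emitted a non-fatal ErrorFrame
assert switcher.active == "backup_stt"
```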




&lt;h3&gt;
  
  
  2. Fixing Smart Turn Detection at Non-16kHz Sample Rates
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PR &lt;a href="https://github.com/pipecat-ai/pipecat/pull/3857" rel="noopener noreferrer"&gt;#3857&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;Merged&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; &lt;code&gt;LocalSmartTurnAnalyzerV3&lt;/code&gt; — Pipecat's ML-based end-of-turn predictor — silently produced incorrect predictions when the pipeline ran at 8 kHz (standard for Twilio/telephony). The Whisper feature extractor inside it hardcoded 16 kHz in five places. At 8 kHz, the model perceived speech at 2x speed with shifted formant frequencies. No error, no warning — just wrong turn boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Added a resampling step before feature extraction using &lt;code&gt;soxr&lt;/code&gt; (already a core dependency) with VHQ quality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_MODEL_SAMPLE_RATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_resample_to_model_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_array&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;actual_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sample_rate&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;_MODEL_SAMPLE_RATE&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actual_rate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;_MODEL_SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;audio_array&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;soxr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actual_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_MODEL_SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quality&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VHQ&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replaced all five hardcoded &lt;code&gt;16000&lt;/code&gt; references with &lt;code&gt;_MODEL_SAMPLE_RATE&lt;/code&gt; and fixed the WAV debug logger to write correct sample rate headers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This is the kind of bug that's invisible in development (most dev setups use 16 kHz) but breaks in production telephony. Turn detection is the backbone of conversational flow — when it's wrong, agents interrupt users mid-sentence or wait too long to respond.&lt;/p&gt;
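
&lt;p&gt;To see why the bug was silent, consider the duration math: the same audio chunk read at the wrong rate simply plays faster. A stdlib linear-interpolation resampler is enough to illustrate the idea (the real fix uses &lt;code&gt;soxr&lt;/code&gt;'s VHQ resampler, which is far higher quality):&lt;/p&gt;

```python
# Why the bug was silent: 8 kHz audio fed to a 16 kHz model is read in
# half the time, so the model "hears" speech at 2x speed. Resampling
# restores the expected duration. Stdlib linear interpolation here is
# purely for illustration; the actual fix uses soxr's VHQ resampler.
def resample_linear(samples, in_rate, out_rate):
    if in_rate == out_rate:
        return list(samples)
    n_out = int(len(samples) * out_rate / in_rate)
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


chunk_8k = [0.0] * 800                      # 100 ms of audio at 8 kHz
chunk_16k = resample_linear(chunk_8k, 8000, 16000)
assert len(chunk_16k) == 1600               # still 100 ms at 16 kHz
```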




&lt;h3&gt;
  
  
  3. Fixing Interrupted Transitions in Pipecat Flows
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PR &lt;a href="https://github.com/pipecat-ai/pipecat-flows/pull/237" rel="noopener noreferrer"&gt;pipecat-flows#237&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;Merged&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; In Pipecat Flows (the state machine layer for multi-step conversations), a user interruption during a function call could permanently freeze the flow. The mechanism:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;_pending_transition&lt;/code&gt; gets set&lt;/li&gt;
&lt;li&gt;User interrupts, cancelling the function call&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;result_callback&lt;/code&gt; is never called&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_context_updated&lt;/code&gt; never fires&lt;/li&gt;
&lt;li&gt;The flow is stuck — no transition, no LLM run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Changed &lt;code&gt;cancel_on_interruption&lt;/code&gt; default from &lt;code&gt;True&lt;/code&gt; to &lt;code&gt;False&lt;/code&gt; in both &lt;code&gt;FlowsFunctionSchema&lt;/code&gt; and the &lt;code&gt;@flows_direct_function&lt;/code&gt; decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: cancel_on_interruption: bool = True
# After:
&lt;/span&gt;&lt;span class="n"&gt;cancel_on_interruption&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures all function calls run to completion, even during interruptions. The transition mechanism depends on the result callback completing — interrupting it breaks the fundamental contract. Users can still opt into cancellation per-function when appropriate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; 18 additions, 4 deletions. A minimal, high-leverage fix for a production-breaking edge case. The root cause analysis was the hard part — the fix was straightforward once the failure mode was understood.&lt;/p&gt;
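
&lt;p&gt;The failure mode is easiest to see in miniature. A toy model of the callback contract (not pipecat-flows code; the dict-based flow here is purely illustrative):&lt;/p&gt;

```python
# Toy model of the Flows transition contract: the pending transition is
# only applied inside the result callback. If an interruption cancels
# the function call before the callback runs, the flow is stuck forever.
# Illustrative only, not the pipecat-flows implementation.
def run_function_call(flow, cancel_on_interruption, interrupted):
    flow["pending_transition"] = "next_node"
    if interrupted and cancel_on_interruption:
        return  # call cancelled: result_callback never runs
    # result_callback: apply the transition and clear the pending marker
    flow["node"] = flow.pop("pending_transition")


flow = {"node": "greeting"}
run_function_call(flow, cancel_on_interruption=True, interrupted=True)
assert flow["node"] == "greeting"          # stuck: transition never applied

flow = {"node": "greeting"}
run_function_call(flow, cancel_on_interruption=False, interrupted=True)
assert flow["node"] == "next_node"         # fix: call runs to completion
```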




&lt;h3&gt;
  
  
  4. Stopping WebSocket Reconnection Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PR &lt;a href="https://github.com/pipecat-ai/pipecat/pull/3824" rel="noopener noreferrer"&gt;#3824&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;Open&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; &lt;code&gt;WebsocketService._try_reconnect&lt;/code&gt; only counted failed &lt;em&gt;handshakes&lt;/em&gt; toward its retry limit. When a server accepts the WebSocket handshake but immediately closes the connection (e.g., close code &lt;code&gt;1008&lt;/code&gt; for invalid API key), the reconnection "succeeds" every time. The loop never exits, burning resources and flooding logs indefinitely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Two complementary mechanisms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Non-recoverable close codes — stop immediately
&lt;/span&gt;&lt;span class="n"&gt;_NON_RECOVERABLE_CLOSE_CODES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="mi"&gt;1002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Protocol error
&lt;/span&gt;    &lt;span class="mi"&gt;1003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Unsupported data
&lt;/span&gt;    &lt;span class="mi"&gt;1008&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Policy violation (e.g., invalid API key)
&lt;/span&gt;    &lt;span class="mi"&gt;1009&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Message too big
&lt;/span&gt;    &lt;span class="mi"&gt;1010&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Mandatory extension
&lt;/span&gt;    &lt;span class="mi"&gt;1015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# TLS handshake failure
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Rapid failure detection — 3 strikes for connections
#    that drop within 5 seconds of establishment
&lt;/span&gt;&lt;span class="n"&gt;_MIN_STABLE_CONNECTION_SECS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Non-recoverable close codes (RFC 6455 Section 7.4.1 + application-specific 4000-4999 range) emit a fatal &lt;code&gt;ErrorFrame&lt;/code&gt; without retrying. For ambiguous failures, a rapid-failure counter tracks connections that drop within 5 seconds — three strikes and it's fatal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; In telephony deployments, an infinite reconnection loop against a misconfigured endpoint can cascade — saturating connection pools, generating unbounded log volume, and masking the actual root cause (usually a bad API key or revoked credentials).&lt;/p&gt;
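
&lt;p&gt;The two stop conditions combine into a single retry decision. A simplified sketch (names and shape are illustrative, not the actual &lt;code&gt;WebsocketService&lt;/code&gt; internals):&lt;/p&gt;

```python
# Simplified sketch of the two stop conditions from the PR: known
# non-recoverable close codes stop retrying immediately; connections
# that drop within 5 seconds of establishment count toward a
# three-strike limit. Illustrative only, not the real WebsocketService.
_NON_RECOVERABLE_CLOSE_CODES = {1002, 1003, 1008, 1009, 1010, 1015}
_MIN_STABLE_CONNECTION_SECS = 5.0
_MAX_RAPID_FAILURES = 3


def should_retry(close_code, connection_secs, rapid_failures):
    """Return (retry, rapid_failures) after a disconnect."""
    if close_code in _NON_RECOVERABLE_CLOSE_CODES:
        return False, rapid_failures          # e.g. 1008: bad API key
    if connection_secs >= _MIN_STABLE_CONNECTION_SECS:
        return True, 0                        # stable run resets strikes
    rapid_failures += 1
    return _MAX_RAPID_FAILURES > rapid_failures, rapid_failures


assert should_retry(1008, 0.2, 0) == (False, 0)   # policy violation: stop now
assert should_retry(1006, 0.5, 2) == (False, 3)   # third rapid drop: give up
assert should_retry(1006, 60.0, 2) == (True, 0)   # stable run resets the count
```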




&lt;h3&gt;
  
  
  5. Heartbeat Timeout as a First-Class Event
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PR &lt;a href="https://github.com/pipecat-ai/pipecat/pull/3882" rel="noopener noreferrer"&gt;#3882&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;Open&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Pipecat's heartbeat monitor detects stuck pipelines, but when it fires, the only option is an internal log message. Production systems need to react programmatically — emit metrics, trigger alerts, or gracefully tear down the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I added:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;on_heartbeat_timeout&lt;/code&gt; event handler on &lt;code&gt;PipelineTask&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Configurable &lt;code&gt;heartbeat_monitor_secs&lt;/code&gt; parameter (defaults to &lt;code&gt;heartbeats_period_secs * 10&lt;/code&gt;, preserving existing behavior)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cancel_on_heartbeat_timeout&lt;/code&gt; constructor arg (defaults to &lt;code&gt;False&lt;/code&gt;), mirroring the existing &lt;code&gt;cancel_on_idle_timeout&lt;/code&gt; pattern
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PipelineTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PipelineParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;enable_heartbeats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;heartbeat_monitor_secs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cancel_on_heartbeat_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@task.event_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_heartbeat_timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_heartbeat_timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pipeline stuck — heartbeat timeout reached&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;emit_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline.heartbeat_timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; In production, a stuck pipeline means a caller sitting in silence. The heartbeat timeout event lets you detect this and take action — restart the pipeline, fail over to a backup, or at minimum record it for post-incident analysis.&lt;/p&gt;
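
&lt;p&gt;Conceptually, the monitor is a watchdog timer. A minimal synchronous model (the real monitor runs as an asyncio task inside &lt;code&gt;PipelineTask&lt;/code&gt;; this sketch is illustrative only):&lt;/p&gt;

```python
# Minimal model of a heartbeat watchdog: if no heartbeat arrives within
# the monitor window, fire a timeout callback once. Illustrative sketch,
# not PipelineTask's actual asyncio-based implementation.
class HeartbeatWatchdog:
    def __init__(self, monitor_secs, on_timeout):
        self.monitor_secs = monitor_secs
        self.on_timeout = on_timeout
        self.last_beat = 0.0
        self.fired = False

    def beat(self, now):
        self.last_beat = now
        self.fired = False

    def check(self, now):
        if not self.fired and now - self.last_beat > self.monitor_secs:
            self.fired = True
            self.on_timeout()


events = []
dog = HeartbeatWatchdog(monitor_secs=30.0,
                        on_timeout=lambda: events.append("timeout"))
dog.beat(now=0.0)
dog.check(now=10.0)       # healthy: within the window
dog.check(now=45.0)       # stuck: 45 s since the last heartbeat
assert events == ["timeout"]
```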




&lt;h2&gt;
  
  
  Other Notable Changes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bug Fixes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3871" rel="noopener noreferrer"&gt;#3871&lt;/a&gt; — Fix PipelineTask double-inserting RTVIProcessor&lt;/strong&gt; &lt;em&gt;(Merged)&lt;/em&gt;: When a user placed an &lt;code&gt;RTVIProcessor&lt;/code&gt; inside their pipeline and also provided a custom &lt;code&gt;RTVIObserver&lt;/code&gt;, &lt;code&gt;PipelineTask&lt;/code&gt; unconditionally prepended &lt;code&gt;self._rtvi&lt;/code&gt; to the pipeline — duplicating it. Added a &lt;code&gt;_rtvi_external&lt;/code&gt; flag to track whether the processor was found externally. 7 lines changed, clean fix for &lt;a href="https://github.com/pipecat-ai/pipecat/issues/3867" rel="noopener noreferrer"&gt;#3867&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3850" rel="noopener noreferrer"&gt;#3850&lt;/a&gt; — Remove verbose audio chunk logging from GenesysAudioHookSerializer&lt;/strong&gt; &lt;em&gt;(Merged)&lt;/em&gt;: A &lt;code&gt;logger.debug&lt;/code&gt; call in &lt;code&gt;deserialize()&lt;/code&gt; logged every incoming audio chunk (1600 bytes), flooding production logs. No other serializer (Twilio, Telnyx, Exotel, Plivo, Vonage) does this. One line removed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
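
&lt;p&gt;The #3871 check boils down to: only prepend the processor when the user hasn't already placed one in the pipeline. A sketch (names are illustrative; the real check lives in &lt;code&gt;PipelineTask&lt;/code&gt; and uses a &lt;code&gt;_rtvi_external&lt;/code&gt; flag):&lt;/p&gt;

```python
# Sketch of the #3871 fix: only prepend the RTVI processor when the user
# has not already placed one in the pipeline. Names are illustrative,
# not the actual PipelineTask code.
def build_pipeline(user_processors, rtvi):
    rtvi_external = any(p == rtvi for p in user_processors)
    if rtvi_external:
        return list(user_processors)          # user already wired it in
    return [rtvi] + list(user_processors)     # prepend exactly once


rtvi = "RTVIProcessor"
assert build_pipeline(["stt", rtvi, "llm"], rtvi) == ["stt", rtvi, "llm"]
assert build_pipeline(["stt", "llm"], rtvi) == [rtvi, "stt", "llm"]
```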

&lt;h3&gt;
  
  
  Closed PRs (Superseded)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3869" rel="noopener noreferrer"&gt;#3869&lt;/a&gt; — Fix tracing crashes on dataclass settings&lt;/strong&gt; &lt;em&gt;(Closed)&lt;/em&gt;: Tracing utilities called &lt;code&gt;.items()&lt;/code&gt; on dataclass settings objects, crashing on services like Google LLM and Inworld TTS. Added a &lt;code&gt;_settings_to_dict()&lt;/code&gt; helper that normalizes settings via &lt;code&gt;dataclasses.asdict()&lt;/code&gt;. Closed — the fix was picked up by the maintainers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3823" rel="noopener noreferrer"&gt;#3823&lt;/a&gt; — Fix assistant aggregator incorrectly aggregating transcription frames&lt;/strong&gt; &lt;em&gt;(Closed)&lt;/em&gt;: &lt;code&gt;TranscriptionFrame&lt;/code&gt; and &lt;code&gt;InterimTranscriptionFrame&lt;/code&gt; were caught by the generic &lt;code&gt;TextFrame&lt;/code&gt; handler, causing user speech to appear as assistant messages in the LLM context. Superseded by maintainer work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3801" rel="noopener noreferrer"&gt;#3801&lt;/a&gt; — PreemptiveUserTurnStopStrategy&lt;/strong&gt; &lt;em&gt;(Closed)&lt;/em&gt;: A new turn stop strategy that triggers LLM generation as soon as VAD detects silence and any transcription text is available, trading turn-boundary accuracy for lower latency. Closed — the concept influenced later turn strategy work by the core team.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
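
&lt;p&gt;The normalization idea from #3869 is small enough to sketch (an illustrative helper, not the exact code from the PR):&lt;/p&gt;

```python
# Sketch of the #3869 normalization: tracing called .items() on settings,
# which crashes when a service stores them as a dataclass. Normalizing
# through dataclasses.asdict() handles both shapes. Illustrative helper,
# not the exact code from the PR.
import dataclasses


def settings_to_dict(settings):
    if dataclasses.is_dataclass(settings):
        return dataclasses.asdict(settings)
    return dict(settings)


@dataclasses.dataclass
class TTSSettings:
    voice: str
    speed: float


assert settings_to_dict({"voice": "en-US"}) == {"voice": "en-US"}
assert settings_to_dict(TTSSettings("en-US", 1.0)) == {"voice": "en-US", "speed": 1.0}
```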




&lt;h2&gt;
  
  
  Code Reviews
&lt;/h2&gt;

&lt;p&gt;Beyond my own PRs, I also reviewed community contributions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/pipecat-ai/pipecat/pull/3795" rel="noopener noreferrer"&gt;#3795&lt;/a&gt; — fix(realtime): handle response_cancel_not_active as non-fatal&lt;/strong&gt; by &lt;a href="https://github.com/omChauhanDev" rel="noopener noreferrer"&gt;@omChauhanDev&lt;/a&gt; &lt;em&gt;(Merged)&lt;/em&gt;: This PR fixed a bug where &lt;code&gt;response_cancel_not_active&lt;/code&gt; errors from the OpenAI Realtime API were fatally killing the WebSocket connection. I reviewed the approach (correct — this should not be fatal) and suggested using &lt;code&gt;logger.debug()&lt;/code&gt; instead of &lt;code&gt;logger.warning()&lt;/code&gt;, since this error is an expected, benign condition in push-to-talk mode that would create log noise in production. Also flagged a style convention (&lt;code&gt;{self}&lt;/code&gt; prefix for service identification in log messages).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What These Changes Mean
&lt;/h2&gt;

&lt;p&gt;The common thread across these contributions is &lt;strong&gt;production reliability for telephony deployments&lt;/strong&gt;. Voice AI agents run in environments where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services fail (STT/TTS providers have outages) → automatic failover handles this&lt;/li&gt;
&lt;li&gt;Audio sample rates vary (8 kHz for telephony vs. 16 kHz for WebRTC) → correct resampling matters&lt;/li&gt;
&lt;li&gt;WebSocket connections drop with non-recoverable errors → infinite retry loops waste resources&lt;/li&gt;
&lt;li&gt;Pipelines get stuck → heartbeat timeouts provide observability&lt;/li&gt;
&lt;li&gt;State machines must be interrupt-safe → function calls need to complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't glamorous features — they're the kind of infrastructure work that prevents 3 AM pages. If you're building production voice agents with Pipecat, these fixes and features directly reduce the failure modes you'll encounter at scale.&lt;/p&gt;




&lt;p&gt;For the story behind PR #3870, check out &lt;a href="https://medium.com/@kollaikalrupesh/the-2am-call-that-voice-ai-developers-dread-and-how-i-helped-fix-it-fb607c06cb6b" rel="noopener noreferrer"&gt;this post on Medium&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All PRs authored by &lt;a href="https://github.com/kollaikal-rupesh" rel="noopener noreferrer"&gt;@kollaikal-rupesh&lt;/a&gt;. Open PRs are under review — feedback welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
