<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maddy Quinn</title>
    <description>The latest articles on DEV Community by Maddy Quinn (@maddysquinn).</description>
    <link>https://dev.to/maddysquinn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F973510%2F3cb97873-19ac-4184-9166-798b5d878eb5.jpeg</url>
      <title>DEV Community: Maddy Quinn</title>
      <link>https://dev.to/maddysquinn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maddysquinn"/>
    <language>en</language>
    <item>
      <title>Why chat.stop() doesn't cancel your LLM generation (and what to build instead)</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Fri, 26 Jun 2026 11:00:00 +0000</pubDate>
      <link>https://dev.to/ably/why-chatstop-doesnt-cancel-your-llm-generation-and-what-to-build-instead-4jd4</link>
      <guid>https://dev.to/ably/why-chatstop-doesnt-cancel-your-llm-generation-and-what-to-build-instead-4jd4</guid>
      <description>&lt;p&gt;You add a stop button to your AI chat app: a customer support agent, a coding assistant, a research tool the user can steer mid-task. A user clicks it mid-response. The frontend stops rendering. Then you check your backend logs and realize the underlying generation is still running, and you're still paying for every token.&lt;/p&gt;

&lt;p&gt;This is not a bug. The &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;Vercel AI SDK docs&lt;/a&gt; state it explicitly: in a resumable stream setup, calling &lt;code&gt;stop()&lt;/code&gt; only closes the current HTTP connection and should not cancel the underlying generation. The same applies to closing a tab or refreshing the page. The client disconnects; the server keeps running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calling &lt;code&gt;chat.stop()&lt;/code&gt; in the Vercel AI SDK closes the client connection but does not cancel server-side generation: the model keeps generating, and billing continues.&lt;/li&gt;
&lt;li&gt;Fixing this requires a dedicated stop endpoint with stale-request (&lt;code&gt;activeStreamId&lt;/code&gt;) checking, partial assistant snapshot persistence, and backend-specific cancellation logic, none of which the SDK provides.&lt;/li&gt;
&lt;li&gt;HTTP streaming is one-way. The server cannot distinguish an intentional stop from a network drop without an explicit signal sent separately from the stream.&lt;/li&gt;
&lt;li&gt;On an Ably session, cancel is an explicitly named signal. The server knows immediately whether to stop, wait, or redirect, with no additional endpoint required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why &lt;code&gt;stop()&lt;/code&gt; and disconnect mean different things
&lt;/h2&gt;

&lt;p&gt;When you call &lt;code&gt;chat.stop()&lt;/code&gt; in &lt;code&gt;useChat&lt;/code&gt;, or when a user closes their browser tab, one thing happens: the HTTP connection closes. HTTP streaming is one-way: the server sends, the client receives. There is no signal in a closed connection that tells the server why it closed. A deliberate stop and a network drop look identical.&lt;/p&gt;

&lt;p&gt;This is intentional in resumable stream architectures. They are designed to survive disconnects: if the connection drops, the client should be able to reconnect and pick up where it left off. Keeping generation running through a connection loss is the correct behavior. But a user clicking stop produces exactly the same event, so it gets exactly the same treatment.&lt;/p&gt;

&lt;p&gt;The Vercel AI SDK &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;docs are explicit about this&lt;/a&gt;: "a client-side abort (e.g. closing the page or refreshing) only closes the current HTTP connection. It is not a request to cancel the underlying work." If your stop button only calls &lt;code&gt;stop()&lt;/code&gt;, the model request, background job, workflow, or stream writer keeps running, and the client can reconnect to the same active stream.&lt;/p&gt;

&lt;p&gt;The same constraint applies to every other form of user control over a running agent. Say a user is running a research agent and wants to redirect mid-response: "actually, focus on flights only." There is no way to deliver that instruction over the existing stream. You need a separate endpoint, or some other mechanism alongside the stream. Server-Sent Events (SSE), the default transport for most AI SDK setups, cannot carry a signal back to the server. The stream flows one way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a correct stop implementation actually requires
&lt;/h2&gt;

&lt;p&gt;The Vercel AI SDK &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;documents the correct approach&lt;/a&gt;: build a dedicated stop endpoint. That endpoint needs to do four things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persist the partial assistant snapshot.&lt;/strong&gt; Before canceling, the client sends its current partial assistant message to the stop endpoint. This preserves what the user has already seen. Without this step, the assistant message disappears from the conversation when the stream closes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check the &lt;code&gt;activeStreamId&lt;/code&gt;.&lt;/strong&gt; Your application tracks which stream is active for each chat. The stop endpoint reads this value and compares it against the stream ID the client sent with the request. If a newer stream has started because the user sent a new message while the stop request was in flight, the stop request is stale and should be ignored.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cancel the active work.&lt;/strong&gt; This is the backend-specific step. In a Redis-backed resumable stream setup, you close the stored stream and abort the model request writing to it. In a workflow setup, you cancel the workflow run. In a job queue setup, you cancel the job or write a cancellation flag the job polls. The SDK cannot do this for you because it does not know your backend architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear the &lt;code&gt;activeStreamId&lt;/code&gt;.&lt;/strong&gt; Once cancellation is confirmed, clear the stored stream reference, but only if it still matches the stream you intended to cancel. A newer stream may have started between the cancellation request and the completion of the cancel logic.&lt;/p&gt;

&lt;p&gt;Each step exists to address a specific race condition. Between the moment a user clicks stop and the moment the server processes the request, a new message can be sent, a new stream can start, or the partial assistant message can be overwritten by a server-side completion. The stop endpoint handles all of these correctly only if it checks every condition in sequence.&lt;/p&gt;
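&lt;p&gt;The four steps can be sketched as a single handler. This is a minimal, framework-agnostic sketch under stated assumptions, not the AI SDK's reference implementation: the in-memory &lt;code&gt;chats&lt;/code&gt; object stands in for your database, one &lt;code&gt;AbortController&lt;/code&gt; per stream stands in for your backend-specific cancellation, and &lt;code&gt;handleStop&lt;/code&gt;, &lt;code&gt;StopRequest&lt;/code&gt;, and &lt;code&gt;ChatRecord&lt;/code&gt; are hypothetical names. It also runs the staleness check before persisting the snapshot, an ordering choice so that a stale request never writes anything.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-ins for your chat store and running generations.
interface ChatRecord {
  activeStreamId: string | null; // the stream currently writing to this chat
  messages: string[];            // persisted assistant messages (simplified to text)
}

interface StopRequest {
  chatId: string;
  streamId: string;       // the stream the client believes is active
  partialMessage: string; // the partial assistant text the user has already seen
}

const chats: { [chatId: string]: ChatRecord } = {};
const controllers: { [streamId: string]: AbortController } = {};

function handleStop(req: StopRequest): "stopped" | "stale" {
  const chat = chats[req.chatId];

  // Check activeStreamId: a newer stream may have started while this request was in flight.
  // (This sketch checks before persisting, so a stale request never writes a snapshot.)
  if (!chat || chat.activeStreamId !== req.streamId) {
    return "stale";
  }

  // Persist the partial assistant snapshot so the message survives the stream closing.
  chat.messages.push(req.partialMessage);

  // Cancel the active work. Here that means aborting the model request; with a
  // workflow or job queue you would cancel the run or set a flag the job polls.
  controllers[req.streamId]?.abort();
  delete controllers[req.streamId];

  // Clear activeStreamId only if it still points at the stream we cancelled.
  // A real handler awaits between these steps, so the re-check matters there.
  if (chat.activeStreamId === req.streamId) {
    chat.activeStreamId = null;
  }
  return "stopped";
}
```

&lt;p&gt;With this sketch, a stop request carrying an outdated &lt;code&gt;streamId&lt;/code&gt; returns &lt;code&gt;"stale"&lt;/code&gt; and leaves the newer stream running, while one carrying the current ID persists the snapshot, aborts that stream's controller, and clears the record. The final re-check looks redundant in synchronous code; it matters once real database and cancellation calls introduce awaits between the steps.&lt;/p&gt;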

&lt;p&gt;This is buildable. The AI SDK docs &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;provide a full implementation&lt;/a&gt;. But consider what you are actually shipping: a dedicated HTTP endpoint, a stream ID tracking layer, a partial message persistence mechanism, and backend-specific cancellation logic. The SDK provides none of it, and all of it has to stay in sync with the rest of your streaming infrastructure. Many teams only discover this after shipping their first stop button.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three questions to ask about your stop button before shipping
&lt;/h2&gt;

&lt;p&gt;Before you ship, answering these three questions will tell you whether your stop button actually does what it looks like it does.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does clicking stop actually stop backend generation, or does it only stop the client from receiving tokens? If you have not built a stop endpoint, the answer is the latter.&lt;/li&gt;
&lt;li&gt;What happens to the partial assistant message when stop is called? If you are not persisting a snapshot server-side, the message may disappear or be overwritten when the stream closes.&lt;/li&gt;
&lt;li&gt;What happens if a new message is sent while a stop request is in flight? If your stop endpoint does not check the &lt;code&gt;activeStreamId&lt;/code&gt;, it may cancel a stream the user has already moved past.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If all three have clean answers, your stop button works. If not, the gap will show up in production, usually after a user notices their coding assistant or support agent kept billing them for a response they clicked stop on.&lt;/p&gt;

&lt;p&gt;All three problems trace back to the same root cause: HTTP streaming gives the server no way to distinguish intent from a connection event. There is an approach that removes the problem at the transport level rather than working around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a bidirectional session changes the stop vs disconnect distinction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; is built on a different model. Instead of HTTP streaming, it uses a persistent bidirectional session. The client and server can both send signals at any time, over the same connection. That means cancel, stop, and redirect are first-class signals, not workarounds built on top.&lt;/p&gt;

&lt;p&gt;On an Ably session, cancel is a named signal rather than an inference from a dropped connection. The client publishes a cancel signal on the session: &lt;code&gt;session.cancel(runId)&lt;/code&gt;. The server receives it on the corresponding run, and its &lt;code&gt;abortSignal&lt;/code&gt; fires. Generation stops. The run ends with the reason &lt;code&gt;'cancelled'&lt;/code&gt;, and every subscriber receives the lifecycle update.&lt;/p&gt;

&lt;p&gt;Because the cancel is a session event rather than a TCP disconnection, the server knows exactly what happened. A network drop does not fire the cancel handler. A user clicking stop does. The session remains intact, and the next message starts a new run cleanly.&lt;/p&gt;

&lt;p&gt;The race condition that the stop endpoint exists to solve is handled natively. Each run has a unique &lt;code&gt;runId&lt;/code&gt;. A cancel signal targeting a run that has already ended is ignored, and multiple signals matching the same run cancel it once.&lt;/p&gt;
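&lt;p&gt;To make those semantics concrete, here is a small model of run-scoped cancellation. It is not Ably's implementation or API: &lt;code&gt;RunRegistry&lt;/code&gt; and its methods are hypothetical, and the sketch only illustrates why keying cancellation on a unique run ID makes duplicate or late cancel signals harmless.&lt;/p&gt;

```typescript
// Hypothetical model of run-scoped cancellation; not the Ably AI Transport API.
type EndReason = "complete" | "cancelled";

interface RunState {
  controller: AbortController;
  ended: boolean;
}

class RunRegistry {
  private runs: { [runId: string]: RunState } = {};

  // Register a new run and return the signal to pass to the model call.
  start(runId: string): AbortSignal {
    const controller = new AbortController();
    this.runs[runId] = { controller, ended: false };
    return controller.signal;
  }

  // Returns true only when this call actually cancelled a live run.
  cancel(runId: string): boolean {
    const run = this.runs[runId];
    if (!run || run.ended || run.controller.signal.aborted) {
      return false; // unknown, already ended, or already cancelled: ignore
    }
    run.controller.abort();
    return true;
  }

  // Mark the run finished; any later cancel targeting it becomes a no-op.
  end(runId: string): EndReason {
    const run = this.runs[runId];
    if (!run) {
      throw new Error("unknown run " + runId);
    }
    run.ended = true;
    return run.controller.signal.aborted ? "cancelled" : "complete";
  }
}
```

&lt;p&gt;A late cancel aimed at a finished run returns &lt;code&gt;false&lt;/code&gt; and never touches the next run. That is the same guarantee the &lt;code&gt;activeStreamId&lt;/code&gt; check provides in the HTTP design, expressed per run instead of per chat.&lt;/p&gt;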

&lt;p&gt;For patterns beyond cancellation, the session supports cancel-then-send (cancel the active run and immediately send a new message) and send-alongside (send a new message while the active run continues). See the &lt;a href="https://ably.com/docs/ai-transport/features/interruption" rel="noopener noreferrer"&gt;interruption docs&lt;/a&gt; for full implementation guidance.&lt;/p&gt;

&lt;p&gt;For the Vercel AI SDK-specific analysis, including GitHub citations and billing evidence, see &lt;a href="https://ably.com/topic/ai-stack/why-vercel-ai-sdk-stop-doesnt-cancel-the-stream" rel="noopener noreferrer"&gt;why Vercel AI SDK stop doesn't cancel the stream&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6d6j8zr3yunmygom8ref.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6d6j8zr3yunmygom8ref.png" alt="HTTP · SSE closes the connection. The server keeps generating. On an Ably session, cancel is an explicit named event - generation stops immediately.&lt;br&gt;
" width="799" height="382"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Canceling a run with Ably AI Transport
&lt;/h2&gt;

&lt;p&gt;With Ably AI Transport, cancellation from the client is a single call, either on the active run or on the session by run ID:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Cancel a specific run&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;activeRun&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Or cancel by runId, from any connected device&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the server, the abort signal fires automatically:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createRun&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadConversation&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// hydrate prior conversation history&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;convertToModelMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;abortSignal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;abortSignal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// fires when cancel() is called client-side&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;reason&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUIMessageStream&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// reason is 'cancelled' when abort fires&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;abortSignal&lt;/code&gt; is passed directly to the model call. When the client cancels, the signal fires, generation stops, and the run ends with reason &lt;code&gt;'cancelled'&lt;/code&gt;. No stop endpoint to build, no &lt;code&gt;activeStreamId&lt;/code&gt; to track, no race condition to guard against.&lt;/p&gt;

&lt;p&gt;One edge case worth noting: cancellation is asynchronous, so a small tail of tokens may arrive after &lt;code&gt;cancel()&lt;/code&gt; returns and before the server's &lt;code&gt;abortSignal&lt;/code&gt; fires. Those tokens still belong to the cancelled run, not the next one. Also, any tool invocation that does not check the &lt;code&gt;abortSignal&lt;/code&gt; will keep running until it completes, so if your agent calls tools, pass the signal through to each one.&lt;/p&gt;
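&lt;p&gt;Passing the signal through means the tool's own work checks it. Here is a minimal sketch using only the standard &lt;code&gt;AbortController&lt;/code&gt; and &lt;code&gt;AbortSignal&lt;/code&gt;; &lt;code&gt;searchFlights&lt;/code&gt; and its page-by-page loop are hypothetical. In the AI SDK, a tool's &lt;code&gt;execute&lt;/code&gt; function receives an &lt;code&gt;abortSignal&lt;/code&gt; in its options argument, which is the signal you would hand to code like this (check the tool options for your SDK version).&lt;/p&gt;

```typescript
// Hypothetical long-running tool that checks the run's abort signal between pages of work.
function searchFlights(
  pages: string[][],
  signal: AbortSignal,
  onPage?: (index: number) => void,
): string[] {
  const results: string[] = [];
  let index = 0;
  for (const page of pages) {
    // Stop before starting more work once the run has been cancelled.
    if (signal.aborted) {
      throw new Error("searchFlights aborted after " + index + " page(s)");
    }
    results.push(...page);
    onPage?.(index); // progress hook, used here to simulate a mid-run cancel
    index += 1;
  }
  return results;
}
```

&lt;p&gt;For network-bound tools it is simpler still: &lt;code&gt;fetch&lt;/code&gt; accepts a &lt;code&gt;signal&lt;/code&gt; option, so forwarding the run's signal aborts the request itself rather than waiting for the next check.&lt;/p&gt;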

&lt;h2&gt;
  
  
  Adopting Ably AI Transport: what changes in your stack
&lt;/h2&gt;

&lt;p&gt;Shifting from HTTP streaming to an Ably session does not change your LLM call, your model provider, or your agent framework. AI Transport sits at the delivery layer, below orchestration. Your Vercel AI SDK, LangGraph, or custom agent logic stays unchanged. For teams using the Vercel AI SDK specifically, Ably ships a drop-in transport adapter, &lt;code&gt;@ably/ai-transport/vercel&lt;/code&gt;, that swaps the transport underneath &lt;code&gt;useChat&lt;/code&gt; without changing the hook.&lt;/p&gt;

&lt;p&gt;What changes is the transport. Instead of an HTTP POST that returns a streaming response, the client opens an Ably session. Cancel, stop, and redirect become session signals, not HTTP endpoints.&lt;/p&gt;

&lt;p&gt;There is a trade-off: an Ably session adds a persistent connection to your architecture. If stop is the only signal you need, a stop endpoint is the lighter choice. The session model earns its place when you need several of these signals: cancel, redirect, steer, human handover, multi-device continuity. They all run on the same infrastructure, so if you are already building one of those patterns, you are building the foundation for all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The stop vs disconnect distinction is a structural property of HTTP streaming, not a framework bug. Closing an HTTP connection does not carry intent; only an explicit signal sent separately from the stream does.&lt;/p&gt;

&lt;p&gt;A correct stop endpoint is buildable, but it has four moving parts that have to stay in sync with your streaming infrastructure. Many teams discover the gaps only after they ship.&lt;/p&gt;

&lt;p&gt;Ably AI Transport takes a different approach. On an Ably session, cancel is an explicit signal. Race conditions are handled at the transport level. The session persists through cancellation, and the next message starts a clean run.&lt;/p&gt;

&lt;p&gt;Docs go deeper: &lt;a href="https://ably.com/docs/ai-transport/features/cancellation" rel="noopener noreferrer"&gt;Ably AI Transport cancellation docs&lt;/a&gt; | &lt;a href="https://ably.com/docs/ai-transport/features/interruption" rel="noopener noreferrer"&gt;Interruption patterns&lt;/a&gt; | &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;Vercel AI SDK stop documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does calling &lt;code&gt;chat.stop()&lt;/code&gt; in the Vercel AI SDK cancel the underlying generation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. &lt;code&gt;chat.stop()&lt;/code&gt; closes the HTTP connection. The underlying generation (the model request, background job, or stream writer) keeps running until it completes, and you are billed for every token it produces. The Vercel AI SDK documents this explicitly: a client-side abort is a disconnect signal, not a cancellation. Stopping generation requires a dedicated stop endpoint that you build and maintain alongside your streaming infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why can't the server detect a client disconnect and stop generation automatically?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The server can detect that the HTTP connection is closed. It cannot tell whether this was an intentional stop, a network drop, a page refresh, or a tab crash. In a resumable stream architecture, all four are treated as disconnects by design: the stream should survive a network drop. Treating every disconnect as an intentional stop would cancel streams on network blips and prevent reconnection. Distinguishing them requires an explicit signal from the client, which is why a stop endpoint is necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;activeStreamId&lt;/code&gt; checking, and why does my stop endpoint need it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;activeStreamId&lt;/code&gt; is a reference that your application stores, linking each chat to its currently active stream. The stop endpoint reads this value and compares it against the stream ID the client sends with the stop request. If a newer stream has started since the client initiated the stop, the stop request is stale and should be ignored. Without this check, the stop endpoint may cancel a stream the user has already moved past, leaving the conversation in an inconsistent state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Ably's session model handle the stop vs disconnect distinction?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On an Ably session, cancel is an explicit event published by the client, either via &lt;code&gt;activeRun.cancel()&lt;/code&gt; for the current run or &lt;code&gt;session.cancel(runId)&lt;/code&gt; to target a specific run by ID. The server receives it as a named session signal, not as a TCP disconnection. A network drop does not trigger the cancel handler; an intentional stop does. The two events are handled separately, without a stop endpoint or application-level staleness checks. The session remains intact after cancellation, and the next user message starts a clean run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I build interruptible AI streaming, and is redirect or steer supported today?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need a bidirectional session. With Ably AI Transport, calling &lt;code&gt;activeRun.cancel()&lt;/code&gt; or &lt;code&gt;session.cancel(runId)&lt;/code&gt; publishes an explicit cancel signal the server acts on immediately, regardless of connection state. &lt;code&gt;activeRun.cancel()&lt;/code&gt; is the typical client-side call; &lt;code&gt;session.cancel(runId)&lt;/code&gt; lets you target a specific run by ID, including from a different device. Beyond cancel, the session supports two interruption patterns: cancel-then-send, which cancels the active run before starting a new one, and send-alongside, which lets both runs continue concurrently. See the &lt;a href="https://ably.com/docs/ai-transport/features/interruption" rel="noopener noreferrer"&gt;interruption docs&lt;/a&gt; for full implementation guidance.&lt;/p&gt;




&lt;p&gt;What's your current approach to stop and cancellation in production? Do you have a stop endpoint, or are you relying on client-side disconnect? Would love to hear how others are handling this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>websockets</category>
      <category>vercel</category>
      <category>typescript</category>
    </item>
    <item>
      <title>When should you replace DefaultChatTransport?</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Mon, 22 Jun 2026 14:18:09 +0000</pubDate>
      <link>https://dev.to/ably/when-should-you-replace-defaultchattransport-4c88</link>
      <guid>https://dev.to/ably/when-should-you-replace-defaultchattransport-4c88</guid>
<description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; &lt;code&gt;DefaultChatTransport&lt;/code&gt; uses HTTP POST and SSE. This is correct for a single user on a stable connection, but it reaches its design boundary when production requires cancellation that reaches the server, multi-device delivery, stream resumption without Redis, or multi-user sessions. This post covers the four limits, a four-question self-audit, and what a WebSocket-based session layer adds.&lt;/p&gt;

&lt;p&gt;You've built an AI chat app on the Vercel AI SDK. It works in development. The model responds, the stream comes through, and the UI updates cleanly. Then you ship to production, and the transport layer starts showing its edges.&lt;/p&gt;

&lt;p&gt;Most of these failures are quiet: things that work in demos and break in ways that are hard to pin down until you know where to look. They share a common cause: &lt;code&gt;DefaultChatTransport&lt;/code&gt; is built for HTTP, and HTTP has structural properties that some production requirements exceed. This piece explains what those limits are, which ones matter for your application, and what replacing the transport actually involves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;DefaultChatTransport&lt;/code&gt; uses HTTP POST and Server-Sent Events (SSE). These protocols are one-way and point-to-point. That's correct behavior for a stateless serverless platform, not a bug in the SDK.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stop()&lt;/code&gt; fires the abort signal client-side and returns immediately. &lt;a href="https://github.com/vercel/ai/issues/9707" rel="noopener noreferrer"&gt;GitHub issue #9707 (open, October 2025)&lt;/a&gt; confirms the server cannot distinguish an intentional stop from a dropped connection, and may continue generating and billing until completion.&lt;/li&gt;
&lt;li&gt;The official Vercel AI SDK stream resumption pattern requires Redis, the &lt;a href="https://www.npmjs.com/package/resumable-stream" rel="noopener noreferrer"&gt;resumable-stream package&lt;/a&gt;, two custom API endpoints, and a dedicated stop handler. In a resumable stream setup, &lt;code&gt;stop()&lt;/code&gt; is treated as a disconnect, not a cancel.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;ChatTransport&lt;/code&gt; interface is pluggable by design. Vercel's serverless platform cannot host persistent WebSocket connections, so they made the transport layer swappable. Replacing &lt;code&gt;DefaultChatTransport&lt;/code&gt; with a WebSocket-based transport layer creates a durable session between your agent and client, without changing your agent, tool calls, or UI rendering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How DefaultChatTransport works, and the conditions it was built for
&lt;/h2&gt;

&lt;p&gt;When you call &lt;code&gt;useChat()&lt;/code&gt; without a transport option, or pass a default config, &lt;code&gt;DefaultChatTransport&lt;/code&gt; is what runs. It sends outgoing messages via HTTP POST and receives responses as an SSE stream.&lt;/p&gt;

&lt;p&gt;For a single user on a stable connection who sends a message and waits for the response, this is the right choice. A stateless serverless function receives the request, calls the model, and streams the response back. HTTP is built for exactly that exchange, and &lt;code&gt;DefaultChatTransport&lt;/code&gt; uses it correctly.&lt;/p&gt;

&lt;p&gt;That behavior follows from a platform constraint: Vercel's serverless functions terminate after responding, so there is no persistent process to hold a socket open. That constraint is the root of all four limits described below. They're architectural, not configurable, because HTTP on a stateless platform simply can't do what they require. The Ably &lt;a href="https://ably.com/topic/ai-stack/websockets-on-vercel-why-serverless-functions-cant-host-them" rel="noopener noreferrer"&gt;guide to WebSockets on Vercel&lt;/a&gt; covers this constraint in depth if you want the full picture.&lt;/p&gt;

&lt;p&gt;That's also why Vercel made &lt;code&gt;ChatTransport&lt;/code&gt; pluggable in &lt;a href="https://vercel.com/blog/ai-sdk-5" rel="noopener noreferrer"&gt;AI SDK 5&lt;/a&gt;. &lt;code&gt;DefaultChatTransport&lt;/code&gt; is not broken: it's correct for the conditions it was built for. But Vercel designed the interface precisely so teams can swap in a transport that isn't bound by those conditions.&lt;/p&gt;

&lt;p&gt;It's not just &lt;code&gt;DefaultChatTransport&lt;/code&gt; that has this constraint. Even &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-ui/direct-chat-transport" rel="noopener noreferrer"&gt;&lt;code&gt;DirectChatTransport&lt;/code&gt;&lt;/a&gt;, the other built-in option, explicitly documents that it "does not support reconnection since there is no persistent server-side stream to reconnect to." Reconnection is a transport-layer property. The default implementations don't have it because the platform they're built for doesn't support it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four things DefaultChatTransport can't do in production
&lt;/h2&gt;

&lt;p&gt;These are the limits that surface when you move beyond a single-user chatbot: a customer support agent that hands off between devices, a chat interface where a human and an AI both participate, or any application where the connection dropping mid-generation has a visible cost to the user.&lt;/p&gt;

&lt;p&gt;Each follows from the same root: HTTP/SSE is built for one connection, one client, one response. When production asks for more, that constraint becomes visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cancellation is ambiguous, and you may be paying for it.&lt;/strong&gt; When a user clicks stop, &lt;code&gt;stop()&lt;/code&gt; closes the HTTP connection client-side and returns immediately, without waiting for the server to acknowledge or terminate the generation. The server receives a connection close event, which it cannot distinguish from a tab close, a network drop, or a mobile device going to sleep. So it keeps generating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vercel/ai/issues/9707" rel="noopener noreferrer"&gt;GitHub issue #9707 (filed October 2025, still open)&lt;/a&gt; documents this directly: &lt;code&gt;createUIMessageStream&lt;/code&gt; does not detect the abort signal server-side, making it "impossible to stop ongoing AI generation and leading to unnecessary costs and poor UX." &lt;a href="https://github.com/vercel/ai/issues/10844" rel="noopener noreferrer"&gt;GitHub issue #10844&lt;/a&gt; adds that Vercel's own &lt;code&gt;supportsCancellation: true&lt;/code&gt; config flag behaves unreliably in production deployments. The cost is real: orphaned generations run to completion, and there's no reliable mechanism to stop them without a custom server-side endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-device delivery silently fails.&lt;/strong&gt; SSE is one-to-one: one HTTP connection, one client, one stream. A user with the same session open on their laptop and phone receives the response only on the device that sent the request. The second device gets nothing: no error, no partial content, no indication that anything is in flight. This isn't a &lt;code&gt;useChat&lt;/code&gt; configuration gap; it's a structural property of SSE over HTTP. Multi-device fan-out is missing from most SSE-based AI transports for exactly this reason, and &lt;code&gt;DefaultChatTransport&lt;/code&gt; is no exception.&lt;/p&gt;
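&lt;p&gt;The difference between one-to-one delivery and fan-out is easy to see in miniature. The sketch below is a hypothetical in-memory channel, not any real library's API: each device subscribes once, and every chunk the agent publishes reaches all subscribers. An SSE response, by contrast, has exactly one reader, the device that sent the POST.&lt;/p&gt;

```typescript
// Hypothetical in-memory channel illustrating fan-out; not a real pub/sub library.
type Listener = (chunk: string) => void;

class Channel {
  private listeners: Listener[] = [];

  // Each device subscribes once; the returned function unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // One publish reaches every subscribed device, not just the one that asked.
  publish(chunk: string): void {
    for (const listener of this.listeners) {
      listener(chunk);
    }
  }
}
```

&lt;p&gt;In production the channel would be a pub/sub service that outlives any one serverless invocation; the in-memory version only shows the delivery shape.&lt;/p&gt;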

&lt;p&gt;The next limit has the same architectural root. Where multi-device delivery requires fan-out that HTTP cannot provide, stream resumption requires session persistence that HTTP cannot maintain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stream resumption requires infrastructure that you build and own.&lt;/strong&gt; The &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams" rel="noopener noreferrer"&gt;Vercel AI SDK stream resumption documentation&lt;/a&gt; lists the prerequisites directly: a Redis instance, the &lt;a href="https://www.npmjs.com/package/resumable-stream" rel="noopener noreferrer"&gt;resumable-stream package&lt;/a&gt;, a POST handler that creates resumable streams using &lt;code&gt;consumeSseStream&lt;/code&gt;, a GET handler at &lt;code&gt;/api/chat/[id]/stream&lt;/code&gt; that resumes them with &lt;code&gt;resumeExistingStream&lt;/code&gt;, and a dedicated stop endpoint.&lt;/p&gt;
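&lt;p&gt;Conceptually, what Redis and &lt;code&gt;resumable-stream&lt;/code&gt; provide is a buffer of generated chunks that outlives any single HTTP connection, so a reconnecting client can ask for everything after the last chunk it rendered. Here is a minimal in-memory sketch of that idea. &lt;code&gt;StreamBuffer&lt;/code&gt; and &lt;code&gt;resumeFrom&lt;/code&gt; are hypothetical names, and unlike Redis this buffer lives in one process, so it would not survive a serverless function ending.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-in for a resumable stream store such as Redis.
class StreamBuffer {
  private chunks: { [streamId: string]: string[] } = {};
  private done: { [streamId: string]: boolean } = {};

  // The generator appends every chunk here, whether or not a client is connected.
  append(streamId: string, chunk: string): void {
    const list = this.chunks[streamId] || [];
    list.push(chunk);
    this.chunks[streamId] = list;
  }

  // Mark generation as finished so resuming clients know no more chunks are coming.
  finish(streamId: string): void {
    this.done[streamId] = true;
  }

  // A reconnecting client passes how many chunks it already rendered.
  resumeFrom(streamId: string, seen: number): { chunks: string[]; done: boolean } {
    const all = this.chunks[streamId] || [];
    return { chunks: all.slice(seen), done: this.done[streamId] === true };
  }
}
```

&lt;p&gt;Because the writer appends regardless of who is connected, closing a reader says nothing about the writer, which is the same reason &lt;code&gt;stop()&lt;/code&gt; alone cannot halt generation in this setup.&lt;/p&gt;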

&lt;p&gt;&lt;code&gt;stop()&lt;/code&gt; and resumable streams are also architecturally incompatible. The docs state it directly: "In a resumable stream setup, client-side aborts are treated as disconnects. Closing a tab, refreshing the page, or calling &lt;code&gt;stop()&lt;/code&gt; only closes the current HTTP connection and should not cancel the underlying generation." Adding a working stop button requires a separate server-side endpoint to cancel the underlying work and clear the active stream record.&lt;/p&gt;

&lt;p&gt;Tab switches and mobile backgrounding are a further gap the &lt;code&gt;resumable-stream&lt;/code&gt; pattern doesn't cover in the same way as a page reload. The Ably guide on &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-resumable-stream-what-it-covers-and-what-it-doesnt" rel="noopener noreferrer"&gt;Vercel AI SDK resumable streams&lt;/a&gt; covers the distinction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The single-response assumption breaks multi-user sessions.&lt;/strong&gt; Vercel designed &lt;code&gt;useChat&lt;/code&gt; around one user sending one message and receiving one response. It tracks one &lt;code&gt;activeResponse&lt;/code&gt; at a time. If a second user joins, or an observer device needs the same response lifecycle, the only available mechanism is &lt;code&gt;setMessages&lt;/code&gt;. This bypasses lifecycle hooks, tool-call notifications, and &lt;code&gt;onFinish&lt;/code&gt; callbacks entirely. It works, but it's a workaround. &lt;a href="https://ably.com/blog/custom-transport-vercel-ai-sdk" rel="noopener noreferrer"&gt;Zak Knill's post on building the Ably transport&lt;/a&gt; covers the implementation detail.&lt;/p&gt;

&lt;p&gt;Each of the four limits above has the same root cause but surfaces differently. The table below maps them to their production cost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;th&gt;Production cost&lt;/th&gt;
&lt;th&gt;Configurable in &lt;code&gt;DefaultChatTransport&lt;/code&gt;?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cancellation&lt;/td&gt;
&lt;td&gt;Server can't distinguish stop from disconnect&lt;/td&gt;
&lt;td&gt;Orphaned generations; ongoing billing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-device&lt;/td&gt;
&lt;td&gt;SSE delivers to one client only&lt;/td&gt;
&lt;td&gt;Silent failure on second device&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stream resumption&lt;/td&gt;
&lt;td&gt;Requires Redis, two endpoints, stop handler&lt;/td&gt;
&lt;td&gt;Significant custom infrastructure&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-response assumption&lt;/td&gt;
&lt;td&gt;&lt;code&gt;setMessages&lt;/code&gt; bypasses lifecycle hooks&lt;/td&gt;
&lt;td&gt;Missed tool-call notifications and &lt;code&gt;onFinish&lt;/code&gt; callbacks&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How a WebSocket-based transport layer creates a durable session between agent and client
&lt;/h2&gt;

&lt;p&gt;Swapping &lt;code&gt;DefaultChatTransport&lt;/code&gt; for a WebSocket-based transport layer replaces a stateless HTTP connection with a durable session between your agent and your users. That session persists beyond any single connection and addresses all four limits directly. It also removes the custom infrastructure those limits force you to build. &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-chattransport-implementing-a-custom-websocket-transport" rel="noopener noreferrer"&gt;The Ably topic page on implementing a custom &lt;code&gt;ChatTransport&lt;/code&gt;&lt;/a&gt; covers the full capability surface. This section covers what disappears from your backlog.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With a WebSocket-based transport layer, you no longer need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Redis buffer for resumable streams&lt;/li&gt;
&lt;li&gt;The stop endpoint with race condition protection&lt;/li&gt;
&lt;li&gt;The fan-out layer for multi-device delivery&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;setMessages&lt;/code&gt; workaround for multi-user sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxm1wdxtr528t2jwjwm65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxm1wdxtr528t2jwjwm65.png" alt="How a durable session works: session decoupled from connection, showing cancel signal and reconnect from position" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mechanism that makes this possible is straightforward. A session is decoupled from the connection. The session persists independently; a connection is how a client subscribes to it. When a client disconnects and reconnects, it presents its last position to the session and receives only the messages it missed. A cancel signal is sent explicitly on the session: the server reads it as intent, not as a connection close event it has to interpret.&lt;/p&gt;
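&lt;p&gt;Here is a minimal in-memory sketch of that model, assuming a single process. The session keeps an ordered log. A client reconnects by passing the serial of the last event it saw. Cancellation is a typed event rather than a side effect of a socket closing. &lt;code&gt;DurableSession&lt;/code&gt;, &lt;code&gt;SessionEvent&lt;/code&gt;, and the &lt;code&gt;serial&lt;/code&gt; field are hypothetical names for illustration, not Ably's API.&lt;/p&gt;

```typescript
// Hypothetical event shapes carried on a session.
type SessionEvent =
  | { serial: number; type: "token"; text: string }
  | { serial: number; type: "cancel" };

type Subscriber = (event: SessionEvent) => void;

class DurableSession {
  private log: SessionEvent[] = [];
  private subscribers: Subscriber[] = [];
  cancelled = false;

  // Record the event, then deliver it to whoever is connected right now.
  private append(event: SessionEvent): void {
    this.log.push(event);
    for (const subscriber of this.subscribers) subscriber(event);
  }

  publishToken(text: string): void {
    this.append({ serial: this.log.length, type: "token", text });
  }

  // An explicit cancel is stored as intent; the agent loop checks the flag.
  cancel(): void {
    this.cancelled = true;
    this.append({ serial: this.log.length, type: "cancel" });
  }

  // Connect, or reconnect from the last serial seen: missed events are
  // replayed in order before live delivery continues.
  connect(subscriber: Subscriber, lastSeen = -1): () => void {
    for (const event of this.log.slice(lastSeen + 1)) subscriber(event);
    this.subscribers.push(subscriber);
    // Disconnecting only detaches this subscriber; it never cancels generation.
    return () => {
      this.subscribers = this.subscribers.filter((s) => s !== subscriber);
    };
  }
}

const durable = new DurableSession();
const seen: string[] = [];
let lastSeen = -1;
const onEvent = (event: SessionEvent) => {
  lastSeen = event.serial;
  if (event.type === "token") seen.push(event.text);
};

let disconnect = durable.connect(onEvent);
durable.publishToken("The ");
disconnect(); // network drop: the agent keeps publishing
durable.publishToken("answer ");
durable.publishToken("is 42.");
disconnect = durable.connect(onEvent, lastSeen); // reconnect from position
durable.cancel(); // user pressed stop: a typed signal, not a closed socket
console.log(seen.join(""), durable.cancelled); // The answer is 42. true
```

&lt;p&gt;After the reconnect, &lt;code&gt;seen&lt;/code&gt; holds exactly what an uninterrupted client would have received. The cancel arrives as an event the agent can act on, not something it has to infer from a dropped socket.&lt;/p&gt;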

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport/framework-guides/vercel-ai-sdk" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; is built as the session layer for production AI applications: the infrastructure between your agent and your users that handles the delivery concerns that &lt;code&gt;DefaultChatTransport&lt;/code&gt; can't. It plugs into &lt;code&gt;useChat&lt;/code&gt; as a &lt;code&gt;ChatTransport&lt;/code&gt; implementation via a single configuration change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: default HTTP transport&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;stop&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useChat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DefaultChatTransport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// After: Ably AI Transport (backed by an Ably session)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;chatTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useChatTransport&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// from &amp;lt;ChatTransportProvider&amp;gt;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useChat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chatTransport&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice: &lt;code&gt;stop()&lt;/code&gt; sends a typed signal the server can act on, instead of a connection close event that it has to guess at. Any device subscribed to the same session receives the stream, so a user switching from laptop to phone doesn't lose the conversation. If the connection drops mid-generation, the client reconnects and catches up from where it left off, because the session persists independently of any single connection.&lt;/p&gt;

&lt;p&gt;What stays unchanged: your agent, tool calls, message persistence logic, and UI rendering. The swap is the &lt;code&gt;transport&lt;/code&gt; option in &lt;code&gt;useChat&lt;/code&gt;. Everything built on top of it carries over.&lt;/p&gt;

&lt;p&gt;For the implementation detail on own-turns, observer-turns, and &lt;code&gt;setMessages&lt;/code&gt; handling, &lt;a href="https://ably.com/blog/custom-transport-vercel-ai-sdk" rel="noopener noreferrer"&gt;see Zak Knill's post&lt;/a&gt;. For how transport options compare more broadly, see &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-realtime-transport" rel="noopener noreferrer"&gt;the durable sessions guide for Vercel AI SDK applications&lt;/a&gt;. The four questions in the next section will help you work out whether you're at that decision point yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  When DefaultChatTransport is still the right choice
&lt;/h2&gt;

&lt;p&gt;The four limits above are real, but they only become blockers if you need cancellation that reaches the server, multi-device delivery, stream resumption beyond page reloads, or more than one user in the same conversation. For many applications, &lt;code&gt;DefaultChatTransport&lt;/code&gt; remains the right starting point.&lt;/p&gt;

&lt;p&gt;A practical way to assess your own situation is to work through four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Do you need &lt;code&gt;stop()&lt;/code&gt; to reliably cancel the model call on the server, not just stop the UI from updating?&lt;/li&gt;
&lt;li&gt;Do users access the same session from more than one device or tab?&lt;/li&gt;
&lt;li&gt;Do you need stream resumption across tab switches or mobile backgrounding, not just full page reloads?&lt;/li&gt;
&lt;li&gt;Does more than one user participate in the same conversation?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the answer to all four is no, &lt;code&gt;DefaultChatTransport&lt;/code&gt; is a defensible choice. If any answer is yes, the relevant section above describes the specific limit you'll encounter. The right time to replace the transport is when those limits start costing you.&lt;/p&gt;

&lt;p&gt;Once one of those limits starts costing you, &lt;code&gt;DefaultChatTransport&lt;/code&gt; no longer fits your use case. The transport layer is the right place to fix that, and replacing it leaves the rest of your application unchanged.&lt;/p&gt;

&lt;p&gt;The next step is understanding the &lt;code&gt;ChatTransport&lt;/code&gt; interface: what &lt;code&gt;sendMessages&lt;/code&gt; and &lt;code&gt;reconnectToStream&lt;/code&gt; require, and what to look for in an implementation. The &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-chattransport-implementing-a-custom-websocket-transport" rel="noopener noreferrer"&gt;Ably ChatTransport topic page&lt;/a&gt; covers that in full. To get started with Ably AI Transport directly, &lt;a href="https://ably.com/docs/ai-transport/framework-guides/vercel-ai-sdk" rel="noopener noreferrer"&gt;the Vercel AI SDK integration guide&lt;/a&gt; is the right starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does the Vercel AI SDK support multi-device AI chat out of the box?
&lt;/h3&gt;

&lt;p&gt;Not with &lt;code&gt;DefaultChatTransport&lt;/code&gt;. SSE is scoped to a single HTTP connection, so a second device has no way to join a stream already in progress. Multi-device delivery requires a transport where the session exists independently of the connection, so any subscribed client receives it. The Ably guide on &lt;a href="https://ably.com/topic/ai-stack/why-vercel-ai-sdk-cant-stream-to-multiple-devices" rel="noopener noreferrer"&gt;why Vercel AI SDK can't stream to multiple devices&lt;/a&gt; provides the full picture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why doesn't stop() cancel server-side generation in Vercel AI SDK?
&lt;/h3&gt;

&lt;p&gt;Because &lt;code&gt;DefaultChatTransport&lt;/code&gt; has no signal path back to the server. When &lt;code&gt;stop()&lt;/code&gt; closes the HTTP connection, the server receives a TCP close it can't distinguish from a network drop, so generation continues and billing runs to completion. With a WebSocket-based transport layer, &lt;code&gt;stop()&lt;/code&gt; sends a typed cancel message on the session; the server reads it as intent, not inference. The Ably guide on &lt;a href="https://ably.com/topic/ai-stack/why-vercel-ai-sdk-stop-doesnt-cancel-the-stream" rel="noopener noreferrer"&gt;why stop() doesn't cancel the stream&lt;/a&gt; covers the full mechanism.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much infrastructure does Vercel AI SDK stream resumption require?
&lt;/h3&gt;

&lt;p&gt;The official pattern requires a Redis instance, the &lt;a href="https://www.npmjs.com/package/resumable-stream" rel="noopener noreferrer"&gt;&lt;code&gt;resumable-stream&lt;/code&gt; package&lt;/a&gt;, a POST handler with &lt;code&gt;consumeSseStream&lt;/code&gt;, a GET handler at &lt;code&gt;/api/chat/[id]/stream&lt;/code&gt;, and a dedicated stop endpoint with race condition handling. &lt;code&gt;stop()&lt;/code&gt; and resumable streams are also architecturally incompatible: in a resumable stream setup, a client abort is treated as a disconnect, not a cancel. See &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-resumable-stream-what-it-covers-and-what-it-doesnt" rel="noopener noreferrer"&gt;the Ably guide to Vercel AI SDK resumable streams&lt;/a&gt; for the full breakdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I replace DefaultChatTransport?
&lt;/h3&gt;

&lt;p&gt;When the limits start affecting your production application. The four-question self-audit in the "When DefaultChatTransport is still the right choice" section gives a practical framework. In short: if you need &lt;code&gt;stop()&lt;/code&gt; to reliably cancel server-side generation, multi-device delivery, stream resumption beyond page reloads, or multi-user sessions, the default transport can't provide those. The &lt;a href="https://ably.com/topic/ai-stack/vercel-ai-sdk-realtime-transport" rel="noopener noreferrer"&gt;Ably durable sessions guide for Vercel AI SDK&lt;/a&gt; covers the transport options available once you've decided to move on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why replace DefaultChatTransport with a WebSocket-based transport layer?
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;DefaultChatTransport&lt;/code&gt;'s design scope no longer fits your production requirements. If you're hitting unconfirmed cancellations, single-device delivery, Redis-dependent stream resumption, or the &lt;code&gt;setMessages&lt;/code&gt; workaround for multi-user sessions, those are properties of HTTP/SSE that a WebSocket-based transport layer resolves at the transport level. Your agent, tool calls, and UI code don't change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel AI SDK custom transport vs default transport: what actually changes?
&lt;/h3&gt;

&lt;p&gt;The delivery mechanism only. Your agent, tool calls, message persistence, and UI rendering stay the same. The swap is the &lt;code&gt;transport&lt;/code&gt; option in &lt;code&gt;useChat&lt;/code&gt;, one configuration change. For a full before/after and getting started guide, see &lt;a href="https://ably.com/docs/ai-transport/framework-guides/vercel-ai-sdk" rel="noopener noreferrer"&gt;the Ably AI Transport Vercel integration guide&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What transport issues have you hit building on the Vercel AI SDK? Would be interested to hear which of the four comes up most in practice.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vercel</category>
      <category>typescript</category>
      <category>websockets</category>
    </item>
    <item>
      <title>Why AWS ALB and Cloudflare silently kill your AI agent sessions</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Mon, 15 Jun 2026 11:51:00 +0000</pubDate>
      <link>https://dev.to/ably/transport-reconnection-session-recovery-508b</link>
      <guid>https://dev.to/ably/transport-reconnection-session-recovery-508b</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; WebSocket reconnection restores the transport. It doesn't restore the session. Tokens generated during the gap, tool call results that arrived while the client was offline, and the agent's position in the ongoing generation are all lost unless you have a session layer. This post covers the timeout sources that hit agentic applications specifically, why SSE is a bad fit for bidirectional agent communication, and how session recovery works in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI agents get disconnected in ways standard apps don't
&lt;/h2&gt;

&lt;p&gt;WebSocket reconnection has always been worth solving. What makes AI agents different is &lt;em&gt;when&lt;/em&gt; the disconnect happens.&lt;/p&gt;

&lt;p&gt;A standard chat interface goes quiet between user interactions — when there's genuinely nothing happening on the wire. An agent goes quiet mid-execution: during tool call waits, between reasoning steps, while the LLM is generating a response. That silence is the agent doing its most intensive work. To every load balancer and proxy in the path, it looks idle.&lt;/p&gt;

&lt;p&gt;AWS Application Load Balancer &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-group-attributes" rel="noopener noreferrer"&gt;defaults to closing connections after 60 seconds&lt;/a&gt; of inactivity. Cloudflare &lt;a href="https://developers.cloudflare.com/fundamentals/reference/connection-limits/" rel="noopener noreferrer"&gt;enforces a 100-second idle timeout&lt;/a&gt; on Free and Pro plans — fixed, cannot be raised. Corporate proxies and enterprise gateways add their own thresholds that you often can't inspect or configure.&lt;/p&gt;

&lt;p&gt;Plenty of production WebSocket applications have shipped without explicitly thinking about this. The reason: traditional server-side workloads tend to emit a trickle of traffic on their own — progress events, periodic state updates — which keeps the connection alive as a side effect. The timeouts stay invisible because something is always crossing the wire.&lt;/p&gt;

&lt;p&gt;Agentic applications don't have that property. A customer support agent goes quiet mid-answer while the user is typing a correction. A coding agent waits for the user to approve a tool call before continuing. A research agent sits in silence for 90 seconds while a downstream API responds. None of that is idleness from the agent's perspective. To the ALB, all three look like the same idle connection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why SSE doesn't fit
&lt;/h2&gt;

&lt;p&gt;If you're reaching for SSE as an alternative, it won't solve the session problem — and it introduces a new one.&lt;/p&gt;

&lt;p&gt;The applications this post is about — customer support agents, coding agents, research agents the user steers mid-task — require the client to send messages back to the agent on the same session while it's in flight. A user correcting an assumption, approving a tool call, or cancelling mid-implementation needs a channel in both directions.&lt;/p&gt;

&lt;p&gt;SSE streams server-to-client only. That rules it out at the transport level regardless of how well you've solved the replay problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix for idle timeouts: server-side ping frames
&lt;/h2&gt;

&lt;p&gt;The WebSocket spec includes a mechanism designed exactly for this: server-side &lt;a href="https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.2" rel="noopener noreferrer"&gt;ping frames&lt;/a&gt;. The server sends a ping at a fixed interval; the browser responds automatically with a pong; both frames count as activity and reset every idle timer on the path.&lt;/p&gt;

&lt;p&gt;The interval needs to sit comfortably below the shortest timeout on the path. A &lt;strong&gt;50-second interval&lt;/strong&gt; covers both the AWS ALB 60-second default and Cloudflare's 100-second limit simultaneously. Browsers respond to ping frames automatically — no client-side code required.&lt;/p&gt;
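&lt;p&gt;Both parts of that advice are small in code. The sketch below derives the interval from the timeouts on the path: the shortest timeout minus a safety margin. With the ALB and Cloudflare values and a 10-second margin, that gives 50 seconds. It then pings through a minimal &lt;code&gt;PingableSocket&lt;/code&gt; interface. The interface, &lt;code&gt;pingIntervalSeconds&lt;/code&gt;, and &lt;code&gt;startHeartbeat&lt;/code&gt; are hypothetical. The interface is assumed to match the &lt;code&gt;ping()&lt;/code&gt; method on the Node &lt;code&gt;ws&lt;/code&gt; package's WebSocket, and the timer parameters exist only so the loop can be tested with fakes.&lt;/p&gt;

```typescript
// Minimal surface the heartbeat needs. Assumption: the WebSocket class in the
// Node "ws" package has a compatible ping() method.
interface PingableSocket {
  ping(): void;
}

// Pick an interval that sits below the shortest idle timeout on the path.
function pingIntervalSeconds(timeoutsSeconds: number[], marginSeconds = 10): number {
  const interval = Math.min(...timeoutsSeconds) - marginSeconds;
  if (interval > 0) return interval;
  throw new Error("margin must be smaller than the shortest timeout");
}

// Timer hooks default to the real timers so tests can substitute fakes.
// The handle is opaque: a Timeout object in Node, a number in browsers.
type Schedule = (fn: () => void, ms: number) => any;
type Cancel = (handle: any) => void;

// Ping every intervalMs; returns a function that stops the loop.
function startHeartbeat(
  socket: PingableSocket,
  intervalMs: number,
  schedule: Schedule = setInterval,
  cancel: Cancel = clearInterval,
): () => void {
  const handle = schedule(() => socket.ping(), intervalMs);
  return () => cancel(handle);
}

const intervalSeconds = pingIntervalSeconds([60, 100]); // ALB default, Cloudflare Free/Pro
console.log(intervalSeconds); // 50

// Wiring with the "ws" package would look roughly like:
//   wss.on("connection", (ws) => {
//     const stopPinging = startHeartbeat(ws, intervalSeconds * 1000);
//     ws.on("close", stopPinging);
//   });
```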

&lt;h2&gt;
  
  
  Common idle timeouts to plan around
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Infrastructure&lt;/th&gt;
&lt;th&gt;Default timeout&lt;/th&gt;
&lt;th&gt;Configurable?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS Application Load Balancer&lt;/td&gt;
&lt;td&gt;60 seconds&lt;/td&gt;
&lt;td&gt;Yes — &lt;code&gt;idle_timeout.timeout_seconds&lt;/code&gt;, up to 4,000s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare (Free/Pro)&lt;/td&gt;
&lt;td&gt;100 seconds&lt;/td&gt;
&lt;td&gt;No — fixed. Enterprise customers can request custom values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Corporate proxies, gateways&lt;/td&gt;
&lt;td&gt;Varies — often invisible&lt;/td&gt;
&lt;td&gt;Depends on deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For ALB, raise the limit if your workload genuinely needs a longer window. The &lt;code&gt;idle_timeout.timeout_seconds&lt;/code&gt; attribute is adjustable in the load balancer configuration and takes effect immediately without a redeployment.&lt;/p&gt;

&lt;p&gt;For Cloudflare Free and Pro plans, you can't raise the limit. The server-side ping approach at 50 seconds is the only viable mitigation.&lt;/p&gt;

&lt;p&gt;If connections die at exactly 100 seconds in production, check &lt;code&gt;EdgeStartTimestamp&lt;/code&gt; and &lt;code&gt;EdgeStopTimestamp&lt;/code&gt; in Cloudflare's HTTP request logs to confirm the source before debugging elsewhere.&lt;/p&gt;
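&lt;p&gt;One way to run that check over exported logs is to compute each request's duration and flag the ones that end at the 100-second mark. The sketch assumes the records are already parsed into objects with &lt;code&gt;RayID&lt;/code&gt;, &lt;code&gt;EdgeStartTimestamp&lt;/code&gt;, and &lt;code&gt;EdgeStopTimestamp&lt;/code&gt; fields, with RFC 3339 timestamps. Logpush can emit timestamps in other formats, such as Unix nanoseconds, so adjust the parsing to match your job. &lt;code&gt;RequestLog&lt;/code&gt; and &lt;code&gt;flagIdleTimeouts&lt;/code&gt; are hypothetical names, and the sample records are made up for illustration.&lt;/p&gt;

```typescript
// Assumed shape of a parsed Cloudflare HTTP request log record, with the
// timestamps already in RFC 3339 form (adjust parsing to your log format).
interface RequestLog {
  RayID: string;
  EdgeStartTimestamp: string;
  EdgeStopTimestamp: string;
}

// Return the Ray IDs of requests that ended within toleranceSeconds of the
// idle timeout, the signature of a connection closed at the limit.
function flagIdleTimeouts(
  logs: RequestLog[],
  timeoutSeconds = 100,
  toleranceSeconds = 1,
): string[] {
  return logs
    .filter((log) => {
      const durationMs =
        Date.parse(log.EdgeStopTimestamp) - Date.parse(log.EdgeStartTimestamp);
      return toleranceSeconds * 1000 >= Math.abs(durationMs - timeoutSeconds * 1000);
    })
    .map((log) => log.RayID);
}

// Hypothetical sample records for illustration.
const sample: RequestLog[] = [
  { RayID: "a1", EdgeStartTimestamp: "2026-06-15T10:00:00Z", EdgeStopTimestamp: "2026-06-15T10:01:40Z" },
  { RayID: "b2", EdgeStartTimestamp: "2026-06-15T10:00:00Z", EdgeStopTimestamp: "2026-06-15T10:00:42Z" },
];
console.log(flagIdleTimeouts(sample).join(", ")); // a1
```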

&lt;h3&gt;
  
  
  Other connection challenges to consider
&lt;/h3&gt;

&lt;p&gt;Not all disconnects come from idle timeouts. Two other patterns hit agentic applications in production:&lt;/p&gt;

&lt;h4&gt;
  
  
  Corporate VPN and enterprise proxy traversal
&lt;/h4&gt;

&lt;p&gt;Many enterprise networks don't forward the HTTP Upgrade header required to open a WebSocket connection. In that case the connection never opens at all, rather than dropping mid-session. The failure appears at the WebSocket handshake stage, typically as a non-101 HTTP response, not as a silent close after inactivity.&lt;/p&gt;

&lt;p&gt;The fix is protocol fallback: when a proxy blocks the WebSocket upgrade, the transport degrades automatically to HTTP streaming or long-polling without per-deployment configuration.&lt;/p&gt;
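&lt;p&gt;In outline, fallback means trying transports in order of preference and keeping the first one that opens. The sketch below assumes each transport is wrapped in a hypothetical &lt;code&gt;TransportCandidate&lt;/code&gt; whose &lt;code&gt;connect()&lt;/code&gt; returns a promise that rejects when the handshake fails. Clients with built-in fallback, such as Ably's SDKs, do this internally. A production version would also need per-attempt timeouts.&lt;/p&gt;

```typescript
// Hypothetical wrapper around one way of reaching the server. connect() is
// expected to return a promise that rejects if the transport can't open.
interface TransportCandidate {
  name: string;
  connect: () => unknown;
}

// Try each transport in order; return the first that connects, plus the
// reasons earlier ones failed (useful when diagnosing proxy problems).
async function connectWithFallback(candidates: TransportCandidate[]) {
  const failures: string[] = [];
  for (const candidate of candidates) {
    try {
      await candidate.connect();
      return { transport: candidate.name, failures };
    } catch (err) {
      failures.push(`${candidate.name}: ${String(err)}`);
    }
  }
  throw new Error(`all transports failed (${failures.join("; ")})`);
}

// Simulated environment: a proxy strips the Upgrade header, so the
// WebSocket handshake fails but plain HTTP works.
const candidates: TransportCandidate[] = [
  {
    name: "websocket",
    connect: async () => {
      throw new Error("handshake returned HTTP 403, expected 101");
    },
  },
  { name: "http-streaming", connect: async () => {} },
  { name: "long-polling", connect: async () => {} },
];

connectWithFallback(candidates).then((result) => {
  console.log(result.transport, result.failures.length); // http-streaming 1
});
```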

&lt;h4&gt;
  
  
  Mobile network handoffs
&lt;/h4&gt;

&lt;p&gt;Switching from WiFi to cellular drops the underlying TCP connection immediately. On mobile, the client's &lt;code&gt;onclose&lt;/code&gt; event often doesn't fire — the OS terminates the connection without a clean close frame. On iOS specifically, background TCP connections are suspended within seconds of the app moving to the background, again without notification.&lt;/p&gt;

&lt;p&gt;Don't rely on &lt;code&gt;onclose&lt;/code&gt; to trigger reconnection for mobile users. Use failed-send detection and an application-level heartbeat timeout to catch silent closes.&lt;/p&gt;
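&lt;p&gt;An application-level heartbeat timeout can be as simple as recording when anything last arrived and treating prolonged silence as a dead connection. The sketch below assumes the server sends a heartbeat message at a known interval. Browsers don't expose WebSocket ping and pong frames to JavaScript, so the check has to run on ordinary messages. &lt;code&gt;ConnectionWatchdog&lt;/code&gt; is a hypothetical name and the 75-second timeout is an illustrative value. The clock is injected so the logic can be tested without waiting.&lt;/p&gt;

```typescript
// Treats the connection as dead once nothing has arrived for timeoutMs.
class ConnectionWatchdog {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastActivity: number;

  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  // Call on every inbound message, heartbeats included.
  recordActivity(): void {
    this.lastActivity = this.now();
  }

  // Poll this on a timer; true means reconnect without waiting for onclose.
  isStale(): boolean {
    return this.now() - this.lastActivity > this.timeoutMs;
  }
}

// Usage with a fake clock (milliseconds).
let fakeNow = 0;
const watchdog = new ConnectionWatchdog(75000, () => fakeNow);
fakeNow = 60000;
watchdog.recordActivity(); // a heartbeat arrives
fakeNow = 120000;
console.log(watchdog.isStale()); // false: 60 s since the last message
fakeNow = 140000;
console.log(watchdog.isStale()); // true: 80 s of silence, so reconnect
```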




&lt;h2&gt;
  
  
  What transport reconnection recovers — and what it doesn't
&lt;/h2&gt;

&lt;p&gt;Here's where most teams discover the gap. Reconnecting the WebSocket connection restores the transport. It doesn't restore the state of the session that was in flight when the connection dropped.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transport reconnection recovers&lt;/th&gt;
&lt;th&gt;It doesn't recover&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;The WebSocket connection itself&lt;/td&gt;
&lt;td&gt;Tokens generated while disconnected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active session subscriptions&lt;/td&gt;
&lt;td&gt;Tool call results that arrived during the gap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The ability to send and receive new messages&lt;/td&gt;
&lt;td&gt;The agent's reasoning trace if streamed as events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The session ID and session name&lt;/td&gt;
&lt;td&gt;The position in the ongoing generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After a successful reconnect with only transport-layer recovery, the client is back online, but the session is in an indeterminate state. The client holds a partial response from before the disconnect. The agent continued generating on the server side. Neither side knows where the other stopped.&lt;/p&gt;




&lt;h2&gt;
  
  
  How session recovery works
&lt;/h2&gt;

&lt;p&gt;This is where &lt;a href="https://ably.com/solutions/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; comes in. AI Transport acts as the session and delivery layer between your agent and your users.&lt;/p&gt;

&lt;p&gt;The agent publishes every event — each generated token, each tool call, each reasoning step — to a session. AI Transport stores those events and is responsible for delivering them to the client whenever the client is connected. From the agent's side, this is fire-and-forget: it doesn't care whether the client is online, offline, mid-reconnect, or freshly loaded into a new browser tab.&lt;/p&gt;

&lt;p&gt;When a client connects or reconnects, it asks for everything it hasn't already seen. AI Transport returns the missed events, in order, before the live stream resumes. There's no "live vs. history" boundary the application needs to reason about, and no difference in handling for a 30-second drop vs. a 30-minute disconnect vs. a fresh page load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One detail worth understanding:&lt;/strong&gt; the session doesn't store one event per token. Tokens are appended to a single message per agent response — conflation — so the session history contains one accumulated message per response, not thousands of token-sized events. A client reconnecting mid-stream receives the in-progress message in its current accumulated form and resumes streaming from there. A client loading the page fresh receives the same accumulated message as a single coherent block. The application doesn't write reconciliation logic for either case.&lt;/p&gt;
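&lt;p&gt;To make the one-message-per-response idea concrete, here is an in-memory sketch of a history that folds token appends into a single accumulated message per response. It illustrates the concept only. &lt;code&gt;ConflatingHistory&lt;/code&gt; and its methods are hypothetical and are not AI Transport's API, which does this for you.&lt;/p&gt;

```typescript
// One accumulated message per agent response, updated in place as tokens arrive.
interface ResponseMessage {
  responseId: string;
  text: string;
  complete: boolean;
}

class ConflatingHistory {
  private messages: ResponseMessage[] = [];

  private messageFor(responseId: string): ResponseMessage {
    let message = this.messages.find((m) => m.responseId === responseId);
    if (!message) {
      message = { responseId, text: "", complete: false };
      this.messages.push(message);
    }
    return message;
  }

  // Each token is appended to its response's message, not stored as an event.
  appendToken(responseId: string, token: string): void {
    this.messageFor(responseId).text += token;
  }

  finish(responseId: string): void {
    this.messageFor(responseId).complete = true;
  }

  // What a reconnecting or freshly loaded client receives: one entry per
  // response, in order, with an in-progress response in its current state.
  snapshot(): ResponseMessage[] {
    return this.messages.map((m) => ({ ...m }));
  }
}

const chatHistory = new ConflatingHistory();
for (const token of ["Sessions ", "outlive ", "connections."]) {
  chatHistory.appendToken("r1", token);
}
chatHistory.finish("r1");
chatHistory.appendToken("r2", "Still streami"); // response r2 is mid-generation
const snap = chatHistory.snapshot();
console.log(snap.length, snap[0].text, snap[1].complete); // 2 Sessions outlive connections. false
```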

&lt;p&gt;For more on how this works in practice, see AI Transport's &lt;a href="https://ably.com/docs/ai-transport/reconnection" rel="noopener noreferrer"&gt;reconnection and recovery&lt;/a&gt; and &lt;a href="https://ably.com/docs/ai-transport/history" rel="noopener noreferrer"&gt;history and replay&lt;/a&gt; documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the user should see during a disconnect
&lt;/h2&gt;

&lt;p&gt;Session recovery handles the infrastructure layer. But a reconnect that works silently in the background still needs the right UI treatment to avoid looking like a failure.&lt;/p&gt;

&lt;p&gt;AI Transport exposes &lt;a href="https://ably.com/docs/connect/states" rel="noopener noreferrer"&gt;well-defined connection states&lt;/a&gt;. The key distinction: the &lt;strong&gt;disconnected state&lt;/strong&gt; (temporarily offline, retrying automatically) vs. the &lt;strong&gt;suspended state&lt;/strong&gt; (retry window exhausted).&lt;/p&gt;

&lt;p&gt;During disconnection: show a reconnecting indicator, not an error modal. In the suspended state: show a retry button. The session is intact and waiting — communicate that.&lt;/p&gt;
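&lt;p&gt;The mapping from connection state to UI fits in one function. The sketch below uses the state names from Ably's connection state documentation. The &lt;code&gt;ConnectionUi&lt;/code&gt; shape and the indicator copy are assumptions to adapt to your own components. With ably-js you would typically call it from a listener registered with &lt;code&gt;connection.on(...)&lt;/code&gt;, passing the new state.&lt;/p&gt;

```typescript
// Connection state names as listed in Ably's connection state documentation.
type ConnectionState =
  | "initialized"
  | "connecting"
  | "connected"
  | "disconnected"
  | "suspended"
  | "closing"
  | "closed"
  | "failed";

// Hypothetical UI model; adapt to your own components.
interface ConnectionUi {
  indicator: "none" | "reconnecting" | "offline" | "error";
  showRetryButton: boolean;
  message?: string;
}

function uiForState(state: ConnectionState): ConnectionUi {
  switch (state) {
    case "disconnected":
      // Temporarily offline and retrying automatically: reassure, don't alarm.
      return { indicator: "reconnecting", showRetryButton: false, message: "Reconnecting..." };
    case "suspended":
      // Retry window exhausted, but the session is intact and waiting.
      return {
        indicator: "offline",
        showRetryButton: true,
        message: "Connection lost. Your conversation is saved; retry to continue.",
      };
    case "failed":
      return { indicator: "error", showRetryButton: true, message: "Unable to connect." };
    default:
      // initialized, connecting, connected, closing, closed: no banner.
      return { indicator: "none", showRetryButton: false };
  }
}

console.log(uiForState("disconnected").indicator); // reconnecting
console.log(uiForState("suspended").showRetryButton); // true
```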




&lt;h2&gt;
  
  
  What building this yourself actually costs
&lt;/h2&gt;

&lt;p&gt;Building session recovery without a purpose-built layer means writing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A heartbeat loop&lt;/li&gt;
&lt;li&gt;A reconnection manager&lt;/li&gt;
&lt;li&gt;Manual state reconstruction logic&lt;/li&gt;
&lt;li&gt;A connection state component to surface each phase to the user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these is large in isolation. Together, they constitute infrastructure. And any infrastructure your team owns is infrastructure your team spends time and resources maintaining as requirements change.&lt;/p&gt;
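&lt;p&gt;For a sense of scale, the core of a reconnection manager is a backoff schedule. The sketch below uses capped exponential backoff with full jitter. This is a common general-purpose technique, not necessarily what any particular SDK does. &lt;code&gt;reconnectDelayMs&lt;/code&gt; is a hypothetical helper, its base and cap values are illustrative, and &lt;code&gt;random&lt;/code&gt; is injectable for testing. Detecting the drop, reopening the socket, and reconstructing session state are the parts that grow around it.&lt;/p&gt;

```typescript
// Capped exponential backoff with "full jitter": each delay is a random value
// from 0 up to min(capMs, baseMs * 2^attempt). Defaults are illustrative.
function reconnectDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With random() pinned to 0.5, the delays double until the cap takes over.
const delays = [0, 1, 2, 3, 4, 5, 6].map((attempt) =>
  reconnectDelayMs(attempt, 1000, 30000, () => 0.5),
);
console.log(delays.join(", ")); // 500, 1000, 2000, 4000, 8000, 15000, 15000
```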

&lt;p&gt;Ably AI Transport provides the session recovery layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic connection recovery within the two-minute window&lt;/li&gt;
&lt;li&gt;History compaction and replay so clients always receive clean, accumulated state on reconnect&lt;/li&gt;
&lt;li&gt;Protocol fallback from WebSocket to HTTP streaming to long-polling&lt;/li&gt;
&lt;li&gt;Bidirectional signaling on the same session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What remains in your application code is the connection state UI — surfacing the reconnecting and suspended states to the user — and that's a handful of lines rather than a system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I stop AI chat sessions from timing out?
&lt;/h3&gt;

&lt;p&gt;Configure your WebSocket server to send ping frames at a fixed interval below the shortest timeout on your path. A 50-second interval sits comfortably below both the AWS ALB 60-second default and Cloudflare's fixed 100-second limit on Free and Pro plans, with browsers responding automatically — no client-side code required. If your workload needs a longer window, raise the &lt;code&gt;idle_timeout.timeout_seconds&lt;/code&gt; attribute in your ALB configuration; it's adjustable up to 4,000 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens if a user disconnects during LLM streaming?
&lt;/h3&gt;

&lt;p&gt;With AI Transport, the session resumes automatically upon reconnect, with missed tokens delivered in order before new ones arrive, and no application code needed. For longer disconnects, AI Transport's &lt;a href="https://ably.com/docs/ai-transport/history" rel="noopener noreferrer"&gt;history and replay feature&lt;/a&gt; loads the full conversation from the session history. Without a session layer, tokens generated during the dropout are lost, and the agent can't resume from the point of interruption.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I avoid duplicate AI messages after a WebSocket reconnect?
&lt;/h3&gt;

&lt;p&gt;With AI Transport you don't need to — the SDK handles this through history compaction. Tokens are streamed as appends to a single message per agent response, and the session history stores one message per response rather than one per token. When a client reconnects or refreshes, it receives the single accumulated message rather than individual tokens to reconstruct.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?
&lt;/h3&gt;

&lt;p&gt;The AWS Application Load Balancer idle timeout defaults to 60 seconds and applies to all connection types, including WebSocket. Raise it by updating the &lt;code&gt;idle_timeout.timeout_seconds&lt;/code&gt; load balancer attribute. The valid range is one to 4,000 seconds; most AI agent workloads are well served by a value between 3,600 and 4,000 seconds. The change takes effect immediately without requiring a redeployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Cloudflare close WebSocket connections? What is the timeout?
&lt;/h3&gt;

&lt;p&gt;Yes. Cloudflare enforces a 100-second idle timeout on WebSocket connections for Free and Pro customers. The limit is fixed on those plans and can't be raised. Enterprise customers can configure a custom value through their account team. To keep connections alive on Free and Pro plans, configure your WebSocket server to send ping frames every 50 seconds. Browsers respond automatically with pong frames, which reset Cloudflare's idle timer and the 60-second AWS ALB default simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can WebSockets work behind a corporate VPN or enterprise proxy?
&lt;/h3&gt;

&lt;p&gt;They can, but many enterprise proxies don't forward the HTTP Upgrade header required to open a WebSocket connection. When that happens, the connection fails at the handshake stage, before any data flows, rather than dropping mid-session after a period of inactivity the way a timeout does. Protocol fallback to HTTP streaming or long-polling handles this kind of proxy blocking at the infrastructure layer, without per-deployment configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does Ably retain channel history for session recovery?
&lt;/h3&gt;

&lt;p&gt;AI Transport replays missed messages automatically on reconnect, with no application code needed. For longer disconnects, the full conversation loads from session history, which persists for 24 to 72 hours depending on your Ably plan, with extended retention available on higher tiers.&lt;/p&gt;




&lt;p&gt;What's your experience here? Have you run into session state loss specifically, or mostly fought the transport reconnection side of the problem? I'm interested in what patterns teams are using to handle the UI side of a mid-stream disconnect.&lt;/p&gt;

</description>
      <category>websockets</category>
      <category>ai</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why your AI chat reconnects but your session doesn't</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Wed, 27 May 2026 10:05:10 +0000</pubDate>
      <link>https://dev.to/ably/why-your-ai-chat-reconnects-but-your-session-doesnt-36jg</link>
      <guid>https://dev.to/ably/why-your-ai-chat-reconnects-but-your-session-doesnt-36jg</guid>
      <description>&lt;p&gt;TL;DR: WebSockets are the right protocol for production AI chat, but the connection is stateless at the session level. When it drops (AWS ALB closes idle connections after 60 seconds by default, Cloudflare after 100 seconds on Free and Pro plans), all in-flight tokens, tool call results, and agent context disappear with it. Reconnection logic restores the socket. It doesn't restore the session. That's the gap this post covers.&lt;/p&gt;

&lt;p&gt;WebSockets are the right protocol for production AI chat. But that fact doesn't prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context.&lt;/p&gt;

&lt;p&gt;The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. Each breaks for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn't change that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn't.&lt;/li&gt;
&lt;li&gt;A WebSocket connection is stateless at the session level. When it closes through a proxy timeout, page reload, or device switch, all state disappears with it.&lt;/li&gt;
&lt;li&gt;Reconnection logic re-establishes the transport. It does not recover the tokens, tool calls, or agent context in flight when the connection is dropped.&lt;/li&gt;
&lt;li&gt;What fills the gap is a session layer: infrastructure that persists conversation state against a session ID and replays it to reconnecting clients.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What WebSockets get right for AI chat
&lt;/h2&gt;

&lt;p&gt;The protocol question is worth settling early, because the rest of this piece argues about the infrastructure layer above it. For production AI chat, the choice is WebSockets or SSE. Both stream tokens to the client, but only WebSockets let signals flow the other way.&lt;/p&gt;

&lt;p&gt;WebSockets are bidirectional. When your user cancels mid-stream, that signal travels back on the same channel; tool call confirmations and workflow approvals work the same way. When a workflow pauses for human input mid-execution, that input must arrive in-band, not via a polling endpoint.&lt;/p&gt;
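&lt;p&gt;A sketch of what signals flowing the other way can look like on the agent side, assuming hypothetical message types (&lt;code&gt;cancel&lt;/code&gt;, &lt;code&gt;approve&lt;/code&gt;, &lt;code&gt;input&lt;/code&gt;) sent from client to agent on the same connection that carries tokens down. The cancel branch aborts the server-side generation through a standard &lt;code&gt;AbortController&lt;/code&gt;, so a stop actually stops the work rather than only the rendering.&lt;/p&gt;

```typescript
// Hypothetical client-to-agent control messages, sent upstream on the
// same bidirectional connection that streams tokens downstream.
type ControlMessage =
  | { type: "cancel"; responseId: string }
  | { type: "approve"; toolCallId: string; approved: boolean }
  | { type: "input"; text: string };

// Tracks in-flight generations so a cancel can abort the right one.
class AgentSession {
  private readonly running: { [responseId: string]: AbortController } = Object.create(null);
  readonly approvals: { [toolCallId: string]: boolean } = Object.create(null);
  readonly inputs: string[] = [];

  // Register a generation; pass the returned signal to the model call.
  startGeneration(responseId: string): AbortSignal {
    const controller = new AbortController();
    this.running[responseId] = controller;
    return controller.signal;
  }

  handle(message: ControlMessage): void {
    switch (message.type) {
      case "cancel": {
        const controller = this.running[message.responseId];
        if (controller) {
          controller.abort(); // stops server-side work, not just rendering
          delete this.running[message.responseId];
        }
        break;
      }
      case "approve":
        // Tool-call confirmation arrives in-band, no polling endpoint.
        this.approvals[message.toolCallId] = message.approved;
        break;
      case "input":
        // Human input for a paused workflow.
        this.inputs.push(message.text);
        break;
    }
  }
}
```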

&lt;p&gt;SSE is a one-way stream. For simple chatbots on stable networks, that doesn't matter. Add tool calls, mid-stream cancellation, or multi-device continuity, and it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where production AI connections actually fail
&lt;/h2&gt;

&lt;p&gt;Not all connection drops come from bad network conditions. The more common causes in production are infrastructure defaults designed for HTTP requests, not AI chat. A response can be mid-generation for tens of seconds, and most defaults weren't built for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Application Load Balancer idle timeout.&lt;/strong&gt; AWS ALB closes connections idle for 60 seconds by default, per the &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html" rel="noopener noreferrer"&gt;AWS Application Load Balancer documentation&lt;/a&gt;. For standard HTTP that's generous. For an agent waiting on a downstream API, 60 seconds of silence is routine, and the connection closes without warning. Your user's response stops mid-sentence with no explanation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare proxy timeout.&lt;/strong&gt; On Cloudflare Free and Pro plans, WebSocket connections terminate after 100 seconds of inactivity, as documented in &lt;a href="https://developers.cloudflare.com/workers/observability/dev-tools/troubleshoot-websockets/" rel="noopener noreferrer"&gt;Cloudflare's WebSocket troubleshooting guide&lt;/a&gt;. Enterprise plans can raise this limit; on Free and Pro plans, the ceiling is fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile network handoffs.&lt;/strong&gt; Switching from WiFi to cellular drops the underlying TCP connection immediately, taking the WebSocket with it. On mobile this happens during normal use: walking between coverage areas, backgrounding the tab, entering a building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page reload and tab crash.&lt;/strong&gt; Your user reloads mid-generation, or the browser crashes, both of which are routine. The connection closes, and any session state tied to it is gone unless something stored it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why reconnection logic doesn't fix the session problem
&lt;/h2&gt;

&lt;p&gt;The standard reconnection pattern re-establishes the socket. Transport recovers in milliseconds. But it cannot restore the state that was in flight when the connection dropped.&lt;/p&gt;
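&lt;p&gt;That standard pattern is usually some form of exponential backoff. A minimal sketch with full jitter follows; the defaults (500 ms base, 30 s cap) are illustrative assumptions, not values from any particular SDK.&lt;/p&gt;

```typescript
// Exponential backoff with "full jitter": wait a random time between 0 and
// min(cap, base * 2^attempt). `random` is injectable for deterministic tests.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Deterministic "random" value for illustration: half of each ceiling.
const half = () => 0.5;
console.log([0, 1, 2, 3, 10].map((n) => reconnectDelayMs(n, 500, 30_000, half)));
// → [ 250, 500, 1000, 2000, 15000 ]
```

&lt;p&gt;Everything this function decides is about when to open the next socket. Nothing in it knows which tokens, tool results, or context were in flight when the old socket closed.&lt;/p&gt;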

&lt;p&gt;&lt;strong&gt;Token stream position.&lt;/strong&gt; The response kept generating while the connection was dark. Those tokens went nowhere. When the client reconnects, it arrives mid-sentence or finds nothing at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call results.&lt;/strong&gt; Some chat responses depend on realtime data: a lookup, a search, or an action your user triggered. If the connection dropped while the agent was waiting for that result, the client never sees it arrive: the response either never reaches the user or ends before it can use the information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent context.&lt;/strong&gt; In a multi-turn exchange, the agent accumulates context: what was asked, what was answered, and what's in progress. When a session drops and reconnects without state recovery, the agent and the client are at different points in the same conversation. Your users experience this as a loss of thread: a response that ignores what came before, or one that repeats something already answered.&lt;/p&gt;

&lt;p&gt;The pattern most teams reach for is a Redis buffer: sequence number tracking, offset storage, and deduplication keys between the agent and the client. It handles full page reloads. It tends to break on deploy-triggered reconnects, mobile handoffs that hit the reconnect window twice, and anything that generates messages faster than the buffer drains.&lt;/p&gt;

&lt;p&gt;Even Vercel's AI SDK lead built a pluggable interface to fill this gap. Teams that reach this point usually end up building the same infrastructure from scratch and owning it indefinitely. Reconnection handles the protocol layer; session state sits one layer above it and is a separate problem entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  What production AI chat needs from the transport layer
&lt;/h2&gt;

&lt;p&gt;Any viable approach to production AI sessions needs to satisfy four requirements. These are implementation-neutral: what any infrastructure option has to provide, regardless of vendor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent state storage.&lt;/strong&gt; Conversation history, token positions, tool call inputs and outputs, and agent state must be stored against a stable session ID and survive connection drops. The session ID is the anchor: the same session must be addressable after any reconnect, from any device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offset-based replay.&lt;/strong&gt; A returning client requests messages from its last received serial. The infrastructure delivers everything missed, in order, with no duplicates. The client supplies its offset; the infrastructure fills the gap.&lt;/p&gt;
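&lt;p&gt;A minimal in-memory sketch of offset-based replay, with a hypothetical &lt;code&gt;SessionLog&lt;/code&gt; class standing in for a durable store (in a DIY build, this is often the Redis buffer described earlier). In practice there would be one log per session ID, plus persistence and a retention policy.&lt;/p&gt;

```typescript
// One stored event; serials increase monotonically within a session.
interface SessionEvent {
  serial: number;
  payload: string;
}

// In-memory stand-in for a durable per-session store.
class SessionLog {
  private readonly events: SessionEvent[] = [];
  private nextSerial = 1;

  // The agent publishes here as tokens, tool results, and state are produced.
  append(payload: string): SessionEvent {
    const event = { serial: this.nextSerial++, payload };
    this.events.push(event);
    return event;
  }

  // Offset-based replay: everything strictly after the client's last serial,
  // in order. Serials are unique, so nothing is delivered twice.
  replayAfter(lastSerial: number): SessionEvent[] {
    return this.events.filter((event) => event.serial > lastSerial);
  }
}

const log = new SessionLog();
["Hello", " world", "!"].forEach((text) => log.append(text));
// A client that last saw serial 1 reconnects and catches up:
console.log(log.replayAfter(1).map((e) => e.payload).join("")); // " world!"
```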

&lt;p&gt;&lt;strong&gt;Protocol fallback.&lt;/strong&gt; When a WebSocket upgrade is blocked by a proxy or firewall, the transport degrades to HTTP streaming or long-polling automatically. This should not require per-deployment configuration.&lt;/p&gt;
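&lt;p&gt;The fallback order can be sketched as a chain of callback-style transport attempts, mirroring how a browser WebSocket reports open and error events. The &lt;code&gt;Transport&lt;/code&gt; interface and transport names are hypothetical; this illustrates the ordering only, not how Ably implements it.&lt;/p&gt;

```typescript
// Hypothetical callback-style transport: reports either open or error.
interface Transport {
  name: string;
  connect(onOpen: () => void, onError: (reason: string) => void): void;
}

// Try transports in order (e.g. WebSocket, HTTP streaming, long-polling)
// and report the first that opens, plus why earlier ones were skipped.
function connectWithFallback(
  transports: Transport[],
  onConnected: (name: string, failures: string[]) => void,
  onExhausted: (failures: string[]) => void,
): void {
  const failures: string[] = [];
  const attempt = (index: number): void => {
    if (index === transports.length) {
      onExhausted(failures);
      return;
    }
    const transport = transports[index];
    let settled = false; // ignore duplicate open/error callbacks per attempt
    transport.connect(
      () => {
        if (settled) return;
        settled = true;
        onConnected(transport.name, failures);
      },
      (reason) => {
        if (settled) return;
        settled = true;
        failures.push(`${transport.name}: ${reason}`);
        attempt(index + 1); // degrade to the next transport
      },
    );
  };
  attempt(0);
}
```

&lt;p&gt;Recording each failure reason makes it easier to tell a proxy that blocks the WebSocket upgrade apart from a network that is simply down.&lt;/p&gt;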

&lt;p&gt;&lt;strong&gt;Multi-device delivery.&lt;/strong&gt; Any authenticated device subscribing to a session ID receives the current state plus history. The session is not bound to the tab, browser, or device that opened it.&lt;/p&gt;
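&lt;p&gt;A sketch of multi-device delivery, using a hypothetical in-memory &lt;code&gt;SessionHub&lt;/code&gt; with authentication left out: any subscriber to a session ID receives the history first and then live events, whichever device it runs on.&lt;/p&gt;

```typescript
type Listener = (event: string) => void;

// Hypothetical hub keyed by session ID, not by connection or device.
class SessionHub {
  private readonly history: { [sessionId: string]: string[] } = Object.create(null);
  private readonly listeners: { [sessionId: string]: Listener[] } = Object.create(null);

  publish(sessionId: string, event: string): void {
    (this.history[sessionId] ??= []).push(event);
    // Copy so a listener that unsubscribes mid-loop can't skip others.
    for (const listener of [...(this.listeners[sessionId] ?? [])]) listener(event);
  }

  // Replay history, then register for live events. This runs synchronously,
  // so a late-joining device sees every event exactly once.
  subscribe(sessionId: string, listener: Listener): () => void {
    for (const event of this.history[sessionId] ?? []) listener(event);
    const list = (this.listeners[sessionId] ??= []);
    list.push(listener);
    return () => {
      const index = list.indexOf(listener);
      if (index !== -1) list.splice(index, 1);
    };
  }
}

const hub = new SessionHub();
const desktop: string[] = [];
const mobile: string[] = [];
hub.subscribe("session-42", (e) => desktop.push(e));
hub.publish("session-42", "token: Hello");
hub.subscribe("session-42", (e) => mobile.push(e)); // joins late, gets history
hub.publish("session-42", "token: again");
console.log(desktop, mobile); // both: [ 'token: Hello', 'token: again' ]
```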

&lt;h2&gt;
  
  
  How Ably AI Transport solves the session layer problem
&lt;/h2&gt;

&lt;p&gt;Thankfully, you don't need to build this infrastructure yourself. Ably AI Transport is a durable session layer: it keeps the user experience intact through the drops the WebSocket protocol can't survive on its own. The session lives in Ably; your application talks to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjicewkeblgjqpotxqy85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjicewkeblgjqpotxqy85.png" alt="Channel-as-session diagram: agent publishes tokens and events, clients subscribe from any device and catch up on reconnect" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The five failures raised in this post each map directly to a capability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection drops from proxy timeouts, mobile handoffs, and page reloads.&lt;/strong&gt; The transport degrades automatically (WebSocket first, then HTTP streaming, then long-polling), so the session survives the infrastructure defaults that break standard WebSocket connections. No per-deployment configuration required. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Reconnection and recovery&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tokens generated while the client was disconnected.&lt;/strong&gt; The token stream is stored against the session. On reconnect, the client receives everything it missed in order, with no duplicates. The developer doesn't track offsets or implement catch-up logic. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Token streaming&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call results and agent context lost mid-task.&lt;/strong&gt; Agent state, tool call inputs and outputs, and conversation history are all published to the session as they are generated. A reconnecting client recovers the full context, not just the tokens. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Reconnection and recovery&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mid-stream steering and human-in-the-loop signals.&lt;/strong&gt; Cancellations, approvals, and human input travel back to the agent on the same session channel. The bidirectional requirement that rules out SSE is covered without a separate signaling mechanism. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Human in the loop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sessions tied to a single tab or device.&lt;/strong&gt; Any authenticated device subscribing to the session ID receives current state plus history. A conversation started on desktop continues on mobile without restart. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Multi-device sessions&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Get started: &lt;a href="https://ably.com/docs/ai-transport/vercel-ai-sdk" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt; · &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Core SDK&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When is SSE still the right choice for AI chat?
&lt;/h3&gt;

&lt;p&gt;SSE is a reasonable starting point for chatbots that follow a simple request-response pattern: a user submits a message, the server streams tokens, and nothing needs to flow back mid-response. It runs over plain HTTP with no protocol upgrade, is simpler to deploy than WebSockets, and works well on stable networks.&lt;/p&gt;

&lt;p&gt;The constraints appear when your application starts adding agentic behaviour: tool calls, mid-stream cancellation, multi-device continuity, and background tasks that complete while the user is offline. At that point, SSE's unidirectional architecture stops being a trade-off and becomes a blocker.&lt;/p&gt;

&lt;h3&gt;
  
  
  What timeout values should I configure to prevent AI connection drops in production?
&lt;/h3&gt;

&lt;p&gt;Set your AWS ALB idle timeout to at least 3,600 seconds for WebSocket connections. The 60-second default was designed for HTTP requests, not long-running agent tasks. On Cloudflare Free and Pro plans, the WebSocket timeout is fixed at 100 seconds. Send heartbeat pings at around 25-second intervals to stay well below that threshold.&lt;/p&gt;

&lt;p&gt;For Nginx, the equivalent setting is &lt;code&gt;proxy_read_timeout&lt;/code&gt;, which defaults to 60 seconds. Together, these three changes cover most production timeout failures for AI chat deployments.&lt;/p&gt;
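&lt;p&gt;One way to pick the heartbeat interval is to take a fraction of the tightest idle timeout on the request path. A sketch, assuming a 25% safety margin (an illustrative choice, not a vendor recommendation); with the ALB raised to 3,600 seconds and Cloudflare at 100 seconds, it lands on the 25-second interval suggested for Cloudflare.&lt;/p&gt;

```typescript
// Choose a heartbeat interval safely inside every idle timeout on the path
// (load balancer, CDN proxy, reverse proxy). `fraction` is an assumed safety
// margin, not a value taken from any vendor's documentation.
function pingIntervalSeconds(idleTimeoutsSeconds: number[], fraction = 0.25): number {
  if (idleTimeoutsSeconds.length === 0) {
    throw new Error("need at least one idle timeout");
  }
  const tightest = Math.min(...idleTimeoutsSeconds);
  // Never go below one second, even for very short timeouts.
  return Math.max(1, Math.floor(tightest * fraction));
}

// ALB raised to 3,600 s, Cloudflare Free/Pro fixed at 100 s:
console.log(pingIntervalSeconds([3600, 100])); // 25
// All defaults (ALB 60 s, Cloudflare 100 s, Nginx proxy_read_timeout 60 s):
console.log(pingIntervalSeconds([60, 100, 60])); // 15
```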

&lt;h3&gt;
  
  
  Does reconnection logic solve the session recovery problem?
&lt;/h3&gt;

&lt;p&gt;Reconnection logic solves the transport problem. It doesn't solve the state problem. Exponential backoff and heartbeats re-establish the socket.&lt;/p&gt;

&lt;p&gt;But they can't recover tokens generated during the gap, tool call results that arrived while the client was disconnected, or context accumulated across multiple steps. Preventing duplicate messages on reconnect requires sequence numbers or idempotency keys at the session layer, not the WebSocket layer. A client that reconnects without a session layer arrives at an empty context and either loses the conversation or restarts it.&lt;/p&gt;
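&lt;p&gt;On the client side, sequence numbers make that concrete. A sketch with a hypothetical &lt;code&gt;SequencedReceiver&lt;/code&gt;, assuming the session layer stamps each message with a consecutive &lt;code&gt;seq&lt;/code&gt; starting at 1: already-seen messages are dropped, and a jump in the sequence is reported as a gap to be filled by a replay request rather than applied out of order.&lt;/p&gt;

```typescript
// Assumed message shape: consecutive sequence numbers starting at 1.
interface Sequenced {
  seq: number;
  text: string;
}

// Applies each message exactly once, in order, across reconnects.
class SequencedReceiver {
  lastSeq = 0;
  readonly applied: string[] = [];

  receive(message: Sequenced): "applied" | "duplicate" | "gap" {
    // Already seen (e.g. re-sent after a reconnect): drop it.
    if (!(message.seq > this.lastSeq)) return "duplicate";
    // Missing messages in between: ask for a replay from lastSeq instead.
    if (message.seq !== this.lastSeq + 1) return "gap";
    this.lastSeq = message.seq;
    this.applied.push(message.text);
    return "applied";
  }
}
```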

&lt;h3&gt;
  
  
  How does Ably replay missed messages after a WebSocket reconnect?
&lt;/h3&gt;

&lt;p&gt;Ably assigns every published message a serial number. When a client reconnects, the transport layer fetches the messages published during the gap using the &lt;code&gt;untilAttach&lt;/code&gt; history option, which bounds the history query to the exact reattachment point.&lt;/p&gt;

&lt;p&gt;Ably delivers everything missed in order, with no overlap between historical and live messages. The client doesn't track its own offset or implement catch-up logic. Every plan includes two minutes of ephemeral history by default. Persisted channels extend this to 72 hours on Standard plans, or up to 365 days on Pro and Enterprise plans.&lt;/p&gt;




&lt;p&gt;Have you hit this in production? I'm curious what the failure looked like: was it the proxy timeout, a page reload, or something else that first surfaced it?&lt;/p&gt;

</description>
      <category>websockets</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How NASCAR delivers realtime racing data to millions of fans around the world</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Wed, 17 Jan 2024 09:27:12 +0000</pubDate>
      <link>https://dev.to/ably/how-nascar-delivers-realtime-racing-data-to-millions-of-fans-around-the-world-1a73</link>
      <guid>https://dev.to/ably/how-nascar-delivers-realtime-racing-data-to-millions-of-fans-around-the-world-1a73</guid>
      <description>&lt;p&gt;Playing around with streaming realtime data is one thing, but have you ever wondered how you would handle the challenge of streaming realtime data to millions of racing fans?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nascar.com/drive" rel="noopener noreferrer"&gt;NASCAR Drive&lt;/a&gt; has built an industry-leading platform that handles the distribution of 1.3 TB of telemetry data in a single race, while over 80 million fans immerse themselves in the race from an in-cockpit view that offers a live 360-degree camera feed and access to the same car telematics as the driver and team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wondering how they do it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We spoke to &lt;a href="https://www.linkedin.com/in/chad-larter/" rel="noopener noreferrer"&gt;Chad Larter&lt;/a&gt;, Senior Director of Technical Operations for NASCAR, in our webinar on January 31st, and you can now &lt;a href="https://hubs.la/Q02jCxH_0" rel="noopener noreferrer"&gt;watch it on demand&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Make sure not to miss it if you’re interested in learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How to stream 1.3 TB of data per race to over 80 million fans, complete with highly detailed stats that update in realtime&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why NASCAR decided to bring this solution in-house – and how they built the technology to achieve it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How NASCAR solved the data surge and streaming challenges&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/zzOY9NdTyI0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;p&gt;If you've watched the video, you know there was a lot to take in, so here are some of the key points covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Tens of thousands of users connect during major races like the Daytona 500, with major traffic spikes following in-race events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data processing:&lt;/strong&gt; Over 100 data points are collected, filtered, and downsampled to 2 updates per second for realtime consumption across fans' devices (1.3 TB per race).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform efficiencies:&lt;/strong&gt; Only changes in data are broadcast to clients, using binary deltas, which reduces bandwidth consumption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long polling vs WebSockets:&lt;/strong&gt; Compared with their previous long-polling solution, a WebSocket platform proved much faster and puts far less stress on networks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shared insights:&lt;/strong&gt; Fans gain access to the same detailed data used by teams and OEMs, providing a deeper understanding of the race.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fan engagement:&lt;/strong&gt; Consumers spend significant time (30 minutes to 3 hours) consuming race data, highlighting the success of delivering an enhanced engagement experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Customization:&lt;/strong&gt; Fans want to consume data their way, gaining insights on their favourite drivers/cars - not be limited by broadcasters focusing on the leading cars.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More to come:&lt;/strong&gt; NASCAR are exploring the use of realtime data for leaderboards, chat and additional content - moving away from polling methods.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
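&lt;p&gt;The idea of broadcasting only changes can be illustrated with a simplified field-level delta. This is a sketch under stated assumptions, with hypothetical field names (&lt;code&gt;speed&lt;/code&gt;, &lt;code&gt;rpm&lt;/code&gt;, &lt;code&gt;gear&lt;/code&gt;); the binary deltas mentioned above work on encoded message payloads rather than on named fields, and removed fields are not handled here.&lt;/p&gt;

```typescript
// Simplified telemetry snapshot: field name to numeric reading.
type Telemetry = { [field: string]: number };

// Keep only the fields whose values changed since the previous snapshot.
function diffTelemetry(previous: Telemetry, next: Telemetry): Telemetry {
  const changes: Telemetry = {};
  for (const field of Object.keys(next)) {
    if (previous[field] !== next[field]) changes[field] = next[field];
  }
  return changes;
}

// Subscriber side: rebuild the full snapshot from the last one plus a delta.
function applyDelta(previous: Telemetry, delta: Telemetry): Telemetry {
  return { ...previous, ...delta };
}

const sampleA = { speed: 180, rpm: 9000, gear: 4 };
const sampleB = { speed: 182, rpm: 9000, gear: 4 };
const delta = diffTelemetry(sampleA, sampleB);
console.log(delta); // { speed: 182 }
console.log(applyDelta(sampleA, delta)); // { speed: 182, rpm: 9000, gear: 4 }
```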

&lt;p&gt;If you have any questions about how NASCAR uses Ably Pub/Sub, the applications it can power, or how it could work for your use case, please visit our &lt;a href="https://ably.com/fan-engagement" rel="noopener noreferrer"&gt;fan engagement page&lt;/a&gt; or &lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;sign up&lt;/a&gt; to get started for free.&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>webdev</category>
      <category>interview</category>
    </item>
  </channel>
</rss>
