DEV Community: Maddy Quinn

Why SSE breaks down for production AI customer support chat

Maddy Quinn — Tue, 07 Jul 2026 12:30:00 +0000

TL;DR: SSE and WebSockets handle AI chat differently once you add cancellation, escalation, or multi-device support. Here's when each one holds up in production.

WebSockets and SSE both stream AI chat responses to the browser, but they handle cancellation, reconnection, and device switching very differently. For a customer support chat product, that difference shows up exactly when a customer is already frustrated: mid-cancel, mid-escalation, or mid-device-switch. This post covers what each protocol does, where SSE's constraints surface in production support chat, and how to choose between them.

Key takeaways

When a customer needs to cancel a runaway response or get transferred to a human, SSE's one-way connection means that signal has to travel over a separate request from the response it's trying to interrupt. That introduces a coordination gap at the exact moment customer trust is most fragile.
SSE's built-in auto-reconnect resumes the connection, not the conversation. After a drop, the customer gets a fresh stream, not the context they had a moment before. If the agent was mid-way through a refund lookup, that work is gone: the customer has to re-explain the issue from scratch, or a support agent has to pick up manually with no record of what the AI had already found.
Moving a support conversation from mobile to desktop breaks an SSE connection outright, and switching to WebSockets alone doesn't fix this either, since session state lives with the connection, not the customer.

What are WebSockets and SSE?

WebSockets and SSE both let a server push data to a browser. They differ in almost every mechanical respect that matters for AI chat.

Property	WebSockets	SSE
Direction	Bidirectional: client and server both send on the same connection	One-way: server to client only
Transport	Dedicated `ws://` or `wss://` connection after an HTTP upgrade handshake	Standard HTTP connection, `text/event-stream` content type
Reconnection	Not built in: the application has to detect a drop and reconnect	Built in: the browser's `EventSource` API reconnects automatically
Data format	Text and binary	UTF-8 text only
Concurrent connections	No browser-imposed limit	Limited to six per browser and domain under HTTP/1.1, per MDN's EventSource documentation; under HTTP/2 the stream limit is negotiated between client and server (100 by default)
Proxy and firewall behavior	The upgrade handshake can be blocked by packet-inspecting firewalls	Standard HTTP, so it passes through most enterprise proxies without special handling

Neither protocol is "better" in the abstract. WebSockets give you a channel both sides can write to. SSE gives you a simpler one-way stream that reconnects on its own.
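To make the SSE column concrete, here is a minimal sketch of how a single event is framed on a `text/event-stream` response. The helper name `formatSseEvent` and the `SseEvent` type are hypothetical; the `id:`, `event:`, and `data:` fields and the blank-line terminator are part of the SSE wire format.

```typescript
// Hypothetical helper: frames one Server-Sent Event for a text/event-stream response.
interface SseEvent {
  id?: string;    // the browser sends this back as Last-Event-ID when it reconnects
  event?: string; // optional event type; the browser treats a missing type as "message"
  data: string;   // UTF-8 text payload; may contain several lines
}

function formatSseEvent(evt: SseEvent): string {
  const lines: string[] = [];
  if (evt.id !== undefined) lines.push(`id: ${evt.id}`);
  if (evt.event !== undefined) lines.push(`event: ${evt.event}`);
  // Every line of the payload needs its own "data:" field.
  for (const part of evt.data.split("\n")) lines.push(`data: ${part}`);
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

The `id` field is what the browser echoes on reconnect. Whether the server uses it to replay anything is up to the application, which is where the session questions later in this post come from.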

What changes the calculation is what your AI chat product needs to do once it is running in front of real customers.

Why the WebSockets vs SSE choice matters for production AI chat

A prototype AI chat feature rarely tests the cases that expose this choice. A single user sends a message, waits for the response, and closes the tab.

Production customer support chat looks nothing like that. Customers cancel responses that are heading in the wrong direction. Conversations get escalated to a human mid-stream. Customers also switch from a mobile app to a desktop browser partway through resolving an issue, expecting the conversation to still be there.

Each of these is a coordination problem, not only a streaming problem.

A cancellation signal has to reach the AI agent.
Escalation context has to reach the human agent who picks up.
Conversation state has to be available on whichever device the customer opens next.

All of this requires more than pushing tokens from server to client. The transport choice determines how much of that coordination comes for free.

The cost of getting this wrong is not abstract. Support agents receiving an escalated conversation with no record of what the AI had already established have to start over, and the customer notices immediately.

A customer who cannot stop a response from generating loses trust in the interface fast. A customer who reconnects to nothing continues the conversation with more friction than they had before.

How SSE breaks down under real AI chat conditions

SSE is a defensible choice for a chatbot that streams a single response with no client-to-server signaling required. Customer support AI chat asks more of the connection than that. The following are the specific points where the gap shows up.

Canceling or interrupting a response mid-stream

SSE has no channel for the client to send anything back while a stream is active. If a customer wants to stop a response that has gone off track, the application has to open a separate HTTP request to signal cancellation.

That separate request has to reach the same backend process generating the response. It has to be matched to the correct in-flight generation, and it has to stop that generation cleanly. None of this coordination is provided by SSE itself.

Building it introduces a class of bugs a single bidirectional channel does not have: a cancel request that arrives after the stream has already ended, or one that targets the wrong generation entirely.
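As a rough sketch of that coordination, the cancel request can carry a generation ID and the backend can look it up in a registry. Everything here (`GenerationRegistry`, the outcome names, the in-memory maps) is hypothetical and assumes a single backend process; with several processes the registry would have to live in shared storage.

```typescript
type CancelOutcome = "cancelled" | "already-finished" | "unknown-generation";

class GenerationRegistry {
  // generationId -> cancellation flag for generations still producing tokens
  private inFlight = new Map<string, { cancelled: boolean }>();
  private finished = new Set<string>();

  begin(generationId: string): void {
    this.inFlight.set(generationId, { cancelled: false });
  }

  finish(generationId: string): void {
    if (this.inFlight.delete(generationId)) this.finished.add(generationId);
  }

  // Called by the separate cancel request, not by the stream itself.
  cancel(generationId: string): CancelOutcome {
    const entry = this.inFlight.get(generationId);
    if (entry === undefined) {
      // Either the stream already ended, or the ID never matched a generation.
      return this.finished.has(generationId) ? "already-finished" : "unknown-generation";
    }
    entry.cancelled = true;
    return "cancelled";
  }

  // The streaming loop polls this between tokens.
  shouldStop(generationId: string): boolean {
    return this.inFlight.get(generationId)?.cancelled ?? false;
  }
}
```

The two non-success outcomes correspond to the two bugs named above: a cancel that arrives after the stream has ended, and a cancel aimed at the wrong generation.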

Escalating from AI to a human agent mid-conversation

Handing a conversation from an AI agent to a human support agent is one of the most common flows in customer support chat. It depends on the receiving human getting full context instantly: what the customer asked, what the AI already tried, and where it got stuck.

SSE's one-way design does not provide a natural place for that handoff to happen. The context has to be assembled and transferred through a separate mechanism, built specifically for this flow, rather than falling out of the transport the conversation already runs on.
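One way to picture that separate mechanism is a small function that assembles the handoff payload from the stored transcript. This is a sketch under assumptions: the `TranscriptEntry` roles, the `HandoffContext` shape, and `buildHandoffContext` are all hypothetical, and a real system would decide for itself what counts as "what the AI already tried".

```typescript
interface TranscriptEntry {
  role: "customer" | "ai" | "tool";
  text: string;
}

interface HandoffContext {
  conversationId: string;
  customerMessages: string[];   // what the customer asked
  aiActions: string[];          // what the AI already tried (tool calls, lookups)
  lastAiMessage: string | null; // where the AI left off
}

function buildHandoffContext(conversationId: string, transcript: TranscriptEntry[]): HandoffContext {
  const customerMessages: string[] = [];
  const aiActions: string[] = [];
  let lastAiMessage: string | null = null;

  for (const entry of transcript) {
    if (entry.role === "customer") customerMessages.push(entry.text);
    else if (entry.role === "tool") aiActions.push(entry.text);
    else lastAiMessage = entry.text; // keeps overwriting, so the final AI message wins
  }

  return { conversationId, customerMessages, aiActions, lastAiMessage };
}
```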

Continuing a conversation across devices

A customer who starts a support conversation on their phone during a commute expects it to be exactly where they left it when they pick it up on their laptop later. An SSE connection is tied to the browser tab that opened it.

Opening the same conversation on a second device starts an entirely separate SSE connection, with no relationship to the first. The session state has to be reconstructed from something other than the transport layer.

Enterprise proxy and firewall behavior

This is the one area where SSE has a genuine, durable advantage. SSE runs over a standard HTTP connection, so it passes through most enterprise proxies and corporate firewalls without special configuration.

WebSockets rely on an HTTP upgrade handshake, and some packet-inspecting firewalls do not handle that handshake cleanly. This can cause the connection to fail silently on corporate networks.

If your customer support product serves B2B customers on locked-down enterprise networks, this is a real constraint to weigh. It is not a reason to dismiss WebSockets outright: the failure mode is solvable with protocol fallback rather than by avoiding WebSockets entirely.
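Protocol fallback can be sketched as an ordered list of transports that are tried in turn. The names below (`TransportAttempt`, `connectWithFallback`, `tryConnect`) are hypothetical, and the probes are synchronous only to keep the sketch short; a real client would connect asynchronously and apply a timeout, since a blocked handshake can fail silently.

```typescript
type TransportName = "websocket" | "http-streaming";

interface TransportAttempt {
  transport: TransportName;
  // Hypothetical probe: returns true if a connection over this transport was established.
  tryConnect: () => boolean;
}

// Tries each transport in order and returns the first one that connects, or null if none do.
function connectWithFallback(attempts: TransportAttempt[]): TransportName | null {
  for (const attempt of attempts) {
    try {
      if (attempt.tryConnect()) return attempt.transport;
    } catch {
      // Treat a thrown error like a failed handshake and move on to the next transport.
    }
  }
  return null;
}
```

Listing WebSockets first keeps the bidirectional channel wherever the network allows it, and only the clients behind strict inspection pay the cost of the HTTP fallback.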

How to choose between WebSockets and SSE for AI chat

SSE

When it works:

Your AI chat is single-turn or short-lived
The customer never needs to send anything back once a response has started
There is no escalation-to-human flow or device-switching requirement

A simple FAQ bot fits this profile well, and SSE's HTTP-native behavior on enterprise networks is a genuine point in its favor.

When it hurts:

Customers need to cancel responses mid-stream
Conversations get escalated to human agents mid-conversation
Customers expect a conversation to continue across devices

Each of these requires building coordination logic on top of SSE that a bidirectional connection would give you by default.

WebSockets

When it works:

Your product needs cancellation, escalation, or multi-device coordination
A single bidirectional connection can carry all of these signals without a separate side-channel for each

This matches most production customer support AI chat, where escalation and interruption are core flows rather than edge cases.

When it hurts:

You have not accounted for enterprise proxy behavior, since the upgrade handshake can fail on networks with strict packet inspection
You have not built reconnection logic, since WebSockets do not reconnect automatically the way SSE does

Both are solvable, but neither comes for free with a raw WebSocket connection.

Adopting WebSockets solves the bidirectional signaling problem. It does not, by itself, solve session continuity. A reconnected WebSocket is a new connection, and without an added session layer, the conversation state that lived with the old connection is still gone.
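A minimal sketch of what "an added session layer" means: state is keyed by conversation ID and each message carries a sequence number, so any new connection can ask for everything after the last message it saw. The class and method names are hypothetical, and the in-memory map stands in for whatever durable store a real deployment would use.

```typescript
interface SessionMessage {
  seq: number; // position in the conversation, starting at 1
  text: string;
}

class ConversationSessions {
  // conversationId -> ordered message log; no connection or socket appears anywhere here
  private logs = new Map<string, SessionMessage[]>();

  append(conversationId: string, text: string): SessionMessage {
    const log = this.logs.get(conversationId) ?? [];
    const message: SessionMessage = { seq: log.length + 1, text };
    log.push(message);
    this.logs.set(conversationId, log);
    return message;
  }

  // Any connection, on any device, resumes with the conversation ID and the last seq it saw.
  resume(conversationId: string, lastSeenSeq: number): SessionMessage[] {
    const log = this.logs.get(conversationId) ?? [];
    return log.filter((message) => message.seq > lastSeenSeq);
  }
}
```

A reconnecting client calls `resume` with its last seen sequence number. A second device that has seen nothing calls it with `0` and receives the whole conversation.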

How Ably AI Transport adds durable sessions on top of WebSockets

A durable session keeps conversation state tied to the conversation itself, rather than to any single connection. A reconnect, a device switch, or a human handoff does not lose context. WebSockets alone give you a bidirectional channel, but the session state still has to live somewhere, and most teams end up building that layer themselves.

Ably AI Transport is built on this idea. A session outlives its underlying connection: reconnection, multi-device delivery, and cancellation are properties of how the session works, not features you build on top. A client that reconnects after a drop picks up from where it left off, and any device the customer opens joins the same session in progress.

This comes with a real tradeoff. Ably AI Transport is a new dependency, working through Ably's infrastructure rather than a direct connection between your server and the client.

For teams already running other realtime features on Ably, this is a natural extension. For teams with no existing Ably footprint, it is a genuine build-versus-adopt decision, not a given.

The Ably AI Transport docs go deeper on how the session layer works.

FAQ

Is SSE ever the right choice for AI chat?
Yes. If your AI chat has no cancellation, escalation, or multi-device requirement, such as a simple single-turn assistant, SSE's simplicity and HTTP-native firewall behavior make it a reasonable starting point.

What does SSE not support that WebSockets would solve for AI streaming?
SSE cannot carry a signal from the client back to the server on the same connection. Cancellation, live steering, and any mid-stream client input all need a separate mechanism. WebSockets solve this by carrying both directions on one connection, though they still need an added session layer to solve reconnection and multi-device continuity.

Why does SSE make stream cancellation unreliable?
SSE only carries data from server to client. Canceling a response requires a separate HTTP request outside the stream itself, which has to be matched to the correct in-flight generation on the backend. That coordination is not part of the protocol and has to be built by the application.

Does WebSocket-based AI chat work behind enterprise proxies and corporate VPNs?
Not always by default. The WebSocket upgrade handshake can be blocked by firewalls that perform packet inspection, which is more common on corporate networks than on consumer ones. Protocol fallback, trying WebSockets first and falling back to HTTP streaming, addresses this without giving up WebSockets' capabilities elsewhere.

Does switching from SSE to WebSockets alone fix session continuity?
No. WebSockets solve the bidirectional signaling problem, since cancellation and interruption can travel over the same connection. But a reconnected WebSocket is still a new connection, and without an added session layer, conversation state does not automatically survive a reconnect or a device switch.

Curious how others are handling human handoff in production AI chat — SSE with a side-channel, or a bidirectional transport from the start?

Originally published on the Ably blog.

Why chat.stop() doesn't cancel your LLM generation (and what to build instead)

Maddy Quinn — Fri, 26 Jun 2026 11:00:00 +0000

You add a stop button to your AI chat app: a customer support agent, a coding assistant, a research tool the user can steer mid-task. A user clicks it mid-response. The frontend stops rendering. Then you check your backend logs and realize the underlying generation is still running, and you're still paying for every token.

This is not a bug. The Vercel AI SDK docs document it explicitly: in a resumable stream setup, calling stop() only closes the current HTTP connection and should not cancel the underlying generation. The same applies to closing a tab or refreshing the page. The client disconnects; the server keeps running.

Key takeaways

Calling chat.stop() in the Vercel AI SDK closes the client connection but does not cancel server-side generation. The underlying generation keeps running, and billing continues.
Fixing this requires a dedicated stop endpoint with idempotency checking, partial assistant snapshot persistence, and backend-specific cancellation logic, none of which the SDK provides.
HTTP streaming is one-way. The server cannot distinguish an intentional stop from a network drop without an explicit signal sent separately from the stream.
On an Ably session, cancel is an explicitly named signal. The server knows immediately whether to stop, wait, or redirect, with no additional endpoint required.

Why `stop()` and disconnect mean different things

When you call chat.stop() in useChat, or when a user closes their browser tab, one thing happens: the HTTP connection closes. HTTP streaming is one-way: the server sends, the client receives. There is no signal in a closed connection that tells the server why it closed. A deliberate stop and a network drop look identical.

This is intentional in resumable stream architectures. They are designed to survive disconnects: if the connection drops, the client should be able to reconnect and pick up where it left off. Keeping generation running through a connection loss is the correct behavior. The problem is that a user clicking stop looks the same to the server, so generation keeps running then too.

The Vercel AI SDK docs are explicit about this: "a client-side abort (e.g. closing the page or refreshing) only closes the current HTTP connection. It is not a request to cancel the underlying work." If your stop button only calls stop(), the model request, background job, workflow, or stream writer keeps running, and the client can reconnect to the same active stream.

The same constraint applies to every other form of user control over a running agent. Say a user is running a research agent and wants to redirect mid-response: "actually, focus on flights only." There is no way to deliver that instruction over the existing stream. You need a separate endpoint, or some other mechanism alongside the stream. Server-Sent Events (SSE), the default transport for most AI SDK setups, cannot carry a signal back to the server. The stream flows one way.

What a correct stop implementation actually requires

The Vercel AI SDK documents the correct approach: build a dedicated stop endpoint. That endpoint needs to do four things.

Persist the partial assistant snapshot. Before canceling, the client sends its current partial assistant message to the stop endpoint. This preserves what the user has already seen. Without this step, the assistant message disappears from the conversation when the stream closes.

Check the activeStreamId. Your application tracks which stream is active for each chat. The stop endpoint reads this value and compares it against the stream ID the client sent with the request. If a newer stream has started because the user sent a new message while the stop request was in flight, the stop request is stale and should be ignored.

Cancel the active work. This is the backend-specific step. In a Redis-backed resumable stream setup, you close the stored stream and abort the model request writing to it. In a workflow setup, you cancel the workflow run. In a job queue setup, you cancel the job or write a cancellation flag the job polls. The SDK cannot do this for you because it does not know your backend architecture.

Clear the activeStreamId. Once cancellation is confirmed, clear the stored stream reference, but only if it still matches the stream you intended to cancel. A newer stream may have started between the cancellation request and the completion of the cancel logic.

Each step exists to address a specific race condition. Between the moment a user clicks stop and the moment the server processes the request, a new message can be sent, a new stream can start, or the partial assistant message can be overwritten by a server-side completion. The stop endpoint handles all of these correctly only if it checks every condition in sequence.
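The four steps can be sketched as a single handler over an in-memory store. This is not the implementation from the AI SDK docs: `handleStop`, `ChatRecord`, `StopRequest`, and the `cancelWork` callback are hypothetical names, and the sketch runs the staleness check before persisting the snapshot so that a stale request cannot write into a newer turn. That ordering is a choice made here, not something the steps above prescribe.

```typescript
interface ChatRecord {
  activeStreamId: string | null;
  messages: string[]; // persisted assistant messages
}

interface StopRequest {
  chatId: string;
  streamId: string;             // the stream the client believes it is stopping
  partialAssistantText: string; // what the user had already seen
}

type StopResult = "stopped" | "stale" | "unknown-chat";

function handleStop(
  chats: Map<string, ChatRecord>,
  request: StopRequest,
  cancelWork: (streamId: string) => void, // backend-specific: abort the model call, cancel the job, etc.
): StopResult {
  const chat = chats.get(request.chatId);
  if (chat === undefined) return "unknown-chat";

  // Check the activeStreamId: a newer stream means this stop request is stale.
  if (chat.activeStreamId !== request.streamId) return "stale";

  // Persist the partial assistant snapshot.
  chat.messages.push(request.partialAssistantText);

  // Cancel the active work.
  cancelWork(request.streamId);

  // Clear the activeStreamId, but only if it still points at the stream we cancelled.
  // In an async handler a new stream can start while cancellation is in progress.
  if (chat.activeStreamId === request.streamId) chat.activeStreamId = null;
  return "stopped";
}
```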

This is buildable. The AI SDK docs provide a full implementation. But consider what you are actually shipping: a dedicated HTTP endpoint, a stream ID tracking layer, a partial message persistence mechanism, and backend-specific cancellation logic. The SDK provides none of it. All of it has to stay in sync with the rest of your streaming infrastructure. Most developers discover this after they ship their first stop button.

Three questions to ask about your stop button before shipping

Before you ship, answering these three questions will tell you whether your stop button actually does what it looks like it does.

Does clicking stop actually stop backend generation, or does it only stop the client from receiving tokens? If you have not built a stop endpoint, the answer is the latter.
What happens to the partial assistant message when stop is called? If you are not persisting a snapshot server-side, the message may disappear or be overwritten when the stream closes.
What happens if a new message is sent while a stop request is in flight? If your stop endpoint does not check the activeStreamId, it may cancel a stream the user has already moved past.

If all three have clean answers, your stop button works. If not, the gap will show up in production, usually after a user notices their coding assistant or support agent kept billing them for a response they clicked stop on.

All three problems trace back to the same root cause: HTTP streaming gives the server no way to distinguish intent from a connection event. There is an approach that removes the problem at the transport level rather than working around it.

How a bidirectional session changes the stop vs disconnect distinction

Ably AI Transport is built on a different model. Instead of HTTP streaming, it uses a persistent bidirectional session. The client and server can both send signals at any time, over the same connection. That means cancel, stop, and redirect are first-class signals, not workarounds built on top.

On an Ably session, cancel is a named signal rather than an inference from a dropped connection. The client publishes a cancel signal on the session: session.cancel(runId). The server receives it on the corresponding run, and its abortSignal fires. Generation stops. The run ends with the reason 'cancelled', and every subscriber receives the lifecycle update.

Because the cancel is a session event rather than a TCP disconnection, the server knows exactly what happened. A network drop does not fire the cancel handler. A user clicking stop does. The session remains intact, and the next message starts a new run cleanly.

The race condition that the stop endpoint exists to solve is handled natively. Each run has a unique runId. A cancel signal targeting a run that has already ended is ignored, and multiple signals targeting the same run cancel it only once.

For patterns beyond cancellation, the session supports cancel-then-send (cancel the active run and immediately send a new message) and send-alongside (send a new message while the active run continues). See the interruption docs for full implementation guidance.

For the Vercel AI SDK-specific analysis, including GitHub citations and billing evidence, see why Vercel AI SDK stop doesn't cancel the stream.

Canceling a run with Ably AI Transport

With Ably AI Transport, cancellation from the client is a single call:

// Cancel a specific run
await activeRun.cancel();

// Or cancel by runId, from any connected device
await session.cancel(runId);

On the server, the abort signal fires automatically:

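import { streamText, convertToModelMessages } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// `session` and `invocation` come from your Ably AI Transport server setup (not shown here)
const run = session.createRun(invocation);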
await run.start();
await run.loadConversation(); // hydrate prior conversation history

const result = streamText({
  model: anthropic('claude-sonnet-4-6'),
  messages: await convertToModelMessages(run.messages),
  abortSignal: run.abortSignal, // fires when cancel() is called client-side
});

const { reason } = await run.pipe(result.toUIMessageStream());
await run.end(reason); // reason is 'cancelled' when abort fires

The abortSignal is passed directly to the model call. When the client cancels, the signal fires, generation stops, and the run ends with reason 'cancelled'. No stop endpoint to build, no activeStreamId to track, no race condition to guard against.

One edge case worth noting: cancellation is asynchronous, so a small tail of tokens may arrive after cancel() returns and before the server's abortSignal fires. Those tokens still belong to the cancelled run, not the next one. Also, any tool invocation that does not check the abortSignal will keep running until it completes, so if your agent calls tools, pass the signal through to each one.
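Passing the signal through to a tool can look like the sketch below. The tool itself (`runLookupTool`) and the `CancelSignalLike` interface are hypothetical; the interface only requires an `aborted` flag, which a standard `AbortSignal` provides, so `run.abortSignal` from the example above could be passed in directly under that assumption.

```typescript
// Minimal shape this sketch needs; a standard AbortSignal satisfies it structurally.
interface CancelSignalLike {
  readonly aborted: boolean;
}

interface ToolResult {
  processed: string[];
  aborted: boolean;
}

// Hypothetical tool: handles items one at a time and checks the signal before each step.
function runLookupTool(
  items: string[],
  signal: CancelSignalLike,
  onStep?: (completedSteps: number) => void,
): ToolResult {
  const processed: string[] = [];
  for (const item of items) {
    // Without this check the tool would run to completion after a cancel.
    if (signal.aborted) return { processed, aborted: true };
    processed.push(item.toUpperCase());
    onStep?.(processed.length);
  }
  return { processed, aborted: false };
}
```

The check sits between steps, so a cancel takes effect at the next step boundary rather than instantly. How fine-grained those boundaries are determines how long a cancelled tool keeps working.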

Adopting Ably AI Transport: what changes in your stack

Shifting from HTTP streaming to an Ably session does not change your LLM call, your model provider, or your agent framework. AI Transport sits at the delivery layer, below orchestration. Your Vercel AI SDK, LangGraph, or custom agent logic stays unchanged. For teams using the Vercel AI SDK specifically, Ably ships a drop-in transport adapter, @ably/ai-transport/vercel, that swaps the transport underneath useChat without changing the hook.

What changes is the transport. Instead of an HTTP POST that returns a streaming response, the client opens an Ably session. Cancel, stop, and redirect become session signals, not HTTP endpoints.

There is a trade-off: an Ably session adds a persistent connection to your architecture. If stop is the only signal you need, a stop endpoint is the lighter choice. The session model earns its place when you need several of these signals: cancel, redirect, steer, human handover, multi-device continuity. They all run on the same infrastructure, so if you are already building one of those patterns, you are building the foundation for all of them.

Conclusion

The stop vs disconnect distinction is a structural property of HTTP streaming, not a framework bug. Closing an HTTP connection does not carry intent; only an explicit signal sent separately from the stream does.

A correct stop endpoint is buildable, but it is four moving parts that have to stay in sync with your streaming infrastructure. Most developers discover the gaps after they ship.

Ably AI Transport takes a different approach. On an Ably session, cancel is an explicit signal. Race conditions are handled at the transport level. The session persists through cancellation, and the next message starts a clean run.

Docs go deeper: Ably AI Transport cancellation docs | Interruption patterns | Vercel AI SDK stop documentation

Frequently asked questions

Does calling chat.stop() in the Vercel AI SDK cancel the underlying generation?

No. chat.stop() closes the HTTP connection. The underlying generation — the model request, background job, or stream writer — keeps running until it completes. You are billed for every token. The Vercel AI SDK documents this explicitly: a client-side abort is a disconnect signal, not a cancellation. Stopping generation requires a dedicated stop endpoint that you build and maintain alongside your streaming infrastructure.

Why can't the server detect a client disconnect and stop generation automatically?

The server can detect that the HTTP connection is closed. It cannot tell whether this was an intentional stop, a network drop, a page refresh, or a tab crash. In a resumable stream architecture, all four are treated as disconnects by design: the stream should survive a network drop. Treating every disconnect as an intentional stop would cancel streams on network blips and prevent reconnection. Distinguishing them requires an explicit signal from the client, which is why a stop endpoint is necessary.

What is activeStreamId checking, and why does my stop endpoint need it?

activeStreamId is a reference that your application stores, linking each chat to its currently active stream. The stop endpoint reads this value and compares it against the stream ID the client sends with the stop request. If a newer stream has started since the client initiated the stop, the stop request is stale and should be ignored. Without this check, the stop endpoint may cancel a stream the user has already moved past, leaving the conversation in an inconsistent state.

How does Ably's session model handle the stop vs disconnect distinction?

On an Ably session, cancel is an explicit event published by the client, either via activeRun.cancel() for the current run or session.cancel(runId) to target a specific run by ID. The server receives it as a named session signal, not as a TCP disconnection. A network drop does not trigger the cancel handler. An intentional stop does. These two events have separate handling, without requiring a stop endpoint or idempotency logic. The session remains intact after cancellation, and the next user message starts a clean run.

How do I build interruptible AI streaming, and is redirect or steer supported today?

You need a bidirectional session. With Ably AI Transport, calling activeRun.cancel() or session.cancel(runId) publishes an explicit cancel signal the server acts on immediately, regardless of connection state. activeRun.cancel() is the typical client-side call; session.cancel(runId) lets you target a specific run by ID, including from a different device. Beyond cancel, the session supports two interruption patterns: cancel-then-send, which cancels the active run before starting a new one, and send-alongside, which lets both runs continue concurrently. See the interruption docs for full implementation guidance.

What's your current approach to stop and cancellation in production? Do you have a stop endpoint, or are you relying on client-side disconnect? Would love to hear how others are handling this.

When should you replace DefaultChatTransport?

Maddy Quinn — Mon, 22 Jun 2026 14:18:09 +0000

TL;DR: DefaultChatTransport uses HTTP POST and SSE. This is correct for a single user on a stable connection, but it reaches its design boundary when production requires cancellation that reaches the server, multi-device delivery, stream resumption without Redis, or multi-user sessions. This post covers the four limits, a four-question self-audit, and what a WebSocket-based session layer adds.

You've built an AI chat app on the Vercel AI SDK. It works in development. The model responds, the stream comes through, and the UI updates cleanly. Then you ship to production, and the transport layer starts showing its edges.

Most of these failures are quiet: things that work in demos and break in ways that are hard to pin down until you know where to look. They share a common cause: DefaultChatTransport is built for HTTP, and HTTP has structural properties that some production requirements exceed. This piece explains what those limits are, which ones matter for your application, and what replacing the transport actually involves.

Key takeaways

DefaultChatTransport uses HTTP POST and Server-Sent Events (SSE). These protocols are one-way and point-to-point. That's correct behavior for a stateless serverless platform, not a bug in the SDK.
stop() fires the abort signal client-side and returns immediately. GitHub issue #9707 (open, October 2025) confirms the server cannot distinguish an intentional stop from a dropped connection, and may continue generating and billing until completion.
The official Vercel AI SDK stream resumption pattern requires Redis, the resumable-stream package, two custom API endpoints, and a dedicated stop handler. In a resumable stream setup, stop() is treated as a disconnect, not a cancel.
The ChatTransport interface is pluggable by design. Vercel's serverless platform cannot host persistent WebSocket connections, so they made the transport layer swappable. Replacing DefaultChatTransport with a WebSocket-based transport layer creates a durable session between your agent and client, without changing your agent, tool calls, or UI rendering.

How DefaultChatTransport works, and the conditions it was built for

When you call useChat() without a transport option, or pass a default config, DefaultChatTransport is what runs. It sends outgoing messages via HTTP POST and receives responses as an SSE stream.

For a single user on a stable connection, sending a message and waiting for the response, this is the right choice. A stateless serverless function receives the request, calls the model, and streams the response back. HTTP is the right tool for that, and DefaultChatTransport uses it correctly.

That behavior follows from a platform constraint: Vercel's serverless functions terminate after responding, so there is no persistent process to hold a socket open. That's the root of all four limits. They're architectural, not configurable, because HTTP on a stateless platform simply can't do what they require. The Ably guide to WebSockets on Vercel covers this constraint in depth if you want the full picture.

That's also why Vercel made ChatTransport pluggable in AI SDK 5. DefaultChatTransport is not broken: it's correct for the conditions it was built for. But Vercel designed the interface precisely so teams can swap in a transport that isn't bound by those conditions.

It's not just DefaultChatTransport that has this constraint. Even DirectChatTransport, the other built-in option, explicitly documents that it "does not support reconnection since there is no persistent server-side stream to reconnect to." Reconnection is a transport-layer property. The default implementations don't have it because the platform they're built for doesn't support it.

Four things DefaultChatTransport can't do in production

These are the limits that surface when you move beyond a single-user chatbot: a customer support agent that hands off between devices, a chat interface where a human and an AI both participate, or any application where the connection dropping mid-generation has a visible cost to the user.

Each follows from the same root: HTTP/SSE is built for one connection, one client, one response. When production asks for more, that constraint becomes visible.

Cancellation is ambiguous, and you may be paying for it. When a user clicks stop, stop() closes the HTTP connection client-side and returns immediately, without waiting for the server to acknowledge or terminate the generation. The server receives a connection close event. It has no way to distinguish that from a tab close, a network drop, or a mobile device going to sleep. So it keeps generating.

GitHub issue #9707 (filed October 2025, still open) documents this directly: createUIMessageStream does not detect the abort signal server-side, making it "impossible to stop ongoing AI generation and leading to unnecessary costs and poor UX." GitHub issue #10844 adds that Vercel's own supportsCancellation: true config flag behaves unreliably in production deployments. The cost is real: orphaned generations run to completion, and there's no reliable mechanism to stop them without a custom server-side endpoint.

Multi-device delivery silently fails. SSE is one-to-one. One HTTP connection, one client, one stream. A user with the same session open on their laptop and phone receives the response only on the device that sent the request. The second device gets nothing: no error, no partial content, no indication that anything is in flight. This isn't a useChat configuration gap. It's a structural property of HTTP. Multi-device fan-out is absent from the vast majority of AI transport implementations because SSE is one-to-one by design. DefaultChatTransport is no exception.
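For contrast, fan-out means delivery is keyed by session rather than by the request that triggered the response. The sketch below is a generic in-memory publish/subscribe structure with hypothetical names (`SessionFanOut`, `TokenListener`); it is not part of the AI SDK or of any specific transport.

```typescript
type TokenListener = (token: string) => void;

class SessionFanOut {
  // sessionId -> every device currently subscribed to that session
  private listeners = new Map<string, Set<TokenListener>>();

  // Each device subscribes to the session; the returned function unsubscribes it.
  subscribe(sessionId: string, listener: TokenListener): () => void {
    const subscribers = this.listeners.get(sessionId) ?? new Set<TokenListener>();
    subscribers.add(listener);
    this.listeners.set(sessionId, subscribers);
    return () => {
      subscribers.delete(listener);
    };
  }

  // Delivers a token to every subscribed device and returns how many received it.
  publish(sessionId: string, token: string): number {
    const subscribers = this.listeners.get(sessionId);
    if (subscribers === undefined) return 0;
    subscribers.forEach((listener) => listener(token));
    return subscribers.size;
  }
}
```

With one HTTP response per request there is no equivalent of `subscribe` for the second device, which is why it receives nothing.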

The next limit shares the same architectural root. Where multi-device delivery requires fan-out that HTTP cannot provide, stream resumption requires session persistence that HTTP cannot maintain.

Stream resumption requires infrastructure that you build and own. The Vercel AI SDK stream resumption documentation lists the prerequisites directly: a Redis instance, the resumable-stream package, a POST handler that creates resumable streams using consumeSseStream, a GET handler at /api/chat/[id]/stream that resumes them with resumeExistingStream, and a dedicated stop endpoint.

stop() and resumable streams are also architecturally incompatible. The docs state it directly: "In a resumable stream setup, client-side aborts are treated as disconnects. Closing a tab, refreshing the page, or calling stop() only closes the current HTTP connection and should not cancel the underlying generation." Adding a working stop button requires a separate server-side endpoint to cancel the underlying work and clear the active stream record.

Tab switches and mobile backgrounding are a further gap the resumable-stream pattern doesn't cover in the same way as a page reload. The Ably guide on Vercel AI SDK resumable streams covers the distinction.

The single-response assumption breaks multi-user sessions. Vercel designed useChat around one user sending one message and receiving one response. It tracks one activeResponse at a time. If a second user joins, or an observer device needs the same response lifecycle, the only available mechanism is setMessages. This bypasses lifecycle hooks, tool-call notifications, and onFinish callbacks entirely. It works, but it's a workaround. Zak Knill's post on building the Ably transport covers the implementation detail.

Each of the four limits above has the same root cause but surfaces differently. The table below maps them to their production cost:

Limit	What breaks	Production cost	Configurable in `DefaultChatTransport`?
Cancellation	Server can't distinguish stop from disconnect	Orphaned generations; ongoing billing	No
Multi-device	SSE delivers to one client only	Silent failure on second device	No
Stream resumption	Requires Redis, two endpoints, stop handler	Significant custom infrastructure	No
Single-response assumption	`setMessages` bypasses lifecycle hooks	Broken tool calls, missing `onFinish`	No

How a WebSocket-based transport layer creates a durable session between agent and client

Replacing DefaultChatTransport with a WebSocket-based transport layer replaces a stateless HTTP connection with a durable session between your agent and your users, one that persists beyond any single connection and addresses all four limits directly. It also removes the custom infrastructure that those limits force you to build. The Ably topic page on implementing a custom ChatTransport covers the full capability surface. This section covers what disappears from your backlog.

With a WebSocket-based transport layer, you no longer need:

The Redis buffer for resumable streams
The stop endpoint with race condition protection
The fan-out layer for multi-device delivery
The setMessages workaround for multi-user sessions

The mechanism that makes this possible is straightforward. A session is decoupled from the connection. The session persists independently; a connection is how a client subscribes to it. When a client disconnects and reconnects, it presents its last position to the session and receives only the messages it missed. A cancel signal is sent explicitly on the session: the server reads it as intent, not as a connection close event it has to interpret.
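As a rough illustration of that mechanism, here is a minimal in-memory sketch. The `Session` class, the `ClientSignal` shape, and the method names are all hypothetical and are not taken from any Ably or Vercel API; the point is only that cancel arrives as a typed signal, a dropped connection changes nothing, and a reconnecting client asks for whatever came after its last position.

```typescript
// Hypothetical session model; these names are illustrative, not a real SDK API.
type ClientSignal =
  | { type: "cancel" } // explicit intent to stop generation
  | { type: "resume"; lastSerial: number }; // reconnect and ask for what was missed

type StoredToken = { serial: number; text: string };

class Session {
  private log: StoredToken[] = [];
  private cancelled = false;

  // Agent side: tokens are written to the session, not to a connection.
  // Returns false once a cancel has been received so the agent can stop.
  publish(text: string): boolean {
    if (this.cancelled) return false;
    this.log.push({ serial: this.log.length + 1, text });
    return true;
  }

  // Client side: a typed signal arrives on the session.
  handle(signal: ClientSignal): StoredToken[] {
    switch (signal.type) {
      case "cancel":
        this.cancelled = true;
        return [];
      case "resume": {
        const last = signal.lastSerial;
        return this.log.filter((t) => t.serial > last);
      }
    }
  }

  // A dropped connection carries no intent, so session state is left untouched.
  onConnectionClosed(): void {
    // Intentionally empty.
  }

  isCancelled(): boolean {
    return this.cancelled;
  }
}
```

In this sketch a client that saw serial 1 before dropping would call `handle({ type: "resume", lastSerial: 1 })` and receive only the later tokens, while the agent loop would stop as soon as `publish` returns false.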

Ably AI Transport is built as the session layer for production AI applications: the infrastructure between your agent and your users that handles the delivery concerns that DefaultChatTransport can't. It plugs into useChat as a ChatTransport implementation via a single configuration change:

import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';

// Before: default HTTP transport
const { messages, sendMessage, stop } = useChat({
  transport: new DefaultChatTransport({ api: '/api/chat' }),
});

// After: Ably AI Transport (backed by an Ably session)
// useChatTransport is supplied by the Ably SDK and reads the transport from the provider
const { chatTransport } = useChatTransport(); // from <ChatTransportProvider>
const { messages, sendMessage } = useChat({ transport: chatTransport });

In practice: stop() sends a typed signal the server can act on, instead of a connection close event that it has to guess at. Any device subscribed to the same session receives the stream, so a user switching from laptop to phone doesn't lose the conversation. If the connection drops mid-generation, the client reconnects and catches up from where it left off, because the session persists independently of any single connection.

What stays unchanged: your agent, tool calls, message persistence logic, and UI rendering. The swap is the transport option in useChat. Everything built on top of it carries over.

For the implementation detail on own-turns, observer-turns, and setMessages handling, see Zak Knill's post. For how transport options compare more broadly, see the durable sessions guide for Vercel AI SDK applications. The four questions in the next section will help you work out whether you're at that decision point yet.

When DefaultChatTransport is still the right choice

The four limits above are real, but they only become blockers if you need cancellation that reaches the server, multi-device delivery, stream resumption beyond page reloads, or more than one user in the same conversation. For many applications, DefaultChatTransport remains the right starting point.

A practical way to assess your own situation is to work through four questions:

Do you need stop() to reliably cancel server-side generation, not just the UI update, but the actual model call?
Do users access the same session from more than one device or tab?
Do you need stream resumption across tab switches or mobile backgrounding, not just full page reloads?
Does more than one user participate in the same conversation?

If the answer to all four is no, DefaultChatTransport is a defensible choice. If any answer is yes, the relevant section above describes the specific limit you'll encounter. The right time to replace the transport is when those limits start costing you.

At that point, the transport layer is the right place to make the fix, and replacing it changes nothing else in your application.

The next step is understanding the ChatTransport interface: what sendMessages and reconnectToStream require, and what to look for in an implementation. The Ably ChatTransport topic page covers that in full. To get started with Ably AI Transport directly, the Vercel AI SDK integration guide is the right starting point.

Frequently asked questions

Does the Vercel AI SDK support multi-device AI chat out of the box?

Not with DefaultChatTransport. SSE is scoped to a single HTTP connection, so a second device has no way to join a stream already in progress. Multi-device delivery requires a transport where the session exists independently of the connection, so any subscribed client receives it. The Ably guide on why Vercel AI SDK can't stream to multiple devices provides the full picture.

Why doesn't stop() cancel server-side generation in Vercel AI SDK?

Because DefaultChatTransport has no signal path back to the server. When stop() closes the HTTP connection, the server receives a TCP close it can't distinguish from a network drop, so generation continues and billing runs to completion. With a WebSocket-based transport layer, stop() sends a typed cancel message on the session; the server reads it as intent, not inference. The Ably guide on why stop() doesn't cancel the stream covers the full mechanism.

How much infrastructure does Vercel AI SDK stream resumption require?

The official pattern requires a Redis instance, the resumable-stream package, a POST handler with consumeSseStream, a GET handler at /api/chat/[id]/stream, and a dedicated stop endpoint with race condition handling. stop() and resumable streams are also architecturally incompatible: in a resumable stream setup, a client abort is treated as a disconnect, not a cancel. See the Ably guide to Vercel AI SDK resumable streams for the full breakdown.

When should I replace DefaultChatTransport?

When the limits start affecting your production application. The four-question self-audit in the "When DefaultChatTransport is still the right choice" section gives a practical framework. In short: if you need stop() to reliably cancel server-side generation, multi-device delivery, stream resumption beyond page reloads, or multi-user sessions, the default transport can't provide those. The Ably durable sessions guide for Vercel AI SDK covers the transport options available once you've decided to move on.

Why replace DefaultChatTransport with a WebSocket-based transport layer?

Because the limits that matter in production are properties of HTTP/SSE itself, not of your application code. Unconfirmed cancellations, single-device delivery, Redis-dependent stream resumption, and the setMessages workaround for multi-user sessions are all resolved at the transport level by a WebSocket-based transport layer. Your agent, tool calls, and UI code don't change.

Vercel AI SDK custom transport vs default transport, what actually changes?

The delivery mechanism only. Your agent, tool calls, message persistence, and UI rendering stay the same. The swap is the transport option in useChat, one configuration change. For a full before/after and getting started guide, see the Ably AI Transport Vercel integration guide.

What transport issues have you hit building on the Vercel AI SDK? Would be interested to hear which of the four comes up most in practice.

Why AWS ALB and Cloudflare silently kill your AI agent sessions

Maddy Quinn — Mon, 15 Jun 2026 11:51:00 +0000

TL;DR: WebSocket reconnection restores the transport. It doesn't restore the session. Tokens generated during the gap, tool call results that arrived while the client was offline, and the agent's position in the ongoing generation are all lost unless you have a session layer. This post covers the timeout sources that hit agentic applications specifically, why SSE is a bad fit for bidirectional agent communication, and how session recovery works in practice.

Why AI agents get disconnected in ways standard apps don't

WebSocket reconnection has always been worth solving. What makes AI agents different is when the disconnect happens.

A standard chat interface goes quiet between user interactions — when there's genuinely nothing happening on the wire. An agent goes quiet mid-execution: during tool call waits, between reasoning steps, while the LLM is generating a response. That silence is the agent doing its most intensive work. To every load balancer and proxy in the path, it looks idle.

AWS Application Load Balancer defaults to closing connections after 60 seconds of inactivity. Cloudflare enforces a 100-second idle timeout on Free and Pro plans — fixed, cannot be raised. Corporate proxies and enterprise gateways add their own thresholds that you often can't inspect or configure.

Plenty of production WebSocket applications have shipped without explicitly thinking about this. The reason: traditional server-side workloads tend to emit a trickle of traffic on their own — progress events, periodic state updates — which keeps the connection alive as a side effect. The timeouts stay invisible because something is always crossing the wire.

Agentic applications don't have that property. A customer support agent goes quiet mid-answer while the user is typing a correction. A coding agent waits for the user to approve a tool call before continuing. A research agent sits in silence for 90 seconds while a downstream API responds. None of that is idleness from the agent's perspective. To the ALB, it's all the same.

Why SSE doesn't fit

If you're reaching for SSE as an alternative, it won't solve the session problem — and it introduces a new one.

The applications this post is about — customer support agents, coding agents, research agents the user steers mid-task — require the client to send messages back to the agent on the same session while it's in flight. A user correcting an assumption, approving a tool call, or cancelling mid-implementation needs a channel in both directions.

SSE streams server-to-client only. That rules it out at the transport level regardless of how well you've solved the replay problem.

The fix for idle timeouts: server-side ping frames

The WebSocket spec includes a mechanism designed exactly for this: server-side ping frames. The server sends a ping at a fixed interval; the browser responds automatically with a pong; both frames count as activity and reset every idle timer on the path.

The interval needs to sit comfortably below the shortest timeout on the path. A 50-second interval covers both the AWS ALB 60-second default and Cloudflare's 100-second limit simultaneously. Browsers respond to ping frames automatically — no client-side code required.
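The arithmetic behind that 50-second figure can be sketched as a small helper. The function name and the 10-second margin are assumptions chosen to reproduce the number in the text, not values taken from any vendor documentation.

```typescript
// Hypothetical helper: pick a ping interval a fixed margin below the shortest
// idle timeout on the path. The 10-second default margin is an assumption.
function pingIntervalSeconds(timeoutsSeconds: number[], marginSeconds = 10): number {
  if (timeoutsSeconds.length === 0) {
    throw new Error("at least one timeout is required");
  }
  const shortest = Math.min(...timeoutsSeconds);
  const interval = shortest - marginSeconds;
  if (interval <= 0) {
    throw new Error("margin leaves no room for a ping interval");
  }
  return interval;
}

// AWS ALB default (60s) and Cloudflare Free/Pro (100s), from the figures above.
const chosenIntervalSeconds = pingIntervalSeconds([60, 100]); // 50
```

On a Node server built with the `ws` package, a value like this would typically drive a timer that calls `ping()` on each open socket; any other timeout you discover on the path just becomes another entry in the array.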

Common idle timeouts to plan around

Infrastructure	Default timeout	Configurable?
AWS Application Load Balancer	60 seconds	Yes — `idle_timeout.timeout_seconds`, up to 4,000s
Cloudflare (Free/Pro)	100 seconds	No — fixed. Enterprise customers can request custom values.
Corporate proxies, gateways	Varies — often invisible	Depends on deployment

For ALB, raise the limit if your workload genuinely needs a longer window. The idle_timeout.timeout_seconds attribute is adjustable in the load balancer configuration and takes effect immediately without a redeployment.

For Cloudflare Free and Pro plans, you can't raise the limit. The server-side ping approach at 50 seconds is the only viable mitigation.

If connections die at exactly 100 seconds in production, check EdgeStartTimestamp and EdgeEndTimestamp in Cloudflare's HTTP request logs to confirm the source before debugging elsewhere.

Other connection challenges to consider

Not all disconnects come from idle timeouts. Two other patterns hit agentic applications in production:

Corporate VPN and enterprise proxy traversal

Many enterprise networks don't forward the HTTP Upgrade header required to open a WebSocket connection. The connection never opens rather than dropping mid-session. The failure appears at the WebSocket handshake stage — typically a non-101 HTTP response — not as a silent close after inactivity.

The fix is protocol fallback: when a proxy blocks the WebSocket upgrade, the transport degrades automatically to HTTP streaming or long-polling without per-deployment configuration.
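The selection logic behind protocol fallback can be sketched in a few lines. The `tryConnect` probe and the transport names are hypothetical stand-ins; a real client would attempt an actual handshake and treat a non-101 response as a failure.

```typescript
type TransportName = "websocket" | "http-streaming" | "long-polling";

// Preference order: WebSocket first, then the HTTP-based fallbacks.
const TRANSPORT_PREFERENCE: TransportName[] = [
  "websocket",
  "http-streaming",
  "long-polling",
];

// `tryConnect` is a hypothetical probe that returns true if the transport opened.
function selectTransport(
  tryConnect: (transport: TransportName) => boolean
): TransportName | null {
  for (const transport of TRANSPORT_PREFERENCE) {
    if (tryConnect(transport)) return transport;
  }
  return null; // nothing on this network path worked
}

// Example: a proxy that drops the Upgrade header but allows ordinary HTTP.
const behindStrictProxy = (t: TransportName): boolean => t !== "websocket";
const selected = selectTransport(behindStrictProxy);
```

Keeping the order in one list means the client needs no per-network configuration: it simply settles on the first transport that the path allows.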

Mobile network handoffs

Switching from WiFi to cellular drops the underlying TCP connection immediately. On mobile, the client's onclose event often doesn't fire — the OS terminates the connection without a clean close frame. On iOS specifically, background TCP connections are suspended within seconds of the app moving to the background, again without notification.

Don't rely on onclose to trigger reconnection for mobile users. Use failed-send detection and an application-level heartbeat timeout to catch silent closes.
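One way to structure that detection is a small monitor that tracks the last time anything arrived. This is a sketch under assumptions: the class and method names are hypothetical, and the clock is passed in as a number of milliseconds so the logic stays independent of any timer API.

```typescript
// Hypothetical monitor for silent closes; the caller supplies the current time.
class HeartbeatMonitor {
  private readonly timeoutMs: number;
  private lastSeenMs: number;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastSeenMs = nowMs;
  }

  // Call on every message or heartbeat received from the server.
  recordActivity(nowMs: number): void {
    this.lastSeenMs = nowMs;
  }

  // Call when a send fails: treat it as an immediate sign the link is dead.
  recordFailedSend(): void {
    this.lastSeenMs = Number.NEGATIVE_INFINITY;
  }

  // True when nothing has arrived within the timeout, even if onclose never fired.
  shouldReconnect(nowMs: number): boolean {
    return nowMs - this.lastSeenMs > this.timeoutMs;
  }
}
```

A client would check `shouldReconnect` on a timer and start its reconnect routine when it returns true, rather than waiting for a close event that may never arrive.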

What transport reconnection recovers — and what it doesn't

Here's where most teams discover the gap. Reconnecting the WebSocket connection restores the transport. It doesn't restore the state of the session that was in flight when the connection dropped.

Transport reconnection recovers	It doesn't recover
The WebSocket connection itself	Tokens generated while disconnected
Active session subscriptions	Tool call results that arrived during the gap
The ability to send and receive new messages	The agent's reasoning trace if streamed as events
The session ID and session name	The position in the ongoing generation

After a successful reconnect with only transport-layer recovery, the client is back online, but the session is in an indeterminate state. The client holds a partial response from before the disconnect. The agent continued generating on the server side. Neither side knows where the other stopped.

How session recovery works

This is where Ably AI Transport comes in. AI Transport acts as the session and delivery layer between your agent and your users.

The agent publishes every event — each generated token, each tool call, each reasoning step — to a session. AI Transport stores those events and is responsible for delivering them to the client whenever the client is connected. From the agent's side, this is fire-and-forget: it doesn't care whether the client is online, offline, mid-reconnect, or freshly loaded into a new browser tab.

When a client connects or reconnects, it asks for everything it hasn't already seen. AI Transport returns the missed events, in order, before the live stream resumes. There's no "live vs. history" boundary the application needs to reason about, and no difference in handling for a 30-second drop vs. a 30-minute disconnect vs. a fresh page load.

One detail worth understanding: the session doesn't store one event per token. Tokens are appended to a single message per agent response — conflation — so the session history contains one accumulated message per response, not thousands of token-sized events. A client reconnecting mid-stream receives the in-progress message in its current accumulated form and resumes streaming from there. A client loading the page fresh receives the same accumulated message as a single coherent block. The application doesn't write reconciliation logic for either case.
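To make the accumulation concrete, here is a minimal in-memory sketch of one message per response. `ConflatedHistory` and its methods are hypothetical names for illustration and say nothing about how AI Transport stores data internally.

```typescript
type ResponseMessage = { responseId: string; text: string; complete: boolean };

// Hypothetical history store: one accumulated message per agent response.
class ConflatedHistory {
  private messages = new Map<string, ResponseMessage>();

  // Each token is appended to the message for its response, not stored separately.
  appendToken(responseId: string, token: string): void {
    const existing = this.messages.get(responseId);
    if (existing) {
      existing.text += token;
    } else {
      this.messages.set(responseId, { responseId, text: token, complete: false });
    }
  }

  markComplete(responseId: string): void {
    const msg = this.messages.get(responseId);
    if (msg) msg.complete = true;
  }

  // What a reconnecting or freshly loaded client would receive.
  snapshot(): ResponseMessage[] {
    return Array.from(this.messages.values()).map((m) => ({ ...m }));
  }
}
```

Three appended tokens still yield a single history entry, so a client joining mid-stream gets the text so far in one piece and then continues with live appends.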

For more on how this works in practice, see AI Transport's reconnection and recovery and history and replay documentation.

What the user should see during a disconnect

Session recovery handles the infrastructure layer. But a reconnect that works silently in the background still needs the right UI treatment to avoid looking like a failure.

AI Transport exposes well-defined connection states. The key distinction: the disconnected state (temporarily offline, retrying automatically) vs. the suspended state (retry window exhausted).

During disconnection: show a reconnecting indicator, not an error modal. In the suspended state: show a retry button. The session is intact and waiting — communicate that.
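That mapping from state to UI is small enough to write as a pure function. The `disconnected` and `suspended` names come from the text above; the `connected` state, the `StatusView` shape, and the message strings are assumptions made for this sketch.

```typescript
// "disconnected" and "suspended" are named in the text; "connected" is assumed here.
type ConnectionState = "connected" | "disconnected" | "suspended";

// Hypothetical view model for a status banner.
type StatusView = {
  indicator: "none" | "reconnecting" | "paused";
  message: string;
  showRetryButton: boolean;
};

function statusViewFor(state: ConnectionState): StatusView {
  switch (state) {
    case "connected":
      return { indicator: "none", message: "", showRetryButton: false };
    case "disconnected":
      // Retrying automatically: inform the user, but do not raise an error.
      return { indicator: "reconnecting", message: "Reconnecting...", showRetryButton: false };
    case "suspended":
      // Retry window exhausted: the session is intact, so offer a manual retry.
      return {
        indicator: "paused",
        message: "Connection paused. Your conversation is saved.",
        showRetryButton: true,
      };
  }
}
```

Because the function has no side effects, it can sit behind whatever connection-state listener the transport exposes and be rendered by any UI framework.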

What building this yourself actually costs

Building session recovery without a purpose-built layer means writing:

A heartbeat loop
A reconnection manager
Manual state reconstruction logic
A connection state component to surface each phase to the user

None of these is large in isolation. Together, they constitute infrastructure. And any infrastructure your team owns is infrastructure your team spends time and resources maintaining as requirements change.

Ably AI Transport provides the session recovery layer:

Automatic connection recovery within the two-minute window
History compaction and replay so clients always receive clean, accumulated state on reconnect
Protocol fallback from WebSocket to HTTP streaming to long-polling
Bidirectional signaling on the same session

What remains in your application code is the connection state UI — surfacing the reconnecting and suspended states to the user — and that's a handful of lines rather than a system.

Frequently asked questions

How do I stop AI chat sessions from timing out?

Configure your WebSocket server to send ping frames at a fixed interval below the shortest timeout on your path. A 50-second interval sits comfortably below both the AWS ALB 60-second default and Cloudflare's fixed 100-second limit on Free and Pro plans, with browsers responding automatically — no client-side code required. If your workload needs a longer window, raise the idle_timeout.timeout_seconds attribute in your ALB configuration; it's adjustable up to 4,000 seconds.

What happens if a user disconnects during LLM streaming?

With AI Transport, the session resumes automatically upon reconnect, with missed tokens delivered in order before new ones arrive, and no application code needed. For longer disconnects, AI Transport's history and replay feature loads the full conversation from the session history. Without a session layer, tokens generated during the dropout are lost, and the agent can't resume from the point of interruption.

How do I avoid duplicate AI messages after a WebSocket reconnect?

With AI Transport you don't need to — the SDK handles this through history compaction. Tokens are streamed as appends to a single message per agent response, and the session history stores one message per response rather than one per token. When a client reconnects or refreshes, it receives the single accumulated message rather than individual tokens to reconstruct.

What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?

The AWS Application Load Balancer idle timeout defaults to 60 seconds and applies to all connection types, including WebSocket. Raise it by updating the idle_timeout.timeout_seconds load balancer attribute. The valid range is one to 4,000 seconds; most AI agent workloads are well served by a value between 3,600 and 4,000 seconds. The change takes effect immediately without requiring a redeployment.

Does Cloudflare close WebSocket connections? What is the timeout?

Yes. Cloudflare enforces a 100-second idle timeout on WebSocket connections for Free and Pro customers. The limit is fixed on those plans and can't be raised. Enterprise customers can configure a custom value through their account team. To keep connections alive on Free and Pro plans, configure your WebSocket server to send ping frames every 50 seconds. Browsers respond automatically with pong frames, which reset Cloudflare's idle timer and the 60-second AWS ALB default simultaneously.

Can WebSockets work behind a corporate VPN or enterprise proxy?

They can, but many enterprise proxies don't forward the HTTP Upgrade header required to open a WebSocket connection. When that happens, the connection fails at the handshake stage rather than dropping mid-session. That failure is distinct from a timeout: the error occurs before any data flows, not after a period of inactivity. Protocol fallback to HTTP streaming or long-polling handles proxy blocking at the infrastructure layer without per-deployment configuration.

How long does Ably retain channel history for session recovery?

AI Transport replays missed messages automatically on reconnect, with no application code needed. For longer disconnects, session history loads the full conversation, persisting for 24 to 72 hours depending on your Ably plan, with extended retention available on higher tiers.

What's your experience here — have you run into session state loss specifically, or mostly fought the transport reconnection side of the problem? Interested in what patterns teams are using to handle the UI side of a mid-stream disconnect.

Why your AI chat reconnects but your session doesn't

Maddy Quinn — Wed, 27 May 2026 10:05:10 +0000

TL;DR: WebSockets are the right protocol for production AI chat. But the connection is stateless at the session level. When it drops — AWS ALB defaults to 60 seconds, Cloudflare to 100 seconds on Free and Pro plans — all in-flight tokens, tool call results, and agent context disappear. Reconnection logic restores the socket. It doesn't restore the session. That's the gap this post covers.

WebSockets are the right protocol for production AI chat. But that fact doesn't prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context.

The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. Each breaks for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn't change that.

Key takeaways

WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn't.
A WebSocket connection is stateless at the session level. When it closes through a proxy timeout, page reload, or device switch, all state disappears with it.
Reconnection logic re-establishes the transport. It does not recover the tokens, tool calls, or agent context that were in flight when the connection dropped.
What fills the gap is a session layer: infrastructure that persists conversation state against a session ID and replays it to reconnecting clients.

What WebSockets get right for AI chat

The protocol question is worth settling early, because the rest of this piece argues about the infrastructure layer above it. For production AI chat, the choice is WebSockets or SSE. Both stream tokens to the client, but only WebSockets let signals flow the other way.

WebSockets are bidirectional. When your user cancels mid-stream, that signal travels back on the same channel; tool call confirmations and workflow approvals work the same way. When a workflow pauses for human input mid-execution, that input must arrive in-band, not via a polling endpoint.

SSE is a one-way stream. For simple chatbots on stable networks, that doesn't matter. Add tool calls, mid-stream cancellation, or multi-device continuity, and it does.

Where production AI connections actually fail

Not all connection drops come from bad network conditions. The more common causes in production are infrastructure defaults designed for HTTP requests, not AI chat. A response can be mid-generation for tens of seconds, and most defaults weren't built for that.

AWS Application Load Balancer idle timeout. AWS ALB closes connections idle for 60 seconds by default, per the AWS Application Load Balancer documentation. For standard HTTP that's generous. For an agent waiting on a downstream API, 60 seconds of silence is routine, and the connection closes without warning. Your user's response stops mid-sentence with no explanation.

Cloudflare proxy timeout. On Cloudflare Free and Pro plans, WebSocket connections terminate after 100 seconds of inactivity, as documented in Cloudflare's WebSocket troubleshooting guide. Enterprise plans can raise this limit; on Free and Pro plans, the ceiling is fixed.

Mobile network handoffs. Switching from WiFi to cellular drops the underlying TCP connection immediately, taking the WebSocket with it. On mobile this happens during normal use: walking between coverage areas, backgrounding the tab, entering a building.

Page reload and tab crash. Your user reloads mid-generation, or the browser crashes, both of which are routine. The connection closes, and any session state tied to it is gone unless something stored it.

Why reconnection logic doesn't fix the session problem

The standard reconnection pattern re-establishes the socket. Transport recovers in milliseconds. But it cannot restore the state that was in flight when the connection dropped.

Token stream position. The response kept generating while the connection was dark. Those tokens went nowhere. When the client reconnects, it arrives mid-sentence or finds nothing at all.

Tool call results. Some chat responses depend on realtime data: a lookup, a search, or an action your user triggered. If the connection dropped while the agent was waiting for that result, the response either never arrived or ended before it could use the information.

Agent context. In a multi-turn exchange, the agent accumulates context: what was asked, what was answered, and what's in progress. When a session drops and reconnects without state recovery, the agent and the client are at different points in the same conversation. Your users experience this as a loss of thread: a response that ignores what came before, or one that repeats something already answered.

The pattern most teams reach for is a Redis buffer: sequence number tracking, offset storage, and deduplication keys between the agent and the client. It handles full page reloads. It tends to break on deploy-triggered reconnects, mobile handoffs that hit the reconnect window twice, and anything that generates messages faster than the buffer drains.

Even Vercel's AI SDK lead built a pluggable interface to fill this gap. Every team reaching this point builds the same infrastructure from scratch and chooses to own it indefinitely. Reconnection handles the protocol layer; session state sits one layer above it, and it's a separate problem entirely.

What production AI chat needs from the transport layer

Any viable approach to production AI sessions needs to satisfy four requirements. These are implementation-neutral: what any infrastructure option has to provide, regardless of vendor.

Persistent state storage. Conversation history, token positions, tool call inputs and outputs, and agent state must be stored against a stable session ID and survive connection drops. The session ID is the anchor: the same session must be addressable after any reconnect, from any device.

Offset-based replay. A returning client requests messages from its last received serial. The infrastructure delivers everything missed, in order, with no duplicates. The client supplies its offset; the infrastructure fills the gap.

Protocol fallback. When a WebSocket upgrade is blocked by a proxy or firewall, the transport degrades to HTTP streaming or long-polling automatically. This should not require per-deployment configuration.

Multi-device delivery. Any authenticated device subscribing to a session ID receives the current state plus history. The session is not bound to the tab, browser, or device that opened it.
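Two of these requirements, offset-based replay and multi-device delivery, can be shown together in a small in-memory sketch. `DurableSession`, `attach`, and `detach` are hypothetical names; a real implementation would persist history outside the process and authenticate each device before attaching it.

```typescript
type SessionMessage = { serial: number; text: string };
type Deliver = (msg: SessionMessage) => void;

// Hypothetical session addressed by a stable ID; names are illustrative only.
class DurableSession {
  private history: SessionMessage[] = [];
  private subscribers = new Map<string, Deliver>();

  // Agent side: append to history, then fan out to every attached device.
  publish(text: string): SessionMessage {
    const msg = { serial: this.history.length + 1, text };
    this.history.push(msg);
    this.subscribers.forEach((deliver) => deliver(msg));
    return msg;
  }

  // Client side: attach with the last serial received (0 for a fresh device).
  // Missed messages are replayed in order before live delivery begins.
  attach(deviceId: string, lastSerial: number, deliver: Deliver): void {
    for (const msg of this.history) {
      if (msg.serial > lastSerial) deliver(msg);
    }
    this.subscribers.set(deviceId, deliver);
  }

  detach(deviceId: string): void {
    this.subscribers.delete(deviceId);
  }
}
```

A laptop that drops after serial 2 re-attaches with `lastSerial` set to 2 and receives only what it missed, while a phone attaching for the first time passes 0 and receives the whole conversation before live messages continue.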

How Ably AI Transport solves the session layer problem

Thankfully, you don't need to build the infrastructure. Ably AI Transport is the durable session layer — the thing that makes the user experience survive what the WebSocket protocol cannot. The session lives in Ably; your application talks to it.

The five failures raised in this post each map directly to a capability:

Connection drops from proxy timeouts, mobile handoffs, and page reloads. The transport degrades automatically — WebSocket first, then HTTP streaming, then long-polling — so the session survives the infrastructure defaults that break standard WebSocket connections. No per-deployment configuration required.
→ Reconnection and recovery

Tokens generated while the client was disconnected. The token stream is stored against the session. On reconnect, the client receives everything it missed in order, with no duplicates. The developer doesn't track offsets or implement catch-up logic.
→ Token streaming

Tool call results and agent context lost mid-task. Agent state, tool call inputs and outputs, and conversation history are all published to the session as they generate. A reconnecting client recovers the full context, not just the tokens.
→ Reconnection and recovery

Mid-stream steering and human-in-the-loop signals. Cancellations, approvals, and human input travel back to the agent on the same session channel. The bidirectional requirement that rules out SSE is covered without a separate signaling mechanism.
→ Human in the loop

Sessions tied to a single tab or device. Any authenticated device subscribing to the session ID receives current state plus history. A conversation started on desktop continues on mobile without restart.
→ Multi-device sessions

Get started: Vercel AI SDK · Core SDK

Frequently asked questions

When is SSE still the right choice for AI chat?

SSE is a reasonable starting point for chatbots that follow a simple request-response pattern: a user submits a message, the server streams tokens, no interruption required. It deploys more easily than WebSockets, carries no persistent connection overhead, and works well on stable networks.

The constraints appear when your application starts adding agentic behaviour: tool calls, mid-stream cancellation, multi-device continuity, and background tasks that complete while the user is offline. At that point, SSE's unidirectional architecture stops being a trade-off and becomes a blocker.

What timeout values should I configure to prevent AI connection drops in production?

Set your AWS ALB idle timeout to at least 3,600 seconds for WebSocket connections. The 60-second default was designed for HTTP requests, not long-running agent tasks. On Cloudflare Free and Pro plans, the WebSocket timeout is fixed at 100 seconds. Send heartbeat pings at around 25-second intervals to stay well below that threshold.

For Nginx, the equivalent setting is `proxy_read_timeout`, which defaults to 60 seconds. These three changes cover most production timeout failures for AI chat deployments.
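One way to derive a heartbeat interval from the timeouts above is to take the tightest idle timeout on the path and make sure two pings fit inside it with some slack. The sketch below is a heuristic under that assumption, not a recommendation from any vendor; the function names, the labels in `idleTimeoutsSeconds`, and the 10-second margin are all illustrative.

```typescript
// Idle timeouts (seconds) for each hop between client and server.
// Labels are illustrative; values are the defaults cited in the answer above.
const idleTimeoutsSeconds: Record<string, number> = {
  awsAlbDefault: 60,
  cloudflareFreeAndPro: 100,
  nginxProxyReadDefault: 60,
};

// Heuristic (an assumption, not a vendor rule): two pings must fit inside
// the tightest timeout even if one ping is delayed by `marginSeconds`.
function heartbeatIntervalSeconds(
  timeouts: Record<string, number>,
  marginSeconds = 10,
): number {
  const values = Object.keys(timeouts).map((key) => timeouts[key]);
  const tightest = Math.min(...values);
  return Math.floor((tightest - marginSeconds) / 2);
}

// Schedule pings; `sendPing` is supplied by the caller.
// Returns a function that stops the heartbeat.
function startHeartbeat(sendPing: () => void, intervalSeconds: number): () => void {
  const timer = setInterval(sendPing, intervalSeconds * 1000);
  return () => clearInterval(timer);
}

const intervalSeconds = heartbeatIntervalSeconds(idleTimeoutsSeconds);
```

With the 60-second defaults in place this yields 25 seconds. In a browser, `sendPing` would have to send an application-level message such as `socket.send("ping")`, because the browser WebSocket API does not expose protocol-level ping frames; the `"ping"` payload here is hypothetical and the server would need to recognise it.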

Does reconnection logic solve the session recovery problem?

Reconnection logic solves the transport problem. It doesn't solve the state problem. Exponential backoff and heartbeats re-establish the socket.

But they can't recover tokens generated during the gap, tool call results that arrived while the client was disconnected, or context accumulated across multiple steps. Preventing duplicate messages on reconnect requires sequence numbers or idempotency keys at the session layer, not the WebSocket layer. A client that reconnects without a session layer arrives at an empty context and either loses the conversation or restarts it.
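To make the sequence-number point concrete, here is a minimal client-side sketch. It assumes the server stamps each message with a consecutive integer `seq`, which is an assumption for illustration; `SequencedReceiver` and its members are hypothetical names rather than part of any library.

```typescript
// Assumes the session layer stamps messages with consecutive integers.
type SessionMessage = { seq: number; text: string };
type ReceiveResult = "applied" | "duplicate" | "gap";

class SequencedReceiver {
  private lastApplied = 0;
  readonly applied: string[] = [];

  receive(message: SessionMessage): ReceiveResult {
    // Already seen: a resend after reconnect must not be applied twice.
    if (message.seq <= this.lastApplied) return "duplicate";
    // Something in between is missing: the caller should request a replay.
    if (message.seq > this.lastApplied + 1) return "gap";
    this.applied.push(message.text);
    this.lastApplied = message.seq;
    return "applied";
  }

  // Sent to the server on reconnect so it can resume after this point.
  get resumeFrom(): number {
    return this.lastApplied;
  }
}

// Usage: the socket drops after message 2 and the server resends from 2.
const receiver = new SequencedReceiver();
const results: ReceiveResult[] = [
  receiver.receive({ seq: 1, text: "Looking up order" }),
  receiver.receive({ seq: 2, text: "Refund eligible" }),
  receiver.receive({ seq: 2, text: "Refund eligible" }), // resend after reconnect
  receiver.receive({ seq: 3, text: "Refund issued" }),
  receiver.receive({ seq: 5, text: "Anything else?" }),  // message 4 never arrived
];
```

The receiver can detect the duplicate and the gap, but it cannot fill the gap on its own. Something on the server side still has to have stored message 4, which is the session-layer requirement described above.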

How does Ably replay missed messages after a WebSocket reconnect?

Ably assigns every published message a serial number. When a client reconnects, the transport layer uses the internal `untilAttach` mechanism to fetch messages published during the gap, which bounds the history query to the exact reconnection point.

Ably delivers everything missed in order, with no overlap between historical and live messages. The client doesn't track its own offset or implement catch-up logic. Every plan includes two minutes of ephemeral history by default. Persisted channels extend this to 72 hours on Standard plans, or up to 365 days on Pro and Enterprise plans.
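The "no overlap" property comes from splitting the stream at a single boundary: history supplies everything before the reattach point, and the live subscription supplies everything from it onward. The sketch below models that boundary with plain arrays. It is not Ably's implementation; `resumeStream` and `attachSerial` are hypothetical names, and numeric serials are a simplification.

```typescript
// Simplified model: serials are plain integers here.
type Msg = { serial: number; data: string };

// `stored` is the retained channel history in publish order.
// `attachSerial` is the serial of the first message delivered live after reattach.
function resumeStream(
  stored: Msg[],
  lastSeenSerial: number,
  attachSerial: number,
  live: Msg[],
): Msg[] {
  // History covers the gap only: after what the client saw, before the reattach point.
  const missed = stored.filter(
    (m) => m.serial > lastSeenSerial && m.serial < attachSerial,
  );
  // Live delivery covers the reattach point onward.
  const fresh = live.filter((m) => m.serial >= attachSerial);
  return missed.concat(fresh);
}

// Usage: the client saw serials 1-2, dropped, and reattached at serial 5.
const stored: Msg[] = [1, 2, 3, 4, 5].map((serial) => ({ serial, data: "t" + serial }));
const live: Msg[] = [5, 6].map((serial) => ({ serial, data: "t" + serial }));
const resumed = resumeStream(stored, 2, 5, live);
const resumedSerials = resumed.map((m) => m.serial).join(",");
```

Message 5 exists in both the stored history and the live feed, yet appears once in the result because only one side of the boundary is allowed to deliver it.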

Have you hit this in production? Curious what the failure looked like - was it the proxy timeout, a page reload, or something else that first surfaced it?

How NASCAR delivers realtime racing data to millions of fans around the world

Maddy Quinn — Wed, 17 Jan 2024 09:27:12 +0000

Playing around with streaming realtime data is one thing, but have you ever wondered how you would handle the challenge of streaming realtime data to millions of racing fans?

NASCAR Drive has built an industry-leading platform that handles the distribution of 1.3TB of telemetry data in a single race, while over 80 million fans immerse themselves in the race from an in-cockpit view that offers a live 360 camera feed and access to the same car telematics as the driver and team.

Wondering how they do it?

We spoke to Chad Larter, Senior Director of Technical Operations for NASCAR, in our webinar on January 31st - and you can now watch it on demand!

Make sure not to miss it if you’re interested in learning:

How to stream 1.3 TB of data per race to over 80 million fans - complete with highly detailed stats that update in realtime
Why NASCAR decided to bring this solution in-house – and how they built the technology to achieve it
How NASCAR solved the data surge and streaming challenges

Key takeaways

If you've watched the video, you know there was a lot to take in, so here are some of the key points covered:

Scalability: Tens of thousands of users connect during major races like the Daytona 500, with major traffic spikes following in-race events.
Data processing: Over 100 data points are collected, filtered, and downsampled to 2 updates/second for realtime fan consumption across devices, amounting to 1.3 TB per race.
Platform efficiencies: Only changes in data are broadcast to clients, using binary deltas, which reduces bandwidth consumption.
Long polling vs WebSockets: Compared with their previous long-polling solution, the WebSockets platform proved much quicker and put far less stress on networks.
Shared insights: Fans gain access to the same detailed data used by teams and OEMs, providing a deeper understanding of the race.
Fan engagement: Consumers spend significant time (30 minutes to 3 hours) consuming race data, highlighting the success of delivering an enhanced engagement experience.
Customization: Fans want to consume data their way, gaining insights on their favourite drivers/cars - not be limited by broadcasters focusing on the leading cars.
More to come: NASCAR are exploring the use of realtime data for leaderboards, chat and additional content - moving away from polling methods.
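The downsampling and delta points above can be illustrated with a short sketch. It is not NASCAR's pipeline: it uses a field-level diff as a simpler stand-in for binary deltas, keeps the latest sample per 500 ms window as one possible way to reach 2 updates/second, and all field names and values are made up for the example.

```typescript
type Telemetry = Record<string, number>;
type Sample = { timeMs: number; values: Telemetry };

// Keep the latest sample in each window (500 ms gives 2 updates/second).
// Assumes samples are sorted by non-negative timestamps.
function downsample(samples: Sample[], windowMs = 500): Sample[] {
  const kept: Sample[] = [];
  let currentWindow = -1;
  for (const sample of samples) {
    const windowIndex = Math.floor(sample.timeMs / windowMs);
    if (windowIndex === currentWindow) {
      kept[kept.length - 1] = sample; // newer sample replaces the older one
    } else {
      kept.push(sample);
      currentWindow = windowIndex;
    }
  }
  return kept;
}

// Field-level delta: only fields whose value changed since the last broadcast.
function diff(previous: Telemetry, current: Telemetry): Telemetry {
  const changed: Telemetry = {};
  for (const key of Object.keys(current)) {
    if (previous[key] !== current[key]) changed[key] = current[key];
  }
  return changed;
}

// Usage with made-up telemetry for one car.
const raw: Sample[] = [
  { timeMs: 0, values: { speed: 180, rpm: 9000, gear: 4 } },
  { timeMs: 100, values: { speed: 181, rpm: 9050, gear: 4 } },
  { timeMs: 400, values: { speed: 183, rpm: 9100, gear: 4 } },
  { timeMs: 600, values: { speed: 184, rpm: 9100, gear: 4 } },
  { timeMs: 900, values: { speed: 186, rpm: 9100, gear: 4 } },
];
const kept = downsample(raw);
const delta = diff(kept[0].values, kept[1].values);
```

Five raw samples collapse to two broadcasts, and the second broadcast carries only the `speed` field because `rpm` and `gear` did not change between them.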

If you have any questions about how NASCAR uses Ably Pub/Sub, the applications it can power, or how it could work for your use case, please visit our fan engagement page or sign up to get started for free.
