DEV Community: Ably

Why SSE breaks down for production AI customer support chat

Maddy Quinn — Tue, 07 Jul 2026 12:30:00 +0000

TL;DR: SSE and WebSockets handle AI chat differently once you add cancellation, escalation, or multi-device support. Here's when each one holds up in production.

WebSockets and SSE both stream AI chat responses to the browser, but they handle cancellation, reconnection, and device switching very differently. For a customer support chat product, that difference shows up exactly when a customer is already frustrated: mid-cancel, mid-escalation, or mid-device-switch. This post covers what each protocol does, where SSE's constraints surface in production support chat, and how to choose between them.

Key takeaways

When a customer needs to cancel a runaway response or get transferred to a human, SSE's one-way connection means that signal has to travel over a separate request from the response it's trying to interrupt. That introduces a coordination gap at the exact moment customer trust is most fragile.
SSE's built-in auto-reconnect resumes the connection, not the conversation. After a drop, the customer gets a fresh stream, not the context they had a moment before. If the agent was midway through a refund lookup, that work is gone: the customer has to re-explain the issue from scratch, or a support agent has to pick up manually with no record of what the AI had already found.
Moving a support conversation from mobile to desktop breaks an SSE connection outright, and switching to WebSockets alone doesn't fix this either, since session state lives with the connection, not the customer.

What are WebSockets and SSE?

WebSockets and SSE both let a server push data to a browser. They differ in almost every mechanical respect that matters for AI chat.

Property	WebSockets	SSE
Direction	Bidirectional: client and server both send on the same connection	One-way: server to client only
Transport	Dedicated `ws://` or `wss://` connection after an HTTP upgrade handshake	Standard HTTP connection, `text/event-stream` content type
Reconnection	Not built in: the application has to detect a drop and reconnect	Built in: the browser's `EventSource` API reconnects automatically
Data format	Text and binary	UTF-8 text only
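Concurrent connections	Not subject to the six-per-domain HTTP/1.1 limit	Limited to six per browser and domain under HTTP/1.1, per MDN's EventSource documentation; under HTTP/2 the limit is negotiated between client and server (100 streams by default)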
Proxy and firewall behavior	The upgrade handshake can be blocked by packet-inspecting firewalls	Standard HTTP, so it passes through most enterprise proxies without special handling

Neither protocol is "better" in the abstract. WebSockets give you a channel both sides can write to. SSE gives you a simpler one-way stream that reconnects on its own.

What changes the calculation is what your AI chat product needs to do once it is running in front of real customers.

Why the WebSockets vs SSE choice matters for production AI chat

A prototype AI chat feature rarely tests the cases that expose this choice. A single user sends a message, waits for the response, and closes the tab.

Production customer support chat looks nothing like that. Customers cancel responses that are heading in the wrong direction. Conversations get escalated to a human mid-stream. Customers also switch from a mobile app to a desktop browser partway through resolving an issue, expecting the conversation to still be there.

Each of these is a coordination problem, not only a streaming problem.

A cancellation signal has to reach the AI agent.
Escalation context has to reach the human agent who picks up.
Conversation state has to be available on whichever device the customer opens next.

All of this requires more than pushing tokens from server to client. The transport choice determines how much of that coordination comes for free.

The cost of getting this wrong is not abstract. Support agents receiving an escalated conversation with no record of what the AI had already established have to start over, and the customer notices immediately.

A customer who cannot stop a response from generating loses trust in the interface fast. A customer who reconnects to nothing continues the conversation with more friction than they had before.

How SSE breaks down under real AI chat conditions

SSE is a defensible choice for a chatbot that streams a single response with no client-to-server signaling required. Customer support AI chat asks more of the connection than that. The following are the specific points where the gap shows up.

Canceling or interrupting a response mid-stream

SSE has no channel for the client to send anything back while a stream is active. If a customer wants to stop a response that has gone off track, the application has to open a separate HTTP request to signal cancellation.

That separate request has to reach the same backend process generating the response. It has to be matched to the correct in-flight generation, and it has to stop that generation cleanly. None of this coordination is provided by SSE itself.

Building it introduces a class of bugs a single bidirectional channel does not have: a cancel request that arrives after the stream has already ended, or one that targets the wrong generation entirely.
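A minimal sketch of that matching step, assuming a single backend process and an in-memory registry. `GenerationRegistry`, `CancelOutcome`, and the generation IDs are hypothetical names for illustration, not part of SSE or any SDK. With more than one backend instance, the registry would need shared state or sticky routing so the cancel request reaches the process running the generation.

```typescript
// Hypothetical in-memory registry of in-flight generations, keyed by generation ID.
type CancelOutcome = "cancelled" | "already_finished" | "unknown_generation";

class GenerationRegistry {
  private inFlight = new Map<string, AbortController>();
  // Remembered so a late cancel can be told apart from a wrong ID.
  // Grows without bound in this sketch; a real store would expire entries.
  private finished = new Set<string>();

  // Called when a streamed response starts. The returned signal is what the
  // model call would be given so it can be aborted later.
  start(generationId: string): AbortSignal {
    const controller = new AbortController();
    this.inFlight.set(generationId, controller);
    return controller.signal;
  }

  // Called when the stream ends on its own.
  finish(generationId: string): void {
    if (this.inFlight.delete(generationId)) {
      this.finished.add(generationId);
    }
  }

  // Called by the separate cancel request that arrives outside the SSE stream.
  cancel(generationId: string): CancelOutcome {
    const controller = this.inFlight.get(generationId);
    if (!controller) {
      return this.finished.has(generationId) ? "already_finished" : "unknown_generation";
    }
    controller.abort();
    this.inFlight.delete(generationId);
    this.finished.add(generationId);
    return "cancelled";
  }
}
```

Returning a distinct outcome for late and mismatched cancels turns both bug classes into explicit branches the cancel endpoint can respond to, rather than silent no-ops.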

Escalating from AI to a human agent mid-conversation

Handing a conversation from an AI agent to a human support agent is one of the most common flows in customer support chat. It depends on the receiving human getting full context instantly: what the customer asked, what the AI already tried, and where it got stuck.

SSE's one-way design does not provide a natural place for that handoff to happen. The context has to be assembled and transferred through a separate mechanism, built specifically for this flow, rather than falling out of the transport the conversation already runs on.
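As a rough illustration of what that separate mechanism has to assemble, the sketch below derives a handoff summary from a transcript. The `TranscriptEntry` and `HandoffContext` shapes and the `buildHandoffContext` function are assumptions made up for this example; your own transcript format will differ.

```typescript
// Hypothetical transcript shape: customer messages, AI replies, and AI actions.
type TranscriptEntry =
  | { role: "customer"; text: string }
  | { role: "ai"; text: string }
  | { role: "ai_action"; action: string; outcome: "succeeded" | "failed" };

interface HandoffContext {
  conversationId: string;
  customerRequest: string; // the first thing the customer asked
  attempted: string[];     // what the AI already tried, in order
  stuckOn: string | null;  // the most recent action that failed, if any
}

function buildHandoffContext(
  conversationId: string,
  transcript: TranscriptEntry[],
): HandoffContext {
  let customerRequest = "";
  const attempted: string[] = [];
  let stuckOn: string | null = null;

  for (const entry of transcript) {
    if (entry.role === "customer" && customerRequest === "") {
      customerRequest = entry.text;
    } else if (entry.role === "ai_action") {
      attempted.push(`${entry.action}: ${entry.outcome}`);
      if (entry.outcome === "failed") {
        stuckOn = entry.action;
      }
    }
  }

  return { conversationId, customerRequest, attempted, stuckOn };
}
```

The point of the sketch is where this code lives: with SSE it runs in a side path you build and trigger yourself, separate from the stream the customer is watching.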

Continuing a conversation across devices

A customer who starts a support conversation on their phone during a commute expects it to be exactly where they left it when they pick it up on their laptop later. An SSE connection is tied to the browser tab that opened it.

Opening the same conversation on a second device starts an entirely separate SSE connection, with no relationship to the first. The session state has to be reconstructed from something other than the transport layer.
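One way to picture that reconstruction is a per-conversation event log that any connection can replay from. SSE's `id:` field and the `Last-Event-ID` header let a reconnecting browser report the last event it saw, but the server still has to store events somewhere to honor it, and a second device has no last ID at all. The sketch below assumes an in-memory store; `ConversationLog` and its methods are hypothetical names.

```typescript
interface StoredEvent {
  id: number;   // monotonically increasing within one conversation
  data: string; // the payload that was streamed to the client
}

// Hypothetical in-memory log keyed by conversation, not by connection.
class ConversationLog {
  private events = new Map<string, StoredEvent[]>();

  append(conversationId: string, data: string): StoredEvent {
    const list = this.events.get(conversationId) ?? [];
    const event: StoredEvent = { id: list.length + 1, data };
    list.push(event);
    this.events.set(conversationId, list);
    return event;
  }

  // Returns everything after lastSeenId. Passing 0 returns the full history,
  // which is what a newly opened device would ask for.
  replayAfter(conversationId: string, lastSeenId: number): StoredEvent[] {
    const list = this.events.get(conversationId) ?? [];
    return list.filter((event) => event.id > lastSeenId);
  }
}
```

A reconnecting tab would call `replayAfter(conversationId, lastSeenId)` with the ID it last received; a laptop opening the conversation for the first time would pass `0`.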

Enterprise proxy and firewall behavior

This is the one area where SSE has a genuine, durable advantage. SSE runs over a standard HTTP connection, so it passes through most enterprise proxies and corporate firewalls without special configuration.

WebSockets rely on an HTTP upgrade handshake, and some packet-inspecting firewalls do not handle that handshake cleanly. This can cause the connection to fail silently on corporate networks.

If your customer support product serves B2B customers on locked-down enterprise networks, this is a real constraint to weigh. It is not a reason to dismiss WebSockets outright: the failure mode is solvable with protocol fallback rather than by avoiding WebSockets entirely.
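A sketch of what protocol fallback can look like, under the assumption that each transport exposes a `connect` function that rejects when the network blocks it. `TransportAttempt`, `withTimeout`, and `connectWithFallback` are hypothetical helpers. The timeout matters because a blocked handshake may hang rather than fail.

```typescript
type TransportName = "websocket" | "http-streaming";

interface TransportAttempt {
  name: TransportName;
  connect: () => Promise<void>; // rejects if this transport cannot be established
}

// Wraps a connect function so a silently stalled handshake becomes a rejection.
function withTimeout(connect: () => Promise<void>, ms: number): () => Promise<void> {
  return () =>
    new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
      connect().then(
        () => {
          clearTimeout(timer);
          resolve();
        },
        (err) => {
          clearTimeout(timer);
          reject(err);
        },
      );
    });
}

// Tries each transport in order and resolves with the first one that connects.
async function connectWithFallback(attempts: TransportAttempt[]): Promise<TransportName> {
  const failures: string[] = [];
  for (const attempt of attempts) {
    try {
      await attempt.connect();
      return attempt.name;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${attempt.name}: ${message}`);
    }
  }
  throw new Error(`No transport available (${failures.join("; ")})`);
}
```

Listing WebSockets first and HTTP streaming second keeps the bidirectional channel wherever the network allows it, and degrades only on the networks that block the upgrade.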

How to choose between WebSockets and SSE for AI chat

SSE

When it works:

Your AI chat is single-turn or short-lived
The customer never needs to send anything back once a response has started
There is no escalation-to-human flow or device-switching requirement

A simple FAQ bot fits this profile well, and SSE's HTTP-native behavior on enterprise networks is a genuine point in its favor.

When it hurts:

Customers need to cancel responses mid-stream
Conversations get escalated to human agents mid-conversation
Customers expect a conversation to continue across devices

Each of these requires building coordination logic on top of SSE that a bidirectional connection would give you by default.

WebSockets

When it works:

Your product needs cancellation, escalation, or multi-device coordination
A single bidirectional connection can carry all of these signals without a separate side-channel for each

This matches most production customer support AI chat, where escalation and interruption are core flows rather than edge cases.

When it hurts:

You have not accounted for enterprise proxy behavior, since the upgrade handshake can fail on networks with strict packet inspection
You have not built reconnection logic, since WebSockets do not reconnect automatically the way SSE does

Both are solvable, but neither comes for free with a raw WebSocket connection.
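For the reconnection half, a common building block is exponential backoff with jitter, so that many clients dropped at once do not all reconnect at the same instant. The sketch below is one possible delay calculation, not a prescribed one; `reconnectDelayMs` and `BackoffOptions` are hypothetical names, and the `random` option exists only to make the function testable.

```typescript
interface BackoffOptions {
  baseMs: number;         // delay ceiling for the first retry
  maxMs: number;          // upper bound on the ceiling, however many attempts
  random?: () => number;  // returns a value in [0, 1); defaults to Math.random
}

// "Full jitter" backoff: pick a random delay between 0 and an exponentially
// growing ceiling, capped at maxMs. `attempt` starts at 0 for the first retry.
function reconnectDelayMs(attempt: number, opts: BackoffOptions): number {
  const random = opts.random ?? Math.random;
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In a browser client, the WebSocket `close` handler would schedule the next attempt with something like `setTimeout(connect, reconnectDelayMs(attempt, { baseMs: 500, maxMs: 30000 }))` and reset `attempt` to zero once a connection opens.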

Adopting WebSockets solves the bidirectional signaling problem. It does not, by itself, solve session continuity. A reconnected WebSocket is a new connection, and without an added session layer, the conversation state that lived with the old connection is still gone.

How Ably AI Transport adds durable sessions on top of WebSockets

A durable session keeps conversation state tied to the conversation itself, rather than to any single connection. A reconnect, a device switch, or a human handoff does not lose context. WebSockets alone give you a bidirectional channel, but the session state still has to live somewhere, and most teams end up building that layer themselves.

Ably AI Transport is built on this idea. A session outlives its underlying connection: reconnection, multi-device delivery, and cancellation are properties of how the session works, not features you build on top. A client that reconnects after a drop picks up from where it left off, and any device the customer opens joins the same session in progress.

This comes with a real tradeoff. Ably AI Transport is a new dependency, working through Ably's infrastructure rather than a direct connection between your server and the client.

For teams already running other realtime features on Ably, this is a natural extension. For teams with no existing Ably footprint, it is a genuine build-versus-adopt decision, not a given.

The Ably AI Transport docs go deeper on how the session layer works.

FAQ

Is SSE ever the right choice for AI chat?
Yes. If your AI chat has no cancellation, escalation, or multi-device requirement, such as a simple single-turn assistant, SSE's simplicity and HTTP-native firewall behavior make it a reasonable starting point.

What does SSE not support that WebSockets would solve for AI streaming?
SSE cannot carry a signal from the client back to the server on the same connection. Cancellation, live steering, and any mid-stream client input all need a separate mechanism. WebSockets solve this by carrying both directions on one connection, though they still need an added session layer to solve reconnection and multi-device continuity.

Why does SSE make stream cancellation unreliable?
SSE only carries data from server to client. Canceling a response requires a separate HTTP request outside the stream itself, which has to be matched to the correct in-flight generation on the backend. That coordination is not part of the protocol and has to be built by the application.

Does WebSocket-based AI chat work behind enterprise proxies and corporate VPNs?
Not always by default. The WebSocket upgrade handshake can be blocked by firewalls that perform packet inspection, which is more common on corporate networks than on consumer ones. Protocol fallback, trying WebSockets first and falling back to HTTP streaming, addresses this without giving up WebSockets' capabilities elsewhere.

Does switching from SSE to WebSockets alone fix session continuity?
No. WebSockets solve the bidirectional signaling problem, since cancellation and interruption can travel over the same connection. But a reconnected WebSocket is still a new connection, and without an added session layer, conversation state does not automatically survive a reconnect or a device switch.

Curious how others are handling human handoff in production AI chat — SSE with a side-channel, or a bidirectional transport from the start?

Originally published on the Ably blog.

Why chat.stop() doesn't cancel your LLM generation (and what to build instead)

Maddy Quinn — Fri, 26 Jun 2026 11:00:00 +0000

You add a stop button to your AI chat app: a customer support agent, a coding assistant, a research tool the user can steer mid-task. A user clicks it mid-response. The frontend stops rendering. Then you check your backend logs and realize the underlying generation is still running, and you're still paying for every token.

This is not a bug. The Vercel AI SDK docs document it explicitly: in a resumable stream setup, calling stop() only closes the current HTTP connection and should not cancel the underlying generation. The same applies to closing a tab or refreshing the page. The client disconnects; the server keeps running.

Key takeaways

Calling chat.stop() in the Vercel AI SDK closes the client connection but does not cancel server-side generation. The underlying generation keeps running, and billing continues.
Fixing this requires a dedicated stop endpoint with idempotency checking, partial assistant snapshot persistence, and backend-specific cancellation logic, none of which the SDK provides.
HTTP streaming is one-way. The server cannot distinguish an intentional stop from a network drop without an explicit signal sent separately from the stream.
On an Ably session, cancel is an explicitly named signal. The server knows immediately whether to stop, wait, or redirect, with no additional endpoint required.

Why `stop()` and disconnect mean different things

When you call chat.stop() in useChat, or when a user closes their browser tab, one thing happens: the HTTP connection closes. HTTP streaming is one-way: the server sends, the client receives. There is no signal in a closed connection that tells the server why it closed. A deliberate stop and a network drop look identical.

This is intentional in resumable stream architectures. They are designed to survive disconnects: if the connection drops, the client should be able to reconnect and pick up where it left off. Keeping generation running through a connection loss is the correct behavior. But a user clicking stop triggers exactly the same response.

The Vercel AI SDK docs are explicit about this: "a client-side abort (e.g. closing the page or refreshing) only closes the current HTTP connection. It is not a request to cancel the underlying work." If your stop button only calls stop(), the model request, background job, workflow, or stream writer keeps running, and the client can reconnect to the same active stream.

The same constraint applies to every other form of user control over a running agent. Say a user is running a research agent and wants to redirect mid-response: "actually, focus on flights only." There is no way to deliver that instruction over the existing stream. You need a separate endpoint, or some other mechanism alongside the stream. Server-Sent Events (SSE), the default transport for most AI SDK setups, cannot carry a signal back to the server. The stream flows one way.

What a correct stop implementation actually requires

The Vercel AI SDK documents the correct approach: build a dedicated stop endpoint. That endpoint needs to do four things.

Persist the partial assistant snapshot. Before canceling, the client sends its current partial assistant message to the stop endpoint. This preserves what the user has already seen. Without this step, the assistant message disappears from the conversation when the stream closes.

Check the activeStreamId. Your application tracks which stream is active for each chat. The stop endpoint reads this value and compares it against the stream ID the client sent with the request. If a newer stream has started because the user sent a new message while the stop request was in flight, the stop request is stale and should be ignored.

Cancel the active work. This is the backend-specific step. In a Redis-backed resumable stream setup, you close the stored stream and abort the model request writing to it. In a workflow setup, you cancel the workflow run. In a job queue setup, you cancel the job or write a cancellation flag the job polls. The SDK cannot do this for you because it does not know your backend architecture.

Clear the activeStreamId. Once cancellation is confirmed, clear the stored stream reference, but only if it still matches the stream you intended to cancel. A newer stream may have started between the cancellation request and the completion of the cancel logic.

Each step exists to address a specific race condition. Between the moment a user clicks stop and the moment the server processes the request, a new message can be sent, a new stream can start, or the partial assistant message can be overwritten by a server-side completion. The stop endpoint handles all of these correctly only if it checks every condition in sequence.
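The four steps can be sketched against an in-memory store. This is not the implementation from the AI SDK docs: `ChatRecord`, `StopResult`, and `handleStop` are hypothetical names, the store is a plain `Map`, and "cancel the active work" is reduced to aborting an `AbortController`. The sketch also makes one ordering choice of its own: it checks for staleness before persisting, so a stale request cannot overwrite a newer message.

```typescript
interface ChatRecord {
  activeStreamId: string | null;  // which stream is currently live for this chat
  abort: AbortController | null;  // stand-in for the backend-specific cancel handle
  savedPartial: string | null;    // last persisted partial assistant message
}

type StopResult = "stopped" | "stale" | "no_active_stream";

function handleStop(
  chatStore: Map<string, ChatRecord>,
  chatId: string,
  streamId: string,     // the stream the client believes it is stopping
  partialText: string,  // what the user had seen when they clicked stop
): StopResult {
  const chat = chatStore.get(chatId);
  if (!chat || chat.activeStreamId === null) {
    return "no_active_stream";
  }

  // Check the activeStreamId: a newer stream means this request is stale.
  if (chat.activeStreamId !== streamId) {
    return "stale";
  }

  // Persist the partial assistant snapshot before cancelling anything.
  chat.savedPartial = partialText;

  // Cancel the active work (Redis stream, workflow, or job in a real backend).
  chat.abort?.abort();

  // Clear the activeStreamId, but only if it still points at our stream.
  // In this synchronous sketch the check always passes; it matters once the
  // cancel step awaits real work and a new stream can start in between.
  if (chat.activeStreamId === streamId) {
    chat.activeStreamId = null;
    chat.abort = null;
  }
  return "stopped";
}
```

An HTTP handler would wrap this: read `chatId`, `streamId`, and the partial message from the request body, call `handleStop`, and map the result to a status code.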

This is buildable. The AI SDK docs provide a full implementation. But consider what you are actually shipping: a dedicated HTTP endpoint, a stream ID tracking layer, a partial message persistence mechanism, and backend-specific cancellation logic. The SDK ships none of it out of the box; the docs show the pattern, and you own the code. All of it has to stay in sync with the rest of your streaming infrastructure. Most developers discover this after they ship their first stop button.

Three questions to ask about your stop button before shipping

Before you ship, answering these three questions will tell you whether your stop button actually does what it looks like it does.

Does clicking stop actually stop backend generation, or does it only stop the client from receiving tokens? If you have not built a stop endpoint, the answer is the latter.
What happens to the partial assistant message when stop is called? If you are not persisting a snapshot server-side, the message may disappear or be overwritten when the stream closes.
What happens if a new message is sent while a stop request is in flight? If your stop endpoint does not check the activeStreamId, it may cancel a stream the user has already moved past.

If all three have clean answers, your stop button works. If not, the gap will show up in production, usually after a user notices their coding assistant or support agent kept billing them for a response they clicked stop on.

All three problems trace back to the same root cause: HTTP streaming gives the server no way to distinguish intent from a connection event. There is an approach that removes the problem at the transport level rather than working around it.

How a bidirectional session changes the stop vs disconnect distinction

Ably AI Transport is built on a different model. Instead of HTTP streaming, it uses a persistent bidirectional session. The client and server can both send signals at any time, over the same connection. That means cancel, stop, and redirect are first-class signals, not workarounds built on top.

On an Ably session, cancel is a named signal rather than an inference from a dropped connection. The client publishes a cancel signal on the session: session.cancel(runId). The server receives it on the corresponding run, and its abortSignal fires. Generation stops. The run ends with the reason 'cancelled', and every subscriber receives the lifecycle update.

Because the cancel is a session event rather than a TCP disconnection, the server knows exactly what happened. A network drop does not fire the cancel handler. A user clicking stop does. The session remains intact, and the next message starts a new run cleanly.

The race condition that the stop endpoint exists to solve is handled natively. Each run has a unique runId. A cancel signal targeting a run that has already ended is ignored, and multiple signals matching the same run cancel it once.
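To make the distinction concrete in general terms, here is a small sketch of a server-side handler that receives intent and connection events as different message types. It is not Ably's implementation or API; `SessionEvent`, `RunTracker`, and the event names are invented for illustration.

```typescript
// A cancel names the run it targets; a lost connection carries no intent.
type SessionEvent =
  | { kind: "cancel"; runId: string }
  | { kind: "connection_lost" };

class RunTracker {
  private active = new Map<string, AbortController>();

  begin(runId: string): AbortSignal {
    const controller = new AbortController();
    this.active.set(runId, controller);
    return controller.signal;
  }

  end(runId: string): void {
    this.active.delete(runId);
  }

  handle(event: SessionEvent): "aborted" | "ignored" {
    // A dropped connection leaves runs alone so the client can reconnect.
    if (event.kind === "connection_lost") {
      return "ignored";
    }
    // A cancel for a run that has ended (or was already cancelled) is a no-op.
    const controller = this.active.get(event.runId);
    if (!controller) {
      return "ignored";
    }
    controller.abort();
    this.active.delete(event.runId);
    return "aborted";
  }
}
```

Because the event type carries the intent, the handler never has to infer it from how the connection closed.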

For patterns beyond cancellation, the session supports cancel-then-send (cancel the active run and immediately send a new message) and send-alongside (send a new message while the active run continues). See the interruption docs for full implementation guidance.

For the Vercel AI SDK-specific analysis, including GitHub citations and billing evidence, see why Vercel AI SDK stop doesn't cancel the stream.


Canceling a run with Ably AI Transport

With Ably AI Transport, cancellation from the client is a single call:

```typescript
// Cancel a specific run
await activeRun.cancel();

// Or cancel by runId, from any connected device
await session.cancel(runId);
```

On the server, the abort signal fires automatically:

```typescript
import { streamText, convertToModelMessages } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// `session` and `invocation` come from your Ably AI Transport setup (not shown here).
const run = session.createRun(invocation);
await run.start();
await run.loadConversation(); // hydrate prior conversation history

const result = streamText({
  model: anthropic('claude-sonnet-4-6'),
  messages: await convertToModelMessages(run.messages),
  abortSignal: run.abortSignal, // fires when cancel() is called client-side
});

const { reason } = await run.pipe(result.toUIMessageStream());
await run.end(reason); // reason is 'cancelled' when abort fires
```

The abortSignal is passed directly to the model call. When the client cancels, the signal fires, generation stops, and the run ends with reason 'cancelled'. No stop endpoint to build, no activeStreamId to track, no race condition to guard against.

One edge case worth noting: cancellation is asynchronous, so a small tail of tokens may arrive after cancel() returns and before the server's abortSignal fires. Those tokens still belong to the cancelled run, not the next one. Also, any tool invocation that does not check the abortSignal will keep running until it completes, so if your agent calls tools, pass the signal through to each one.
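A sketch of what passing the signal through can look like inside a tool, assuming a tool that works through a list of items. `lookupOrders` and its `lookup` callback are hypothetical; the pattern is simply to check `signal.aborted` between units of work.

```typescript
// Hypothetical tool: looks up several orders one by one and stops early
// if the run has been cancelled in the meantime.
function lookupOrders(
  orderIds: string[],
  lookup: (id: string) => string, // stand-in for the real per-order work
  signal: AbortSignal,
): string[] {
  const results: string[] = [];
  for (const id of orderIds) {
    // Check before each unit of work so a cancel takes effect promptly.
    if (signal.aborted) {
      throw new Error(`cancelled after ${results.length} of ${orderIds.length} orders`);
    }
    results.push(lookup(id));
  }
  return results;
}
```

For network-bound tools, handing the same signal to `fetch(url, { signal })` has a similar effect: the in-flight request is aborted when the signal fires.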

Adopting Ably AI Transport: what changes in your stack

Shifting from HTTP streaming to an Ably session does not change your LLM call, your model provider, or your agent framework. AI Transport sits at the delivery layer, below orchestration. Your Vercel AI SDK, LangGraph, or custom agent logic stays unchanged. For teams using the Vercel AI SDK specifically, Ably ships a drop-in transport adapter, @ably/ai-transport/vercel, that swaps the transport underneath useChat without changing the hook.

What changes is the transport. Instead of an HTTP POST that returns a streaming response, the client opens an Ably session. Cancel, stop, and redirect become session signals, not HTTP endpoints.

There is a trade-off: an Ably session adds a persistent connection to your architecture. If stop is the only signal you need, a stop endpoint is the lighter choice. The session model earns its place when you need several of these signals: cancel, redirect, steer, human handover, multi-device continuity. They all run on the same infrastructure, so if you are already building one of those patterns, you are building the foundation for all of them.

Conclusion

The stop vs disconnect distinction is a structural property of HTTP streaming, not a framework bug. Closing an HTTP connection does not carry intent; only an explicit signal sent separately from the stream does.

A correct stop endpoint is buildable, but it is four moving parts that have to stay in sync with your streaming infrastructure. Most developers discover the gaps after they ship.

Ably AI Transport takes a different approach. On an Ably session, cancel is an explicit signal. Race conditions are handled at the transport level. The session persists through cancellation, and the next message starts a clean run.

Docs go deeper: Ably AI Transport cancellation docs | Interruption patterns | Vercel AI SDK stop documentation

Frequently asked questions

Does calling chat.stop() in the Vercel AI SDK cancel the underlying generation?

No. chat.stop() closes the HTTP connection. The underlying generation — the model request, background job, or stream writer — keeps running until it completes. You are billed for every token. The Vercel AI SDK documents this explicitly: a client-side abort is a disconnect signal, not a cancellation. Stopping generation requires a dedicated stop endpoint that you build and maintain alongside your streaming infrastructure.

Why can't the server detect a client disconnect and stop generation automatically?

The server can detect that the HTTP connection is closed. It cannot tell whether this was an intentional stop, a network drop, a page refresh, or a tab crash. In a resumable stream architecture, all four are treated as disconnects by design: the stream should survive a network drop. Treating every disconnect as an intentional stop would cancel streams on network blips and prevent reconnection. Distinguishing them requires an explicit signal from the client, which is why a stop endpoint is necessary.

What is activeStreamId checking, and why does my stop endpoint need it?

activeStreamId is a reference that your application stores, linking each chat to its currently active stream. The stop endpoint reads this value and compares it against the stream ID the client sends with the stop request. If a newer stream has started since the client initiated the stop, the stop request is stale and should be ignored. Without this check, the stop endpoint may cancel a stream the user has already moved past, leaving the conversation in an inconsistent state.

How does Ably's session model handle the stop vs disconnect distinction?

On an Ably session, cancel is an explicit event published by the client, either via activeRun.cancel() for the current run or session.cancel(runId) to target a specific run by ID. The server receives it as a named session signal, not as a TCP disconnection. A network drop does not trigger the cancel handler. An intentional stop does. These two events have separate handling, without requiring a stop endpoint or idempotency logic. The session remains intact after cancellation, and the next user message starts a clean run.

How do I build interruptible AI streaming, and is redirect or steer supported today?

You need a bidirectional session. With Ably AI Transport, calling activeRun.cancel() or session.cancel(runId) publishes an explicit cancel signal the server acts on immediately, regardless of connection state. activeRun.cancel() is the typical client-side call; session.cancel(runId) lets you target a specific run by ID, including from a different device. Beyond cancel, the session supports two interruption patterns: cancel-then-send, which cancels the active run before starting a new one, and send-alongside, which lets both runs continue concurrently. See the interruption docs for full implementation guidance.

What's your current approach to stop and cancellation in production? Do you have a stop endpoint, or are you relying on client-side disconnect? Would love to hear how others are handling this.

When should you replace DefaultChatTransport?

Maddy Quinn — Mon, 22 Jun 2026 14:18:09 +0000

TL;DR: DefaultChatTransport uses HTTP POST and SSE. This is correct for a single user on a stable connection, but it reaches its design boundary when production requires cancellation that reaches the server, multi-device delivery, stream resumption without Redis, or multi-user sessions. This post covers the four limits, a four-question self-audit, and what a WebSocket-based session layer adds.

You've built an AI chat app on the Vercel AI SDK. It works in development. The model responds, the stream comes through, and the UI updates cleanly. Then you ship to production, and the transport layer starts showing its edges.

Most of these failures are quiet: things that work in demos and break in ways that are hard to pin down until you know where to look. They share a common cause: DefaultChatTransport is built for HTTP, and HTTP has structural properties that some production requirements exceed. This piece explains what those limits are, which ones matter for your application, and what replacing the transport actually involves.

Key takeaways

DefaultChatTransport uses HTTP POST and Server-Sent Events (SSE). These protocols are one-way and point-to-point. That's correct behavior for a stateless serverless platform, not a bug in the SDK.
stop() fires the abort signal client-side and returns immediately. GitHub issue #9707 (filed October 2025, still open) confirms that the server cannot distinguish an intentional stop from a dropped connection and may continue generating and billing until completion.
The official Vercel AI SDK stream resumption pattern requires Redis, the resumable-stream package, two custom API endpoints, and a dedicated stop handler. In a resumable stream setup, stop() is treated as a disconnect, not a cancel.
The ChatTransport interface is pluggable by design. Vercel's serverless platform cannot host persistent WebSocket connections, so they made the transport layer swappable. Replacing DefaultChatTransport with a WebSocket-based transport layer creates a durable session between your agent and client, without changing your agent, tool calls, or UI rendering.

How DefaultChatTransport works, and the conditions it was built for

When you call useChat() without a transport option, or pass a default config, DefaultChatTransport is what runs. It sends outgoing messages via HTTP POST and receives responses as an SSE stream.

For a single user on a stable connection, sending a message and waiting for the response, this is the right choice. A stateless serverless function receives the request, calls the model, and streams the response back. HTTP is the right tool for that, and DefaultChatTransport uses it correctly.

That behavior follows from a platform constraint: Vercel's serverless functions terminate after responding, so there is no persistent process to hold a socket open. That's the root of all four limits. They're architectural, not configurable, because HTTP on a stateless platform simply can't do what they require. The Ably guide to WebSockets on Vercel covers this constraint in depth if you want the full picture.

That's also why Vercel made ChatTransport pluggable in AI SDK 5. DefaultChatTransport is not broken: it's correct for the conditions it was built for. But Vercel designed the interface precisely so teams can swap in a transport that isn't bound by those conditions.

It's not just DefaultChatTransport that has this constraint. Even DirectChatTransport, the other built-in option, explicitly documents that it "does not support reconnection since there is no persistent server-side stream to reconnect to." Reconnection is a transport-layer property. The default implementations don't have it because the platform they're built for doesn't support it.

Four things DefaultChatTransport can't do in production

These are the limits that surface when you move beyond a single-user chatbot: a customer support agent that hands off between devices, a chat interface where a human and an AI both participate, or any application where the connection dropping mid-generation has a visible cost to the user.

Each follows from the same root: HTTP/SSE is built for one connection, one client, one response. When production asks for more, that constraint becomes visible.

Cancellation is ambiguous, and you may be paying for it. When a user clicks stop, stop() closes the HTTP connection client-side and returns immediately, without waiting for the server to acknowledge or terminate the generation. The server receives a connection close event. It has no way to distinguish that from a tab close, a network drop, or a mobile device going to sleep. So it keeps generating.

GitHub issue #9707 (filed October 2025, still open) documents this directly: createUIMessageStream does not detect the abort signal server-side, making it "impossible to stop ongoing AI generation and leading to unnecessary costs and poor UX." GitHub issue #10844 adds that Vercel's own supportsCancellation: true config flag behaves unreliably in production deployments. The cost is real: orphaned generations run to completion, and there's no reliable mechanism to stop them without a custom server-side endpoint.

Multi-device delivery silently fails. SSE is one-to-one. One HTTP connection, one client, one stream. A user with the same session open on their laptop and phone receives the response only on the device that sent the request. The second device gets nothing: no error, no partial content, no indication that anything is in flight. This isn't a useChat configuration gap. It's a structural property of HTTP. Multi-device fan-out is absent from the vast majority of AI transport implementations because SSE is one-to-one by design. DefaultChatTransport is no exception.
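As a rough illustration of what fan-out needs beyond a single SSE response, the sketch below keeps response chunks with the conversation and delivers them to every attached device. `SharedConversation`, `attach`, and `publish` are hypothetical names, and everything lives in one process's memory; a real system would put this behind a pub/sub layer or a durable store.

```typescript
type Deliver = (chunk: string) => void;

class SharedConversation {
  private chunks: string[] = [];
  private devices = new Map<string, Deliver>();

  // A device that joins mid-response first receives the chunks it missed,
  // then gets live chunks like every other attached device.
  attach(deviceId: string, deliver: Deliver): void {
    for (const chunk of this.chunks) {
      deliver(chunk);
    }
    this.devices.set(deviceId, deliver);
  }

  detach(deviceId: string): void {
    this.devices.delete(deviceId);
  }

  // Called by the agent for each chunk of the response.
  publish(chunk: string): void {
    this.chunks.push(chunk);
    this.devices.forEach((deliver) => deliver(chunk));
  }
}
```

The contrast with SSE is that the response is addressed to the conversation rather than to the request that triggered it, so the phone that never sent a message still receives it.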

The next limit shares the same architectural root. Where multi-device delivery requires fan-out that HTTP cannot provide, stream resumption requires session persistence that HTTP cannot maintain.

Stream resumption requires infrastructure that you build and own. The Vercel AI SDK stream resumption documentation lists the prerequisites directly: a Redis instance, the resumable-stream package, a POST handler that creates resumable streams using consumeSseStream, a GET handler at /api/chat/[id]/stream that resumes them with resumeExistingStream, and a dedicated stop endpoint.

stop() and resumable streams are also architecturally incompatible. The docs state it directly: "In a resumable stream setup, client-side aborts are treated as disconnects. Closing a tab, refreshing the page, or calling stop() only closes the current HTTP connection and should not cancel the underlying generation." Adding a working stop button requires a separate server-side endpoint to cancel the underlying work and clear the active stream record.

Tab switches and mobile backgrounding are a further gap the resumable-stream pattern doesn't cover in the same way as a page reload. The Ably guide on Vercel AI SDK resumable streams covers the distinction.

The single-response assumption breaks multi-user sessions. Vercel designed useChat around one user sending one message and receiving one response. It tracks one activeResponse at a time. If a second user joins, or an observer device needs the same response lifecycle, the only available mechanism is setMessages. This bypasses lifecycle hooks, tool-call notifications, and onFinish callbacks entirely. It works, but it's a workaround. Zak Knill's post on building the Ably transport covers the implementation detail.

Each of the four limits above has the same root cause but surfaces differently. The table below maps them to their production cost:

Limit	What breaks	Production cost	Configurable in `DefaultChatTransport`?
Cancellation	Server can't distinguish stop from disconnect	Orphaned generations; ongoing billing	No
Multi-device	SSE delivers to one client only	Silent failure on second device	No
Stream resumption	Requires Redis, two endpoints, stop handler	Significant custom infrastructure	No
Single-response assumption	`setMessages` bypasses lifecycle hooks	Broken tool calls, missing `onFinish`	No

How a WebSocket-based transport layer creates a durable session between agent and client

Replacing DefaultChatTransport with a WebSocket-based transport layer replaces a stateless HTTP connection with a durable session between your agent and your users, one that persists beyond any single connection and addresses all four limits directly. It also removes the custom infrastructure that those limits force you to build. The Ably topic page on implementing a custom ChatTransport covers the full capability surface. This section covers what disappears from your backlog.

With a WebSocket-based transport layer, you no longer need:

The Redis buffer for resumable streams
The stop endpoint with race condition protection
The fan-out layer for multi-device delivery
The setMessages workaround for multi-user sessions

The mechanism that makes this possible is straightforward. A session is decoupled from the connection. The session persists independently; a connection is how a client subscribes to it. When a client disconnects and reconnects, it presents its last position to the session and receives only the messages it missed. A cancel signal is sent explicitly on the session: the server reads it as intent, not as a connection close event it has to interpret.
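The cancel half of that mechanism can be sketched in a few lines. This is a minimal illustration, not the Ably implementation: the `ClientSignal` shapes and the `GenerationRegistry` class are hypothetical names invented for the example. It shows the server stopping work only when it receives an explicit cancel, and leaving the generation running when all it sees is a closed connection.

```typescript
// Hypothetical message shapes for illustration; not taken from any SDK.
type ClientSignal =
  | { type: "cancel"; responseId: string }
  | { type: "connection-closed" };

class GenerationRegistry {
  // Response IDs whose generation is still running.
  private active = new Set<string>();

  start(responseId: string): void {
    this.active.add(responseId);
  }

  isRunning(responseId: string): boolean {
    return this.active.has(responseId);
  }

  // An explicit cancel carries intent, so the generation is stopped.
  // A closed connection carries none (tab close, network drop, device
  // sleep), so the generation keeps running for a later reconnect.
  handle(signal: ClientSignal): void {
    if (signal.type === "cancel") {
      this.active.delete(signal.responseId);
    }
  }
}

const registry = new GenerationRegistry();
registry.start("resp-1");
registry.handle({ type: "connection-closed" });
const runningAfterDrop = registry.isRunning("resp-1");
registry.handle({ type: "cancel", responseId: "resp-1" });
const runningAfterCancel = registry.isRunning("resp-1");
```

In a real server, the cancel branch would also abort the underlying model call. The sketch only tracks whether the generation is considered active.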

Ably AI Transport is built as the session layer for production AI applications: the infrastructure between your agent and your users that handles the delivery concerns that DefaultChatTransport can't. It plugs into useChat as a ChatTransport implementation via a single configuration change:

// Before: default HTTP transport
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';

const { messages, sendMessage, stop } = useChat({
  transport: new DefaultChatTransport({ api: '/api/chat' }),
});

// After: Ably AI Transport (backed by an Ably session)
// useChatTransport reads the transport supplied by <ChatTransportProvider>.
const { chatTransport } = useChatTransport();
const { messages, sendMessage } = useChat({ transport: chatTransport });

In practice: stop() sends a typed signal the server can act on, instead of a connection close event that it has to guess at. Any device subscribed to the same session receives the stream, so a user switching from laptop to phone doesn't lose the conversation. If the connection drops mid-generation, the client reconnects and catches up from where it left off, because the session persists independently of any single connection.

What stays unchanged: your agent, tool calls, message persistence logic, and UI rendering. The swap is the transport option in useChat. Everything built on top of it carries over.

For the implementation detail on own-turns, observer-turns, and setMessages handling, see Zak Knill's post. For how transport options compare more broadly, see the durable sessions guide for Vercel AI SDK applications. The four questions in the next section will help you work out whether you're at that decision point yet.

When DefaultChatTransport is still the right choice

The four limits above are real, but they only become blockers if you need cancellation that reaches the server, multi-device delivery, stream resumption beyond page reloads, or more than one user in the same conversation. For many applications, DefaultChatTransport remains the right starting point.

A practical way to assess your own situation is to work through four questions:

Do you need stop() to reliably cancel server-side generation, not just the UI update, but the actual model call?
Do users access the same session from more than one device or tab?
Do you need stream resumption across tab switches or mobile backgrounding, not just full page reloads?
Does more than one user participate in the same conversation?

If the answer to all four is no, DefaultChatTransport is a defensible choice. If any answer is yes, the relevant section above describes the specific limit you'll encounter. The right time to replace the transport is when those limits start costing you.

At that point, the transport layer is the right place to fix it, and replacing it changes nothing else in your application.

The next step is understanding the ChatTransport interface: what sendMessages and reconnectToStream require, and what to look for in an implementation. The Ably ChatTransport topic page covers that in full. To get started with Ably AI Transport directly, the Vercel AI SDK integration guide is the right starting point.

Frequently asked questions

Does the Vercel AI SDK support multi-device AI chat out of the box?

Not with DefaultChatTransport. SSE is scoped to a single HTTP connection, so a second device has no way to join a stream already in progress. Multi-device delivery requires a transport where the session exists independently of the connection, so any subscribed client receives it. The Ably guide on why Vercel AI SDK can't stream to multiple devices provides the full picture.

Why doesn't stop() cancel server-side generation in Vercel AI SDK?

Because DefaultChatTransport has no signal path back to the server. When stop() closes the HTTP connection, the server receives a TCP close it can't distinguish from a network drop, so generation continues and billing runs to completion. With a WebSocket-based transport layer, stop() sends a typed cancel message on the session; the server reads it as intent, not inference. The Ably guide on why stop() doesn't cancel the stream covers the full mechanism.

How much infrastructure does Vercel AI SDK stream resumption require?

The official pattern requires a Redis instance, the resumable-stream package, a POST handler with consumeSseStream, a GET handler at /api/chat/[id]/stream, and a dedicated stop endpoint with race condition handling. stop() and resumable streams are also architecturally incompatible: in a resumable stream setup, a client abort is treated as a disconnect, not a cancel. See the Ably guide to Vercel AI SDK resumable streams for the full breakdown.

When should I replace DefaultChatTransport?

When the limits start affecting your production application. The four-question self-audit in the "When DefaultChatTransport is still the right choice" section gives a practical framework. In short: if you need stop() to reliably cancel server-side generation, multi-device delivery, stream resumption beyond page reloads, or multi-user sessions, the default transport can't provide those. The Ably durable sessions guide for Vercel AI SDK covers the transport options available once you've decided to move on.

Why replace DefaultChatTransport with a WebSocket-based transport layer?

Because the limits are properties of HTTP/SSE, not of your configuration. Unconfirmed cancellations, single-device delivery, Redis-dependent stream resumption, and the setMessages workaround for multi-user sessions are all resolved at the transport level by a WebSocket-based transport layer. Your agent, tool calls, and UI code don't change.

Vercel AI SDK custom transport vs default transport: what actually changes?

The delivery mechanism only. Your agent, tool calls, message persistence, and UI rendering stay the same. The swap is the transport option in useChat, one configuration change. For a full before/after and getting started guide, see the Ably AI Transport Vercel integration guide.

What transport issues have you hit building on the Vercel AI SDK? Would be interested to hear which of the four comes up most in practice.

Why AWS ALB and Cloudflare silently kill your AI agent sessions

Maddy Quinn — Mon, 15 Jun 2026 11:51:00 +0000

TL;DR: WebSocket reconnection restores the transport. It doesn't restore the session. Tokens generated during the gap, tool call results that arrived while the client was offline, and the agent's position in the ongoing generation are all lost unless you have a session layer. This post covers the timeout sources that hit agentic applications specifically, why SSE is a bad fit for bidirectional agent communication, and how session recovery works in practice.

Why AI agents get disconnected in ways standard apps don't

WebSocket reconnection has always been worth solving. What makes AI agents different is when the disconnect happens.

A standard chat interface goes quiet between user interactions — when there's genuinely nothing happening on the wire. An agent goes quiet mid-execution: during tool call waits, between reasoning steps, while the LLM is generating a response. That silence is the agent doing its most intensive work. To every load balancer and proxy in the path, it looks idle.

AWS Application Load Balancer defaults to closing connections after 60 seconds of inactivity. Cloudflare enforces a 100-second idle timeout on Free and Pro plans, and that limit is fixed. Corporate proxies and enterprise gateways add their own thresholds that you often can't inspect or configure.

Plenty of production WebSocket applications have shipped without explicitly thinking about this. The reason: traditional server-side workloads tend to emit a trickle of traffic on their own — progress events, periodic state updates — which keeps the connection alive as a side effect. The timeouts stay invisible because something is always crossing the wire.

Agentic applications don't have that property. A customer support agent goes quiet mid-answer while the user is typing a correction. A coding agent waits for the user to approve a tool call before continuing. A research agent sits in silence for 90 seconds while a downstream API responds. None of that is idleness from the agent's perspective. To the ALB, it's all the same.

Why SSE doesn't fit

If you're reaching for SSE as an alternative, it won't solve the session problem — and it introduces a new one.

The applications this post is about — customer support agents, coding agents, research agents the user steers mid-task — require the client to send messages back to the agent on the same session while it's in flight. A user correcting an assumption, approving a tool call, or cancelling mid-implementation needs a channel in both directions.

SSE streams server-to-client only. That rules it out at the transport level regardless of how well you've solved the replay problem.

The fix for idle timeouts: server-side ping frames

The WebSocket spec includes a mechanism designed exactly for this: server-side ping frames. The server sends a ping at a fixed interval; the browser responds automatically with a pong; both frames count as activity and reset every idle timer on the path.

The interval needs to sit comfortably below the shortest timeout on the path. A 50-second interval covers both the AWS ALB 60-second default and Cloudflare's 100-second limit simultaneously. Browsers respond to ping frames automatically — no client-side code required.
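The arithmetic is simple enough to encode as a guard in your server configuration. The sketch below is illustrative: `pickPingIntervalSeconds` is a hypothetical helper, and the 10-second margin is an assumption that happens to reproduce the 50-second figure above.

```typescript
// Idle timeouts (in seconds) for each hop on the path. The margin is an
// assumed safety buffer below the shortest timeout.
function pickPingIntervalSeconds(idleTimeouts: number[], marginSeconds: number): number {
  if (idleTimeouts.length === 0) {
    throw new Error("at least one idle timeout is required");
  }
  const shortest = Math.min(...idleTimeouts);
  const interval = shortest - marginSeconds;
  if (interval <= 0) {
    throw new Error("margin leaves no room below the shortest timeout");
  }
  return interval;
}

// AWS ALB default (60s) and Cloudflare Free/Pro (100s), with a 10s margin.
const pingInterval = pickPingIntervalSeconds([60, 100], 10);
```

Deriving the interval from the list of timeouts, rather than hard-coding it, means a new proxy with a shorter timeout only needs a new entry in the list.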

Common idle timeouts to plan around

Infrastructure	Default timeout	Configurable?
AWS Application Load Balancer	60 seconds	Yes — `idle_timeout.timeout_seconds`, up to 4,000s
Cloudflare (Free/Pro)	100 seconds	No — fixed. Enterprise customers can request custom values.
Corporate proxies, gateways	Varies — often invisible	Depends on deployment

For ALB, raise the limit if your workload genuinely needs a longer window. The idle_timeout.timeout_seconds attribute is adjustable in the load balancer configuration and takes effect immediately without a redeployment.

For Cloudflare Free and Pro plans, you can't raise the limit. The server-side ping approach at 50 seconds is the only viable mitigation.

If connections die at exactly 100 seconds in production, check EdgeStartTimestamp and EdgeStopTimestamp in Cloudflare's HTTP request logs to confirm the source before debugging elsewhere.

Other connection challenges to consider

Not all disconnects come from idle timeouts. Two other patterns hit agentic applications in production:

Corporate VPN and enterprise proxy traversal

Many enterprise networks don't forward the HTTP Upgrade header required to open a WebSocket connection. The connection never opens rather than dropping mid-session. The failure appears at the WebSocket handshake stage — typically a non-101 HTTP response — not as a silent close after inactivity.

The fix is protocol fallback: when a proxy blocks the WebSocket upgrade, the transport degrades automatically to HTTP streaming or long-polling without per-deployment configuration.
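A rough sketch of that fallback order, under simplifying assumptions: the `TransportProbe` interface and `selectTransport` function are hypothetical, and the probes are synchronous stand-ins for what would be asynchronous connection attempts in a real client.

```typescript
// Transport names follow the fallback order described above. The probe
// functions are hypothetical stand-ins for real connection attempts.
type TransportName = "websocket" | "http-streaming" | "long-polling";

interface TransportProbe {
  name: TransportName;
  canConnect: () => boolean;
}

// Returns the first transport that connects, or null if all are blocked.
function selectTransport(probes: TransportProbe[]): TransportName | null {
  for (const probe of probes) {
    if (probe.canConnect()) {
      return probe.name;
    }
  }
  return null;
}

// Simulate a proxy that rejects the WebSocket upgrade (non-101 response).
const chosenTransport = selectTransport([
  { name: "websocket", canConnect: () => false },
  { name: "http-streaming", canConnect: () => true },
  { name: "long-polling", canConnect: () => true },
]);
```

The ordering of the array is the policy: the preferred transport goes first, and each later entry is tried only after the ones before it fail.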

Mobile network handoffs

Switching from WiFi to cellular drops the underlying TCP connection immediately. On mobile, the client's onclose event often doesn't fire — the OS terminates the connection without a clean close frame. On iOS specifically, background TCP connections are suspended within seconds of the app moving to the background, again without notification.

Don't rely on onclose to trigger reconnection for mobile users. Use failed-send detection and an application-level heartbeat timeout to catch silent closes.
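One way to implement the heartbeat timeout is a small monitor that tracks when the client last heard from the server. The `HeartbeatMonitor` class below is a hypothetical sketch, and the 15-second timeout is an arbitrary value chosen for the example. The clock is passed in explicitly so the logic stays easy to test.

```typescript
class HeartbeatMonitor {
  private readonly timeoutMs: number;
  private lastSeenMs: number;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastSeenMs = nowMs;
  }

  // Call whenever any message or heartbeat arrives from the server.
  recordActivity(nowMs: number): void {
    this.lastSeenMs = nowMs;
  }

  // True when nothing has arrived within the timeout, which suggests the
  // connection was closed silently and a reconnect should be triggered.
  isStale(nowMs: number): boolean {
    return nowMs - this.lastSeenMs > this.timeoutMs;
  }
}

const monitor = new HeartbeatMonitor(15000, 0);
monitor.recordActivity(10000);
const staleAfter10sSilence = monitor.isStale(20000);
const staleAfter20sSilence = monitor.isStale(30000);
```

In a client, `isStale` would be checked on a timer, and a stale result would trigger the same reconnect path that onclose normally does.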

What transport reconnection recovers — and what it doesn't

Here's where most teams discover the gap. Reconnecting the WebSocket connection restores the transport. It doesn't restore the state of the session that was in flight when the connection dropped.

Transport reconnection recovers	It doesn't recover
The WebSocket connection itself	Tokens generated while disconnected
Active session subscriptions	Tool call results that arrived during the gap
The ability to send and receive new messages	The agent's reasoning trace if streamed as events
The session ID and session name	The position in the ongoing generation

After a successful reconnect with only transport-layer recovery, the client is back online, but the session is in an indeterminate state. The client holds a partial response from before the disconnect. The agent continued generating on the server side. Neither side knows where the other stopped.

How session recovery works

This is where Ably AI Transport comes in. AI Transport acts as the session and delivery layer between your agent and your users.

The agent publishes every event — each generated token, each tool call, each reasoning step — to a session. AI Transport stores those events and is responsible for delivering them to the client whenever the client is connected. From the agent's side, this is fire-and-forget: it doesn't care whether the client is online, offline, mid-reconnect, or freshly loaded into a new browser tab.

When a client connects or reconnects, it asks for everything it hasn't already seen. AI Transport returns the missed events, in order, before the live stream resumes. There's no "live vs. history" boundary the application needs to reason about, and no difference in handling for a 30-second drop vs. a 30-minute disconnect vs. a fresh page load.

One detail worth understanding: the session doesn't store one event per token. Tokens are appended to a single message per agent response — conflation — so the session history contains one accumulated message per response, not thousands of token-sized events. A client reconnecting mid-stream receives the in-progress message in its current accumulated form and resumes streaming from there. A client loading the page fresh receives the same accumulated message as a single coherent block. The application doesn't write reconciliation logic for either case.
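To make the idea concrete, here is a minimal sketch of append-based accumulation. It is not how AI Transport is implemented internally: `TokenAppend` and `ConflatedHistory` are hypothetical names, and the storage is a plain in-memory map.

```typescript
// Hypothetical event shape: each token is an append to one response message.
interface TokenAppend {
  responseId: string;
  text: string;
}

class ConflatedHistory {
  // One accumulated message per response, keyed by response ID.
  private messages = new Map<string, string>();

  append(event: TokenAppend): void {
    const current = this.messages.get(event.responseId) ?? "";
    this.messages.set(event.responseId, current + event.text);
  }

  // What a reconnecting or freshly loaded client would receive.
  snapshot(): Array<{ responseId: string; text: string }> {
    return Array.from(this.messages, ([responseId, text]) => ({ responseId, text }));
  }
}

const responseHistory = new ConflatedHistory();
for (const text of ["Your ", "refund ", "is ", "approved."]) {
  responseHistory.append({ responseId: "resp-1", text });
}
const conflated = responseHistory.snapshot();
```

Four token appends produce one history entry, which is why a reconnecting client has nothing to stitch together.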

For more on how this works in practice, see AI Transport's reconnection and recovery and history and replay documentation.

What the user should see during a disconnect

Session recovery handles the infrastructure layer. But a reconnect that works silently in the background still needs the right UI treatment to avoid looking like a failure.

AI Transport exposes well-defined connection states. The key distinction: the disconnected state (temporarily offline, retrying automatically) vs. the suspended state (retry window exhausted).

During disconnection: show a reconnecting indicator, not an error modal. In the suspended state: show a retry button. The session is intact and waiting — communicate that.
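That mapping can live in one small function. The sketch below covers only the three states discussed here, and the `ConnectionBanner` shape and message strings are illustrative assumptions, not part of the SDK.

```typescript
// State names match the connection states described above; the UI
// treatments are illustrative.
type ConnectionState = "connected" | "disconnected" | "suspended";

interface ConnectionBanner {
  kind: "none" | "reconnecting" | "retry";
  message: string;
}

function bannerFor(state: ConnectionState): ConnectionBanner {
  switch (state) {
    case "connected":
      return { kind: "none", message: "" };
    case "disconnected":
      // Temporarily offline and retrying automatically: no error modal.
      return { kind: "reconnecting", message: "Reconnecting..." };
    case "suspended":
      // Retry window exhausted: offer a manual retry; the session is intact.
      return { kind: "retry", message: "Connection lost. Your conversation is saved." };
  }
}

const whileDisconnected = bannerFor("disconnected");
const whileSuspended = bannerFor("suspended");
```

Keeping the mapping exhaustive over the state type means the compiler flags any state you add later without a UI treatment.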

What building this yourself actually costs

Building session recovery without a purpose-built layer means writing:

A heartbeat loop
A reconnection manager
Manual state reconstruction logic
A connection state component to surface each phase to the user

None of these is large in isolation. Together, they constitute infrastructure. And any infrastructure your team owns is infrastructure your team spends time and resources maintaining as requirements change.

Ably AI Transport provides the session recovery layer:

Automatic connection recovery within the two-minute window
History compaction and replay so clients always receive clean, accumulated state on reconnect
Protocol fallback from WebSocket to HTTP streaming to long-polling
Bidirectional signaling on the same session

What remains in your application code is the connection state UI — surfacing the reconnecting and suspended states to the user — and that's a handful of lines rather than a system.

Frequently asked questions

How do I stop AI chat sessions from timing out?

Configure your WebSocket server to send ping frames at a fixed interval below the shortest timeout on your path. A 50-second interval sits comfortably below both the AWS ALB 60-second default and Cloudflare's fixed 100-second limit on Free and Pro plans, with browsers responding automatically — no client-side code required. If your workload needs a longer window, raise the idle_timeout.timeout_seconds attribute in your ALB configuration; it's adjustable up to 4,000 seconds.

What happens if a user disconnects during LLM streaming?

With AI Transport, the session resumes automatically upon reconnect, with missed tokens delivered in order before new ones arrive, and no application code needed. For longer disconnects, AI Transport's history and replay feature loads the full conversation from the session history. Without a session layer, tokens generated during the dropout are lost, and the agent can't resume from the point of interruption.

How do I avoid duplicate AI messages after a WebSocket reconnect?

With AI Transport you don't need to — the SDK handles this through history compaction. Tokens are streamed as appends to a single message per agent response, and the session history stores one message per response rather than one per token. When a client reconnects or refreshes, it receives the single accumulated message rather than individual tokens to reconstruct.

What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?

The AWS Application Load Balancer idle timeout defaults to 60 seconds and applies to all connection types, including WebSocket. Raise it by updating the idle_timeout.timeout_seconds load balancer attribute. The valid range is one to 4,000 seconds; most AI agent workloads are well served by a value between 3,600 and 4,000 seconds. The change takes effect immediately without requiring a redeployment.

Does Cloudflare close WebSocket connections? What is the timeout?

Yes. Cloudflare enforces a 100-second idle timeout on WebSocket connections for Free and Pro customers. The limit is fixed on those plans and can't be raised. Enterprise customers can configure a custom value through their account team. To keep connections alive on Free and Pro plans, configure your WebSocket server to send ping frames every 50 seconds. Browsers respond automatically with pong frames, which reset Cloudflare's idle timer and the 60-second AWS ALB default simultaneously.

Can WebSockets work behind a corporate VPN or enterprise proxy?

They can, but many enterprise proxies don't forward the HTTP Upgrade header required to open a WebSocket connection. When that happens, the connection fails at the handshake stage rather than dropping mid-session. That failure is distinct from a timeout: the error occurs before any data flows, not after a period of inactivity. Protocol fallback to HTTP streaming or long-polling handles proxy blocking at the infrastructure layer without per-deployment configuration.

How long does Ably retain channel history for session recovery?

AI Transport replays missed messages automatically on reconnect, with no application code needed. For longer disconnects, session history loads the full conversation, persisting for 24 to 72 hours depending on your Ably plan, with extended retention available on higher tiers.

What's your experience here — have you run into session state loss specifically, or mostly fought the transport reconnection side of the problem? Interested in what patterns teams are using to handle the UI side of a mid-stream disconnect.

Why your AI chat reconnects but your session doesn't

Maddy Quinn — Wed, 27 May 2026 10:05:10 +0000

TL;DR: WebSockets are the right protocol for production AI chat. But the connection is stateless at the session level. When it drops — AWS ALB defaults to 60 seconds, Cloudflare to 100 seconds on Free and Pro plans — all in-flight tokens, tool call results, and agent context disappear. Reconnection logic restores the socket. It doesn't restore the session. That's the gap this post covers.

WebSockets are the right protocol for production AI chat. But that fact doesn't prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context.

The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. Each breaks for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn't change that.

Key takeaways

WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn't.
A WebSocket connection is stateless at the session level. When it closes through a proxy timeout, page reload, or device switch, all state disappears with it.
Reconnection logic re-establishes the transport. It does not recover the tokens, tool calls, or agent context that were in flight when the connection dropped.
What fills the gap is a session layer: infrastructure that persists conversation state against a session ID and replays it to reconnecting clients.

What WebSockets get right for AI chat

The protocol question is worth settling early, because the rest of this piece argues about the infrastructure layer above it. For production AI chat, the choice is WebSockets or SSE. Both stream tokens to the client, but only WebSockets let signals flow the other way.

WebSockets are bidirectional. When your user cancels mid-stream, that signal travels back on the same channel; tool call confirmations and workflow approvals work the same way. When a workflow pauses for human input mid-execution, that input must arrive in-band, not via a polling endpoint.

SSE is a one-way stream. For simple chatbots on stable networks, that doesn't matter. Add tool calls, mid-stream cancellation, or multi-device continuity, and it does.

Where production AI connections actually fail

Not all connection drops come from bad network conditions. The more common causes in production are infrastructure defaults designed for HTTP requests, not AI chat. A response can be mid-generation for tens of seconds, and most defaults weren't built for that.

AWS Application Load Balancer idle timeout. AWS ALB closes connections idle for 60 seconds by default, per the AWS Application Load Balancer documentation. For standard HTTP that's generous. For an agent waiting on a downstream API, 60 seconds of silence is routine, and the connection closes without warning. Your user's response stops mid-sentence with no explanation.

Cloudflare proxy timeout. On Cloudflare Free and Pro plans, WebSocket connections terminate after 100 seconds of inactivity, as documented in Cloudflare's WebSocket troubleshooting guide. Enterprise plans can raise this limit; on Free and Pro plans, the ceiling is fixed.

Mobile network handoffs. Switching from WiFi to cellular drops the underlying TCP connection immediately, taking the WebSocket with it. On mobile this happens during normal use: walking between coverage areas, backgrounding the tab, entering a building.

Page reload and tab crash. Your user reloads mid-generation, or the browser crashes, both of which are routine. The connection closes, and any session state tied to it is gone unless something stored it.

Why reconnection logic doesn't fix the session problem

The standard reconnection pattern re-establishes the socket. Transport recovers in milliseconds. But it cannot restore the state that was in flight when the connection dropped.

Token stream position. The response kept generating while the connection was dark. Those tokens went nowhere. When the client reconnects, it arrives mid-sentence or finds nothing at all.

Tool call results. Some chat responses depend on realtime data: a lookup, a search, or an action your user triggered. If the connection dropped while the agent was waiting for that result, the response either never arrived or ended before it could use the information.

Agent context. In a multi-turn exchange, the agent accumulates context: what was asked, what was answered, and what's in progress. When a session drops and reconnects without state recovery, the agent and the client are at different points in the same conversation. Your users experience this as a loss of thread: a response that ignores what came before, or one that repeats something already answered.

The pattern most teams reach for is a Redis buffer: sequence number tracking, offset storage, and deduplication keys between the agent and the client. It handles full page reloads. It tends to break on deploy-triggered reconnects, mobile handoffs that hit the reconnect window twice, and anything that generates messages faster than the buffer drains.

Even Vercel's AI SDK lead built a pluggable interface to fill this gap. Every team reaching this point builds the same infrastructure from scratch and chooses to own it indefinitely. Reconnection handles the protocol layer; session state sits one layer above it, and it's a separate problem entirely.

What production AI chat needs from the transport layer

Any viable approach to production AI sessions needs to satisfy four requirements. These are implementation-neutral: what any infrastructure option has to provide, regardless of vendor.

Persistent state storage. Conversation history, token positions, tool call inputs and outputs, and agent state must be stored against a stable session ID and survive connection drops. The session ID is the anchor: the same session must be addressable after any reconnect, from any device.

Offset-based replay. A returning client requests messages from its last received serial. The infrastructure delivers everything missed, in order, with no duplicates. The client supplies its offset; the infrastructure fills the gap.

Protocol fallback. When a WebSocket upgrade is blocked by a proxy or firewall, the transport degrades to HTTP streaming or long-polling automatically. This should not require per-deployment configuration.

Multi-device delivery. Any authenticated device subscribing to a session ID receives the current state plus history. The session is not bound to the tab, browser, or device that opened it.
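The replay and multi-device requirements can be illustrated together with an in-memory sketch. Everything here is an assumption made for the example: `DurableSession` is a hypothetical class, serials are simple incrementing integers, and authentication and persistent storage are left out.

```typescript
interface SessionMessage {
  serial: number;
  payload: string;
}

type SessionListener = (message: SessionMessage) => void;

class DurableSession {
  private messages: SessionMessage[] = [];
  private listeners: SessionListener[] = [];

  // The agent publishes regardless of which clients are connected.
  publish(payload: string): void {
    const message = { serial: this.messages.length + 1, payload };
    this.messages.push(message);
    for (const listener of this.listeners) listener(message);
  }

  // Replays everything after lastSeenSerial, then delivers live messages.
  // A new device passes 0 to receive the full history.
  subscribe(lastSeenSerial: number, listener: SessionListener): void {
    for (const message of this.messages) {
      if (message.serial > lastSeenSerial) listener(message);
    }
    this.listeners.push(listener);
  }
}

const durable = new DurableSession();
const desktopSeen: number[] = [];
durable.subscribe(0, (m) => { desktopSeen.push(m.serial); });
durable.publish("Looking up order");
durable.publish("Order found");

// A client that saw serial 1 before dropping resumes from there.
const resumedSeen: number[] = [];
durable.subscribe(1, (m) => { resumedSeen.push(m.serial); });

// A second device joins with no prior state.
const mobileSeen: number[] = [];
durable.subscribe(0, (m) => { mobileSeen.push(m.serial); });

durable.publish("Refund issued");
```

The resumed client receives serials 2 and 3 only, while the new device receives all three, so both end up with the same conversation without either one duplicating a message.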

How Ably AI Transport solves the session layer problem

Thankfully, you don't need to build the infrastructure. Ably AI Transport is the durable session layer — the thing that makes the user experience survive what the WebSocket protocol cannot. The session lives in Ably; your application talks to it.

The five failures raised in this post each map directly to a capability:

Connection drops from proxy timeouts, mobile handoffs, and page reloads. The transport degrades automatically — WebSocket first, then HTTP streaming, then long-polling — so the session survives the infrastructure defaults that break standard WebSocket connections. No per-deployment configuration required.
→ Reconnection and recovery

Tokens generated while the client was disconnected. The token stream is stored against the session. On reconnect, the client receives everything it missed in order, with no duplicates. The developer doesn't track offsets or implement catch-up logic.
→ Token streaming

Tool call results and agent context lost mid-task. Agent state, tool call inputs and outputs, and conversation history are all published to the session as they are generated. A reconnecting client recovers the full context, not just the tokens.
→ Reconnection and recovery

Mid-stream steering and human-in-the-loop signals. Cancellations, approvals, and human input travel back to the agent on the same session channel. The bidirectional requirement that rules out SSE is covered without a separate signaling mechanism.
→ Human in the loop

Sessions tied to a single tab or device. Any authenticated device subscribing to the session ID receives current state plus history. A conversation started on desktop continues on mobile without restart.
→ Multi-device sessions

Get started: Vercel AI SDK · Core SDK

Frequently asked questions

When is SSE still the right choice for AI chat?

SSE is a reasonable starting point for chatbots that follow a simple request-response pattern: a user submits a message, the server streams tokens, no interruption required. It deploys more easily than WebSockets, carries no persistent connection overhead, and works well on stable networks.

The constraints appear when your application starts adding agentic behaviour: tool calls, mid-stream cancellation, multi-device continuity, and background tasks that complete while the user is offline. At that point, SSE's unidirectional architecture stops being a trade-off and becomes a blocker.

What timeout values should I configure to prevent AI connection drops in production?

Set your AWS ALB idle timeout to at least 3,600 seconds for WebSocket connections. The 60-second default was designed for HTTP requests, not long-running agent tasks. On Cloudflare Free and Pro plans, the WebSocket timeout is fixed at 100 seconds. Send heartbeat pings at around 25-second intervals to stay well below that threshold.

For Nginx, the equivalent setting is proxy_read_timeout. These three changes cover most production timeout failures for AI chat deployments.
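
One way to sanity-check the heartbeat interval is to derive it from the tightest idle timeout on the path. The sketch below uses the timeout values quoted above; `heartbeatIntervalSec`, `startHeartbeat`, the `PingableSocket` shape, the `"ping"` payload, and the choice of four pings per timeout window are all assumptions made for illustration.

```typescript
// Idle timeouts in seconds for each hop, taken from the values quoted above.
const idleTimeoutsSec: number[] = [
  3600, // AWS ALB, raised from the 60-second default
  100,  // Cloudflare Free and Pro plans (fixed)
];

// Hypothetical helper: choose an interval that fits `pingsPerWindow` pings
// inside the tightest timeout, so a few lost pings do not close the socket.
function heartbeatIntervalSec(timeoutsSec: number[], pingsPerWindow: number): number {
  const tightest = Math.min(...timeoutsSec);
  return Math.floor(tightest / pingsPerWindow);
}

// Minimal socket shape for the sketch; a browser WebSocket satisfies it.
interface PingableSocket {
  send(data: string): void;
}

// Sends an application-level "ping" on a timer and returns a stop function.
function startHeartbeat(socket: PingableSocket, intervalSec: number): () => void {
  const timer = setInterval(() => socket.send("ping"), intervalSec * 1000);
  return () => clearInterval(timer);
}

const intervalSec = heartbeatIntervalSec(idleTimeoutsSec, 4);
console.log(intervalSec); // 25
```

With a 100-second ceiling, four pings per window gives the 25-second interval recommended above.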

Does reconnection logic solve the session recovery problem?

Reconnection logic solves the transport problem. It doesn't solve the state problem. Exponential backoff and heartbeats re-establish the socket.

But they can't recover tokens generated during the gap, tool call results that arrived while the client was disconnected, or context accumulated across multiple steps. Preventing duplicate messages on reconnect requires sequence numbers or idempotency keys at the session layer, not the WebSocket layer. A client that reconnects without a session layer arrives at an empty context and either loses the conversation or restarts it.
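
To make the duplicate-prevention point concrete, here is a minimal sketch of sequence-number deduplication on the receiving side. `SequencedMessage` and `DedupingReceiver` are hypothetical names, and the sketch assumes the session layer assigns increasing sequence numbers and delivers messages in order.

```typescript
// Hypothetical message shape: the session layer stamps each message with `seq`.
interface SequencedMessage {
  seq: number;
  text: string;
}

class DedupingReceiver {
  private lastAppliedSeq = 0;
  readonly rendered: string[] = [];

  // Returns true if the message was applied, false if it was a duplicate.
  // Assumes in-order delivery, so anything at or below the last applied
  // sequence number has already been rendered.
  receive(message: SequencedMessage): boolean {
    if (message.seq <= this.lastAppliedSeq) {
      return false;
    }
    this.lastAppliedSeq = message.seq;
    this.rendered.push(message.text);
    return true;
  }
}

const receiver = new DedupingReceiver();
receiver.receive({ seq: 1, text: "Looking up " });
receiver.receive({ seq: 2, text: "order 1042" });

// After a reconnect the server replays from the start of the turn.
const replayed: SequencedMessage[] = [
  { seq: 1, text: "Looking up " },
  { seq: 2, text: "order 1042" },
  { seq: 3, text: ": refund issued." },
];
replayed.forEach((message) => receiver.receive(message));

console.log(receiver.rendered.join("")); // "Looking up order 1042: refund issued."
```

The sequence number belongs to the message, not the socket, which is why this logic sits at the session layer rather than the WebSocket layer.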

How does Ably replay missed messages after a WebSocket reconnect?

Ably assigns every published message a serial number. When a client reconnects, the transport layer uses the internal untilAttach mechanism to fetch messages published during the gap. This bounds the history query to the exact reconnection point.

Ably delivers everything missed in order, with no overlap between historical and live messages. The client doesn't track its own offset or implement catch-up logic. Every plan includes two minutes of ephemeral history by default. Persisted channels extend this to 72 hours on Standard plans, or up to 365 days on Pro and Enterprise plans.

Have you hit this in production? Curious what the failure looked like - was it the proxy timeout, a page reload, or something else that first surfaced it?

Ably AI Transport is now available

Ably Blog — Tue, 20 Jan 2026 17:49:29 +0000

Today we’re launching Ably AI Transport: a drop-in realtime delivery and session layer that sits between agents and devices, so AI experiences stay continuous across refreshes, reconnects, and device switches — without an architecture rewrite.

The gap: HTTP streaming breaks down for stateful AI UX

AI has moved from “type and wait” requests to experiences that are long-running and stateful: responses stream, users steer mid-flight, and work needs to carry across tabs and devices. That shift changes what “working” means in production. It’s not just whether the model can generate tokens, it’s whether the experience stays continuous when real users behave like real users do.

Most AI apps still start with a connection-oriented setup: the client opens a streaming connection (SSE, fetch streaming, sometimes WebSockets), the agent generates tokens, and the UI renders them as they arrive. It’s low friction and demos well.

But HTTP streaming really solves only the first part of the problem, and it’s not a good place to end.

First: continuity. When output is tied to a specific connection, the experience becomes fragile by default. Refreshes, network changes, backgrounding, multiple tabs, device switches, agent handovers (even agent crashes) are normal behaviour. And they’re exactly where teams see partial output, missing tokens, duplicated messages, drifting state, and “start again” recovery paths. That’s where user trust gets lost.

Second: capability. A connection-first transport layer doesn’t just make UX fragile. It limits what you can build. Once you want true collaborative patterns like barge-in, live steering, copilot-style bidirectional exchange, multi-agent coordination, or a seamless human takeover with full context, you need more than “a stream.” You need a stateful conversation layer that can support multiple participants, resumable delivery, and shared session state.

So teams patch it: buffering, replay, offsets, reconnection logic, session IDs, routing rules for interrupts and tool results, multi-subscriber consistency, and observability once production incidents start. It’s critical work — but it’s not differentiation.

What Ably AI Transport does

AI Transport gives each AI conversation a durable bi-directional session that isn’t tied to one tab, connection or agent. Agents publish output into a session channel, clients subscribe from any device, and Ably handles the delivery guarantees you’d otherwise rebuild yourself: ordered delivery, recovery after reconnects, and fan-out to multiple subscribers.

It’s deliberately model and framework-agnostic. You keep your agent runtime and orchestration. AI Transport handles the delivery and session layer underneath.

The key shift: sessions become channels

In a connection-oriented setup, the “session” effectively lives inside the streaming pipe. When the pipe breaks, continuity becomes a headache.

With AI Transport, the session is created once and represented as a durable channel. Agents and clients can join independently. Refresh becomes reattach and hydrate. Device switching becomes another subscriber joining the same session. Multi-device behaviour becomes fan-out rather than custom routing. Agents and humans are connected over a transport designed for bi-directional, low-latency AI conversations.

How Ably AI Transport ensures a resilient, stateful AI UX

Resumable, ordered token streaming: A great AI UX depends on durable streaming. Output is treated as session data, so clients can catch up cleanly after refreshes, brief dropouts, and network handoffs.

Multi-device continuity: Conversations are user-scoped, not tab-scoped. Multiple clients can join the same session without split threads, duplication, or drifting state.

Live steering and interruption: Modern AI UX needs control, not just output. Interrupts, redirects, and approvals route through the same bi-directional session fabric as the response stream, so steering works even across reconnects and devices.

Presence-aware sessions: Once agents do real work, wasted compute becomes a serious cost problem. Presence provides a reliable signal for whether the user is currently connected (or fully offline across devices), so you can throttle, defer, or resume work accordingly.

Agents that collaborate and act with awareness: As soon as you have more than one agent (or an agent plus tools/workers), coordination becomes the product. Shared session state and routing prevent clashing replies, duplicated context, and “two brains answering at once,” so multiple agents can communicate directly with users coherently.

Seamless human takeover when it really matters: When an agent hits a boundary (risk, uncertainty, or policy) a human should be able to step in with full context and continue the session immediately. The handoff keeps the same session history and controls, so there are no repeated questions, no “start again,” and no losing track of what happened mid-flight.

Identity and access control: Beyond toy demos, you need to know who can read, write, steer, or approve actions. Verified identity plus fine-grained permissions let multi-party sessions stay secure without inventing a bespoke access model.

Observability and governance: When AI UX breaks in production, it’s rarely obvious where. Built-in visibility into session delivery and continuity makes failures diagnosable and auditable instead of “black box streaming incidents.”
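
The live steering item above can be illustrated with a toy session in which tokens and control signals share one channel. This is a simplified, synchronous sketch, not the AI Transport API: `SessionChannelSketch`, `SteeringEvent`, and the event shapes are hypothetical names chosen for the example.

```typescript
// Hypothetical event shapes: output and steering share one session.
type SteeringEvent =
  | { kind: "token"; text: string }
  | { kind: "control"; action: "cancel" };

class SessionChannelSketch {
  private listeners: ((event: SteeringEvent) => void)[] = [];

  subscribe(listener: (event: SteeringEvent) => void): void {
    this.listeners.push(listener);
  }

  // Fan out to every subscriber; copy the list so listeners can publish too.
  publish(event: SteeringEvent): void {
    this.listeners.slice().forEach((listener) => listener(event));
  }
}

const session = new SessionChannelSketch();
let cancelled = false;
const renderedTokens: string[] = [];

// Agent side: listen for steering signals on the session it publishes to.
session.subscribe((event) => {
  if (event.kind === "control" && event.action === "cancel") {
    cancelled = true;
  }
});

// Client side: render tokens, then cancel after the second one arrives.
session.subscribe((event) => {
  if (event.kind !== "token") return;
  renderedTokens.push(event.text);
  if (renderedTokens.length === 2) {
    session.publish({ kind: "control", action: "cancel" });
  }
});

// Agent loop: stop generating as soon as the cancel signal has been seen.
const plannedTokens = ["Checking ", "your ", "order ", "history..."];
for (let i = 0; i < plannedTokens.length && !cancelled; i++) {
  session.publish({ kind: "token", text: plannedTokens[i] });
}

console.log(renderedTokens.join("")); // "Checking your "
```

Because the cancel travels on the same session as the output, any device subscribed to that session could send it.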

Concrete examples

Multi-device copilots: A user starts a long-running answer on desktop, switches to mobile mid-response, and the session continues without restarting. Steering and approvals apply to the same session regardless of device.

Long-running agents: A research agent runs multi-step tool work for minutes. If the user disconnects, the work continues; when the user returns, the client hydrates from session history instead of resetting.

Getting started (low friction)

You can get a basic session running in minutes:


js
import Ably from 'ably';

// Initialize Ably Realtime client
const realtime = new Ably.Realtime({ key: 'API_KEY' });

// Create a channel for publishing streamed AI responses
const channel = realtime.channels.get('my-channel');

// Publish initial message and capture the serial for appending tokens
const { serials: [msgSerial] } = await channel.publish('response', { data: '' });

// Example: stream returns events like { type: 'token', text: 'Hello' }
for await (const event of stream) {
  // Append each token as it arrives
  if (event.type === 'token') {
    channel.appendMessage(msgSerial, event.text);
  }
}

AWS us-east-1 outage: How Ably’s multi-region architecture held up

Ably Blog — Fri, 24 Oct 2025 12:24:17 +0000

Resilience in action: zero service disruption

During this week’s AWS us-east-1 outage, Ably maintained full service continuity with no customer impact. This was our multi-region architecture working exactly as designed; error rates were negligibly low and unchanged throughout. Any additional round trip latency was limited to 12ms, which is below the typical variance in any client-to-endpoint connection, and well below our 40–50ms global median; this is imperceptible to users and below monitoring thresholds. There were no user reports of issues. Taken together this means there was zero service disruption.

The technical sequence

Ably provides a globally-distributed system hosted on AWS with services provisioned in multiple regions globally. Each scales independently in response to the level of traffic in the regions, and us-east-1 is normally the busiest region.

From the onset of the AWS incident what we saw was that the infrastructure already existing in that region continued to provide error-free service. However, issues with various ancillary AWS services meant that our control plane in the region was disrupted, and it was clear that we would not be able to add capacity in the region as traffic levels increased during the day.

As a result, at around 1200 UTC we made DNS changes so that new connections were not routed to us-east-1; traffic that would have ordinarily been routed there (based on latency) was instead handled in us-east-2. This is a routine intervention that we make in response to disruption in a region. Pre-existing connections in us-east-1 remained untouched, continuing to serve traffic without errors and with normal latency throughout the incident. Our monitoring systems, via connections established before the failover, confirmed this directly.

Latency impact: negligible

We continuously test real-world performance in multiple ways. Monitors operated by Ably, in proximity to regional datacenter endpoints, indicated that the worst case impact on latency - which would have been clients directly adjacent to the us-east-1 datacenter, but which now have to connect to us-east-2 - was 12ms at p50. We also have real browser round-trip latency measurements using Uptrends, which more closely simulate real users, with actual browser instances publishing and receiving messages between various global monitoring locations.

These measurements taken during the incident are shown below; real-world clients experienced even lower latency impact, since from each of the cities tested, there is negligible difference in distance, and latency, between that location and us-east-2 versus us-east-1. Taken across all US cities that are monitoring locations, the measured latency difference averaged 3ms. That actual difference is substantially lower than normal variance in client connection latencies, and is therefore imperceptible to users and well below monitoring thresholds.

We restored us-east-1 routing on 21 October following validation from AWS and our own internal testing.

The architecture at work

This incident validated our multi-region architecture in production:

Each region operates independently, isolating failures
Latency-based DNS adapts routing to regional availability
Existing persistent connections are unaffected if the only change is to the routing of new connections
A further layer of defense, not used in this case, provides automatic client-side failover to up to five globally-distributed endpoints

That final layer matters. Even if us-east-1 infrastructure had failed entirely (it didn’t), client SDKs would have automatically failed over to alternative regions, maintaining connectivity at the cost of increased latency. It didn’t activate this time, since regional operations continued normally, but it’s a core part of our defense-in-depth strategy.
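
The client-side failover layer can be pictured as an ordered walk through candidate endpoints. The sketch below is a simplified, synchronous illustration and not the Ably SDK's implementation: `pickEndpoint`, the `isReachable` callback, and the example hostnames are all hypothetical.

```typescript
// Hypothetical sketch: try the primary endpoint, then up to five fallbacks.
function pickEndpoint(
  primary: string,
  fallbacks: string[],
  isReachable: (host: string) => boolean
): string | null {
  const candidates = [primary].concat(fallbacks.slice(0, 5));
  for (let i = 0; i < candidates.length; i++) {
    if (isReachable(candidates[i])) {
      return candidates[i];
    }
  }
  return null; // nothing reachable; a real client would back off and retry
}

// Example hostnames are placeholders, not real Ably endpoints.
const fallbackHosts = ["fallback-a.example.com", "fallback-b.example.com"];
const unreachableHosts = ["primary.example.com"];

const chosenHost = pickEndpoint(
  "primary.example.com",
  fallbackHosts,
  (host) => unreachableHosts.indexOf(host) === -1
);
console.log(chosenHost); // "fallback-a.example.com"
```

In the incident described here this path was never exercised, because the primary region kept serving existing connections.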

Lessons reinforced

The key takeaways for us from this incident:

A genuinely distributed system spanning multiple regions, not just availability zones, is essential for ultimate continuity of service
Planning for, and drilling, responses to this type of event is critical to ensuring that your resilience is real and not just theoretical
A multi-layered approach, with mitigations both in the infrastructure and SDKs, ensures redundancy and continuity even without active intervention

AWS continues to be an outstandingly good global service, but occasional regional failures must be expected. Well-architected systems on AWS infrastructure are capable of supporting the most critical business needs.

Keep your realtime apps running smoothly, even when the internet breaks. Try Ably for free today!

How doxy.me turned realtime from a liability into a strategic asset

Ably Blog — Thu, 17 Jul 2025 16:30:49 +0000

When Realtime Breaks, Virtual Care Breaks.

Here’s the hard reality of telehealth: when infrastructure cracks, care collapses. A missed chat ping or delayed check-in notification isn’t just a glitch, it’s a broken line of communication between a provider and a patient. That can be the difference between timely diagnosis and uncertainty, between trust and frustration.

Doxy.me runs over 250,000 virtual visits every day. Their core product delivers browser-based telemedicine that works without downloads or installations. But behind the scenes, one part of the stack had become a chronic liability: realtime infrastructure.

It wasn’t just fragile - it was feared. VP of Engineering Ben Anderson-Dukes recalled that the engineering team viewed it as a “black box.” Over time, that black box became a bottleneck. Small issues like ghost check-in chimes and out-of-order messages became symptoms of deeper instability. Lag was unpredictable. Full-day outages were a looming threat. Worse still, this fragile infrastructure had become the second-largest operating expense after video delivery.

That wasn’t sustainable. But replacing their realtime provider wasn’t a simple procurement problem. It required a mindset shift from seeing realtime as a necessary evil to treating it like a core product surface that deserved strategic investment.

A Strategic Overhaul, Not Just a Quick Fix.

The partner doxy.me chose for that shift was Ably. Unlike previous vendors, Ably took a collaborative, hands-on approach from day one, offering architectural guidance and flexible support throughout the migration.

“We needed a partner who could not only help us rebuild realtime into something reliable, scalable, and secure, but one that was developer-friendly and of a similar mindset to doxy.me.”

With support from external dev agency Walter Code, doxy.me and Ably planned a phased migration away from their existing realtime provider to Ably’s modern WebSocket-based infrastructure. The process was methodical: time it right, plan it right, implement it right. Nothing was rushed.

“Initially, we thought this would take a year. But with Ably, we went from design to 100% migrated in under six months.”

Despite handling more than 250,000 calls a day, the migration was completed with zero downtime. The transition not only modernized their infrastructure but also demystified it. Engineers gained new visibility into system behavior, and the team collectively regained confidence.

“They helped us not only get there, but raise the bar in internal education about realtime as well.”

Real Results, Not Just Promises.

Post-migration, the transformation was visible across both technical and business metrics:
• 65% reduction in realtime infrastructure costs
• 95% fewer patient queue issues
• 99% drop in app crashes caused by signaling issues
• 100% elimination of ghost check-in chimes

But beyond metrics, the real value came from stability. doxy.me’s CTO, initially skeptical, came to view realtime as a stable part of core infrastructure. The engineering team moved from firefighting to forward planning.

“Support tickets dropped, realtime errors disappeared, and our Datadog logs became clean and readable.”

The financial impact was just as compelling:

“We saw a full ROI in under six months, despite using an external team to handle the migration. That’s unheard of.”

“For me personally, it’s been a really great win. It’s bolstered Engineering’s reputation internally. There was much kudos served all-round.”

Looking Forward: Realtime as Innovation Engine.

With Ably in place, doxy.me isn’t just maintaining a stable stack, they’re building on top of it. New use cases are now in scope, including advanced presence, session orchestration, and collaboration tooling designed to make virtual care more intuitive and human.

“Even now, we’re exploring roadmap innovations together.”

One of doxy.me’s largest customers, accounting for nearly 30% of realtime traffic, now operates smoothly with no degradation in performance. Internally, engineers have better observability, faster diagnostics, and more freedom to innovate.

“Ably is helping us push realtime beyond the basics, into new opportunities that let providers be with their patients more reliably and securely.”

At its core, doxy.me is about connection - between patient and provider, between care and access.

“We believe that providers are the real heroes, and that doxy.me is their superpower.”

With Ably behind the scenes, that superpower now has a reliable, resilient realtime engine built for scale.

“Ably has helped us turn realtime from a liability into a strategic asset.”

Doxy.me built this on top of Ably Pub/Sub, the core messaging product in the Ably platform. Please see our docs if you're interested in the technical details.

Achieving low latency with pub/sub

Ably Blog — Wed, 22 Jan 2025 22:39:19 +0000

In pub/sub messaging systems, getting messages to flow quickly between publishers and subscribers isn't just critical to general performance, it's central to basic usability. Achieving this at scale introduces extra challenges that require thoughtful architecture design and strategies for handling unexpected behavior (e.g. traffic spikes).

To better understand the best practices we can apply to our architecture to overcome these challenges, let's revisit how the pub/sub pattern works.

What is pub/sub?

Pub/sub (or publish/subscribe) is an architectural design pattern used in distributed systems for asynchronous communication between different components or services. Although publish/subscribe is based on earlier design patterns like message queuing and event brokers, it is more flexible and scalable. The key to this is the fact that pub/sub enables the movement of messages between different components of the system without the components being aware of each other’s identity (they are decoupled).

For a deeper dive into pub/sub, including examples and comparisons to other messaging patterns, see our guide: What is pub/sub?.

Why latency is crucial to pub/sub realtime systems

Latency is the time it takes for data to travel from the backend (like a datacenter) to the end-user’s device. Latencies below 100ms are hard to achieve in general, but pub/sub systems need delays that are not only consistently low but also imperceptible, so that users stay engaged instead of abandoning the app. Applications built around critical broadcast updates, realtime chat, or live streaming, for example, need to deliver seamless experiences with ultra-low latency to maintain their user bases.

This becomes especially important at scale for a global audience: If a pub/sub system can’t maintain these speeds as it scales up and reaches a global user base, message delays could render it unusable, even if your infrastructure has the raw capacity. Serving a single region is significantly simpler than achieving consistent low latency across a distributed global audience, where factors like inter-region data replication and network variability come into play. Global median latency is a good measure of average global latency if you’re operating at scale, and it’s the metric we use to measure our speeds at Ably.

Some architectural decisions you can make to achieve low latency are:

Global datacenter coverage: The physical proximity of datacenters or edge points of presence (PoPs) to end users significantly impacts round-trip times for messages. If you distribute datacenters and PoPs globally, you can drive down latency for your users.
Protocol efficiency: The choice of protocol affects how efficiently messages are transmitted. For example, WebSocket is highly efficient for realtime communication compared to HTTP long polling. (WebSockets are a particularly good protocol for achieving low latency in pub/sub systems since they maintain an open connection between the client and server without the need for frequent HTTP responses. For a deeper dive into how WebSockets compare to other protocols in pub/sub systems, check out our guide Pub/Sub vs WebSockets.)
Network robustness: A reliable, fault-tolerant underlying network infrastructure can ensure consistent low latency even under high traffic volumes.

Challenges to achieving low latency

The most straightforward obstacle to low latency is network distance: latency is inherently affected by how far clients are from the server. The farther a client is from a datacenter, the longer it takes for messages to reach them. This is a critical consideration for global systems, where distances between users and datacenters can span continents. But there are other factors that can affect end-user latency:

Message routing: Poorly optimized routing can lead to bottlenecks, especially in use cases with high fanout where a single message is delivered to thousands or millions of subscribers.
Load balancing: If you don’t have a load balancer, or have one that is improperly configured, imbalances can overload certain nodes, resulting in delays for subscribers.
System resource contention: High message volumes can strain CPU, memory, and storage resources, leading to increased latency. This is particularly true during traffic spikes.
Encoding: Inefficient message encoding increases latency by slowing down the system’s ability to translate data into a transmittable format and back again.

Best practices for achieving low latency

Best practices for achieving low latency are, on paper, straightforward fixes to the points discussed above. However, making these changes to your architecture requires significant engineering effort and potentially an overhaul of your existing infrastructure. Here’s what we recommend you do:

Use a globally-distributed architecture

Deploying servers in multiple regions reduces the physical distance between clients and the server, minimizing network latency. Make sure that your infrastructure includes a combination of core datacenters and edge points of presence (PoPs). This ensures fast, consistent round-trip times for users anywhere in the world.

Optimize message routing

Efficient routing algorithms, such as consistent hashing, can ensure that messages are delivered to subscribers quickly and reliably. For systems with high fanout, prioritize techniques that minimize duplication and ensure messages are processed efficiently.
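
As an illustration of consistent hashing, the sketch below places nodes on a hash ring and routes each channel to the next node clockwise. It is a minimal teaching example, not how any particular pub/sub service routes messages: `HashRing`, the node and channel names, the FNV-1a hash, and the 50 virtual points per node are all assumptions chosen for the sketch.

```typescript
// 32-bit FNV-1a hash; the shifts implement multiplication by the FNV prime.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = (hash + ((hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24))) >>> 0;
  }
  return hash >>> 0;
}

class HashRing {
  private points: { position: number; node: string }[] = [];

  // `replicas` virtual points per node smooth out the key distribution.
  constructor(nodes: string[], private replicas: number = 50) {
    nodes.forEach((node) => this.addNode(node));
  }

  addNode(node: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.points.push({ position: fnv1a(`${node}#${i}`), node });
    }
    this.points.sort((a, b) => a.position - b.position);
  }

  removeNode(node: string): void {
    this.points = this.points.filter((point) => point.node !== node);
  }

  // Route a key to the first point at or after its hash, wrapping around.
  nodeFor(key: string): string {
    if (this.points.length === 0) {
      throw new Error("ring is empty");
    }
    const target = fnv1a(key);
    for (let i = 0; i < this.points.length; i++) {
      if (this.points[i].position >= target) {
        return this.points[i].node;
      }
    }
    return this.points[0].node;
  }
}

const ring = new HashRing(["node-a", "node-b", "node-c"]);
const channels = ["chat:1", "chat:2", "chat:3", "chat:4", "chat:5", "chat:6"];

const ownersBefore = channels.map((channel) => ring.nodeFor(channel));
ring.removeNode("node-b");
const ownersAfter = channels.map((channel) => ring.nodeFor(channel));

channels.forEach((channel, i) => {
  console.log(channel, ownersBefore[i], "->", ownersAfter[i]);
});
```

Only channels that were owned by the removed node change owner, which is the property that keeps rebalancing cheap when capacity is added or removed.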

Have a load balancer

Dynamic load balancing distributes traffic evenly across servers, preventing overloading. For pub/sub systems, load balancers must account for both connection count and message throughput.
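
To show why both signals matter, the sketch below scores each node by whichever resource is closer to its limit and picks the least loaded one. The `NodeLoad` and `NodeCapacity` shapes, the capacity figures, and the max-of-ratios scoring rule are assumptions for illustration rather than a recommended production policy.

```typescript
interface NodeLoad {
  id: string;
  connections: number;
  messagesPerSec: number;
}

interface NodeCapacity {
  maxConnections: number;
  maxMessagesPerSec: number;
}

// A node is as loaded as its most constrained resource.
function loadScore(node: NodeLoad, capacity: NodeCapacity): number {
  return Math.max(
    node.connections / capacity.maxConnections,
    node.messagesPerSec / capacity.maxMessagesPerSec
  );
}

// Assumes a non-empty node list.
function pickLeastLoaded(nodes: NodeLoad[], capacity: NodeCapacity): NodeLoad {
  return nodes.reduce((best, candidate) =>
    loadScore(candidate, capacity) < loadScore(best, capacity) ? candidate : best
  );
}

const capacity: NodeCapacity = { maxConnections: 10000, maxMessagesPerSec: 50000 };
const fleet: NodeLoad[] = [
  { id: "node-a", connections: 9000, messagesPerSec: 1000 },
  { id: "node-b", connections: 2000, messagesPerSec: 45000 },
  { id: "node-c", connections: 5000, messagesPerSec: 20000 },
];

console.log(pickLeastLoaded(fleet, capacity).id); // "node-c"
```

A balancer that looked only at connection count would pick node-b here, even though it is already near its throughput limit.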

Use message delta compression

Compressing messages reduces their size, enabling faster transmission over the network. Use lightweight, efficient compression algorithms to minimize processing overhead.
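
The idea behind delta compression is to send only what changed since the previous message. The sketch below shows a field-level version on flat objects; it is a simplified illustration, not the binary diff format a production system would typically use, and `diffState`, `applyDelta`, and the product fields are hypothetical. It does not handle removed fields.

```typescript
type FlatState = { [field: string]: string | number };

// Publisher side: keep only the fields whose values changed.
function diffState(previous: FlatState, next: FlatState): FlatState {
  const delta: FlatState = {};
  Object.keys(next).forEach((field) => {
    if (previous[field] !== next[field]) {
      delta[field] = next[field];
    }
  });
  return delta;
}

// Subscriber side: rebuild the full state from the last state plus the delta.
function applyDelta(previous: FlatState, delta: FlatState): FlatState {
  const merged: FlatState = {};
  Object.keys(previous).forEach((field) => { merged[field] = previous[field]; });
  Object.keys(delta).forEach((field) => { merged[field] = delta[field]; });
  return merged;
}

const previousState: FlatState = { sku: "A1", title: "Desk lamp", quantity: 12 };
const nextState: FlatState = { sku: "A1", title: "Desk lamp", quantity: 11 };

const stateDelta = diffState(previousState, nextState);
console.log(JSON.stringify(stateDelta)); // {"quantity":11}

const rebuiltState = applyDelta(previousState, stateDelta);
console.log(JSON.stringify(rebuiltState) === JSON.stringify(nextState)); // true
```

The saving grows with message size: the more of each update that is unchanged, the smaller the delta relative to the full payload.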

Autoscale to reduce resource consumption

Optimize resource usage by scaling infrastructure elastically during traffic spikes. Use dynamic autoscaling to add capacity on demand and maintain a significant resource buffer.

Have redundancy and failover

Build redundancy into servers and have failover mechanisms that reroute traffic during outages. For global systems, failover strategies should account for regional redundancies to make sure that if one region experiences an outage, traffic can seamlessly shift to another without impacting users worldwide. This minimizes latency spikes during failover events and ensures uninterrupted service.

How Ably can help

For many people, building a system with all of these components from the ground up is impractical and is a huge time and skill investment. That investment also tends to be more expensive than initially expected because of maintenance costs and other challenges - like scalability and data integrity - that make maintaining a low enough latency even more difficult.

At Ably, our team is very familiar with the amount of work building a low-latency pub/sub system takes - and all the edge cases around optimum performance. We’ve made it our mission to provide the most reliable realtime service for you - and Ably Pub/Sub is devoted to pub/sub use cases.

Choosing a managed pub/sub service like Ably can save you and your team the headache of managing the architectural challenges of low latency at scale. Performance is one of Ably’s core pillars, and it’s built into what we do. Here’s how:

Predictable performance: A low-latency and high-throughput global edge network, with median latencies of <50ms.
Guaranteed ordering & delivery: Messages are delivered in order and exactly once, with automatic reconnections.
Fault-tolerant infrastructure: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.
High scalability & availability: Built and battle-tested to handle millions of concurrent connections at scale.
Optimized build times and costs: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.

Low latency is non-negotiable for any pub/sub system that aims to deliver realtime experiences at scale. If you’re looking for a solution that scales up and ensures some of the lowest latencies in the business, Ably provides a robust and reliable platform to power your pub/sub needs. Sign up for a free account to try it for yourself.

How to use Ably LiveSync’s MongoDB Connector for realtime and offline data sync

Carolina Carriazo — Thu, 16 Jan 2025 13:01:19 +0000

In light of the recent deprecation of MongoDB Atlas Device Sync (ADS), developers are seeking alternative solutions to synchronize on-device data with cloud databases. Ably LiveSync offers a potential alternative and can replace some of ADS’s functionality, enabling realtime synchronization of database changes to devices at scale. LiveSync allows a large number of MongoDB changes to be propagated to end-user devices in realtime and stored in any number of local storage options, from an embedded database to in-memory storage.

For instance, imagine an inventory app that needs to broadcast stock updates to multiple devices in realtime. Ably LiveSync allows you to automatically subscribe to inventory changes in your database and broadcast this data to millions of clients at scale, allowing them to remain synchronized with the state of your inventory in realtime.

This article explains why on-device storage is critical, explores existing solutions, and demonstrates how Ably LiveSync’s MongoDB connector can help with a brief code tutorial.

Why keep information on-device?

Local storage is a must for apps that need offline access or fast performance, like e-commerce inventory apps or news apps that download content for offline browsing. But not every app needs it. If your app is always online or just streams read-only data, you can skip the complexity of a local database. Thankfully, with Ably, you can adapt to your use case, whether you need offline support or just realtime updates. Some of the benefits of on-device storage are:

Offline access: Storing data directly on the device ensures users can seamlessly access and interact with information even when they have no internet connection or are in areas with poor connectivity. This is particularly crucial for users who frequently work in offline environments or travel to locations with unreliable network coverage.
Performance: Applications demonstrate significantly improved response times and reduced latency when accessing data stored locally, as opposed to making time-consuming server calls across the network. This local data access eliminates network-related delays and provides instantaneous data retrieval for critical operations.
Cost efficiency: Users experience substantial savings on their data usage and associated costs since the application doesn't need to repeatedly download information from remote servers. This is especially beneficial for users with limited data plans or in regions where mobile data is expensive.
User experience: Users benefit from a consistently smooth and reliable application experience, maintaining uninterrupted access to their data regardless of their network status or connection quality. This reliability helps build user trust and satisfaction with the application.

Options for storing information on device

Modern mobile operating systems provide a variety of ways to store information on device:

iOS: Includes UserDefaults, CoreData, and SQLite, with flexibility for additional solutions based on specific needs.
Android: Provides shared preferences, Room database, and file storage.
Cross-platform frameworks: With React Native, react-native-async-storage is a popular starting library for simple needs. However, for advanced use cases requiring NoSQL-like abilities, some good choices here would be RealmDB (which, unfortunately, as we know, is being deprecated), UnQLite, LevelDB, or Couchbase.

Regardless of your choice of an on-device database and storage methodology, you can use Ably LiveSync to synchronize data from your managed or on-premises database to mobile devices in realtime. This includes MongoDB - as well as Atlas. While we currently support only MongoDB and PostgreSQL, we are working on adding support for other database engines.

What is Ably LiveSync?

Ably LiveSync lets you monitor database changes and reliably broadcast them to millions of frontend clients, keeping them up-to-date in realtime.

LiveSync works with any tech stack and prevents data inconsistencies from dual-writes while avoiding scaling issues from "thundering herds" — sudden surges of traffic that can overwhelm your database.

How to persist data locally with Ably

Let’s explore how to build a simple in-store management app that tracks product inventory, using React Native with SQLite for local storage and MongoDB Atlas as our cloud database. Although MongoDB is a document store and SQLite is a relational database, the two can be used in combination. We are going to use the Ably SDK callback methods to store documents and changes inside our local SQLite database.

Setting up Ably

For simplicity, we’ll stick to TypeScript. Before anything else, create a new React Native project using the CLI:

npx @react-native-community/cli@latest init AwesomeStore

Creating a MongoDB integration rule with Ably

Now we need to create a new channel that streams database changes to your clients. This ensures realtime updates whenever your MongoDB data changes. To create an integration rule that will sync your MongoDB database with Ably, you’ll first have to sign up for an Ably account.

Once that’s done, you should have access to your Ably dashboard. Create an app or select the app you wish to use. Navigate to the Integrations tab > Create a new integration rule > MongoDB. Fill out the Connection URL with your MongoDB connection URL; Database name with your db name (for this example, SQLiteDatabase); and Collection with your collection name (for this example, products). For more information on this process and the parameters involved, check out our docs.

This sets up a new channel, built on top of our core Ably Pub/Sub product, which streams changes (through MongoDB change streams) from your database to your clients. This ensures that any change that occurs in your database is delivered to every device subscribed to the channel.

Creating the local datastore

We’ll create a new file in our project called datastore.ts and initialize SQLite:

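// Assumption: the article does not name its SQLite library; these calls match react-native-sqlite-storage
import { SQLiteDatabase } from 'react-native-sqlite-storage';

const tableName = 'products';

export const createTables = async (db: SQLiteDatabase) => {
  // id is the primary key so that INSERT OR REPLACE overwrites an existing row
  const query = `CREATE TABLE IF NOT EXISTS ${tableName}(
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        description TEXT,
        quantity INTEGER
    );`;

  await db.executeSql(query);
};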

After the tables are created, we need a way to retrieve store products and update their stock:

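// Shape of one row in the products table (mirrors the columns created above)
type StoreProduct = {
  id: number;
  name: string;
  description: string;
  quantity: number;
};

export const getProducts = async (db: SQLiteDatabase): Promise<StoreProduct[]> => {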
  try {
    const products: StoreProduct[] = [];
    const results = await db.executeSql(`SELECT id, name, description, quantity FROM ${tableName}`);
    results.forEach(result => {
      for (let index = 0; index < result.rows.length; index++) {
        products.push(result.rows.item(index))
      }
    });
    return products;
  } catch (error) {
    console.error(error);
    throw Error('Failed to get products!');
  }
};

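export const saveOrUpdateProducts = async (db: SQLiteDatabase, products: StoreProduct[]) => {
  // Nothing to write: an empty VALUES list would be invalid SQL
  if (products.length === 0) return;

  // One "(?, ?, ?, ?)" group per product; values are bound as parameters
  // so that quotes in names or descriptions cannot break the statement
  const placeholders = products.map(() => '(?, ?, ?, ?)').join(',');
  const values = products.flatMap(p => [p.id, p.name, p.description, p.quantity]);
  const insertQuery =
    `INSERT OR REPLACE INTO ${tableName}(id, name, description, quantity) VALUES ${placeholders}`;

  return db.executeSql(insertQuery, values);
};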

Receiving database changes from MongoDB over Ably

Let’s take a look at how we can receive changes from the configured Ably channel. More information can be found in our documentation, but this is the important snippet we need - setting up the Ably Realtime SDK:

import * as Ably from 'ably';

// Instantiate the Realtime SDK
const ably = new Ably.Realtime('[your API key]');

// Get the channel to subscribe to
const channel = ably.channels.get('store:1');

// Subscribe to messages on the 'store:1' channel
await channel.subscribe((message) => {
  // Print every change detected in the channel
  console.log('Received a change event in realtime:', message.data);
});

We need to write a new function that will take the payload of message.data and store it in our database:

const addOrUpdateProduct = async (data: any) => {
  try {
    // Map the MongoDB document carried by the change event onto our local row shape
    const product: StoreProduct = {
      id: data.fullDocument.id,
      name: data.fullDocument.name,
      description: data.fullDocument.description,
      quantity: data.fullDocument.quantity,
    };
    const db = await getDBConnection();
    // saveOrUpdateProducts expects an array, so wrap the single product
    await saveOrUpdateProducts(db, [product]);
  } catch (error) {
    console.error(error);
  }
};
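The mapping above assumes every event carries a fullDocument with well-formed fields. MongoDB delete events have no fullDocument, and update events include one only when the change stream is configured to look it up, so a defensive version of the mapping can be useful. The following is a minimal sketch under those assumptions; `toStoreProduct` and the `ChangeEvent` type are hypothetical names, not part of the Ably SDK, and the default values for missing fields are an illustrative choice.

```typescript
// Local row shape, matching the products table used in this article.
type StoreProduct = { id: number; name: string; description: string; quantity: number };

// Only the change-event fields this article reads from message.data (hypothetical type).
type ChangeEvent = {
  operationType?: string;
  ns?: { coll?: string };
  fullDocument?: Record<string, unknown>;
};

// Returns a StoreProduct, or null when the event cannot be mapped safely.
function toStoreProduct(event: ChangeEvent): StoreProduct | null {
  const doc = event.fullDocument;
  if (!doc) return null; // e.g. a delete event

  const { id, name, description, quantity } = doc;
  // id and name are required by the table schema; reject anything else.
  if (typeof id !== 'number' || typeof name !== 'string') return null;

  return {
    id,
    name,
    // Assumed defaults for optional columns; adjust to your own rules.
    description: typeof description === 'string' ? description : '',
    quantity: typeof quantity === 'number' ? quantity : 0,
  };
}
```

addOrUpdateProduct could call this helper first and skip the database write whenever it returns null, rather than relying on the catch block.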

We can call our new function in our message subscription:

// Subscribe to messages on the 'store:1' channel
await channel.subscribe(async (message) => {
  // Print every change detected in the channel
  console.log('Received a change event in realtime:', message.data);
  const data = message.data;
  if (data.ns.coll === "products") {
    await addOrUpdateProduct(data);
  } else if (data.ns.coll === "store") {
    // Another function which updates changes for the `store` collection
  } else {
    console.warn("Unknown collection");
  }
});
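As more collections are synced, the if/else chain in the subscription callback grows. One alternative is a lookup table keyed by collection name. This is a minimal sketch rather than anything prescribed by Ably; `routeChange`, `ChangeHandler`, and `ChangeEvent` are hypothetical names introduced for illustration.

```typescript
// Only the fields the subscription callback reads from message.data (hypothetical type).
type ChangeEvent = { ns: { coll: string }; fullDocument?: unknown };
type ChangeHandler = (event: ChangeEvent) => Promise<void> | void;

// Looks up the handler registered for the event's collection and runs it.
// Resolves to true when a handler ran, false for an unknown collection.
async function routeChange(
  handlers: Map<string, ChangeHandler>,
  event: ChangeEvent,
): Promise<boolean> {
  const handler = handlers.get(event.ns.coll);
  if (!handler) {
    console.warn(`Unknown collection: ${event.ns.coll}`);
    return false;
  }
  await handler(event);
  return true;
}
```

In the subscription callback, the handlers map would hold an entry such as 'products' pointing at addOrUpdateProduct, and the callback body would reduce to a single `await routeChange(handlers, message.data)`.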

The full workflow

With this setup, the app listens for realtime updates from your MongoDB collection and persists changes locally, ensuring an up-to-date inventory system even when offline.

Final thoughts: What makes Ably different

I hope that gives you a good overview of what Ably LiveSync’s MongoDB Connector can do! Besides providing a potential alternative to Atlas Device Sync, Ably, as a realtime communications platform, is built for scalability and reliability. Here are some features of Ably Pub/Sub, the backbone upon which LiveSync’s database connector is built:

Predictable performance: A low-latency and high-throughput global edge network, with median latencies of <50ms.
Guaranteed ordering & delivery: Messages are delivered in order and exactly once, even after disconnections.
Fault-tolerant infrastructure: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.
High scalability & availability: Built and battle-tested to handle millions of concurrent connections at scale.
Optimized build times and costs: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.

Try Ably today and explore our MongoDB connector.

Low latency at scale: Gaining the competitive edge in sports betting

Ably Blog — Mon, 06 Jan 2025 09:27:04 +0000

The sports betting industry has grown rapidly in recent years, fueled by changing regulations, advancements in technology, and a rising demand for realtime interactions from consumers. For many fans, in-play betting adds another dimension to how they can engage, making them feel closer to the action. When you then consider the increasing number of global followers many teams now have, it’s easy to understand why global revenues are projected to continue expanding at a compound annual growth rate (CAGR) exceeding 10%. To sustain this growth, data providers and betting companies face a key challenge: consistently ensuring fast, low latency delivery of data and services, even as they scale operations to meet growing demand – every fan needs to receive the same data at the same time.

Low latency, the rapid transmission of data with minimal delay, plays a crucial role in the sports betting ecosystem. A single delay in odds updates or live bet placement can impact user retention and create financial exposure for operators. This challenge becomes even more pressing as companies expand into new geographies, requiring infrastructure that can deliver consistent, low latency experiences worldwide.

Why low latency matters in sports betting

In sports betting, every second matters. For bettors, much of the appeal of sports betting lies in the immediacy of their interaction with live events. Odds must be delivered and displayed in realtime, and delays can result in missed opportunities for bettors and revenue loss for operators.

If odds are not updated instantly after a critical game event, operators risk users betting on outdated odds. For data providers, failure to deliver realtime data can strain relationships with clients, harm reputation and even have legal implications.

The importance of a consistent user experience

Scaling betting operations isn’t just about onboarding new users—it’s about ensuring these users have the same experience, regardless of location or time.

There are two key infrastructure requirements involved in making this possible: consistent low latency and the ability to handle disconnections.

When it comes to latency, users demand the same experience whether they’re betting from Europe, Asia, or the Americas. A lack of consistency can create an uneven playing field – or even cause users to abandon platforms as they develop a mistrust for their data. For both data providers and betting companies, this means achieving similar levels of low latency to clients in diverse geographies. From an infrastructure perspective, this means having a global network of points of presence which devices can connect to and achieve low latency, wherever your users are.

Handling disconnections and subsequent reconnections is particularly important when serving users in markets with unstable network conditions, but it also matters when a user changes data networks or is travelling. Betting operators need a strategy that defines how a bet is treated if it is placed just as the user loses connectivity and the odds change while they are offline: should the original odds be ‘retained’, or should the odds be refreshed upon reconnection? Operators also need to consider how to ensure that users with worse network connectivity don’t become ‘worse off’ as a result.
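The retain-or-refresh question can be made explicit as a policy applied when the client reconnects. The sketch below is illustrative only: `OddsPolicy`, `PendingBet`, and `resolveOddsOnReconnect` are hypothetical names, decimal odds are an assumption, and a real operator's rules would also be shaped by regulation and risk management.

```typescript
// The two options raised above, as a hypothetical policy type.
type OddsPolicy = 'retain' | 'refresh';

interface PendingBet {
  selectionId: string;
  stake: number;
  oddsAtPlacement: number; // decimal odds shown when the user placed the bet
}

interface ResolvedBet {
  odds: number;         // odds the bet will be accepted at
  oddsChanged: boolean; // lets the UI tell the user that odds moved while offline
}

// Decides which odds apply to a bet that was pending while the user was offline.
function resolveOddsOnReconnect(
  bet: PendingBet,
  currentOdds: number,
  policy: OddsPolicy,
): ResolvedBet {
  const oddsChanged = currentOdds !== bet.oddsAtPlacement;
  const odds = policy === 'retain' ? bet.oddsAtPlacement : currentOdds;
  return { odds, oddsChanged };
}
```

Returning the oddsChanged flag alongside the odds keeps the policy decision separate from how the change is communicated, so a user on a poor connection can at least see what happened to their bet.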

How Genius Sports delivers critical data at a global scale

Genius Sports serves betting companies that demand instantaneous data delivery for live sporting events. Before adopting its current realtime infrastructure, the company relied on a traditional centralised system to deliver live data to its betting clients. Maintaining live data performance at low latency and on a global scale meant locating ever larger and more costly on-premise facilities close to customers. The system struggled with scalability and latency consistency, especially during high-demand events like global tournaments. As costs and latency demands increased, Genius Sports needed a new realtime solution.

To ensure a consistent user experience, the company switched to a cloud-native distributed infrastructure that could handle their need to reliably serve customers – regardless of their location. By leveraging a WebSocket-based realtime data streaming solution, Genius Sports is now able to deliver on the high expectations of both its B2B clients and the end-users who rely on their ultra-low latency data delivery without the headache of having to manage and scale the realtime infrastructure.

Responding to the growing competition for fan engagement

As the sports betting market becomes more competitive, betting companies are incorporating innovative ways to deliver realtime experiences to keep their customers ‘on-platform’. This means investing in infrastructure that can not only handle the fast-paced nature of betting and provide instant, low latency updates, but also enable the creation of interactive experiences.

Sportsbet faced a unique challenge with its innovative “Bet With Mates” feature, a product allowing friends to pool resources and bet together. A critical component was a chat feature that mirrored the functionality and speed of modern messaging applications like WhatsApp.

Sportsbet’s existing infrastructure could not deliver the rich user experience their customers expected, especially for features like realtime updates, reactions, and comments. As well as latency needs, Sportsbet also had very stringent security and data handling requirements. For Sportsbet, opting for a cloud-native realtime platform that both handled the underlying infrastructure and latency complexities and made building their chat solution easy was game changing. The solution saved Sportsbet valuable time and effort compared with building even a basic implementation from scratch.

Low latency, high stakes

It is clear that the sports betting industry is at an inflection point. As regulatory changes continue to unlock new opportunities, companies must rise to the challenge of delivering low latency experiences at scale. Whether it’s B2B providers like Genius Sports ensuring data consistency or B2C platforms like Sportsbet creating seamless user experiences, the ability to operate in realtime, no matter where and when, will define success.

How Sportsbet handles 4.5M daily chat messages on its 'Bet With Mates' platform

Ably Blog — Fri, 20 Dec 2024 12:35:55 +0000

Sportsbet is a leader in the Australian wagering market. Through their best-in-class platform, which includes ‘Bet With Mates’, they bring excitement to life for sports and racing enthusiasts - replicating the experience of punting with friends in the pub, no matter where they are.

But after launching 'Bet With Mates', Sportsbet customers were still leaving the platform (second-screening) to talk about their bets and banter with each other in WhatsApp and other chat apps. Sportsbet wanted to introduce that functionality into ‘Bet With Mates’ and provide everything their customers needed without ever having to leave the platform.

Sportsbet needed a chat feature that met their customers’ high expectations of messaging applications.

Selecting the right chat solution for 'Bet With Mates'

When deciding which solution to use for the 'Bet With Mates' chat, there were a few key criteria.

It had to be feature-rich, including reaction and reply functionality, and update in realtime.

As an extremely event-driven business with huge traffic spikes during major events like the Australian Football League, National Rugby League finals, and the Melbourne Cup, the solution needed to be highly performant and scalable.

It had to demonstrate great frontend performance figures, integrate well into Sportsbet’s build pipeline, and be future-proofed for other realtime use cases that developers were planning.

As well as latency needs, Sportsbet also had very stringent security and data handling requirements, so the solution needed to be hosted within Australia on a dedicated cluster.

And finally, the team decided they needed to deliver the new product in less than four months so that it would be live and in customer hands before the next AFL season launch!

Why Sportsbet chose Ably

Based on their requirements, Sportsbet decided to move ahead with Ably after comparing it with other providers. Some key benefits for Sportsbet included:

Great customer support: Early assessments of Ably’s documentation encouraged Sportsbet to quickly move to build a proof of concept. This involved dedicated support from the Ably team to consult on requirements, providing access to SDKs for their chosen tech stack, and creating a sandbox environment to conduct some integration testing and analysis.
Easy to get started: Sportsbet started out building a prototype using Ably React Hooks with their existing React client and were impressed with how quickly they could get a basic chat feature going without having to build services. They then added some components that published events including bet placements and group activity, as well as features like reactions and comments, in the same message stream.
Support for data security: Ably rapidly spun up a new dedicated cluster within Australia specifically for Sportsbet, which removed another potential barrier to being ready for the AFL season. Ably’s SAML integration also enabled Sportsbet to plug into their existing SSO system in record time.

Sportsbet + Ably: The results

The ‘Bet With Mates’ chat feature has proven a hit with fans, contributing to the organic growth of the overall ‘Bet with Mates’ platform. It has also proven sticky – customers who use ‘Bet With Mates’ Chat use it regularly.

Sportsbet put this success down to Ably’s unwavering reliability when it comes to performance and message delivery. They also reported that autoscaling has performed flawlessly without any incidents of concern in over a year, even on high traffic days. Peak figures for these high traffic periods have reached around 4.5 million published messages a day.

Reflecting on the success of the project and relationship with Ably, Andy commented:

“By choosing to partner with Ably, we were able to deliver a high quality outcome in a frankly impressive timeframe, and free up our delivery teams earlier to focus on other initiatives. It’s a testament to the strength of Ably’s offering how much of our time with them is spent discussing other potential use cases rather than the current implementation.”

Ably: The definitive realtime experience platform. Built for scale.

Sportsbet is one of the thousands of companies that depend on Ably to power realtime experiences for billions of people - including live updates, chat, collaboration, notifications and fan engagement. Reliably, securely and at serious scale.

Why choose Ably?

99.999% uptime SLA: We guarantee 5x9s of uptime, but consistently do better. We've had 100% uptime for 5+ years.
No scale ceiling: Ably handles massive amounts of data throughput and concurrent connections without SREs breaking into a sweat.
Strong data integrity: Guaranteed data ordering, delivery, and exactly-once semantics. Even under unreliable network conditions.
Almost-infinite elasticity: Bursty connection traffic? Ably seamlessly and automatically absorbs millions of concurrent connections arriving at once.
Composable realtime: Our range of application building blocks and integrations enable developers to create the live experiences users and businesses demand. From live chat to data broadcast, and collaborative UXs to notifications, our SDKs unlock innovation - with no infrastructure to build.
Customer-first pricing, affordable at scale: Ably's pricing offers per-minute billing, consumption-based pricing, and volume-based discounts to keep you ROI positive, as you scale.

For more information, read our docs, or sign up for free!

DEV Community: Ably

Why SSE breaks down for production AI customer support chat

Key takeaways

What are WebSockets and SSE?

Why the WebSockets vs SSE choice matters for production AI chat

How SSE breaks down under real AI chat conditions

Canceling or interrupting a response mid-stream

Escalating from AI to a human agent mid-conversation

Continuing a conversation across devices

Enterprise proxy and firewall behavior

How to choose between WebSockets and SSE for AI chat

SSE

WebSockets

How Ably AI Transport adds durable sessions on top of WebSockets

FAQ

Why chat.stop() doesn't cancel your LLM generation (and what to build instead)

Why stop() and disconnect mean different things

What a correct stop implementation actually requires

Three questions to ask about your stop button before shipping

How a bidirectional session changes the stop vs disconnect distinction

Canceling a run with Ably AI Transport

Adopting Ably AI Transport: what changes in your stack

Conclusion

Frequently asked questions

When should you replace DefaultChatTransport?

Key takeaways

How DefaultChatTransport works, and the conditions it was built for

Four things DefaultChatTransport can't do in production

How a WebSocket-based transport layer creates a durable session between agent and client

When DefaultChatTransport is still the right choice

Frequently asked questions

Does the Vercel AI SDK support multi-device AI chat out of the box?

Why doesn't stop() cancel server-side generation in Vercel AI SDK?

How much infrastructure does Vercel AI SDK stream resumption require?

When should I replace DefaultChatTransport?

Why replace DefaultChatTransport with a WebSocket-based transport layer?

Vercel AI SDK custom transport vs default transport, what actually changes?

Why AWS ALB and Cloudflare silently kill your AI agent sessions

Why AI agents get disconnected in ways standard apps don't

Why SSE doesn't fit

The fix for idle timeouts: server-side ping frames

Common idle timeouts to plan around

Other connection challenges to consider

Corporate VPN and enterprise proxy traversal

Mobile network handoffs

What transport reconnection recovers — and what it doesn't

How session recovery works

What the user should see during a disconnect

What building this yourself actually costs

Frequently asked questions

How do I stop AI chat sessions from timing out?

What happens if a user disconnects during LLM streaming?

How do I avoid duplicate AI messages after a WebSocket reconnect?

What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?

Does Cloudflare close WebSocket connections? What is the timeout?

Can WebSockets work behind a corporate VPN or enterprise proxy?

How long does Ably retain channel history for session recovery?

Why your AI chat reconnects but your session doesn't

What WebSockets get right for AI chat

Where production AI connections actually fail

Why reconnection logic doesn't fix the session problem

What production AI chat needs from the transport layer

How Ably AI Transport solves the session layer problem

Frequently asked questions

When is SSE still the right choice for AI chat?

What timeout values should I configure to prevent AI connection drops in production?

Does reconnection logic solve the session recovery problem?

How does Ably replay missed messages after a WebSocket reconnect?

Ably AI Transport is now available

The gap: HTTP streaming breaks down for stateful AI UX

What Ably AI Transport does

The key shift: sessions become channels

How Ably AI Transport ensures a resilient, stateful AI UX

Concrete examples

Getting started (low friction)

AWS us-east-1 outage: How Ably’s multi-region architecture held up

Resilience in action: zero service disruption

The technical sequence

Latency impact: negligible

The architecture at work

Why `stop()` and disconnect mean different things