<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ably Blog</title>
    <description>The latest articles on DEV Community by Ably Blog (@ablyblog).</description>
    <link>https://dev.to/ablyblog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F987387%2Ff9d0ea92-d06e-46d6-8efc-6e92b510943e.png</url>
      <title>DEV Community: Ably Blog</title>
      <link>https://dev.to/ablyblog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ablyblog"/>
    <language>en</language>
    <item>
      <title>Your AI UX isn't broken. Your session layer is.</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 23 Jun 2026 10:29:03 +0000</pubDate>
      <link>https://dev.to/ablyblog/your-ai-ux-isnt-broken-your-session-layer-is-3c2b</link>
      <guid>https://dev.to/ablyblog/your-ai-ux-isnt-broken-your-session-layer-is-3c2b</guid>
      <description>&lt;p&gt;Fiona Corden is Ably's Technical Product Manager who works directly with engineering and product teams building AI products. She ran a webinar on AI UX failure modes recently and had more to say than the time allowed. This is the written version: the argument she was making, some context on the demo, and the questions that came up. The recording is below if you'd rather watch than read.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/u0pgBSKZ74A"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The thesis: AI products are being let down by the user experience, not the model.&lt;/p&gt;

&lt;p&gt;Over the past nine months, I've spoken to engineering and product teams at more than 40 companies across a range of industries, all of which are shipping AI agents, assistants, and copilots to real users at scale. They're all running into the same problems despite having little else in common. The root cause isn't model capability, because that is the thing that varies most between them. What they share is the delivery layer between the agent and the client.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Most AI UX failures are delivery failures, not model failures. The fix is moving the session off the connection — a transport-layer swap, not a rewrite.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Most AI UX failures are delivery failures, not model failures. The transport layer is where sessions break, context is lost, and user trust erodes.&lt;/li&gt;
&lt;li&gt;The fix is to stop treating the connection as the session. A durable session is a persistent, shared resource that sits between agent and client, so it survives connection drops, device switches, and agent crashes.&lt;/li&gt;
&lt;li&gt;Moving the session off the connection is a transport-layer swap, not a rewrite — the agent code, model integration, and prompt harness stay the same.&lt;/li&gt;
&lt;li&gt;Once the session lives in the right place, multiple features fall out of the same architecture: multi-tab and multi-device sync, bidirectional agent control, human-AI handoffs with full context, and concurrent subagent coordination.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What users are experiencing
&lt;/h2&gt;

&lt;p&gt;As a user of AI applications, you've probably seen some or all of these failure modes. Niggles that were tolerable when AI experiences were novel are becoming infuriating because they crop up in products we use every day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dropped sessions.&lt;/strong&gt; Something breaks the connection between the user and the agent, leaving the user with no response and sometimes not even the prompt they typed. Connection drops can come from anywhere: network changes, a laptop going to sleep, corporate proxies closing idle streaming connections. None of these are unusual, so if you want a product that feels reliable, you can't treat the routine case as an edge case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent agent failure.&lt;/strong&gt; Something goes wrong on the agent side and the user has no feedback signal. Do they wait? Hit cancel? Start again? A user who can't tell whether the product is working will assume it's broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context loss at handoff.&lt;/strong&gt; The chatbot reaches the limit of what it can handle and passes the conversation to a human support agent. The human has no context, so the user has to repeat everything. In a support context, where the user has usually run out of patience by the time they reach a human at all, this is one of the fastest ways to lose their trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device lock-in.&lt;/strong&gt; People expect to start something on their phone and finish it on their laptop, or have a tab they left open reflect what they just did somewhere else. This is how every other cloud product works — it's a jarring gap when an AI product doesn't behave the same way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No control during generation.&lt;/strong&gt; Some teams have disabled user input during model generation entirely because handling interruptions was too hard. The user is left watching the agent go off track, burning tokens to produce a wrong answer, with no way to stop it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture behind the failures
&lt;/h2&gt;

&lt;p&gt;In most AI products today, the agent and client are connected by a direct pipe — HTTP, SSE streaming, or WebSockets. That pipe is tied to a single request, so the session only exists while those two specific endpoints stay up. If either drops, the session state is gone. Even if the session stays up, the direct pipe architecture means no other client can observe what's happening, and no other agent can join the session.&lt;/p&gt;

&lt;p&gt;The failures above occur because the session is living inside the connection, and connections are not built to be durable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moving to durable sessions
&lt;/h2&gt;

&lt;p&gt;A durable session is a persistent, addressable, shared resource that sits between the agent and client. The connection becomes one way of accessing the session, rather than the session itself — so multiple clients and agents can come and go while the session persists.&lt;/p&gt;

&lt;p&gt;Once the session is a resource in its own right, it can hold a lot more than the conversation transcript. Tool call history can live there, so can presence, so can shared structured state — like which screen the user is currently on or what they last clicked. All of it is kept in sync for anyone connected.&lt;/p&gt;

&lt;p&gt;Using the durable session model unlocks features that would previously have been built as separate development items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every connected client sees an up-to-date view, including a response that's still streaming in.&lt;/li&gt;
&lt;li&gt;A late joiner gets the same complete, correctly ordered history as everyone else — whether that late joiner is a new tab, a different device, or a human support agent stepping in.&lt;/li&gt;
&lt;li&gt;Subagents can publish straight into the session without everything funnelling through a central orchestrator.&lt;/li&gt;
&lt;li&gt;Interruptions work cleanly because the agent receives an explicit cancel, redirect, or steer signal instead of having to infer intent from the state of a connection.&lt;/li&gt;
&lt;li&gt;Presence enables smarter agent behavior: deprioritize work when the user is away, push a notification when the task completes, and let them know if it has to go offline mid-task.&lt;/li&gt;
&lt;/ul&gt;
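
&lt;p&gt;To make the late-joiner behavior concrete, here is a minimal in-memory sketch of a session that lives apart from any one connection. It is an illustration only: &lt;code&gt;DurableSession&lt;/code&gt;, &lt;code&gt;publish&lt;/code&gt;, and &lt;code&gt;attach&lt;/code&gt; are hypothetical names, not Ably's API, and a real session layer would persist the log and fan events out across servers rather than hold them in one process.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical in-memory sketch of a durable session; not Ably's API.
class DurableSession {
  constructor() {
    this.events = [];            // ordered log; serial N lives at index N - 1
    this.listeners = new Set();  // currently attached clients
  }

  // Agents and clients publish into the session, not to each other.
  publish(data) {
    const event = { serial: this.events.length + 1, data: data };
    this.events.push(event);
    for (const listener of this.listeners) listener(event);
    return event.serial;
  }

  // Replay everything after `afterSerial`, then keep delivering live events.
  attach(listener, afterSerial = 0) {
    for (const event of this.events.slice(afterSerial)) listener(event);
    this.listeners.add(listener);
    const listeners = this.listeners;
    return function detach() { listeners.delete(listener); };
  }
}

const session = new DurableSession();

const tabA = [];
const detachA = session.attach(function (e) { tabA.push(e.data); });
session.publish('Hel');
detachA();                 // tab A's connection drops mid-response
session.publish('lo');     // the agent keeps publishing regardless

const phone = [];
session.attach(function (e) { phone.push(e.data); });     // late joiner: full history
session.attach(function (e) { tabA.push(e.data); }, 1);   // tab A reattaches after serial 1
session.publish('!');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In this sketch the reattached tab and the late-joining phone both end up with the same three chunks in the same order, even though neither was connected for the whole response.&lt;/p&gt;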

&lt;h2&gt;
  
  
  What I showed in the demo
&lt;/h2&gt;

&lt;p&gt;The demo was built on the &lt;a href="https://ably.com/docs/ai-transport/getting-started/vercel-ai-sdk" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt;, which lets developers supply their own custom transport layer. We replaced the default HTTP/SSE implementation with durable sessions. That was the only change: no additional infrastructure, no Redis, no server-side buffering code.&lt;/p&gt;

&lt;p&gt;With that in place, two tabs stayed in sync for streaming responses — even across network disconnects. A subagent started in one tab showed up in the other and could be cancelled from there. Two concurrent requests were handled by separate subagents publishing directly into the session, with each agent's output grouped cleanly in the conversation rather than interleaved into a single muddled stream. A human support agent joining partway through got the full history straight away with no recap.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Vercel AI SDK integration and session implementation&lt;/a&gt; are in the docs and repos if you want to look at the code directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions from the session
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How do durable sessions relate to durable execution?&lt;/strong&gt; They sit at different layers and solve different halves of the same reliability story. Durable execution — the category Temporal and similar tools occupy — makes your backend workflows crash-proof. Durable sessions make the client-side conversation crash-proof. Your agent can be fully resumable on the backend and still leave users with a broken experience if the connection handling isn't right, so the two are complementary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do Vercel's or Cloudflare's frameworks give me this already?&lt;/strong&gt; Partly. They give you a stable session address, so a returning user or restarting agent can navigate back to it, which is useful. What they don't have yet is shared structured state, presence, multi-device sync, and multi-agent coordination in a single session, because those frameworks are built around one workflow at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there open source options?&lt;/strong&gt; It depends on your architecture. The &lt;a href="https://workflow-sdk.dev/" rel="noopener noreferrer"&gt;Vercel Workflow DevKit&lt;/a&gt; is a good starting point if you're on Vercel. ElectricSQL has a durable sessions concept for local-first apps. Ably isn't open source, but there's a &lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; with enough usage for meaningful experimentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions to ask about your AI product
&lt;/h2&gt;

&lt;p&gt;If you want to work out whether any of this applies to your own product, here are three questions to start with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there already a problem you're not seeing clearly?&lt;/strong&gt; Look at your CSAT themes once you've filtered out complaints that are really about answer quality. Examine how your session lengths are distributed — a lot of short sessions with small gaps between them can mean people losing context and restarting rather than getting anywhere. It's also worth speaking to engineering, because they may already be building pieces of a session layer ad hoc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you want to build next?&lt;/strong&gt; Notice which experiences on your roadmap assume multi-device continuity, bidirectional control, or human-AI handoffs — those have a session architecture requirement underneath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build or buy?&lt;/strong&gt; If you care about behaving consistently across several products, or the maintenance cost of your bespoke session layer is growing, a dedicated platform is worth evaluating.&lt;/p&gt;

&lt;p&gt;If you want to explore further, the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;docs&lt;/a&gt; are a good place to start. There's also a &lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; if you want to experiment. And if you'd rather talk through your specific architecture, you can &lt;a href="https://ably.com/contact" rel="noopener noreferrer"&gt;book a call&lt;/a&gt; through the website.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Further reading: &lt;a href="https://ably.com/blog/durable-sessions-infrastructure-layer-ai-agents" rel="noopener noreferrer"&gt;Why we're betting on Durable Sessions&lt;/a&gt; · &lt;a href="https://ably.com/blog/ai-agent-model-is-fine-but-the-session-is-broken" rel="noopener noreferrer"&gt;The model is fine. The session is broken.&lt;/a&gt; · &lt;a href="https://ably.com/blog/durable-ai-infrastructure-sessions-streams-transports-ecosystem" rel="noopener noreferrer"&gt;The Durable Sessions stack is forming&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Have you hit any of these failure modes in production? Curious what your session layer looks like right now.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>architecture</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Your AI agent kept running. Your user's session didn't survive it.</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:18:12 +0000</pubDate>
      <link>https://dev.to/ablyblog/your-ai-agent-kept-running-your-users-session-didnt-survive-it-49m2</link>
      <guid>https://dev.to/ablyblog/your-ai-agent-kept-running-your-users-session-didnt-survive-it-49m2</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; AI agent sessions break in production in four specific ways — and none of them are model problems. This post covers the failure modes, why the standard Redis buffer workaround has sharp edges, and what a proper session layer actually needs to provide.&lt;/p&gt;




&lt;p&gt;Take any AI agent demo from the last six months. It works. Now ship it to real users on real networks, real devices, real attention spans. A meaningful share of those users will never finish their first conversation cleanly. Not because the model gave a bad answer. Because the connection dropped, the tab refreshed, the phone took over from the laptop, or the spinner kept spinning forever.&lt;/p&gt;

&lt;p&gt;We interviewed 38 companies building AI products at scale and evaluated 37 vendors across the AI infrastructure landscape. Almost everyone is hitting the same wall. None of the problems are model problems. And there isn't a layer in the stack today that solves them by default.&lt;/p&gt;

&lt;p&gt;Here's what's actually breaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four failure modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Streams break and you lose the live state
&lt;/h3&gt;

&lt;p&gt;HTTP streaming over SSE works fine in development. In production, every hop between server and user has its own timeout:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS ALB&lt;/strong&gt; kills idle connections after &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/edit-load-balancer-attributes.html" rel="noopener noreferrer"&gt;60 seconds by default&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare&lt;/strong&gt; returns a &lt;a href="https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-524/" rel="noopener noreferrer"&gt;524 after ~100 seconds&lt;/a&gt; for proxied origins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Istio/Envoy&lt;/strong&gt; default to a 5-minute stream idle timeout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Corporate proxies&lt;/strong&gt; buffer un-chunked &lt;code&gt;text/event-stream&lt;/code&gt; responses (no Content-Length)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile carriers&lt;/strong&gt; rebind NAT entries on idle TCP flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browsers&lt;/strong&gt; throttle background tabs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When any of these fire, you can replay completed state from a buffer if you wrote one. The bit the user was actually watching — the live stream — is gone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99o4r3atbaxrd52e9k00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99o4r3atbaxrd52e9k00.png" alt="Token stream graph showing connection cut at 157 tokens versus resumed stream completing in 340ms" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Sessions belong to the browser tab, not the user
&lt;/h3&gt;

&lt;p&gt;Almost every agent framework is point-to-point: one connection, one device. Switch from laptop to phone, the conversation doesn't follow. Refresh, the live stream is gone.&lt;/p&gt;

&lt;p&gt;This isn't a framework failure — it's an HTTP constraint. Vercel and TanStack have both shipped connection adapter interfaces specifically so a different transport can be plugged in. Of the 37 vendors we evaluated, 32 have no multi-device fan-out for AI sessions at all.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F324bzyb44yyzvruhtb2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F324bzyb44yyzvruhtb2t.png" alt="Laptop starting a session, then phone showing no session versus full history restored" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Users can't interrupt mid-stream
&lt;/h3&gt;

&lt;p&gt;Once an agent starts generating, HTTP gives you no clean way to route a new instruction back to the running agent. The request is in flight. The response stream is one-directional.&lt;/p&gt;

&lt;p&gt;We spoke to one of the largest customer support platforms in the world. They disabled all user input while the agent was responding — handling interruption reliably was technically too difficult with SSE. You've felt this: watching the agent go down the wrong path, unable to stop it.&lt;/p&gt;

&lt;p&gt;Coding agents like Claude Code are the leading indicator. Once users get used to interrupting mid-stream, they'll expect it from every agent product they touch.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Agents fail silently
&lt;/h3&gt;

&lt;p&gt;From the client's perspective, an idle SSE connection looks identical to a dead one. When an agent crashes, stalls, or loses its connection, the client can't tell the difference between a thinking agent, a stalled agent, and a dead agent — three completely different states, indistinguishable on screen.&lt;/p&gt;

&lt;p&gt;33 of the 37 vendors we evaluated have no agent health signal at the infrastructure level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjjphbxe7bpkuuyly44a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjjphbxe7bpkuuyly44a.png" alt="Agent UI showing spinning dots with unknown status versus step-by-step progress" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What most teams build instead
&lt;/h2&gt;

&lt;p&gt;The pattern is consistent across teams. Engineers add a Redis buffer between agent and client for live stream replay on reconnect. They build polling or queueing so a new instruction can find the right running agent. They add fan-out for multi-device.&lt;/p&gt;

&lt;p&gt;Vercel's lead maintainer put it plainly in a widely referenced GitHub issue: &lt;em&gt;"to solve this we would need to have a channel to the server that allows transporting that information. WebSockets are one option."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A Pydantic AI user on Hacker News: &lt;em&gt;"a lot of glue."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every serious production team independently arrives at the same conclusion: &lt;strong&gt;generation has to be decoupled from delivery.&lt;/strong&gt; Most end up building their own version of the same architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why reconnection alone isn't enough
&lt;/h2&gt;

&lt;p&gt;The instinct is to treat this as a streaming reliability problem — reconnects, timeouts, duplicate tokens. That's part of it, but only part.&lt;/p&gt;

&lt;p&gt;The real category is what we call &lt;strong&gt;durable sessions&lt;/strong&gt;: a persistent, addressable layer between agents and users that outlives any individual connection, device, or participant.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disconnect and reconnect → session is still there&lt;/li&gt;
&lt;li&gt;Switch devices → session follows you
&lt;/li&gt;
&lt;li&gt;Agent crashes and respawns → session survives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is different from durable execution (Temporal makes the backend crash-proof). Durable sessions make the &lt;em&gt;user experience&lt;/em&gt; crash-proof. Both matter. They solve different halves.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reconnection becomes reattachment&lt;/strong&gt; — client requests from its last serial, gets everything missed in order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Device switch is just another subscriber&lt;/strong&gt; joining the same channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent coordination&lt;/strong&gt; is fan-in to a shared session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Presence&lt;/strong&gt; lets the agent know when no one's watching (pause expensive work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Organisation-side handover&lt;/strong&gt; — a supervisor joins a live session on a different device, hours later, with full context&lt;/li&gt;
&lt;/ul&gt;
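
&lt;p&gt;The reattachment step in the list above can be sketched from the client's side. This is a minimal illustration, assuming the session assigns each event a sequential serial starting at 1. &lt;code&gt;SessionCursor&lt;/code&gt;, &lt;code&gt;eventsAfter&lt;/code&gt;, and the sample log are hypothetical stand-ins for whatever the session layer actually provides.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical client-side cursor: remembers the last serial it applied.
class SessionCursor {
  constructor() {
    this.lastSerial = 0;
    this.text = '';
  }

  // Apply an event only if it is the next one in order.
  // Duplicates and out-of-order events are dropped.
  apply(event) {
    if (event.serial !== this.lastSerial + 1) return false;
    this.text += event.token;
    this.lastSerial = event.serial;
    return true;
  }
}

// Stand-in for the session's stored log (serial N lives at index N - 1).
const log = [
  { serial: 1, token: 'Your ' },
  { serial: 2, token: 'flight ' },
  { serial: 3, token: 'is ' },
  { serial: 4, token: 'booked.' },
];

// Session side of reattachment: everything after the client's last serial.
function eventsAfter(serial) {
  return log.slice(serial);
}

const cursor = new SessionCursor();
cursor.apply(log[0]);
cursor.apply(log[1]);

// The connection drops here. On reattach the client sends its last serial
// and receives only what it missed, in order.
for (const event of eventsAfter(cursor.lastSerial)) cursor.apply(event);

// A late duplicate of an already-applied event is ignored.
const duplicateApplied = cursor.apply(log[3]);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A second device would run the same loop starting from serial 0, which is why a device switch needs no special handling in this model.&lt;/p&gt;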

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0871j0ulffff2k97z44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0871j0ulffff2k97z44.png" alt="Architecture diagram showing durable sessions layer connecting users, agent frameworks, and AI models" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;The frontier labs spend tens of millions of engineering dollars building this layer themselves. Everyone else either accepts the broken experience or burns engineering cycles rebuilding fragments of it.&lt;/p&gt;

&lt;p&gt;The delivery problem is where the work is now. The model is fine. The session is what breaks.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; is the session layer for this gap. The &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;docs&lt;/a&gt; go deeper on the session model.&lt;/p&gt;

&lt;p&gt;Which of these failure modes have you hit? Have you disabled user input during agent responses as a workaround?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>webdev</category>
      <category>realtime</category>
    </item>
    <item>
      <title>AI agent streaming in action: barge-in, human handover, and session continuity</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:08:03 +0000</pubDate>
      <link>https://dev.to/ablyblog/ai-agent-streaming-in-action-barge-in-human-handover-and-session-continuity-45g8</link>
      <guid>https://dev.to/ablyblog/ai-agent-streaming-in-action-barge-in-human-handover-and-session-continuity-45g8</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; AI agent streams break in ways most frameworks don't handle: dropped connections, mid-task interruptions, human handovers across devices. This post walks through a live demo of how Ably AI Transport handles all three — barge-in via explicit cancel signals, durable organization-side HITL, and decoupled multi-agent progress via LiveObjects.&lt;/p&gt;




&lt;p&gt;You're mid-conversation with an AI support agent. You've explained the problem, the agent is halfway through a response, and the connection drops. When you reconnect, the response is gone.&lt;/p&gt;

&lt;p&gt;You type the same question again. The agent asks the same clarifying questions again. Three minutes of context, gone. Not because the model forgot it, but because the delivery layer stored nothing.&lt;/p&gt;

&lt;p&gt;Connection drops, page refreshes, and device switches all fail for the same reason: session state lives in the delivery connection, not independently of it. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; fixes this by storing the session in a channel that outlasts any individual connection. The demo below covers barge-in, human handover, and multi-agent coordination in depth: the primitives most production teams end up building from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connection drops restart most AI streams from scratch. Ably AI Transport buffers session output in the channel, so clients reconnect and catch up without re-running inference.&lt;/li&gt;
&lt;li&gt;Barge-in requires a bi-directional channel. SSE can't distinguish a user interrupt from a network drop; AI Transport delivers cancel and redirect as explicit channel signals the agent acts on.&lt;/li&gt;
&lt;li&gt;Organization-side human handover — where a supervisor joins a live session on a different device hours later — is the HITL case most frameworks leave unsolved. AI Transport's durable session persists the pending approval in channel history until the right person responds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/5xZen_ZOevM"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Mike Christensen (Pub/Sub team lead at Ably) walks through all of these primitives in a live multi-agent holiday planning app. The sections below follow the same chapter structure as the video.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI agent streams break in production
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Connection drops mid-stream.&lt;/strong&gt; Standard HTTP streaming stores no session state server-side. When the connection closes, the tokens generated during the gap disappear: the delivery layer was never asked to hold them. The client reconnects to an empty state and re-prompts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page refresh loses the stream.&lt;/strong&gt; Most AI implementations store token state in the browser: React component state, a JavaScript variable tracking the partial response. When the page reloads, that state is gone. The agent has no awareness that the client disappeared mid-generation, and no mechanism to re-stream output that it already produced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device switches lose the session.&lt;/strong&gt; Sessions are tied to connections, and connections are tied to devices. Move from laptop to phone and the conversation doesn't follow. The new device has no path to the session's history.&lt;/p&gt;

&lt;p&gt;All three share the same root cause. Generation state is coupled to a single delivery connection. Decoupling them — by storing the session in a channel that outlasts any individual connection — is what fixes all three at once. For a deeper look at timeout sources and protocol fallback, see &lt;a href="https://ably.com/blog/is-websockets-enough-ai-chat" rel="noopener noreferrer"&gt;Is WebSockets enough for AI chat?&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Ably AI Transport handles connection recovery and session continuity
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Server-side buffering and offset-based replay.&lt;/strong&gt; Every token the agent publishes goes to the session channel as it's generated, regardless of whether the client is connected. On reconnect, AI Transport uses &lt;code&gt;untilAttach&lt;/code&gt; to deliver everything published during the gap, in order, before the live stream resumes. The LLM never re-runs; the client catches up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session on the channel, not the connection.&lt;/strong&gt; The session lives in the channel, not in the connection that opened it. Any device subscribing to the same channel name joins the same session: full conversation history, followed by the live stream from its current position. Two browser tabs, a laptop and a phone, a page reload mid-response: all receive the same unbroken state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Channel history for context.&lt;/strong&gt; When a client has been offline beyond the live recovery window, channel history provides the full conversation. Clients load older messages using &lt;code&gt;view.loadOlder()&lt;/code&gt;, paginating back through the session until they have the full context. For users who are offline entirely, push notifications via FCM, APNs, or Web Push can deliver agent completions when they return. Push notification delivery is currently marked as Partial in the feature set.&lt;/p&gt;
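
&lt;p&gt;Paging back through history can be pictured with a small sketch. It borrows the &lt;code&gt;loadOlder&lt;/code&gt; name from the text above, but &lt;code&gt;createView&lt;/code&gt;, &lt;code&gt;hasOlder&lt;/code&gt;, the page size, and the array-backed history are assumptions made for illustration. It does not show the AI Transport implementation or its real signature.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical paginated view over stored session history.
function createView(history, pageSize) {
  let start = history.length;   // index of the oldest message loaded so far

  const view = {
    messages: [],

    // True while there are messages older than the ones already loaded.
    hasOlder: function () {
      return start !== 0;
    },

    // Prepend the previous page so messages stay in chronological order.
    loadOlder: function () {
      const from = Math.max(0, start - pageSize);
      view.messages = history.slice(from, start).concat(view.messages);
      start = from;
      return view.messages;
    },
  };
  return view;
}

const history = ['m1', 'm2', 'm3', 'm4', 'm5'];
const view = createView(history, 2);

view.loadOlder();   // loads the newest page: m4, m5
view.loadOlder();   // prepends m2, m3
view.loadOlder();   // prepends m1; nothing older remains
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;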

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=5xZen_ZOevM&amp;amp;t=887s" rel="noopener noreferrer"&gt;In the demo, Mike refreshes the page mid-stream, and the response picks up exactly where it stopped&lt;/a&gt;. Two windows open side by side show the same in-progress response, updating simultaneously.&lt;/p&gt;

&lt;p&gt;Session continuity is the infrastructure layer. What happens on top of it (how users interact with agents in motion, how human operators step in, how multiple agents coordinate) depends on it being in place.&lt;/p&gt;

&lt;p&gt;The next four sections cover the interaction patterns the demo shows: what the user sees while the agent works, how they interrupt or redirect it, how a human operator takes over with full context, and how multiple specialised agents surface progress independently. All four require the session to be live and visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent progress visibility: what the user sees while the agent works
&lt;/h2&gt;

&lt;p&gt;A user can only meaningfully interrupt an agent they can see working. Progress visibility is the prerequisite for both barge-in and human handover. Without visibility, users have no basis for interrupting: they're cancelling a process they can't see, with no information about whether to wait or redirect.&lt;/p&gt;

&lt;p&gt;The demo surfaces four types of progress signal. Token streaming shows what the orchestrator is generating. &lt;a href="https://ably.com/liveobjects" rel="noopener noreferrer"&gt;Ably LiveObjects&lt;/a&gt; carries the structured progress state from each of the three specialist agents: flights, hotels, and activities. Presence shows which agents are active in the session, and task history shows what each has completed.&lt;/p&gt;

&lt;p&gt;Each signal comes from a different source, and each arrives independently. All three specialist agents publish their progress directly, without routing through the orchestrator, so the user sees the live state from each agent simultaneously. Each agent also converts its raw query parameters into natural language using a separate model call: progress cards show "Searching for direct flights on the 14th" rather than a query object. That's what makes barge-in useful. The user's decision to interrupt is based on accurate realtime information, not a stale snapshot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Barge-in: how users interrupt and redirect agents mid-response
&lt;/h2&gt;

&lt;p&gt;In Ably's customer discovery research — which Ably's CEO, Matthew O'Riordan, walks through &lt;a href="https://www.youtube.com/watch?v=DSmCCimWmII" rel="noopener noreferrer"&gt;in this talk&lt;/a&gt; — interruption emerged as a critical piece of functionality once teams moved to asynchronous agent experiences. One team disabled user input entirely: with SSE, a user's stop signal looks identical to a network drop, so there was no safe way to act on it.&lt;/p&gt;

&lt;p&gt;AI Transport changes this because the channel is bi-directional. User input arrives as a specific channel event, not a connection side effect, so the agent can act on it reliably while remaining live.&lt;/p&gt;

&lt;p&gt;Two patterns are available, and the choice depends on what you want the user to see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cancel-then-send&lt;/strong&gt; is the more common of the two. Call &lt;code&gt;transport.cancel()&lt;/code&gt; and it publishes an explicit cancel signal on the channel: the server's abort fires, the LLM stream stops, and the turn ends with reason &lt;code&gt;'cancelled'&lt;/code&gt;. The session stays intact and the next message starts a clean turn. In the demo, Mike says "I want to visit a museum" while the activities agent is mid-search, the kind of redirect where there's no value in letting the original task finish. &lt;code&gt;transport.cancel()&lt;/code&gt; fires, the search stops, and the agent starts fresh on the museum query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handleSend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;activeTurns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Send-alongside&lt;/strong&gt; is the alternative. It sends a new message without cancelling the active turn, so both run concurrently: the agent continues the first response while processing the new input. You can cancel a specific turn using &lt;code&gt;transport.cancel({ turnId })&lt;/code&gt; if needed. Send-alongside is appropriate when you want the user to see both responses: for example, a clarifying follow-up while the agent is finishing its response, or a comparison query where both outputs are useful.&lt;/p&gt;
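&lt;p&gt;To make the difference between the two patterns concrete, here is a minimal sketch of send-alongside followed by a targeted cancel. It runs against a hypothetical in-memory stand-in, not the AI Transport SDK: &lt;code&gt;createMockTransport&lt;/code&gt;, &lt;code&gt;statusOf&lt;/code&gt;, and the shape of &lt;code&gt;send&lt;/code&gt; are assumptions made for illustration. Only the &lt;code&gt;cancel({ turnId })&lt;/code&gt; call shape comes from the text above.&lt;/p&gt;

```javascript
// Hypothetical in-memory stand-in for a transport; not the AI Transport SDK.
function createMockTransport() {
  const turns = new Map(); // turnId mapped to { text, status }
  let nextId = 1;
  return {
    // Start a new turn without touching any turn that is already running.
    send: function (text) {
      const turnId = 'turn-' + nextId;
      nextId += 1;
      turns.set(turnId, { text: text, status: 'active' });
      return turnId;
    },
    // Cancel one turn when turnId is given, otherwise every active turn.
    cancel: function (options) {
      const only = options ? options.turnId : undefined;
      for (const [turnId, turn] of turns) {
        const matches = only === undefined || only === turnId;
        if (matches) {
          if (turn.status === 'active') turn.status = 'cancelled';
        }
      }
    },
    // Illustration-only helper for inspecting a turn.
    statusOf: function (turnId) {
      return turns.get(turnId).status;
    },
  };
}

const transport = createMockTransport();

// Send-alongside: the follow-up starts while the first turn keeps running.
const first = transport.send('Find hotels near the venue');
const followUp = transport.send('Also, which ones have parking?');

// Later, drop only the first turn; the follow-up is unaffected.
transport.cancel({ turnId: first });

console.log(transport.statusOf(first));    // cancelled
console.log(transport.statusOf(followUp)); // active
```

&lt;p&gt;Calling &lt;code&gt;cancel()&lt;/code&gt; with no argument in this sketch would end both turns, which corresponds to the cancel-then-send behaviour described earlier.&lt;/p&gt;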

&lt;p&gt;Full API reference: &lt;a href="https://ably.com/docs/ai-transport/features/interruption" rel="noopener noreferrer"&gt;Interruption and barge-in docs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Human-in-the-loop: getting full session context to an operator on any device
&lt;/h2&gt;

&lt;p&gt;Most frameworks implement one variant of human-in-the-loop (HITL) and leave the other unsolved. But the distinction between them matters in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User-side HITL&lt;/strong&gt; is the pattern where the agent pauses and asks the user to approve an action before executing, for example "Should I book this flight?" The user approves or rejects, and the agent continues. Almost every agent framework has this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Organization-side HITL&lt;/strong&gt; is the harder case. The agent needs to escalate to an internal supervisor: someone who may be on a different device, in a different time zone, and who might not respond for hours. This is the customer support scenario: a human agent takes over mid-conversation, with full context, without the user re-explaining anything. Most frameworks leave this unsolved.&lt;/p&gt;

&lt;p&gt;AI Transport handles both through the same mechanism. The agent defines a tool that pauses for human input rather than executing automatically. When the LLM decides it needs approval and invokes this tool, AI Transport stops the turn and publishes the pending request to the channel as a durable message.&lt;/p&gt;

&lt;p&gt;Any connected client sees it and can resolve it by calling &lt;code&gt;view.update()&lt;/code&gt;. A supervisor joining on a different device hours later sees the same pending request in channel history.&lt;/p&gt;

&lt;p&gt;The approval is a durable channel message, not a live server process waiting to time out. Calling &lt;code&gt;view.update()&lt;/code&gt; triggers a continuation turn, and the agent picks up where it paused.&lt;/p&gt;
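&lt;p&gt;The idea that an approval lives in the message log rather than in a waiting process can be sketched in a few lines. This is an illustration only: the array stands in for channel history, and the message types &lt;code&gt;approval.requested&lt;/code&gt; and &lt;code&gt;approval.resolved&lt;/code&gt; are hypothetical names, not the AI Transport wire format.&lt;/p&gt;

```javascript
// Hypothetical message shapes; an in-memory array stands in for channel history.
const history = [];

function publish(message) {
  history.push(message);
}

// A request is pending if no message in the log resolves it.
function pendingApprovals(messages) {
  const resolved = new Set();
  for (const m of messages) {
    if (m.type === 'approval.resolved') resolved.add(m.requestId);
  }
  return messages.filter(function (m) {
    return m.type === 'approval.requested' ? !resolved.has(m.requestId) : false;
  });
}

// The agent pauses and records the request as a message.
publish({ type: 'approval.requested', requestId: 'req-1', action: 'Book the selected flight' });

// Hours later, a supervisor on another device reads history and sees it.
const seenBySupervisor = pendingApprovals(history);
console.log(seenBySupervisor.length); // 1

// Resolving is just another message; nothing had to stay running in between.
publish({ type: 'approval.resolved', requestId: 'req-1', decision: 'approved' });
console.log(pendingApprovals(history).length); // 0
```

&lt;p&gt;Because pending state is derived from the log, any client that can read history reaches the same answer, regardless of when or where it connects.&lt;/p&gt;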

&lt;p&gt;Organization-side escalation is available today; implementation guides are being finalized.&lt;/p&gt;

&lt;p&gt;Full implementation detail: &lt;a href="https://ably.com/docs/ai-transport/messaging/human-in-the-loop" rel="noopener noreferrer"&gt;Human-in-the-loop docs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-agent coordination and shared state via Ably LiveObjects
&lt;/h2&gt;

&lt;p&gt;Routing all agent activity through a central orchestrator creates a bottleneck. Every progress update has to pass through the coordinator before it appears to the user. At the scale of a multi-step, multi-agent workflow, that lag accumulates.&lt;/p&gt;

&lt;p&gt;This demo takes a different approach. The orchestrator delegates to three specialist agents: flights, hotels, and activities, all running concurrently. Each specialist publishes its progress directly to Ably LiveObjects — bypassing the orchestrator entirely for user-facing updates.&lt;/p&gt;

&lt;p&gt;The orchestrator waits for final results. The user sees live progress bars from all three agents updating in realtime, independently.&lt;/p&gt;

&lt;p&gt;LiveObjects carries more than progress signals. User selections (flight, hotel, and activities choices) are written to LiveObjects state the moment the user makes a choice. When the user later asks "What's my current itinerary?", the orchestrator reads directly from LiveObjects rather than reconstructing context from chat history. If the user deleted a selection outside the chat thread, the agent sees that immediately. The conversation is one interface to the system; the source of truth is the state.&lt;/p&gt;
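&lt;p&gt;A small sketch shows why reading from state differs from reconstructing from chat history. A plain &lt;code&gt;Map&lt;/code&gt; stands in for the shared state here; this is not the LiveObjects API, and the helper names &lt;code&gt;select&lt;/code&gt; and &lt;code&gt;currentItinerary&lt;/code&gt; are hypothetical.&lt;/p&gt;

```javascript
// A plain Map stands in for shared state; this is not the LiveObjects API.
const itinerary = new Map();
const chatHistory = [];

// Record a selection in state and note it in the chat transcript.
function select(kind, choice) {
  itinerary.set(kind, choice);
  chatHistory.push('User selected ' + kind + ': ' + choice);
}

select('flight', 'Direct, 14th, morning');
select('hotel', 'City centre, 3 nights');

// The user removes the hotel from a UI panel, outside the chat thread.
itinerary.delete('hotel');

// Answering from state reflects the deletion; chat history alone would not.
function currentItinerary() {
  return Object.fromEntries(itinerary);
}

console.log(currentItinerary()); // { flight: 'Direct, 14th, morning' }
console.log(chatHistory.length); // 2
```

&lt;p&gt;The transcript still contains both selections, so an agent that rebuilt the itinerary from it would report a hotel the user has already removed.&lt;/p&gt;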

&lt;p&gt;This matters because the user-facing update rate is decoupled from the orchestrator's coordination cycle. Each agent surfaces progress as fast as it produces it, with no relay step in between.&lt;/p&gt;

&lt;p&gt;Presence adds a further signal: agents can check whether the user is actually connected before streaming. An agent completing a search while the user is offline can push a notification rather than stream into a disconnected channel.&lt;/p&gt;
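&lt;p&gt;As a rough sketch of that decision, assuming the agent already holds a list of client IDs currently present on the channel: the function name, the client IDs, and the two return values below are hypothetical, and how the presence snapshot is obtained is left out.&lt;/p&gt;

```javascript
// Hypothetical helper: choose a delivery path from a presence snapshot.
function chooseDelivery(presentClientIds, userId) {
  return presentClientIds.includes(userId) ? 'stream' : 'push-notification';
}

// The user is connected alongside an agent, so results can be streamed.
console.log(chooseDelivery(['agent-flights', 'user-mike'], 'user-mike')); // stream

// Only the agent is present, so fall back to a notification.
console.log(chooseDelivery(['agent-flights'], 'user-mike')); // push-notification
```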

&lt;p&gt;&lt;strong&gt;LiveObjects availability:&lt;/strong&gt; GA in JavaScript. Experimental in Swift and Java.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/liveobjects" rel="noopener noreferrer"&gt;Learn more about Ably LiveObjects&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Session continuity, barge-in, and human handover aren't features that sit on top of an AI stack. They're properties of the delivery layer underneath it. The session channel is what makes them composable: the same mechanism that replays tokens on reconnect makes a pending approval durable, and lets a supervisor join a live conversation hours after it started. Most teams reach for these patterns eventually. The question is whether you build them yourself or start with infrastructure that already has them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport documentation&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Which of these production problems have you hit building AI agents? Have you had to disable user input during agent responses as a workaround? Curious what the reconnection side looked like before having a session layer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>websockets</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>LiveObjects now available: shared state without the infrastructure overhead</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 05 May 2026 08:48:23 +0000</pubDate>
      <link>https://dev.to/ablyblog/liveobjects-now-available-shared-state-without-the-infrastructure-overhead-hin</link>
      <guid>https://dev.to/ablyblog/liveobjects-now-available-shared-state-without-the-infrastructure-overhead-hin</guid>
      <description>&lt;p&gt;Shared state is a hard problem. Not hard in the abstract, computer-science sense (the concepts are well understood). Hard in the &lt;em&gt;someone has to actually build this&lt;/em&gt; sense, where every team that wants a live leaderboard, a shared config panel, or a poll that updates in real time ends up reinventing the same wheels: conflict resolution, reconnection handling, state recovery.&lt;/p&gt;

&lt;p&gt;Most teams do not want to spend their time building and maintaining that layer. They want to ship the feature that depends on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is what &lt;a href="https://ably.com/docs/liveobjects" rel="noopener noreferrer"&gt;LiveObjects&lt;/a&gt; is for.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From experimental to production-ready
&lt;/h2&gt;

&lt;p&gt;When we first shipped LiveObjects, the API was explicitly experimental. We had the primitives (LiveMap for synchronized key-value state, LiveCounter for distributed counting) but the ergonomics needed work. Early adopters were clear: working directly with object instances felt brittle, especially when objects were replaced. Subscriptions broke. Navigating nested structures was cumbersome. The mental model didn't fit how people actually wanted to build.&lt;/p&gt;

&lt;p&gt;So we rebuilt the API from the ground up. The result shipped in the JavaScript SDK before the end of last year, moving LiveObjects into Public Preview. It's centered on path-based operations. Instead of binding to specific object instances, you work with PathObjects that resolve at runtime against whatever exists at that location. Replace the object underneath, and your subscriptions follow automatically.&lt;/p&gt;

&lt;p&gt;That feedback loop, from experimental signal to a redesigned API, is what today's release reflects. The API is stable and ready for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A new API designed around how you think about state
&lt;/h3&gt;

&lt;p&gt;The old approach required holding references to specific object instances, which meant reasoning about object identity rather than the shape of your data. The new PathObject API flips this: you describe a path, and the SDK handles the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Ably&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ably&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LiveObjects&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveMap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveCounter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ably/liveobjects&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Ably&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LiveObjects&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;game:room-42&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;modes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OBJECT_SUBSCRIBE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OBJECT_PUBLISH&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaderboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveMap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;alice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LiveCounter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;bob&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LiveCounter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
  &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;round&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveCounter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaderboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;renderLeaderboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaderboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Subscriptions now observe paths rather than instances. If the underlying object is replaced, your subscription keeps working. No rewiring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Object resets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the most requested features: reset an object to a clean state without tearing down and recreating the channel. Previously, the workaround was destroying the channel entirely, which forced clients to reconnect, reattach subscribers, re-establish presence, and race the teardown. Object resets remove all of that. Useful for a new game round, a cleared poll, or a reset config without losing connection state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliable data expiry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;State now expires reliably after 90 days by default. Previously this was best-effort. If you're building anything with ephemeral sessions or time-bounded content, you can depend on this rather than writing your own cleanup logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revised object limits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 100-object-per-channel limit has been revised to apply sensibly to top-level objects. Applications with nested structures can model data naturally without counting objects or designing workarounds to stay under the limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easier map handling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.compact()&lt;/code&gt; and &lt;code&gt;.compactJson()&lt;/code&gt; convert any LiveMap tree to a plain JavaScript object in one call, useful for rendering, serialization, or passing state to code that doesn't know about LiveObjects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaderboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// { alice: 120, bob: 95 }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What you can build
&lt;/h2&gt;

&lt;p&gt;Live polls, leaderboards, collaborative forms, shared dashboards: any feature where multiple clients write to the same state and see each other's changes immediately. LiveObjects handles these well. But the use case we're focused on most right now is AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State sync for AI sessions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents need to share context. Not respond to a single request and forget it, but maintain a live picture of what's happening: what the user is working on, what tasks are in progress, what the session looks like.&lt;/p&gt;

&lt;p&gt;The naive approach is polling or rebuilding session context on every request. That works until it doesn't. Agents diverge, state drifts, and the coordination layer becomes the thing your team maintains instead of the product.&lt;/p&gt;

&lt;p&gt;LiveObjects is a cleaner mechanism. Multiple clients and agents read and write shared state simultaneously, conflicts are resolved automatically, and every subscriber sees updates the moment they land.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;session&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;current_task&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarizing document&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;progress&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveCounter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;context&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LiveMap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;page_title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Q3 Report&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;selected_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Revenue grew 24% YoY...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;session&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;updateAgentStatusPanel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're building AI applications and using Ably for token streaming, LiveObjects handles the state layer: what the model is working with, what it's doing, and what users can steer in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple SDKs, production-ready
&lt;/h2&gt;

&lt;p&gt;LiveObjects is available in JavaScript today, with Swift and Java coming in the weeks that follow. Platforms without a native client yet can use LiveObjects now through inband objects and the REST API.&lt;/p&gt;

&lt;p&gt;We're making a stability commitment for each SDK when it reaches the bar, not flipping a global flag while only one runtime is actually ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;The LiveObjects plugin ships as part of the standard SDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npm install ably
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://ably.com/docs" rel="noopener noreferrer"&gt;LiveObjects docs&lt;/a&gt; have quick-start guides and a migration guide if you're upgrading from the experimental API. If you're building AI applications, the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;AI Transport docs&lt;/a&gt; cover how LiveObjects fits into the state sync layer.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>distributedsystems</category>
      <category>news</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Multi-device AI session continuity: how cross-device conversation sync works</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 05 May 2026 08:29:44 +0000</pubDate>
      <link>https://dev.to/ablyblog/multi-device-ai-session-continuity-how-cross-device-conversation-sync-works-2i1d</link>
      <guid>https://dev.to/ablyblog/multi-device-ai-session-continuity-how-cross-device-conversation-sync-works-2i1d</guid>
      <description>&lt;p&gt;&lt;strong&gt;Written by Amber Dawson&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You start a research task on your laptop, the network drops during a meeting, and when you open your phone to continue, the conversation is gone – you re-prompt, get partial duplicate results, and lose 30 minutes of work. The delivery layer dropped it. That's one of the most consistent problems teams hit when building AI applications.&lt;/p&gt;

&lt;p&gt;It's particularly acute in customer support, where a session belongs to the conversation – not to any single device, connection, or participant. An AI agent handles a query, the user switches from desktop to mobile mid-interaction, a human needs to step in. Every one of those transitions is a point where the session can silently break.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this breaks
&lt;/h2&gt;

&lt;p&gt;HTTP streaming is stateless. Each connection is independent, tied to a specific device and browser session, so when the user switches devices, refreshes, or loses connectivity, the new device has no position in the stream. It doesn't know which tokens the previous device received, it can't resume mid-response, and it starts over.&lt;/p&gt;

&lt;p&gt;There's no shared state across connections. Device B has no visibility into what Device A received, and without session tracking built into the architecture, the server treats each connection as a new actor. A stateless delivery layer wasn't designed for conversations that span sessions, devices, or time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhqad63sn462yjv6ln78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhqad63sn462yjv6ln78.png" alt=" " width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks in production
&lt;/h2&gt;

&lt;p&gt;Teams building multi-device AI experiences without dedicated infrastructure hit the same set of edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lost responses.&lt;/strong&gt; The model finished generating while the user was offline or mid-switch. Nobody saw the output. The compute was wasted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate effort.&lt;/strong&gt; The user doesn't know if the previous session completed, so they re-prompt. You pay for the same response twice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State conflicts.&lt;/strong&gt; A new prompt arrives on the phone while the laptop tab still shows an incomplete response. Which version is canonical? The server doesn't know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile-specific failures.&lt;/strong&gt; iOS and Android aggressively drop connections for backgrounded apps. WiFi-to-cellular handoffs are frequent. A conversation that works fine on desktop will fall apart on mobile without explicit reconnection and resume handling.&lt;/p&gt;

&lt;p&gt;These failures don't show up in demos. They appear in production, under real network conditions, with real users – and they erode trust quickly because AI conversations often carry context the user spent time building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What most teams build first
&lt;/h2&gt;

&lt;p&gt;The standard workaround is a Redis buffer between the AI backend and the client. It handles full page reloads reasonably well. It doesn't handle tab switches. It breaks on mobile backgrounding. And it has no path for multi-device delivery – the session state is scoped to one client, not to the user.&lt;/p&gt;

&lt;p&gt;Every serious production team discovers this wall independently and ends up engineering some version of the same architecture. Vercel's own lead maintainer acknowledged the gap directly: "to solve this we would need to have a channel to the server that allows transporting that information. WebSockets are one option." That's the right diagnosis. The Redis buffer is an approximation of the real fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural shift: state lives in the channel, not the connection
&lt;/h2&gt;

&lt;p&gt;The underlying problem is that session state is coupled to the connection. The fix is decoupling them.&lt;/p&gt;

&lt;p&gt;Instead of streaming directly over an HTTP connection, the server publishes messages to a channel. Any device subscribing to that channel receives the same messages. The state is in the channel. The connection is the transport, nothing more.&lt;/p&gt;
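
&lt;p&gt;A minimal sketch of that idea, using a hypothetical in-memory Channel class rather than any real SDK: the server publishes once, the channel stores the message, and every device subscribed at that moment receives it. The class, method names, and device IDs are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Channel:
    """Hypothetical in-memory channel: state lives here, not in any connection."""
    history: list[str] = field(default_factory=list)
    subscribers: dict[str, Callable[[str], None]] = field(default_factory=dict)

    def subscribe(self, device_id: str, on_message: Callable[[str], None]) -> None:
        self.subscribers[device_id] = on_message

    def unsubscribe(self, device_id: str) -> None:
        self.subscribers.pop(device_id, None)

    def publish(self, message: str) -> None:
        # Persist first, then fan out to whoever is connected right now.
        self.history.append(message)
        for deliver in list(self.subscribers.values()):
            deliver(message)


channel = Channel()
laptop: list[str] = []
phone: list[str] = []
channel.subscribe("laptop", laptop.append)
channel.subscribe("phone", phone.append)

channel.publish("Hello")
channel.unsubscribe("laptop")  # the laptop connection drops
channel.publish("world")       # the phone still gets it; history keeps both
```

&lt;p&gt;The laptop misses the second message, but nothing is lost: the channel history still holds both, so a later reconnect can catch up.&lt;/p&gt;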

&lt;p&gt;This is the foundation of what's increasingly called a durable session – a persistent, addressable session between agents and users that outlives any single connection, device, or participant. Durable execution makes the backend crash-proof; durable sessions make the experience crash-proof. They sit on opposite sides of the agent and complement each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnddrlfjk81uvid2fjpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnddrlfjk81uvid2fjpz.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In practice this changes the behavior fundamentally. Any device can join – same browser tab, phone, or tablet. Subscribing to the channel gives that device access to the conversation. Reconnection becomes catch-up rather than restart: &lt;a href="https://ably.com/docs/storage-history/history" rel="noopener noreferrer"&gt;channels persist message history&lt;/a&gt;, and when a device reconnects, it replays what it missed and transitions to live delivery. From the user's perspective, they pick up where they left off.&lt;/p&gt;
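
&lt;p&gt;One way to picture catch-up on reconnect, sketched with hypothetical ReplayChannel and Device classes: each message gets an increasing serial, the device remembers the last serial it applied, and on reconnect it asks only for what came after. The serial scheme here is an assumption for illustration, not a description of any specific product's protocol.&lt;/p&gt;

```python
class ReplayChannel:
    """Hypothetical channel that numbers messages so clients can resume."""

    def __init__(self) -> None:
        self.log: list[tuple[int, str]] = []

    def publish(self, text: str) -> int:
        serial = len(self.log) + 1
        self.log.append((serial, text))
        return serial

    def since(self, last_seen: int) -> list[tuple[int, str]]:
        # Everything published after the client's last applied serial, in order.
        return [(s, t) for s, t in self.log if s > last_seen]


class Device:
    def __init__(self) -> None:
        self.last_seen = 0
        self.view: list[str] = []

    def receive(self, serial: int, text: str) -> None:
        if serial > self.last_seen:  # ignore anything already applied
            self.view.append(text)
            self.last_seen = serial

    def reconnect(self, channel: ReplayChannel) -> None:
        for serial, text in channel.since(self.last_seen):
            self.receive(serial, text)


channel = ReplayChannel()
phone = Device()
phone.receive(channel.publish("The answer"), "The answer")  # delivered live
channel.publish(" is")    # phone is offline for these two
channel.publish(" 42.")
phone.reconnect(channel)  # catch-up: replays only the two missed messages
phone.reconnect(channel)  # a second reconnect is a no-op, so no duplicates
```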

&lt;p&gt;Conflicts route through the server. User actions – sending prompts, interrupting, deleting messages – go to the server, which publishes the authoritative result to the channel. All devices receive the same update. There's no client-side state to reconcile.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the transport layer has to handle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Identity-aware fan-out.&lt;/strong&gt; The system needs to recognize all active sessions associated with a single user and propagate updates across all of them. When a user sends a message on one device, every other active device should reflect the change immediately. This requires mapping user identity to active connections at the infrastructure level, not at the application layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ordering and &lt;a href="https://ably.com/docs/connect/states" rel="noopener noreferrer"&gt;session recovery&lt;/a&gt;.&lt;/strong&gt; If the connection drops – from a device switch, a network blip, or a page refresh – the user shouldn't lose messages or see them out of sequence. A well-designed transport layer replays missed events and keeps message sequences intact. History loads first, then the live stream resumes. The client doesn't need to manage the transition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token stream compaction.&lt;/strong&gt; Replaying thousands of individual tokens to a reconnecting device is wasteful. A better pattern compacts token streams into complete responses in channel history: one message per AI response, not hundreds of tokens. New devices load the complete response instantly, then receive new tokens for any in-progress generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkct1pegstdtpxtgz2bue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkct1pegstdtpxtgz2bue.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;Presence tracking&lt;/a&gt;.&lt;/strong&gt; The backend needs to know which devices are currently active. This matters for more than UX. Should the model keep streaming if the user closed the tab? Should a background task escalate if all devices have disconnected? Presence answers these questions from a live membership set rather than polling or timeout heuristics. Without it, systems rely on assumptions that produce missed interactions, wasted compute, and handoffs that arrive too late.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence-aware cost controls.&lt;/strong&gt; AI agents can quietly generate output that delivers no value but incurs real cost – streaming to an empty room, running tool calls after the user navigates away. Tying agent activity to presence means the infrastructure pauses or deprioritizes automatically when no devices are engaged and resumes when they return. Costs scale with actual usage, not connection count.&lt;/p&gt;
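
&lt;p&gt;As a rough sketch of presence-gated generation: the loop below checks a presence set before publishing each token and stops as soon as it is empty. The function name, the plain set standing in for presence, and the simulated tab close are all assumptions made for the example.&lt;/p&gt;

```python
from typing import Callable, Iterable


def stream_while_present(
    tokens: Iterable[str],
    present_devices: set[str],
    publish: Callable[[str], None],
) -> int:
    """Publish tokens until the presence set is empty; return how many were sent."""
    sent = 0
    for token in tokens:
        if not present_devices:  # nobody is watching: stop paying for output
            break
        publish(token)
        sent += 1
    return sent


present = {"laptop"}
delivered: list[str] = []


def publish(token: str) -> None:
    delivered.append(token)
    if len(delivered) == 2:
        present.discard("laptop")  # simulate the user closing the tab


sent = stream_while_present(["a", "b", "c", "d"], present, publish)
```

&lt;p&gt;If the tokens come from a lazy generator wrapping the model call, breaking out of the loop also stops further generation. A real system would more likely pause and resume than abandon the response outright.&lt;/p&gt;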

&lt;h2&gt;
  
  
  Mobile is the hardest case
&lt;/h2&gt;

&lt;p&gt;Mobile devices are the toughest environment for connection continuity.&lt;/p&gt;

&lt;p&gt;Network instability is constant – WiFi-to-cellular handoffs, tunnel blackouts, dead zones. Resume capability isn't optional. Apps get backgrounded aggressively, so the model might finish generating while the app is suspended, and when the user returns they should see the completed response, not an empty screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/push" rel="noopener noreferrer"&gt;Push notifications&lt;/a&gt; bridge the gap. When significant events occur while the app is backgrounded – task complete, human takeover required – notifications alert the user and deep-link directly to the conversation. The payload should carry enough context for the app to restore state without a full reload. Push notification infrastructure (FCM, APNs, Web Push) ships as a supported capability; AI-specific end-to-end delivery patterns are still being documented, so implementation details vary by platform.&lt;/p&gt;

&lt;p&gt;Battery is also a real constraint. Holding a WebSocket connection open while the app is backgrounded drains battery, so a sensible reconnection strategy closes the connection when the app is backgrounded, reconnects when it returns to the foreground, and uses push notifications to trigger a reconnect for important updates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kwk36pddse08v611m10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kwk36pddse08v611m10.png" alt=" " width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Gen-1 AI vs Gen-2 AI: the real decision
&lt;/h2&gt;

&lt;p&gt;Not every AI application needs cross-device support. HTTP streaming works well for Gen-1 AI products – a user sends a prompt, the model returns a response, the interaction is complete. Single session, single device, seconds to complete. For that use case, HTTP streaming is the right call.&lt;/p&gt;

&lt;p&gt;Gen-2 AI products look structurally different. Sessions last minutes or hours. Agents make tool calls mid-conversation, coordinate with other agents, and run tasks in the background while the user is elsewhere. Humans need to step in – approving actions, taking over from an agent that has reached its limits, handing control back. Users move between devices and expect the conversation to follow them.&lt;/p&gt;

&lt;p&gt;The question isn't whether your architecture is complex. It's which generation of product you're building. If sessions outlive a single connection, if users will move between devices, if a human might need to join a running conversation – channel-based architecture is the right call. 32 of 37 vendors evaluated have no multi-device fan-out capability at all, which means most teams building Gen-2 products are either rebuilding this layer from scratch or shipping without it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this makes possible
&lt;/h2&gt;

&lt;p&gt;Channel-based sessions change what teams can build. A user starts a complex analysis on their phone during a commute, continues on their laptop at the office, and receives a push notification when the background task completes. In a customer support workflow, an AI agent handles a query, the conversation follows the user from desktop to mobile mid-interaction, and a human operator can step in on any device with full session context intact – then hand control back to the agent when they're done.&lt;/p&gt;

&lt;p&gt;Users already expect this from messaging applications. AI conversations are next.&lt;/p&gt;

&lt;p&gt;The infrastructure decision is whether to build session synchronization yourself or use systems designed for it. Building it means pub/sub channels, &lt;a href="https://ably.com/docs/storage-history/storage" rel="noopener noreferrer"&gt;message persistence with configurable retention&lt;/a&gt;, client SDKs that handle subscription and &lt;a href="https://ably.com/docs/storage-history/history" rel="noopener noreferrer"&gt;history replay&lt;/a&gt;, &lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;presence tracking&lt;/a&gt;, mobile SDKs with background handling, &lt;a href="https://ably.com/docs/push" rel="noopener noreferrer"&gt;push notification support&lt;/a&gt;, and identity-scoped authorization. That's weeks to months of engineering, and the edge cases don't appear until production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; implements this model – the docs on &lt;a href="https://ably.com/docs/storage-history/history" rel="noopener noreferrer"&gt;channel history&lt;/a&gt; and &lt;a href="https://ably.com/docs/connect/states" rel="noopener noreferrer"&gt;connection state recovery&lt;/a&gt; cover what the infrastructure layer needs to handle in detail.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Why we're betting on Durable Sessions</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 27 Apr 2026 11:45:08 +0000</pubDate>
      <link>https://dev.to/ablyblog/why-were-betting-on-durable-sessions-2gck</link>
      <guid>https://dev.to/ablyblog/why-were-betting-on-durable-sessions-2gck</guid>
      <description>&lt;p&gt;Written by Matthew O'Riordan&lt;/p&gt;

&lt;p&gt;Over the past year, I've spoken to more than 40 engineering teams building production AI agents. Different companies, different frameworks, different use cases. The same conversation kept happening.&lt;/p&gt;

&lt;p&gt;"Our streams break when users switch tabs." "We can't tell if the agent crashed or is still thinking." "We built a custom reconnection layer and it took three months." "Our users can't switch from laptop to phone mid-conversation." Every team described it differently, but they were all describing the same gap. Between the agent and the user, there's no dedicated infrastructure for the session itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjptaq1of0vdam8d41fi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjptaq1of0vdam8d41fi.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We backed this up with research across 37 AI infrastructure platforms, hundreds of GitHub issues and community threads, and 40+ customer discovery calls. 35 of those 37 platforms have no stream resumption after a disconnect. 33 have no way to detect an agent crash. The gap is universal, and the framework maintainers know it. Vercel built a &lt;a href="https://vercel.com/blog/ai-sdk-5" rel="noopener noreferrer"&gt;pluggable ChatTransport in AI SDK 5&lt;/a&gt; so developers can bring their own transport. TanStack AI shipped a &lt;a href="https://tanstack.com/ai/latest/docs/guides/connection-adapters" rel="noopener noreferrer"&gt;ConnectionAdapter&lt;/a&gt; for third-party providers. They've diagnosed the problem and built the plugin points. They're waiting for specialist infrastructure to show up.&lt;/p&gt;

&lt;p&gt;Nobody did anything wrong. Everyone focused on the right thing first: the intelligence, the orchestration, the models. But as AI experiences have gotten more sophisticated, the transport layer between the agent and the user has become the constraint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents are becoming human-like, and they need human infrastructure
&lt;/h2&gt;

&lt;p&gt;The insight that changed how we think about this came from an unexpected direction. As agents get more sophisticated, they start behaving like human participants in a conversation. They think for a while before responding. They work on tasks in the background. They hand off to a human colleague when they hit their limits. Users walk away, come back later, and expect to pick up exactly where things were.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfrfutaduy9xfk2hjg38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfrfutaduy9xfk2hjg38.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are the exact communication challenges we've been solving for human-to-human interaction for 10 years. Presence, reliable delivery, session continuity across devices, bidirectional control. Every messaging app since WhatsApp has solved these problems for humans, and the moment agents become participants in conversations, they need the same infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  We've been building this for a decade
&lt;/h2&gt;

&lt;p&gt;I'll be honest. We almost dismissed the AI space entirely. When every company suddenly needed an "AI strategy," my instinct was skepticism. We're an infrastructure company. We process trillions of transactions across billions of devices. Why would we need an AI-specific product?&lt;/p&gt;

&lt;p&gt;I was wrong. Companies like Intercom and HubSpot were already building AI agent experiences on top of our Pub/Sub messaging infrastructure, the realtime layer that handles reliable delivery between servers, devices, and services. They needed ordered delivery, presence, session state, multi-device support. They were using the infrastructure we'd already built, without waiting for us to package it as an AI product.&lt;/p&gt;

&lt;p&gt;Ably has been a durable session layer for 10 years. We never called it that because the term didn't exist. We called it realtime infrastructure, messaging, pub/sub. But the capabilities are the same. Persistent sessions that survive disconnects. Ordered delivery with automatic catch-up. Multi-device fan-out, presence, bidirectional communication. We built all of this for human communication at scale, and it turns out it's exactly what AI-to-human communication needs too.&lt;/p&gt;

&lt;h2&gt;
  
  
  A category is forming
&lt;/h2&gt;

&lt;p&gt;We're not inventing this term. We're recognizing something that's already happening.&lt;/p&gt;

&lt;p&gt;ElectricSQL published a &lt;a href="https://electric-sql.com/blog/2026/01/12/durable-sessions-for-collaborative-ai" rel="noopener noreferrer"&gt;"Durable Sessions" blog post&lt;/a&gt; earlier this year defining it as a pattern for collaborative AI. EMQX has used "Durable Sessions" as &lt;a href="https://docs.emqx.com/en/emqx/latest/durability/durability_introduction.html" rel="noopener noreferrer"&gt;a named feature in their MQTT broker&lt;/a&gt; for years. Convex is building agent components with persistent threads and durable workflows. Vercel is building a DurableAgent class. At least 12 companies are converging on the same problem space from different angles.&lt;/p&gt;

&lt;p&gt;The pattern mirrors Durable Execution. Temporal existed before AI agents needed it, then suddenly every team building production agents needed backend workflows that couldn't fail. Temporal went from niche to &lt;a href="https://temporal.io/blog/temporal-raises-usd300m-series-d-at-a-usd5b-valuation" rel="noopener noreferrer"&gt;a $5 billion valuation&lt;/a&gt;. AWS adopted the term for Lambda Durable Functions. The category debate was over.&lt;/p&gt;

&lt;p&gt;Durable Execution made the backend crash-proof. Durable Sessions makes the experience crash-proof. They're complementary layers on opposite sides of the agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  This needs to be bigger than Ably
&lt;/h2&gt;

&lt;p&gt;A category with one company in it isn't a category. It's a product pitch. We want other companies in this space. We want developers to recognize "durable sessions" as an infrastructure layer they need, regardless of who provides it.&lt;/p&gt;

&lt;p&gt;We've published &lt;a href="https://durablesessions.ai/" rel="noopener noreferrer"&gt;durablesessions.ai&lt;/a&gt; as a community resource that defines the concept, documents vendor convergence, and tracks how the ecosystem is forming. I'm personally committed to pushing this forward. Not because it helps Ably specifically, but because I believe it will improve how we all build and experience AI. I've been doing this for a long time and I've never been more energized about what's ahead.&lt;/p&gt;

&lt;p&gt;If you're at AI Engineer Europe next week, our tech lead will be presenting on durable sessions and why this layer matters. I'll be there too. Come find me and the team. If you're building in this space, whether as a competitor, a complement, or a fellow traveler, I want to talk. Getting the people working on this in the same room, having honest conversations about what developers actually need, is worth more than any blog post.&lt;/p&gt;

&lt;p&gt;This is the first in a series. Over the coming weeks, we'll go deeper into the evidence, the ecosystem, and the practical framework for evaluating what your AI sessions actually need. Follow along here or connect with me on &lt;a href="https://www.linkedin.com/in/mattoriordan" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Why AI agents need a transport layer: Solving the realtime sync problem</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 27 Apr 2026 11:37:29 +0000</pubDate>
      <link>https://dev.to/ablyblog/why-ai-agents-need-a-transport-layer-solving-the-realtime-sync-problem-k4o</link>
      <guid>https://dev.to/ablyblog/why-ai-agents-need-a-transport-layer-solving-the-realtime-sync-problem-k4o</guid>
      <description>&lt;p&gt;Building AI agents that work reliably in production requires solving problems that have nothing to do with AI. While teams focus on prompt engineering, model selection, and agent orchestration, a different class of challenges emerges at deployment. These have little to do with LLMs and everything to do with keeping agents and clients synchronized in realtime.&lt;/p&gt;

&lt;p&gt;Over the past few months, we've spoken with engineers at over 40 companies building AI assistants, copilots, and agentic workflows. The same infrastructure problems surfaced repeatedly – problems with distributed systems, not models.&lt;/p&gt;

&lt;h2&gt;
  
  
  The infrastructure gap in AI applications
&lt;/h2&gt;

&lt;p&gt;When you're building an AI agent that streams responses to users, you're not just building an AI system. You're building a distributed realtime application where state needs to stay synchronized across components that connect, disconnect, and reconnect unpredictably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4nlrbmq897i379jipft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4nlrbmq897i379jipft.png" alt=" " width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are the technical challenges that came up consistently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection management at scale.&lt;/strong&gt; Managing WebSocket or SSE connections between agents and clients becomes complex quickly. Connections drop during mobile network handoffs, page refreshes, and tab switches. Each disconnection requires handling buffering, replay logic, and state reconciliation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client-specific state tracking.&lt;/strong&gt; Agents need to track what each individual client has received, across multiple devices and multiple users. When a client reconnects, the agent must determine exactly which messages they missed and replay only those, without gaps or duplicates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distributed agent routing.&lt;/strong&gt; In distributed deployments, reconnecting clients need to reach the correct agent instance. This gets harder still with durable execution patterns, where agent state persists but the instance handling it may change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuity between historical and live data.&lt;/strong&gt; Clients loading a conversation need continuity between historical messages and live streaming responses. Gaps in this transition break the user experience.&lt;/p&gt;

&lt;p&gt;What teams actually wanted wasn't complicated: token streams that survive network interruptions, conversations that work across device switches, multi-user sessions that stay synchronized, and long-running agent work that continues when users go offline.&lt;/p&gt;

&lt;p&gt;These requirements describe a transport layer problem – the infrastructure between agents and clients that handles delivery, synchronization, and state management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vwl8ddrhluaw16punj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vwl8ddrhluaw16punj4.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical patterns for AI workloads
&lt;/h2&gt;

&lt;p&gt;Several technical patterns emerged from observing how teams build AI applications on top of pub/sub infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport/token-streaming" rel="noopener noreferrer"&gt;&lt;strong&gt;Token streaming&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;with message appends.&lt;/strong&gt; LLMs stream tokens individually, but storing thousands of separate token messages per response creates inefficient channel history. Loading a conversation would require replaying thousands of individual tokens.&lt;/p&gt;

&lt;p&gt;The solution is a message append operation: publish an initial message, then append subsequent tokens to it by referencing the message serial. Clients joining mid-stream receive the complete response so far in a single update, then receive subsequent appends. Channel history contains one compacted message per AI response rather than thousands of token fragments.&lt;/p&gt;
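
&lt;p&gt;To make the append idea concrete, here is a small sketch built on a hypothetical AppendChannel class. The publish and append names and the integer serials are assumptions for illustration and are not the actual SDK surface. The point is only that history ends up with one entry per response, however many tokens were streamed.&lt;/p&gt;

```python
class AppendChannel:
    """Hypothetical channel where tokens are appended to one stored message."""

    def __init__(self) -> None:
        self.messages: dict[int, str] = {}
        self.next_serial = 1

    def publish(self, text: str) -> int:
        serial = self.next_serial
        self.next_serial += 1
        self.messages[serial] = text
        return serial

    def append(self, serial: int, token: str) -> None:
        # Grow the existing message instead of storing one message per token.
        self.messages[serial] += token

    def history(self) -> list[str]:
        return [self.messages[s] for s in sorted(self.messages)]


channel = AppendChannel()
channel.publish("What is the capital of France?")
reply = channel.publish("")  # initial, empty AI response
for token in ["The", " capital", " is", " Paris."]:
    channel.append(reply, token)

history = channel.history()  # two entries: the prompt and one compacted reply
```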

&lt;p&gt;Server-side rollups batch appends within a configurable time window (default 40 ms) to stay within rate limits while maintaining smooth streaming UX. This handles the impedance mismatch between token-by-token streaming from models and efficient message storage.&lt;/p&gt;
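
&lt;p&gt;The batching can be pictured with a small function that groups tokens by arrival time. This is a sketch under assumptions: the roll_up name, the (arrival_ms, token) input shape, and the rule that a window opens at its first token are all invented for the example. Only the 40 ms default comes from the text above.&lt;/p&gt;

```python
def roll_up(tokens: list[tuple[int, str]], window_ms: int = 40) -> list[str]:
    """Group (arrival_ms, token) pairs into one append per time window."""
    batches: list[str] = []
    window_start = None
    for arrival_ms, token in tokens:
        if window_start is None or arrival_ms - window_start >= window_ms:
            batches.append(token)  # open a new window with this token
            window_start = arrival_ms
        else:
            batches[-1] += token   # same window: merge into the last append
    return batches


stream = [(0, "The"), (10, " cap"), (35, "ital"), (45, " is"), (80, " Par"), (90, "is.")]
appends = roll_up(stream)  # six tokens collapse into three appends
```

&lt;p&gt;Concatenating the appends reproduces the full response, so clients see the same text with fewer messages.&lt;/p&gt;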

&lt;p&gt;&lt;strong&gt;Annotations for citations.&lt;/strong&gt; AI responses that reference external sources need citation metadata attached without modifying the response content itself. Publishing &lt;a href="https://ably.com/docs/ai-transport/messaging/citations" rel="noopener noreferrer"&gt;citations as annotations&lt;/a&gt; – metadata referencing a message serial – keeps the response clean while enabling rich client-side rendering.&lt;/p&gt;

&lt;p&gt;Annotations include a type (e.g., citations:multiple.v1) and arbitrary data: URLs, titles, character offsets for inline citation markers. The transport aggregates annotations automatically – clients receive a summary ("3 citations from wikipedia.org, 2 from nasa.gov") rather than processing every individual event.&lt;/p&gt;
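
&lt;p&gt;A sketch of what that aggregation might look like: citation annotations for one message are counted per source domain. The dictionary fields (serial, type, url) and the summarize_citations helper are assumptions for illustration. A real transport would produce the summary server-side.&lt;/p&gt;

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_citations(annotations: list[dict], serial: str) -> str:
    """Collapse the citation annotations on one message into a per-domain summary."""
    domains = Counter(
        urlparse(a["url"]).netloc
        for a in annotations
        if a["serial"] == serial and a["type"] == "citations:multiple.v1"
    )
    parts = [f"{count} from {domain}" for domain, count in domains.most_common()]
    return f"{sum(domains.values())} citations: " + ", ".join(parts)


def cite(serial: str, url: str) -> dict:
    # Hypothetical annotation shape: metadata that points at a message serial.
    return {"serial": serial, "type": "citations:multiple.v1", "url": url}


annotations = [
    cite("msg-1", "https://wikipedia.org/wiki/Moon"),
    cite("msg-1", "https://nasa.gov/moon"),
    cite("msg-1", "https://wikipedia.org/wiki/Apollo_11"),
    cite("msg-1", "https://wikipedia.org/wiki/Tide"),
    cite("msg-1", "https://nasa.gov/artemis"),
    cite("msg-2", "https://example.com/other"),  # belongs to a different message
]
summary = summarize_citations(annotations, "msg-1")
```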

&lt;p&gt;&lt;strong&gt;Messaging patterns for agentic workflows.&lt;/strong&gt; The bi-directional nature of channels enables several agent interaction patterns:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport/messaging/tool-calls" rel="noopener noreferrer"&gt;Tool calls&lt;/a&gt;: Agents publish tool invocations with a toolCallId for correlation. Clients can render generative UI (displaying a weather card when get_weather is invoked) or execute client-side tools (agent requests GPS location, client executes locally and publishes the result back).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport/messaging/human-in-the-loop" rel="noopener noreferrer"&gt;Human-in-the-loop&lt;/a&gt;: Agents publish approval requests. Authorized users review and respond over the same channel. The agent verifies the approver's clientId or userClaim before executing sensitive operations. The request-response pattern fits naturally into bi-directional channels.&lt;/p&gt;
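
&lt;p&gt;The approval check itself can be small. The sketch below assumes the transport attaches a trusted client ID to each response. The ApprovalResponse type, the may_execute helper, and the allow-list are hypothetical names introduced for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalResponse:
    request_id: str
    client_id: str  # identity attached by the transport, not typed by the user
    approved: bool


def may_execute(pending: set[str], approvers: set[str], response: ApprovalResponse) -> bool:
    """Allow the sensitive operation only for a known request and an authorized approver."""
    return (
        response.request_id in pending
        and response.client_id in approvers
        and response.approved
    )


pending = {"refund-123"}  # approval requests the agent has published
approvers = {"alice"}     # hypothetical allow-list of client IDs

ok = may_execute(pending, approvers, ApprovalResponse("refund-123", "alice", True))
wrong_user = may_execute(pending, approvers, ApprovalResponse("refund-123", "mallory", True))
declined = may_execute(pending, approvers, ApprovalResponse("refund-123", "alice", False))
unknown = may_execute(pending, approvers, ApprovalResponse("refund-999", "alice", True))
```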

&lt;p&gt;Chain-of-thought streaming: Streaming reasoning alongside output can happen inline (single channel, distinguished by message name) or threaded (separate reasoning channel per response, subscribed to on demand to reduce bandwidth).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5klq3s33z7o0zxao4x3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5klq3s33z7o0zxao4x3.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for production AI
&lt;/h2&gt;

&lt;p&gt;The gap between prototype and production AI isn't primarily about model capabilities. It's about infrastructure that handles the messy realities of distributed systems: network interruptions, device switches, concurrent users, and agent failures.&lt;/p&gt;

&lt;p&gt;When agents and clients communicate through a proper transport layer rather than direct connections, entire classes of complexity disappear. Agents don't track connection state. Reconnection logic isn't custom code in every agent. &lt;a href="https://ably.com/blog/cross-device-ai-sync" rel="noopener noreferrer"&gt;Multi-device support&lt;/a&gt; isn't a feature you build, it's a property of the architecture.&lt;/p&gt;

&lt;p&gt;The interesting problems in AI infrastructure aren't always where you expect them. Sometimes the hard part isn't the AI – it's keeping everything synchronized.&lt;/p&gt;

&lt;p&gt;Ready to build resilient AI applications? Explore the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;AI Transport documentation&lt;/a&gt; for implementation patterns, code examples, and architectural guidance.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>The missing transport layer in user-facing AI applications</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 14:17:39 +0000</pubDate>
      <link>https://dev.to/ablyblog/the-missing-transport-layer-in-user-facing-ai-applications-3j90</link>
      <guid>https://dev.to/ablyblog/the-missing-transport-layer-in-user-facing-ai-applications-3j90</guid>
      <description>&lt;p&gt;Most AI applications start the same way: wire up an LLM, stream tokens to the browser, ship. That works for simple request-response. It breaks when sessions outlast a connection, when users switch devices, or when an agent needs to hand off to a human.&lt;/p&gt;

&lt;p&gt;The cracks appear in the delivery layer, not the model. Every serious production team discovers this independently and builds their own workaround. Those workarounds don't hold once users start hitting them in production.&lt;/p&gt;

&lt;p&gt;Here's what breaks, and what the transport layer needs to handle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift that creates the problem
&lt;/h2&gt;

&lt;p&gt;Simple AI applications are synchronous. User sends a message, model returns a response, done. A dropped connection restarts cleanly.&lt;/p&gt;

&lt;p&gt;Agentic applications aren't like that. They run in a loop: perceive the user's intent, reason with the model, act by calling tools or sub-agents, and observe the result. Then they go around again until the task is done.&lt;/p&gt;

&lt;p&gt;A research agent might loop a dozen times over several minutes, calling APIs and querying databases. The user is present throughout, watching, waiting, potentially needing to redirect. The connection might drop mid-loop, the user might switch devices, or they might realize mid-stream that the agent is heading the wrong way.&lt;/p&gt;

&lt;p&gt;That's a different problem, and one HTTP streaming wasn't designed to solve. The backend surviving and the session surviving are two different things. What's missing is a layer that treats the conversation as durable state: persisting across connections, devices, and participants.&lt;/p&gt;

&lt;p&gt;Durable execution makes the backend crash-proof. Durable sessions make what the user actually sees crash-proof. Most teams building agentic products need both.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks in production
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tokens disappear and reconnects corrupt state.&lt;/strong&gt; HTTP streaming delivers tokens once. A dropped connection loses them. Most workarounds handle full page reloads but not tab switches or mobile backgrounding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odrarurbi84dlej26kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odrarurbi84dlej26kt.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Worse, naive reconnect implementations replay the same output and produce duplicates: fragments, repeated tokens, or an interface in an indeterminate state. The Vercel AI SDK makes the tradeoff explicit: its resume and stop features are incompatible. You can resume a dropped stream or cancel it, but not both. &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;A full breakdown of what resumable streaming requires at the infrastructure level is here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users can't see what the agent is doing.&lt;/strong&gt; The agent is running tool calls, checking backend systems, orchestrating sub-agents. From the user's perspective it's a spinner and silence. Users abandon tasks they can't see progressing.&lt;/p&gt;

&lt;p&gt;There's no standard mechanism for surfacing intermediate results as first-class events on the session channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There's no way to interrupt.&lt;/strong&gt; Once generation starts, the user is locked out. Interruption requires bi-directional communication on the same channel simultaneously, user input arriving while agent output is still streaming, without breaking state. One company disabled user input entirely during agent responses because the backend couldn't distinguish an intentional cancel from a dropped connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre343jt98g88k35r6ovo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre343jt98g88k35r6ovo.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent keeps working after the user has left.&lt;/strong&gt; No signal tells the agent the user closed the tab. Compute and token costs accumulate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;Presence&lt;/a&gt; is a live membership set showing who is active in the session. Agents use it to pause expensive operations when nobody is there and resume when they return.&lt;/p&gt;
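
&lt;p&gt;As a rough illustration of that pause-and-resume behaviour, here is a minimal in-memory sketch in Python. The &lt;code&gt;PresenceGate&lt;/code&gt; class and &lt;code&gt;run_while_present&lt;/code&gt; helper are hypothetical names invented for this example, not Ably's API; a real system would call enter and leave from the transport's presence events.&lt;/p&gt;

```python
class PresenceGate:
    """Hypothetical live membership set for one session."""

    def __init__(self):
        self.members = set()

    def enter(self, client_id):
        self.members.add(client_id)

    def leave(self, client_id):
        self.members.discard(client_id)

    def anyone_present(self):
        return len(self.members) > 0


def run_while_present(steps, gate):
    """Run steps in order, pausing as soon as nobody is present.

    Returns (results, remaining) so the caller can resume later.
    """
    results = []
    remaining = list(steps)
    while remaining and gate.anyone_present():
        step = remaining.pop(0)
        results.append(step())
    return results, remaining
```

&lt;p&gt;When the user returns and re-enters the set, the agent can call &lt;code&gt;run_while_present&lt;/code&gt; again with the remaining steps, so no expensive work runs while nobody is watching.&lt;/p&gt;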

&lt;p&gt;&lt;strong&gt;Multiple agents collide.&lt;/strong&gt; When two specialist agents are working on the same request, every intermediate update routes through the orchestrator. The orchestrator becomes a bottleneck: when it's relaying progress it doesn't care about, the architecture starts to fight itself. &lt;a href="https://ably.com/blog/multi-agent-ai-systems" rel="noopener noreferrer"&gt;The multi-agent coordination post goes deeper on how this plays out with concurrent specialist agents.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents fail silently.&lt;/strong&gt; Most infrastructure has no agent health mechanism at the transport level, so a crash can only be inferred from a dead stream once a timeout expires. With presence, a disconnect event fires immediately when an agent crashes. Build on the wrong signal and recovery logic breaks under real failure conditions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4erssut4y9x7u106v15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4erssut4y9x7u106v15.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human handovers lose context.&lt;/strong&gt; When an agent escalates, most implementations open a different interface, summarize what happened, and hope the transfer works. The user explains their problem again. A &lt;a href="https://ably.com/docs/ai-transport/messaging/human-in-the-loop" rel="noopener noreferrer"&gt;unified channel where agents and humans can both participate&lt;/a&gt; addresses this: the human arrives with full history and picks up mid-thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are no transport-level diagnostics.&lt;/strong&gt; Model-level tooling shows what the model decided to do. Nothing shows what happened between the agent and the user's screen: whether a message arrived, whether a reconnection worked, whether delivery stalled. Debugging a failed session means stitching together server logs that rarely reconstruct what actually happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrbaou5u34xrpo9yszd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrbaou5u34xrpo9yszd.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the transport layer needs to handle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Resumable streaming.&lt;/strong&gt; Output persists in the channel, not the connection. When a client reconnects, it rejoins from its last received position with no gaps and no duplicates. Mutable messages handle retry corruption: republish to the same message ID and the client sees clean updated state, not a second copy. Vercel built a pluggable &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/transport" rel="noopener noreferrer"&gt;ChatTransport interface&lt;/a&gt; specifically to support this pattern; TanStack AI shipped a &lt;a href="https://tanstack.com/ai/latest/docs/guides/connection-adapters" rel="noopener noreferrer"&gt;ConnectionAdapter&lt;/a&gt; for the same reason. The ecosystem has diagnosed the problem and built the plug-in points.&lt;/p&gt;
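
&lt;p&gt;The mutable-message idea can be sketched on the client side as an upsert keyed by message ID. This is a simplified illustration in Python, not any SDK's actual interface; &lt;code&gt;MessageView&lt;/code&gt; and its methods are hypothetical names.&lt;/p&gt;

```python
class MessageView:
    """Client-side view of a channel, keyed by message ID (hypothetical)."""

    def __init__(self):
        self.order = []      # message IDs in first-seen order
        self.content = {}    # message ID mapped to its latest text

    def apply(self, message_id, text):
        # A republish to a known ID replaces the text in place
        # instead of adding a second copy to the transcript.
        if message_id not in self.content:
            self.order.append(message_id)
        self.content[message_id] = text

    def render(self):
        return [self.content[mid] for mid in self.order]
```

&lt;p&gt;Because a retry lands on the same key, the rendered transcript keeps one entry per message no matter how many times the agent republishes it.&lt;/p&gt;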

&lt;p&gt;&lt;strong&gt;Multi-device continuity.&lt;/strong&gt; &lt;a href="https://ably.com/docs/ai-transport/sessions-identity" rel="noopener noreferrer"&gt;Session state lives on the channel, not any individual client.&lt;/a&gt; Any device subscribing gets the same history and live updates. The session follows the user, not the connection.&lt;/p&gt;

&lt;p&gt;Of the 26 AI platforms evaluated in recent market research, 23 have no multi-device session continuity, including ChatGPT.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bi-directional communication on a shared channel.&lt;/strong&gt; User input and agent output flow on the same channel simultaneously. A redirect from the user arrives as an explicit signal while the agent is mid-stream, not as an ambiguous TCP side effect. The backend can now distinguish an intentional cancel from a dropped connection.&lt;/p&gt;
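
&lt;p&gt;To make the distinction concrete, here is a small Python sketch of a backend classifying why output stopped. The event type strings (&lt;code&gt;user.cancel&lt;/code&gt;, &lt;code&gt;connection.lost&lt;/code&gt;) and the policy of continuing after a bare disconnect are assumptions made for this example, not a documented protocol.&lt;/p&gt;

```python
from enum import Enum


class StopReason(Enum):
    CANCELLED = "cancelled"        # user explicitly asked to stop
    DISCONNECTED = "disconnected"  # transport dropped; intent unknown


def classify_stop(events):
    """Return why output stopped, or None if nothing stopped it.

    `events` is a list of dicts with a hypothetical "type" field,
    in the order they arrived on the shared channel.
    """
    for event in events:
        if event["type"] == "user.cancel":
            return StopReason.CANCELLED
    for event in events:
        if event["type"] == "connection.lost":
            return StopReason.DISCONNECTED
    return None


def should_keep_generating(events):
    # In this sketch only an explicit cancel stops the agent; after a
    # bare disconnect it keeps publishing so the client can catch up.
    return classify_stop(events) is not StopReason.CANCELLED
```

&lt;p&gt;The point of the design is that the cancel is a message in its own right, so the backend never has to guess intent from the state of a socket.&lt;/p&gt;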

&lt;p&gt;&lt;strong&gt;Progress as structured events.&lt;/strong&gt; Agent reasoning steps, tool call progress, and intermediate results should be &lt;a href="https://ably.com/docs/ai-transport/messaging" rel="noopener noreferrer"&gt;first-class events on the channel&lt;/a&gt;, subscribable independently of the main response stream. Specialized agents publish progress directly. The orchestrator stops relaying events it doesn't care about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence.&lt;/strong&gt; A live membership set for users, agents, and human operators. Agents make real decisions based on it: pause when the user is gone, resume when they return. Crash detection is a presence event: when an agent disconnects, the event fires immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session-level diagnostics.&lt;/strong&gt; Channel history serves as both the live diagnostic feed and the persistent audit record: structured, timestamped, and identity-attributed. This covers the delivery layer between agent and user, separate from model-level observability, and both surfaces matter in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The underlying principle
&lt;/h2&gt;

&lt;p&gt;Each of these problems is tractable in isolation. Solving all of them together, without a dedicated infrastructure layer, is where engineering budget quietly disappears. None of it has anything to do with the AI product itself.&lt;/p&gt;

&lt;p&gt;The workaround that seemed to hold breaks as soon as teams need cancellation, multi-device continuity, or human handover without a context break. The result is a growing layer of glue code that keeps teams away from the features they're actually trying to ship.&lt;/p&gt;

&lt;p&gt;The category forming around this problem, durable sessions, is the session-layer equivalent of what durable execution did for backend workflows. The infrastructure requirement is the same: a layer built for the failure modes that actually occur, not workarounds patched onto infrastructure designed for something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Ably AI Transport fits
&lt;/h2&gt;

&lt;p&gt;Ably AI Transport is a drop-in durable session layer that absorbs this complexity. Developers publish to a session. The infrastructure handles resumable streaming, multi-device continuity, presence, shared state, and bi-directional communication. No changes required to your model calls or agent orchestration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Docs go deeper →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Why your AI response restarts on page refresh (and what it takes to prevent it)</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 13:58:52 +0000</pubDate>
      <link>https://dev.to/ablyblog/why-your-ai-response-restarts-on-page-refresh-and-what-it-takes-to-prevent-it-gd2</link>
      <guid>https://dev.to/ablyblog/why-your-ai-response-restarts-on-page-refresh-and-what-it-takes-to-prevent-it-gd2</guid>
      <description>&lt;p&gt;Your AI assistant is mid-sentence explaining a complex debugging strategy. The user refreshes the page. The response starts over from the beginning, or worse, vanishes entirely.&lt;/p&gt;

&lt;p&gt;This isn't a model problem. It's a delivery problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What breaks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI applications &lt;a href="https://ably.com/docs/ai-transport/token-streaming" rel="noopener noreferrer"&gt;stream LLM responses&lt;/a&gt; over HTTP using Server-Sent Events or fetch streams. The connection delivers tokens in order until the response completes. If the user refreshes, closes the tab, or loses network connectivity, the stream ends. When they reconnect, there's no mechanism to resume from where they left off.&lt;/p&gt;

&lt;p&gt;The application has two options: start the entire response over (wasting tokens and user time) or lose everything that was streamed before the disconnection (losing context the user already read).&lt;/p&gt;

&lt;p&gt;Neither option works in production. Users refresh pages. Networks drop. Browsers crash. Mobile apps background. These aren't edge cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F861n0s6abmkchmlecz2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F861n0s6abmkchmlecz2j.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why naive approaches fail&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Client-side buffering:&lt;/strong&gt; You can cache tokens in memory or localStorage, but this only handles intentional refreshes on the same device. It doesn't help with network interruptions, crashes, or users switching devices mid-conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response regeneration:&lt;/strong&gt; Re-requesting the full response from the LLM costs tokens, adds latency, and often produces different output. The user sees the response change on reload, breaking continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless HTTP streaming:&lt;/strong&gt; Standard SSE and fetch streams have no concept of session recovery. When the connection closes, the client has no way to tell the server "resume from token 847."&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How resumable streaming actually works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The system needs three components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session identity:&lt;/strong&gt; Each AI response gets a unique session ID that persists across connections. When the client reconnects, it presents this ID to resume the same logical response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offset tracking:&lt;/strong&gt; The server tracks which tokens have been delivered. The client tracks which tokens it has received and rendered. On reconnect, the client requests "start from token N."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ordered delivery with history:&lt;/strong&gt; The transport layer guarantees token ordering and maintains a replayable history. When a client reconnects with an offset, the server resumes delivery from that point without re-invoking the LLM.&lt;/p&gt;
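
&lt;p&gt;The three components above can be sketched in a few lines of Python. This is an in-memory illustration only: &lt;code&gt;SessionLog&lt;/code&gt; is a hypothetical name, offsets are assumed to be 0-based token positions, and a production version would need persistent storage rather than a dictionary.&lt;/p&gt;

```python
class SessionLog:
    """In-memory token history, keyed by session ID (hypothetical)."""

    def __init__(self):
        self.sessions = {}

    def append(self, session_id, token):
        # Called once per token as the LLM produces it.
        self.sessions.setdefault(session_id, []).append(token)

    def resume(self, session_id, offset):
        """Return tokens from position `offset` onward (0-based).

        Served from stored history, so the LLM is not invoked again.
        """
        return self.sessions.get(session_id, [])[offset:]
```

&lt;p&gt;A client that has rendered three tokens reconnects with the session ID and offset 3, and receives only what it missed.&lt;/p&gt;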

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquckez75mq23z3u3tlsn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquckez75mq23z3u3tlsn.png" alt=" " width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tradeoffs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building this yourself means managing session state, handling offset synchronisation across multiple connections, and ensuring tokens arrive in order even if network packets don't. You'll need persistent storage for token history and logic to handle race conditions when users reconnect from multiple tabs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A concrete example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A user asks an AI assistant to explain a codebase. The LLM streams 2,000 tokens over 30 seconds. At token 1,247, the user's network drops for eight seconds. Without resumability, the user sees a frozen response, then either loses everything or watches it restart.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;resumable streaming&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client detects the disconnection and stores offset 1,247&lt;/li&gt;
&lt;li&gt;Network recovers, client reconnects with session ID and offset&lt;/li&gt;
&lt;li&gt;Server resumes delivery from token 1,248&lt;/li&gt;
&lt;li&gt;User sees the response continue exactly where it stopped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user never knows there was an interruption.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Multi-device continuity&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/blog/ably-ai-transport" rel="noopener noreferrer"&gt;Resumable streaming also enables conversation continuity across devices&lt;/a&gt;. The user starts a question on their laptop, switches to their phone, and sees the AI response pick up mid-stream. Same session ID, same offset tracking, different client.&lt;/p&gt;

&lt;p&gt;This matters for AI workflows that span locations: research started at a desk, continued on a commute, finished in a meeting room. Without transport-level session management, each switch to a new device loses context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mvla9ev7prpo067bqhl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mvla9ev7prpo067bqhl.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why this matters for AI reliability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Unreliable delivery creates unreliable AI experiences. Users learn not to trust that responses will complete. They avoid asking complex questions because they might lose the answer. They stop using AI features on mobile networks.&lt;/p&gt;

&lt;p&gt;Fixing this isn't about better models or smarter prompts. It's about ensuring delivery is as dependable as the intelligence behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Next steps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you're building AI features where responses take more than a few seconds, or where users might switch devices or encounter network issues, you need resumable streaming. You can build session management and offset tracking yourself, or use infrastructure like &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; that handles it for you.&lt;/p&gt;

&lt;p&gt;Either way, design for reconnection from day one. Your users will refresh. Your network will drop. Production isn't a stable connection.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Resume tokens and last-event IDs for LLM streaming: How they work &amp; what they cost to build</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 13:32:33 +0000</pubDate>
      <link>https://dev.to/ablyblog/resume-tokens-and-last-event-ids-for-llm-streaming-how-they-work-what-they-cost-to-build-4l7e</link>
      <guid>https://dev.to/ablyblog/resume-tokens-and-last-event-ids-for-llm-streaming-how-they-work-what-they-cost-to-build-4l7e</guid>
      <description>&lt;p&gt;When an AI response reaches token 150 and the connection drops, most implementations have one answer: start over. The user re-prompts, you pay for the same tokens twice, and the experience breaks.&lt;/p&gt;

&lt;p&gt;Resume tokens and last-event IDs are the mechanism that prevents this. They make streams addressable – every message gets an identifier, clients track their position, and reconnections pick up from exactly where they left off. The concept is straightforward. The production scope is not: storage design, deduplication, gap detection, distributed routing, and multi-device continuity all follow from the same first decision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e66kdpxwi5xaz8ib8jy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e66kdpxwi5xaz8ib8jy.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How resume tokens actually work
&lt;/h2&gt;

&lt;p&gt;Resumable streaming has four moving parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Message identifiers.&lt;/strong&gt; Every token or message gets a sequential ID when published – monotonically increasing, so each new message has a higher ID than the previous one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client state.&lt;/strong&gt; The client tracks the ID of the last message it successfully received. In a browser, that's typically held in memory or local storage. On mobile, it needs to survive app backgrounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconnection protocol.&lt;/strong&gt; When the connection drops, the client presents the last ID it saw. The server responds with everything that arrived after that ID, then transitions to live streaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catchup delivery.&lt;/strong&gt; The client receives missed messages in order before live tokens resume. The seam should be invisible.&lt;/p&gt;

&lt;p&gt;The stream itself becomes the source of truth. The client doesn't reconstruct what it missed – the stream delivers it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What SSE's Last-Event-ID header gives you
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/topic/server-sent-events" rel="noopener noreferrer"&gt;Server-Sent Events&lt;/a&gt; implements this natively. When an SSE connection drops, the browser reconnects automatically and includes a Last-Event-ID header carrying the ID of the last event it received, provided the server set an ID on the events it sent. The server sees which event the client last received and resumes from there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnssh7fnzz544paspo1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnssh7fnzz544paspo1k.png" alt=" " width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The browser handles reconnect logic. Application code doesn't change between initial connection and reconnection. For the happy path – stable connection, single device, short responses – SSE with Last-Event-ID works well.&lt;/p&gt;
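
&lt;p&gt;On the server side, the happy path might look something like the following Python sketch. It assumes integer event IDs starting at 1, single-line data payloads, and an in-memory &lt;code&gt;history&lt;/code&gt; list; the function names are illustrative and not tied to any framework.&lt;/p&gt;

```python
def format_sse(event_id, data):
    """Serialize one event in the text/event-stream wire format."""
    return f"id: {event_id}\ndata: {data}\n\n"


def replay_after(history, last_event_id):
    """Yield SSE frames for every stored event after `last_event_id`.

    `history` is a list of (event_id, data) pairs with increasing
    integer IDs. `last_event_id` is the raw Last-Event-ID header
    value, or None on a first connection.
    """
    last_seen = int(last_event_id) if last_event_id else 0
    for event_id, data in history:
        if event_id > last_seen:
            yield format_sse(event_id, data)
```

&lt;p&gt;The same handler serves first connections and reconnections; the only difference is whether the header is present.&lt;/p&gt;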

&lt;p&gt;The problems start at the boundary of what SSE can do.&lt;/p&gt;

&lt;p&gt;SSE is unidirectional and HTTP-only. It has no native history beyond what you implement server-side. It doesn't handle bidirectional messaging, so live steering – users redirecting the AI mid-response – requires a separate channel. On distributed infrastructure, a reconnecting client may reach a different server instance that has no record of the original session. SSE handles the reconnect handshake. Everything else – distributed state, per-instance routing, multi-device history – is still your problem. For use cases that need bidirectional messaging, &lt;a href="https://ably.com/blog/websockets-vs-sse" rel="noopener noreferrer"&gt;WebSockets vs SSE&lt;/a&gt; covers the tradeoffs in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building resume into WebSockets
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/topic/websockets" rel="noopener noreferrer"&gt;WebSockets&lt;/a&gt; don't include resume semantics. When a WebSocket closes, the connection is gone. Reconnecting creates a new socket with no knowledge of the previous one.&lt;/p&gt;

&lt;p&gt;Building resume on WebSockets means building all of it yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session IDs generated at stream start, stored server-side, presented by the client on reconnection.&lt;/li&gt;
&lt;li&gt;Message IDs assigned sequentially.&lt;/li&gt;
&lt;li&gt;Server logic to look up a session, find the position, replay history, then transition to live.&lt;/li&gt;
&lt;li&gt;Buffer management to decide how long to keep messages for sessions that haven't reconnected yet.&lt;/li&gt;
&lt;li&gt;Cleanup logic to expire stale sessions without cutting off legitimate reconnects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each piece is straightforward in isolation. The edge cases are where the weeks go.&lt;/p&gt;

&lt;h2&gt;
  
  
  The storage problem teams underestimate
&lt;/h2&gt;

&lt;p&gt;Token-level storage is where most implementations hit an unexpected wall.&lt;/p&gt;

&lt;p&gt;A 500-word response generates roughly 625 tokens. If you store each token as a separate record, loading one response means retrieving 625 records. A conversation with 20 exchanges is 12,500 records. Multiply across thousands of concurrent users and history retrieval becomes the performance bottleneck.&lt;/p&gt;

&lt;p&gt;This matters because history retrieval is on the critical path for multi-device continuity. When a user switches from laptop to phone, the speed of catchup determines whether the experience feels continuous or broken.&lt;/p&gt;

&lt;p&gt;The more practical model is to treat each AI response as a single logical message and append tokens to it rather than publishing them individually. Clients joining mid-stream receive the full message so far, then get new tokens as they arrive. One record per response instead of hundreds.&lt;/p&gt;
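
&lt;p&gt;A minimal Python sketch of the append model, alongside the record-count arithmetic from the figures above (625 tokens per response, 20 exchanges). &lt;code&gt;ResponseStore&lt;/code&gt; and &lt;code&gt;records_per_conversation&lt;/code&gt; are hypothetical names for illustration, and the store is in-memory only.&lt;/p&gt;

```python
class ResponseStore:
    """One stored record per AI response; tokens are appended to it."""

    def __init__(self):
        self.records = {}

    def append(self, response_id, token):
        self.records[response_id] = self.records.get(response_id, "") + token

    def snapshot(self, response_id):
        # A late joiner reads the whole response so far in one lookup.
        return self.records.get(response_id, "")


def records_per_conversation(tokens_per_response, exchanges, per_token):
    """Count stored records under per-token versus append storage."""
    per_response = tokens_per_response if per_token else 1
    return per_response * exchanges
```

&lt;p&gt;With per-token storage, &lt;code&gt;records_per_conversation(625, 20, True)&lt;/code&gt; gives 12,500 records; with appends, the same conversation is 20.&lt;/p&gt;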

&lt;h2&gt;
  
  
  Duplicates and gaps: the two failure modes that break trust
&lt;/h2&gt;

&lt;p&gt;Duplicates happen when the connection drops after the client receives a message but before the acknowledgement reaches the server. On reconnect, the server doesn't know whether to replay that message. Without deduplication logic, the client renders the same token twice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmq3n9zn1kh5sy5x77r9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmq3n9zn1kh5sy5x77r9.png" alt=" " width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fix is using message IDs as deduplication keys on the client – straightforward in principle, but it needs to survive page reloads and work across tabs.&lt;/p&gt;

&lt;p&gt;Gaps happen when sequential IDs arrive out of order or not at all. If a client receives message 153 after 150, messages 151 and 152 are missing. Without gap detection, the client silently renders an incomplete response. With it, you need logic to request missing messages, decide what to do if they can't be retrieved, and handle the state when the client gives up waiting.&lt;/p&gt;
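
&lt;p&gt;Both checks can be sketched in one small client-side receiver. This Python example assumes message IDs are consecutive integers starting at 1 and keeps its state in memory; &lt;code&gt;StreamReceiver&lt;/code&gt; is a hypothetical name, and surviving page reloads or coordinating across tabs is left out.&lt;/p&gt;

```python
class StreamReceiver:
    """Client-side receiver that drops duplicates and reports gaps.

    Assumes message IDs are consecutive integers starting at 1.
    """

    def __init__(self, last_id=0):
        self.last_id = last_id
        self.rendered = []

    def receive(self, message_id, token):
        """Return (status, missing_ids) for one incoming message."""
        if self.last_id >= message_id:
            return "duplicate", []        # already rendered; ignore
        if message_id > self.last_id + 1:
            # Gap: hold off rendering and report what to re-request.
            missing = list(range(self.last_id + 1, message_id))
            return "gap", missing
        self.rendered.append(token)
        self.last_id = message_id
        return "rendered", []
```

&lt;p&gt;A receiver that last saw message 150 and is handed message 153 reports 151 and 152 as missing instead of silently rendering an incomplete response.&lt;/p&gt;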

&lt;p&gt;Both failure modes are rare enough to be invisible in testing. Both surface under real network conditions: mobile handoffs, flaky WiFi, corporate proxy timeouts. The first time you see them is usually a support ticket.&lt;/p&gt;

&lt;h2&gt;
  
  
  What distributed deployment adds
&lt;/h2&gt;

&lt;p&gt;A single-server implementation can tie session state to process memory and mostly work. As soon as you run multiple instances – which you will, for reliability and scale – a routing problem appears.&lt;/p&gt;

&lt;p&gt;A client that connected to instance A reconnects to instance B. Instance B has no record of the session. Your options: route all reconnections back to the originating instance (a pinning strategy that creates hotspots and defeats the purpose of multiple instances), or store session state in shared infrastructure that all instances can read.&lt;/p&gt;

&lt;p&gt;Shared session storage means Redis or equivalent: network round-trips on reconnect, cache invalidation logic, and failure handling when the cache is unavailable. This is solvable. It's also not in the first implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The multi-device gap
&lt;/h2&gt;

&lt;p&gt;Multi-device continuity is where connection-oriented design hits a wall.&lt;/p&gt;

&lt;p&gt;When state lives in the connection – or in server memory tied to that connection – device switching loses context. The phone doesn't know what the laptop received. Without a shared source of truth for message history that any device can query, each reconnect from a new device is a new session.&lt;/p&gt;

&lt;p&gt;True multi-device continuity requires decoupling state from connections entirely. The conversation lives in a channel or persistent store. Devices subscribe and catch up rather than resuming a connection.&lt;/p&gt;

&lt;p&gt;This is a different architectural model than resuming an HTTP stream. For most teams, that realisation arrives after the first implementation is already in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k5mq6yl5vgbly6h6auo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k5mq6yl5vgbly6h6auo.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When resumable streaming matters most
&lt;/h2&gt;

&lt;p&gt;Not every streaming application needs this. For short-lived, single-session interactions on stable connections, standard HTTP streaming is fine.&lt;/p&gt;

&lt;p&gt;Resume becomes critical under specific conditions:&lt;/p&gt;

&lt;p&gt;Mobile clients handle network handoffs between WiFi and cellular constantly. Each one is a potential disconnection.&lt;/p&gt;

&lt;p&gt;Long responses – anything over 30 seconds – have a high probability of encountering a transient failure.&lt;/p&gt;

&lt;p&gt;Multi-device usage means the conversation needs to live in a channel, not a connection.&lt;/p&gt;

&lt;p&gt;In multi-agent systems, several agents publish updates to a shared channel. A reconnecting client needs to catch up on everything all agents published, not just the primary response thread.&lt;/p&gt;

&lt;p&gt;The alternative is forcing users to restart on every interruption. That breaks trust fast, and the cost compounds on longer or more complex tasks where restarting is most painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you're actually signing up for to build this
&lt;/h2&gt;

&lt;p&gt;Teams that have shipped resumable streaming in production describe a consistent arc: the first implementation takes a week, the edge cases take a month, and cross-device reliability is still not fully solved six months later.&lt;/p&gt;

&lt;p&gt;The full scope of a production-grade build: session management, message storage with efficient retrieval by ID range, client-side deduplication, gap detection, distributed routing, cache invalidation, buffer expiry, and monitoring to surface issues you can't reproduce locally.&lt;/p&gt;

&lt;p&gt;Good transport infrastructure handles duplicates and gaps automatically. Application logic shouldn't need to check for either – that's the infrastructure's job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build vs infrastructure
&lt;/h2&gt;

&lt;p&gt;Building resumable streaming yourself is a reasonable choice if you have a stable team, time to maintain it, and no multi-device or distributed requirements.&lt;/p&gt;

&lt;p&gt;It's a harder choice than the SSE documentation makes it look. One team described spending several weeks on custom session management and still not fully solving cross-device reliability. The problems weren't obvious in the design phase – they appeared under mobile network conditions, under load, and when users did things the system wasn't built to handle.&lt;/p&gt;

&lt;p&gt;The alternative is transport infrastructure that implements resume as part of the platform. You keep control of your LLM, prompts, and application logic. Session continuity, offset management, ordered delivery, and multi-device state become infrastructure concerns rather than application concerns.&lt;/p&gt;

&lt;p&gt;Both paths are defensible. The costs of building are real and most of them are invisible until the first deploy.&lt;/p&gt;




&lt;p&gt;Streaming responses between AI agents and clients? &lt;a href="https://ably.com/docs/ai-transport/token-streaming" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; includes resumable token streaming, automatic replay, and channel-based delivery with guaranteed ordering. Docs go deeper.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Appends for AI apps: Stream into a single message with Ably AI Transport</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Thu, 26 Feb 2026 12:05:30 +0000</pubDate>
      <link>https://dev.to/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</link>
      <guid>https://dev.to/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</guid>
      <description>&lt;p&gt;Streaming tokens is easy. Resuming cleanly is not. A user refreshes mid-response, another client joins late, a mobile connection drops for 10 seconds, and suddenly your "one answer" is 600 tiny messages that your UI has to stitch back together. Message history turns into fragments. You start building a side store just to reconstruct "the response so far".&lt;/p&gt;

&lt;p&gt;This is not a model problem. It's a delivery problem.&lt;/p&gt;

&lt;p&gt;That's why we developed message appends for &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;. Appends let you stream AI output tokens into a single message as they are produced, so you get progressive rendering for live subscribers and a clean, compact response in history.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The failure mode we're fixing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The usual implementation is to stream each token as a single message, which is simple and works perfectly on a stable connection. In production, clients disconnect and resume mid-stream: refreshes, mobile dropouts, backgrounded tabs, and late joins.&lt;/p&gt;

&lt;p&gt;Once you have real reconnects and refreshes, you inherit work you did not plan for: ordering, dedupe, buffering, "latest wins" logic, and replay rules that make history and realtime agree. You can build it, but it is the kind of work that quietly eats weeks of engineering time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With appends you can avoid that by changing the shape of the data. Instead of hundreds of token messages, you have one response message whose content grows over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pattern: create once, append many
&lt;/h3&gt;

&lt;p&gt;In Ably AI Transport, you publish an initial response message and capture its server-assigned serial. That serial is what you append to.&lt;/p&gt;

&lt;p&gt;It's a small detail that ends up doing a lot of work for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;serials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, as your model yields tokens, you append each fragment to that same message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;What changes for clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Subscribers still see progressive output, but they see it as actions on the same message serial. A response starts with a create, tokens arrive as appends, and occasionally clients may receive a full-state update to resynchronise (for example after a reconnection).&lt;/p&gt;

&lt;p&gt;Most UIs end up implementing this shape anyway. With appends, it becomes boring and predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.append&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderAppend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.update&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderReplace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
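
&lt;p&gt;For illustration, the render helpers could sit on top of a small reducer that keeps one accumulated string per message serial. This is a sketch under assumptions: the 'message.create' action name and the Map-based store are hypothetical, and only the append and update actions come from the snippet above.&lt;/p&gt;

```javascript
// Illustrative reducer: folds message actions into one string per serial.
// 'message.create' is an assumed action name; the store is a plain Map.
function applyAction(responses, message) {
  const current = responses.get(message.serial) ?? '';
  switch (message.action) {
    case 'message.create':
    case 'message.update':
      // Both carry the full state, so replace whatever was stored.
      responses.set(message.serial, message.data);
      break;
    case 'message.append':
      // Appends carry only the new fragment, so concatenate.
      responses.set(message.serial, current + message.data);
      break;
  }
  return responses;
}
```

&lt;p&gt;Because a full-state update simply overwrites the stored text, a resynchronisation after a reconnect needs no special handling in this shape.&lt;/p&gt;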



&lt;p&gt;The important difference is that history and realtime stop disagreeing, without your client code doing any extra work. You render progressively for live users, and you still treat the response as one message for storage, retrieval, and rewind.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Reconnects and refresh stop being special cases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Short disconnects are one thing. Refresh is the painful case, because local state is gone, and streaming each token as a separate message forces you to replay fragments and hope the client reconstructs the same response.&lt;/p&gt;

&lt;p&gt;With message-per-response, hydration is straightforward because there is always a current accumulated version of the response message. Clients joining late or reloading can fetch the latest state as a single message and continue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/channels/options/rewind" rel="noopener noreferrer"&gt;Rewind&lt;/a&gt; and history become useful again because you are rewinding meaningful messages, not token confetti:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai:chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2m&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token rates without token-rate pain
&lt;/h3&gt;

&lt;p&gt;Models can emit tokens far faster than most realtime setups want to publish. If you publish a message per token, rate limits become your problem and you have to handle batching in your agent code.&lt;/p&gt;

&lt;p&gt;Appends are designed for high-frequency workloads and include automatic rollups. Subscribers still receive progressive updates, but Ably can roll up rapid appends under the hood so you do not have to build your own throttling layer.&lt;/p&gt;

&lt;p&gt;If you need to tune the tradeoff between smoothness and message rate, you can adjust &lt;code&gt;appendRollupWindow&lt;/code&gt;. Smaller windows feel more responsive but consume more message-rate capacity. Larger windows batch more aggressively, so output arrives in bigger chunks.&lt;/p&gt;
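
&lt;p&gt;The tradeoff can be seen in a toy model. The sketch below is an illustration only, not Ably's implementation: it groups timestamped fragments into fixed windows, with the hypothetical &lt;code&gt;windowMs&lt;/code&gt; parameter standing in for the rollup window.&lt;/p&gt;

```javascript
// Illustration only: groups timestamped fragments into fixed windows.
// Not Ably's implementation; windowMs stands in for the rollup window.
function rollUp(fragments, windowMs) {
  const batches = [];
  let windowStart = null;
  for (const { at, text } of fragments) {
    if (windowStart === null || at - windowStart >= windowMs) {
      // The current window has elapsed: start a new batch.
      windowStart = at;
      batches.push(text);
    } else {
      // Still inside the window: fold the fragment into the current batch.
      batches[batches.length - 1] += text;
    }
  }
  return batches;
}
```

&lt;p&gt;With six fragments spread over 100 ms, a 40 ms window yields three deliveries, while a window of zero yields six. The accumulated text is the same either way; only the number and size of the deliveries change.&lt;/p&gt;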

&lt;h3&gt;
  
  
  &lt;strong&gt;Enabling appends&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Appends require the "Message annotations, updates, appends, and deletes" channel rule for the namespace you're using. Enabling it also means messages are persisted, which affects usage and billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this is a better default for AI output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you are shipping agentic AI apps, you eventually need three things at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming UX&lt;/li&gt;
&lt;li&gt;history that's usable&lt;/li&gt;
&lt;li&gt;recovery that does not depend on luck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Appends are how you get there without building your own "message reconstruction" subsystem. If you want the deeper mechanics (including the message-per-response pattern and rollup tuning), the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;AI Transport docs&lt;/a&gt; are the best place to start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>streaming</category>
      <category>realtime</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Realtime steering: interrupt, barge-in, redirect, and guide the AI</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 09 Feb 2026 09:58:06 +0000</pubDate>
      <link>https://dev.to/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</link>
      <guid>https://dev.to/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</guid>
      <description>&lt;p&gt;Start typing, change your mind, redirect the AI mid-response. It just works. That is the promise of realtime steering. Users expect to interrupt an answer, correct its direction, or inject new instructions on the fly without losing context or restarting the session. It feels simple, but delivering it requires low-latency control signals, reliable cancellation, and shared conversational state that survives disconnects and device switches. This post explores why expectations have shifted, why today's stacks struggle with these patterns, and what your infrastructure needs to support proper realtime steering.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's changing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI tools are moving beyond static, one-turn interactions. Users expect to interact dynamically, especially in chat. But most AI systems today force users to wait while the assistant responds in full, even if it's off-track or no longer relevant. That's not how human conversations work.&lt;/p&gt;

&lt;p&gt;Expectations are shifting toward something more natural. Users want to jump in mid-stream, adjust the AI's course, or stop it altogether. These patterns (barge-in, redirect, steer) are becoming table stakes for responsive, agentic assistants.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What users want, and why this enhances the experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Users want to stay in control of the conversation. If the AI starts drifting, they want to say "stop" or "try a different angle" and get an immediate course correction. They want to guide the assistant's direction without breaking the flow or starting over.&lt;/p&gt;

&lt;p&gt;This improves trust, keeps sessions on-topic, and avoids wasted time. It also brings AI interactions closer to how real collaboration works: iterative, reactive, fast.&lt;/p&gt;

&lt;p&gt;Users now expect a few technical behaviours as part of that experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responses can be interrupted in real time&lt;/li&gt;
&lt;li&gt;New instructions are applied mid-stream without reset&lt;/li&gt;
&lt;li&gt;The AI keeps context and adjusts without losing the thread&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why realtime steering is proving hard to build&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI systems treat generation as a one-way stream. Once the model starts producing tokens, the system just plays them out to the client. If the user wants to interrupt or change direction, the only real option is to cancel and send a new prompt, often from scratch. Most systems today cannot support mid-stream redirection because their underlying communication model does not allow it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless HTTP cannot carry steering signals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional request–response models push output in one direction only. Once a long-running generation begins, there is no reliable way to send control signals back to the server. Cancelling or redirecting usually means tearing down the stream and starting again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser-held state breaks immediately&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most apps keep the state of an active generation in the browser. If the user refreshes or switches device, the in-flight response loses continuity. Any client-side steering logic tied to that state vanishes too, which forces a full reset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend models often run without shared conversational state&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the orchestration layer is not tracking what the AI is currently doing, it cannot apply corrections cleanly. The model receives a brand-new prompt instead of a context-preserving instruction layered onto an active task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The default stack was never designed for low-latency control loops&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Steering requires coordinated signalling between UI, transport, orchestration, and model inference. That means ordering guarantees, durable state, and fast propagation of control messages. Without these, the AI continues generating tokens after a user says stop, causing confusion and wasted compute.&lt;/p&gt;

&lt;p&gt;Steering mid-stream looks like a simple UX gesture. It is not. It is a distributed-systems problem sitting under a conversational interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why you need a drop-in AI transport layer for steering&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Delivering realtime control requires more than token streaming. It requires a transport layer that keeps context alive, supports low-latency bidirectional messaging, and ensures that user instructions and model output remain synchronised.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bi-directional, low-latency messaging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Client-side signals such as "stop" or "try this instead" must reach the backend quickly and reliably. WebSockets or similar long-lived connections make this possible by enabling client-to-server control while the &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;AI continues to stream output.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliable interrupt and cancellation primitives&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stopping generation must be instant and clean. The transport must carry cancellation events with ordering guarantees so the backend halts inference exactly where intended, without corrupting state.&lt;/p&gt;
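
&lt;p&gt;On the backend, one common way to honour a cancellation is to check a signal between tokens. The sketch below uses the standard &lt;code&gt;AbortController&lt;/code&gt;; the &lt;code&gt;streamTokens&lt;/code&gt; function and its arguments are hypothetical, and a real generation loop would be asynchronous, with the abort triggered by a control message arriving over the transport.&lt;/p&gt;

```javascript
// Hypothetical generation loop; `tokens` stands in for model output.
// The signal is checked before each token, so a cancel takes effect
// at the next token boundary and nothing further is emitted.
function streamTokens(tokens, signal, onToken) {
  let emitted = 0;
  for (const token of tokens) {
    if (signal.aborted) {
      // Report how far the response got, so state stays consistent.
      return { emitted, cancelled: true };
    }
    onToken(token);
    emitted += 1;
  }
  return { emitted, cancelled: false };
}
```

&lt;p&gt;Returning the number of tokens emitted gives the orchestration layer a precise record of what the user actually saw before the stop, which it can use when layering the next instruction onto the session.&lt;/p&gt;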

&lt;p&gt;&lt;strong&gt;Session continuity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system needs persistent session identity so instructions and outputs are tied to the same conversational thread. Redirection should extend the session, not rebuild it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence and focus tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If users have &lt;a href="https://ably.com/blog/cross-device-ai-sync" rel="noopener noreferrer"&gt;multiple tabs or devices&lt;/a&gt; open, the system needs to know where instructions are coming from. Steering messages must route to the correct active session without collisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Realtime steering relies on a transport layer designed for conversational control, not just message delivery.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How the experience maps to the transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer features&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interrupt and redirect responses in real time&lt;/td&gt;
&lt;td&gt;Bi-directional messaging&lt;/td&gt;
&lt;td&gt;WebSocket-based channels enabling client-to-server signals during output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel generation cleanly&lt;/td&gt;
&lt;td&gt;Interrupt primitives&lt;/td&gt;
&lt;td&gt;Server-side control hooks to stop model inference and close stream pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preserve continuity after steering&lt;/td&gt;
&lt;td&gt;Session continuity&lt;/td&gt;
&lt;td&gt;Persistent session or conversation IDs with context caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update response direction on the fly&lt;/td&gt;
&lt;td&gt;Dynamic state sync&lt;/td&gt;
&lt;td&gt;Shared state model where new input is merged into active conversational context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steer across devices&lt;/td&gt;
&lt;td&gt;Identity-aware multiplexing&lt;/td&gt;
&lt;td&gt;Fan-out model updates across all user sessions in sync&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Realtime steering for AI you can ship today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You don't need a new architecture to support realtime steering, cancellation, or recovery. You need a transport layer that can keep the session alive, deliver messages in order, and preserve state across disconnects. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; provides those foundations out of the box, so you can build controllable, resilient AI interactions without rebuilding your entire stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign-up for a free account&lt;/a&gt; and try today.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>systemdesign</category>
      <category>ux</category>
    </item>
  </channel>
</rss>
