<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ably Blog</title>
    <description>The latest articles on DEV Community by Ably Blog (@ablyblog).</description>
    <link>https://dev.to/ablyblog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F987387%2Ff9d0ea92-d06e-46d6-8efc-6e92b510943e.png</url>
      <title>DEV Community: Ably Blog</title>
      <link>https://dev.to/ablyblog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ablyblog"/>
    <language>en</language>
    <item>
      <title>The missing transport layer in user-facing AI applications</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 14:17:39 +0000</pubDate>
      <link>https://dev.to/ablyblog/the-missing-transport-layer-in-user-facing-ai-applications-3j90</link>
      <guid>https://dev.to/ablyblog/the-missing-transport-layer-in-user-facing-ai-applications-3j90</guid>
      <description>&lt;p&gt;Most AI applications start the same way: wire up an LLM, stream tokens to the browser, ship. That works for simple request-response. It breaks when sessions outlast a connection, when users switch devices, or when an agent needs to hand off to a human.&lt;/p&gt;

&lt;p&gt;The cracks appear in the delivery layer, not the model. Every serious production team discovers this independently and builds their own workaround. Those workarounds don't hold once users start hitting them in production.&lt;/p&gt;

&lt;p&gt;Here's what breaks, and what the transport layer needs to handle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift that creates the problem
&lt;/h2&gt;

&lt;p&gt;Simple AI applications are synchronous. User sends a message, model returns a response, done. A dropped connection restarts cleanly.&lt;/p&gt;

&lt;p&gt;Agentic applications aren't like that. They run in a loop: perceive the user's intent, reason with the model, act by calling tools or sub-agents, and observe the result. Then they go around again until the task is done.&lt;/p&gt;
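
&lt;p&gt;The loop can be sketched in a few lines. Everything here is illustrative: the &lt;code&gt;reason&lt;/code&gt; and &lt;code&gt;act&lt;/code&gt; callbacks stand in for a model call and a tool call, and the shape is not any particular framework's API.&lt;/p&gt;

```typescript
// A minimal agentic loop: perceive the goal, reason, act, observe,
// and go around again until the task is judged complete. The callback
// names and shapes are invented for illustration.
type Step = { action: string; done: boolean };

function runAgentLoop(
  goal: string,
  reason: (goal: string, history: string[]) => Step, // stand-in for an LLM call
  act: (action: string) => string,                   // stand-in for a tool call
  maxIterations = 10
): string[] {
  const observations: string[] = [];
  for (let i = 0; maxIterations > i; i++) {
    const step = reason(goal, observations); // reason over the goal plus what we have seen
    if (step.done) break;                    // the model judges the task complete
    observations.push(act(step.action));     // act, then observe the result
  }
  return observations;
}
```

&lt;p&gt;Each iteration can take seconds and call external systems, which is exactly why the connection carrying the output cannot be assumed to survive the whole loop.&lt;/p&gt;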

&lt;p&gt;A research agent might loop a dozen times over several minutes, calling APIs and querying databases. The user is present throughout, watching, waiting, potentially needing to redirect. The connection might drop mid-loop, the user might switch devices, or they might realize mid-stream that the agent is heading the wrong way.&lt;/p&gt;

&lt;p&gt;That's a different problem, and one HTTP streaming wasn't designed to solve. The backend surviving and the session surviving are two different things. What's missing is a layer that treats the conversation as durable state: persisting across connections, devices, and participants.&lt;/p&gt;

&lt;p&gt;Durable execution makes the backend crash-proof. Durable sessions make what the user actually sees crash-proof. Most teams building agentic products need both.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks in production
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tokens disappear and reconnects corrupt state.&lt;/strong&gt; HTTP streaming delivers tokens once. A dropped connection loses them. Most workarounds handle full page reloads but not tab switches or mobile backgrounding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odrarurbi84dlej26kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odrarurbi84dlej26kt.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Worse, naive reconnect implementations replay the same output and produce duplicates: fragments, repeated tokens, or an interface in an indeterminate state. The Vercel AI SDK makes the tradeoff explicit: its resume and stop features are incompatible. You can resume a dropped stream or cancel it, but not both. &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;A full breakdown of what resumable streaming requires at the infrastructure level is here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users can't see what the agent is doing.&lt;/strong&gt; The agent is running tool calls, checking backend systems, orchestrating sub-agents. From the user's perspective it's a spinner and silence. Users abandon tasks they can't see progressing.&lt;/p&gt;

&lt;p&gt;There's no standard mechanism for surfacing intermediate results as first-class events on the session channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There's no way to interrupt.&lt;/strong&gt; Once generation starts, the user is locked out. Interruption requires bi-directional communication on the same channel simultaneously: user input arriving while agent output is still streaming, without breaking state. One company disabled user input entirely during agent responses because the backend couldn't distinguish an intentional cancel from a dropped connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre343jt98g88k35r6ovo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre343jt98g88k35r6ovo.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent keeps working after the user has left.&lt;/strong&gt; No signal tells the agent the user closed the tab. Compute and token costs accumulate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;Presence&lt;/a&gt; is a live membership set showing who is active in the session. Agents use it to pause expensive operations when nobody is there and resume when they return.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple agents collide.&lt;/strong&gt; When two specialist agents are working on the same request, every intermediate update routes through the orchestrator. The orchestrator becomes a bottleneck: when it's relaying progress it doesn't care about, the architecture starts to fight itself. &lt;a href="https://ably.com/blog/multi-agent-ai-systems" rel="noopener noreferrer"&gt;The multi-agent coordination post goes deeper on how this plays out with concurrent specialist agents.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents fail silently.&lt;/strong&gt; Most infrastructure has no agent health mechanism at the transport level. The signal you want when an agent crashes is a presence disconnect that fires immediately, not a timeout inferred from a dead stream. Build on the wrong signal and recovery logic breaks under real failure conditions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4erssut4y9x7u106v15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4erssut4y9x7u106v15.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human handovers lose context.&lt;/strong&gt; When an agent escalates, most implementations open a different interface, summarize what happened, and hope the transfer works. The user explains their problem again. A &lt;a href="https://ably.com/docs/ai-transport/messaging/human-in-the-loop" rel="noopener noreferrer"&gt;unified channel where agents and humans can both participate&lt;/a&gt; addresses this: the human arrives with full history and picks up mid-thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are no transport-level diagnostics.&lt;/strong&gt; Model-level tooling shows what the model decided to do. Nothing shows what happened between the agent and the user's screen: whether a message arrived, whether a reconnection worked, whether delivery stalled. Debugging a failed session means stitching together server logs that rarely reconstruct what actually happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrbaou5u34xrpo9yszd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrbaou5u34xrpo9yszd.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the transport layer needs to handle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Resumable streaming.&lt;/strong&gt; Output persists in the channel, not the connection. When a client reconnects, it rejoins from its last received position with no gaps and no duplicates. Mutable messages handle retry corruption: republish to the same message ID and the client sees clean updated state, not a second copy. Vercel built a pluggable &lt;a href="https://ai-sdk.dev/docs/ai-sdk-ui/transport" rel="noopener noreferrer"&gt;ChatTransport interface&lt;/a&gt; specifically to support this pattern; TanStack AI shipped a &lt;a href="https://tanstack.com/ai/latest/docs/guides/connection-adapters" rel="noopener noreferrer"&gt;ConnectionAdapter&lt;/a&gt; for the same reason. The ecosystem has diagnosed the problem and built the plug-in points.&lt;/p&gt;
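
&lt;p&gt;The mutable-message pattern reduces to keying client state by message ID, so a republish overwrites in place rather than appending a duplicate. A minimal sketch, with invented names:&lt;/p&gt;

```typescript
// Client-side view keyed by message ID: a republish to an existing ID
// replaces the previous copy instead of appending a second one.
// Illustrative sketch of the mutable-message pattern, not a real SDK.
interface Message { id: string; text: string }

class MessageView {
  private order: string[] = [];
  private byId: { [id: string]: string } = {};

  apply(msg: Message): void {
    if (this.byId[msg.id] === undefined) this.order.push(msg.id); // first sighting: append
    this.byId[msg.id] = msg.text;                                 // republish: overwrite in place
  }

  render(): string[] {
    return this.order.map((id) => this.byId[id]);
  }
}
```

&lt;p&gt;A retry that republishes the same ID now produces clean updated state on every client, which is the property naive replay-based reconnects lack.&lt;/p&gt;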

&lt;p&gt;&lt;strong&gt;Multi-device continuity.&lt;/strong&gt; &lt;a href="https://ably.com/docs/ai-transport/sessions-identity" rel="noopener noreferrer"&gt;Session state lives on the channel, not any individual client.&lt;/a&gt; Any device subscribing gets the same history and live updates. The session follows the user, not the connection.&lt;/p&gt;

&lt;p&gt;In recent market research, 23 of 26 AI platforms evaluated, including ChatGPT, had no multi-device session continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bi-directional communication on a shared channel.&lt;/strong&gt; User input and agent output flow on the same channel simultaneously. A redirect from the user arrives as an explicit signal while the agent is mid-stream, not as an ambiguous TCP side effect. The backend can now distinguish an intentional cancel from a dropped connection.&lt;/p&gt;
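
&lt;p&gt;The distinction can be modelled as a small state machine in which cancellation is an explicit event on the channel while a disconnect merely suspends delivery. The event and state names below are illustrative:&lt;/p&gt;

```typescript
// Cancel arrives as an explicit message on the session channel; a
// disconnect is a transport event that buffers output rather than
// stopping generation. Hypothetical event shapes.
type InboundEvent =
  | { type: "user_message"; text: string }
  | { type: "cancel" }       // explicit user intent
  | { type: "disconnect" };  // transport-level event, not intent

type AgentState = "streaming" | "cancelled" | "awaiting_reconnect";

function nextState(state: AgentState, event: InboundEvent): AgentState {
  if (event.type === "cancel") return "cancelled";              // stop generation for good
  if (event.type === "disconnect") return "awaiting_reconnect"; // keep output buffered
  return "streaming";                                           // normal bi-directional flow
}
```

&lt;p&gt;With only a TCP close to go on, the two right-hand branches collapse into one, which is exactly the ambiguity described above.&lt;/p&gt;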

&lt;p&gt;&lt;strong&gt;Progress as structured events.&lt;/strong&gt; Agent reasoning steps, tool call progress, and intermediate results should be &lt;a href="https://ably.com/docs/ai-transport/messaging" rel="noopener noreferrer"&gt;first-class events on the channel&lt;/a&gt;, subscribable independently of the main response stream. Specialized agents publish progress directly. The orchestrator stops relaying events it doesn't care about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence.&lt;/strong&gt; A live membership set for users, agents, and human operators. Agents make real decisions based on it: pause when the user is gone, resume when they return. Crash detection is a presence event: when an agent disconnects, the event fires immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session-level diagnostics.&lt;/strong&gt; Channel history serves as both the live diagnostic feed and the persistent audit record: structured, timestamped, and identity-attributed. This covers the delivery layer between agent and user, separate from model-level observability, and both surfaces matter in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The underlying principle
&lt;/h2&gt;

&lt;p&gt;Each of these problems is tractable in isolation. Solving all of them together, without a dedicated infrastructure layer, is where engineering budget quietly disappears. None of it has anything to do with the AI product itself.&lt;/p&gt;

&lt;p&gt;The workaround that seemed to hold breaks as soon as teams need cancellation, multi-device continuity, or human handover without a context break. The result is a growing layer of glue code that keeps teams away from the features they're actually trying to ship.&lt;/p&gt;

&lt;p&gt;The category forming around this problem, durable sessions, is the session-layer equivalent of what durable execution did for backend workflows. The infrastructure requirement is the same: a layer built for the failure modes that actually occur, not workarounds patched onto infrastructure designed for something else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Ably AI Transport fits
&lt;/h2&gt;

&lt;p&gt;Ably AI Transport is a drop-in durable session layer that absorbs this complexity. Developers publish to a session. The infrastructure handles resumable streaming, multi-device continuity, presence, shared state, and bi-directional communication. No changes required to your model calls or agent orchestration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Docs go deeper →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Why your AI response restarts on page refresh (and what it takes to prevent it)</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 13:58:52 +0000</pubDate>
      <link>https://dev.to/ablyblog/why-your-ai-response-restarts-on-page-refresh-and-what-it-takes-to-prevent-it-gd2</link>
      <guid>https://dev.to/ablyblog/why-your-ai-response-restarts-on-page-refresh-and-what-it-takes-to-prevent-it-gd2</guid>
      <description>&lt;p&gt;Your AI assistant is mid-sentence explaining a complex debugging strategy. The user refreshes the page. The response starts over from the beginning, or worse, vanishes entirely.&lt;/p&gt;

&lt;p&gt;This isn't a model problem. It's a delivery problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What breaks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI applications &lt;a href="https://ably.com/docs/ai-transport/token-streaming" rel="noopener noreferrer"&gt;stream LLM responses&lt;/a&gt; over HTTP using Server-Sent Events or fetch streams. The connection delivers tokens in order until the response completes. If the user refreshes, closes the tab, or loses network connectivity, the stream ends. When they reconnect, there's no mechanism to resume from where they left off.&lt;/p&gt;

&lt;p&gt;The application has two options: start the entire response over (wasting tokens and user time) or lose everything that was streamed before the disconnection (losing context the user already read).&lt;/p&gt;

&lt;p&gt;Neither option works in production. Users refresh pages. Networks drop. Browsers crash. Mobile apps background. These aren't edge cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F861n0s6abmkchmlecz2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F861n0s6abmkchmlecz2j.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why naive approaches fail&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Client-side buffering:&lt;/strong&gt; You can cache tokens in memory or localStorage, but this only handles intentional refreshes on the same device. It doesn't help with network interruptions, crashes, or users switching devices mid-conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response regeneration:&lt;/strong&gt; Re-requesting the full response from the LLM costs tokens, adds latency, and often produces different output. The user sees the response change on reload, breaking continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless HTTP streaming:&lt;/strong&gt; Standard SSE and fetch streams have no concept of session recovery. When the connection closes, the client has no way to tell the server "resume from token 847."&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How resumable streaming actually works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The system needs three components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session identity:&lt;/strong&gt; Each AI response gets a unique session ID that persists across connections. When the client reconnects, it presents this ID to resume the same logical response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offset tracking:&lt;/strong&gt; The server tracks which tokens have been delivered. The client tracks which tokens it has received and rendered. On reconnect, the client requests "start from token N."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ordered delivery with history:&lt;/strong&gt; The transport layer guarantees token ordering and maintains a replayable history. When a client reconnects with an offset, the server resumes delivery from that point without re-invoking the LLM.&lt;/p&gt;
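
&lt;p&gt;Those three components can be compressed into one sketch: a session identified by ID, an append-only token history, and resume-from-offset delivery that never re-invokes the LLM. Illustrative code, not a production design:&lt;/p&gt;

```typescript
// A resumable session: tokens land in an append-only history as they
// are generated, and a reconnecting client replays only what it missed.
class ResumableSession {
  private tokens: string[] = [];

  constructor(readonly sessionId: string) {} // session identity persists across connections

  append(token: string): void {
    this.tokens.push(token); // producer side: history grows as the model streams
  }

  // Client reconnects saying "I already have the first `offset` tokens".
  resumeFrom(offset: number): string[] {
    return this.tokens.slice(offset); // ordered replay from that point, no LLM call
  }
}
```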

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquckez75mq23z3u3tlsn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquckez75mq23z3u3tlsn.png" alt=" " width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tradeoffs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building this yourself means managing session state, handling offset synchronisation across multiple connections, and ensuring tokens arrive in order even if network packets don't. You'll need persistent storage for token history and logic to handle race conditions when users reconnect from multiple tabs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A concrete example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;User asks an AI assistant to explain a codebase. The LLM streams 2,000 tokens over 30 seconds. At token 1,247, the user's network drops for eight seconds. Without resumability, the user sees a frozen response, then either loses everything or watches it restart.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;resumable streaming&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client detects the disconnection and stores offset 1,247&lt;/li&gt;
&lt;li&gt;Network recovers, client reconnects with session ID and offset&lt;/li&gt;
&lt;li&gt;Server resumes delivery from token 1,248&lt;/li&gt;
&lt;li&gt;User sees the response continue exactly where it stopped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user never knows there was an interruption.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Multi-device continuity&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/blog/ably-ai-transport" rel="noopener noreferrer"&gt;Resumable streaming also enables conversation continuity across devices&lt;/a&gt;. The user starts a question on their laptop, switches to their phone, and sees the AI response pick up mid-stream. Same session ID, same offset tracking, different client.&lt;/p&gt;

&lt;p&gt;This matters for AI workflows that span locations: research started at a desk, continued on a commute, finished in a meeting room. Without transport-level session management, each device restart loses context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mvla9ev7prpo067bqhl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mvla9ev7prpo067bqhl.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why this matters for AI reliability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Unreliable delivery creates unreliable AI experiences. Users learn not to trust that responses will complete. They avoid asking complex questions because they might lose the answer. They stop using AI features on mobile networks.&lt;/p&gt;

&lt;p&gt;Fixing this isn't about better models or smarter prompts. It's about ensuring delivery is as dependable as the intelligence behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Next steps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you're building AI features where responses take more than a few seconds, or where users might switch devices or encounter network issues, you need resumable streaming. You can build session management and offset tracking yourself, or use infrastructure like &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; that handles it for you.&lt;/p&gt;

&lt;p&gt;Either way, design for reconnection from day one. Your users will refresh. Your network will drop. Production isn't a stable connection.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Resume tokens and last-event IDs for LLM streaming: How they work &amp; what they cost to build</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 13 Apr 2026 13:32:33 +0000</pubDate>
      <link>https://dev.to/ablyblog/resume-tokens-and-last-event-ids-for-llm-streaming-how-they-work-what-they-cost-to-build-4l7e</link>
      <guid>https://dev.to/ablyblog/resume-tokens-and-last-event-ids-for-llm-streaming-how-they-work-what-they-cost-to-build-4l7e</guid>
      <description>&lt;p&gt;When an AI response reaches token 150 and the connection drops, most implementations have one answer: start over. The user re-prompts, you pay for the same tokens twice, and the experience breaks.&lt;/p&gt;

&lt;p&gt;Resume tokens and last-event IDs are the mechanism that prevents this. They make streams addressable – every message gets an identifier, clients track their position, and reconnections pick up from exactly where they left off. The concept is straightforward. The production scope is not: storage design, deduplication, gap detection, distributed routing, and multi-device continuity all follow from the same first decision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e66kdpxwi5xaz8ib8jy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e66kdpxwi5xaz8ib8jy.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How resume tokens actually work
&lt;/h2&gt;

&lt;p&gt;Resumable streaming has four moving parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Message identifiers.&lt;/strong&gt; Every token or message gets a sequential ID when published – monotonically increasing, so each new message has a higher ID than the previous one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client state.&lt;/strong&gt; The client tracks the ID of the last message it successfully received. In a browser, that's typically held in memory or local storage. On mobile, it needs to survive app backgrounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconnection protocol.&lt;/strong&gt; When the connection drops, the client presents the last ID it saw. The server responds with everything that arrived after that ID, then transitions to live streaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catchup delivery.&lt;/strong&gt; The client receives missed messages in order before live tokens resume. The seam should be invisible.&lt;/p&gt;

&lt;p&gt;The stream itself becomes the source of truth. The client doesn't reconstruct what it missed – the stream delivers it.&lt;/p&gt;
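
&lt;p&gt;Putting the four parts together, a minimal channel sketch assigns monotonic IDs at publish time and, on reconnect, replays everything after the client's last seen ID before switching to live delivery. All names here are invented for illustration:&lt;/p&gt;

```typescript
// Catchup-then-live: attach with the last ID you saw, receive the gap
// in order, then receive new messages as they arrive. The seam between
// replay and live delivery is invisible to the subscriber.
interface StreamMessage { id: number; data: string }

class Stream {
  private log: StreamMessage[] = [];
  private subscribers: ((msg: StreamMessage) => void)[] = [];
  private nextId = 1;

  publish(data: string): void {
    const msg = { id: this.nextId, data }; // monotonically increasing ID at publish time
    this.nextId = this.nextId + 1;
    this.log.push(msg);
    this.subscribers.forEach((fn) => fn(msg));
  }

  attach(lastSeenId: number, onMessage: (msg: StreamMessage) => void): void {
    this.log.filter((msg) => msg.id > lastSeenId).forEach(onMessage); // catchup first
    this.subscribers.push(onMessage);                                 // then live
  }
}
```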

&lt;h2&gt;
  
  
  What SSE's Last-Event-ID header gives you
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/topic/server-sent-events" rel="noopener noreferrer"&gt;Server-Sent Events&lt;/a&gt; implements this natively. When an SSE connection drops, the browser automatically includes a Last-Event-ID header on reconnection. The server sees which event the client last received and resumes from there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnssh7fnzz544paspo1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnssh7fnzz544paspo1k.png" alt=" " width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The browser handles reconnect logic. Application code doesn't change between initial connection and reconnection. For the happy path – stable connection, single device, short responses – SSE with Last-Event-ID works well.&lt;/p&gt;
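
&lt;p&gt;On the server side, honouring the header reduces to a lookup in the event log. The &lt;code&gt;Last-Event-ID&lt;/code&gt; header and the &lt;code&gt;id:&lt;/code&gt; wire field are standard SSE; the log shape and the fall-back policy below are assumptions:&lt;/p&gt;

```typescript
// Given the server's event log and the Last-Event-ID value the browser
// sent back on reconnection, decide what to replay. Pure function so
// the policy is easy to test; a real handler would write these to the
// response stream.
interface SseEvent { id: string; data: string }

function eventsToReplay(log: SseEvent[], lastEventId: string | null): SseEvent[] {
  if (lastEventId === null) return log; // fresh connection: full replay (policy choice)
  const idx = log.findIndex((e) => e.id === lastEventId);
  return idx === -1 ? log : log.slice(idx + 1); // unknown ID: fall back to full replay
}

// The `id:` field is what the browser echoes back in the
// Last-Event-ID header on its next automatic reconnect.
function formatSse(e: SseEvent): string {
  return "id: " + e.id + "\ndata: " + e.data + "\n\n";
}
```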

&lt;p&gt;The problems start at the boundary of what SSE can do.&lt;/p&gt;

&lt;p&gt;SSE is unidirectional and HTTP-only. It has no native history beyond what you implement server-side. It doesn't handle bidirectional messaging, so live steering – users redirecting the AI mid-response – requires a separate channel. On distributed infrastructure, a reconnecting client may reach a different server instance that has no record of the original session. SSE handles the reconnect handshake. Everything else – distributed state, per-instance routing, multi-device history – is still your problem. For use cases that need bidirectional messaging, &lt;a href="https://ably.com/blog/websockets-vs-sse" rel="noopener noreferrer"&gt;WebSockets vs SSE&lt;/a&gt; covers the tradeoffs in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building resume into WebSockets
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ably.com/topic/websockets" rel="noopener noreferrer"&gt;WebSockets&lt;/a&gt; don't include resume semantics. When a WebSocket closes, the connection is gone. Reconnecting creates a new socket with no knowledge of the previous one.&lt;/p&gt;

&lt;p&gt;Building resume on WebSockets means building all of it yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session IDs generated at stream start, stored server-side, presented by the client on reconnection&lt;/li&gt;
&lt;li&gt;Message IDs assigned sequentially&lt;/li&gt;
&lt;li&gt;Server logic to look up a session, find the position, replay history, then transition to live&lt;/li&gt;
&lt;li&gt;Buffer management to decide how long to keep messages for sessions that haven't reconnected yet&lt;/li&gt;
&lt;li&gt;Cleanup logic to expire stale sessions without cutting off legitimate reconnects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each piece is straightforward in isolation. The edge cases are where the weeks go.&lt;/p&gt;
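
&lt;p&gt;One of those pieces, buffer management with expiry, might look like the following. The TTL policy and the injected clock are illustrative choices, not a recommendation:&lt;/p&gt;

```typescript
// Per-session token buffers with a time-to-live: activity pushes the
// deadline out, a sweep reclaims stale sessions, and a resume after
// expiry fails cleanly. Time is passed in so the policy is testable.
class SessionBuffer {
  private sessions: { [id: string]: { tokens: string[]; expiresAt: number } } = {};

  constructor(private ttlMs: number) {}

  append(sessionId: string, token: string, now: number): void {
    const s = this.sessions[sessionId] ?? { tokens: [], expiresAt: 0 };
    s.tokens.push(token);
    s.expiresAt = now + this.ttlMs; // each append pushes the deadline out
    this.sessions[sessionId] = s;
  }

  resume(sessionId: string, now: number): string[] | null {
    const s = this.sessions[sessionId];
    if (s === undefined || now > s.expiresAt) return null; // expired: too late to resume
    return s.tokens;
  }

  sweep(now: number): void {
    for (const id of Object.keys(this.sessions)) {
      if (now > this.sessions[id].expiresAt) delete this.sessions[id]; // reclaim stale state
    }
  }
}
```

&lt;p&gt;The hard part is not the data structure; it is choosing a TTL long enough for legitimate reconnects and short enough that abandoned sessions don't accumulate.&lt;/p&gt;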

&lt;h2&gt;
  
  
  The storage problem teams underestimate
&lt;/h2&gt;

&lt;p&gt;Token-level storage is where most implementations hit an unexpected wall.&lt;/p&gt;

&lt;p&gt;A 500-word response generates roughly 625 tokens. If you store each token as a separate record, loading one response means retrieving 625 records. A conversation with 20 exchanges is 12,500 records. Multiply across thousands of concurrent users and history retrieval becomes the performance bottleneck.&lt;/p&gt;

&lt;p&gt;This matters because history retrieval is on the critical path for multi-device continuity. When a user switches from laptop to phone, the speed of catchup determines whether the experience feels continuous or broken.&lt;/p&gt;

&lt;p&gt;The more practical model is to treat each AI response as a single logical message and append tokens to it rather than publishing them individually. Clients joining mid-stream receive the full message so far, then get new tokens as they arrive. One record per response instead of hundreds.&lt;/p&gt;
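
&lt;p&gt;A sketch of that storage model: one growing record per response, plus a snapshot call for clients joining mid-stream. Names are invented:&lt;/p&gt;

```typescript
// One record per response: tokens append to a single logical message
// rather than being stored individually. A late subscriber gets the
// full text so far in one read, plus a version cursor for deduping
// subsequent live updates.
class ResponseRecord {
  private text = "";
  private version = 0;

  appendToken(token: string): void {
    this.text += token; // one growing record, not one row per token
    this.version++;
  }

  snapshot(): { text: string; version: number } {
    return { text: this.text, version: this.version };
  }
}
```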

&lt;h2&gt;
  
  
  Duplicates and gaps: the two failure modes that break trust
&lt;/h2&gt;

&lt;p&gt;Duplicates happen when the connection drops after the client receives a message but before the acknowledgement reaches the server. On reconnect, the server doesn't know whether to replay that message. Without deduplication logic, the client renders the same token twice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmq3n9zn1kh5sy5x77r9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmq3n9zn1kh5sy5x77r9.png" alt=" " width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fix is using message IDs as deduplication keys on the client – straightforward in principle, but it needs to survive page reloads and work across tabs.&lt;/p&gt;

&lt;p&gt;Gaps happen when sequential IDs arrive out of order or not at all. If a client receives message 153 after 150, messages 151 and 152 are missing. Without gap detection, the client silently renders an incomplete response. With it, you need logic to request missing messages, decide what to do if they can't be retrieved, and handle the state when the client gives up waiting.&lt;/p&gt;

&lt;p&gt;Both failure modes are rare enough to be invisible in testing. Both surface under real network conditions: mobile handoffs, flaky WiFi, corporate proxy timeouts. The first time you see them is usually a support ticket.&lt;/p&gt;
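
&lt;p&gt;Both checks reduce to comparing each arriving ID against the last one processed. A minimal guard, assuming sequential integer IDs:&lt;/p&gt;

```typescript
// Deduplication and gap detection in one place: an ID at or below the
// last processed one is a duplicate to drop; a jump past the expected
// next ID flags the missing range.
type Receipt = { accept: boolean; missing: number[] };

class SequenceGuard {
  private lastId = 0;

  receive(id: number): Receipt {
    if (this.lastId >= id) return { accept: false, missing: [] }; // duplicate or stale replay
    const missing: number[] = [];
    for (let expected = this.lastId + 1; id > expected; expected++) {
      missing.push(expected); // IDs that never arrived
    }
    this.lastId = id;
    return { accept: true, missing };
  }
}
```

&lt;p&gt;What to do with the flagged range (request a replay, wait, or give up) is the policy layer on top; the guard only makes the failure visible instead of silent.&lt;/p&gt;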

&lt;h2&gt;
  
  
  What distributed deployment adds
&lt;/h2&gt;

&lt;p&gt;A single-server implementation can tie session state to process memory and mostly work. As soon as you run multiple instances – which you will, for reliability and scale – a routing problem appears.&lt;/p&gt;

&lt;p&gt;A client that connected to instance A reconnects to instance B. Instance B has no record of the session. Your options: route all reconnections back to the originating instance (a pinning strategy that creates hotspots and defeats the purpose of multiple instances), or store session state in shared infrastructure that all instances can read.&lt;/p&gt;

&lt;p&gt;Shared session storage means Redis or equivalent: network round-trips on reconnect, cache invalidation logic, and failure handling when the cache is unavailable. This is solvable. It's also not in the first implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The multi-device gap
&lt;/h2&gt;

&lt;p&gt;Multi-device continuity is where connection-oriented design hits a wall.&lt;/p&gt;

&lt;p&gt;When state lives in the connection – or in server memory tied to that connection – device switching loses context. The phone doesn't know what the laptop received. Without a shared source of truth for message history that any device can query, each reconnect from a new device is a new session.&lt;/p&gt;

&lt;p&gt;True multi-device continuity requires decoupling state from connections entirely. The conversation lives in a channel or persistent store. Devices subscribe and catch up rather than resuming a connection.&lt;/p&gt;

&lt;p&gt;This is a different architectural model than resuming an HTTP stream. For most teams, that realisation arrives after the first implementation is already in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k5mq6yl5vgbly6h6auo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k5mq6yl5vgbly6h6auo.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When resumable streaming matters most
&lt;/h2&gt;

&lt;p&gt;Not every streaming application needs this. For short-lived, single-session interactions on stable connections, standard HTTP streaming is fine.&lt;/p&gt;

&lt;p&gt;Resume becomes critical under specific conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mobile clients&lt;/strong&gt; constantly handle network handoffs between WiFi and cellular. Each one is a potential disconnection.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long responses&lt;/strong&gt; – anything over 30 seconds – have a high probability of encountering a transient failure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-device usage&lt;/strong&gt; means the conversation needs to live in a channel, not a connection.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-agent systems&lt;/strong&gt; have several agents publishing updates to a shared channel. A reconnecting client needs to catch up on everything all agents published, not just the primary response thread.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The alternative is forcing users to restart on every interruption. That breaks trust fast, and the cost compounds on longer or more complex tasks where restarting is most painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you're actually signing up for
&lt;/h2&gt;

&lt;p&gt;Teams that have shipped resumable streaming in production describe a consistent arc: the first implementation takes a week, the edge cases take a month, and cross-device reliability is still not fully solved six months later.&lt;/p&gt;

&lt;p&gt;The full scope of a production-grade build: session management, message storage with efficient retrieval by ID range, client-side deduplication, gap detection, distributed routing, cache invalidation, buffer expiry, and monitoring to surface issues you can't reproduce locally.&lt;/p&gt;
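&lt;p&gt;A sketch of just two items from that list, deduplication and gap detection, assuming sequential integer message IDs (a simplification for illustration; real systems use serials):&lt;/p&gt;

```javascript
// Sketch: client-side dedupe and gap detection over sequential message IDs.
function integrateMessage(state, msg) {
  if (state.seen.has(msg.id)) return state;    // duplicate replay: drop it
  const expected = state.lastId + 1;
  if (msg.id > expected) {
    // Missed messages: record the range so it can be re-requested.
    state.gaps.push({ from: expected, to: msg.id - 1 });
  }
  state.seen.add(msg.id);
  state.lastId = Math.max(state.lastId, msg.id);
  state.messages.push(msg);
  return state;
}

const state = { seen: new Set(), lastId: 0, gaps: [], messages: [] };
// One duplicate and one gap, as a reconnect replay might deliver them.
[{ id: 1 }, { id: 2 }, { id: 2 }, { id: 5 }].forEach((m) =>
  integrateMessage(state, m)
);
// state.messages has 3 entries; state.gaps records the missing range 3..4.
```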

&lt;p&gt;Good transport infrastructure handles duplicates and gaps automatically. Application logic shouldn't need to check for either – that's the infrastructure's job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build vs infrastructure
&lt;/h2&gt;

&lt;p&gt;Building resumable streaming yourself is a reasonable choice if you have a stable team, time to maintain it, and no multi-device or distributed requirements.&lt;/p&gt;

&lt;p&gt;It's a harder choice than the SSE documentation makes it look. One team described spending several weeks on custom session management and still not fully solving cross-device reliability. The problems weren't obvious in the design phase – they appeared under mobile network conditions, under load, and when users did things the system wasn't built to handle.&lt;/p&gt;

&lt;p&gt;The alternative is transport infrastructure that implements resume as part of the platform. You keep control of your LLM, prompts, and application logic. Session continuity, offset management, ordered delivery, and multi-device state become infrastructure concerns rather than application concerns.&lt;/p&gt;

&lt;p&gt;Both paths are defensible. The costs of building are real and most of them are invisible until the first deploy.&lt;/p&gt;




&lt;p&gt;Streaming responses between AI agents and clients? &lt;a href="https://ably.com/docs/ai-transport/token-streaming" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; includes resumable token streaming, automatic replay, and channel-based delivery with guaranteed ordering. Docs go deeper.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Appends for AI apps: Stream into a single message with Ably AI Transport</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Thu, 26 Feb 2026 12:05:30 +0000</pubDate>
      <link>https://dev.to/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</link>
      <guid>https://dev.to/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</guid>
      <description>&lt;p&gt;Streaming tokens is easy. Resuming cleanly is not. A user refreshes mid-response, another client joins late, a mobile connection drops for 10 seconds, and suddenly your "one answer" is 600 tiny messages that your UI has to stitch back together. Message history turns into fragments. You start building a side store just to reconstruct "the response so far".&lt;/p&gt;

&lt;p&gt;This is not a model problem. It's a delivery problem.&lt;/p&gt;

&lt;p&gt;That's why we developed message appends for &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;. Appends let you stream AI output tokens into a single message as they are produced, so you get progressive rendering for live subscribers and a clean, compact response in history.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The failure mode we're fixing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The usual implementation streams each token as a separate message, which is simple and works perfectly on a stable connection. In production, clients disconnect and resume mid-stream: refreshes, mobile dropouts, backgrounded tabs, and late joins.&lt;/p&gt;

&lt;p&gt;Once you have real reconnects and refreshes, you inherit work you did not plan for: ordering, dedupe, buffering, "latest wins" logic, and replay rules that make history and realtime agree. You can build it, but it is the kind of work that quietly eats weeks of engineering time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With appends you can avoid that by changing the shape of the data. Instead of hundreds of token messages, you have one response message whose content grows over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pattern: create once, append many
&lt;/h3&gt;

&lt;p&gt;In Ably AI Transport, you publish an initial response message and capture its server-assigned serial. That serial is what you append to.&lt;/p&gt;

&lt;p&gt;It's a small detail that ends up doing a lot of work for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;serials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, as your model yields tokens, you append each fragment to that same message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;What changes for clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Subscribers still see progressive output, but they see it as actions on the same message serial. A response starts with a create, tokens arrive as appends, and occasionally clients may receive a full-state update to resynchronise (for example after a reconnection).&lt;/p&gt;

&lt;p&gt;Most UIs end up implementing this shape anyway. With appends, it becomes boring and predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.append&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderAppend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.update&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderReplace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important difference is that history and realtime stop disagreeing, without your client code doing any extra work. You render progressively for live users, and you still treat the response as one message for storage, retrieval, and rewind.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Reconnects and refresh stop being special cases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Short disconnects are one thing. Refresh is the painful case: local state is gone, and streaming each token as a separate message forces you into replaying fragments and hoping the client reconstructs the same response.&lt;/p&gt;

&lt;p&gt;With message-per-response, hydration is straightforward because there is always a current accumulated version of the response message. Clients joining late or reloading can fetch the latest state as a single message and continue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/docs/channels/options/rewind" rel="noopener noreferrer"&gt;Rewind&lt;/a&gt; and history become useful again because you are rewinding meaningful messages, not token confetti:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai:chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2m&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token rates without token-rate pain
&lt;/h3&gt;

&lt;p&gt;Models can emit tokens far faster than most realtime setups want to publish. If you publish a message per token, rate limits become your problem and your agent code has to handle batching.&lt;/p&gt;

&lt;p&gt;Appends are designed for high-frequency workloads and include automatic rollups. Subscribers still receive progressive updates, but Ably can roll up rapid appends under the hood so you do not have to build your own throttling layer.&lt;/p&gt;

&lt;p&gt;If you need to tune the tradeoff between smoothness and message rate, you can adjust &lt;code&gt;appendRollupWindow&lt;/code&gt;. Smaller windows feel more responsive but consume more message-rate capacity. Larger windows batch more aggressively but arrive in bigger chunks.&lt;/p&gt;
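&lt;p&gt;Conceptually, a rollup window coalesces appends that arrive close together into one delivered update. Ably does this server-side; the sketch below only models the tradeoff, with invented timing values:&lt;/p&gt;

```javascript
// Sketch: how a rollup window turns rapid appends into fewer deliveries.
function rollup(appends, windowMs) {
  const batches = [];
  for (const a of appends) {
    const last = batches[batches.length - 1];
    if (!last || a.t - last.start >= windowMs) {
      batches.push({ start: a.t, text: a.text }); // new delivery window
    } else {
      last.text += a.text;                        // coalesce into current one
    }
  }
  return batches;
}

const tokens = [
  { t: 0, text: 'Hel' }, { t: 10, text: 'lo ' },
  { t: 60, text: 'wor' }, { t: 70, text: 'ld' },
];
const delivered = rollup(tokens, 50);
// 4 token publishes become 2 delivered updates: 'Hello ' and 'world'.
```

&lt;p&gt;A wider window would collapse all four appends into one update; a zero-width window delivers every append individually.&lt;/p&gt;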

&lt;h3&gt;
  
  
  &lt;strong&gt;Enabling appends&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Appends require the "Message annotations, updates, appends, and deletes" channel rule for the namespace you're using. Enabling it also means messages are persisted, which affects usage and billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this is a better default for AI output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you are shipping agentic AI apps, you eventually need three things at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming UX&lt;/li&gt;
&lt;li&gt;history that's usable&lt;/li&gt;
&lt;li&gt;recovery that does not depend on luck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Appends are how you get there without building your own "message reconstruction" subsystem. If you want the deeper mechanics (including the message-per-response pattern and rollup tuning), the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;AI Transport docs&lt;/a&gt; are the best place to start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>streaming</category>
      <category>realtime</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Realtime steering: interrupt, barge-in, redirect, and guide the AI</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 09 Feb 2026 09:58:06 +0000</pubDate>
      <link>https://dev.to/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</link>
      <guid>https://dev.to/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</guid>
      <description>&lt;p&gt;Start typing, change your mind, redirect the AI mid-response. It just works. That is the promise of realtime steering. Users expect to interrupt an answer, correct its direction, or inject new instructions on the fly without losing context or restarting the session. It feels simple, but delivering it requires low-latency control signals, reliable cancellation, and shared conversational state that survives disconnects and device switches. This post explores why expectations have shifted, why today's stacks struggle with these patterns, and what your infrastructure needs to support proper realtime steering.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's changing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI tools are moving beyond static, one-turn interactions. Users expect to interact dynamically, especially in chat. But most AI systems today force users to wait while the assistant responds in full, even if it's off-track or no longer relevant. That's not how human conversations work.&lt;/p&gt;

&lt;p&gt;Expectations are shifting toward something more natural. Users want to jump in mid-stream, adjust the AI's course, or stop it altogether. These patterns (barge-in, redirect, steer) are becoming table stakes for responsive, agentic assistants.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What users want, and why this enhances the experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Users want to stay in control of the conversation. If the AI starts drifting, they want to say "stop" or "try a different angle" and get an immediate course correction. They want to guide the assistant's direction without breaking the flow or starting over.&lt;/p&gt;

&lt;p&gt;This improves trust, keeps sessions on-topic, and avoids wasted time. It also brings AI interactions closer to how real collaboration works: iterative, reactive, fast.&lt;/p&gt;

&lt;p&gt;Users now expect a few technical behaviours as part of that experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responses can be interrupted in real time&lt;/li&gt;
&lt;li&gt;New instructions are applied mid-stream without reset&lt;/li&gt;
&lt;li&gt;The AI keeps context and adjusts without losing the thread&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why realtime steering is proving hard to build&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI systems treat generation as a one-way stream. Once the model starts producing tokens, the system just plays them out to the client. If the user wants to interrupt or change direction, the only real option is to cancel and resend a new prompt - often from scratch. Most systems cannot support mid-stream redirection because their underlying communication model does not allow it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless HTTP cannot carry steering signals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional request–response models push output in one direction only. Once a long-running generation begins, there is no reliable way to send control signals back to the server. Cancelling or redirecting usually means tearing down the stream and starting again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser-held state breaks immediately&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most apps keep the state of an active generation in the browser. If the user refreshes or switches device, the in-flight response loses continuity. Any client-side steering logic tied to that state vanishes too, which forces a full reset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend models often run without shared conversational state&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the orchestration layer is not tracking what the AI is currently doing, it cannot apply corrections cleanly. The model receives a brand-new prompt instead of a context-preserving instruction layered onto an active task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The default stack was never designed for low-latency control loops&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Steering requires coordinated signalling between UI, transport, orchestration, and model inference. That means ordering guarantees, durable state, and fast propagation of control messages. Without these, the AI continues generating tokens after a user says stop, causing confusion and wasted compute.&lt;/p&gt;

&lt;p&gt;Steering mid-stream looks like a simple UX gesture. It is not. It is a distributed-systems problem sitting under a conversational interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why you need a drop-in AI transport layer for steering&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Delivering realtime control requires more than token streaming. It requires a transport layer that keeps context alive, supports low-latency bidirectional messaging, and ensures that user instructions and model output remain synchronised.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bi-directional, low-latency messaging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Client-side signals such as "stop" or "try this instead" must reach the backend quickly and reliably. WebSockets or similar long-lived connections make this possible by enabling client-to-server control while the &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;AI continues to stream output&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliable interrupt and cancellation primitives&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stopping generation must be instant and clean. The transport must carry cancellation events with ordering guarantees so the backend halts inference exactly where intended, without corrupting state.&lt;/p&gt;
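&lt;p&gt;The shape of this can be sketched with an &lt;code&gt;AbortController&lt;/code&gt; checked between tokens. The channel wiring is reduced to plain callbacks here, and the message names are illustrative rather than a specific API:&lt;/p&gt;

```javascript
// Sketch: a stop signal travelling back while output streams forward.
const controller = new AbortController();

// Server side: the generation loop checks the signal between tokens,
// so cancellation lands at a clean token boundary.
function streamResponse(tokens, publish, signal) {
  const sent = [];
  for (const token of tokens) {
    if (signal.aborted) break;      // halt inference where intended
    publish({ name: 'token', data: token });
    sent.push(token);
  }
  return sent;
}

// Control path: a 'stop' message from the client triggers the abort.
function onControlMessage(msg) {
  if (msg.name === 'stop') controller.abort();
}

// Simulate a user barging in after the second token.
const out = [];
const publish = (m) => {
  out.push(m.data);
  if (out.length === 2) onControlMessage({ name: 'stop' });
};
streamResponse(['a', 'b', 'c', 'd'], publish, controller.signal);
// out holds only 'a' and 'b'; generation stopped cleanly.
```

&lt;p&gt;In a real system the control message arrives over the same ordered channel as the tokens, which is what makes the stopping point deterministic.&lt;/p&gt;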

&lt;p&gt;&lt;strong&gt;Session continuity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system needs persistent session identity so instructions and outputs are tied to the same conversational thread. Redirection should extend the session, not rebuild it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence and focus tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If users have &lt;a href="https://ably.com/blog/cross-device-ai-sync" rel="noopener noreferrer"&gt;multiple tabs or devices&lt;/a&gt; open, the system needs to know where instructions are coming from. Steering messages must route to the correct active session without collisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Realtime steering relies on a transport layer designed for conversational control, not just message delivery.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How the experience maps to the transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer features&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interrupt and redirect responses in real time&lt;/td&gt;
&lt;td&gt;Bi-directional messaging&lt;/td&gt;
&lt;td&gt;WebSocket-based channels enabling client-to-server signals during output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel generation cleanly&lt;/td&gt;
&lt;td&gt;Interrupt primitives&lt;/td&gt;
&lt;td&gt;Server-side control hooks to stop model inference and close stream pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preserve continuity after steering&lt;/td&gt;
&lt;td&gt;Session continuity&lt;/td&gt;
&lt;td&gt;Persistent session or conversation IDs with context caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update response direction on the fly&lt;/td&gt;
&lt;td&gt;Dynamic state sync&lt;/td&gt;
&lt;td&gt;Shared state model where new input is merged into active conversational context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steer across devices&lt;/td&gt;
&lt;td&gt;Identity-aware multiplexing&lt;/td&gt;
&lt;td&gt;Fan-out model updates across all user sessions in sync&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Realtime steering for AI you can ship today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You don't need a new architecture to support real-time steering, cancellation, or recovery. You need a transport layer that can keep the session alive, deliver messages in order, and preserve state across disconnects. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; provides those foundations out of the box, so you can build controllable, resilient AI interactions without rebuilding your entire stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign-up for a free account&lt;/a&gt; and try today.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>systemdesign</category>
      <category>ux</category>
    </item>
    <item>
      <title>Why orchestrators become a bottleneck in multi-agent AI published</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 03 Feb 2026 12:42:23 +0000</pubDate>
      <link>https://dev.to/ablyblog/why-orchestrators-become-a-bottleneck-in-multi-agent-ai-published-4mgf</link>
      <guid>https://dev.to/ablyblog/why-orchestrators-become-a-bottleneck-in-multi-agent-ai-published-4mgf</guid>
      <description>&lt;p&gt;Complex user tasks often need multiple AI agents working together, not just a single assistant. That's what agent collaboration enables. Each agent has its own specialism - planning, fetching, checking, summarising - and they work in tandem to get the job done. The experience feels intelligent and joined-up, not monolithic or linear. But making that work means more than prompt chaining or orchestration logic. It requires shared state, reliable coordination, and user-visible progress as agents branch out and converge again. This post explores what users now expect, why traditional infrastructure falls short, and how to support truly collaborative AI systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's changing?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The shift from simple question-response to collaborative AI experiences goes beyond continuity or conversation. It's about delegation. Users are starting to expect AI systems that can take a complex request and break it down behind the scenes. That means not one big model doing everything, but a network of agents, each focused on a part of the task, coordinating to deliver a coherent outcome. We've seen this in tools like travel planners, research assistants, and document generators. You don't just want answers, you want progress, structure, and coordination you can see. The AI system shouldn't just feel like a chat thread, it should feel like a team quietly getting on with things while keeping you informed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What users want, and why this enhances the experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When users interact with a system powered by multiple agents, they want to feel the benefits of parallelism without the overhead of managing complexity. If one agent is fetching flight data, another handling hotel options, and a third reviewing visa requirements, the user doesn't care about the internal plumbing. They care that their travel plan is evolving visibly and coherently. They want to see that agents are working, understand what's happening in realtime, and be able to intervene or revise things if needed.&lt;/p&gt;

&lt;p&gt;Crucially, users expect the state of their task to reflect reality, not just the conversation. If they change a hotel selection manually, the system should adapt. If an agent crashes or stalls, the UI should show it. The value isn't just in faster results, it's in reliability, transparency, and the sense that multiple agents are genuinely collaborating, with each other and with the user - toward a shared goal.&lt;/p&gt;

&lt;p&gt;To deliver this, agent systems need to stay in sync. State needs to be shared across agents and user sessions. Progress needs to be surfaced incrementally, not hidden behind a final answer. And context must be preserved so agents don't overwrite or duplicate each other's work. That's what turns a bunch of isolated model calls into a coordinated assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why this is proving challenging&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems &lt;em&gt;can&lt;/em&gt; work today, but the default pattern most tools push you toward is an orchestration-first user experience. Even when multiple agents are running behind the scenes, their activity is typically funnelled through a single orchestrator that becomes the only "voice" the user can see. That hides useful progress, creates a single bottleneck for updates, and limits how fluid the experience can feel.&lt;/p&gt;

&lt;p&gt;That's because traditional LLM interfaces assume a single stream of input and a single stream of output. Orchestration frameworks may invoke multiple agents in parallel, but the UI still tends to expose a linear, synchronous workflow: the orchestrator collects results, then reports back. If the user changes direction mid-process, or if an agent needs to react immediately to something in shared state, you're often forced back into "wait for the orchestrator" loops.&lt;/p&gt;

&lt;p&gt;The underlying infrastructure assumptions reinforce this. HTTP request/response cycles work well when one component is responsible for coordinating everything, but they make it awkward for &lt;em&gt;multiple&lt;/em&gt; agents to maintain an ongoing, direct connection to the user and to shared context. Token streaming helps, but it usually represents one agent's output to one user - not concurrent updates from a group of agents reacting in real time to a changing state.&lt;/p&gt;

&lt;p&gt;Ultimately, the challenge isn't that orchestration fails. It's that it constrains app developers. Most systems don't give you fine-grained control over which agent communicates what, when, and how, or an easy way to reflect multi-agent activity directly in the user experience. To build confidence and responsiveness, clients need to know which agents are active, what they're doing, and how that activity relates to the shared, realtime session context - without everything having to be mediated by a heavyweight orchestrator.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e7vwnv8qc56l22wz5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e7vwnv8qc56l22wz5h.png" alt=" " width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why you need a drop-in AI transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To make multi-agent collaboration work in practice, you need infrastructure that handles concurrency, coordination, and visibility - not just messaging.&lt;/p&gt;

&lt;p&gt;The transport layer must support persistent, multiplexed communication where multiple agents can publish updates independently while still participating in the same user session. That gives app developers fine-grained control over the user experience: which agents speak to the user, when they speak, and how progress is presented. Orchestrators can still exist, but they don't have to mediate every user-facing update.&lt;/p&gt;
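&lt;p&gt;As a concrete sketch of this pattern, here is a minimal in-memory version of a multiplexed channel: several agents publish independently while a single UI subscriber receives every update. The &lt;code&gt;SessionChannel&lt;/code&gt; class and its shape are illustrative, not a specific SDK API.&lt;/p&gt;

```typescript
// Minimal in-memory sketch of a multiplexed session channel: several agents
// publish updates independently, one UI subscriber sees all of them.
// Names and shapes here are illustrative, not a specific SDK API.
type AgentUpdate = { agent: string; event: string; data: string };

class SessionChannel {
  private listeners: ((u: AgentUpdate) => void)[] = [];

  subscribe(listener: (u: AgentUpdate) => void): void {
    this.listeners.push(listener);
  }

  // Any agent can publish without going through an orchestrator.
  publish(update: AgentUpdate): void {
    for (const listener of this.listeners) listener(update);
  }
}

const channel = new SessionChannel();
const received: string[] = [];
channel.subscribe((u) => received.push(u.agent + ":" + u.event));

// Two specialist agents report progress concurrently on the same channel.
channel.publish({ agent: "flights", event: "searching", data: "LHR-JFK" });
channel.publish({ agent: "hotels", event: "found", data: "12 options" });
// received is now ["flights:searching", "hotels:found"]
```

&lt;p&gt;A production transport layer adds delivery guarantees, message ordering, and connection recovery on top of this basic fan-out, but the architectural point is the same: publishers are independent, and the UI holds one subscription over all of them.&lt;/p&gt;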

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrkj6qtpjfddywuyvza6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrkj6qtpjfddywuyvza6.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  State synchronisation is non-negotiable
&lt;/h3&gt;

&lt;p&gt;Structured data, like a list of selected hotels or the current trip itinerary, should live in a realtime session store that agents and UIs can both read from and write to. This creates a single source of truth, even when updates happen asynchronously, across devices, or outside the chat interface.&lt;/p&gt;
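&lt;p&gt;A minimal sketch of such a session store, assuming a hypothetical &lt;code&gt;SessionState&lt;/code&gt; shape rather than any particular product API: every write notifies all subscribers, so agents and UIs converge on the same state even when a change happens outside the chat thread.&lt;/p&gt;

```typescript
// Sketch of a realtime session store that agents and UIs both read, write,
// and subscribe to. Hypothetical shape; real implementations also handle
// persistence, concurrency, and conflict resolution.
type Itinerary = { flight: string; hotel: string };

class SessionState {
  private state: Itinerary = { flight: "", hotel: "" };
  private watchers: ((s: Itinerary) => void)[] = [];

  get(): Itinerary {
    return { ...this.state };
  }

  subscribe(watcher: (s: Itinerary) => void): void {
    this.watchers.push(watcher);
  }

  // Every write notifies all subscribers, so agents and UIs stay in sync
  // even when the change happens outside the chat interface.
  set(key: "flight" | "hotel", value: string): void {
    this.state[key] = value;
    for (const watcher of this.watchers) watcher(this.get());
  }
}

const session = new SessionState();
const seenByAgent: string[] = [];
session.subscribe((s) => seenByAgent.push(s.flight));

// The UI writes a selection; the agent observes the same source of truth.
session.set("flight", "BA117");
// session.get().flight and seenByAgent[0] are both "BA117"
```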

&lt;h3&gt;
  
  
  Presence adds another layer of confidence
&lt;/h3&gt;

&lt;p&gt;When users see which agents are online and working, it sets expectations and builds trust. If an agent goes offline, the system should detect it, not leave the user guessing. This becomes even more important as these systems scale up in production environments where reliability is critical.&lt;/p&gt;
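&lt;p&gt;In sketch form, presence is a small set that agents enter on startup and leave (or are timed out of) on disconnect, with a change log the UI can render. The &lt;code&gt;PresenceSet&lt;/code&gt; class below is illustrative only; real presence systems also detect silent disconnects via heartbeats.&lt;/p&gt;

```typescript
// Illustrative presence set: agents enter on startup, leave on shutdown,
// and the UI queries who is actually online right now.
class PresenceSet {
  private online: { [agent: string]: boolean } = {};
  private changes: string[] = [];

  enter(agent: string): void {
    this.online[agent] = true;
    this.changes.push(agent + " online");
  }

  leave(agent: string): void {
    delete this.online[agent];
    this.changes.push(agent + " offline");
  }

  isOnline(agent: string): boolean {
    return this.online[agent] === true;
  }

  log(): string[] {
    return this.changes;
  }
}

const presence = new PresenceSet();
presence.enter("flights");
presence.enter("hotels");
presence.leave("hotels"); // e.g. the hotel agent crashed

// The UI can now show flights as online and hotels as offline,
// instead of leaving the user guessing.
```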

&lt;h3&gt;
  
  
  Interruption handling rounds it out
&lt;/h3&gt;

&lt;p&gt;Users will change their minds mid-task. Your system needs to respond without the orchestrator agent tearing down and restarting everything. That means listening for user input while processing, canceling or rerouting tasks, and updating the shared state cleanly so individual agents can pick up where they left off or switch strategies on the fly.&lt;/p&gt;
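&lt;p&gt;The cancellation mechanics can be sketched with a shared flag the agent checks between units of work, so an interruption stops in-flight processing cleanly and shared state still reflects exactly what finished. The &lt;code&gt;InterruptibleTask&lt;/code&gt; shape is an illustrative assumption, not a prescribed API.&lt;/p&gt;

```typescript
// Sketch of interruption handling: the agent checks a cancellation flag
// between units of work, so a user's mid-task change of mind cancels
// in-flight work cleanly instead of tearing the session down.
class InterruptibleTask {
  cancelled = false;
  processed: string[] = [];

  run(items: string[], onItem: (item: string) => void): string {
    for (const item of items) {
      if (this.cancelled) {
        // Stop cleanly; shared state still reflects what finished.
        return "cancelled after " + this.processed.length;
      }
      this.processed.push(item);
      onItem(item); // user-input handlers may flip the flag here
    }
    return "completed " + this.processed.length;
  }
}

const task = new InterruptibleTask();
// Simulate the user changing their mind after the first result arrives.
const outcome = task.run(["flight-1", "flight-2", "flight-3"], (item) => {
  if (item === "flight-1") task.cancelled = true;
});
// outcome is "cancelled after 1"; only flight-1 was processed
```

&lt;p&gt;The important property is that cancellation is observed, not forced: the agent decides whether to cancel, adapt, or finish its current unit, and the shared state stays consistent either way.&lt;/p&gt;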

&lt;h2&gt;
  
  
  &lt;strong&gt;How the experience maps to the transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer features&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visible, concurrent agent progress&lt;/td&gt;
&lt;td&gt;Multiplexed pub/sub channels&lt;/td&gt;
&lt;td&gt;Multiple agents publish progress updates to a shared realtime channel the UI subscribes to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared, up-to-date task state&lt;/td&gt;
&lt;td&gt;Structured state synchronisation&lt;/td&gt;
&lt;td&gt;Shared session state with clear schemas reflecting selections, status, and choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seamless agent-to-agent coordination&lt;/td&gt;
&lt;td&gt;Out-of-band messaging support&lt;/td&gt;
&lt;td&gt;Internal HTTP APIs or RPC protocols between agents, decoupled from user-facing updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Awareness of system activity and health&lt;/td&gt;
&lt;td&gt;Presence tracking&lt;/td&gt;
&lt;td&gt;Agents register presence on connection and broadcast availability or error states&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graceful handling of mid-task changes&lt;/td&gt;
&lt;td&gt;Event-driven state updates and recovery&lt;/td&gt;
&lt;td&gt;Listen to user changes in shared state and cancel or adjust in-flight work accordingly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Making it work today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-agent collaboration is already happening in planning tools, research systems, and internal automation workflows. The models are not the limiting factor. The hard part is the infrastructure that keeps agents in sync, shares state reliably, and exposes progress to users in real time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; gives you the infrastructure needed to support this pattern. Realtime channels, shared state objects, presence, and resilient connections provide the foundations for agents that coordinate reliably and surface their work as it happens. No rebuilds, no custom multiplexing, no home-grown state machinery.&lt;/p&gt;

&lt;p&gt;Sign up for a free developer account and try it out.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Multi-agent AI systems need infrastructure that can keep up</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Fri, 30 Jan 2026 10:49:09 +0000</pubDate>
      <link>https://dev.to/ablyblog/multi-agent-ai-systems-need-infrastructure-that-can-keep-up-3aj7</link>
      <guid>https://dev.to/ablyblog/multi-agent-ai-systems-need-infrastructure-that-can-keep-up-3aj7</guid>
      <description>&lt;h2&gt;
  
  
  An Ably AI Transport demo
&lt;/h2&gt;

&lt;p&gt;When you're building agentic AI applications with multiple agents working together, the infrastructure challenges show up fast. Agents need to coordinate, users need visibility into what's happening, and the whole system needs to stay responsive even as tasks branch out across specialised workers.&lt;/p&gt;

&lt;p&gt;We built a multi-agent travel planning system to understand these problems better. What we learned applies well beyond holiday booking.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/mO53IQcHDaQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  The coordination problem
&lt;/h2&gt;

&lt;p&gt;The demo uses four agents: one orchestrator and three specialists (flights, hotels, activities). When a user asks to plan a trip, the orchestrator delegates sub-tasks to the specialists. Each specialist queries data sources, evaluates options, and reports back. The orchestrator synthesises everything and presents choices to the user.&lt;/p&gt;

&lt;p&gt;This mirrors how most teams are actually building agentic systems. You don't build one massive agent that tries to do everything. You build focused agents, give them specific tools, and coordinate between them.&lt;/p&gt;

&lt;p&gt;The infrastructure question is: how do you keep everyone (the agents and the user) synchronized as work happens?&lt;/p&gt;
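&lt;p&gt;The delegation shape itself is simple; here is a hedged sketch of the fan-out described above, with illustrative agent names and return values standing in for real specialist agents.&lt;/p&gt;

```typescript
// Sketch of orchestrator fan-out: focused specialist agents each handle a
// sub-task, and the orchestrator only synthesises their reports.
// Agent names and return shapes are illustrative.
type Specialist = (request: string) => string;

const specialists: { [name: string]: Specialist } = {
  flights: (req) => "2 flights for " + req,
  hotels: (req) => "5 hotels for " + req,
  activities: (req) => "3 activities for " + req,
};

function orchestrate(request: string): string[] {
  const reports: string[] = [];
  for (const name of Object.keys(specialists)) {
    reports.push(name + ": " + specialists[name](request));
  }
  return reports;
}

const reports = orchestrate("Paris weekend");
// reports contains one entry per specialist, e.g.
// "flights: 2 flights for Paris weekend"
```

&lt;p&gt;In a real system each specialist call would be asynchronous and long-running, which is exactly why the user needs a live view of progress rather than a single final answer.&lt;/p&gt;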

&lt;h2&gt;
  
  
  Why streaming alone isn't enough
&lt;/h2&gt;

&lt;p&gt;Token streaming solves part of this. The orchestrator can stream its responses back to the user so they're not waiting for complete answers. That's table stakes now for any AI interface.&lt;/p&gt;

&lt;p&gt;But streaming tokens from the orchestrator is only part of the problem. Users want visibility into the behaviour of each specialised agent – through their own token streams, structured updates like pagination progress, or the current reasoning of an agent working through a task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep58p3x4aoo90vi4oxrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep58p3x4aoo90vi4oxrx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt: Plan a weekend trip to a nearby city&lt;/p&gt;

&lt;p&gt;In our AI Transport demo, we also use &lt;a href="https://ably.com/liveobjects" rel="noopener noreferrer"&gt;Ably LiveObjects&lt;/a&gt; to publish progress updates from each specialist agent. The user sees which agent is active (&lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;tracked via presence&lt;/a&gt;), what it's querying, and how much data it's processing. These aren't logs or debug output. They're structured state updates that drive the UI. The agent even decides how to represent its progress to the user, taking raw database query parameters and turning them into natural language descriptions through a separate model call.&lt;/p&gt;

&lt;p&gt;This requires infrastructure that can handle multiple publishers updating different parts of the shared state concurrently. The flight agent publishes its progress. The hotel agent publishes its progress. The orchestrator streams tokens (and it doesn't need to care about intermediate progress updates from the specialized agents). All on the same channel, all staying in sync.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1cl9ov6pxdjh2pgvkq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1cl9ov6pxdjh2pgvkq4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent searches for flights and hotels based on the user's criteria&lt;/p&gt;

&lt;h2&gt;
  
  
  State that reflects reality, not just conversation
&lt;/h2&gt;

&lt;p&gt;Chat history creates a limited view of what's actually happening. If a user changes their mind, deletes a selection, or modifies something outside the conversation thread, the agent needs to know about it.&lt;/p&gt;

&lt;p&gt;We use Ably LiveObjects to maintain the user's current selections (flights, hotels, activities) and agent status. This creates a source of truth that exists independently of the conversation. The orchestrator can query this state directly through a tool call, even if nothing in the chat history explains the change.&lt;/p&gt;

&lt;p&gt;The interesting bit: agents can &lt;em&gt;subscribe&lt;/em&gt; to changes in this data, so they see updates live. While you could store this in a database and have agents query it via tool calls, the ability to subscribe means agents can react to user context in real time (what the user is doing in the app, data they're manipulating, configuration changes they're making).&lt;/p&gt;

&lt;p&gt;When the user asks "what's my current itinerary?", the agent doesn't rely on conversation history. It checks the actual state. If the user deleted their flight selection, the agent sees that immediately.&lt;/p&gt;

&lt;p&gt;This separation matters more as systems get complex. The conversation is one interface to the system. The actual state (what's selected, what's in progress, what's completed) needs to exist independently. Agents, users, and other parts of your system all need reliable access to current state, not a reconstruction from message history.&lt;/p&gt;
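&lt;p&gt;As a sketch of that separation, here is a hypothetical itinerary tool that reads live state directly, so the agent's answer survives changes made outside the conversation thread. The state shape and tool name are assumptions for illustration.&lt;/p&gt;

```typescript
// Sketch of state-as-source-of-truth: the agent answers "what's my current
// itinerary?" from a tool call against live state, not from chat history.
const liveState: { flight: string | null; hotel: string | null } = {
  flight: "BA117",
  hotel: "Hotel du Nord",
};

// A tool the orchestrator can call to read current selections directly.
function getItineraryTool(): string {
  const parts: string[] = [];
  if (liveState.flight) parts.push("flight " + liveState.flight);
  if (liveState.hotel) parts.push("hotel " + liveState.hotel);
  if (parts.length === 0) return "nothing selected";
  return parts.join(", ");
}

// The user deletes their flight outside the conversation thread...
liveState.flight = null;

// ...and the agent still answers correctly: "hotel Hotel du Nord",
// with no stale reconstruction from message history.
```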

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ec36b05l4v3tg5reuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ec36b05l4v3tg5reuh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent offers hotel options while remembering flight choice&lt;/p&gt;

&lt;h2&gt;
  
  
  Synchronising different types of state
&lt;/h2&gt;

&lt;p&gt;Not all state is created equal, and your infrastructure needs to handle different patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured, bounded state&lt;/strong&gt; works well with LiveObjects. Progress indicators (percentage complete, items processed), agent status (online, processing, completed), user selections, and configuration settings all have predictable size limits. Clients can subscribe to changes and re-render UI efficiently. Agents can read current state without parsing through message history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unbounded state&lt;/strong&gt; like full conversation history, audit trails, or complete reasoning chains still belongs in messages on a channel. You're appending to a growing log rather than updating bounded data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional state synchronization&lt;/strong&gt; enables richer interactions. You can sync agent state to users (progress updates, ETAs, task lists), let users configure controls for agents (settings, preferences, constraints), and give agents visibility into user context (where they are in the app, what they're doing, what data they're viewing). Each of these can use structured data patterns for efficient synchronization.&lt;/p&gt;

&lt;p&gt;The key is knowing which pattern fits which data, and having infrastructure that supports both.&lt;/p&gt;
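&lt;p&gt;The two patterns can be sketched side by side: bounded state is updated in place (last write wins per key, so its size stays fixed by the schema), while unbounded history is only ever appended. Variable names here are illustrative.&lt;/p&gt;

```typescript
// Bounded state vs unbounded log, side by side.
const boundedState: { [key: string]: string } = {}; // progress, status, selections
const messageLog: string[] = [];                    // conversation turns, audit trail

function updateState(key: string, value: string): void {
  boundedState[key] = value; // replaces: size stays bounded by the schema
}

function appendMessage(text: string): void {
  messageLog.push(text); // grows without bound; entries are never rewritten
}

updateState("flights.progress", "40%");
updateState("flights.progress", "100%"); // overwrite, not append
appendMessage("user: plan a weekend trip");
appendMessage("agent: searching flights");

// boundedState holds one current value per key ("100%"),
// while messageLog has grown to two entries.
```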

&lt;h2&gt;
  
  
  Decoupling internal coordination from user-facing updates
&lt;/h2&gt;

&lt;p&gt;The agents in our demo communicate with each other over HTTP using agent-to-agent protocols. That's appropriate for internal coordination. It's synchronous, it's request-response, it follows established patterns.&lt;/p&gt;

&lt;p&gt;The user-facing updates go over Ably AI Transport. That's where you need state synchronization and the ability for multiple publishers to update different parts of the UI concurrently.&lt;/p&gt;

&lt;p&gt;This decoupling matters. Each agent can independently decide how to surface its progress updates and state to the user, while the user maintains a single shared view over updates from all agents.&lt;/p&gt;

&lt;p&gt;We also let specialist agents write directly to LiveObjects, bypassing the orchestrator. When the flight agent has progress to report, it writes it. The user sees it. The orchestrator never touches that data (it only needs the final result). This avoids additional coordination and keeps the architecture simpler.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling interruptions
&lt;/h2&gt;

&lt;p&gt;Users change their minds. They interrupt. They refine requests mid-task. Your infrastructure needs to support this without rebuilding everything from scratch.&lt;/p&gt;

&lt;p&gt;In the demo, you can barge in and interrupt the agent while it's working. The system detects the new input, cancels the in-flight task, updates the state, and kicks off a new search. The UI shows the cancellation, the new request, and the new progress, all without breaking the conversation.&lt;/p&gt;

&lt;p&gt;This works because state updates are events on a channel. The agents listen for new user input even while they're processing. When they see it, they can decide whether to cancel current work, adapt it, or complete it first. The infrastructure doesn't dictate this logic (it enables it).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo21gdpcgxxxbski836rt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo21gdpcgxxxbski836rt.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent then helps the user select activities for the trip&lt;/p&gt;

&lt;h2&gt;
  
  
  What presence actually tells you
&lt;/h2&gt;

&lt;p&gt;Before any interaction starts, the UI shows which agents are online. This comes from Presence. Each agent enters presence when it starts up and updates it as its status changes.&lt;/p&gt;

&lt;p&gt;Presence serves multiple purposes. Agents can see the online status of users and take action if a user goes offline (canceling tasks or queuing notifications – essential from a cost optimization perspective). In multi-user applications, users can see who else is online in the conversation. And for your operations team, it's observability built into the architecture. This answers a basic question for users: is this system actually working right now?&lt;/p&gt;

&lt;h2&gt;
  
  
  The enterprise patterns that emerge
&lt;/h2&gt;

&lt;p&gt;This travel demo is deliberately simple, but the patterns map directly to enterprise use cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research workflows&lt;/strong&gt; where multiple agents pull from different data sources (financial databases, customer records, market data) and coordinate findings. Users need to see progress across all of them, not wait for a final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document generation&lt;/strong&gt; where one agent structures the outline, others fill in sections, another handles compliance checks. The state (which sections are complete, which are being reviewed, what's been approved) needs to stay synchronized as different agents work in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer support routing&lt;/strong&gt; where classification agents determine issue type, specialist agents handle resolution, and orchestration agents manage escalation. Status updates need to flow to support reps, customers, and dashboards in real time.&lt;/p&gt;

&lt;p&gt;The common thread: multiple agents, concurrent work, shared state, and humans who need visibility and control. The infrastructure that makes a travel planner responsive and reliable is the same infrastructure that makes these systems work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1qbrx7ghoy0ycph3dj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1qbrx7ghoy0ycph3dj0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Labelled screenshot of AI Travel Agent's moving parts&lt;/p&gt;

&lt;h2&gt;
  
  
  What this requires from infrastructure
&lt;/h2&gt;

&lt;p&gt;You need a reliable transport layer that allows concurrent agents and clients to communicate in realtime. This isn't just about pub/sub – it's about robust infrastructure, high availability, and &lt;a href="https://ably.com/topic/pubsub-delivery-guarantees" rel="noopener noreferrer"&gt;guaranteed delivery&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You need state synchronisation that works for both structured data and message logs. Having access to both patterns depending on your needs is critical: bounded state objects for UI updates and configuration, unbounded message streams for conversation history and audit trails.&lt;/p&gt;

&lt;p&gt;You need presence so you know what's actually online and available. You need &lt;a href="https://ably.com/docs/platform/architecture/connection-recovery" rel="noopener noreferrer"&gt;connection recovery&lt;/a&gt; so users don't lose context when networks flicker.&lt;/p&gt;

&lt;p&gt;Most importantly, you need this to work at the edge – in browsers and mobile apps, not just between backend services. That's where your users are. That's where responsiveness matters. The transport layer needs to be &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;robust enough to handle the reality of client connectivity&lt;/a&gt;: spotty networks, mobile handoffs, browser tabs backgrounded and resumed.&lt;/p&gt;

&lt;p&gt;The hard part of building multi-agent systems isn't the LLMs. The models are getting better every month. The hard part is the coordination, the state management, the visibility, and the reliability as these systems get more complex.&lt;/p&gt;

&lt;p&gt;This is why we built AI Transport. We saw teams struggling with these exact problems: cobbling together WebSocket libraries, building their own state synchronization, dealing with reconnection logic, and watching their systems break under the messiness of real client connectivity. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;AI Transport gives you the infrastructure layer these systems need&lt;/a&gt;, built on Ably's proven reliability at scale, so you can focus on your agents instead of your transport layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building agentic AI experiences? You can ship it now
&lt;/h2&gt;

&lt;p&gt;This demo was built with &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;. It's achievable today. You don't need to rebuild your stack to make it happen.&lt;/p&gt;

&lt;p&gt;Ably AI Transport provides all you need to support persistent, identity-aware, streaming AI experiences across multiple clients. If you're working on agentic products and want to get this right and improve the AI UX, we'd love to talk.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Anticipatory customer experience: How realtime infrastructure transforms CX</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Wed, 28 Jan 2026 10:27:17 +0000</pubDate>
      <link>https://dev.to/ablyblog/anticipatory-customer-experience-how-realtime-infrastructure-transforms-cx-3pn4</link>
      <guid>https://dev.to/ablyblog/anticipatory-customer-experience-how-realtime-infrastructure-transforms-cx-3pn4</guid>
      <description>&lt;p&gt;We're entering a new era of &lt;strong&gt;anticipatory customer experience&lt;/strong&gt; – one that's not just reactive, not just responsive, but truly predictive. In this new model, systems don't wait for friction to appear; they recognise signals early and step in before the user ever feels a slowdown or moment of uncertainty. The bar has shifted: customers now expect brands to predict their needs and act before friction even surfaces. It's a fundamental rewiring of the relationship between companies and the people they serve.&lt;/p&gt;

&lt;p&gt;This shift toward &lt;strong&gt;predictive customer experiences&lt;/strong&gt; isn't hypothetical. Anticipatory experiences are happening now, powered by &lt;strong&gt;realtime data infrastructure&lt;/strong&gt; that moves companies from playing catch-up to staying ahead. Think of it as the Age of Anticipation – where realtime signals, reliability, and adaptability form the core of modern CX design.&lt;/p&gt;

&lt;p&gt;Anticipatory CX isn't magic; it's realtime infrastructure done right.&lt;/p&gt;

&lt;p&gt;So, if you're building next-generation CX or AI-powered agentic systems, this article outlines the architectural groundwork required to make anticipation real.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is anticipatory customer experience?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Anticipatory customer experience&lt;/strong&gt; uses realtime data infrastructure to predict and address customer needs before friction occurs. Unlike reactive support that waits for problems, anticipatory CX leverages continuous data streams, event-driven patterns, and predictive signals to intervene proactively, turning unknowns into reassurance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why realtime infrastructure matters for CX:&lt;/strong&gt; Realtime infrastructure enables the continuous flow of customer signals needed for prediction. Without it, systems rely on stale, batch-processed data that kills foresight. Companies like Doxy.me and HubSpot use &lt;strong&gt;realtime platforms&lt;/strong&gt; to anticipate confusion, delays, and churn risk before customers experience frustration.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;From reactive to anticipatory: Why realtime data infrastructure powers predictive CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Anticipation starts with having the right information at the right moment. But prediction requires fresh, &lt;strong&gt;realtime signals&lt;/strong&gt; flowing continuously through your systems.&lt;/p&gt;

&lt;p&gt;The healthcare sector illustrates this shift perfectly. &lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;Doxy.me&lt;/a&gt;, a telehealth platform trusted by hundreds of thousands of providers, faced a critical challenge: how do you anticipate patient confusion before it derails a virtual appointment? Their answer was "teleconsent" – a feature where healthcare providers walk patients through consent forms collaboratively, in real time.&lt;/p&gt;

&lt;p&gt;As the patient reads, fills in fields, and types responses, the provider sees every change as it happens. No refresh required. No lag. No wondering if the patient is stuck on question three. The system detects hesitation patterns and enables providers to intervene before confusion becomes abandonment. This is anticipatory CX in action – predicting friction points and addressing them before they escalate.&lt;/p&gt;

&lt;p&gt;But building this required infrastructure that could handle the continuous flow of patient interactions without introducing the very friction it was meant to eliminate. "The more that I can get my team to focus on healthcare business logic and less to focus on infrastructural data synchronisation, the better," explains Heath Morrison from Doxy.me. "Anything that provides higher level APIs to get us more in that space – and not be specialised in the stuff you guys should specialise in – is appealing and valuable to us."&lt;/p&gt;

&lt;p&gt;By rebuilding their realtime stack on reliable infrastructure, Doxy.me achieved a 65% cost reduction while transforming their system from a liability into a core strength. &lt;strong&gt;&lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;Read the full Doxy.me case study →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retailers are doing similar work, spotting churn risk in realtime and intervening with targeted offers or support before the customer clicks away. Financial services companies are shifting from asking "what happened?" to "what's about to happen?" These aren't reactive fixes. They're &lt;strong&gt;anticipatory moves&lt;/strong&gt; that change outcomes – but only when the underlying data infrastructure can keep pace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realtime infrastructure&lt;/strong&gt; like &lt;a href="https://ably.com/pubsub" rel="noopener noreferrer"&gt;Ably's&lt;/a&gt; makes this possible – it's the unseen layer that ensures systems receive the continuous stream of signals they need to predict accurately, without lag or data loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Industries using anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Healthcare:&lt;/strong&gt; &lt;a href="https://ably.com/health-tech" rel="noopener noreferrer"&gt;Telehealth platforms&lt;/a&gt; use realtime infrastructure to anticipate patient needs, showing "doctor joining now" before patients wonder if something's wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Financial Services:&lt;/strong&gt; Banks predict fraud patterns and alert customers to unusual activity before money moves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retail:&lt;/strong&gt; E-commerce platforms spot abandonment signals and intervene with targeted offers before checkout failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logistics:&lt;/strong&gt; Delivery services flag delays and update ETAs before customers start refreshing tracking pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building trust through realtime customer engagement: The infrastructure foundation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Trust is built in moments of uncertainty. And anticipation? It turns unknowns into reassurance.&lt;/p&gt;

&lt;p&gt;Think about the last time you booked a rideshare or waited for a delivery. The difference between a company that leaves you guessing and one that proactively updates you – "your driver is two minutes away," "slight delay, new ETA: 3:47pm" – is the difference between anxiety and confidence. &lt;strong&gt;Realtime anticipation&lt;/strong&gt; doesn't just inform, it reassures.&lt;/p&gt;

&lt;p&gt;Telehealth platforms have figured this out. When patients see "doctor joining now" before they've even begun to wonder if something's wrong, it changes the entire experience. Logistics companies that flag delays before customers start refreshing tracking pages are doing the same thing: reducing friction before it becomes frustration.&lt;/p&gt;

&lt;p&gt;But there's a flip side: when realtime systems fail, trust erodes faster than it built up. A phantom notification, a delayed update, an inaccurate prediction – these aren't just technical hiccups. They're credibility problems. Reliability isn't a nice-to-have, it's the foundation. When customers cite Ably's five-plus years without a global outage, they're not celebrating uptime for its own sake. They're describing the baseline that makes anticipation possible at scale. &lt;a href="https://status.ably.io/" rel="noopener noreferrer"&gt;View Ably's live uptime status&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ably exists to be that foundation. The reason trust can scale across millions of interactions, without companies needing to worry about the underlying infrastructure failing at the worst moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Core technologies behind anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Core technology&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Realtime pub/sub messaging&lt;/td&gt;
&lt;td&gt;WebSocket-based event distribution for instant signal propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event-driven architecture&lt;/td&gt;
&lt;td&gt;Composable, adaptive systems that respond to customer signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predictive analytics&lt;/td&gt;
&lt;td&gt;AI-powered interpretation of continuous data streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous data streams&lt;/td&gt;
&lt;td&gt;Sub-6.5ms message delivery latency without polling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fault-tolerant infrastructure&lt;/td&gt;
&lt;td&gt;99.999% uptime requirements for maintaining trust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Future-proofing customer experience: Event-driven architecture for anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To anticipate effectively, your CX stack needs to evolve as fast as your customers' expectations do. Rigid, monolithic architectures can't keep up with new signals, emerging channels, or changing customer behaviors. The future belongs to composable, &lt;strong&gt;event-driven systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Doxy.me's evolution illustrates this perfectly. They built their realtime features organically – using PubNub to handle presence detection and state synchronisation, all ephemeral data that disappeared after each session. But as they planned their next phase, they hit a wall: they needed persistence. The ability to decouple patient workflows from video calls, support richer collaboration, maintain state across sessions, and plug in new capabilities without rebuilding their entire stack. They prototyped with Convex and loved the developer experience, but needed production-grade infrastructure that could slot into their Node/TypeScript/Postgres/AWS environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/pubsub" rel="noopener noreferrer"&gt;&lt;strong&gt;Event-driven architectures&lt;/strong&gt;&lt;/a&gt; make this kind of evolution possible. You can layer in predictive capabilities, plug in new communication channels, or add analytics tools – all without tearing everything down and starting over. One enterprise CX leader described it this way: "We used to dread adding new functionality. Now we think in terms of what events we need to listen for and what actions we want to trigger. It has completely changed our velocity."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/solutions/customer-experience-tech" rel="noopener noreferrer"&gt;Ably&lt;/a&gt; enables this kind of interoperability – CRMs, chat systems, analytics tools, customer-facing applications all publishing and subscribing to customer events in real time. WebSockets and pub/sub patterns ensure consistent, low-latency communication across every channel, without developers having to reinvent transport logic for each integration. It's the connective tissue that makes anticipatory systems work at scale.&lt;/p&gt;

&lt;p&gt;But more moving parts do mean more complexity. Companies need governance frameworks and resilience planning to ensure their adaptive architectures don't become fragile ones. The ones succeeding here aren't necessarily the ones with the newest tech – they're the ones who've built systems that can absorb change without breaking.&lt;/p&gt;

&lt;p&gt;The Age of Anticipation is composable. Adaptive, event-driven architecture is what makes foresight scalable.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to implement anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Establish realtime data infrastructure&lt;/strong&gt; – Replace polling with streaming architecture for continuous signal flow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Implement event-driven pub/sub patterns&lt;/strong&gt; – Enable loosely coupled systems that respond to customer signals&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Build predictive models using continuous data&lt;/strong&gt; – Layer AI/ML on top of realtime streams for pattern recognition&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Create proactive intervention workflows&lt;/strong&gt; – Design automated responses to predictive signals (offers, alerts, support)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Monitor reliability metrics rigorously&lt;/strong&gt; – Track latency, uptime, message integrity to maintain trust at scale&lt;/p&gt;
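&lt;p&gt;Steps 1–4 can be sketched end-to-end in TypeScript. Everything here is illustrative – the signal shape, the 80% threshold, and the intervention are invented stand-ins, and the "model" is a trivial moving average rather than real ML:&lt;/p&gt;

```typescript
// Toy end-to-end sketch: a continuous usage stream (steps 1-2) feeds a
// simple predictor (step 3), which triggers an automated intervention (step 4).
interface UsageSignal { customerId: string; usagePct: number; }

const interventions: string[] = [];

// Step 3: a deliberately trivial "model" - flag customers trending toward limits.
function predictOverage(recent: UsageSignal[]): boolean {
  const avg = recent.reduce((sum, r) => sum + r.usagePct, 0) / recent.length;
  return avg > 80; // hypothetical threshold
}

// Step 4: automated response to the predictive signal.
function intervene(customerId: string): void {
  interventions.push(`offer capacity upgrade to ${customerId}`);
}

// Steps 1-2: signals arrive continuously; each is processed as it lands,
// no polling loop required.
const window: UsageSignal[] = [];
function onSignal(signal: UsageSignal): void {
  window.push(signal);
  if (window.length >= 3 && predictOverage(window.slice(-3))) {
    intervene(signal.customerId);
    window.length = 0; // reset after acting
  }
}

[70, 85, 95].forEach(p => onSignal({ customerId: 'cust-7', usagePct: p }));
```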

&lt;h2&gt;
  
  
  &lt;strong&gt;What makes this different&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most CX discussions focus on speed (faster responses, quicker resolutions). But anticipation goes deeper. It's about infrastructure that doesn't just move data quickly, but does so reliably enough to build trust and flexibly enough to adapt as expectations evolve. &lt;a href="https://ably.com/four-pillars-of-dependability" rel="noopener noreferrer"&gt;Explore Ably's four pillars of dependability&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realtime infrastructure&lt;/strong&gt; is the hidden enabler. It's what makes customer care feel effortless, predictive, and ultimately, more human. Not because it replaces human judgment, but because it removes the friction that gets in the way of delivering exceptional care.&lt;/p&gt;

&lt;p&gt;The companies winning in the Age of Anticipation aren't the ones with the flashiest technology demos. They're the ones who've built the unglamorous, reliable, adaptive infrastructure that makes anticipation possible at scale. They've realised that foresight isn't magic – it's architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Business impact of anticipatory customer experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.pwc.com/us/en/services/consulting/business-transformation/library/2025-customer-experience-survey.html" rel="noopener noreferrer"&gt;&lt;strong&gt;52% of consumers&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;stopped using brands after bad experiences&lt;/strong&gt; – making proactive, anticipatory CX non-negotiable (PwC 2025 Customer Experience Survey)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;&lt;strong&gt;65% cost reduction&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;achieved by Doxy.me&lt;/strong&gt; through realtime infrastructure that prevents issues versus fixing them reactively&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://broadbandbreakfast.com/four-predictions-for-customer-experience-in-2025/" rel="noopener noreferrer"&gt;&lt;strong&gt;61% of CX leaders deliver proactive communications&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;using AI&lt;/strong&gt;, while only 6% of laggards do, creating a significant competitive gap (Cisco research)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://status.ably.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;5+ years without a global outage&lt;/strong&gt;&lt;/a&gt; – Ably's proven track record demonstrates the reliability required for maintaining trust at enterprise scale&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nextiva.com/blog/customer-experience-insights.html" rel="noopener noreferrer"&gt;&lt;strong&gt;40% of companies plan to increase investment&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;in predictive instant experiences&lt;/strong&gt; in 2025, signalling industry-wide shift to anticipatory models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Three pillars of anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Realtime data streams&lt;/strong&gt; – Fresh, continuous signals flowing through your systems without latency or data loss&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Reliability at scale&lt;/strong&gt; – Infrastructure trusted to maintain consistency across millions of interactions, measured in years of uptime&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Adaptive architecture&lt;/strong&gt; – Event-driven systems that evolve with customer expectations without requiring rebuilds&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Ready to build anticipatory experiences?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ably's realtime platform delivers the continuous data streams and event-driven patterns your systems need to anticipate customer needs, with the reliability required to maintain trust at scale.&lt;/p&gt;

&lt;p&gt;Six-plus years of 100% uptime. Sub-6.5ms message delivery latency. Built-in message integrity guarantees.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ably.com/cx-tech" rel="noopener noreferrer"&gt;See how Ably powers anticipatory CX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ably.com/blog/data-integrity-in-ably-pub-sub" rel="noopener noreferrer"&gt;Read more about the technicalities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ably.com/support" rel="noopener noreferrer"&gt;Start building free or talk to our team about your use case&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>How we built an AI-first culture at Ably</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 27 Jan 2026 10:44:50 +0000</pubDate>
      <link>https://dev.to/ablyblog/how-we-built-an-ai-first-culture-at-ably-3aid</link>
      <guid>https://dev.to/ablyblog/how-we-built-an-ai-first-culture-at-ably-3aid</guid>
      <description>&lt;p&gt;Most companies talk about being "AI-first." At Ably, we decided to actually become one. We build realtime infrastructure for AI applications. To do that credibly, we need to live and breathe AI ourselves – not just in our product, but in how we work every day.&lt;/p&gt;

&lt;p&gt;A year ago, we began a company-wide push for AI adoption. This post breaks down how we did it: the pillars, the tooling, the MCP advantage, the early mistakes, the wins across engineering, marketing, sales, and finance, and the cultural momentum that turned a mandate into a mindset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hw90fipip8r3sz621w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hw90fipip8r3sz621w.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building an AI-first company culture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When Jamie Newcomb, Ably's Head of Product Operations, began championing internal AI adoption, the approach was straightforward: everyone at Ably should explore how AI could make them more effective. No exceptions.&lt;/p&gt;

&lt;p&gt;"We might want to tone down the language," Jamie admits with a laugh, "but it really is mandated. Everyone at Ably should be using AI to see how they can make themselves more effective. But it's not just about doing things faster. It's about doing things you couldn't do before. The goal is to shift the mindset, where people stop asking 'can AI help with this?' and start assuming it can, then push further: what's now possible that wasn't?"&lt;/p&gt;

&lt;p&gt;Today, that mandate has evolved into something far more organic: a company-wide culture where AI isn't just accepted, it's expected.&lt;/p&gt;

&lt;p&gt;For a company processing &lt;a href="https://ably.com/docs/platform/architecture/platform-scalability" rel="noopener noreferrer"&gt;2 trillion operations monthly&lt;/a&gt;, this isn't about following trends, it's about credibility. It's about walking the walk. To build &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;AI Transport&lt;/a&gt; that developers can trust for agentic workloads, we need firsthand experience of how AI performs in real operational environments, both the advantages and the pitfalls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7he4g7vo9qxz4xk49s0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7he4g7vo9qxz4xk49s0.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Three pillars of successful AI adoption&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ably's approach to AI rests on three interconnected pillars:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Internal AI adoption and enablement&lt;/strong&gt;: Integrating AI into workflows and processes across every team to enhance capabilities and drive productivity improvements. The goal isn't just providing tools, it's automating repetitive, time-consuming tasks so people can focus on strategic thinking and creative problem-solving.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI developer experience&lt;/strong&gt;: Using AI to make Ably's platform more discoverable and easier to use for developers. This means AI-enhanced documentation, intelligent tooling, and optimized SDK experiences, empowering developers to build real-time products faster with the help of LLMs. The goal is to position Ably as essential infrastructure for real-time user experiences powered by AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI product enhancement&lt;/strong&gt;: Making proactive, explicit efforts to understand AI use cases where Ably delivers value, determining what we need to enable those use cases, and ensuring those capabilities are part of our roadmap. This pillar is about building infrastructure informed by real customer needs, both known and yet to be discovered.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"My main role is about process efficiency in product engineering," Jamie explains. "And that naturally extended to AI adoption. We believe there are significant productivity improvements we can make if everyone adopts AI thoughtfully across the company."&lt;/p&gt;

&lt;p&gt;These pillars aren't separate initiatives, they're a unified strategy. Internal productivity adoption teaches us what works in practice. Developer experience ensures we're making Ably discoverable and easy to use for the growing number of developers building with AI. And AI product enhancement ensures we're building infrastructure informed by real customer needs, not just theory. This article focuses primarily on the first pillar, but the three are deeply connected. What we learn from using AI internally shapes how we build for developers using AI externally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9t3d5qb2dlg5ovt3h1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9t3d5qb2dlg5ovt3h1t.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The MCP advantage&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Perhaps the most significant internal development has been Ably's adoption of the Model Context Protocol (MCP), built over the summer of 2025.&lt;/p&gt;

&lt;p&gt;"The Ably MCP connects all our internal tools together," Jamie explains. "It lets people access data across systems via AI assistants. Building this and seeing it genuinely change how people work has been incredibly rewarding."&lt;/p&gt;

&lt;p&gt;What started as an experiment to see what was possible has grown into a company-wide platform that's now critical to daily workflows, integrating 15+ services through over 140 tools. Engineers can check CI build status and debug workflow failures without leaving their conversation. Product managers search across Jira issues, GitHub PRs, and Slack threads in a single query. Sales teams pull Gong call transcripts and HubSpot contact history to prepare for customer meetings. The breadth is significant: GitHub, Jira, Confluence, Slack, HubSpot, Gong, Jellyfish, Metabase, PagerDuty, GSuite, and more – all accessible through natural conversation.&lt;/p&gt;

&lt;p&gt;Before MCP, every AI interaction started from zero – engineers manually explaining Ably's infrastructure, marketers pasting in brand guidelines, constant context-switching that made AI feel like more work rather than less.&lt;/p&gt;

&lt;p&gt;Now when an Ably employee opens Claude, they're not starting from scratch. Through MCP, they have immediate access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared context and prompt library&lt;/li&gt;
&lt;li&gt;Company knowledge and documentation&lt;/li&gt;
&lt;li&gt;Ably's tone of voice guidelines and style guides&lt;/li&gt;
&lt;li&gt;Live data from internal tools and systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of "Here's what Ably is, here's our tone of voice, now help me write this email," it becomes simply: "Help me write this customer email about latency improvements." The AI already knows.&lt;/p&gt;

&lt;p&gt;Scaling to 140+ tools created its own challenge: context limits. Ably solved this with a tool registry that lets the AI discover only what it needs for each task, keeping interactions lean and responsive.&lt;/p&gt;
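&lt;p&gt;The registry idea can be sketched in a few lines of TypeScript. This is an illustration of the pattern, not Ably's actual implementation – the tool names and tags below are hypothetical:&lt;/p&gt;

```typescript
// Sketch of a tool registry that exposes only task-relevant tools,
// keeping the model's context window small instead of loading all
// 140+ tool definitions into every conversation.
interface Tool { name: string; description: string; tags: string[]; }

const allTools: Tool[] = [
  { name: 'github.ci_status', description: 'Check CI build status',        tags: ['ci', 'github'] },
  { name: 'jira.search',      description: 'Search Jira issues',           tags: ['jira', 'tickets'] },
  { name: 'gong.transcript',  description: 'Fetch a Gong call transcript', tags: ['sales', 'gong'] },
  { name: 'hubspot.contact',  description: 'Look up a HubSpot contact',    tags: ['sales', 'crm'] },
];

// Discover only the tools whose tags match the current task.
function discoverTools(taskTags: string[]): Tool[] {
  return allTools.filter(t => t.tags.some(tag => taskTags.includes(tag)));
}

// Preparing for a customer meeting surfaces just the sales tools.
const salesPrep = discoverTools(['sales']);
```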

&lt;p&gt;"That context library is really important," Jamie emphasises. "The prompts for critical workflows (like our ICP matching) are all version controlled. When something needs adjusting, it's not about AI being wrong. It's about iterating on what you're asking the AI to do."&lt;/p&gt;

&lt;p&gt;The platform continues to evolve based on team feedback. When engineers noticed they were dropping out to the terminal to check GitHub Actions builds, new workflow tools were shipped within hours. Claude Code is used heavily to maintain and extend the MCP itself, with Claude's Agent SDK integrated throughout the development workflow. Using AI to build AI tooling is a big part of why the velocity is so high. That responsiveness – treating internal AI tooling as a living product rather than a one-off project – reflects how deeply AI has become embedded in Ably's operating culture.&lt;/p&gt;

&lt;p&gt;Jamie spoke at length with Jellyfish on how Ably moved beyond data retrieval to unlock real analysis and insights through MCP, and &lt;a href="https://jellyfish.co/blog/how-ably-makes-magic-with-jellyfishs-mcp/" rel="noopener noreferrer"&gt;you can read the full article here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdf3nbs977hkjx64e5b8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdf3nbs977hkjx64e5b8x.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI tool selection
&lt;/h2&gt;

&lt;p&gt;When Ably first encouraged company-wide AI adoption, the approach was deliberately open-ended. People experimented with ChatGPT, Claude, and workflow orchestration tools like n8n, Zapier, and Relay.&lt;/p&gt;

&lt;p&gt;"We've settled on Claude for our primary AI, particularly Claude Code for engineers, but people have the freedom to use whatever works best for them," Jamie says. "If someone has a strong case for a different tool, that's fine. We're not prescriptive about it."&lt;/p&gt;

&lt;p&gt;Everyone at Ably has access to Claude for day-to-day work, whether that's drafting documents, thinking through problems, or exploring ideas. For workflow automation, Relay emerged as the orchestration layer, handling the multi-stage pipelines that power lead enrichment, ICP scoring, and sales alerts. The combination of Claude for reasoning and Relay for orchestration has become Ably's default stack, though teams remain free to experiment.&lt;/p&gt;

&lt;p&gt;This flexibility matters, especially given Ably's positioning around AI Transport. "We can't just say 'use Claude' when we're building infrastructure that works with any LLM provider," Jamie notes. "We need to show that our approach works regardless of which AI you're using."&lt;/p&gt;

&lt;h2&gt;
  
  
  Results by team
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Engineering
&lt;/h3&gt;

&lt;p&gt;All engineers now use Claude Code for agentic coding, but the workflows vary based on the task.&lt;/p&gt;

&lt;p&gt;For narrow, well-defined tickets, Claude can often one-shot a solution. Engineers point it at the relevant files, describe what they want, and use test-driven development as a guardrail. Claude writes the test first, sees it fail, writes the implementation, and confirms the test passes. For larger tasks, the approach is more iterative: Claude generates a plan as a markdown file, the engineer reviews and refines it, then kicks off implementation in a fresh context with the plan as input.&lt;/p&gt;

&lt;p&gt;Discovery is another common use case. Engineers ask Claude questions about the codebase – "where does X get used?", "how does a message get from acceptance to being broadcast out to clients?" – using it as a way to navigate complex systems without reading through thousands of lines of code.&lt;/p&gt;

&lt;p&gt;The Ably MCP bridges the gap between documentation and code. Engineers pull context from Confluence docs, have Claude synthesise summaries, and feed those into coding sessions, turning scattered documentation into usable implementation context. Some are experimenting with Claude Code running asynchronously in the browser, queuing up tasks from a phone and reviewing the work later.&lt;/p&gt;

&lt;p&gt;Beyond individual workflows, Claude is integrated into the development pipeline itself. Claude's Agent SDK is connected to GitHub to generate implementation context, review PRs, and fix CI issues before code reaches production. When a PR goes up, AI reviews it for obvious issues first, then engineers review it as they would any other colleague's work.&lt;/p&gt;

&lt;p&gt;One principle remains constant: a single human author owns every PR, regardless of how much was AI-generated. The practice of engineering judgment – knowing what to accept, what to push back on, and what to rewrite – is still the job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoqy7fyvs6seh7risg7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoqy7fyvs6seh7risg7q.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Marketing
&lt;/h3&gt;

&lt;p&gt;The Marketing team wanted to spend more time shaping the narrative and shipping campaigns, not doing repetitive admin. There are always multiple activities in flight, each needing planning, research, execution, reporting, and analysis. That's where AI has been a huge productivity lever: the team has adopted it to streamline the "admin" layer so they can increase both output and quality without adding headcount.&lt;/p&gt;

&lt;p&gt;Today, the team uses a small stack of AI tools across the lifecycle. They analyse Gong calls to accelerate market research and tighten messaging and positioning. They use Claude to pull and synthesise data from multiple sources to scope and validate content opportunities faster. They also automate lead validation and categorisation for sales follow-up, enriching contact and company data so the first human touch starts from context, not guesswork. And they map the customer journey with attribution, using AI to connect what prospects do pre-signup to intent signals, so they can prioritise the right audiences and double down on what's actually working.&lt;/p&gt;

&lt;p&gt;For example, lead qualification that used to take hours is now a multi-stage AI pipeline that runs automatically on every signup. The system researches companies across 6+ data sources (Crunchbase, LinkedIn, SEC filings, PitchBook), extracts structured data, scores against 8 ICP criteria, classifies personas, and routes alerts to Slack with tier assignments and recommended actions – all before anyone on the team sees the lead.&lt;/p&gt;

&lt;p&gt;"Marketing used to spend considerable time on this," Jamie recalls. "Now the first time they see a lead, it already has a confidence-scored ICP assessment, enriched company data, and suggested next steps."&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales
&lt;/h3&gt;

&lt;p&gt;New lead assignment uses multi-signal analysis (employee count, funding raised, revenue for public companies) to automatically route accounts to Commercial, Enterprise, or Strategic segments. For qualified leads, AI generates personalised email sequences based on the ICP analysis, tailoring messaging to the prospect's industry, technical challenges, and relevant customer references.&lt;/p&gt;

&lt;p&gt;For existing customers, AI monitors self-service accounts against usage limits, surfacing expansion opportunities when customers approach thresholds and flagging critical capacity alerts that need immediate outreach. Relay handles the orchestration across all workflows.&lt;/p&gt;
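&lt;p&gt;The routing step can be sketched like this – the segment thresholds are hypothetical, not Ably's actual rules:&lt;/p&gt;

```typescript
// Sketch of multi-signal segment routing; thresholds are invented
// for illustration only.
interface Account { employees: number; fundingUsd: number; revenueUsd?: number; }

function routeSegment(a: Account): string {
  const revenue = a.revenueUsd ?? 0;
  if (a.employees > 5000 || revenue > 1_000_000_000) return 'Strategic';
  if (a.employees > 500 || a.fundingUsd > 50_000_000) return 'Enterprise';
  return 'Commercial';
}

const commercial = routeSegment({ employees: 40, fundingUsd: 2_000_000 });
const enterprise = routeSegment({ employees: 800, fundingUsd: 80_000_000 });
const strategic  = routeSegment({ employees: 12_000, fundingUsd: 0, revenueUsd: 2_000_000_000 });
```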

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmfzrl8c8o8wg6q9cr4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmfzrl8c8o8wg6q9cr4r.png" alt=" " width="800" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Finance
&lt;/h3&gt;

&lt;p&gt;Finance operations at Ably are treated like a tech product, using AI as a force multiplier to engineer away repetitive work.&lt;/p&gt;

&lt;p&gt;The team systematically verifies contracts, builds smarter revenue models, and automates reconciliation work. A recent hackathon project eliminated thousands of monthly clicks in the Stripe-to-Xero process – the kind of repetitive work that most finance teams wouldn't know where to start automating.&lt;/p&gt;

&lt;p&gt;They use Ably's MCP to retrieve data from Xero, then create and update sheets directly through Claude, turning what would be manual exports and data entry into conversational requests. It's a small example of how the platform extends beyond engineering and into every corner of the business.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3pih5iy7kv07zh1bgf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3pih5iy7kv07zh1bgf7.png" alt=" " width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The cultural shift
&lt;/h2&gt;

&lt;p&gt;Creating an AI-first culture isn't just about providing tools – it's about enablement, support, and honest assessment of where you are versus where you're headed.&lt;/p&gt;

&lt;p&gt;We run AI drop-in sessions every Friday where team members can bring questions, share what they've built, or explore new ideas. An internal Slack channel serves as a continuous stream of AI experiments, wins, and collaborative problem-solving.&lt;/p&gt;

&lt;p&gt;"When Charlotte [Delivery Manager] and I approach teams, we don't even talk about AI initially," Jamie reveals. "We ask: what are your repetitive processes? Once teams understand their processes, then you can start the AI conversation."&lt;/p&gt;

&lt;p&gt;"Anyone can build something now," Jamie says. "The barrier to solving a problem has basically been removed because people can use AI to build the solution themselves."&lt;/p&gt;

&lt;p&gt;The result is what Jamie calls the "wow moment": when someone successfully builds their first AI-powered solution, a ceiling lifts. "Once people have that moment, they just keep building."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filers6qw4hdbfdx9vnhf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filers6qw4hdbfdx9vnhf.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;But Jamie is candid about where Ably still has room to grow. "To be completely honest, we haven't hit our potential yet," he admits. "We've made real progress, but there's still more impact to unlock from AI across how we work, our processes, and how we achieve our product outcomes. And when we think we've got there, there'll still be more room to grow."&lt;/p&gt;

&lt;p&gt;The vision for what's next is clear: continuing to integrate AI deeper into how Ably works. The foundations are in place: agentic coding, AI-assisted PR reviews, automated workflows across teams. But the true potential lies in making these the default across every function, not just the teams that adopted early.&lt;/p&gt;

&lt;p&gt;"The biggest gains come from how people think, not just what tools they use," Jamie explains. "When people stop asking 'can AI help with this?' and start assuming it can, that's where the real impact comes from."&lt;/p&gt;

&lt;p&gt;The most significant outcome isn't any specific tool or workflow, it's that cultural shift in action. "We don't have a problem at Ably where people are on the fence about whether AI can help them," Jamie reflects. "We've shown that it can. Now it's about enablement and encouraging people to identify problems they can solve themselves."&lt;/p&gt;

&lt;p&gt;The same infrastructure philosophy that powers our internal AI adoption powers our AI Transport product. &lt;a href="https://ably.com/blog/evolution-of-realtime-ai" rel="noopener noreferrer"&gt;Read how Ably enables reliable, scalable realtime experiences for conversational AI here.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>leadership</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Gen‑2 AI UX: Conversations that stay in sync across every device</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 27 Jan 2026 10:15:10 +0000</pubDate>
      <link>https://dev.to/ablyblog/gen-2-ai-ux-conversations-that-stay-in-sync-across-every-device-302m</link>
      <guid>https://dev.to/ablyblog/gen-2-ai-ux-conversations-that-stay-in-sync-across-every-device-302m</guid>
      <description>&lt;p&gt;Start a conversation on your laptop, finish it on your phone. The context just follows you. That's what cross-device AI sync delivers. No reloading history, no reintroducing yourself, just one continuous thread across every screen. It builds trust, reduces friction, and makes the assistant feel like a single, persistent presence. This post unpacks why users expect it, what makes it technically tricky, and what your system needs to get it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI conversations must survive the device switch
&lt;/h2&gt;

&lt;p&gt;Modern users have grown to expect realtime, seamless experiences from their apps and AI tools. They want instantaneous responses, continuous interactions, and no interruptions as they move between devices. This expectation extends to AI-powered experiences: if you start a conversation with an AI assistant on your laptop, you should be able to pick it up on your phone or another tab without missing a beat. Equally, if you have initiated a long-running asynchronous task, you want to be notified once it completes, no matter which device you're using at the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What users want, and why this enhances the experience
&lt;/h2&gt;

&lt;p&gt;Users want continuous, identity-aware AI conversations that follow them across devices. In practice, this means an AI chat session linked to their identity rather than a single tab or device. The conversation should feel like a single thread they can return to at any time.&lt;/p&gt;

&lt;p&gt;That continuity builds trust. The AI isn't "forgetting" just because you switched devices. A remembered history signals reliability and intent, helping users feel the assistant is genuinely useful. Multi-turn conversations flow naturally, and users avoid repeating themselves or reconstructing context.&lt;/p&gt;

&lt;p&gt;This matters even more once AI systems move beyond simple chat. When an LLM is running long-lived, asynchronous work such as multi-step research, tool calls, or background analysis, users expect to see progress and results wherever they happen to be at the time. You might start a task on your desktop, step away while the model works, and then pick up your tablet to see the output appear as soon as it's ready.&lt;/p&gt;

&lt;p&gt;Real-world usage makes multi-device continuity unavoidable. Each switch must be frictionless, reinforcing the sense that the AI is persistent, reliable, and working on your behalf rather than tied to a fragile client session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is proving challenging
&lt;/h2&gt;

&lt;p&gt;HTTP is fundamentally stateless. Each request stands alone, meaning conversation context has to be manually preserved and restored for both the client and the server. This makes cross-device AI session continuity complex.&lt;/p&gt;

&lt;p&gt;Having clients poll for updates is inefficient and adds latency. Long-polling or server-sent events help, but only partially. They don't enable simultaneous, bi-directional, low-latency messaging, which is what smooth AI conversations require.&lt;/p&gt;

&lt;p&gt;Handling reconnections, preserving message order, and managing updates across multiple active clients requires considerable infrastructure. Doing this reliably, at scale, across networks and devices, is beyond what the typical product team can or should build from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you need a drop-in AI transport
&lt;/h2&gt;

&lt;p&gt;Given the challenges described, building a robust system for realtime synchronisation from scratch can significantly drain engineering resources and slow product velocity. This is where a drop-in AI transport layer becomes essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent, bi-directional messaging
&lt;/h3&gt;

&lt;p&gt;To support conversations that stay in sync across devices, such a layer must offer persistent, bi-directional messaging using protocols like WebSockets for AI streaming. This allows for continuous, low-latency communication in both directions, enabling the AI to push updates and the client to send input without waiting for discrete request/response cycles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z8y6lh6wbk9z2x5ks33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z8y6lh6wbk9z2x5ks33.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Identity-aware fan-out
&lt;/h3&gt;

&lt;p&gt;Equally important is identity-aware fan-out. The transport system needs to recognize all active sessions associated with a single user and ensure that every message or state update is sent to all of those endpoints. That means when a user sends a message on one device, every other device they're signed in on should immediately reflect the change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01i4piouiymino6cca9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01i4piouiymino6cca9l.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Ordering and session recovery
&lt;/h3&gt;

&lt;p&gt;The system also needs to preserve message ordering and support reliable session recovery. If the connection drops momentarily, say from a device switch or network disruption, the user shouldn't lose messages or see them out of order. A well-designed transport layer offers mechanisms to replay missed events and keep message sequences intact, ensuring consistency in the conversation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xvknzom35uh6g1ar9k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xvknzom35uh6g1ar9k6.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Presence tracking
&lt;/h3&gt;

&lt;p&gt;Presence tracking enables the backend to know which devices are currently online and active. It helps coordinate updates, prevents redundant notifications, and can be used to power features like realtime indicators for typing or collaborative editing across devices.&lt;/p&gt;
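
&lt;p&gt;At its core a presence system is a membership set with change notifications. The sketch below uses hypothetical names to show the shape; it isn't the Ably presence API.&lt;/p&gt;

```javascript
// Hypothetical presence set: devices enter and leave, and every listener is
// notified, so the backend (or other devices) always knows who is online.
class PresenceSet {
  constructor() {
    this.members = new Map(); // deviceId -> metadata
    this.listeners = new Set();
  }
  enter(deviceId, metadata) {
    this.members.set(deviceId, metadata);
    this.emit({ action: 'enter', deviceId });
  }
  leave(deviceId) {
    this.members.delete(deviceId);
    this.emit({ action: 'leave', deviceId });
  }
  emit(event) {
    for (const listener of this.listeners) listener(event);
  }
  onChange(listener) { this.listeners.add(listener); }
  online() { return Array.from(this.members.keys()); }
}

const presence = new PresenceSet();
const events = [];
presence.onChange((e) => events.push(`${e.deviceId}:${e.action}`));

presence.enter('laptop', { sdk: 'js' });
presence.enter('phone', { sdk: 'ios' });
presence.leave('laptop');

console.log(presence.online()); // [ 'phone' ]
console.log(events); // [ 'laptop:enter', 'phone:enter', 'laptop:leave' ]
```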

&lt;h3&gt;
  
  
  Streaming support
&lt;/h3&gt;

&lt;p&gt;To maintain a high-quality conversational UX, the transport layer must support LLM token streaming. This includes delivering partial, realtime updates from the AI model as it generates responses. That stream must arrive quickly, in order, and appear simultaneously on any active device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmr5enl0lul9dxvt0uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmr5enl0lul9dxvt0uy.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ably's infrastructure supports all of these capabilities as part of its realtime platform. It eliminates the need to custom-build low-level transport solutions, allowing engineering teams to focus on building intelligent, agentic features instead of protocol logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the desired experience maps to the transport layer
&lt;/h2&gt;

&lt;p&gt;The table below breaks down what users expect from cross-device AI conversations, what your transport layer must support to deliver those experiences, and the technical mechanics that make it all work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User Experience Desired&lt;/th&gt;
&lt;th&gt;Required Transport Layer Features&lt;/th&gt;
&lt;th&gt;Underlying Technical Requirements&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Identity-aware sync across devices&lt;/td&gt;
&lt;td rowspan="2"&gt;Identity-aware fan-out&lt;/td&gt;
&lt;td&gt;Map user identity to all active sessions and ensure message fan-out across them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State replication across sessions&lt;/td&gt;
&lt;td&gt;Maintain consistent shared state for conversation history and updates across devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realtime continuation of streaming outputs and input states&lt;/td&gt;
&lt;td&gt;Durable stream relay&lt;/td&gt;
&lt;td&gt;Emit model outputs as streaming tokens; buffer streams server-side to continue on reconnect or switch.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No loss of flow during context switches&lt;/td&gt;
&lt;td rowspan="2"&gt;Reliable message ordering&lt;/td&gt;
&lt;td&gt;Guarantee delivery order of messages across devices, preserving conversational context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session recovery on reconnect&lt;/td&gt;
&lt;td&gt;Rehydrate sessions with missed messages after disconnects or page refreshes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A persistent mental model of the AI assistant&lt;/td&gt;
&lt;td rowspan="4"&gt;Live session multiplexing&lt;/td&gt;
&lt;td&gt;Allow multiple client connections per user and route interactions through a unified session view.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realtime delta propagation&lt;/td&gt;
&lt;td&gt;Transmit message edits or UI state updates as granular deltas to all active endpoints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-device sync&lt;/td&gt;
&lt;td&gt;Mirror updates across devices in real time, including UI elements and message scroll state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session handoff and presence awareness&lt;/td&gt;
&lt;td&gt;Detect presence state and manage smooth transition of active sessions between devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cross-device AI? You can ship it now.
&lt;/h2&gt;

&lt;p&gt;Seamless cross-device conversations aren't futuristic - they're achievable today. You don't need to rebuild your stack to make it happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; provides all you need to support persistent, identity-aware, streaming AI experiences across multiple clients. If you're working on Gen‑2 AI products and want to get this right, we'd love to talk.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>mobile</category>
      <category>ux</category>
    </item>
    <item>
      <title>The new Ably dashboard: realtime visibility in your hands</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Thu, 22 Jan 2026 17:54:35 +0000</pubDate>
      <link>https://dev.to/ablyblog/the-new-ably-dashboard-realtime-visibility-in-your-hands-470a</link>
      <guid>https://dev.to/ablyblog/the-new-ably-dashboard-realtime-visibility-in-your-hands-470a</guid>
      <description>&lt;p&gt;We've rebuilt the Ably dashboard to give developers clear, realtime visibility into how their applications behave.&lt;/p&gt;

&lt;p&gt;This isn't just a cosmetic refresh. It's a shift from a configuration-first dashboard to a live observability surface. One that lets you see channels, connections, messages, and errors as they happen, debug issues instantly, and understand usage without stitching together logs and tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we rebuilt the dashboard
&lt;/h2&gt;

&lt;p&gt;The previous dashboard did a solid job of helping you manage your apps, API keys, and configurations. But it didn't show what was actually happening inside your realtime system. When something broke or behaved unexpectedly, you were left piecing together clues from SDK logs, APIs, and external tools. There wasn't a single place to answer operational questions like who's connected right now, what's happening on a particular channel, or whether a pattern of errors is brand new or recurring.&lt;/p&gt;

&lt;p&gt;The new dashboard brings realtime observability directly into the browser. No setup, no extra tooling, no context switching - just a live window into Ably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoi9nei1gbr6vawivb18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoi9nei1gbr6vawivb18.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new (and why it matters)
&lt;/h2&gt;

&lt;p&gt;At the heart of this release are four capability upgrades that change how your team operates realtime systems on Ably. Each one is useful on its own; together, they make it far easier to see how your apps behave, debug faster, and understand what's driving usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Channel and connection inspectors&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The new inspectors provide realtime visibility into how your system behaves as data flows through Ably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The channel inspector&lt;/strong&gt; shows you who's attached to the channel, which messages are being sent, and what's happening in the presence set. You can also see which rules and integrations are active on that channel. Alongside that live activity, it surfaces realtime metrics like message rates, occupancy, and connection counts, so you can see performance as it changes - not after the fact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gspiop09ziamtayszsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gspiop09ziamtayszsz.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The connection inspector&lt;/strong&gt; shows you who's connected to your app. You can select a specific connection, see which channels it's attached to, and view live statistics such as the rate of messages it's publishing. You can also see information such as the connection's geographical location and the SDK it's using to connect to Ably.&lt;/p&gt;

&lt;p&gt;Combined, the inspectors give you realtime visibility into who's interacting with your app and which channels they're using, making it far easier to debug questions like 'why is this channel still active?'&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2ehp05lpfaf31k6npy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2ehp05lpfaf31k6npy.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Log search
&lt;/h3&gt;

&lt;p&gt;Live streams are great for what's happening right now, but many investigations start with a timestamp. Log search lets you query historical platform events so you can trace what happened and why, and compare today's traffic with last week's. It's ideal for debugging anomalies and spotting patterns, especially when you're trying to answer whether a problem is new or recurring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2tidfyw3rv03r1hz5lp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2tidfyw3rv03r1hz5lp.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reports and analytics
&lt;/h3&gt;

&lt;p&gt;The new reports section gives you aggregated visibility into how your apps are used over time - message volumes, connection and channel durations - so you can understand where consumption is coming from and what's driving traffic. This is particularly helpful when needing to explain usage internally, plan scaling work, or map realtime costs back to product features.&lt;/p&gt;

&lt;p&gt;This is the foundation for deeper analytics arriving in future releases, including more granular breakdowns by product, SDK and device, plus finer‑grained views by app, channel, and namespace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbj9w0bdkazot79iq0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbj9w0bdkazot79iq0s.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Web CLI
&lt;/h3&gt;

&lt;p&gt;We've also introduced a new web CLI: a browser-based command line that lets you run Ably commands instantly. You can publish and subscribe to messages, enter presence, and manage your app configuration without any local setup. It complements the redesigned dashboard to give you a fast way to interact with Ably from anywhere. The web CLI is invaluable for exploring Ably features without writing any code, and is especially useful during support calls where you need to quickly reproduce a certain behavior or send a specific set of messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci61jl0be932cs8xlmw8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci61jl0be932cs8xlmw8.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs tail and live telemetry
&lt;/h3&gt;

&lt;p&gt;Each inspector includes a live, realtime log stream scoped to the resource you're viewing. If you're inspecting a channel, you see the events relevant to that channel; if you're inspecting a connection, you see the events relevant to that connection. This means you can trace behaviour as it happens, correlate spikes in live metrics with specific platform events, and debug instantly rather than collecting evidence after the incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  A more modern, product-first dashboard
&lt;/h3&gt;

&lt;p&gt;Alongside these new capabilities, we've modernized the dashboard itself. Navigation is cleaner and faster, with dedicated sections for each Ably product. The result is a more intuitive experience that helps teams get to the right tools quicker; whether they're debugging, trying out a new product, testing new features, or monitoring live traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t68c8m7k0o6nljn2zbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t68c8m7k0o6nljn2zbm.png" alt=" " width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's coming next
&lt;/h2&gt;

&lt;p&gt;This release is a major step toward making the dashboard a full observability layer for Ably. We're working towards live logs: a global realtime stream of platform events across all resources to complement the existing log search functionality - so you can see what &lt;em&gt;is&lt;/em&gt; happening live, and what &lt;em&gt;has&lt;/em&gt; happened previously. We're also continuing to expand the reports section to provide richer visualization of your usage, performance, and reliability, across all your apps.&lt;/p&gt;

&lt;p&gt;In parallel, we're continuing to modernize the remaining areas of the dashboard so that all your resources benefit from enhanced observability and analytics.&lt;/p&gt;

</description>
      <category>dashboard</category>
      <category>observability</category>
      <category>realtime</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The evolution of realtime AI: The transport layer needed for stateful, steerable AI UX</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Wed, 21 Jan 2026 17:16:19 +0000</pubDate>
      <link>https://dev.to/ablyblog/the-evolution-of-realtime-ai-the-transport-layer-needed-for-stateful-steerable-ai-ux-2kl4</link>
      <guid>https://dev.to/ablyblog/the-evolution-of-realtime-ai-the-transport-layer-needed-for-stateful-steerable-ai-ux-2kl4</guid>
      <description>&lt;p&gt;When we launched Ably in 2016, we set out to solve a fundamental problem: delivering reliable, low-latency real-time experiences at scale. So we set out to build a globally distributed system that didn't force developers to choose between latency, integrity, and reliability – trade-offs that had defined the realtime infrastructure space for years.&lt;/p&gt;

&lt;p&gt;Fast forward to today, and we're reaching 2 billion devices monthly, processing 2 trillion operations for customers who demand rock-solid infrastructure for their mission-critical features. But over the past year, as AI has transformed from a backend optimisation tool into a front-and-centre user experience, we've been asking ourselves a critical question: &lt;strong&gt;What's Ably's role in the AI ecosystem?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From "nice-to-have" to essential infrastructure
&lt;/h3&gt;

&lt;p&gt;A year ago, if you'd asked us about Ably's AI story, we would have told you that yes, customers were using us. Companies like HubSpot and Intercom were leveraging Ably for token streaming and realtime AI features. But honestly? The value proposition felt incremental. Traditional LLM interactions followed a simple request-response pattern: send a query, stream back tokens, done. HTTP streaming handled this reasonably well, and while Ably offered benefits, there wasn't a smoking gun reason to use us specifically for AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's changed dramatically.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The shift to Gen 2 AI experiences
&lt;/h3&gt;

&lt;p&gt;What we're calling "Gen 2" AI experiences are fundamentally different from what came before. Instead of simply querying a model's training data, today's AI agents reason, search the web, call APIs, interact with tools via MCP (Model Context Protocol), and orchestrate complex multi-step workflows. Just look at how Perplexity searches, or how ChatGPT now breaks down complex requests into observable reasoning steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q46rytcsnebh5srj14j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q46rytcsnebh5srj14j.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shift introduces an entirely new set of challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  Modern AI UX problems
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Async by default&lt;/strong&gt;: When an AI agent needs 30 seconds or a minute to complete a task (not 3 seconds), user behaviour changes. They switch tabs, check their phone, or start other work. A simple HTTP request suddenly needs to handle disconnections, reconnections, and state recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous feedback is mandatory&lt;/strong&gt;: Users need to know what's happening. "Searching the web... Analysing documents... Calling your CRM..." This isn't a nice-to-have anymore. Without feedback, users assume the system has failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-threading conversations&lt;/strong&gt;: Imagine asking a support agent about your order status. While they're checking, you ask another question. Now you have two parallel operations that need coordination. The agent needs to know what else is in flight and potentially prioritise or sequence responses intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-device continuity&lt;/strong&gt;: Users want to be able to set tasks running and then pick up later from where they left off. They may start a deep research query on their laptop, close it, and then want to check progress on their phone an hour later. The entire conversation state needs to seamlessly transfer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The transport layer modern AI needs
&lt;/h3&gt;

&lt;p&gt;Our vision for addressing these challenges centres on what we're calling the &lt;strong&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;&lt;/strong&gt;. A drop-in solution that handles the complexity of making the AI UX resilient and multi-device - so developers can focus on building great agent experiences, not wrestling with networking problems.&lt;/p&gt;

&lt;p&gt;We focus on everything between your AI agents and end-user devices, leaving orchestration, LLM selection, and business logic where they belong – in your control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: The foundation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the base level, the AI Transport provides what you'd expect from Ably: &lt;a href="https://ably.com/platform" rel="noopener noreferrer"&gt;&lt;strong&gt;bulletproof reliability&lt;/strong&gt;&lt;/a&gt;, multi-device synchronisation, and automatic resume capabilities. But the real shift is architectural. Instead of your agent responding directly to requests, it returns a conversation ID. Devices subscribe to that conversation, and from that point forward, the agent pushes updates through Ably.&lt;/p&gt;

&lt;p&gt;This simple change unlocks powerful capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decoupling&lt;/strong&gt;: Agents and devices can disconnect and reconnect independently without losing continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional control&lt;/strong&gt;: Need to stop an agent mid-task or ask a follow-up question? There's a direct communication channel that doesn't require complex routing infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State recovery&lt;/strong&gt;: &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;Reconnecting devices don't replay every token&lt;/a&gt;. They get current state and resume from there&lt;/li&gt;
&lt;/ul&gt;
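
&lt;p&gt;The conversation-ID pattern above can be sketched in a few lines. The names here are hypothetical - this is the shape of the architecture, not the Ably API.&lt;/p&gt;

```javascript
// Hypothetical sketch: the agent returns a conversation ID immediately,
// devices subscribe to it, and a device that reconnects gets the current
// state snapshot rather than a replay of every token.
const conversations = new Map();

function startTask(prompt) {
  const conversationId = `conv-${conversations.size + 1}`;
  conversations.set(conversationId, {
    prompt,
    status: 'running',
    steps: [],
    subscribers: new Set(),
  });
  return conversationId; // returned before any work completes
}

function agentUpdate(conversationId, step, status) {
  const conv = conversations.get(conversationId);
  conv.steps.push(step);
  conv.status = status;
  for (const notify of conv.subscribers) notify(conv);
}

// Subscribing always begins with the current state, then live updates.
function subscribe(conversationId, notify) {
  const conv = conversations.get(conversationId);
  notify(conv);
  conv.subscribers.add(notify);
}

const id = startTask('Research competitors');
agentUpdate(id, 'Searching the web...', 'running');
agentUpdate(id, 'Report ready', 'done');

// Phone reconnects an hour later: one snapshot, no replay of every token.
let snapshot = null;
subscribe(id, (conv) => { snapshot = conv; });
console.log(snapshot.status, snapshot.steps.length); // done 2
```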

&lt;p&gt;&lt;strong&gt;Layer 2: Richer orchestration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next layer introduces live shared state on channels. This enables sophisticated coordination:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent presence and status&lt;/strong&gt;: Devices know if agents are active, thinking, or have crashed. Agents can broadcast their current focus ("Analyzing Q4 data...") as state rather than events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent coordination&lt;/strong&gt;: When multiple agents work simultaneously - say, one handling a technical query while another processes a billing question - they can see each other's state and coordinate without stepping on each other's work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context-aware prioritisation&lt;/strong&gt;: Agents can see if a user is actively waiting versus having backgrounded their session, enabling smarter resource allocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client-side tool calls&lt;/strong&gt;: In co-pilot scenarios, agents can query the client directly about user context ("Is the user currently editing this field?") without roundtripping through backend systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Enterprise-grade observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because everything flows through Ably, you gain comprehensive visibility into the last mile of your AI experience. Stream observability into your existing systems, integrate with Kafka for audit trails, and leverage enterprise features like SSO and SOC 2 compliance that come standard with Ably's infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F668a91lqa6q7llsxl01x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F668a91lqa6q7llsxl01x.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A new way of building
&lt;/h3&gt;

&lt;p&gt;What excites us most is the idea of providing a stateful conversation layer that removes infrastructure concerns from the developer's plate. Think of it as abstract storage for conversation state, combined with the realtime capabilities developers need for modern AI UX.&lt;/p&gt;

&lt;p&gt;The developers building these experiences don't want to solve networking problems. They want to focus on prompts, orchestration, RAG pipelines, and agent logic. The transport layer shouldn't be where they spend their time – but it will become critical as user expectations evolve to match what ChatGPT, Perplexity, and Claude are demonstrating daily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Framework-agnostic by design
&lt;/h3&gt;

&lt;p&gt;One pattern we've noticed: as engineering teams mature in their AI journey, they tend to move away from monolithic frameworks and build custom orchestration logic. This makes sense – these systems become core to their business differentiation.&lt;/p&gt;

&lt;p&gt;That's why the Ably AI Transport is deliberately framework-agnostic. Yes, we're building drop-in integrations with OpenAI's agent framework, LangChain, LangGraph, Vercel AI SDK, ag-ui and others to make getting started trivially easy. But the architecture doesn't lock you in. Swap out your orchestration layer, change LLM providers, rebuild your agent logic – your transport layer and device communication remain consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The road ahead
&lt;/h2&gt;

&lt;p&gt;If you're building AI experiences and wrestling with questions like "How do I handle interruptions?", "What happens when users switch devices mid-conversation?", or "How do I coordinate multiple parallel agent tasks?" – we'd love to talk. We're convinced there's a better way to build these experiences, and it starts with not having to rebuild the real-time infrastructure layer from scratch.&lt;/p&gt;

&lt;p&gt;The plumbing shouldn't be your problem. Building great AI experiences should be.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>realtime</category>
      <category>infrastructure</category>
      <category>transport</category>
    </item>
  </channel>
</rss>
