Ably Blog

Posted on Jun 9 • Originally published at ably.com

AI agent streaming in action: barge-in, human handover, and session continuity


TL;DR: AI agent streams break in ways most frameworks don't handle: dropped connections, mid-task interruptions, and human handovers across devices. This post walks through a live demo of how Ably AI Transport handles each of them: session continuity across reconnects, barge-in via explicit cancel signals, and durable organization-side human-in-the-loop (HITL), along with decoupled multi-agent progress via LiveObjects.

You're mid-conversation with an AI support agent. You've explained the problem, the agent is halfway through a response, and the connection drops. When you reconnect, the response is gone.

You type the same question again. The agent asks the same clarifying questions again. Three minutes of context, gone. Not because the model forgot it, but because the delivery layer stored nothing.

Connection drops, page refreshes, and device switches all fail for the same reason: session state lives in the delivery connection, not independently of it. Ably AI Transport fixes this by storing the session in a channel that outlasts any individual connection. The demo below covers barge-in, human handover, and multi-agent coordination in depth: the primitives most production teams end up building from scratch.

Key takeaways:

Connection drops restart most AI streams from scratch. Ably AI Transport buffers session output in the channel, so clients reconnect and catch up without re-running inference.
Barge-in requires a bidirectional channel. SSE can't distinguish a user interrupt from a network drop; AI Transport delivers cancel and redirect as explicit channel signals the agent acts on.
Organization-side human handover, where a supervisor joins a live session on a different device hours later, is the human-in-the-loop (HITL) case most frameworks leave unsolved. AI Transport's durable session persists the pending approval in channel history until the right person responds.

Mike Christensen (Pub/Sub team lead at Ably) walks through all of these primitives in a live multi-agent holiday planning app. The sections below follow the same chapter structure as the video.

Why AI agent streams break in production

Connection drops mid-stream. Standard HTTP streaming stores no session state server-side. When the connection closes, the tokens generated during the gap disappear: the delivery layer was never asked to hold them. The client reconnects to an empty state and re-prompts.

Page refresh loses the stream. Most AI implementations store token state in the browser: React component state, a JavaScript variable tracking the partial response. When the page reloads, that state is gone. The agent has no awareness that the client disappeared mid-generation, and no mechanism to re-stream output that it already produced.

Device switches lose the session. Sessions are tied to connections, and connections are tied to devices. Move from laptop to phone and the conversation doesn't follow. The new device has no path to the session's history.

All three share the same root cause: generation state is coupled to a single delivery connection. Decoupling them, by storing the session in a channel that outlasts any individual connection, fixes all three at once. For a deeper look at timeout sources and protocol fallback, see "Is WebSockets enough for AI chat?"

How Ably AI Transport handles connection recovery and session continuity

Server-side buffering and offset-based replay. Every token the agent publishes goes to the session channel as it's generated, regardless of whether the client is connected. On reconnect, AI Transport uses untilAttach to deliver everything published during the gap, in order, before the live stream resumes. The LLM never re-runs; the client catches up.
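The replay behaviour described above can be modelled in a few lines. This is a minimal in-memory sketch, assuming a single process: SessionLog, publish, and attach are hypothetical names for illustration, not AI Transport's API, and the real service handles retention and resume for you. It shows the two properties that matter: messages are retained whether or not anyone is attached, and a reconnecting client receives everything after its last-seen serial, in order, before live delivery resumes.

```javascript
// Hypothetical in-memory model of a session channel (not the AI Transport API).
// Every published message gets a monotonically increasing serial and is
// retained even when no client is attached.
class SessionLog {
  constructor() {
    this.messages = [];         // retained history, in publish order
    this.listeners = new Set(); // currently attached live subscribers
  }

  publish(name, data) {
    const msg = { serial: this.messages.length, name, data };
    this.messages.push(msg);
    // Live delivery only reaches clients that are attached right now.
    for (const listener of this.listeners) listener(msg);
    return msg.serial;
  }

  // Attach a client that has already seen everything up to lastSerial.
  // Missed messages are replayed in order, then live delivery resumes.
  attach(lastSerial, listener) {
    for (const msg of this.messages) {
      if (msg.serial > lastSerial) listener(msg);
    }
    this.listeners.add(listener);
    // Calling the returned function detaches (e.g. a connection drop).
    return () => this.listeners.delete(listener);
  }
}
```

Because the model is single-threaded, nothing can be published between the replay loop and adding the live listener; a real transport has to close that gap explicitly, which is exactly the kind of detail worth leaving to the infrastructure.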

Session on the channel, not the connection. Because the session lives in the channel rather than in the connection that opened it, any device subscribing to the same channel name joins the same session: full conversation history, followed by the live stream from its current position. Two browser tabs, a laptop and a phone, a page reload mid-response: all receive the same unbroken state.
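Two tabs and a reloaded page end up identical because every client derives its view by applying the same ordered events with the same logic, whether those events arrive from history or live. A minimal sketch, assuming hypothetical 'start', 'token', and 'end' event shapes; applyEvent and render are illustrative names, not part of AI Transport:

```javascript
// Reducer over hypothetical session events. state is a Map of
// responseId -> { text, done }.
function applyEvent(state, ev) {
  if (ev.type === 'start') {
    state.set(ev.id, { text: '', done: false });
  } else if (ev.type === 'token' && state.has(ev.id)) {
    state.get(ev.id).text += ev.text; // append streamed text
  } else if (ev.type === 'end' && state.has(ev.id)) {
    state.get(ev.id).done = true;     // response complete
  }
  return state;
}

// What a client would display; in-progress responses are marked.
function render(state) {
  return [...state]
    .map(([id, r]) => `${id}: ${r.text}${r.done ? '' : ' [streaming]'}`)
    .join('\n');
}
```

A client that joins mid-response folds the history it loads with events.reduce(applyEvent, new Map()) and then applies each live event with the same function, so it converges on exactly what a client that was connected throughout shows.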

Channel history for context. When a client has been offline beyond the live recovery window, channel history provides the full conversation. Clients load older messages using view.loadOlder(), paginating back through the session until they have the full context. For users who are offline entirely, push notifications via FCM, APNs, or Web Push can alert them to agent completions so they know to come back. Push notification delivery is currently marked as partially supported in the AI Transport feature set.
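Paginating back until the full context is loaded can be sketched with a minimal, synchronous model. The real view.loadOlder() is likely asynchronous and its exact return value may differ; createHistoryView, hasOlder, and loadFullContext are hypothetical names for illustration:

```javascript
// Hypothetical model of paging backwards through retained history.
// messages is the full history, oldest first; the view starts with only
// the most recent page loaded, like a client that has just reconnected.
function createHistoryView(messages, pageSize) {
  let start = Math.max(0, messages.length - pageSize); // oldest loaded index
  return {
    visible() {
      return messages.slice(start);
    },
    hasOlder() {
      return start > 0;
    },
    // Load one more page of older messages; returns how many were added.
    loadOlder() {
      const newStart = Math.max(0, start - pageSize);
      const added = start - newStart;
      start = newStart;
      return added;
    }
  };
}

// Keep paging until the whole conversation is available.
function loadFullContext(view) {
  while (view.hasOlder()) view.loadOlder();
  return view.visible();
}
```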

In the demo, Mike refreshes the page mid-stream, and the response picks up exactly where it stopped. Two windows open side by side show the same in-progress response, updating simultaneously.

Session continuity is the infrastructure layer. Everything built on top of it (how users interact with agents in motion, how human operators step in, how multiple agents coordinate) depends on it being in place.

The next four sections cover the interaction patterns shown in the demo: what the user sees while the agent works, how they interrupt or redirect it, how a human operator takes over with full context, and how multiple specialised agents surface progress independently. All four require the session to be live and visible.

Agent progress visibility: what the user sees while the agent works

A user can only meaningfully interrupt an agent they can see working. Progress visibility is the prerequisite for both barge-in and human handover. Without visibility, users have no basis for interrupting: they're cancelling a process they can't see, with no information about whether to wait or redirect.

The demo surfaces four types of progress signal. Token streaming shows what the orchestrator is generating. Ably LiveObjects carries the structured progress state from each of the three specialist agents: flights, hotels, and activities. Presence shows which agents are active in the session, and task history shows what each has completed.

Each signal comes from a different source and arrives independently. The three specialist agents publish their progress directly rather than routing it through the orchestrator, so the user sees the live state of every agent at once. Each agent also converts its raw query parameters into natural language with a separate model call, so progress cards show "Searching for direct flights on the 14th" rather than a query object. That is what makes barge-in useful: the user decides whether to interrupt based on accurate realtime information, not a stale snapshot.
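The design point is that each agent owns its own slice of shared progress state. A minimal sketch, using a plain Map as a hypothetical stand-in for a LiveObjects map; reportProgress and renderCards are illustrative names, not the LiveObjects API, and the summary strings stand in for the output of the separate model call described above:

```javascript
// Hypothetical stand-in for a shared LiveObjects map: one entry per agent.
// Each specialist writes only its own key, so no update waits on another
// agent or on the orchestrator.
const progress = new Map();

function reportProgress(agent, percent, summary) {
  progress.set(agent, { percent, summary });
}

// What the client renders: one card per agent, in a stable order.
function renderCards(agents) {
  return agents.map((agent) => {
    const p = progress.get(agent);
    return p ? `${agent}: ${p.percent}% - ${p.summary}` : `${agent}: waiting`;
  });
}
```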

Barge-in: how users interrupt and redirect agents mid-response

In Ably's customer discovery research (which Ably's CEO, Matthew O'Riordan, walks through in this talk), interruption emerged as critical functionality once teams moved to asynchronous agent experiences. One team disabled user input entirely: with SSE, a user's stop signal looks identical to a network drop, so there was no safe way to act on it.

AI Transport changes this because the channel is bidirectional. User input arrives as an explicit channel event rather than as a connection side effect, so the agent can act on it reliably while the session stays live.
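The difference is easiest to see from the agent's side. A minimal sketch, assuming the agent receives two kinds of input: explicit channel events (with hypothetical names 'cancel' and 'user-message') and a notification that a client connection dropped. decideAction is illustrative only:

```javascript
// Hypothetical agent-side dispatcher. With explicit channel events, a
// user's stop request and a dropped connection are different inputs.
function decideAction(input) {
  switch (input.kind) {
    case 'channel-event':
      if (input.name === 'cancel') return 'abort-turn';       // user asked to stop
      if (input.name === 'user-message') return 'start-turn'; // new user input
      return 'ignore';
    case 'client-disconnected':
      // Not an instruction from the user: keep generating and publishing
      // so the client can catch up when it reconnects.
      return 'keep-generating';
    default:
      return 'ignore';
  }
}
```

Over SSE, the agent only ever sees the second kind of input, a closed connection, which is why a stop button and a flaky network are indistinguishable there.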

Two patterns are available, and the choice depends on what you want the user to see.

Cancel-then-send is the more common of the two. Call transport.cancel() and it publishes an explicit cancel signal on the channel: the server's abort fires, the LLM stream stops, and the turn ends with reason 'cancelled'. The session stays intact and the next message starts a clean turn. In the demo, Mike says "I want to visit a museum" while the activities agent is mid-search, the kind of redirect where there's no value in letting the original task finish. transport.cancel() fires, the search stops, and the agent starts fresh on the museum query.

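```javascript
// Cancel-then-send. activeTurns, transport, and send are assumed to be
// provided by the app's AI Transport client setup (not shown here).
const handleSend = async (text) => {
  // If a turn is still streaming, publish an explicit cancel signal first
  if (activeTurns.size > 0) {
    await transport.cancel()
  }
  // Then start a clean turn with the new user message
  await send([{
    id: crypto.randomUUID(),
    role: 'user',
    parts: [{ type: 'text', text }]
  }])
}
```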
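On the agent side, "the server's abort fires" can be modelled with a standard AbortController. A minimal sketch, assuming the LLM stream is represented as a plain array of tokens and publish is a hypothetical callback that writes each token to the session channel; runTurn is an illustrative name, not AI Transport's API:

```javascript
// Hypothetical turn runner. tokens stands in for an LLM stream; signal is
// an AbortSignal the agent aborts when a cancel signal arrives on the channel.
function runTurn(tokens, signal, publish) {
  let text = '';
  for (const token of tokens) {
    if (signal.aborted) {
      // Stop consuming the stream; what was already published stays in history.
      return { reason: 'cancelled', text };
    }
    text += token;
    publish(token);
  }
  return { reason: 'complete', text };
}
```

In a real agent, the same AbortSignal would typically also be passed to the model client or fetch request, so the upstream generation stops rather than merely being ignored.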

Send-alongside is the alternative. It sends a new message without cancelling the active turn, so both run concurrently: the agent continues the first response while processing the new input. You can cancel a specific turn using transport.cancel({ turnId }) if needed. Send-alongside is appropriate when you want the user to see both responses. For example: a clarifying follow-up while the agent is finishing its response, or a comparison query where both outputs are useful.
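Send-alongside means more than one turn can be active at once, so cancellation has to be addressable per turn. A sketch of the bookkeeping, assuming one AbortController per turn; TurnRegistry is hypothetical, and its cancel() / cancel({ turnId }) shape only mirrors the transport.cancel() calls described above:

```javascript
// Hypothetical registry of concurrent turns, one AbortController each.
class TurnRegistry {
  constructor() {
    this.active = new Map(); // turnId -> AbortController
  }

  // Register a new turn and hand its signal to the code that runs it.
  start(turnId) {
    const controller = new AbortController();
    this.active.set(turnId, controller);
    return controller.signal;
  }

  // Called when a turn completes normally.
  finish(turnId) {
    this.active.delete(turnId);
  }

  // cancel({ turnId }) cancels one turn; cancel() cancels every active turn.
  cancel({ turnId } = {}) {
    const ids = turnId === undefined ? [...this.active.keys()] : [turnId];
    for (const id of ids) {
      const controller = this.active.get(id);
      if (controller) {
        controller.abort();
        this.active.delete(id);
      }
    }
  }
}
```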

Full API reference: Interruption and barge-in docs.

Human-in-the-loop: getting full session context to an operator on any device

Most frameworks implement one variant of human-in-the-loop (HITL) and leave the other unsolved. But the distinction between them matters in production.

User-side HITL is the pattern where the agent pauses and asks the user to approve an action before executing it, for example "Should I book this flight?" The user approves or rejects, and the agent continues. Almost every agent framework supports this.

Organization-side HITL is the harder case. The agent needs to escalate to an internal supervisor: someone who may be on a different device, in a different time zone, and who might not respond for hours. This is the customer support scenario, where a human agent takes over mid-conversation with full context and the user doesn't have to re-explain anything.

AI Transport handles both through the same mechanism. The agent defines a tool that pauses for human input rather than executing automatically. When the LLM decides it needs approval and invokes this tool, AI Transport stops the turn and publishes the pending request to the channel as a durable message.
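A sketch of the pausing tool, assuming a plain array stands in for the durable session channel and that a needsApproval flag marks tools the agent must not run on its own. The tool table, handleToolCall, and the message shapes are all hypothetical; the point is that invoking the tool ends the turn and leaves a record behind instead of a waiting process:

```javascript
// Hypothetical tool table: tools marked needsApproval are never executed
// by the agent itself.
const tools = {
  searchFlights: { needsApproval: false, run: (args) => `found flights to ${args.to}` },
  bookFlight: { needsApproval: true }
};

// log stands in for the session channel: an append-only, durable list.
function handleToolCall(log, call) {
  const tool = tools[call.name];
  if (!tool) throw new Error(`unknown tool: ${call.name}`);
  if (tool.needsApproval) {
    // Record the pending request and end the turn; nothing keeps running.
    log.push({ type: 'approval-request', toolCallId: call.id, name: call.name, args: call.args });
    return { turnEnded: true, reason: 'awaiting-approval' };
  }
  const result = tool.run(call.args);
  log.push({ type: 'tool-result', toolCallId: call.id, result });
  return { turnEnded: false, result };
}
```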

Any connected client sees it and can resolve it by calling view.update(). A supervisor joining on a different device hours later sees the same pending request in channel history.

The approval is a durable channel message, not a live server process waiting to time out. Calling view.update() triggers a continuation turn, and the agent picks up where it paused.
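Because the request is a message rather than a process, resolving it later is a matter of reading history. A minimal sketch, assuming history is the list of messages already loaded from the channel and that requests and decisions are linked by a toolCallId; findPendingApprovals and resolveApproval are hypothetical helpers and do not reflect the signature of view.update():

```javascript
// Find approval requests that have no matching decision yet. Nothing here
// depends on a live server process, so it works the same for a supervisor
// who joins hours later on another device.
function findPendingApprovals(history) {
  const decided = new Set(
    history.filter((m) => m.type === 'approval-decision').map((m) => m.toolCallId)
  );
  return history.filter((m) => m.type === 'approval-request' && !decided.has(m.toolCallId));
}

// Recording a decision is just another durable message; the returned object
// describes the continuation turn the agent should run next.
function resolveApproval(history, toolCallId, approved, by) {
  const pending = findPendingApprovals(history).find((m) => m.toolCallId === toolCallId);
  if (!pending) throw new Error(`no pending approval for ${toolCallId}`);
  history.push({ type: 'approval-decision', toolCallId, approved, by });
  return { continueTurn: true, toolCallId, approved };
}
```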

Organization-side escalation is available today; implementation guides are being finalized.

Full implementation detail: Human-in-the-loop docs.

Multi-agent coordination and shared state via Ably LiveObjects

Routing all agent activity through a central orchestrator creates a bottleneck. Every progress update has to pass through the coordinator before it appears to the user. At the scale of a multi-step, multi-agent workflow, that lag accumulates.

This demo takes a different approach. The orchestrator delegates to three specialist agents (flights, hotels, and activities), all running concurrently. Each specialist publishes its progress directly to Ably LiveObjects, bypassing the orchestrator entirely for user-facing updates.

The orchestrator waits for final results. The user sees live progress bars from all three agents updating in realtime, independently.

LiveObjects carries more than progress signals. User selections (flight, hotel, and activities choices) are written to LiveObjects state the moment the user makes a choice. When the user later asks "What's my current itinerary?", the orchestrator reads directly from LiveObjects rather than reconstructing context from chat history. If the user deleted a selection outside the chat thread, the agent sees that immediately. The conversation is one interface to the system; the source of truth is the state.
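A sketch of reading the itinerary from state rather than chat history, using a plain Map as a hypothetical stand-in for a LiveObjects map; selectOption, removeSelection, and currentItinerary are illustrative names, not the LiveObjects API:

```javascript
// Hypothetical stand-in for LiveObjects state: the UI writes selections
// here directly, and the orchestrator reads them when asked.
const itinerary = new Map(); // category -> selected option

function selectOption(category, option) {
  itinerary.set(category, option);
}

function removeSelection(category) {
  itinerary.delete(category); // e.g. the user clears it outside the chat
}

// Answer "What's my current itinerary?" from state, not from chat history.
function currentItinerary() {
  if (itinerary.size === 0) return 'Nothing selected yet.';
  return [...itinerary].map(([category, option]) => `${category}: ${option}`).join('; ');
}
```

Because removeSelection writes to the same state the orchestrator reads, a selection cleared in the UI disappears from the next answer without any chat message mentioning it.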

This matters because the user-facing update rate is decoupled from the orchestrator's coordination cycle. Each agent surfaces progress as fast as it produces it, with no relay step in between.

Presence adds a further signal: agents can check whether the user is actually connected before streaming. An agent that completes a search while the user is offline can send a push notification instead of streaming to a session nobody is watching.
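The decision itself is small. A sketch, assuming presentClientIds is the list of client IDs currently in the channel's presence set (in ably-js that set can be read with channel.presence.get(), whose results carry a clientId); deliverResult, stream, and notify are hypothetical:

```javascript
// Hypothetical delivery decision: stream if the user is present,
// otherwise fall back to a notification.
function deliverResult(userId, presentClientIds, result, { stream, notify }) {
  if (presentClientIds.includes(userId)) {
    stream(result);
    return 'streamed';
  }
  notify(userId, `Your search finished: ${result.summary}`);
  return 'notified';
}
```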

LiveObjects availability: GA in JavaScript. Experimental in Swift and Java.

Learn more about Ably LiveObjects.

Session continuity, barge-in, and human handover aren't features that sit on top of an AI stack. They're properties of the delivery layer underneath it. The session channel is what makes them composable: the same mechanism that replays tokens on reconnect makes a pending approval durable, and lets a supervisor join a live conversation hours after it started. Most teams reach for these patterns eventually. The question is whether you build them yourself or start with infrastructure that already has them.

Ably AI Transport documentation

Which of these production problems have you hit building AI agents? Have you had to disable user input during agent responses as a workaround? Curious what the reconnection side looked like before having a session layer.