Why Long-Running AI Agents Break on HTTP, and How Ably's Durable Sessions Fix It

#webdev #devops #cloud #astro

An AI agent that summarizes a paragraph finishes in two seconds. An AI agent that researches a question, calls six tools, and drafts a report can run for four minutes — or forty. The first fits HTTP comfortably. The second fights it the whole way.

Most agent backends are still wired the way web apps have been wired since the 1990s: a client sends a request, the server sends a response, the connection closes. That contract holds because the response usually arrives fast enough that nobody notices the connection was open at all. Long-running agents break the contract. They produce output gradually, they outlive the patience of every proxy between client and server, and they keep working even after the user closes the tab. We dug into why this fails so often, and how Ably's durable session model is built to absorb it.

Where HTTP runs out of road

HTTP's request-response cycle assumes a short, bounded exchange. Three things go wrong once an agent runs for minutes instead of milliseconds.

Idle timeouts close the socket. Your connection passes through load balancers, reverse proxies, and CDNs, and each one drops connections that go quiet. An AWS Application Load Balancer closes idle connections after 60 seconds by default. An agent that reasons for 90 seconds before emitting its first token has already lost the socket underneath it.

Streaming is still one fragile pipe. Server-Sent Events and WebSockets hold the connection open and, as long as some bytes or pings keep flowing, survive the idle timeout, which is why most agent UIs use them today. But the stream is bound to a single TCP connection. When a phone switches from Wi-Fi to cellular, a laptop sleeps, or the server is redeployed mid-task, that connection dies, and every token emitted during the gap is gone. The agent kept running on the server; the client simply stopped hearing it.
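Keeping a stream alive through a proxy's idle timer usually means writing something before the timer fires, even when the agent has nothing to say yet. The sketch below is a minimal TypeScript illustration, not taken from any particular framework: it lays out the SSE frames a server might write for an agent that stays silent for 90 seconds, inserting a comment line every 20 seconds (an arbitrary interval, chosen to stay well under the 60-second ALB default). `TimedEvent`, `framesWithHeartbeats`, and the interval are hypothetical; the `id:` field and the `:`-prefixed comment line are standard SSE syntax.

```typescript
// One event the (hypothetical) agent emits, timed in ms from the start of the run.
interface TimedEvent {
  atMs: number;
  data: string;
}

// A chunk of bytes the server writes to the stream, and when it writes it.
interface WrittenFrame {
  atMs: number;
  frame: string;
}

// Standard SSE event: the `id:` line is what a browser EventSource reports
// back in the Last-Event-ID header after it reconnects.
function sseEvent(id: number, data: string): string {
  return `id: ${id}\ndata: ${data}\n\n`;
}

// A line starting with ":" is an SSE comment. Clients ignore it, but the bytes
// still reset the idle timer on every proxy along the path.
const SSE_HEARTBEAT = ": keep-alive\n\n";

// Lay out what the server would write, inserting a heartbeat whenever the
// stream would otherwise stay silent for longer than heartbeatMs.
function framesWithHeartbeats(events: TimedEvent[], heartbeatMs: number): WrittenFrame[] {
  const out: WrittenFrame[] = [];
  let lastWriteMs = 0;
  events.forEach((ev, i) => {
    // Fill the silent stretch before this event with keep-alive comments.
    while (ev.atMs - lastWriteMs > heartbeatMs) {
      lastWriteMs += heartbeatMs;
      out.push({ atMs: lastWriteMs, frame: SSE_HEARTBEAT });
    }
    out.push({ atMs: ev.atMs, frame: sseEvent(i + 1, ev.data) });
    lastWriteMs = ev.atMs;
  });
  return out;
}

// Usage: 90 s of reasoning before the first token, 20 s heartbeat interval.
const written = framesWithHeartbeats([{ atMs: 90_000, data: "first token" }], 20_000);
const longestSilenceMs = Math.max(
  ...written.map((f, i) => f.atMs - (i === 0 ? 0 : written[i - 1].atMs)),
);
console.log(written.length, longestSilenceMs); // 5 20000
```

Note what the heartbeat does not fix: if the TCP connection itself dies, frames written during the outage are still lost. A browser `EventSource` sends the last `id:` it saw in a `Last-Event-ID` header when it reconnects, but that only helps if the server kept the missed events somewhere.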

Nothing remembers what was missed. Reopen the connection and you get a fresh stream from that instant forward. HTTP gives you no way to ask for the messages that were sent between second 30 and second 95. The protocol has no concept of a session that outlives the socket.

The naive fix, letting the client reconnect and re-issue the request, can quietly double your bill and corrupt state. If the original run is still executing server-side, a retry starts a second one. Now two agents call the same tools, write the same files, and bill you twice for tokens. Reconnection has to resume an existing session, not start a fresh one.

What durable sessions actually mean

Ably's approach is to stop treating the session and the connection as the same object. A durable session is a logical channel that lives in Ably's service, independent of any single client; the WebSocket connection is just a temporary attachment to it. Three mechanisms make that work.

Decoupled lifecycle. The agent publishes to a channel, not to a socket. The session exists whether or not a client is currently listening. The user can shut the laptop, the agent keeps running, and the messages wait on the channel.
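To make the lifecycle split concrete, here is a minimal in-memory sketch in TypeScript. `SessionChannel` is a hypothetical stand-in, not Ably's client library: the agent calls `publish`, clients come and go through `attach`, and publishing never fails just because nobody is listening.

```typescript
// A message on the (hypothetical) session channel.
interface ChannelMessage {
  id: number;
  data: string;
}

type Listener = (msg: ChannelMessage) => void;

// In-memory stand-in for a durable channel. The agent publishes here, never
// to a socket, so publishing succeeds whether or not a client is attached.
class SessionChannel {
  private readonly log: ChannelMessage[] = [];
  private readonly listeners = new Set<Listener>();
  private nextId = 1;

  publish(data: string): ChannelMessage {
    const msg: ChannelMessage = { id: this.nextId++, data };
    this.log.push(msg); // retained even when nobody is listening
    this.listeners.forEach((listener) => listener(msg)); // live delivery only
    return msg;
  }

  // An attachment is temporary; the returned function detaches it.
  attach(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  retained(): readonly ChannelMessage[] {
    return this.log;
  }
}

// Usage: the laptop lid closes partway through the run.
const channel = new SessionChannel();
const seenIds: number[] = [];
const detach = channel.attach((m) => {
  seenIds.push(m.id);
});
channel.publish("step 1");
detach(); // the client goes away; the agent neither knows nor cares
channel.publish("step 2");
channel.publish("step 3");
console.log(seenIds.length, channel.retained().length); // 1 3
```

In a real deployment the log would live in the messaging service rather than in the agent's process memory; the sketch only shows the direction of the dependency: the agent depends on the channel, never on a particular client.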

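Message persistence and replay. Every message gets an ID and is retained for a configurable window. Ably's history and rewind features let a reconnecting client fetch what it missed, either by rewinding the channel on attach (by message count or time span) or by paging back through stored history, and receive the gap in order, with no tokens lost. Because each message carries its ID, the client can also discard anything it has already applied, so no duplicates are inserted.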
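The retention-plus-replay idea can be sketched independently of any SDK. The TypeScript below is a generic illustration, not Ably's API: the hypothetical `RetainedLog` keeps messages for a fixed window (two minutes here, purely illustrative) and returns everything after the last ID a client processed, or `null` when part of the gap has already expired.

```typescript
interface StoredMessage {
  id: number; // monotonic, assigned by the log
  publishedAtMs: number;
  data: string;
}

// Hypothetical retained log: keeps messages for retentionMs and lets a
// reconnecting client fetch everything after the last ID it processed.
class RetainedLog {
  private messages: StoredMessage[] = [];
  private nextId = 1;

  constructor(private readonly retentionMs: number) {}

  append(data: string, nowMs: number): StoredMessage {
    const msg: StoredMessage = { id: this.nextId++, publishedAtMs: nowMs, data };
    this.messages.push(msg);
    this.prune(nowMs);
    return msg;
  }

  // Returns the missed messages in publish order, or null when part of the
  // gap has already aged out (the client must then reload session state).
  replaySince(lastSeenId: number, nowMs: number): StoredMessage[] | null {
    this.prune(nowMs);
    const firstMissing = lastSeenId + 1;
    const oldest = this.messages[0];
    const gapExists = firstMissing < this.nextId;
    if (gapExists && (oldest === undefined || oldest.id > firstMissing)) return null;
    return this.messages.filter((m) => m.id > lastSeenId);
  }

  // Drop anything older than the retention window.
  private prune(nowMs: number): void {
    this.messages = this.messages.filter((m) => nowMs - m.publishedAtMs <= this.retentionMs);
  }
}

// Usage with an illustrative 2-minute retention window.
const retainedLog = new RetainedLog(120_000);
retainedLog.append("token A", 0);
retainedLog.append("token B", 30_000);
retainedLog.append("token C", 95_000);
const recent = retainedLog.replaySince(1, 100_000); // client saw A, missed B and C
const late = retainedLog.replaySince(1, 200_000); // B expired at 150 s: gap is broken
console.log(recent?.map((m) => m.data).join(", "), late); // token B, token C null
```

Returning `null` instead of a partial list is deliberate: a client that silently receives a gap with a hole in it is worse off than one that is told to reload the session state.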

Connection state recovery. When a client reconnects inside the recovery window — roughly two minutes by default — Ably restores the prior connection state and resumes delivery from the last message the client acknowledged. To the application, the interruption never happened.
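Ably handles this recovery inside its client library and service, so application code typically does not implement it. The sketch below only models the rule described here, under assumed names (`ConnectionRegistry`, `ack`, `reconnect`) that do not come from Ably: the server keeps each connection's last acknowledged message ID for a fixed window after it drops, and a reconnect inside that window resumes from there.

```typescript
// Hypothetical server-side record for one client connection.
interface ConnectionState {
  lastAckedId: number; // last message the client confirmed receiving
  disconnectedAtMs: number | null; // null while connected
}

type ReconnectResult =
  | { kind: "resumed"; resumeAfterId: number }
  | { kind: "expired" }; // state discarded: client must fall back to history

class ConnectionRegistry {
  private readonly states = new Map<string, ConnectionState>();

  constructor(private readonly recoveryWindowMs: number) {}

  ack(connectionKey: string, messageId: number): void {
    const state: ConnectionState =
      this.states.get(connectionKey) ?? { lastAckedId: 0, disconnectedAtMs: null };
    state.lastAckedId = Math.max(state.lastAckedId, messageId);
    this.states.set(connectionKey, state);
  }

  disconnect(connectionKey: string, nowMs: number): void {
    const state = this.states.get(connectionKey);
    if (state) state.disconnectedAtMs = nowMs;
  }

  reconnect(connectionKey: string, nowMs: number): ReconnectResult {
    const state = this.states.get(connectionKey);
    const offlineMs =
      state && state.disconnectedAtMs !== null ? nowMs - state.disconnectedAtMs : 0;
    if (!state || offlineMs > this.recoveryWindowMs) {
      this.states.delete(connectionKey);
      return { kind: "expired" };
    }
    state.disconnectedAtMs = null;
    return { kind: "resumed", resumeAfterId: state.lastAckedId };
  }
}

// Usage with a 2-minute window, roughly matching the default described above.
const registry = new ConnectionRegistry(120_000);
registry.ack("conn-1", 41);
registry.disconnect("conn-1", 0); // Wi-Fi drops
const quick = registry.reconnect("conn-1", 45_000); // back on cellular after 45 s
registry.disconnect("conn-1", 60_000);
const slow = registry.reconnect("conn-1", 300_000); // offline for 4 minutes
console.log(quick.kind, slow.kind); // resumed expired
```

Outside the window the sketch discards the state, and the client has to fall back to replaying retained messages or reloading the session.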

Presence sits alongside these three: the server can see whether a human is currently attached, so an agent can decide whether to stream every token or just checkpoint its progress and notify the user later.
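A presence-aware agent can make the streaming decision step by step. In this TypeScript sketch, `PresenceSnapshot`, `chooseDeliveryMode`, `emitStep`, and the `publish`/`checkpoint` callbacks are hypothetical; in practice the member list would come from the presence feature of whatever messaging layer you use.

```typescript
type DeliveryMode = "stream-tokens" | "checkpoint-and-notify";

// Hypothetical presence snapshot: client IDs currently attached to the session.
interface PresenceSnapshot {
  members: string[];
}

function chooseDeliveryMode(presence: PresenceSnapshot): DeliveryMode {
  return presence.members.length > 0 ? "stream-tokens" : "checkpoint-and-notify";
}

// One agent step: stream each token to a watching user, or save the step's
// output as a single checkpoint when nobody is attached.
function emitStep(
  tokens: string[],
  presence: PresenceSnapshot,
  publish: (token: string) => void,
  checkpoint: (text: string) => void,
): DeliveryMode {
  const mode = chooseDeliveryMode(presence);
  if (mode === "stream-tokens") {
    tokens.forEach((token) => publish(token));
  } else {
    checkpoint(tokens.join(""));
  }
  return mode;
}

// Usage: the same kind of step, with and without someone watching.
const published: string[] = [];
const checkpoints: string[] = [];
const publish = (t: string): void => {
  published.push(t);
};
const checkpoint = (s: string): void => {
  checkpoints.push(s);
};
emitStep(["Draft ", "ready"], { members: ["user-7"] }, publish, checkpoint);
emitStep(["Section ", "two"], { members: [] }, publish, checkpoint);
console.log(published.length, checkpoints.join("|")); // 2 Section two
```

Checking presence on every step, rather than once at the start of the run, lets the agent switch back to streaming as soon as the user reattaches.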

Patterns for infrastructure that survives a dropped connection

You don't need Ably specifically to apply the ideas, but you do need to design for them on purpose.

Give every message a monotonic ID. Ordering and gap detection are impossible without one. The client tracks the last ID it processed, and reconnect logic replays from there.
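On the client, a monotonic ID turns gap detection into a single comparison. This TypeScript sketch uses a hypothetical `OrderedReceiver`: it applies messages only in strict ID order, ignores IDs it has already applied (replays can overlap with live delivery), and reports the ID to replay after when it sees a jump.

```typescript
interface Incoming {
  id: number;
  data: string;
}

// Hypothetical client-side receiver that applies messages in strict ID order.
class OrderedReceiver {
  lastProcessedId = 0;
  readonly output: string[] = [];

  // Returns null when the message was handled (applied, or ignored as a
  // duplicate), or the ID to replay after when a gap was detected. The
  // out-of-order message is dropped here; it comes back in the replay.
  receive(msg: Incoming): number | null {
    if (msg.id <= this.lastProcessedId) return null; // already applied
    if (msg.id !== this.lastProcessedId + 1) return this.lastProcessedId; // gap
    this.output.push(msg.data);
    this.lastProcessedId = msg.id;
    return null;
  }
}

// Usage: message 2 is lost when the phone switches networks.
const rx = new OrderedReceiver();
rx.receive({ id: 1, data: "Hel" });
const replayAfter = rx.receive({ id: 3, data: "rld" }); // gap detected: 1
// The server replays everything after ID 1; a late duplicate follows.
const replayed: Incoming[] = [
  { id: 2, data: "lo wo" },
  { id: 3, data: "rld" },
  { id: 3, data: "rld" },
];
replayed.forEach((m) => rx.receive(m));
console.log(replayAfter, rx.output.join("")); // 1 Hello world
```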

Make the session the unit of work, not the request. Store run state — current step, tool calls, partial output — keyed by a session ID the client holds. Reconnection re-attaches to that ID; it never re-submits the prompt and never starts the agent over.
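A minimal sketch of that rule, assuming an in-memory `Map` where a real system would use a database; `SessionStore` and its methods are hypothetical names. The important property is that `start` is idempotent per session ID and `reattach` only reads state, so a client retry cannot launch a second agent.

```typescript
interface RunState {
  sessionId: string;
  prompt: string;
  step: number;
  partialOutput: string;
}

// Hypothetical run-state store keyed by the session ID the client holds.
// A real system would persist this in a database rather than a Map.
class SessionStore {
  private readonly runs = new Map<string, RunState>();
  launches = 0; // how many agent runs were actually started

  // First request: creates the run once per session ID. A retried request
  // with the same ID gets the existing run instead of a second agent.
  start(sessionId: string, prompt: string): RunState {
    const existing = this.runs.get(sessionId);
    if (existing) return existing;
    const run: RunState = { sessionId, prompt, step: 0, partialOutput: "" };
    this.runs.set(sessionId, run);
    this.launches++;
    return run;
  }

  // Reconnect: re-attach to whatever state exists; never re-submit the prompt.
  reattach(sessionId: string): RunState | undefined {
    return this.runs.get(sessionId);
  }

  recordStep(sessionId: string, output: string): void {
    const run = this.runs.get(sessionId);
    if (!run) throw new Error(`unknown session ${sessionId}`);
    run.step++;
    run.partialOutput += output;
  }
}

// Usage: a naive client retry after a dropped connection.
const store = new SessionStore();
store.start("sess-42", "Research the question and draft a report");
store.recordStep("sess-42", "Outline. ");
store.start("sess-42", "Research the question and draft a report"); // retry
const resumed = store.reattach("sess-42");
console.log(store.launches, resumed?.step); // 1 1
```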

Guard every side effect. Even with clean resume logic, a tool call that fires twice should not double-charge a card or send two emails. Put an idempotency key on each external action.
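One way to guard side effects is to record each action's result under a key derived from stable run data. The TypeScript below is a sketch with hypothetical names (`IdempotentExecutor`, `idempotencyKey`); it keeps results in memory and handles only synchronous calls, whereas a real implementation would need durable storage that survives a server restart. Many payment APIs also accept a key on the request itself (Stripe's `Idempotency-Key` header is one example), which pushes the deduplication to the provider.

```typescript
// Remembers each external action's result under its idempotency key, so a
// tool call repeated after a resume returns the first result instead of
// performing the action again. In-memory only.
class IdempotentExecutor {
  private readonly results = new Map<string, unknown>();

  run<T>(key: string, effect: () => T): T {
    if (this.results.has(key)) return this.results.get(key) as T;
    const result = effect(); // if this throws, nothing is recorded and a retry runs it again
    this.results.set(key, result);
    return result;
  }
}

// Hypothetical key scheme built from stable run data (no timestamps or random
// values), so the repeated call produces the same key as the original.
function idempotencyKey(sessionId: string, step: number, tool: string): string {
  return `${sessionId}:${step}:${tool}`;
}

// Usage: the resumed agent replays step 3, which sends an email.
let emailsSent = 0;
const sendReportEmail = (): string => {
  emailsSent++; // stand-in for a real email API call
  return `message-${emailsSent}`;
};
const executor = new IdempotentExecutor();
const key = idempotencyKey("sess-42", 3, "send_email");
const firstId = executor.run(key, sendReportEmail);
const replayId = executor.run(key, sendReportEmail);
console.log(emailsSent, firstId === replayId); // 1 true
```

Deriving the key from the session ID, step number, and tool name is what makes a replayed call collide with the original; a key built from the current time would defeat the whole mechanism.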

Separate "the agent finished" from "the client got the result." Persist the final output, and treat delivery as its own retryable step. An agent that completes while the user is offline should still deliver when they return.
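This is essentially an outbox: persist first, deliver separately. In the hypothetical `ResultOutbox` below, `complete` records the agent's output regardless of who is online, and `tryDeliver` can be called again whenever the client reattaches. The `send` callback stands in for whatever transport actually pushes the result; all names are illustrative.

```typescript
interface OutboxEntry {
  result: string;
  delivered: boolean;
  attempts: number;
}

// Hypothetical outbox: the agent's final output is persisted first, and
// delivery to the client is a separate step that can be retried.
class ResultOutbox {
  private readonly entries = new Map<string, OutboxEntry>();

  // "The agent finished": succeeds whether or not anyone is online.
  complete(sessionId: string, result: string): void {
    this.entries.set(sessionId, { result, delivered: false, attempts: 0 });
  }

  // "The client got the result": `send` returns false or throws when the
  // client is unreachable; the entry stays pending for the next attempt.
  tryDeliver(sessionId: string, send: (result: string) => boolean): boolean {
    const entry = this.entries.get(sessionId);
    if (!entry) return false;
    if (entry.delivered) return true; // already delivered: don't send twice
    entry.attempts++;
    try {
      entry.delivered = send(entry.result);
    } catch {
      entry.delivered = false;
    }
    return entry.delivered;
  }

  pendingCount(): number {
    let count = 0;
    this.entries.forEach((entry) => {
      if (!entry.delivered) count++;
    });
    return count;
  }
}

// Usage: the agent finishes while the user is offline.
const outbox = new ResultOutbox();
outbox.complete("sess-42", "Final report");
let userOnline = false;
const send = (_result: string): boolean => userOnline; // stand-in transport
const whileOffline = outbox.tryDeliver("sess-42", send);
userOnline = true; // the user reopens the tab
const afterReturn = outbox.tryDeliver("sess-42", send);
console.log(whileOffline, afterReturn, outbox.pendingCount()); // false true 0
```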

Done together, these patterns turn a dropped connection from a lost task into a resumable one — the difference between an agent demo and an agent users trust with a forty-minute job.