Maddy Quinn for Ably

Posted on Jun 15 • Edited on Jun 22 • Originally published at ably.com

Why AWS ALB and Cloudflare silently kill your AI agent sessions

#ai #websockets #webdev #javascript

TL;DR: WebSocket reconnection restores the transport. It doesn't restore the session. Tokens generated during the gap, tool call results that arrived while the client was offline, and the agent's position in the ongoing generation are all lost unless you have a session layer. This post covers the timeout sources that hit agentic applications specifically, why SSE is a bad fit for bidirectional agent communication, and how session recovery works in practice.

Why AI agents get disconnected in ways standard apps don't

WebSocket reconnection has always been worth solving. What makes AI agents different is when the disconnect happens.

A standard chat interface goes quiet between user interactions — when there's genuinely nothing happening on the wire. An agent goes quiet mid-execution: during tool call waits, between reasoning steps, while the LLM is generating a response. That silence is the agent doing its most intensive work. To every load balancer and proxy in the path, it looks idle.

AWS Application Load Balancer defaults to closing connections after 60 seconds of inactivity. Cloudflare enforces a 100-second idle timeout on Free and Pro plans — fixed, cannot be raised. Corporate proxies and enterprise gateways add their own thresholds that you often can't inspect or configure.

Plenty of production WebSocket applications have shipped without explicitly thinking about this. The reason: traditional server-side workloads tend to emit a trickle of traffic on their own — progress events, periodic state updates — which keeps the connection alive as a side effect. The timeouts stay invisible because something is always crossing the wire.

Agentic applications don't have that property. A customer support agent goes quiet mid-answer while the user is typing a correction. A coding agent waits for the user to approve a tool call before continuing. A research agent sits in silence for 90 seconds while a downstream API responds. None of that is idleness from the agent's perspective. To the ALB, it's all the same.

Why SSE doesn't fit

If you're reaching for SSE as an alternative, it won't solve the session problem — and it introduces a new one.

The applications this post is about — customer support agents, coding agents, research agents the user steers mid-task — require the client to send messages back to the agent on the same session while it's in flight. A user correcting an assumption, approving a tool call, or cancelling mid-implementation needs a channel in both directions.

SSE streams server-to-client only. That rules it out at the transport level regardless of how well you've solved the replay problem.

The fix for idle timeouts: server-side ping frames

The WebSocket spec includes a mechanism designed exactly for this: server-side ping frames. The server sends a ping at a fixed interval; the browser responds automatically with a pong; both frames count as activity and reset every idle timer on the path.

The interval needs to sit comfortably below the shortest timeout on the path. A 50-second interval leaves a 10-second margin under the AWS ALB 60-second default and sits well below Cloudflare's 100-second limit, so one setting covers both.
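As a rough sketch of that rule, the snippet below derives a ping interval from the timeouts on the path and runs a ping loop against any socket that can send ping frames. The names pickPingIntervalSeconds, PingableSocket, and startPingLoop are hypothetical, and the 10-second margin is an assumption. The socket shape assumes your server library exposes ping() and a numeric readyState, as the ws package does.

```typescript
// Hypothetical helper: pick a ping interval a fixed margin below the
// shortest idle timeout on the path. All values are in seconds.
function pickPingIntervalSeconds(idleTimeouts: number[], marginSeconds = 10): number {
  if (idleTimeouts.length === 0) {
    throw new Error("at least one idle timeout is required");
  }
  const shortest = Math.min(...idleTimeouts);
  const interval = shortest - marginSeconds;
  if (interval <= 0) {
    throw new Error(`margin ${marginSeconds}s leaves no room under ${shortest}s`);
  }
  return interval;
}

// Minimal shape of a server-side socket that can send ping frames.
// Assumption: the server library exposes ping() and a numeric readyState.
interface PingableSocket {
  readyState: number;
  ping(): void;
}

const SOCKET_OPEN = 1; // numeric value of the OPEN ready state

// Send a ping every intervalSeconds while the socket is open.
// Returns a function that stops the loop (call it when the socket closes).
function startPingLoop(socket: PingableSocket, intervalSeconds: number): () => void {
  const timer = setInterval(() => {
    if (socket.readyState === SOCKET_OPEN) {
      socket.ping();
    }
  }, intervalSeconds * 1000);
  return () => clearInterval(timer);
}

// ALB default (60s) and Cloudflare Free/Pro (100s).
const pingEverySeconds = pickPingIntervalSeconds([60, 100]); // 50
```

If a corporate proxy on the path turns out to have a shorter timeout, add it to the list and the interval drops with it.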

Common idle timeouts to plan around

Infrastructure	Default timeout	Configurable?
AWS Application Load Balancer	60 seconds	Yes — `idle_timeout.timeout_seconds`, up to 4,000s
Cloudflare (Free/Pro)	100 seconds	No — fixed. Enterprise customers can request custom values.
Corporate proxies, gateways	Varies — often invisible	Depends on deployment

For ALB, raise the limit if your workload genuinely needs a longer window. The idle_timeout.timeout_seconds attribute is adjustable in the load balancer configuration and takes effect immediately without a redeployment.
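For illustration, here is a small helper that validates a timeout and builds the attribute list in the Key/Value string shape that the ELBv2 ModifyLoadBalancerAttributes API accepts. The helper name idleTimeoutAttributes is hypothetical, and sending the request (via the AWS SDK, CLI, or your infrastructure-as-code tool) is deliberately left out.

```typescript
// One load balancer attribute; the API takes both fields as strings.
interface LbAttribute {
  Key: string;
  Value: string;
}

// Hypothetical helper: reject values outside the 1 to 4,000 second range
// before they reach the API, then build the attribute list.
function idleTimeoutAttributes(seconds: number): LbAttribute[] {
  if (!Number.isInteger(seconds) || seconds < 1 || seconds > 4000) {
    throw new RangeError("idle timeout must be an integer from 1 to 4000 seconds");
  }
  return [{ Key: "idle_timeout.timeout_seconds", Value: String(seconds) }];
}
```

Raising the ALB limit only helps on the ALB hop. Any other proxy on the path still applies its own timeout, so the ping interval remains the safer default.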

For Cloudflare Free and Pro plans, you can't raise the limit. The server-side ping approach at 50 seconds is the only viable mitigation.

If connections die at exactly 100 seconds in production, check EdgeStartTimestamp and EdgeEndTimestamp in Cloudflare's HTTP request logs to confirm the source before debugging elsewhere.

Other connection challenges to consider

Not all disconnects come from idle timeouts. Two other patterns hit agentic applications in production:

Corporate VPN and enterprise proxy traversal

Many enterprise networks don't forward the HTTP Upgrade header required to open a WebSocket connection. The connection never opens rather than dropping mid-session. The failure appears at the WebSocket handshake stage — typically a non-101 HTTP response — not as a silent close after inactivity.

The fix is protocol fallback: when a proxy blocks the WebSocket upgrade, the transport degrades automatically to HTTP streaming or long-polling without per-deployment configuration.
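A minimal sketch of that fallback order, assuming each transport is wrapped in a function that resolves once it is open and rejects if it cannot open. TransportKind, Connector, and connectWithFallback are hypothetical names, and the connectors themselves are placeholders for your real handshake code.

```typescript
type TransportKind = "websocket" | "http-streaming" | "long-polling";

// A connector tries to open one transport and rejects if it cannot,
// for example when the WebSocket handshake returns a non-101 response.
type Connector = () => Promise<void>;

// Try each transport in preference order and return the first that opens.
async function connectWithFallback(
  connectors: Array<[TransportKind, Connector]>
): Promise<TransportKind> {
  const failures: string[] = [];
  for (const [kind, connect] of connectors) {
    try {
      await connect();
      return kind;
    } catch (err) {
      // Record the failure and degrade to the next transport.
      failures.push(`${kind}: ${String(err)}`);
    }
  }
  throw new Error(`all transports failed (${failures.join("; ")})`);
}
```

The caller lists WebSocket first, so clients on clean networks never pay for the fallback. Only clients behind a blocking proxy degrade.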

Mobile network handoffs

Switching from Wi-Fi to cellular drops the underlying TCP connection immediately. On mobile, the client's onclose event often doesn't fire, because the OS terminates the connection without a clean close frame. On iOS specifically, background TCP connections are suspended within seconds of the app moving to the background, again without notification.

Don't rely on onclose to trigger reconnection for mobile users. Use failed-send detection and an application-level heartbeat timeout to catch silent closes.
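One way to sketch both signals is a small monitor that the client feeds on every inbound message and every failed send. LivenessMonitor and its method names are hypothetical, and the timeout value you pass in is an assumption to tune against your own heartbeat interval. Time is passed in explicitly so the logic is easy to test.

```typescript
// Tracks when the client last heard from the server and reports the
// connection as stale once that silence exceeds a timeout.
class LivenessMonitor {
  private readonly timeoutMs: number;
  private lastHeardMs: number;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastHeardMs = nowMs;
  }

  // Call on every inbound message, including application-level heartbeats.
  recordInbound(nowMs: number): void {
    this.lastHeardMs = nowMs;
  }

  // Call when a send throws or fails: treat the connection as dead at once.
  recordSendFailure(): void {
    this.lastHeardMs = Number.NEGATIVE_INFINITY;
  }

  // True when the caller should tear down the socket and reconnect.
  isStale(nowMs: number): boolean {
    return nowMs - this.lastHeardMs > this.timeoutMs;
  }
}
```

In an app, you would check isStale(Date.now()) on a timer and again when the app returns to the foreground, then reconnect without waiting for onclose.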

What transport reconnection recovers — and what it doesn't

Here's where most teams discover the gap. Reconnecting the WebSocket connection restores the transport. It doesn't restore the state of the session that was in flight when the connection dropped.

Transport reconnection recovers	It doesn't recover
The WebSocket connection itself	Tokens generated while disconnected
Active session subscriptions	Tool call results that arrived during the gap
The ability to send and receive new messages	The agent's reasoning trace if streamed as events
The session ID and session name	The position in the ongoing generation

After a successful reconnect with only transport-layer recovery, the client is back online, but the session is in an indeterminate state. The client holds a partial response from before the disconnect. The agent continued generating on the server side. Neither side knows where the other stopped.

How session recovery works

This is where Ably AI Transport comes in. AI Transport acts as the session and delivery layer between your agent and your users.

The agent publishes every event — each generated token, each tool call, each reasoning step — to a session. AI Transport stores those events and is responsible for delivering them to the client whenever the client is connected. From the agent's side, this is fire-and-forget: it doesn't care whether the client is online, offline, mid-reconnect, or freshly loaded into a new browser tab.

When a client connects or reconnects, it asks for everything it hasn't already seen. AI Transport returns the missed events, in order, before the live stream resumes. There's no "live vs. history" boundary the application needs to reason about, and no difference in handling for a 30-second drop vs. a 30-minute disconnect vs. a fresh page load.
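To make the idea concrete, here is a toy in-memory version of "ask for everything you haven't seen". This is not AI Transport's API or implementation. SessionLog, AgentEvent, and the sequence-number scheme are assumptions made purely for illustration.

```typescript
interface AgentEvent {
  seq: number; // position in the session, assigned on publish
  payload: string;
}

class SessionLog {
  private events: AgentEvent[] = [];
  private nextSeq = 1;

  // Agent side: append an event regardless of client connectivity.
  publish(payload: string): AgentEvent {
    const stored: AgentEvent = { seq: this.nextSeq++, payload };
    this.events.push(stored);
    return stored;
  }

  // Client side: everything after the last sequence number it saw, in order.
  // Pass 0 for a fresh page load.
  eventsAfter(lastSeenSeq: number): AgentEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}
```

A client that saw event 1 before dropping calls eventsAfter(1) on reconnect, and a fresh tab calls eventsAfter(0). Both take the same code path, which is the point of having no "live vs. history" boundary.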

One detail worth understanding: the session doesn't store one event per token. Tokens are appended to a single message per agent response — conflation — so the session history contains one accumulated message per response, not thousands of token-sized events. A client reconnecting mid-stream receives the in-progress message in its current accumulated form and resumes streaming from there. A client loading the page fresh receives the same accumulated message as a single coherent block. The application doesn't write reconciliation logic for either case.
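The accumulation behaviour can be pictured with a few lines of code. Again, this is an illustrative sketch and not how AI Transport is implemented. ResponseAccumulator and the responseId key are hypothetical.

```typescript
// One accumulated message per agent response, keyed by a response id.
class ResponseAccumulator {
  private messages = new Map<string, string>();

  // Append a token to the response's single message.
  append(responseId: string, token: string): void {
    const current = this.messages.get(responseId) ?? "";
    this.messages.set(responseId, current + token);
  }

  // What a reconnecting or freshly loaded client would read.
  snapshot(responseId: string): string | undefined {
    return this.messages.get(responseId);
  }

  // Number of stored messages: one per response, not one per token.
  size(): number {
    return this.messages.size;
  }
}
```

However many tokens the agent appends, the history grows by one entry per response, so a reconnecting client reads a snapshot instead of replaying and reassembling tokens.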

For more on how this works in practice, see AI Transport's reconnection and recovery and history and replay documentation.

What the user should see during a disconnect

Session recovery handles the infrastructure layer. But a reconnect that works silently in the background still needs the right UI treatment to avoid looking like a failure.

AI Transport exposes well-defined connection states. The key distinction: the disconnected state (temporarily offline, retrying automatically) vs. the suspended state (retry window exhausted).

During disconnection: show a reconnecting indicator, not an error modal. In the suspended state: show a retry button. The session is intact and waiting — communicate that.
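In code, that guidance reduces to a small mapping from connection state to UI treatment. The state names disconnected and suspended come from the description above. UiTreatment, uiForConnectionState, and the message copy are hypothetical, and every other state is assumed to need no special treatment.

```typescript
type UiTreatment =
  | { kind: "none" }
  | { kind: "reconnecting-indicator" }
  | { kind: "retry-button"; message: string };

// Map a connection state name to what the user should see.
function uiForConnectionState(state: string): UiTreatment {
  switch (state) {
    case "disconnected":
      // Temporarily offline and retrying automatically: stay low-key.
      return { kind: "reconnecting-indicator" };
    case "suspended":
      // Retry window exhausted: hand control to the user.
      return {
        kind: "retry-button",
        message: "Your session is saved. Retry to pick up where you left off.",
      };
    default:
      return { kind: "none" };
  }
}
```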

What building this yourself actually costs

Building session recovery without a purpose-built layer means writing:

A heartbeat loop
A reconnection manager
Manual state reconstruction logic
A connection state component to surface each phase to the user

None of these is large in isolation. Together, they constitute infrastructure. And any infrastructure your team owns is infrastructure your team spends time and resources maintaining as requirements change.

Ably AI Transport provides the session recovery layer:

Automatic connection recovery within a two-minute window after a drop
History compaction and replay so clients always receive clean, accumulated state on reconnect
Protocol fallback from WebSocket to HTTP streaming to long-polling
Bidirectional signaling on the same session

What remains in your application code is the connection state UI — surfacing the reconnecting and suspended states to the user — and that's a handful of lines rather than a system.

Frequently asked questions

How do I stop AI chat sessions from timing out?

Configure your WebSocket server to send ping frames at a fixed interval below the shortest timeout on your path. A 50-second interval sits comfortably below both the AWS ALB 60-second default and Cloudflare's fixed 100-second limit on Free and Pro plans, with browsers responding automatically — no client-side code required. If your workload needs a longer window, raise the idle_timeout.timeout_seconds attribute in your ALB configuration; it's adjustable up to 4,000 seconds.

What happens if a user disconnects during LLM streaming?

With AI Transport, the session resumes automatically upon reconnect, with missed tokens delivered in order before new ones arrive, and no application code needed. For longer disconnects, AI Transport's history and replay feature loads the full conversation from the session history. Without a session layer, tokens generated during the dropout are lost, and the agent can't resume from the point of interruption.

How do I avoid duplicate AI messages after a WebSocket reconnect?

With AI Transport you don't need to — the SDK handles this through history compaction. Tokens are streamed as appends to a single message per agent response, and the session history stores one message per response rather than one per token. When a client reconnects or refreshes, it receives the single accumulated message rather than individual tokens to reconstruct.

What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?

The AWS Application Load Balancer idle timeout defaults to 60 seconds and applies to all connection types, including WebSocket. Raise it by updating the idle_timeout.timeout_seconds load balancer attribute. The valid range is 1 to 4,000 seconds; most AI agent workloads are well served by a value between 3,600 and 4,000 seconds. The change takes effect immediately without requiring a redeployment.

Does Cloudflare close WebSocket connections? What is the timeout?

Yes. Cloudflare enforces a 100-second idle timeout on WebSocket connections for Free and Pro customers. The limit is fixed on those plans and can't be raised. Enterprise customers can configure a custom value through their account team. To keep connections alive on Free and Pro plans, configure your WebSocket server to send ping frames every 50 seconds. Browsers respond automatically with pong frames, which reset Cloudflare's idle timer and the 60-second AWS ALB default simultaneously.

Can WebSockets work behind a corporate VPN or enterprise proxy?

They can, but many enterprise proxies don't forward the HTTP Upgrade header required to open a WebSocket connection. When that happens, the connection fails at the handshake stage rather than dropping mid-session. That failure is distinct from a timeout: the error occurs before any data flows, not after a period of inactivity. Protocol fallback to HTTP streaming or long-polling handles proxy blocking at the infrastructure layer without per-deployment configuration.

How long does Ably retain channel history for session recovery?

Session history persists for 24 to 72 hours depending on your Ably plan, with extended retention available on higher tiers. After a short drop, AI Transport replays missed messages automatically on reconnect, with no application code needed. After a longer disconnect, the client loads the full conversation from session history, as long as it falls within that retention period.

What's your experience here — have you run into session state loss specifically, or mostly fought the transport reconnection side of the problem? Interested in what patterns teams are using to handle the UI side of a mid-stream disconnect.
