<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maddy Quinn</title>
    <description>The latest articles on DEV Community by Maddy Quinn (@maddysquinn).</description>
    <link>https://dev.to/maddysquinn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F973510%2F3cb97873-19ac-4184-9166-798b5d878eb5.jpeg</url>
      <title>DEV Community: Maddy Quinn</title>
      <link>https://dev.to/maddysquinn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maddysquinn"/>
    <language>en</language>
    <item>
      <title>Why your AI chat reconnects but your session doesn't</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Wed, 27 May 2026 10:05:10 +0000</pubDate>
      <link>https://dev.to/ably/why-your-ai-chat-reconnects-but-your-session-doesnt-36jg</link>
      <guid>https://dev.to/ably/why-your-ai-chat-reconnects-but-your-session-doesnt-36jg</guid>
      <description>&lt;p&gt;TL;DR: WebSockets are the right protocol for production AI chat. But the connection is stateless at the session level. When it drops — AWS ALB defaults to 60 seconds, Cloudflare to 100 seconds on Free and Pro plans — all in-flight tokens, tool call results, and agent context disappear. Reconnection logic restores the socket. It doesn't restore the session. That's the gap this post covers.&lt;/p&gt;

&lt;p&gt;Choosing WebSockets doesn't prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second and the agent keeps running server-side, but the client receives nothing from the gap: no tokens, no tool call results, no context.&lt;/p&gt;

&lt;p&gt;The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. Each breaks for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn't change that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn't.&lt;/li&gt;
&lt;li&gt;A WebSocket connection is stateless at the session level. When it closes through a proxy timeout, page reload, or device switch, all state disappears with it.&lt;/li&gt;
&lt;li&gt;Reconnection logic re-establishes the transport. It does not recover the tokens, tool calls, or agent context that were in flight when the connection dropped.&lt;/li&gt;
&lt;li&gt;What fills the gap is a session layer: infrastructure that persists conversation state against a session ID and replays it to reconnecting clients.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What WebSockets get right for AI chat&lt;/h2&gt;

&lt;p&gt;The protocol question is worth settling early, because the rest of this piece argues about the infrastructure layer above it. For production AI chat, the choice is WebSockets or SSE. Both stream tokens to the client, but only WebSockets let signals flow the other way.&lt;/p&gt;

&lt;p&gt;WebSockets are bidirectional. When your user cancels mid-stream, that signal travels back on the same channel; tool call confirmations and workflow approvals work the same way. When a workflow pauses for human input mid-execution, that input must arrive in-band, not via a polling endpoint.&lt;/p&gt;

&lt;p&gt;SSE is a one-way stream. For simple chatbots on stable networks, that doesn't matter. Add tool calls, mid-stream cancellation, or multi-device continuity, and it does.&lt;/p&gt;

&lt;h2&gt;Where production AI connections actually fail&lt;/h2&gt;

&lt;p&gt;Not all connection drops come from bad network conditions. The more common causes in production are infrastructure defaults designed for HTTP requests, not AI chat. A response can be mid-generation for tens of seconds, and most defaults weren't built for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Application Load Balancer idle timeout.&lt;/strong&gt; AWS ALB closes connections idle for 60 seconds by default, per the &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html" rel="noopener noreferrer"&gt;AWS Application Load Balancer documentation&lt;/a&gt;. For standard HTTP that's generous. For an agent waiting on a downstream API, 60 seconds of silence is routine, and the connection closes without warning. Your user's response stops mid-sentence with no explanation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare proxy timeout.&lt;/strong&gt; On Cloudflare Free and Pro plans, WebSocket connections terminate after 100 seconds of inactivity, as documented in &lt;a href="https://developers.cloudflare.com/workers/observability/dev-tools/troubleshoot-websockets/" rel="noopener noreferrer"&gt;Cloudflare's WebSocket troubleshooting guide&lt;/a&gt;. Enterprise plans can raise this limit; on Free and Pro plans, the ceiling is fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile network handoffs.&lt;/strong&gt; Switching from WiFi to cellular drops the underlying TCP connection immediately, taking the WebSocket with it. On mobile this happens during normal use: walking between coverage areas, backgrounding the tab, entering a building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page reload and tab crash.&lt;/strong&gt; Your user reloads mid-generation, or the browser crashes, both of which are routine. The connection closes, and any session state tied to it is gone unless something stored it.&lt;/p&gt;

&lt;h2&gt;Why reconnection logic doesn't fix the session problem&lt;/h2&gt;

&lt;p&gt;The standard reconnection pattern re-establishes the socket. Transport recovers in milliseconds. But it cannot restore the state that was in flight when the connection dropped.&lt;/p&gt;
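
&lt;p&gt;As a rough illustration of that standard pattern, the sketch below computes a capped exponential backoff schedule. The function name and the base, factor, and cap values are assumptions chosen for the example, not values from any particular library. Note what it covers: when to retry the socket, and nothing about what was missed in between.&lt;/p&gt;

```python
def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> list[float]:
    """Delay in seconds before each reconnect attempt, never longer than `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(attempts)]


# Hypothetical settings: start at 0.5 s, double each time, wait at most 8 s.
delays = backoff_delays(base=0.5, factor=2.0, cap=8.0, attempts=6)
print(delays)  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

&lt;p&gt;Real clients usually add random jitter to each delay so that many clients dropped by the same proxy don't all reconnect at the same instant.&lt;/p&gt;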

&lt;p&gt;&lt;strong&gt;Token stream position.&lt;/strong&gt; The response kept generating while the connection was dark. Those tokens went nowhere. When the client reconnects, it arrives mid-sentence or finds nothing at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call results.&lt;/strong&gt; Some chat responses depend on realtime data: a lookup, a search, or an action your user triggered. If the connection dropped while the agent was waiting for that result, the response either never arrived or ended before it could use the information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent context.&lt;/strong&gt; In a multi-turn exchange, the agent accumulates context: what was asked, what was answered, and what's in progress. When a session drops and reconnects without state recovery, the agent and the client are at different points in the same conversation. Your users experience this as a loss of thread: a response that ignores what came before, or one that repeats something already answered.&lt;/p&gt;

&lt;p&gt;The pattern most teams reach for is a Redis buffer: sequence number tracking, offset storage, and deduplication keys between the agent and the client. It handles full page reloads. It tends to break on deploy-triggered reconnects, mobile handoffs that hit the reconnect window twice, and anything that generates messages faster than the buffer drains.&lt;/p&gt;
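
&lt;p&gt;To make the last failure mode concrete, here is a minimal in-memory sketch of a capped replay buffer. It stands in for a size-limited Redis list; the class and method names are hypothetical, and no Redis client is involved. When the agent produces messages faster than the buffer can retain them, the oldest entries are evicted and a reconnecting client can only detect the hole, not fill it.&lt;/p&gt;

```python
from collections import deque


class BoundedReplayBuffer:
    """In-memory stand-in for a capped Redis list (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        self._messages = deque(maxlen=capacity)  # oldest entries are evicted
        self._next_seq = 1

    def append(self, payload: str) -> int:
        """Store a payload and return the sequence number assigned to it."""
        seq = self._next_seq
        self._messages.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seen: int) -> tuple[list[tuple[int, str]], bool]:
        """Return (messages newer than last_seen, whether some were evicted)."""
        missed = [(seq, p) for seq, p in self._messages if seq > last_seen]
        # A gap exists when the first replayable message is not last_seen + 1.
        has_gap = bool(missed) and missed[0][0] != last_seen + 1
        return missed, has_gap


buffer = BoundedReplayBuffer(capacity=3)
for token in ["The", "quick", "brown", "fox", "jumps"]:
    buffer.append(token)

# The client saw sequence 1, then dropped while four more tokens were produced.
missed, has_gap = buffer.replay_after(last_seen=1)
print(missed)   # [(3, 'brown'), (4, 'fox'), (5, 'jumps')]
print(has_gap)  # True: sequence 2 was evicted before the client came back
```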

&lt;p&gt;Even Vercel's AI SDK lead built a pluggable interface to fill this gap. Every team reaching this point builds the same infrastructure from scratch and chooses to own it indefinitely. Reconnection handles the protocol layer; session state sits one layer above it, and it's a separate problem entirely.&lt;/p&gt;

&lt;h2&gt;What production AI chat needs from the transport layer&lt;/h2&gt;

&lt;p&gt;Any viable approach to production AI sessions needs to satisfy four requirements. These are implementation-neutral: what any infrastructure option has to provide, regardless of vendor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent state storage.&lt;/strong&gt; Conversation history, token positions, tool call inputs and outputs, and agent state must be stored against a stable session ID and survive connection drops. The session ID is the anchor: the same session must be addressable after any reconnect, from any device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offset-based replay.&lt;/strong&gt; A returning client requests messages from its last received serial. The infrastructure delivers everything missed, in order, with no duplicates. The client supplies its offset; the infrastructure fills the gap.&lt;/p&gt;
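
&lt;p&gt;The first two requirements can be sketched together: an append-only log per session ID, with replay from a client-supplied serial. This is an illustrative in-memory model, not any vendor's implementation; &lt;code&gt;SessionStore&lt;/code&gt;, &lt;code&gt;SessionEvent&lt;/code&gt;, and the event kinds are names invented for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class SessionEvent:
    serial: int
    kind: str   # e.g. "token", "tool_call", "tool_result" (hypothetical kinds)
    data: str


@dataclass
class SessionStore:
    """Hypothetical append-only store: one ordered event log per session ID."""
    _logs: dict[str, list[SessionEvent]] = field(default_factory=dict)

    def publish(self, session_id: str, kind: str, data: str) -> SessionEvent:
        log = self._logs.setdefault(session_id, [])
        event = SessionEvent(serial=len(log) + 1, kind=kind, data=data)
        log.append(event)
        return event

    def replay(self, session_id: str, after_serial: int = 0) -> list[SessionEvent]:
        """Everything published after the given serial, in order."""
        log = self._logs.get(session_id, [])
        return log[after_serial:]  # serials start at 1, so index == serial - 1


store = SessionStore()
store.publish("session-42", "token", "Looking")
store.publish("session-42", "token", " that up")
store.publish("session-42", "tool_result", "order shipped")       # client offline
store.publish("session-42", "token", "Your order has shipped.")   # client offline

# The client reconnects and reports the last serial it received.
missed = store.replay("session-42", after_serial=2)
print([(e.serial, e.kind) for e in missed])  # [(3, 'tool_result'), (4, 'token')]
```

&lt;p&gt;Because the log is keyed by session ID and not by connection, a client that has seen nothing yet can call &lt;code&gt;replay&lt;/code&gt; with the default serial of 0 and receive the full history.&lt;/p&gt;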

&lt;p&gt;&lt;strong&gt;Protocol fallback.&lt;/strong&gt; When a WebSocket upgrade is blocked by a proxy or firewall, the transport degrades to HTTP streaming or long-polling automatically. This should not require per-deployment configuration.&lt;/p&gt;
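
&lt;p&gt;A minimal sketch of that degradation order, assuming each transport is a callable that either returns a connection label or raises &lt;code&gt;ConnectionError&lt;/code&gt;. The three stub transports are placeholders for the example; a real client would open actual connections.&lt;/p&gt;

```python
from typing import Callable

Transport = Callable[[], str]  # hypothetical: returns a label or raises


def connect_with_fallback(transports: list[tuple[str, Transport]]) -> str:
    """Try each transport in preference order; return the first that connects."""
    errors = []
    for name, connect in transports:
        try:
            return connect()
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all transports failed: " + "; ".join(errors))


def blocked_websocket() -> str:
    # Simulates a proxy or firewall rejecting the WebSocket upgrade.
    raise ConnectionError("upgrade blocked by proxy")


def http_streaming() -> str:
    return "connected via http-streaming"


def long_polling() -> str:
    return "connected via long-polling"


result = connect_with_fallback([
    ("websocket", blocked_websocket),
    ("http-streaming", http_streaming),
    ("long-polling", long_polling),
])
print(result)  # connected via http-streaming
```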

&lt;p&gt;&lt;strong&gt;Multi-device delivery.&lt;/strong&gt; Any authenticated device subscribing to a session ID receives the current state plus history. The session is not bound to the tab, browser, or device that opened it.&lt;/p&gt;

&lt;h2&gt;How Ably AI Transport solves the session layer problem&lt;/h2&gt;

&lt;p&gt;Thankfully, you don't need to build the infrastructure. Ably AI Transport is the durable session layer — the thing that makes the user experience survive what the WebSocket protocol cannot. The session lives in Ably; your application talks to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjicewkeblgjqpotxqy85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjicewkeblgjqpotxqy85.png" alt="Channel-as-session diagram: agent publishes tokens and events, clients subscribe from any device and catch up on reconnect" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The five failures raised in this post each map directly to a capability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection drops from proxy timeouts, mobile handoffs, and page reloads.&lt;/strong&gt; The transport degrades automatically — WebSocket first, then HTTP streaming, then long-polling — so the session survives the infrastructure defaults that break standard WebSocket connections. No per-deployment configuration required. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Reconnection and recovery&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tokens generated while the client was disconnected.&lt;/strong&gt; The token stream is stored against the session. On reconnect, the client receives everything it missed in order, with no duplicates. The developer doesn't track offsets or implement catch-up logic. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Token streaming&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call results and agent context lost mid-task.&lt;/strong&gt; Agent state, tool call inputs and outputs, and conversation history are all published to the session as they generate. A reconnecting client recovers the full context, not just the tokens. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Reconnection and recovery&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mid-stream steering and human-in-the-loop signals.&lt;/strong&gt; Cancellations, approvals, and human input travel back to the agent on the same session channel. The bidirectional requirement that rules out SSE is covered without a separate signaling mechanism. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Human in the loop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sessions tied to a single tab or device.&lt;/strong&gt; Any authenticated device subscribing to the session ID receives current state plus history. A conversation started on desktop continues on mobile without restart. &lt;br&gt;
→ &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Multi-device sessions&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Get started: &lt;a href="https://ably.com/docs/ai-transport/vercel-ai-sdk" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt; · &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;Core SDK&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Frequently asked questions&lt;/h2&gt;

&lt;h3&gt;When is SSE still the right choice for AI chat?&lt;/h3&gt;

&lt;p&gt;SSE is a reasonable starting point for chatbots that follow a simple request-response pattern: a user submits a message, the server streams tokens, no interruption required. It deploys more easily than WebSockets, carries no persistent connection overhead, and works well on stable networks.&lt;/p&gt;

&lt;p&gt;The constraints appear when your application starts adding agentic behaviour: tool calls, mid-stream cancellation, multi-device continuity, and background tasks that complete while the user is offline. At that point, SSE's unidirectional architecture stops being a trade-off and becomes a blocker.&lt;/p&gt;

&lt;h3&gt;What timeout values should I configure to prevent AI connection drops in production?&lt;/h3&gt;

&lt;p&gt;Set your AWS ALB idle timeout to at least 3,600 seconds for WebSocket connections. The 60-second default was designed for HTTP requests, not long-running agent tasks. On Cloudflare Free and Pro plans, the WebSocket timeout is fixed at 100 seconds. Send heartbeat pings at around 25-second intervals to stay well below that threshold.&lt;/p&gt;

&lt;p&gt;For Nginx, the equivalent setting is &lt;code&gt;proxy_read_timeout&lt;/code&gt;. These three changes cover most production timeout failures for AI chat deployments.&lt;/p&gt;
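
&lt;p&gt;A quick arithmetic check of the heartbeat advice, using the two default timeouts quoted in this post. The helper name is made up for the example. It counts how many pings a silent connection sends before the idle timer would fire.&lt;/p&gt;

```python
# Idle timeouts quoted in this post, in seconds.
IDLE_TIMEOUTS = {"aws_alb_default": 60, "cloudflare_free_pro": 100}
HEARTBEAT_INTERVAL = 25  # seconds between pings


def pings_before_timeout(timeout: int, interval: int) -> int:
    """Pings sent strictly before an otherwise silent connection times out."""
    return (timeout - 1) // interval


margins = {
    name: pings_before_timeout(timeout, HEARTBEAT_INTERVAL)
    for name, timeout in IDLE_TIMEOUTS.items()
}
print(margins)  # {'aws_alb_default': 2, 'cloudflare_free_pro': 3}
```

&lt;p&gt;A count of two or more means one lost ping still leaves the connection alive, because the next ping arrives before the idle timer expires.&lt;/p&gt;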

&lt;h3&gt;Does reconnection logic solve the session recovery problem?&lt;/h3&gt;

&lt;p&gt;Reconnection logic solves the transport problem. It doesn't solve the state problem. Exponential backoff and heartbeats re-establish the socket.&lt;/p&gt;

&lt;p&gt;But they can't recover tokens generated during the gap, tool call results that arrived while the client was disconnected, or context accumulated across multiple steps. Preventing duplicate messages on reconnect requires sequence numbers or idempotency keys at the session layer, not the WebSocket layer. A client that reconnects without a session layer arrives at an empty context and either loses the conversation or restarts it.&lt;/p&gt;
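
&lt;p&gt;On the client side, sequence-number handling might look like the sketch below. It assumes the session layer stamps every message with a consecutive integer serial; the &lt;code&gt;SessionClient&lt;/code&gt; class is hypothetical. Each serial is applied at most once, duplicates are ignored, and a jump in serials signals that a replay is needed.&lt;/p&gt;

```python
class SessionClient:
    """Hypothetical receiver that applies each serial at most once."""

    def __init__(self) -> None:
        self.last_serial = 0
        self.tokens = []

    def receive(self, serial: int, token: str) -> str:
        if serial == self.last_serial + 1:
            self.tokens.append(token)
            self.last_serial = serial
            return "applied"
        if serial > self.last_serial + 1:
            return "gap"        # something was missed: request a replay
        return "duplicate"      # already applied: safe to ignore


client = SessionClient()
outcomes = [
    client.receive(1, "Hello"),
    client.receive(2, ","),
    client.receive(2, ","),        # resent after a reconnect
    client.receive(4, "!"),        # serial 3 has not arrived yet
    client.receive(3, " world"),   # delivered by the replay
    client.receive(4, "!"),
]
print(outcomes)
# ['applied', 'applied', 'duplicate', 'gap', 'applied', 'applied']
print("".join(client.tokens))  # Hello, world!
```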

&lt;h3&gt;How does Ably replay missed messages after a WebSocket reconnect?&lt;/h3&gt;

&lt;p&gt;Ably assigns every published message a serial number. When a client reconnects, the transport layer uses the internal &lt;code&gt;untilAttach&lt;/code&gt; mechanism to fetch messages published during the gap. This bounds the history query to the exact reconnection point.&lt;/p&gt;

&lt;p&gt;Ably delivers everything missed in order, with no overlap between historical and live messages. The client doesn't track its own offset or implement catch-up logic. Every plan includes two minutes of ephemeral history by default. Persisted channels extend this to 72 hours on Standard plans, or up to 365 days on Pro and Enterprise plans.&lt;/p&gt;
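
&lt;p&gt;The idea of bounding the history query at the attach point can be modelled in a few lines. This is a conceptual sketch only: it makes no Ably SDK calls, it is not Ably's implementation, and the integer serials and function names are simplifying assumptions. History covers everything after the client's last serial up to the attach point, and the live subscription covers everything after it, so the two never overlap.&lt;/p&gt;

```python
Message = tuple[int, str]  # (serial, payload); integer serials are a simplification


def history_until_attach(stored: list[Message], last_seen: int, attach_serial: int) -> list[Message]:
    """Stored messages after last_seen, up to and including the attach point."""
    return [(s, p) for s, p in stored if s > last_seen and attach_serial >= s]


def live_after_attach(stored: list[Message], attach_serial: int) -> list[Message]:
    """Messages the live subscription delivers once the client is attached."""
    return [(s, p) for s, p in stored if s > attach_serial]


stored = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]

# The client last saw serial 2 and reattached when serial 5 was the newest message.
history = history_until_attach(stored, last_seen=2, attach_serial=5)
live = live_after_attach(stored, attach_serial=5)

print([s for s, _ in history])         # [3, 4, 5]
print([s for s, _ in live])            # [6]
print([s for s, _ in history + live])  # [3, 4, 5, 6]
```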




&lt;p&gt;Have you hit this in production? Curious what the failure looked like - was it the proxy timeout, a page reload, or something else that first surfaced it?&lt;/p&gt;

</description>
      <category>websockets</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How NASCAR delivers realtime racing data to millions of fans around the world</title>
      <dc:creator>Maddy Quinn</dc:creator>
      <pubDate>Wed, 17 Jan 2024 09:27:12 +0000</pubDate>
      <link>https://dev.to/ably/how-nascar-delivers-realtime-racing-data-to-millions-of-fans-around-the-world-1a73</link>
      <guid>https://dev.to/ably/how-nascar-delivers-realtime-racing-data-to-millions-of-fans-around-the-world-1a73</guid>
      <description>&lt;p&gt;Playing around with streaming realtime data is one thing, but have you ever wondered how you would handle the challenge of streaming realtime data to millions of racing fans?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nascar.com/drive" rel="noopener noreferrer"&gt;NASCAR Drive&lt;/a&gt; has built an industry-leading platform that handles the distribution of 1.3TB of telemetry data in a single race, while over 80 million fans immerse themselves in the race from an in-cockpit view that offers a live 360 camera feed and access to the same car telematics as the driver and team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wondering how they do it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We spoke to &lt;a href="https://www.linkedin.com/in/chad-larter/" rel="noopener noreferrer"&gt;Chad Larter&lt;/a&gt;, Senior Director of Technical Operations for NASCAR, in our webinar on January 31st - and you can now &lt;a href="https://hubs.la/Q02jCxH_0" rel="noopener noreferrer"&gt;watch it on demand&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Don't miss it if you’re interested in learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How to stream 1.3TB of data per race to over 80 million fans, complete with highly detailed stats that update in realtime&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why NASCAR decided to bring this solution in-house – and how they built the technology to achieve it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How NASCAR solved the data surge and streaming challenges&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/zzOY9NdTyI0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;Key takeaways&lt;/h2&gt;

&lt;p&gt;If you've watched the video, you know there was a lot to take in, so here are some of the key points covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Tens of thousands of users connect during major races like the Daytona 500, with major traffic spikes following in-race events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data processing:&lt;/strong&gt; Over 100 data points are collected, filtered and downsampled to 2 updates/second for realtime fan consumption, across devices - 1.3 TB per race.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform efficiencies:&lt;/strong&gt; Only changes in data are broadcast to clients, using binary deltas, which reduces bandwidth consumption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long polling vs WebSockets:&lt;/strong&gt; Compared with their previous long-polling solution, the WebSockets platform proved much quicker and put far less stress on networks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shared insights:&lt;/strong&gt; Fans gain access to the same detailed data used by teams and OEMs, providing a deeper understanding of the race.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fan engagement:&lt;/strong&gt; Consumers spend significant time (30 minutes to 3 hours) consuming race data, highlighting the success of delivering an enhanced engagement experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Customization:&lt;/strong&gt; Fans want to consume data their way, gaining insights on their favourite drivers/cars rather than being limited by broadcasters focusing on the leading cars.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More to come:&lt;/strong&gt; NASCAR are exploring the use of realtime data for leaderboards, chat and additional content - moving away from polling methods.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
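
&lt;p&gt;The "only changes are broadcast" point can be illustrated with a simplified field-level delta. The platform described above uses binary deltas; this sketch swaps in a dictionary diff to show the same principle, and the telemetry field names are invented for the example.&lt;/p&gt;

```python
_MISSING = object()  # sentinel so that a new field always counts as changed


def field_delta(previous: dict, current: dict) -> dict:
    """Fields whose values changed, or are new, since the previous snapshot."""
    return {k: v for k, v in current.items() if previous.get(k, _MISSING) != v}


def apply_delta(previous: dict, delta: dict) -> dict:
    """Rebuild the current snapshot on the client from the last one plus a delta."""
    return {**previous, **delta}


# Hypothetical telemetry snapshots, half a second apart.
before = {"speed_mph": 182, "rpm": 8900, "gear": 4}
after = {"speed_mph": 185, "rpm": 8900, "gear": 4}

delta = field_delta(before, after)
print(delta)                               # {'speed_mph': 185}
print(apply_delta(before, delta) == after)  # True
```

&lt;p&gt;This simplified version does not handle removed fields; a real delta format would need a way to express deletions.&lt;/p&gt;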

&lt;p&gt;If you have any questions about how NASCAR uses Ably Pub/Sub, the applications it can power, or how it could work for your use case, please visit our &lt;a href="https://ably.com/fan-engagement" rel="noopener noreferrer"&gt;fan engagement page&lt;/a&gt; or &lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;sign up&lt;/a&gt; to get started for free.&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>webdev</category>
      <category>interview</category>
    </item>
  </channel>
</rss>
