<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Priyank Agrawal</title>
    <description>The latest articles on DEV Community by Priyank Agrawal (@priyank_agrawal).</description>
    <link>https://dev.to/priyank_agrawal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3085710%2F95af65e5-45d7-490c-b03e-f6656061ca3f.jpg</url>
      <title>DEV Community: Priyank Agrawal</title>
      <link>https://dev.to/priyank_agrawal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/priyank_agrawal"/>
    <language>en</language>
    <item>
      <title>Your Multi-Agent SSE Stream Works in Dev. Here's What Kills It in Production.</title>
      <dc:creator>Priyank Agrawal</dc:creator>
      <pubDate>Thu, 09 Apr 2026 19:35:44 +0000</pubDate>
      <link>https://dev.to/priyank_agrawal/your-multi-agent-sse-stream-works-in-dev-heres-what-kills-it-in-production-458i</link>
      <guid>https://dev.to/priyank_agrawal/your-multi-agent-sse-stream-works-in-dev-heres-what-kills-it-in-production-458i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Methodology note:&lt;/strong&gt; Intent classification numbers (25% → 8% misclassification) were measured over ~1,400 messages across two weeks using a 12-intent test set with human-labelled ground truth. Abandonment rates (68% → 31%) are session-level. Small dataset — treat as directional signals, not benchmarks.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The 90-second version
&lt;/h2&gt;

&lt;p&gt;Before the deep dives — here are the six mistakes and fixes. If you only read this far, you still leave with something useful.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Mistake&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;Flat text streams break the moment structured data appears&lt;/td&gt;
&lt;td&gt;Use typed SSE events. Design the schema before writing a single agent.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;Loose &lt;code&gt;dict&lt;/code&gt; state in LangGraph crashes two agents downstream, mid-stream&lt;/td&gt;
&lt;td&gt;Pydantic with &lt;code&gt;extra='forbid'&lt;/code&gt; at every agent boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;run_in_executor&lt;/code&gt; introduces race conditions with shared mutable state&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;ainvoke&lt;/code&gt; instead. If you can't, use per-request graph instances.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;Catching only two exception types leaves zombie connections open&lt;/td&gt;
&lt;td&gt;Bare &lt;code&gt;except Exception&lt;/code&gt; + &lt;code&gt;finally&lt;/code&gt; block. Not lazy — required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;Free-text conversation summaries hallucinate user constraints&lt;/td&gt;
&lt;td&gt;Structured Pydantic output for summarisation — explicit fields, not prose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06&lt;/td&gt;
&lt;td&gt;Mobile browser kills &lt;code&gt;EventSource&lt;/code&gt; silently on app switch&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ReconnectingEventSource&lt;/code&gt; with Last-Event-ID resume, not just retry&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  1. Typed SSE events — design the contract before the code
&lt;/h2&gt;

&lt;p&gt;Streaming a text response is trivial. Streaming structured data, UI signals, and text simultaneously over one connection is not.&lt;/p&gt;

&lt;p&gt;When a multi-agent system runs, you have free-text chunks, typed job objects, quick-action buttons, and agent metadata all in flight at once. Streaming everything as flat text fails the moment structured data appears — you can't reliably parse a JSON object out of a partial token stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mistake:&lt;/strong&gt; Designing the SSE event schema after the agents are built. Every agent output has to be rewired to match the new contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Define the typed event schema first. It becomes the contract every agent writes to and every frontend listener reads from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/app/api/v1/chat.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orchestrator_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;orchestrator_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="c1"&gt;# streaming text token
&lt;/span&gt;            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event: chunk&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# structured JSON — not text
&lt;/span&gt;            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event: jobs_data&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jobs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jobs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event: complete&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;conversation_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;conversation_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// frontend — subscribe by type, not raw text&lt;/span&gt;
&lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chunk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;appendTextChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;jobs_data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setJobResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;What this code is missing:&lt;/strong&gt; No backpressure — a slow client + fast generator = unbounded memory buffering. No SSE heartbeat — load balancers silently kill idle connections after 30–60s. Add &lt;code&gt;": heartbeat\n\n"&lt;/code&gt; every 15 seconds alongside the main stream.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Production addition — heartbeat so proxies don't kill idle connections
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_with_heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;last_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_event&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: heartbeat&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;last_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Pydantic state in LangGraph — errors at write time, not read time
&lt;/h2&gt;

&lt;p&gt;The Ranking Agent reads &lt;code&gt;jobs_found&lt;/code&gt;. But there's no contract saying what fields it contains. If the Job Search Agent wrote malformed data, the crash happens at the Ranking Agent — mid-stream, with a live user waiting, and no easy way to trace it back to the source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mistake:&lt;/strong&gt; Loose &lt;code&gt;dict&lt;/code&gt; and &lt;code&gt;list&lt;/code&gt; types everywhere in your LangGraph state. Crashes surface two agents downstream with no clear origin.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pydantic models at every agent boundary. Add &lt;code&gt;extra='forbid'&lt;/code&gt; — unexpected fields become hard errors at the producing agent, not silent pass-throughs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# state.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConfigDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JobResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;forbid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# unexpected fields = hard error
&lt;/span&gt;    &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;match_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;matched_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;forbid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IntentData&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;jobs_found&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;JobResult&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;is_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; If data crosses an agent boundary, it gets a type. No exceptions.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. &lt;code&gt;run_in_executor&lt;/code&gt; has a hidden race condition
&lt;/h2&gt;

&lt;p&gt;FastAPI is async. LangGraph's LLM calls are, in some configurations, synchronous. Calling synchronous code inside an async endpoint blocks the entire event loop for the duration of that LLM call. The symptom: SSE connections open, the &lt;code&gt;thinking&lt;/code&gt; event fires, then nothing for 10–15 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before the fix:&lt;/strong&gt; P95 latency was 3× P50.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;After:&lt;/strong&gt; 1.4× P50.&lt;/p&gt;

&lt;p&gt;But here's what most guides skip:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Critical caveat:&lt;/strong&gt; &lt;code&gt;run_in_executor&lt;/code&gt; moves code to a thread pool. If your LangGraph setup has any shared mutable state — a shared cache, a shared agent instance, anything not thread-safe — you've introduced race conditions that are invisible at low traffic and catastrophic at scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Prefer this:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use the async method if your setup supports it
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;langgraph_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Safe fallback when &lt;code&gt;ainvoke&lt;/code&gt; is unavailable:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;partial&lt;/span&gt;

&lt;span class="n"&gt;_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Build a FRESH graph per request — no shared state across threads
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_agent_graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_in_executor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The one-liner fix is only safe if you're certain there is no shared mutable state anywhere in the graph.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. LLM down mid-stream — catch everything, not just two exceptions
&lt;/h2&gt;

&lt;p&gt;The SSE connection is open. Intent classified. Then the LLM returns a 503. Without the right handling, the connection hangs indefinitely. After a few of these, zombie connections accumulate — open SSE connections that will never close.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The original mistake:&lt;/strong&gt; Catching only &lt;code&gt;httpx.HTTPStatusError&lt;/code&gt; and &lt;code&gt;groq.APIStatusError&lt;/code&gt;. A Pydantic validation error, a plain Python bug, any other library's timeout — all bubble up uncaught, leaving the connection open forever.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_agent_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;agent_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
    &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTPStatusError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;APIStatusError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Response timed out.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Not lazy — this is the safety net for everything unexpected
&lt;/span&gt;        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unexpected agent error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Something went wrong.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Runs regardless — frontend can always close the connection
&lt;/span&gt;        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conversation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;finally&lt;/code&gt; block is not optional. It guarantees the frontend receives a &lt;code&gt;complete&lt;/code&gt; event even when everything above it fails, so it can close the connection cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Free-text conversation summaries hallucinate — use structured output
&lt;/h2&gt;

&lt;p&gt;At turn 20, injecting full conversation history into every call costs too much. The fix is to summarise old messages and keep recent ones verbatim.&lt;/p&gt;

&lt;p&gt;But free-text summarisation introduces a subtle failure: the summary model says &lt;strong&gt;"user prefers small companies"&lt;/strong&gt;. The user said &lt;strong&gt;"no startups under 50 people"&lt;/strong&gt;. These are opposite. The downstream agent makes recommendations based on hallucinated preferences with no way to debug it from the recent context alone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong — free text is hard to audit and easy to distort
&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User is looking for Python roles at mid-sized companies...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Right — structured output forces explicit, auditable extraction
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConversationSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;job_preferences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="c1"&gt;# ["Python", "remote", "Series B+"]
&lt;/span&gt;    &lt;span class="n"&gt;hard_constraints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="c1"&gt;# ["no startups under 50 people"]
&lt;/span&gt;    &lt;span class="n"&gt;mentioned_companies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# ["Stripe", "Notion"]
&lt;/span&gt;    &lt;span class="n"&gt;current_goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_to_struct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;older_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConversationSummary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;summarize_to_struct&lt;/code&gt; is a separate LLM call using a fast, cheap model (haiku or gpt-3.5-turbo). The result is cached in your session store — don't re-summarise every turn. Structured output makes it auditable: you can log and diff summaries across turns to catch distortions before they affect downstream agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Mobile SSE dies silently — you need resume, not just retry
&lt;/h2&gt;

&lt;p&gt;On desktop, SSE is reliable. On mobile, background/foreground app switching kills &lt;code&gt;EventSource&lt;/code&gt; silently — no error event, no warning. The user returns to a chat with no response.&lt;/p&gt;

&lt;p&gt;A naive retry loop reconnects from scratch and replays events the user already saw. What you actually need is resume from the last received event.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReconnectingEventSource&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retryDelay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxDelay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxAttempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;closed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handlers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Pass Last-Event-ID so server resumes from the right point&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt;
            &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;lastEventId=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
            &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;closed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;closed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scheduleReconnect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastEventId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// track position&lt;/span&gt;
                &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;scheduleReconnect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;give_up&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]?.();&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retryDelay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxDelay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;es&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Server side:&lt;/strong&gt; Emit an &lt;code&gt;id:&lt;/code&gt; field with every SSE event. On reconnect with &lt;code&gt;lastEventId&lt;/code&gt;, replay only the missed events from a short-lived Redis buffer (60s TTL). If resume complexity isn't worth it, the honest fallback is a "connection dropped — tap to retry" button with the last message pre-filled.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Intent classification — the hard cases and fixes
&lt;/h2&gt;

&lt;p&gt;"Find me Python jobs" → job search. "Write a cold email" → email generator. The hard cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"I applied to Google last week, any updates?"&lt;/em&gt; — &lt;code&gt;followup&lt;/code&gt; or &lt;code&gt;general&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Help me with the Stripe interview"&lt;/em&gt; — &lt;code&gt;interview_prep&lt;/code&gt; or &lt;code&gt;general&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Yo"&lt;/em&gt; — ???&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initial misclassification rate on ambiguous inputs: ~25%. Three changes reduced it to under 8%:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Explicit "when to use" descriptions in the classifier prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;INTENT_DESCRIPTIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asking about an application they ALREADY submitted. Keywords: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;I applied&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;heard back&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User wants to DISCOVER new jobs not yet applied to. Keywords: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;find&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;show me&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;any jobs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill_gap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User wants to know what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s MISSING. Keywords: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;missing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;improve&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gap&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Confidence scores with fallback:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IntentClassification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;   &lt;span class="c1"&gt;# 0.0 to 1.0
&lt;/span&gt;    &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;      &lt;span class="c1"&gt;# forces the model to show its work
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;route_to_general&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Previous intent as context:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context_hint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Previous intents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_intents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# If last intent was "followup" and user replies "any updates?" —
# prior context makes correct classification almost automatic
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  8. Low confidence + open question = 68% abandonment
&lt;/h2&gt;

&lt;p&gt;When intent confidence falls below 0.70 and the system asks "what are you looking for?" — 68% of sessions end there. Replacing the open question with three specific quick-reply buttons dropped abandonment to 31%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I can help with a few things — pick one:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find me jobs matching my profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a cold email for a role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What skills should I add to my resume?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Personalise from user_profile in production
&lt;/span&gt;        &lt;span class="c1"&gt;# Don't show "add skills" to a user whose resume is already uploaded
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;On the numbers:&lt;/strong&gt; ~1,400 sessions, two weeks, single product. Directional, not benchmarks. What generalises: open question → blank form feeling → abandonment. Specific options → momentum → engagement.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  9. Build these on day one — retrofitting is painful
&lt;/h2&gt;

&lt;p&gt;Six decisions that are cheap upfront and expensive to add later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSE event schema first.&lt;/strong&gt; Changing the contract mid-build means rewiring every agent output and every frontend listener simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic + &lt;code&gt;extra='forbid'&lt;/code&gt; everywhere.&lt;/strong&gt; Retrofitting types into a working dict-based state means rewriting agents that already have subtle state-sharing bugs you haven't found yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bare &lt;code&gt;except&lt;/code&gt; + &lt;code&gt;finally&lt;/code&gt; from day one.&lt;/strong&gt; Zombie connections are hard to detect (they look like slow users) and hard to purge once accumulating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log every intent classification with confidence score.&lt;/strong&gt; After two weeks of logs, the classifier's weak spots become obvious. You can't improve what you didn't measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured summarisation, not free text.&lt;/strong&gt; Switching mid-product means auditing every cached summary in your session store for distortions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReconnectingEventSource&lt;/code&gt; before mobile launch.&lt;/strong&gt; Retrofitting reconnection touches every event handler in the frontend. It's never a small change once you have production listeners.&lt;/li&gt;
&lt;/ul&gt;







&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Quick poll for the comments — which of these hit you in production?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zombie SSE connections that never closed&lt;/li&gt;
&lt;li&gt;Loose &lt;code&gt;dict&lt;/code&gt; state crashing mid-stream&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;run_in_executor&lt;/code&gt; race condition&lt;/li&gt;
&lt;li&gt;Mobile &lt;code&gt;EventSource&lt;/code&gt; dying silently&lt;/li&gt;
&lt;li&gt;Free-text summaries hallucinating user constraints&lt;/li&gt;
&lt;li&gt;Something not on this list ⬇️&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Found a failure mode I missed? Drop it in the comments — I'll add it to the article with credit.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Context Engineering: The Production Problem Nobody Writes About</title>
      <dc:creator>Priyank Agrawal</dc:creator>
      <pubDate>Mon, 06 Apr 2026 17:41:17 +0000</pubDate>
      <link>https://dev.to/priyank_agrawal/context-engineering-the-production-problem-nobody-writes-about-9k8</link>
      <guid>https://dev.to/priyank_agrawal/context-engineering-the-production-problem-nobody-writes-about-9k8</guid>
      <description>&lt;p&gt;&lt;em&gt;Andrej Karpathy called it "the new prompt engineering." Everyone wrote a definition article. Nobody wrote about the actual infrastructure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I built the ESG Analytics Chatbot at Planet Sustech — a RAG system serving 10+ organizations with 95%+ query accuracy — the hardest engineering challenge wasn't the model. It wasn't the retrieval. It was figuring out what to put in the context window before every single LLM call, and why getting that wrong destroyed accuracy in ways that were nearly impossible to debug.&lt;/p&gt;

&lt;p&gt;Context engineering is about one thing: &lt;strong&gt;building the right input to the model, every call, for every state of your application.&lt;/strong&gt; It's not about writing better prompts. It's about the pipeline that assembles the prompt at runtime.&lt;/p&gt;

&lt;p&gt;This is the post I wish existed when I was building that system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Is Harder Than It Sounds
&lt;/h2&gt;

&lt;p&gt;Here's a simplified view of what goes into a single LLM call in a production RAG system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System Prompt] + [Retrieved Documents] + [Conversation History] + [User Query] + [Tool Schemas]
= Total Context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each component competes for the same finite token budget. And each one has a different freshness, relevance, and cost profile.&lt;/p&gt;

&lt;p&gt;The naive approach: include everything. Stuff the context window as full as you can.&lt;/p&gt;

&lt;p&gt;The result: accuracy drops, costs spike, and you hit a phenomenon called "lost in the middle" — where the model pays disproportionately less attention to content in the middle of a long context. Research shows up to 24.2% accuracy degradation when relevant information is buried in the middle of a long context rather than placed at the beginning or end.&lt;/p&gt;

&lt;p&gt;We saw this in the ESG chatbot. When an analyst asked about a specific company's water usage data, our retrieval was returning 8 relevant documents. But we were appending them in retrieval order, so the most relevant chunk was often sitting in positions 3–6. Accuracy on those queries was 15–20 points lower than queries where the best chunk landed first or last.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Layers of Agent Context
&lt;/h2&gt;

&lt;p&gt;Before optimizing anything, it helps to think about context in layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Working Memory&lt;/strong&gt;&lt;br&gt;
The current task. User's query, the agent's current goal, any in-progress state. This is always present and always goes at the beginning of the context (primacy effect — models weight early content heavily).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Episodic Memory&lt;/strong&gt;&lt;br&gt;
Recent conversation history. Not all of it — a compressed or windowed version. For a chatbot, this is the last N turns. For a long-running agent, this is a summary of recent steps taken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Semantic Memory&lt;/strong&gt;&lt;br&gt;
Retrieved knowledge — documents, database results, external data. This is what RAG injects. It's the most expensive layer to get right because it's query-dependent and changes every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Tool State&lt;/strong&gt;&lt;br&gt;
The results of previous tool calls within the current session. If your agent called &lt;code&gt;search_jobs&lt;/code&gt; two steps ago, that result might still be relevant now.&lt;/p&gt;

&lt;p&gt;Most systems treat these as a flat list appended in whatever order they're collected. Production systems need to treat them as a priority stack with a budget.&lt;/p&gt;


&lt;h2&gt;
  
  
  Building a Context Budget
&lt;/h2&gt;

&lt;p&gt;In our ESG chatbot, we set an explicit token budget per call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;CONTEXT_BUDGET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Fixed — organizational context, instructions
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;working_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Current query + task state
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Recent conversation (compressed)
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Retrieved documents
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Previous tool call outputs
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_reserve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Leave room for model's response
&lt;/span&gt;    &lt;span class="c1"&gt;# Total: ~5000 tokens — fits in most model context windows with headroom
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces you to make explicit tradeoffs instead of letting the context grow unbounded until you hit an error.&lt;/p&gt;

&lt;p&gt;When retrieved documents exceed the semantic memory budget, you don't just truncate — you re-rank and select:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_semantic_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Select and rank documents to fit within token budget.
    Prioritize by relevance score, place highest-scoring chunk first.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Sort by relevance score descending
&lt;/span&gt;    &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;used_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;doc_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;used_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;doc_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;used_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;doc_tokens&lt;/span&gt;

    &lt;span class="c1"&gt;# Critical: place highest-relevance content first (primacy effect)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Source: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple re-ranking step improved our query accuracy from the mid-70s to 95%+ on complex multi-document queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  The KV-Cache Problem (and Why It's a Cost Issue, Not Just a Performance Issue)
&lt;/h2&gt;

&lt;p&gt;If you're running in production at scale, KV-cache hit rate is one of the most important metrics you're probably not tracking.&lt;/p&gt;

&lt;p&gt;When you make an LLM API call, the model can cache the key-value pairs computed for a prefix of your context. If your next call starts with the same prefix, it reuses the cache instead of recomputing — dramatically reducing latency and cost (cached tokens are typically 10x cheaper than prompt tokens on Anthropic's API).&lt;/p&gt;

&lt;p&gt;The problem: if your system prompt or retrieved documents change slightly on every call, you get 0% cache hit rate. You pay full price every time.&lt;/p&gt;

&lt;p&gt;Our ESG chatbot was rebuilding the organization's schema context from scratch on every call — even though that schema barely changed. Cache hit rate was near zero. When we moved the static organizational context to a fixed prefix that never changed, and only appended dynamic content after it, our cache hit rate jumped to ~65%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;org_schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Structure context so static content comes first (cacheable prefix).
    Dynamic content comes after (changes per request).
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="c1"&gt;# STATIC — same for all queries in this org. Gets cached.
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Organization schema:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;org_schema&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;

        &lt;span class="c1"&gt;# DYNAMIC — changes per query. Not cached.
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;build_semantic_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1800&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule: &lt;strong&gt;put what doesn't change first. Put what changes last.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Episodic Memory: When You Can't Afford Full History
&lt;/h2&gt;

&lt;p&gt;For multi-turn conversations, you can't keep appending the full history. At turn 20, your conversation history alone might be 5,000 tokens.&lt;/p&gt;

&lt;p&gt;Two patterns that work in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: Rolling Window&lt;/strong&gt;&lt;br&gt;
Keep only the last N turns verbatim. Simple, predictable, no information loss on recent turns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_episodic_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Keep the most recent N turns.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;  &lt;span class="c1"&gt;# *2 for user+assistant pairs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pattern 2: Hierarchical Summarization&lt;/strong&gt;&lt;br&gt;
For longer sessions, summarize older turns and keep recent ones verbatim.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_episodic_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budget_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;recent_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;  &lt;span class="c1"&gt;# Last 3 exchanges verbatim
&lt;/span&gt;    &lt;span class="n"&gt;older_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;older_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;format_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Summarize everything older
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this conversation context concisely:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;format_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;older_turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Earlier in this conversation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Recent exchanges:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;format_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kept our ESG chatbot coherent across 30+ turn sessions while staying well under our token budget.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Measure
&lt;/h2&gt;

&lt;p&gt;If you're not measuring these, you're flying blind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Log this for every LLM call
&lt;/span&gt;&lt;span class="n"&gt;context_metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;semantic_context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;history_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodic_context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_hit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_read_input_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Anthropic API
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_top_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;org_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;org_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The metrics that predicted accuracy problems in our system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;retrieval_top_score &amp;lt; 0.7&lt;/code&gt; → likely hallucination risk&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retrieved_tokens &amp;gt; 2000&lt;/code&gt; → "lost in the middle" risk on complex queries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache_hit = False&lt;/code&gt; on high-traffic routes → unnecessary cost&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Mental Model Shift
&lt;/h2&gt;

&lt;p&gt;Prompt engineering asks: "What do I write in the prompt?"&lt;/p&gt;

&lt;p&gt;Context engineering asks: "What information does the model need, in what order, with what budget, at this specific point in this specific conversation, for this specific user?"&lt;/p&gt;

&lt;p&gt;The second question has an engineering answer. It involves data structures, caching strategies, budget allocation, and retrieval pipelines. It's not about being clever with words — it's about building infrastructure.&lt;/p&gt;

&lt;p&gt;When we treated our context pipeline as an engineering artifact — with explicit budgets, priority layers, and metrics — our ESG chatbot went from inconsistent to consistent. 95%+ accuracy wasn't the result of a better model or better prompts. It was the result of better context architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Context engineering is not a new concept, but the production implementation is genuinely underexplored. Most of what's written is definitional. The infrastructure details — budget allocation, cache optimization, memory layers, retrieval re-ranking — are things teams figure out on their own after expensive mistakes.&lt;/p&gt;

&lt;p&gt;If you're building a production RAG system or a long-running agent and you're hitting accuracy walls or cost spikes, audit your context pipeline first. The answer is probably there.&lt;/p&gt;

&lt;p&gt;What patterns have you found that work? Drop them in the comments — I'm genuinely curious what others are doing here.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I built the ESG Analytics Chatbot at Planet Sustech — a multi-tenant RAG system serving 10+ organizations — and these patterns came directly from debugging production failures in that system.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tags: #ai #llm #rag #machinelearning #python&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>🔍 How MongoDB Indexing Works Internally: B+Tree, B- Tree Structure, Performance Impact &amp; Best Practices</title>
      <dc:creator>Priyank Agrawal</dc:creator>
      <pubDate>Sun, 04 May 2025 15:43:34 +0000</pubDate>
      <link>https://dev.to/priyank_agrawal/how-mongodb-indexing-works-internally-btree-structure-performance-impact-best-practices-aje</link>
      <guid>https://dev.to/priyank_agrawal/how-mongodb-indexing-works-internally-btree-structure-performance-impact-best-practices-aje</guid>
      <description>&lt;p&gt;Indexing is the backbone of database performance. In MongoDB, indexes are not just a luxury—they're essential for building scalable, performant applications. But how do they really work under the hood?&lt;/p&gt;

&lt;p&gt;In this deep dive, we'll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The core architecture of MongoDB indexes&lt;/li&gt;
&lt;li&gt;Internal algorithms and data structures&lt;/li&gt;
&lt;li&gt;How indexing affects read vs write operations&lt;/li&gt;
&lt;li&gt;Practical indexing strategies and best practices&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 What Is Indexing in MongoDB?
&lt;/h2&gt;

&lt;p&gt;An index in MongoDB is a special data structure that stores a subset of a collection's data in an efficient, sorted format. This allows the database engine to locate documents without scanning the entire collection.&lt;/p&gt;

&lt;p&gt;MongoDB automatically creates an index on the &lt;code&gt;_id&lt;/code&gt; field. You can (and should) define additional indexes to optimize specific queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌳 Internal Index Structure: B-Trees
&lt;/h2&gt;

&lt;p&gt;MongoDB uses &lt;strong&gt;B-Trees&lt;/strong&gt; to manage its indexes. Here's how they work:&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 What's a B-Tree?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A self-balancing tree data structure&lt;/li&gt;
&lt;li&gt;Keeps data sorted for logarithmic-time lookups&lt;/li&gt;
&lt;li&gt;Both internal and leaf nodes can store data&lt;/li&gt;
&lt;li&gt;Supports &lt;strong&gt;range queries&lt;/strong&gt;, &lt;strong&gt;prefix matching&lt;/strong&gt;, and &lt;strong&gt;sorted access&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💡 Why B-Trees in MongoDB?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enables fast insertions, deletions, and lookups (O(log n))&lt;/li&gt;
&lt;li&gt;Allows range scans for &lt;code&gt;$gte&lt;/code&gt;, &lt;code&gt;$lte&lt;/code&gt;, &lt;code&gt;$in&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Efficient balancing as data changes&lt;/li&gt;
&lt;li&gt;Well-suited for disk-based storage systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔁 Index Lifecycle: How MongoDB Maintains Indexes
&lt;/h2&gt;

&lt;p&gt;Every time a document is inserted, updated, or deleted, all relevant indexes must be updated. Here's what happens internally:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Insert:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB finds the correct location in the B-Tree&lt;/li&gt;
&lt;li&gt;A new key is inserted&lt;/li&gt;
&lt;li&gt;Tree rebalancing may occur if necessary&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✏️ Update:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If the indexed field changes:

&lt;ul&gt;
&lt;li&gt;MongoDB updates the key in the tree&lt;/li&gt;
&lt;li&gt;May involve removing and reinserting keys&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;This causes &lt;strong&gt;write amplification&lt;/strong&gt; if there are many indexes&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ Delete:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keys are removed from all applicable indexes&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Indexes help read performance but can affect write performance due to additional maintenance operations.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ⚡ Types of Indexes in MongoDB and Their Internals
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index Type&lt;/th&gt;
&lt;th&gt;Internals&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single Field&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree (WiredTiger storage engine)&lt;/td&gt;
&lt;td&gt;Basic filters and sorts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compound&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree with multi-part keys&lt;/td&gt;
&lt;td&gt;Queries with multiple filters/sorts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multikey&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree with separate entry per array element&lt;/td&gt;
&lt;td&gt;Indexing arrays&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text Index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree of lexicographically sorted terms&lt;/td&gt;
&lt;td&gt;Full-text search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TTL Index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single field index + background deletion proc&lt;/td&gt;
&lt;td&gt;Auto-expiring documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sparse/Partial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree with filtered document set&lt;/td&gt;
&lt;td&gt;Conditional indexing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Geospatial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree (2d) or B-tree+S2 (2dsphere)&lt;/td&gt;
&lt;td&gt;Location-based queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hashed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;B-tree of hashed values&lt;/td&gt;
&lt;td&gt;Hash-based sharding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  📊 Query Execution with Indexes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🧠 The Query Planner
&lt;/h3&gt;

&lt;p&gt;MongoDB's query optimizer evaluates different query execution plans using available indexes. It selects the most efficient plan based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index selectivity (how well an index narrows results)&lt;/li&gt;
&lt;li&gt;Query predicates and their matching to indexes&lt;/li&gt;
&lt;li&gt;Sort requirements and whether indexes can satisfy them&lt;/li&gt;
&lt;li&gt;Statistics about data distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The optimizer may periodically re-evaluate plans as collection data changes over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔀 Index Intersection
&lt;/h3&gt;

&lt;p&gt;MongoDB can use multiple indexes to resolve a single query when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different indexes match different query conditions&lt;/li&gt;
&lt;li&gt;The intersection would be more selective than using a single index&lt;/li&gt;
&lt;li&gt;No single index exists that fully covers the query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, index intersection isn't always more efficient and has its limitations, especially with large collections.&lt;/p&gt;

&lt;h3&gt;
  
  
  📦 Covered Queries
&lt;/h3&gt;

&lt;p&gt;If all fields required by the query (both in the query criteria and in the projection) are included in an index, MongoDB can fulfill the query using only the index without accessing the documents—these "covered" queries are extremely fast!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example of a covered query (assuming there's an index on {age: 1, name: 1})&lt;/span&gt;
&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚖️ Read vs. Write Trade-offs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ When Indexes Help:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High-frequency reads&lt;/li&gt;
&lt;li&gt;Filters and sorts&lt;/li&gt;
&lt;li&gt;Joins using &lt;code&gt;$lookup&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Range queries and pagination&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ When Indexes Hurt:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High-frequency writes (inserts/updates)&lt;/li&gt;
&lt;li&gt;Frequent indexed field changes&lt;/li&gt;
&lt;li&gt;Low cardinality fields (e.g., gender)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of Thumb&lt;/strong&gt;: Use indexes on collections primarily accessed for reads. Be strategic with indexing on collections with high write throughput.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧱 WiredTiger Storage Engine &amp;amp; Indexing
&lt;/h2&gt;

&lt;p&gt;MongoDB's default engine, &lt;strong&gt;WiredTiger&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores collection data in separate data files&lt;/li&gt;
&lt;li&gt;Uses B-trees for the _id index and all other indexes&lt;/li&gt;
&lt;li&gt;Each index is maintained in its own file&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧬 Compression:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prefix compression on index keys&lt;/li&gt;
&lt;li&gt;Block compression for data&lt;/li&gt;
&lt;li&gt;Reduces disk usage, improves cache efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🛠 Hidden &amp;amp; Background Builds
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Foreground: Locks collection (faster, blocking)&lt;/li&gt;
&lt;li&gt;Background: Non-blocking (slower, safe for production)&lt;/li&gt;
&lt;li&gt;Hidden indexes: Can be tested before making visible to the query planner&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Indexing Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Index fields used in filtering and sorting&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid indexing low-cardinality fields&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep indexes narrow (fewer fields)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use compound indexes in the correct field order&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use &lt;code&gt;.explain()&lt;/code&gt; with verbosity modes to validate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor index usage with MongoDB Atlas or profiler&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop unused indexes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;   &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;index_name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Balance indexing on write-heavy collections&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🧪 Real-World Example: Compound Index
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create a compound index&lt;/span&gt;
&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createIndex&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Efficient for:&lt;/span&gt;
&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Not efficient for:&lt;/span&gt;
&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$gte&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ISODate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧠 Developer Insight
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Use indexing strategically by understanding your access patterns. For read-heavy collections, comprehensive indexing can dramatically improve performance. For write-heavy collections, be selective to avoid unnecessary index maintenance overhead."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📘 Conclusion
&lt;/h2&gt;

&lt;p&gt;MongoDB indexing is a sophisticated system built on B-tree data structures, efficient compression techniques, and intelligent query planning.&lt;/p&gt;

&lt;p&gt;By understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;B-Tree mechanics and limitations&lt;/li&gt;
&lt;li&gt;Read/write trade-offs&lt;/li&gt;
&lt;li&gt;Query planner decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can architect highly optimized applications that balance performance across various workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  👨‍💻 Author: Priyank Agrawal
&lt;/h2&gt;

&lt;p&gt;Software Developer | Node.js | MongoDB &lt;br&gt;
🔗 &lt;a href="https://dev.to/priyank_agrawal"&gt;Dev.to Profile&lt;/a&gt;&lt;br&gt;
🔗 &lt;a href="https://www.linkedin.com/in/priyank-aggrawal/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 Follow for More
&lt;/h2&gt;

&lt;p&gt;If you found this useful, follow me on &lt;a href="https://dev.to/priyank_agrawal"&gt;Dev.to&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/priyank-aggrawal/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more deep-dive technical articles.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>indexing</category>
      <category>algorithms</category>
      <category>performance</category>
    </item>
    <item>
      <title>Optimizing MongoDB Aggregations with the $function Operator: Reducing Time Complexity</title>
      <dc:creator>Priyank Agrawal</dc:creator>
      <pubDate>Thu, 24 Apr 2025 19:29:49 +0000</pubDate>
      <link>https://dev.to/priyank_agrawal/optimizing-mongodb-aggregations-with-the-function-operator-reducing-time-complexity-33fg</link>
      <guid>https://dev.to/priyank_agrawal/optimizing-mongodb-aggregations-with-the-function-operator-reducing-time-complexity-33fg</guid>
      <description>&lt;p&gt;When working with large datasets in &lt;strong&gt;MongoDB&lt;/strong&gt;, performance optimization is critical. One common challenge arises when you need to apply custom logic, such as mapping data values or performing sorting, which can become inefficient when done in multiple steps. In this blog, we'll explore how MongoDB's $function operator can significantly reduce time complexity compared to traditional methods.&lt;/p&gt;

&lt;p&gt;Before &lt;strong&gt;$function&lt;/strong&gt;: Multiple Steps, Increased Complexity&lt;br&gt;
Before MongoDB introduced the $function operator, custom logic like mapping data and sorting often required multiple aggregation stages. This could increase the overall query time, especially when dealing with large datasets, since multiple stages (e.g., &lt;strong&gt;$map&lt;/strong&gt; and &lt;strong&gt;$sort&lt;/strong&gt;) needed to be processed separately. Let’s consider an example where you are fetching employees for a particular department and sorting them based on their employee number (position within the department). Without $function, you would have to first retrieve the employee data, manually map the employee positions, and then sort them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case: Sorting Employees in a Department&lt;/strong&gt;&lt;br&gt;
Suppose you need to retrieve employees for a specific department, and you have a map &lt;strong&gt;(employeePositionMap)&lt;/strong&gt; that stores the position (employee number) of each employee. In the past, you would have to perform this task in two separate steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function getDepartmentEmployees(connection, departmentId, userId) {
  const { Department, Employee } = connection.models;

  // Check if the department exists
  const departmentExists = await Department.exists({ _id: new ObjectId(departmentId) });
  if (!departmentExists) {
    throw new Error("Department not found");
  }

  // Get the department's employees
  const department = await Department.findById(departmentId, {
    employees: { $elemMatch: { user: new ObjectId(userId) } }
  });

  // Extract the employee IDs
  const employeeIds = department.employees.length &amp;gt; 0 ? 
    department.employees[0].employeesList.map(e =&amp;gt; e) : [];

  // Create a map of employee IDs to their position (employeeNumber)
  const employeePositionMap = {};
  if (department.employees.length &amp;gt; 0) {
    department.employees[0].employeesList.forEach((employeeId, index) =&amp;gt; {
      employeePositionMap[employeeId.toString()] = index + 1; // Add 1 for 1-based indexing
    });
  }

  // Query employees and add employeeNumber manually
  const employees = await Employee.aggregate([
    { $match: { _id: { $in: employeeIds } } },
    {
      $lookup: {
        from: "employeeSalary",
        let: { employeeId: "$_id", departmentId: new ObjectId(departmentId) },
        pipeline: [
          { $match: { $expr: { $eq: ["$employeeId", "$$employeeId"] } } }
        ],
        as: "salaryDetails"
      }
    },
    { $unwind: "$salaryDetails" },
  ]);

  // Adding employeeNumber manually after fetching all employees
  const employeesWithNumbers = employees.map(employee =&amp;gt; {
    const employeeId = employee._id.toString();
    return {
      ...employee,
      employeeNumber: employeePositionMap[employeeId] || null
    };
  });

  // Sort by employeeNumber
  employeesWithNumbers.sort((a, b) =&amp;gt; {
    if (a.employeeNumber === null) return 1;
    if (b.employeeNumber === null) return -1;
    return a.employeeNumber - b.employeeNumber;
  });

  return employeesWithNumbers;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time Complexity in the Old Approach&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Two Steps:&lt;/strong&gt; First, we retrieve all the employees and manually add employeeNumber to each employee. Then, we sort the employees based on this employeeNumber.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Increased Complexity:&lt;/strong&gt; The mapping &lt;strong&gt;(employeePositionMap)&lt;/strong&gt; and sorting require separate iterations, resulting in O(n log n) time complexity for sorting and O(n) for mapping.&lt;/p&gt;

&lt;p&gt;For large datasets, this can quickly become inefficient as the number of documents grows, especially when the employeeNumber mapping and sorting happen outside the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After $function: Reduced Complexity&lt;/strong&gt;&lt;br&gt;
Now, let’s see how MongoDB's $function operator simplifies this process and improves performance. By using $function, we can add the employeeNumber directly in the aggregation pipeline and perform sorting in one step, avoiding the need for multiple iterations over the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updated Code Using $function:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function getDepartmentEmployeesOptimized(connection, departmentId, userId) {
  const { Department, Employee } = connection.models;

  // Check if the department exists
  const departmentExists = await Department.exists({ _id: new ObjectId(departmentId) });
  if (!departmentExists) {
    throw new Error("Department not found");
  }

  // Get the department's employees
  const department = await Department.findById(departmentId, {
    employees: { $elemMatch: { user: new ObjectId(userId) } }
  });

  // Extract the employee IDs
  const employeeIds = department.employees.length &amp;gt; 0 ? 
    department.employees[0].employeesList.map(e =&amp;gt; e) : [];

  // Create a map of employee IDs to their position (employeeNumber)
  const employeePositionMap = {};
  if (department.employees.length &amp;gt; 0) {
    department.employees[0].employeesList.forEach((employeeId, index) =&amp;gt; {
      employeePositionMap[employeeId.toString()] = index + 1;
    });
  }

  // Query employees with $function to add employeeNumber directly in aggregation pipeline
  const employees = await Employee.aggregate([
    { $match: { _id: { $in: employeeIds } } },
    {
      $lookup: {
        from: "employeeSalary",
        let: { employeeId: "$_id", departmentId: new ObjectId(departmentId) },
        pipeline: [
          { $match: { $expr: { $eq: ["$employeeId", "$$employeeId"] } } }
        ],
        as: "salaryDetails"
      }
    },
    { $unwind: "$salaryDetails" },
    {
      $addFields: {
        employeeNumber: {
          $function: {
            body: function(employeeId) {
              return employeePositionMap[employeeId.toString()] || null;
            },
            args: ["$_id"],
            lang: "js"
          }
        }
      }
    },
    { $sort: { employeeNumber: 1 } } // Sort by employeeNumber in ascending order
  ]);

  return employees;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time Complexity After $function&lt;/strong&gt;&lt;br&gt;
Single Aggregation Pipeline: The custom logic for adding the employeeNumber is applied directly in the aggregation pipeline, eliminating the need for separate mapping and sorting steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced Complexity:&lt;/strong&gt; By handling everything in the aggregation query, we reduce the time complexity significantly. The aggregation now has a more efficient &lt;strong&gt;O(n)&lt;/strong&gt; complexity, as it performs both the transformation and sorting in one step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Using $function:&lt;/strong&gt;&lt;br&gt;
Reduced Round Trips: With $function, MongoDB handles all of the data transformations in a single query, reducing the need to process data externally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster Execution:&lt;/strong&gt; By combining operations like adding fields and sorting within the aggregation pipeline, you avoid the overhead of multiple iterations in application code, improving performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleaner Code:&lt;/strong&gt; The logic is embedded directly into the aggregation pipeline, making the code easier to maintain and reducing the need for additional processing steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
By using MongoDB’s &lt;strong&gt;$function&lt;/strong&gt; operator, we not only simplify the logic but also significantly reduce the time complexity of our operations. Instead of performing multiple steps in application code, we can execute custom JavaScript directly in the aggregation pipeline, leading to faster execution times and more efficient data processing. If you’re working with complex data transformations and large datasets, the $function operator is a powerful tool to optimize your MongoDB queries.&lt;/p&gt;

&lt;p&gt;Have you used the $function operator in your projects? Share your thoughts and experiences in the comments below!``&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>backend</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
